Liu Yue's Tech Sharing

Exquisite Brushwork, Masterful Painting: Microsoft Open-Sources Visual ChatGPT, a Visual Version of ChatGPT for AI Chat with Images, Implemented in Python 3.10

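No sooner said than done: Microsoft promptly released the open-source library Visual ChatGPT, which integrates ChatGPT's AI capabilities with Stable Diffusion and ControlNet. "Empowerment", a buzzword internet people constantly throw around, has nearly become a joke, but this time Microsoft delivered genuine AI "empowerment" and truly closed the AI "loop" between text conversation and image understanding and generation.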

Configuring the Visual ChatGPT Environment

As usual, run Git to clone the Visual ChatGPT project:

git clone https://github.com/microsoft/visual-chatgpt.git

Enter the project directory:

cd visual-chatgpt

Make sure your local Python version is Python 3.10.9 or later.
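A quick way to confirm the interpreter meets this requirement is a tiny check script. This is only a sketch; the helper name `python_ok` is hypothetical and not part of the project.

```python
import sys

# Minimum interpreter version stated in this post.
MIN_PYTHON = (3, 10, 9)


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True if (major, minor, micro) of version_info is at least `minimum`."""
    return tuple(version_info[:3]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_ok():
        print(f"Python {current} OK")
    else:
        print(f"Python {current} is older than 3.10.9, please upgrade")
```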

Then install the dependencies:

pip3 install -r requirement.txt

There are a couple of problems here. First, the PyTorch version pinned by the official project is not the latest; 1.13.1 is recommended:

pip3 install torch==1.13.1

Second, for langchain the recommended version is 0.0.107, the latest release at the time of writing:

pip3 install langchain==0.0.107
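To confirm that the pins above actually took effect, you can query the installed versions with the standard library's importlib.metadata. A minimal sketch; `check_pins` is a hypothetical helper, and it ignores local build suffixes such as +cu117 when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions recommended in this post; adjust if your setup differs.
PINS = {"torch": "1.13.1", "langchain": "0.0.107"}


def check_pins(pins=PINS):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Strip a local build tag like "+cu117" before comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, matches)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_pins().items():
        status = "OK" if ok else f"expected {PINS[name]}"
        print(f"{name}: {installed or 'not installed'} ({status})")
```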

After installing the dependencies, the official instructions require running the download.sh script in the project:

bash download.sh

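This shell script mainly builds the ControlNet sub-project and downloads all of the ControlNet models. If you have already downloaded these models before, simply copy the model files into the project's ControlNet/models directory: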

.  
├── cldm_v15.yaml  
├── cldm_v21.yaml  
├── control_sd15_canny.pth  
├── control_sd15_depth.pth  
├── control_sd15_hed.pth  
├── control_sd15_mlsd.pth  
├── control_sd15_normal.pth  
├── control_sd15_openpose.pth  
├── control_sd15_scribble.pth  
└── control_sd15_seg.pth
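Before launching, it can help to verify that every file in the listing above is actually in place. The sketch below assumes the models live in ControlNet/models, which is the path the code later in this post loads them from (for example ControlNet/models/cldm_v15.yaml); `missing_models` is a hypothetical helper.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = [
    "cldm_v15.yaml", "cldm_v21.yaml",
    "control_sd15_canny.pth", "control_sd15_depth.pth",
    "control_sd15_hed.pth", "control_sd15_mlsd.pth",
    "control_sd15_normal.pth", "control_sd15_openpose.pth",
    "control_sd15_scribble.pth", "control_sd15_seg.pth",
]


def missing_models(model_dir="ControlNet/models"):
    """Return the expected files that are not present as regular files in model_dir."""
    folder = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_models()
    print("All model files present" if not missing else f"Missing: {missing}")
```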

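For more on ControlNet, please see my earlier post: Reaching the Pinnacle, Learning from Nature: Hands-On Painting with ControlNet, a PyTorch AI Image Enhancement Framework, Based on Python 3.10. I won't repeat it here.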

Next, set the OpenAI environment variable:

export OPENAI_API_KEY={your OpenAI API key}

If you are on Windows, follow these steps to set OPENAI_API_KEY:

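Open "Control Panel", then choose "System and Security".  
Choose "System", then click "Advanced system settings".  
On the "Advanced" tab, click "Environment Variables".  
Under "User variables" or "System variables", click "New" to add OPENAI_API_KEY (or select an existing one and click "Edit").  
In the "Variable value" field, enter your API key.  
Click "OK" to save the changes.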
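Whichever platform you use, the program only sees the key if it is present in the process environment, and on Windows a variable added this way is only picked up by terminals opened afterwards. A minimal fail-fast check, sketched with a hypothetical helper `get_openai_key`:

```python
import os


def get_openai_key(env=None):
    """Return OPENAI_API_KEY from the environment, raising early if it is unset or blank."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it or add it under Environment Variables"
        )
    return key
```

Calling it once at startup surfaces a missing key immediately instead of at the first chat request.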

At this point, the environment is largely configured.

Modifying the Visual ChatGPT Code

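Like ControlNet, Visual ChatGPT hard-codes cuda as the device, which is unfriendly to machines without CUDA support, such as Macs with Apple M-series chips. If we run the program directly: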

python3 visual_chatgpt.py

it fails with this error:

AssertionError: Torch not compiled with CUDA enabled

To fix this, change the hard-coded cuda device in visual_chatgpt.py to mps:

print("Initializing VisualChatGPT")  
self.llm = OpenAI(temperature=0)  
self.edit = ImageEditing(device="mps")  
self.i2t = ImageCaptioning(device="mps")  
self.t2i = T2I(device="mps")
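Rather than swapping one hard-coded string for another, you could pick the device at runtime. A sketch assuming PyTorch 1.12 or newer is installed (torch.backends.mps first appeared in 1.12); `pick_device` and `detect_device` are hypothetical helpers, not part of Visual ChatGPT:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe the installed PyTorch build; torch is imported lazily so this file loads without it."""
    import torch

    return pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())
```

You would then write, for example, self.t2i = T2I(device=detect_device()). Other hard-coded device strings in the file, such as location='mps' in the ControlNet loaders, still have to be adjusted separately.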

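For more on MPS mode, see my earlier post: Knowing the Meaning from the Sound: Local AI Speech Recognition with Whisper on an M1 Mac Based on PyTorch (mps/cpu/cuda) (Python 3.10). I won't repeat it here.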

Next, create the image folder where the program stores uploaded and generated images:

mkdir image

After that, you may also hit a memory overflow problem in the langchain library; comment out this line:

# self.agent.memory.buffer = cut_dialogue_history(self.agent.memory.buffer, keep_last_n_words=500)
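For reference, the line being disabled trims the chat history to roughly its last 500 words by dropping whole leading lines; with it commented out, the history simply keeps growing. A simplified sketch of that trimming idea, written as a standalone hypothetical function (`trim_history`) rather than the project's own code:

```python
def trim_history(history: str, keep_last_n_words: int = 500) -> str:
    """Drop whole leading lines until at most keep_last_n_words words remain.

    Words are counted with str.split(). If the final line alone exceeds the
    limit, everything is dropped and an empty string is returned.
    """
    lines = history.split("\n")
    total = sum(len(line.split()) for line in lines)
    while lines and total > keep_last_n_words:
        total -= len(lines[0].split())
        lines = lines[1:]
    return "\n".join(lines)
```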

Then replace the line that assigns to the memory buffer directly with a call that saves the context:

# self.agent.memory.buffer = self.agent.memory.buffer + Human_prompt + 'AI: ' + AI_prompt  
self.agent.memory.save_context({"input": Human_prompt}, {"output": AI_prompt})
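The point of save_context is that the memory object appends the new exchange itself, so the calling code no longer assigns to buffer. The class below is a hypothetical stand-in that only illustrates the idea; it is not langchain's ConversationBufferMemory, whose exact formatting may differ between versions:

```python
class TinyBufferMemory:
    """Hypothetical stand-in for a conversation buffer (not langchain's class)."""

    def __init__(self, human_prefix: str = "Human", ai_prefix: str = "AI"):
        self.human_prefix = human_prefix
        self.ai_prefix = ai_prefix
        self.buffer = ""

    def save_context(self, inputs: dict, outputs: dict) -> None:
        """Append one Human/AI exchange to the buffer on the caller's behalf."""
        human = f"{self.human_prefix}: {inputs['input']}"
        ai = f"{self.ai_prefix}: {outputs['output']}"
        self.buffer += "\n" + "\n".join([human, ai])


memory = TinyBufferMemory()
memory.save_context({"input": "describe image/abc.png"}, {"output": "A cat on a sofa."})
print(memory.buffer)
```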

Just when we thought everything was in place, every run turned out to run out of memory. The official project gives this explanation:

Here we list the GPU memory usage of each visual foundation model, one can modify self.tools with fewer visual foundation models to save your GPU memory:  
  
Foundation Model	Memory Usage (MB)  
ImageEditing	6667  
ImageCaption	1755  
T2I	6677  
canny2image	5540  
line2image	6679  
hed2image	6679  
scribble2image	6679  
pose2image	6681  
BLIPVQA	2709  
seg2image	5540  
depth2image	6677  
normal2image	3974  
InstructPix2Pix	2795

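That is the GPU memory footprint with every model loaded: 69,052 MB in total, nearly 70 GB of VRAM. Who is this even meant for, one can't help asking.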
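The figure is easy to verify by adding up the table, and the same arithmetic shows how much a trimmed tool list saves. A sketch; the subset used here (ImageEditing, ImageCaption, T2I) is simply the three models initialized in the snippet earlier in this post, not necessarily the exact set the modified code keeps:

```python
# GPU memory per visual foundation model (MB), copied from the table above.
MEMORY_MB = {
    "ImageEditing": 6667, "ImageCaption": 1755, "T2I": 6677,
    "canny2image": 5540, "line2image": 6679, "hed2image": 6679,
    "scribble2image": 6679, "pose2image": 6681, "BLIPVQA": 2709,
    "seg2image": 5540, "depth2image": 6677, "normal2image": 3974,
    "InstructPix2Pix": 2795,
}


def total_mb(names=None):
    """Sum the memory of the given models (all of them by default)."""
    names = MEMORY_MB.keys() if names is None else names
    return sum(MEMORY_MB[name] for name in names)


print(total_mb())  # 69052
print(total_mb(["ImageEditing", "ImageCaption", "T2I"]))  # 15099
```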

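There was nothing for it but to find another way: comment out the loading code for non-essential models. After a round of edits, here is the complete modified code: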

import sys  
import os  
sys.path.append(os.path.dirname(os.path.realpath(__file__)))  
sys.path.append(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))  
import gradio as gr  
from transformers import AutoModelForCausalLM, AutoTokenizer, CLIPSegProcessor, CLIPSegForImageSegmentation  
import torch  
from diffusers import StableDiffusionPipeline  
from diffusers import StableDiffusionInstructPix2PixPipeline, EulerAncestralDiscreteScheduler  
from langchain.agents.initialize import initialize_agent  
from langchain.agents.tools import Tool  
from langchain.chains.conversation.memory import ConversationBufferMemory  
from langchain.llms.openai import OpenAI  
import re  
import uuid  
from diffusers import StableDiffusionInpaintPipeline  
from PIL import Image  
import numpy as np  
from omegaconf import OmegaConf  
from transformers import pipeline, BlipProcessor, BlipForConditionalGeneration, BlipForQuestionAnswering  
import cv2  
import einops  
from pytorch_lightning import seed_everything  
import random  
from ldm.util import instantiate_from_config  
from ControlNet.cldm.model import create_model, load_state_dict  
from ControlNet.cldm.ddim_hacked import DDIMSampler  
from ControlNet.annotator.canny import CannyDetector  
from ControlNet.annotator.mlsd import MLSDdetector  
from ControlNet.annotator.util import HWC3, resize_image  
from ControlNet.annotator.hed import HEDdetector, nms  
from ControlNet.annotator.openpose import OpenposeDetector  
from ControlNet.annotator.uniformer import UniformerDetector  
from ControlNet.annotator.midas import MidasDetector  
  
VISUAL_CHATGPT_PREFIX = """Visual ChatGPT is designed to be able to assist with a wide range of text and visual related tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. Visual ChatGPT is able to generate human-like text based on the input it receives, allowing it to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.  
  
Visual ChatGPT is able to process and understand large amounts of text and images. As a language model, Visual ChatGPT can not directly read images, but it has a list of tools to finish different visual tasks. Each image will have a file name formed as "image/xxx.png", and Visual ChatGPT can invoke different tools to indirectly understand pictures. When talking about images, Visual ChatGPT is very strict to the file name and will never fabricate nonexistent files. When using tools to generate new image files, Visual ChatGPT is also known that the image may not be the same as the user's demand, and will use other visual question answering tools or description tools to observe the real image. Visual ChatGPT is able to use tools in a sequence, and is loyal to the tool observation outputs rather than faking the image content and image file name. It will remember to provide the file name from the last tool observation, if a new image is generated.  
  
Human may provide new figures to Visual ChatGPT with a description. The description helps Visual ChatGPT to understand this image, but Visual ChatGPT should use tools to finish following tasks, rather than directly imagine from the description.  
  
Overall, Visual ChatGPT is a powerful visual dialogue assistant tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics.   
  
  
TOOLS:  
------  
  
Visual ChatGPT  has access to the following tools:"""  
  
VISUAL_CHATGPT_FORMAT_INSTRUCTIONS = """To use a tool, please use the following format:

Thought: Do I need to use a tool? Yes
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action

  
When you have a response to say to the Human, or if you do not need to use a tool, you MUST use the format:

Thought: Do I need to use a tool? No
{ai_prefix}: [your response here]

"""  
  
VISUAL_CHATGPT_SUFFIX = """You are very strict to the filename correctness and will never fake a file name if it does not exist.  
You will remember to provide the image file name loyally if it's provided in the last tool observation.  
  
Begin!  
  
Previous conversation history:  
{chat_history}  
  
New input: {input}  
Since Visual ChatGPT is a text language model, Visual ChatGPT must use tools to observe images rather than imagination.  
The thoughts and observations are only visible for Visual ChatGPT, Visual ChatGPT should remember to repeat important information in the final response for Human.   
Thought: Do I need to use a tool? {agent_scratchpad}"""  
  
def cut_dialogue_history(history_memory, keep_last_n_words=500):  
    tokens = history_memory.split()  
    n_tokens = len(tokens)  
    print(f"history_memory:{history_memory}, n_tokens: {n_tokens}")  
    if n_tokens < keep_last_n_words:  
        return history_memory  
    else:  
        paragraphs = history_memory.split('\n')  
        last_n_tokens = n_tokens  
        while last_n_tokens >= keep_last_n_words:  
            last_n_tokens = last_n_tokens - len(paragraphs[0].split(' '))  
            paragraphs = paragraphs[1:]  
        return '\n' + '\n'.join(paragraphs)  
  
def get_new_image_name(org_img_name, func_name="update"):  
    head_tail = os.path.split(org_img_name)  
    head = head_tail[0]  
    tail = head_tail[1]  
    name_split = tail.split('.')[0].split('_')  
    this_new_uuid = str(uuid.uuid4())[0:4]  
    if len(name_split) == 1:  
        most_org_file_name = name_split[0]  
        recent_prev_file_name = name_split[0]  
        new_file_name = '{}_{}_{}_{}.png'.format(this_new_uuid, func_name, recent_prev_file_name, most_org_file_name)  
    else:  
        assert len(name_split) == 4  
        most_org_file_name = name_split[3]  
        recent_prev_file_name = name_split[0]  
        new_file_name = '{}_{}_{}_{}.png'.format(this_new_uuid, func_name, recent_prev_file_name, most_org_file_name)  
    return os.path.join(head, new_file_name)  
  
def create_model(config_path, device):  
    config = OmegaConf.load(config_path)  
    OmegaConf.update(config, "model.params.cond_stage_config.params.device", device)  
    model = instantiate_from_config(config.model).to(device)  
    print(f'Loaded model config from [{config_path}]')  
    return model  
  
class MaskFormer:  
    def __init__(self, device):  
        self.device = device  
        self.processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")  
        self.model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined").to(device)  
  
    def inference(self, image_path, text):  
        threshold = 0.5  
        min_area = 0.02  
        padding = 20  
        original_image = Image.open(image_path)  
        image = original_image.resize((512, 512))  
        inputs = self.processor(text=text, images=image, padding="max_length", return_tensors="pt",).to(self.device)  
        with torch.no_grad():  
            outputs = self.model(**inputs)  
        mask = torch.sigmoid(outputs[0]).squeeze().cpu().numpy() > threshold  
        area_ratio = len(np.argwhere(mask)) / (mask.shape[0] * mask.shape[1])  
        if area_ratio < min_area:  
            return None  
        true_indices = np.argwhere(mask)  
        mask_array = np.zeros_like(mask, dtype=bool)  
        for idx in true_indices:  
            padded_slice = tuple(slice(max(0, i - padding), i + padding + 1) for i in idx)  
            mask_array[padded_slice] = True  
        visual_mask = (mask_array * 255).astype(np.uint8)  
        image_mask = Image.fromarray(visual_mask)  
        return image_mask.resize(image.size)  
  
class ImageEditing:  
    def __init__(self, device):  
        print("Initializing StableDiffusionInpaint to %s" % device)  
        self.device = device  
        self.mask_former = MaskFormer(device=self.device)  
        self.inpainting = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting",).to(device)  
  
    def remove_part_of_image(self, input):  
        image_path, to_be_removed_txt = input.split(",")  
        print(f'remove_part_of_image: to_be_removed {to_be_removed_txt}')  
        return self.replace_part_of_image(f"{image_path},{to_be_removed_txt},background")  
  
    def replace_part_of_image(self, input):  
        image_path, to_be_replaced_txt, replace_with_txt = input.split(",")  
        print(f'replace_part_of_image: replace_with_txt {replace_with_txt}')  
        original_image = Image.open(image_path)  
        mask_image = self.mask_former.inference(image_path, to_be_replaced_txt)  
        updated_image = self.inpainting(prompt=replace_with_txt, image=original_image, mask_image=mask_image).images[0]  
        updated_image_path = get_new_image_name(image_path, func_name="replace-something")  
        updated_image.save(updated_image_path)  
        return updated_image_path  
  
class Pix2Pix:  
    def __init__(self, device):  
        print("Initializing Pix2Pix to %s" % device)  
        self.device = device  
        self.pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained("timbrooks/instruct-pix2pix", torch_dtype=torch.float16, safety_checker=None).to(device)  
        self.pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(self.pipe.scheduler.config)  
  
    def inference(self, inputs):  
        """Change style of image."""  
        print("===>Starting Pix2Pix Inference")  
        image_path, instruct_text = inputs.split(",")[0], ','.join(inputs.split(',')[1:])  
        original_image = Image.open(image_path)  
        image = self.pipe(instruct_text,image=original_image,num_inference_steps=40,image_guidance_scale=1.2,).images[0]  
        updated_image_path = get_new_image_name(image_path, func_name="pix2pix")  
        image.save(updated_image_path)  
        return updated_image_path  
  
class T2I:  
    def __init__(self, device):  
        print("Initializing T2I to %s" % device)  
        self.device = device  
        self.pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)  
        self.text_refine_tokenizer = AutoTokenizer.from_pretrained("Gustavosta/MagicPrompt-Stable-Diffusion")  
        self.text_refine_model = AutoModelForCausalLM.from_pretrained("Gustavosta/MagicPrompt-Stable-Diffusion")  
        self.text_refine_gpt2_pipe = pipeline("text-generation", model=self.text_refine_model, tokenizer=self.text_refine_tokenizer, device=self.device)  
        self.pipe.to(device)  
  
    def inference(self, text):  
        image_filename = os.path.join('image', str(uuid.uuid4())[0:8] + ".png")  
        refined_text = self.text_refine_gpt2_pipe(text)[0]["generated_text"]  
        print(f'{text} refined to {refined_text}')  
        image = self.pipe(refined_text).images[0]  
        image.save(image_filename)  
        print(f"Processed T2I.run, text: {text}, image_filename: {image_filename}")  
        return image_filename  
  
class ImageCaptioning:  
    def __init__(self, device):  
        print("Initializing ImageCaptioning to %s" % device)  
        self.device = device  
        self.processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")  
        self.model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base").to(self.device)  
  
    def inference(self, image_path):  
        inputs = self.processor(Image.open(image_path), return_tensors="pt").to(self.device)  
        out = self.model.generate(**inputs)  
        captions = self.processor.decode(out[0], skip_special_tokens=True)  
        return captions  
  
class image2canny:  
    def __init__(self):  
        print("Direct detect canny.")  
        self.detector = CannyDetector()  
        self.low_thresh = 100  
        self.high_thresh = 200  
  
    def inference(self, inputs):  
        print("===>Starting image2canny Inference")  
        image = Image.open(inputs)  
        image = np.array(image)  
        canny = self.detector(image, self.low_thresh, self.high_thresh)  
        canny = 255 - canny  
        image = Image.fromarray(canny)  
        updated_image_path = get_new_image_name(inputs, func_name="edge")  
        image.save(updated_image_path)  
        return updated_image_path  
  
class canny2image:  
    def __init__(self, device):  
        print("Initialize the canny2image model.")  
        model = create_model('ControlNet/models/cldm_v15.yaml', device=device).to(device)  
        model.load_state_dict(load_state_dict('ControlNet/models/control_sd15_canny.pth', location='mps'))  
        self.model = model.to(device)  
        self.device = device  
        self.ddim_sampler = DDIMSampler(self.model)  
        self.ddim_steps = 20  
        self.image_resolution = 512  
        self.num_samples = 1  
        self.save_memory = False  
        self.strength = 1.0  
        self.guess_mode = False  
        self.scale = 9.0  
        self.seed = -1  
        self.a_prompt = 'best quality, extremely detailed'  
        self.n_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'  
  
    def inference(self, inputs):  
        print("===>Starting canny2image Inference")  
        image_path, instruct_text = inputs.split(",")[0], ','.join(inputs.split(',')[1:])  
        image = Image.open(image_path)  
        image = np.array(image)  
        image = 255 - image  
        prompt = instruct_text  
        img = resize_image(HWC3(image), self.image_resolution)  
        H, W, C = img.shape  
        control = torch.from_numpy(img.copy()).float().to(device=self.device) / 255.0  
        control = torch.stack([control for _ in range(self.num_samples)], dim=0)  
        control = einops.rearrange(control, 'b h w c -> b c h w').clone()  
        self.seed = random.randint(0, 65535)  
        seed_everything(self.seed)  
        if self.save_memory:  
            self.model.low_vram_shift(is_diffusing=False)  
        cond = {"c_concat": [control], "c_crossattn": [self.model.get_learned_conditioning([prompt + ', ' + self.a_prompt] * self.num_samples)]}  
        un_cond = {"c_concat": None if self.guess_mode else [control], "c_crossattn": [self.model.get_learned_conditioning([self.n_prompt] * self.num_samples)]}  
        shape = (4, H // 8, W // 8)  
        self.model.control_scales = [self.strength * (0.825 ** float(12 - i)) for i in range(13)] if self.guess_mode else ([self.strength] * 13)  # Magic number. IDK why. Perhaps because 0.825**12<0.01 but 0.826**12>0.01  
        samples, intermediates = self.ddim_sampler.sample(self.ddim_steps, self.num_samples, shape, cond, verbose=False, eta=0., unconditional_guidance_scale=self.scale, unconditional_conditioning=un_cond)  
        if self.save_memory:  
            self.model.low_vram_shift(is_diffusing=False)  
        x_samples = self.model.decode_first_stage(samples)  
        x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)  
        updated_image_path = get_new_image_name(image_path, func_name="canny2image")  
        real_image = Image.fromarray(x_samples[0])  # get default the index0 image  
        real_image.save(updated_image_path)  
        return updated_image_path  
  
class image2line:  
    def __init__(self):  
        print("Direct detect straight line...")  
        self.detector = MLSDdetector()  
        self.value_thresh = 0.1  
        self.dis_thresh = 0.1  
        self.resolution = 512  
  
    def inference(self, inputs):  
        print("===>Starting image2hough Inference")  
        image = Image.open(inputs)  
        image = np.array(image)  
        image = HWC3(image)  
        hough = self.detector(resize_image(image, self.resolution), self.value_thresh, self.dis_thresh)  
        updated_image_path = get_new_image_name(inputs, func_name="line-of")  
        hough = 255 - cv2.dilate(hough, np.ones(shape=(3, 3), dtype=np.uint8), iterations=1)  
        image = Image.fromarray(hough)  
        image.save(updated_image_path)  
        return updated_image_path  
  
  
class line2image:  
    def __init__(self, device):  
        print("Initialize the line2image model...")  
        model = create_model('ControlNet/models/cldm_v15.yaml', device=device).to(device)  
        model.load_state_dict(load_state_dict('ControlNet/models/control_sd15_mlsd.pth', location='mps'))  
        self.model = model.to(device)  
        self.device = device  
        self.ddim_sampler = DDIMSampler(self.model)  
        self.ddim_steps = 20  
        self.image_resolution = 512  
        self.num_samples = 1  
        self.save_memory = False  
        self.strength = 1.0  
        self.guess_mode = False  
        self.scale = 9.0  
        self.seed = -1  
        self.a_prompt = 'best quality, extremely detailed'  
        self.n_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'  
  
    def inference(self, inputs):  
        print("===>Starting line2image Inference")  
        image_path, instruct_text = inputs.split(",")[0], ','.join(inputs.split(',')[1:])  
        image = Image.open(image_path)  
        image = np.array(image)  
        image = 255 - image  
        prompt = instruct_text  
        img = resize_image(HWC3(image), self.image_resolution)  
        H, W, C = img.shape  
        img = cv2.resize(img, (W, H), interpolation=cv2.INTER_NEAREST)  
        control = torch.from_numpy(img.copy()).float().to(device=self.device) / 255.0  
        control = torch.stack([control for _ in range(self.num_samples)], dim=0)  
        control = einops.rearrange(control, 'b h w c -> b c h w').clone()  
        self.seed = random.randint(0, 65535)  
        seed_everything(self.seed)  
        if self.save_memory:  
            self.model.low_vram_shift(is_diffusing=False)  
        cond = {"c_concat": [control], "c_crossattn": [self.model.get_learned_conditioning([prompt + ', ' + self.a_prompt] * self.num_samples)]}  
        un_cond = {"c_concat": None if self.guess_mode else [control], "c_crossattn": [self.model.get_learned_conditioning([self.n_prompt] * self.num_samples)]}  
        shape = (4, H // 8, W // 8)  
        self.model.control_scales = [self.strength * (0.825 ** float(12 - i)) for i in range(13)] if self.guess_mode else ([self.strength] * 13)  # Magic number. IDK why. Perhaps because 0.825**12<0.01 but 0.826**12>0.01  
        samples, intermediates = self.ddim_sampler.sample(self.ddim_steps, self.num_samples, shape, cond, verbose=False, eta=0., unconditional_guidance_scale=self.scale, unconditional_conditioning=un_cond)  
        if self.save_memory:  
            self.model.low_vram_shift(is_diffusing=False)  
        x_samples = self.model.decode_first_stage(samples)  
        x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)  
        updated_image_path = get_new_image_name(image_path, func_name="line2image")  
        real_image = Image.fromarray(x_samples[0])  # default the index0 image  
        real_image.save(updated_image_path)  
        return updated_image_path  
  
  
class image2hed:  
    def __init__(self):  
        print("Direct detect soft HED boundary...")  
        self.detector = HEDdetector()  
        self.resolution = 512  
  
    def inference(self, inputs):  
        print("===>Starting image2hed Inference")  
        image = Image.open(inputs)  
        image = np.array(image)  
        image = HWC3(image)  
        hed = self.detector(resize_image(image, self.resolution))  
        updated_image_path = get_new_image_name(inputs, func_name="hed-boundary")  
        image = Image.fromarray(hed)  
        image.save(updated_image_path)  
        return updated_image_path  
  
  
class hed2image:  
    def __init__(self, device):  
        print("Initialize the hed2image model...")  
        model = create_model('ControlNet/models/cldm_v15.yaml', device=device).to(device)  
        model.load_state_dict(load_state_dict('ControlNet/models/control_sd15_hed.pth', location='mps'))  
        self.model = model.to(device)  
        self.device = device  
        self.ddim_sampler = DDIMSampler(self.model)  
        self.ddim_steps = 20  
        self.image_resolution = 512  
        self.num_samples = 1  
        self.save_memory = False  
        self.strength = 1.0  
        self.guess_mode = False  
        self.scale = 9.0  
        self.seed = -1  
        self.a_prompt = 'best quality, extremely detailed'  
        self.n_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'  
  
    def inference(self, inputs):  
        print("===>Starting hed2image Inference")  
        image_path, instruct_text = inputs.split(",")[0], ','.join(inputs.split(',')[1:])  
        image = Image.open(image_path)  
        image = np.array(image)  
        prompt = instruct_text  
        img = resize_image(HWC3(image), self.image_resolution)  
        H, W, C = img.shape  
        img = cv2.resize(img, (W, H), interpolation=cv2.INTER_NEAREST)  
        control = torch.from_numpy(img.copy()).float().to(device=self.device) / 255.0  
        control = torch.stack([control for _ in range(self.num_samples)], dim=0)  
        control = einops.rearrange(control, 'b h w c -> b c h w').clone()  
        self.seed = random.randint(0, 65535)  
        seed_everything(self.seed)  
        if self.save_memory:  
            self.model.low_vram_shift(is_diffusing=False)  
        cond = {"c_concat": [control], "c_crossattn": [self.model.get_learned_conditioning([prompt + ', ' + self.a_prompt] * self.num_samples)]}  
        un_cond = {"c_concat": None if self.guess_mode else [control], "c_crossattn": [self.model.get_learned_conditioning([self.n_prompt] * self.num_samples)]}  
        shape = (4, H // 8, W // 8)  
        self.model.control_scales = [self.strength * (0.825 ** float(12 - i)) for i in range(13)] if self.guess_mode else ([self.strength] * 13)  
        samples, intermediates = self.ddim_sampler.sample(self.ddim_steps, self.num_samples, shape, cond, verbose=False, eta=0., unconditional_guidance_scale=self.scale, unconditional_conditioning=un_cond)  
        if self.save_memory:  
            self.model.low_vram_shift(is_diffusing=False)  
        x_samples = self.model.decode_first_stage(samples)  
        x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)  
        updated_image_path = get_new_image_name(image_path, func_name="hed2image")  
        real_image = Image.fromarray(x_samples[0])  # default the index0 image  
        real_image.save(updated_image_path)  
        return updated_image_path  
  
class image2scribble:  
    def __init__(self):  
        print("Direct detect scribble.")  
        self.detector = HEDdetector()  
        self.resolution = 512  
  
    def inference(self, inputs):  
        print("===>Starting image2scribble Inference")  
        image = Image.open(inputs)  
        image = np.array(image)  
        image = HWC3(image)  
        detected_map = self.detector(resize_image(image, self.resolution))  
        detected_map = HWC3(detected_map)  
        image = resize_image(image, self.resolution)  
        H, W, C = image.shape  
        detected_map = cv2.resize(detected_map, (W, H), interpolation=cv2.INTER_LINEAR)  
        detected_map = nms(detected_map, 127, 3.0)  
        detected_map = cv2.GaussianBlur(detected_map, (0, 0), 3.0)  
        detected_map[detected_map > 4] = 255  
        detected_map[detected_map < 255] = 0  
        detected_map = 255 - detected_map  
        updated_image_path = get_new_image_name(inputs, func_name="scribble")  
        image = Image.fromarray(detected_map)  
        image.save(updated_image_path)  
        return updated_image_path  
  
class scribble2image:  
    def __init__(self, device):  
        print("Initialize the scribble2image model...")  
        model = create_model('ControlNet/models/cldm_v15.yaml', device=device).to(device)  
        model.load_state_dict(load_state_dict('ControlNet/models/control_sd15_scribble.pth', location='mps'))  
        self.model = model.to(device)  
        self.device = device  
        self.ddim_sampler = DDIMSampler(self.model)  
        self.ddim_steps = 20  
        self.image_resolution = 512  
        self.num_samples = 1  
        self.save_memory = False  
        self.strength = 1.0  
        self.guess_mode = False  
        self.scale = 9.0  
        self.seed = -1  
        self.a_prompt = 'best quality, extremely detailed'  
        self.n_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'  
  
    def inference(self, inputs):  
        print("===>Starting scribble2image Inference")  
        print(f'sketch device {self.device}')  
        image_path, instruct_text = inputs.split(",")[0], ','.join(inputs.split(',')[1:])  
        image = Image.open(image_path)  
        image = np.array(image)  
        prompt = instruct_text  
        image = 255 - image  
        img = resize_image(HWC3(image), self.image_resolution)  
        H, W, C = img.shape  
        img = cv2.resize(img, (W, H), interpolation=cv2.INTER_NEAREST)  
        control = torch.from_numpy(img.copy()).float().to(device=self.device) / 255.0  
        control = torch.stack([control for _ in range(self.num_samples)], dim=0)  
        control = einops.rearrange(control, 'b h w c -> b c h w').clone()  
        self.seed = random.randint(0, 65535)  
        seed_everything(self.seed)  
        if self.save_memory:  
            self.model.low_vram_shift(is_diffusing=False)  
        cond = {"c_concat": [control], "c_crossattn": [self.model.get_learned_conditioning([prompt + ', ' + self.a_prompt] * self.num_samples)]}  
        un_cond = {"c_concat": None if self.guess_mode else [control], "c_crossattn": [self.model.get_learned_conditioning([self.n_prompt] * self.num_samples)]}  
        shape = (4, H // 8, W // 8)  
        self.model.control_scales = [self.strength * (0.825 ** float(12 - i)) for i in range(13)] if self.guess_mode else ([self.strength] * 13)  
        samples, intermediates = self.ddim_sampler.sample(self.ddim_steps, self.num_samples, shape, cond, verbose=False, eta=0., unconditional_guidance_scale=self.scale, unconditional_conditioning=un_cond)  
        if self.save_memory:  
            self.model.low_vram_shift(is_diffusing=False)  
        x_samples = self.model.decode_first_stage(samples)  
        x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)  
        updated_image_path = get_new_image_name(image_path, func_name="scribble2image")  
        real_image = Image.fromarray(x_samples[0])  # default the index0 image  
        real_image.save(updated_image_path)  
        return updated_image_path  
  
class image2pose:  
    def __init__(self):  
        print("Direct human pose.")  
        self.detector = OpenposeDetector()  
        self.resolution = 512  
  
    def inference(self, inputs):  
        print("===>Starting image2pose Inference")  
        image = Image.open(inputs)  
        image = np.array(image)  
        image = HWC3(image)  
        detected_map, _ = self.detector(resize_image(image, self.resolution))  
        detected_map = HWC3(detected_map)  
        image = resize_image(image, self.resolution)  
        H, W, C = image.shape  
        detected_map = cv2.resize(detected_map, (W, H), interpolation=cv2.INTER_LINEAR)  
        updated_image_path = get_new_image_name(inputs, func_name="human-pose")  
        image = Image.fromarray(detected_map)  
        image.save(updated_image_path)  
        return updated_image_path  
  
class pose2image:  
    def __init__(self, device):  
        print("Initialize the pose2image model...")  
        model = create_model('ControlNet/models/cldm_v15.yaml', device=device).to(device)  
        model.load_state_dict(load_state_dict('ControlNet/models/control_sd15_openpose.pth', location='mps'))  
        self.model = model.to(device)  
        self.device = device  
        self.ddim_sampler = DDIMSampler(self.model)  
        self.ddim_steps = 20  
        self.image_resolution = 512  
        self.num_samples = 1  
        self.save_memory = False  
        self.strength = 1.0  
        self.guess_mode = False  
        self.scale = 9.0  
        self.seed = -1  
        self.a_prompt = 'best quality, extremely detailed'  
        self.n_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'  
  
    def inference(self, inputs):  
        print("===>Starting pose2image Inference")  
        image_path, instruct_text = inputs.split(",")[0], ','.join(inputs.split(',')[1:])  
        image = Image.open(image_path)  
        image = np.array(image)  
        prompt = instruct_text  
        img = resize_image(HWC3(image), self.image_resolution)  
        H, W, C = img.shape  
        img = cv2.resize(img, (W, H), interpolation=cv2.INTER_NEAREST)  
        control = torch.from_numpy(img.copy()).float().to(device=self.device) / 255.0  
        control = torch.stack([control for _ in range(self.num_samples)], dim=0)  
        control = einops.rearrange(control, 'b h w c -> b c h w').clone()  
        self.seed = random.randint(0, 65535)  
        seed_everything(self.seed)  
        if self.save_memory:  
            self.model.low_vram_shift(is_diffusing=False)  
        cond = {"c_concat": [control], "c_crossattn": [ self.model.get_learned_conditioning([prompt + ', ' + self.a_prompt] * self.num_samples)]}  
        un_cond = {"c_concat": None if self.guess_mode else [control], "c_crossattn": [self.model.get_learned_conditioning([self.n_prompt] * self.num_samples)]}  
        shape = (4, H // 8, W // 8)  
        self.model.control_scales = [self.strength * (0.825 ** float(12 - i)) for i in range(13)] if self.guess_mode else ([self.strength] * 13)  
        samples, intermediates = self.ddim_sampler.sample(self.ddim_steps, self.num_samples, shape, cond, verbose=False, eta=0., unconditional_guidance_scale=self.scale, unconditional_conditioning=un_cond)  
        if self.save_memory:  
            self.model.low_vram_shift(is_diffusing=False)  
        x_samples = self.model.decode_first_stage(samples)  
        x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)  # move to CPU before NumPy conversion
        updated_image_path = get_new_image_name(image_path, func_name="pose2image")  
        real_image = Image.fromarray(x_samples[0])  # default the index0 image  
        real_image.save(updated_image_path)  
        return updated_image_path  
  
class image2seg:  
    def __init__(self):  
        print("Direct segmentations.")  
        self.detector = UniformerDetector()  
        self.resolution = 512  
  
    def inference(self, inputs):  
        print("===>Starting image2seg Inference")  
        image = Image.open(inputs)  
        image = np.array(image)  
        image = HWC3(image)  
        detected_map = self.detector(resize_image(image, self.resolution))  
        detected_map = HWC3(detected_map)  
        image = resize_image(image, self.resolution)  
        H, W, C = image.shape  
        detected_map = cv2.resize(detected_map, (W, H), interpolation=cv2.INTER_LINEAR)  
        updated_image_path = get_new_image_name(inputs, func_name="segmentation")  
        image = Image.fromarray(detected_map)  
        image.save(updated_image_path)  
        return updated_image_path  
  
class seg2image:  
    def __init__(self, device):  
        print("Initialize the seg2image model...")  
        model = create_model('ControlNet/models/cldm_v15.yaml', device=device).to(device)  
        model.load_state_dict(load_state_dict('ControlNet/models/control_sd15_seg.pth', location=device))
        self.model = model.to(device)  
        self.device = device  
        self.ddim_sampler = DDIMSampler(self.model)  
        self.ddim_steps = 20  
        self.image_resolution = 512  
        self.num_samples = 1  
        self.save_memory = False  
        self.strength = 1.0  
        self.guess_mode = False  
        self.scale = 9.0  
        self.seed = -1  
        self.a_prompt = 'best quality, extremely detailed'  
        self.n_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'  
  
    def inference(self, inputs):  
        print("===>Starting seg2image Inference")  
        image_path, instruct_text = inputs.split(",")[0], ','.join(inputs.split(',')[1:])  
        image = Image.open(image_path)  
        image = np.array(image)  
        prompt = instruct_text  
        img = resize_image(HWC3(image), self.image_resolution)  
        H, W, C = img.shape  
        img = cv2.resize(img, (W, H), interpolation=cv2.INTER_NEAREST)  
        control = torch.from_numpy(img.copy()).float().to(device=self.device) / 255.0  
        control = torch.stack([control for _ in range(self.num_samples)], dim=0)  
        control = einops.rearrange(control, 'b h w c -> b c h w').clone()  
        self.seed = random.randint(0, 65535)  
        seed_everything(self.seed)  
        if self.save_memory:  
            self.model.low_vram_shift(is_diffusing=False)  
        cond = {"c_concat": [control], "c_crossattn": [self.model.get_learned_conditioning([prompt + ', ' + self.a_prompt] * self.num_samples)]}  
        un_cond = {"c_concat": None if self.guess_mode else [control], "c_crossattn": [self.model.get_learned_conditioning([self.n_prompt] * self.num_samples)]}  
        shape = (4, H // 8, W // 8)  
        self.model.control_scales = [self.strength * (0.825 ** float(12 - i)) for i in range(13)] if self.guess_mode else ([self.strength] * 13)  
        samples, intermediates = self.ddim_sampler.sample(self.ddim_steps, self.num_samples, shape, cond, verbose=False, eta=0., unconditional_guidance_scale=self.scale, unconditional_conditioning=un_cond)  
        if self.save_memory:  
            self.model.low_vram_shift(is_diffusing=False)  
        x_samples = self.model.decode_first_stage(samples)  
        x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)  # move to CPU before NumPy conversion
        updated_image_path = get_new_image_name(image_path, func_name="segment2image")  
        real_image = Image.fromarray(x_samples[0])  # default the index0 image  
        real_image.save(updated_image_path)  
        return updated_image_path  
  
class image2depth:  
    def __init__(self):  
        print("Direct depth estimation.")  
        self.detector = MidasDetector()  
        self.resolution = 512  
  
    def inference(self, inputs):  
        print("===>Starting image2depth Inference")  
        image = Image.open(inputs)  
        image = np.array(image)  
        image = HWC3(image)  
        detected_map, _ = self.detector(resize_image(image, self.resolution))  
        detected_map = HWC3(detected_map)  
        image = resize_image(image, self.resolution)  
        H, W, C = image.shape  
        detected_map = cv2.resize(detected_map, (W, H), interpolation=cv2.INTER_LINEAR)  
        updated_image_path = get_new_image_name(inputs, func_name="depth")  
        image = Image.fromarray(detected_map)  
        image.save(updated_image_path)  
        return updated_image_path  
  
class depth2image:  
    def __init__(self, device):  
        print("Initialize depth2image model...")  
        model = create_model('ControlNet/models/cldm_v15.yaml', device=device).to(device)  
        model.load_state_dict(load_state_dict('ControlNet/models/control_sd15_depth.pth', location=device))
        self.model = model.to(device)  
        self.device = device  
        self.ddim_sampler = DDIMSampler(self.model)  
        self.ddim_steps = 20  
        self.image_resolution = 512  
        self.num_samples = 1  
        self.save_memory = False  
        self.strength = 1.0  
        self.guess_mode = False  
        self.scale = 9.0  
        self.seed = -1  
        self.a_prompt = 'best quality, extremely detailed'  
        self.n_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'  
  
    def inference(self, inputs):  
        print("===>Starting depth2image Inference")  
        image_path, instruct_text = inputs.split(",")[0], ','.join(inputs.split(',')[1:])  
        image = Image.open(image_path)  
        image = np.array(image)  
        prompt = instruct_text  
        img = resize_image(HWC3(image), self.image_resolution)  
        H, W, C = img.shape  
        img = cv2.resize(img, (W, H), interpolation=cv2.INTER_NEAREST)  
        control = torch.from_numpy(img.copy()).float().to(device=self.device) / 255.0  
        control = torch.stack([control for _ in range(self.num_samples)], dim=0)  
        control = einops.rearrange(control, 'b h w c -> b c h w').clone()  
        self.seed = random.randint(0, 65535)  
        seed_everything(self.seed)  
        if self.save_memory:  
            self.model.low_vram_shift(is_diffusing=False)  
        cond = {"c_concat": [control], "c_crossattn": [ self.model.get_learned_conditioning([prompt + ', ' + self.a_prompt] * self.num_samples)]}  
        un_cond = {"c_concat": None if self.guess_mode else [control], "c_crossattn": [self.model.get_learned_conditioning([self.n_prompt] * self.num_samples)]}  
        shape = (4, H // 8, W // 8)  
        self.model.control_scales = [self.strength * (0.825 ** float(12 - i)) for i in range(13)] if self.guess_mode else ([self.strength] * 13)  # Magic number, probably because 0.825**12 ≈ 0.099 < 0.1 while 0.826**12 ≈ 0.101 > 0.1
        samples, intermediates = self.ddim_sampler.sample(self.ddim_steps, self.num_samples, shape, cond, verbose=False, eta=0., unconditional_guidance_scale=self.scale, unconditional_conditioning=un_cond)  
        if self.save_memory:  
            self.model.low_vram_shift(is_diffusing=False)  
        x_samples = self.model.decode_first_stage(samples)  
        x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)  # move to CPU before NumPy conversion
        updated_image_path = get_new_image_name(image_path, func_name="depth2image")  
        real_image = Image.fromarray(x_samples[0])  # default the index0 image  
        real_image.save(updated_image_path)  
        return updated_image_path  
  
class image2normal:  
    def __init__(self):  
        print("Direct normal estimation.")  
        self.detector = MidasDetector()  
        self.resolution = 512  
        self.bg_threshold = 0.4  
  
    def inference(self, inputs):  
        print("===>Starting image2 normal Inference")  
        image = Image.open(inputs)  
        image = np.array(image)  
        image = HWC3(image)  
        _, detected_map = self.detector(resize_image(image, self.resolution), bg_th=self.bg_threshold)  
        detected_map = HWC3(detected_map)  
        image = resize_image(image, self.resolution)  
        H, W, C = image.shape  
        detected_map = cv2.resize(detected_map, (W, H), interpolation=cv2.INTER_LINEAR)  
        updated_image_path = get_new_image_name(inputs, func_name="normal-map")  
        image = Image.fromarray(detected_map)  
        image.save(updated_image_path)  
        return updated_image_path  
  
class normal2image:  
    def __init__(self, device):  
        print("Initialize normal2image model...")  
        model = create_model('ControlNet/models/cldm_v15.yaml', device=device).to(device)  
        model.load_state_dict(load_state_dict('ControlNet/models/control_sd15_normal.pth', location=device))
        self.model = model.to(device)  
        self.device = device  
        self.ddim_sampler = DDIMSampler(self.model)  
        self.ddim_steps = 20  
        self.image_resolution = 512  
        self.num_samples = 1  
        self.save_memory = False  
        self.strength = 1.0  
        self.guess_mode = False  
        self.scale = 9.0  
        self.seed = -1  
        self.a_prompt = 'best quality, extremely detailed'  
        self.n_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'  
  
    def inference(self, inputs):  
        print("===>Starting normal2image Inference")  
        image_path, instruct_text = inputs.split(",")[0], ','.join(inputs.split(',')[1:])  
        image = Image.open(image_path)  
        image = np.array(image)  
        prompt = instruct_text  
        img = image[:, :, ::-1].copy()  
        img = resize_image(HWC3(img), self.image_resolution)  
        H, W, C = img.shape  
        img = cv2.resize(img, (W, H), interpolation=cv2.INTER_NEAREST)  
        control = torch.from_numpy(img.copy()).float().to(device=self.device) / 255.0  
        control = torch.stack([control for _ in range(self.num_samples)], dim=0)  
        control = einops.rearrange(control, 'b h w c -> b c h w').clone()  
        self.seed = random.randint(0, 65535)  
        seed_everything(self.seed)  
        if self.save_memory:  
            self.model.low_vram_shift(is_diffusing=False)  
        cond = {"c_concat": [control], "c_crossattn": [self.model.get_learned_conditioning([prompt + ', ' + self.a_prompt] * self.num_samples)]}  
        un_cond = {"c_concat": None if self.guess_mode else [control], "c_crossattn": [self.model.get_learned_conditioning([self.n_prompt] * self.num_samples)]}  
        shape = (4, H // 8, W // 8)  
        self.model.control_scales = [self.strength * (0.825 ** float(12 - i)) for i in range(13)] if self.guess_mode else ([self.strength] * 13)  
        samples, intermediates = self.ddim_sampler.sample(self.ddim_steps, self.num_samples, shape, cond, verbose=False, eta=0., unconditional_guidance_scale=self.scale, unconditional_conditioning=un_cond)  
        if self.save_memory:  
            self.model.low_vram_shift(is_diffusing=False)  
        x_samples = self.model.decode_first_stage(samples)  
        x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)  # move to CPU before NumPy conversion
        updated_image_path = get_new_image_name(image_path, func_name="normal2image")  
        real_image = Image.fromarray(x_samples[0])  # default the index0 image  
        real_image.save(updated_image_path)  
        return updated_image_path  
  
class BLIPVQA:  
    def __init__(self, device):  
        print("Initializing BLIP VQA to %s" % device)  
        self.device = device  
        self.processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")  
        self.model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base").to(self.device)  
  
    def get_answer_from_question_and_image(self, inputs):  
        image_path, question = inputs.split(",")  
        raw_image = Image.open(image_path).convert('RGB')  
        print(F'BLIPVQA :question :{question}')  
        inputs = self.processor(raw_image, question, return_tensors="pt").to(self.device)  
        out = self.model.generate(**inputs)  
        answer = self.processor.decode(out[0], skip_special_tokens=True)  
        return answer  
  
class ConversationBot:  
    def __init__(self):  
        print("Initializing VisualChatGPT")  
        self.llm = OpenAI(temperature=0)  
        #self.edit = ImageEditing(device="mps")  
        self.i2t = ImageCaptioning(device="mps")  
        self.t2i = T2I(device="mps")  
        # self.image2canny = image2canny()  
        # self.canny2image = canny2image(device="mps")  
        # self.image2line = image2line()  
        # self.line2image = line2image(device="mps")  
        # self.image2hed = image2hed()  
        # self.hed2image = hed2image(device="mps")  
        # self.image2scribble = image2scribble()  
        # self.scribble2image = scribble2image(device="mps")  
        # self.image2pose = image2pose()  
        # self.pose2image = pose2image(device="mps")  
        # self.BLIPVQA = BLIPVQA(device="mps")  
        # self.image2seg = image2seg()  
        # self.seg2image = seg2image(device="mps")  
        # self.image2depth = image2depth()  
        # self.depth2image = depth2image(device="mps")  
        # self.image2normal = image2normal()  
        # self.normal2image = normal2image(device="mps")  
        #self.pix2pix = Pix2Pix(device="mps")  
        self.memory = ConversationBufferMemory(memory_key="chat_history", output_key='output')  
        self.tools = [  
            Tool(name="Get Photo Description", func=self.i2t.inference,  
                 description="useful when you want to know what is inside the photo. receives image_path as input. "  
                             "The input to this tool should be a string, representing the image_path. "),  
            Tool(name="Generate Image From User Input Text", func=self.t2i.inference,  
                 description="useful when you want to generate an image from a user input text and save it to a file. like: generate an image of an object or something, or generate an image that includes some objects. "  
                             "The input to this tool should be a string, representing the text used to generate image. "),  
            # Tool(name="Get Photo Description", func=self.i2t.inference,  
            #      description="useful when you want to know what is inside the photo. receives image_path as input. "  
            #                  "The input to this tool should be a string, representing the image_path. "),  
            # Tool(name="Generate Image From User Input Text", func=self.t2i.inference,  
            #      description="useful when you want to generate an image from a user input text and save it to a file. like: generate an image of an object or something, or generate an image that includes some objects. "  
            #                  "The input to this tool should be a string, representing the text used to generate image. "),  
            # Tool(name="Remove Something From The Photo", func=self.edit.remove_part_of_image,  
            #      description="useful when you want to remove and object or something from the photo from its description or location. "  
            #                  "The input to this tool should be a comma seperated string of two, representing the image_path and the object need to be removed. "),  
            # Tool(name="Replace Something From The Photo", func=self.edit.replace_part_of_image,  
            #      description="useful when you want to replace an object from the object description or location with another object from its description. "  
            #                  "The input to this tool should be a comma seperated string of three, representing the image_path, the object to be replaced, the object to be replaced with "),  
  
            # Tool(name="Instruct Image Using Text", func=self.pix2pix.inference,  
            #      description="useful when you want to the style of the image to be like the text. like: make it look like a painting. or make it like a robot. "  
            #                  "The input to this tool should be a comma seperated string of two, representing the image_path and the text. "),  
            # Tool(name="Answer Question About The Image", func=self.BLIPVQA.get_answer_from_question_and_image,  
            #      description="useful when you need an answer for a question based on an image. like: what is the background color of the last image, how many cats in this figure, what is in this figure. "  
            #     "The input to this tool should be a comma seperated string of two, representing the image_path and the question"),  
            # Tool(name="Edge Detection On Image", func=self.image2canny.inference,  
            #      description="useful when you want to detect the edge of the image. like: detect the edges of this image, or canny detection on image, or peform edge detection on this image, or detect the canny image of this image. "  
            #                  "The input to this tool should be a string, representing the image_path"),  
            # Tool(name="Generate Image Condition On Canny Image", func=self.canny2image.inference,  
            #      description="useful when you want to generate a new real image from both the user desciption and a canny image. like: generate a real image of a object or something from this canny image, or generate a new real image of a object or something from this edge image. "  
            #                  "The input to this tool should be a comma seperated string of two, representing the image_path and the user description. "),  
            # Tool(name="Line Detection On Image", func=self.image2line.inference,  
            #      description="useful when you want to detect the straight line of the image. like: detect the straight lines of this image, or straight line detection on image, or peform straight line detection on this image, or detect the straight line image of this image. "  
            #                  "The input to this tool should be a string, representing the image_path"),  
            # Tool(name="Generate Image Condition On Line Image", func=self.line2image.inference,  
            #      description="useful when you want to generate a new real image from both the user desciption and a straight line image. like: generate a real image of a object or something from this straight line image, or generate a new real image of a object or something from this straight lines. "  
            #                  "The input to this tool should be a comma seperated string of two, representing the image_path and the user description. "),  
            # Tool(name="Hed Detection On Image", func=self.image2hed.inference,  
            #      description="useful when you want to detect the soft hed boundary of the image. like: detect the soft hed boundary of this image, or hed boundary detection on image, or peform hed boundary detection on this image, or detect soft hed boundary image of this image. "  
            #                  "The input to this tool should be a string, representing the image_path"),  
            # Tool(name="Generate Image Condition On Soft Hed Boundary Image", func=self.hed2image.inference,  
            #      description="useful when you want to generate a new real image from both the user desciption and a soft hed boundary image. like: generate a real image of a object or something from this soft hed boundary image, or generate a new real image of a object or something from this hed boundary. "  
            #                  "The input to this tool should be a comma seperated string of two, representing the image_path and the user description"),  
            # Tool(name="Segmentation On Image", func=self.image2seg.inference,  
            #      description="useful when you want to detect segmentations of the image. like: segment this image, or generate segmentations on this image, or peform segmentation on this image. "  
            #                  "The input to this tool should be a string, representing the image_path"),  
            # Tool(name="Generate Image Condition On Segmentations", func=self.seg2image.inference,  
            #      description="useful when you want to generate a new real image from both the user desciption and segmentations. like: generate a real image of a object or something from this segmentation image, or generate a new real image of a object or something from these segmentations. "  
            #                  "The input to this tool should be a comma seperated string of two, representing the image_path and the user description"),  
            # Tool(name="Predict Depth On Image", func=self.image2depth.inference,  
            #      description="useful when you want to detect depth of the image. like: generate the depth from this image, or detect the depth map on this image, or predict the depth for this image. "  
            #                  "The input to this tool should be a string, representing the image_path"),  
            # Tool(name="Generate Image Condition On Depth",  func=self.depth2image.inference,  
            #      description="useful when you want to generate a new real image from both the user desciption and depth image. like: generate a real image of a object or something from this depth image, or generate a new real image of a object or something from the depth map. "  
            #                  "The input to this tool should be a comma seperated string of two, representing the image_path and the user description"),  
            # Tool(name="Predict Normal Map On Image", func=self.image2normal.inference,  
            #      description="useful when you want to detect norm map of the image. like: generate normal map from this image, or predict normal map of this image. "  
            #                  "The input to this tool should be a string, representing the image_path"),  
            # Tool(name="Generate Image Condition On Normal Map", func=self.normal2image.inference,  
            #      description="useful when you want to generate a new real image from both the user desciption and normal map. like: generate a real image of a object or something from this normal map, or generate a new real image of a object or something from the normal map. "  
            #                  "The input to this tool should be a comma seperated string of two, representing the image_path and the user description"),  
            # Tool(name="Sketch Detection On Image", func=self.image2scribble.inference,  
            #      description="useful when you want to generate a scribble of the image. like: generate a scribble of this image, or generate a sketch from this image, detect the sketch from this image. "  
            #                  "The input to this tool should be a string, representing the image_path"),  
            # Tool(name="Generate Image Condition On Sketch Image", func=self.scribble2image.inference,  
            #      description="useful when you want to generate a new real image from both the user desciption and a scribble image or a sketch image. "  
            #                  "The input to this tool should be a comma seperated string of two, representing the image_path and the user description"),  
            # Tool(name="Pose Detection On Image", func=self.image2pose.inference,  
            #      description="useful when you want to detect the human pose of the image. like: generate human poses of this image, or generate a pose image from this image. "  
            #                  "The input to this tool should be a string, representing the image_path"),  
            # Tool(name="Generate Image Condition On Pose Image", func=self.pose2image.inference,  
            #      description="useful when you want to generate a new real image from both the user desciption and a human pose image. like: generate a real image of a human from this human pose image, or generate a new real image of a human from this pose. "  
            #                  "The input to this tool should be a comma seperated string of two, representing the image_path and the user description")  
              
            ]  
        self.agent = initialize_agent(  
            self.tools,  
            self.llm,  
            agent="conversational-react-description",  
            verbose=True,  
            memory=self.memory,  
            return_intermediate_steps=True,  
            agent_kwargs={'prefix': VISUAL_CHATGPT_PREFIX, 'format_instructions': VISUAL_CHATGPT_FORMAT_INSTRUCTIONS, 'suffix': VISUAL_CHATGPT_SUFFIX}, )  
  
    def run_text(self, text, state):  
        print("===============Running run_text =============")  
        print("Inputs:", text, state)  
        print("======>Previous memory:\n %s" % self.agent.memory)  
        #self.agent.memory.buffer = cut_dialogue_history(self.agent.memory.buffer, keep_last_n_words=500)  
        res = self.agent({"input": text})  
        print("======>Current memory:\n %s" % self.agent.memory)  
        response = re.sub(r'(image/\S*png)', lambda m: f'![](/file={m.group(0)})*{m.group(0)}*', res['output'])
        state = state + [(text, response)]  
        print("Outputs:", state)  
        return state, state  
  
    def run_image(self, image, state, txt):  
        print("===============Running run_image =============")  
        print("Inputs:", image, state)  
        print("======>Previous memory:\n %s" % self.agent.memory)  
        image_filename = os.path.join('image', str(uuid.uuid4())[0:8] + ".png")  
        print("======>Auto Resize Image...")  
        img = Image.open(image.name)  
        width, height = img.size  
        ratio = min(512 / width, 512 / height)  
        width_new, height_new = (round(width * ratio), round(height * ratio))  
        img = img.resize((width_new, height_new))  
        img = img.convert('RGB')  
        img.save(image_filename, "PNG")  
        print(f"Resize image form {width}x{height} to {width_new}x{height_new}")  
        description = self.i2t.inference(image_filename)  
        Human_prompt = "\nHuman: provide a figure named {}. The description is: {}. This information helps you to understand this image, but you should use tools to finish following tasks, " \  
                       "rather than directly imagine from my description. If you understand, say \"Received\". \n".format(image_filename, description)  
        AI_prompt = "Received.  "  
        #self.agent.memory.buffer = self.agent.memory.buffer + Human_prompt + 'AI: ' + AI_prompt  
        self.agent.memory.save_context({"input": Human_prompt}, {"output": AI_prompt})  # save_context is a method of the memory object; .buffer is only the rendered history
        print("======>Current memory:\n %s" % self.agent.memory)  
        state = state + [(f"![](/file={image_filename})*{image_filename}*", AI_prompt)]  
        print("Outputs:", state)  
        return state, state, txt + ' ' + image_filename + ' '  
  
if __name__ == '__main__':  
    bot = ConversationBot()  
    with gr.Blocks(css="#chatbot .overflow-y-auto{height:500px}") as demo:  
        chatbot = gr.Chatbot(elem_id="chatbot", label="Visual ChatGPT")  
        state = gr.State([])  
        with gr.Row():  
            with gr.Column(scale=0.7):  
                txt = gr.Textbox(show_label=False, placeholder="Enter text and press enter, or upload an image").style(container=False)  
            with gr.Column(scale=0.15, min_width=0):  
                clear = gr.Button("Clear️")  
            with gr.Column(scale=0.15, min_width=0):  
                btn = gr.UploadButton("Upload", file_types=["image"])  
  
        txt.submit(bot.run_text, [txt, state], [chatbot, state])  
        txt.submit(lambda: "", None, txt)  
        btn.upload(bot.run_image, [btn, state, txt], [chatbot, state, txt])  
        clear.click(bot.memory.clear)  
        clear.click(lambda: [], None, chatbot)  
        clear.click(lambda: [], None, state)  
        demo.launch(server_name="0.0.0.0", server_port=7860)

Note: the code above is the modified version. It switches to MPS mode, works around the langchain library bug, and disables several models.
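Instead of passing a hard-coded "mps" to every model wrapper in `ConversationBot`, the device could be chosen once at startup. The sketch below is an assumption and is not part of the upstream project. `choose_device` and `detect_device` are hypothetical helpers. `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are real PyTorch calls, the latter available since PyTorch 1.12.

```python
def choose_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick a torch device string: prefer CUDA, then Apple MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Hypothetical helper: ask the installed PyTorch build which backends it supports."""
    import torch  # imported lazily so choose_device() can be used without torch

    # torch.backends.mps only exists in PyTorch 1.12 and later.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return choose_device(torch.cuda.is_available(), mps_ok)


# Hypothetical usage inside ConversationBot.__init__:
#   device = detect_device()
#   self.i2t = ImageCaptioning(device=device)
#   self.t2i = T2I(device=device)
```

Keeping the decision separate from the torch query means the fallback order can be tested on any machine. On an Apple Silicon Mac without CUDA, `detect_device()` should resolve to `mps`. Whichever device is picked, a tensor still has to be moved with `.cpu()` before `.numpy()` can be called on it.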

Running Visual ChatGPT

After most of a day of tinkering, it finally runs without errors:

python3 visual_chatgpt.py

The program outputs:

➜  visual-chatgpt git:(main) ✗ python visual_chatgpt.py                                                   
Initializing VisualChatGPT  
Initializing ImageCaptioning to mps  
Initializing T2I to mps  
/opt/homebrew/lib/python3.10/site-packages/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.  
  warnings.warn(  
Running on local URL:  http://0.0.0.0:7860

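The joy of programming is this. You go through countless hardships to get a program running, and you are on the verge of despair. Then, suddenly, it works. In that moment your brain is flooded with dopamine. It feels like suddenly grasping the meaning of life, or finally understanding the Way by which heaven and earth give birth to all things and all things flourish. In short, you ascend to heaven in broad daylight and the joy is doubled. That kind of mental pleasure is far more refined than playing video games or enjoying good food.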

Then visit http://localhost:7860.

You can start chatting in Chinese right away. None of the tedious prompt keywords that ControlNet requires are needed.

The backend log:

Inputs: 给我一只大金毛 []  
======>Previous memory:  
 chat_memory=ChatMessageHistory(messages=[]) output_key='output' input_key=None return_messages=False human_prefix='Human' ai_prefix='AI' memory_key='chat_history'  
  
  
> Entering new AgentExecutor chain...  
 Yes  
Action: Generate Image From User Input Text  
Action Input: A golden retrieverSetting `pad_token_id` to `eos_token_id`:50256 for open-end generation.  
A golden retriever refined to A golden retriever,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,  
100%|█████████████████████████████████████████████████████████████████████████████████| 50/50 [00:47<00:00,  1.05it/s]  
Processed T2I.run, text: A golden retriever, image_filename: image/865c561f.png  
  
Observation: image/865c561f.png  
Thought: Do I need to use a tool? No  
AI: Here is a golden retriever for you: image/865c561f.png  
  
> Finished chain.  
======>Current memory:  
 chat_memory=ChatMessageHistory(messages=[HumanMessage(content='给我一只大金毛', additional_kwargs={}), AIMessage(content='Here is a golden retriever for you: image/865c561f.png', additional_kwargs={})]) output_key='output' input_key=None return_messages=False human_prefix='Human' ai_prefix='AI' memory_key='chat_history'  
Outputs: [('给我一只大金毛', 'Here is a golden retriever for you: ![](/file=image/865c561f.png)*image/865c561f.png*')]
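Compare the `AI:` line with the `Outputs:` line in the log. The bare path `image/865c561f.png` reaches the chat window as Markdown: an image link through Gradio's `/file=` route, followed by the path in italics. Below is a minimal sketch that reproduces that rewrite with a regular expression. The pattern and the function name are assumptions, and the project's own code may do this differently. It only matches `.png` files under `image/`.

```python
import re

# Matches relative image paths such as "image/865c561f.png" in the agent's reply.
IMAGE_PATH = re.compile(r"(image/\S*\.png)")


def to_gradio_markdown(reply: str) -> str:
    """Rewrite each image path as a Markdown image served via /file=, plus the path in italics."""
    return IMAGE_PATH.sub(lambda m: f"![](/file={m.group(1)})*{m.group(1)}*", reply)
```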

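The log shows that even though the chat is in Chinese, ChatGPT actually translates the request into English. “给我一只大金毛” (“give me a big golden retriever”) becomes “A golden retriever”, which is passed as the Action Input to the image-generation tool.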
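The log follows LangChain's conversational agent format. The model first answers whether it needs a tool, then names an `Action` and an English `Action Input`, and finally replies after an `AI:` prefix. Below is a simplified sketch of parsing that text. It is not LangChain's actual output parser, which handles more cases. `AgentStep` and `parse_agent_output` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class AgentStep(NamedTuple):
    tool: Optional[str]          # tool name when the model decides to act
    tool_input: Optional[str]    # argument for the tool (the English prompt in the log)
    final_answer: Optional[str]  # reply text when no tool is needed


def parse_agent_output(text: str, ai_prefix: str = "AI") -> AgentStep:
    """Extract either an Action/Action Input pair or a final AI reply from the model text."""
    action = re.search(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", text, re.DOTALL)
    if action:
        return AgentStep(action.group(1).strip(), action.group(2).strip(), None)
    answer = re.search(rf"{re.escape(ai_prefix)}:\s*(.+)", text, re.DOTALL)
    if answer:
        return AgentStep(None, None, answer.group(1).strip())
    raise ValueError(f"Could not parse agent output: {text!r}")
```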

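The Stable Diffusion model then generates the image (this is inference, not training), and the exchange is appended to the conversation context. For more on ChatGPT chat context, see: Redefining Cost-Effectiveness! ChatGPT's New API Model gpt-3.5-turbo Gets a Lightning Update, Costs Cut by 90%, Integrated with Python3.10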
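The Previous/Current memory lines in the log show what appears to be LangChain's `ConversationBufferMemory`, with `human_prefix='Human'` and `ai_prefix='AI'`. Below is a toy, dependency-free sketch of the same idea. The `ChatMemory` class and its word-budget trimming are illustrative assumptions, not LangChain's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ChatMemory:
    """Toy conversation buffer using the same prefixes printed in the log."""
    human_prefix: str = "Human"
    ai_prefix: str = "AI"
    messages: List[Tuple[str, str]] = field(default_factory=list)

    def save_turn(self, user_text: str, ai_text: str) -> None:
        # Append one exchange: the user's message followed by the AI's reply.
        self.messages.append((self.human_prefix, user_text))
        self.messages.append((self.ai_prefix, ai_text))

    def buffer(self, keep_last_n_words: Optional[int] = None) -> str:
        """Render history as 'Prefix: text' lines, optionally dropping the oldest lines."""
        lines = [f"{role}: {content}" for role, content in self.messages]
        if keep_last_n_words is not None:
            # Drop whole lines from the front until the word count fits the budget.
            while lines and sum(len(line.split()) for line in lines) > keep_last_n_words:
                lines.pop(0)
        return "\n".join(lines)
```

`str.split()` counts an unspaced Chinese sentence as a single word, so a word budget is only a rough measure for Chinese text.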

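Of course, to get Visual ChatGPT running locally on a single machine, several ControlNet image models were disabled, so some image scenarios do not turn out as well as one might hope.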
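The article disables models by editing them out of the code. An alternative, sketched here under stated assumptions, is to make the set of loaded models configurable so that disabled models never load their weights. The registry keys (`t2i`, `canny`) and the `load_enabled_models` helper are hypothetical, and the factories stand in for the project's real model classes.

```python
from typing import Callable, Dict, Iterable

# A factory builds one model on a given device string, e.g. "mps" or "cpu".
ModelFactory = Callable[[str], object]


def load_enabled_models(
    registry: Dict[str, ModelFactory], enabled: Iterable[str], device: str
) -> Dict[str, object]:
    """Instantiate only the enabled models; reject names missing from the registry."""
    enabled_set = set(enabled)
    unknown = enabled_set - set(registry)
    if unknown:
        raise ValueError(f"Unknown model names: {sorted(unknown)}")
    # Disabled entries are never called, so their weights are never loaded into memory.
    return {name: factory(device) for name, factory in registry.items() if name in enabled_set}
```

With `enabled={"t2i"}`, only the text-to-image factory runs. Re-enabling a ControlNet model then becomes a configuration change rather than a code edit.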

Conclusion

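Sometimes, when we praise a technology, we call it an industry benchmark or a textbook example. ChatGPT, however, has gone beyond any such benchmark. More precisely, it is the benchmark of benchmarks. The other so-called ChatGPT-like products cannot come close; they cannot even smell ChatGPT's exhaust. Put bluntly, they would not even know how to ride its coattails, because the gods long ago inscribed eight big characters in ChatGPT's fate: 前无古人，后无来者 (none before it, none to come)! Finally, here is the modified project code, shared with all my fellow readers: github.com/zcxey2911/visual_chatgpt_mps_cut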
