AI Agent(十一)-Camel基于AI的图像内容识别

AI Agent系列【十一】

  • 一. Camel库函数修复
  • 二、代码实现


一. Camel库函数修复

对于camel-ai 版本为0.2.22的安装包程序,base_model中函数 preprocess_messages ,此函数的作用是对消息列表进行预处理,主要目的是在将消息发送到模型 API 之前,移除消息中的“思考内容”(thinking content),并执行其他模型特定的预处理操作。
需要修改的文件地址为:…Lib\site-packages\camel\models
需要修改文件中的:
base_model.py函数的119行,修改为如下:

extract_text_content(msg.get('content', '')),

并在113行前增加如下函数:

        def extract_text_content(content):
            """Extract text content from a message content field."""
            if isinstance(content, str):
                return content
            elif isinstance(content, list):
                # 提取所有文本部分并拼接
                text_parts = []
                for item in content:
                    if isinstance(item, dict) and item.get('type') == 'text':
                        text_parts.append(item.get('text', ''))
                return ' '.join(text_parts)
            else:
                return ''

二、代码实现

通过如下的代码可以实现对图片的解析。

import argparse

from PIL import Image

from camel.agents import ChatAgent
from camel.generators import PromptTemplateGenerator
from camel.messages import BaseMessage
from camel.models import ModelFactory
from camel.types import (
    ModelPlatformType,
    ModelType,
    RoleType,
    TaskType,
)

parser = argparse.ArgumentParser(description="Arguments for object detection.")
parser.add_argument(
    "--image_paths",
    metavar='N',
    type=str,
    nargs='+',
    help="Path to the images for object detection.",
    default=[r'C:\Users\Administrator\Pictures\Test.jpg'],
    required=False,
)


def detect_image_obj(image_paths: str) -> None:
    sys_msg = PromptTemplateGenerator().get_prompt_from_key(
        TaskType.OBJECT_RECOGNITION, RoleType.ASSISTANT
    )
    print("=" * 20 + " SYS MSG " + "=" * 20)
    print(sys_msg)
    print("=" * 49)

    # 3. 加载图片
    image_paths = [r'C:\Users\Administrator\Pictures\test.jpeg']
    image_list = [Image.open(image_path) for image_path in image_paths]

    model = ModelFactory.create(
        model_platform=ModelPlatformType.OPENAI_COMPATIBLE_MODEL,
        model_type="deepseek-chat",
        url= "https://api.deepseek.com",
        api_key='API-Token'
    )

    agent = ChatAgent(
        model=model,
        output_language='中文'
    )


    user_msg = BaseMessage.make_user_message(
        role_name="User",
        content="请描述下图片内容!",
        image_list=image_list,
        image_detail="low",
    )
    # 打印调试信息
    '''
    print("User Message Content:", user_msg.content)
    print("Image List:", user_msg.image_list)
    print("First Image Type:", type(image_list[0]))  # 调试信息
    print("Image Detail:", user_msg.image_detail)
    '''

    try:
        assistant_response = agent.step(user_msg)
        print("=" * 20 + " RESULT " + "=" * 20)
        print(assistant_response.msgs[0].content)
        print("=" * 48)
    except Exception as e:
        print("模型调用失败,错误信息:", str(e))
        raise


def main(args: argparse.Namespace) -> None:
    detect_image_obj(args.image_paths)

if __name__ == "__main__":
    args = parser.parse_args()
    main(args=args)

你可能感兴趣的:(人工智能,人工智能,AI,Agent)