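As you may have guessed, what I want to talk about is the most common usage pattern for large AI models: Agent + Tool.
The word "Agent" has a long history in computer science. In the early days it described a proxy or intermediary that bridged the online world and the real world. Take the browser's "User Agent" as an example: it is how a browser declares its identity when it talks to a web server. In that exchange, the browser can be seen as the "Agent" between the user (User) and the content of the internet (Web content).
With that background on Agent + Tool in place, let's get to the point and look at how Agent and Tool work in LangChain.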
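import os
from langchain.llms import OpenAI
from langchain.tools import StructuredTool
from langchain.agents import initialize_agent, AgentType

def ssh(command: str, host: str, username: str = "root") -> str:
    """A tool that can connect to a remote server and execute commands to retrieve returned content."""
    # The command is handed to the shell unchecked, so the model can run anything on the host.
    return os.popen(f"ssh {host} -l{username} '{command}'").read()

# The arguments of this call were cut off in the source. They are filled in here on the
# assumption that they mirror the initialize_agent call in the python tool example.
agent = initialize_agent([StructuredTool.from_function(ssh)], OpenAI(temperature=0.3), agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.run("Help me check how long this machine has been running")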
> Entering new AgentExecutor chain...
Action:
{
  "action": "ssh",
  "action_input": {
    "command": "uptime",
    "host": ""
  }
}
Observation: 15:48:44 up 25 days, 41 min, 0 users, load average: 1.04, 1.48, 2.20
Thought: I have the answer
Action:
{
  "action": "Final Answer",
  "action_input": "This machine has been running for 25 days and 41 minutes."
}
> Finished chain.
So if the agent can run uptime on the machine, could it just as easily run rm -rf /?
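One way to reduce that risk is to check each command against an allowlist before it reaches the shell. The sketch below is only an illustration: `ALLOWED_COMMANDS`, `is_command_allowed`, and `build_ssh_argv` are hypothetical names that are not part of LangChain, and the allowlist itself is an assumption about what a read-only diagnostic tool might need.

```python
import shlex

# Hypothetical allowlist: read-only diagnostic commands only (an assumption).
ALLOWED_COMMANDS = {"uptime", "df", "free", "hostname"}

# Shell metacharacters that could chain, substitute, or redirect commands.
FORBIDDEN_CHARS = set(";|&><`$\n")


def is_command_allowed(command: str) -> bool:
    """Return True only if the program name is on the allowlist."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False
    try:
        parts = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar
        return False
    return bool(parts) and parts[0] in ALLOWED_COMMANDS


def build_ssh_argv(command: str, host: str, username: str = "root") -> list:
    """Build an argument list that can be run without a local shell."""
    if host.startswith("-"):
        raise ValueError(f"suspicious host: {host!r}")
    if not is_command_allowed(command):
        raise PermissionError(f"command not allowed: {command!r}")
    return ["ssh", "-l", username, host, command]
```

The tool function would then run the returned list with `subprocess.run(argv, capture_output=True, text=True)` instead of formatting a shell string. An allowlist is deliberately conservative: anything the tool author did not anticipate is refused.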
System: Respond to the human as helpfully and accurately as possible. You have access to the following tools:
ssh: ssh(command: str, host: str, username: str = 'root') -> str - A tool that can connect to a remote server and execute commands to retrieve returned content., args: {{'command': {{'title': 'Command', 'type': 'string'}}, 'host': {{'title': 'Host', 'type': 'string'}}, 'username': {{'title': 'Username', 'default': 'root', 'type': 'string'}}}}
Use a json blob to specify a tool by providing an action key (tool name) and an action_input key (tool input).
Valid "action" values: "Final Answer" or ssh
Provide only ONE action per $JSON_BLOB, as shown:
{{
  "action": $TOOL_NAME,
  "action_input": $INPUT
}}
Follow this format:
Question: input question to answer
Thought: consider previous and subsequent steps
Action:
$JSON_BLOB
Observation: action result
... (repeat Thought/Action/Observation N times)
Thought: I know what to respond
Action:
{{
  "action": "Final Answer",
  "action_input": "Final response to human"
}}
Begin! Reminder to ALWAYS respond with a valid json blob of a single action. Use tools if necessary. Respond directly if appropriate. Format is Action:```$JSON_BLOB```then Observation:.
Human: Help me check how long this machine has been running
Action:
{
  "action": "ssh",
  "action_input": {
    "command": "uptime",
    "host": ""
  }
}
System: Respond to the human as helpfully and accurately as possible. You have access to the following tools:
ssh: ssh(command: str, host: str, username: str = 'root') -> str - A tool that can connect to a remote server and execute commands to retrieve returned content., args: {{'command': {{'title': 'Command', 'type': 'string'}}, 'host': {{'title': 'Host', 'type': 'string'}}, 'username': {{'title': 'Username', 'default': 'root', 'type': 'string'}}}}
Use a json blob to specify a tool by providing an action key (tool name) and an action_input key (tool input).
Valid "action" values: "Final Answer" or ssh
Provide only ONE action per $JSON_BLOB, as shown:
{{
  "action": $TOOL_NAME,
  "action_input": $INPUT
}}
Follow this format:
Question: input question to answer
Thought: consider previous and subsequent steps
Action:
$JSON_BLOB
Observation: action result
... (repeat Thought/Action/Observation N times)
Thought: I know what to respond
Action:
{{
  "action": "Final Answer",
  "action_input": "Final response to human"
}}
Begin! Reminder to ALWAYS respond with a valid json blob of a single action. Use tools if necessary. Respond directly if appropriate. Format is Action:```$JSON_BLOB```then Observation:.
Human: Help me check how long this machine has been running
This was your previous work (but I haven't seen any of it! I only see what you return as final answer):
Action:
{
  "action": "ssh",
  "action_input": {
    "command": "uptime",
    "host": ""
  }
}
Observation: 16:38:18 up 25 days, 1:30, 0 users, load average: 0.81, 0.79, 1.06
I can provide the human with the uptime of the machine
Action:
{
  "action": "Final Answer",
  "action_input": "The machine has been running for 25 days, 1 hour and 30 minutes."
}
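The prompt contains the line Valid "action" values: "Final Answer" or ssh, which means the model must either give the answer directly or ask a tool for help. When the model replies with action: ssh, the framework layer knows which tool to call, so it passes the arguments to that tool and appends the result of the execution to the end of the second call. When the model replies with Final Answer, the chain finishes and that answer is returned to the user.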
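The dispatch step described above can be sketched in a few lines. This is a simplified illustration, not LangChain's actual implementation: `parse_action`, `run_step`, and the `fake_ssh` stand-in are hypothetical names, and the model reply is hard-coded so that the example runs offline.

```python
import json
import re

FENCE = "`" * 3  # the three backticks that wrap the JSON blob in a reply


def parse_action(reply: str) -> dict:
    """Extract the JSON blob from a reply, with or without the backtick fence."""
    match = re.search(FENCE + r"(?:json)?\s*(\{.*\})\s*" + FENCE, reply, re.DOTALL)
    blob = match.group(1) if match else reply
    return json.loads(blob)


def fake_ssh(command: str, host: str, username: str = "root") -> str:
    """Stand-in for the real ssh tool (an assumption for this sketch)."""
    return f"(pretend output of {command!r} on {host!r})"


TOOLS = {"ssh": fake_ssh}


def run_step(reply: str, scratchpad: str):
    """Return (final_answer, scratchpad); final_answer stays None until the model finishes."""
    action = parse_action(reply)
    if action["action"] == "Final Answer":
        return action["action_input"], scratchpad
    observation = TOOLS[action["action"]](**action["action_input"])
    # The reply and the tool result are appended to the prompt of the next call.
    scratchpad += f"{reply}\nObservation: {observation}\nThought:"
    return None, scratchpad


first_reply = (
    "Action:\n" + FENCE + "\n"
    '{"action": "ssh", "action_input": {"command": "uptime", "host": "example.com"}}\n'
    + FENCE
)
answer, scratchpad = run_step(first_reply, "")
```

After this step `answer` is still `None` and `scratchpad` holds the text that would be appended to the second call, which is the role played by the "This was your previous work" section in the second prompt shown earlier.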
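At this point some readers may wonder: why can't I get this kind of tool-calling interaction when I ask with my own prompt, and why does it work once a framework is involved? The answer is ReAct. ReAct is short for Reasoning and Acting: the large model uses logical reasoning (Reason) to build a complete sequence of actions (Act) that reaches the desired goal. The key to ReAct is coordinating the large language model with external information sources and other functions: the large model is the brain, and the ReAct framework lets the brain control the hands and feet. It consists of three parts:
Thought: The thought is generated by the large model and provides the rationale for its behavior and decisions. By examining the model's thinking, we can assess whether the action it is about to take is logical, which makes it a key indicator of whether a decision is reasonable. Compared with human decision-making, the presence of Thought gives the large model better explainability and credibility.
Act: The action is the concrete step the large model decides to take. It usually has two parts, an operation and a target, which in programming correspond to an API name and its input parameters. A major strength of large models is that they can pick a suitable API and generate the required parameters based on the thought, which is what makes the ReAct framework practical to execute.
Obs (Observation): The observation is how the large model receives external input. It acts as the model's perception system, feeding the environment's feedback back to the model to help it analyze or decide further.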
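Put together, Thought, Act, and Obs form a loop. The following is a minimal sketch with a scripted stand-in for the model (`scripted_llm`) and a toy `add` tool. Both are assumptions made so that the example runs offline; a real agent would call an LLM at that point.

```python
from typing import Callable, Dict, List, Tuple


def add(a: float, b: float) -> float:
    """Toy tool used only for this sketch."""
    return a + b


TOOLS: Dict[str, Callable] = {"add": add}


def scripted_llm(history: List[str]) -> Tuple[str, str, dict]:
    """Pretend model: returns (thought, action, action_input) from what it has seen."""
    observations = [h for h in history if h.startswith("Observation:")]
    if not observations:
        return ("I need to add the numbers.", "add", {"a": 2, "b": 3})
    result = observations[-1].split(":", 1)[1].strip()
    return ("I know the answer.", "Final Answer", {"answer": result})


def react(question: str, max_steps: int = 5) -> str:
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, action_input = scripted_llm(history)  # Thought
        history.append(f"Thought: {thought}")
        if action == "Final Answer":
            return action_input["answer"]
        observation = TOOLS[action](**action_input)  # Act
        history.append(f"Action: {action} {action_input}")
        history.append(f"Observation: {observation}")  # Obs
    raise RuntimeError("no final answer within max_steps")


result = react("What is 2 + 3?")
```

The `max_steps` limit is a design choice of this sketch: without it, a model that never produces a final answer would loop forever.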
from typing import Dict
import sys
import traceback
from io import StringIO
from contextlib import redirect_stdout, redirect_stderr
from langchain.llms import OpenAI
from langchain.tools import StructuredTool
from langchain.agents import initialize_agent, AgentType

def python(code_str: str, return_context: bool = False) -> Dict:
    """This is a Python execution tool. You can input a piece of Python code, and it will return the corresponding execution results. By default, it returns the first 1000 characters of both stdout and stderr. It's recommended to use the print() function to directly display the results."""
    # Create StringIO objects for output and errors so that we can capture them
    stdout = StringIO()
    stderr = StringIO()
    return_head = 1000
    context = {}
    try:
        # Redirect stdout and stderr, then execute the code
        with redirect_stdout(stdout), redirect_stderr(stderr):
            exec(code_str, context)
    except Exception:
        # Record the traceback in the captured stderr so the model can see what failed
        stderr.write(traceback.format_exc())
    # Collect stdout, stderr and context after execution
    stdout_value = stdout.getvalue()[0:return_head]
    stderr_value = stderr.getvalue()[0:return_head]
    if return_context:
        return {"stdout": stdout_value, "stderr": stderr_value, "context": context}
    return {"stdout": stdout_value, "stderr": stderr_value, "context": {}}

agent = initialize_agent([StructuredTool.from_function(python)], OpenAI(temperature=0.3), agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
resp = agent.run("Use the tool to summarize the recent updates in the XML blog feed at https://elastic.aiops.work/blog/feed")
print("=== resp ===")
print(resp)
> Entering new AgentExecutor chain...
Thought: The user is asking for a summary of updates from an XML blog feed. I will need to fetch the XML from the provided URL and parse it to extract the relevant information. Python has several libraries that can help with this, such as 'requests' for fetching the data and 'xml.etree.ElementTree' for parsing the XML. I'll write a Python script to do this and use the execute_python tool to run it.
Action:
{
  "action": "python",
  "action_input": {
    "code_str": "import requests\nfrom xml.etree import ElementTree as ET\n\n# Fetch the XML data\nresponse = requests.get('https://elastic.aiops.work/blog/feed')\n\n# Parse the XML\nroot = ET.fromstring(response.content)\n\n# Extract the relevant information\nupdates = [{'title': item.find('title').text, 'link': item.find('link').text, 'pubDate': item.find('pubDate').text} for item in root.findall('.//item')]\n\n# Print the updates\nfor update in updates:\n    print(f\"Title: {update['title']}\nLink: {update['link']}\nPublished Date: {update['pubDate']}\n\")",
    "return_context": false
  }
}
Observation: {'stdout': '', 'stderr': 'Traceback (most recent call last):\n File "/Users/****/test.py", line 22, in python\n exec(code_str, context)\n File "<string>", line 15\n print(f"Title: {update['title']}\n^\nSyntaxError: EOL while scanning string literal\n', 'context': {}}
Thought:The Python code failed to execute due to a SyntaxError. It seems like the print statement in the code is not properly formatted. I need to correct the print statement and try again.
Action:
{
  "action": "python",
  "action_input": {
    "code_str": "import requests\nfrom xml.etree import ElementTree as ET\n\n# Fetch the XML data\nresponse = requests.get('https://elastic.aiops.work/blog/feed')\n\n# Parse the XML\nroot = ET.fromstring(response.content)\n\n# Extract the relevant information\nupdates = [{'title': item.find('title').text, 'link': item.find('link').text, 'pubDate': item.find('pubDate').text} for item in root.findall('.//item')]\n\n# Print the updates\nfor update in updates:\n    print(f\"Title: {update['title']}\\nLink: {update['link']}\\nPublished Date: {update['pubDate']}\\n\")",
    "return_context": false
  }
}
Observation: {'stdout': 'Title: Being one of the only woman in her computer science program didn’t stop Hannah Mudge from pursuing her dream of becoming a software engineer\nLink: https://www.elastic.co/blog/culture-hannah-mudge-dream-of-becoming-software-engineer\nPublished Date: Fri, 07 Jul 2023 00:00:00 GMT\n\nTitle: Accelerating R&D in pharma with Elasticsearch, ESRE, LLMs, and LangChain — Part 1\nLink: https://www.elastic.co/blog/research-development-pharma-elasticsearch-esre-llms-langchain-1\nPublished Date: Fri, 07 Jul 2023 00:00:00 GMT\n\nTitle: Being one of the only woman in her computer science program didn’t stop Hannah Mudge from pursuing her dream of becoming a software engineer\nLink: https://www.elastic.co/blog/culture-hannah-mudge-dream-of-becoming-software-engineer\nPublished Date: Fri, 07 Jul 2023 00:00:00 GMT\n\nTitle: Accelerating R&D in pharma with Elasticsearch, ESRE, LLMs, and LangChain — Part 1\nLink: https://www.elastic.co/blog/research-development-pharma-elasticsearch-esre-llms-langchain-1\nPublish', 'stderr': '', 'context': {}}
Thought:The Python code executed successfully and returned the recent updates from the XML blog feed. The updates include the title, link, and published date of the blog posts. I will now format this information into a readable format for the user.
Action:
{
  "action": "Final Answer",
  "action_input": "Here are the recent updates from the blog:\n\n1. Title: Being one of the only woman in her computer science program didn’t stop Hannah Mudge from pursuing her dream of becoming a software engineer\nLink: https://www.elastic.co/blog/culture-hannah-mudge-dream-of-becoming-software-engineer\nPublished Date: Fri, 07 Jul 2023 00:00:00 GMT\n\n2. Title: Accelerating R&D in pharma with Elasticsearch, ESRE, LLMs, and LangChain — Part 1\nLink: https://www.elastic.co/blog/research-development-pharma-elasticsearch-esre-llms-langchain-1\nPublished Date: Fri, 07 Jul 2023 00:00:00 GMT"
}
> Finished chain.
=== resp ===
Here are the recent updates from the blog:
1. Title: Being one of the only woman in her computer science program didn’t stop Hannah Mudge from pursuing her dream of becoming a software engineer
Link: https://www.elastic.co/blog/culture-hannah-mudge-dream-of-becoming-software-engineer
Published Date: Fri, 07 Jul 2023 00:00:00 GMT
2. Title: Accelerating R&D in pharma with Elasticsearch, ESRE, LLMs, and LangChain — Part 1
Link: https://www.elastic.co/blog/research-development-pharma-elasticsearch-esre-llms-langchain-1
Published Date: Fri, 07 Jul 2023 00:00:00 GMT
The Python code failed to execute due to a SyntaxError. It seems like the print statement in the code is not properly formatted. I need to correct the print statement and try again.
# First attempt
import requests
from xml.etree import ElementTree as ET
# Fetch the XML data
response = requests.get('https://elastic.aiops.work/blog/feed')
# Parse the XML
root = ET.fromstring(response.content)
# Extract the relevant information
updates = [{'title': item.find('title').text, 'link': item.find('link').text, 'pubDate': item.find('pubDate').text} for item in root.findall('.//item')]
# Print the updates
for update in updates:
    print(f"Title: {update['title']}
Link: {update['link']}
Published Date: {update['pubDate']}
")
# Second attempt
import requests
from xml.etree import ElementTree as ET
# Fetch the XML data
response = requests.get('https://elastic.aiops.work/blog/feed')
# Parse the XML
root = ET.fromstring(response.content)
# Extract the relevant information
updates = [{'title': item.find('title').text, 'link': item.find('link').text, 'pubDate': item.find('pubDate').text} for item in root.findall('.//item')]
# Print the updates
for update in updates:
    print(f"Title: {update['title']}\nLink: {update['link']}\nPublished Date: {update['pubDate']}\n")
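The difference between the two attempts can be reproduced without any network access. The sketch below uses the built-in `compile()` to check only the syntax of two shortened, hypothetical snippets (`bad_source` and `good_source`): a real line break inside a single-quoted f-string is rejected, while the `\n` escape sequence is accepted.

```python
def is_valid_python(source: str) -> bool:
    """Return True if the source compiles; nothing is executed."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


# A real line break inside the quotes, as in the first attempt.
bad_source = 'print(f"Title: {title}\nLink: {link}")'

# The two characters backslash and n, as in the second attempt.
good_source = 'print(f"Title: {title}\\nLink: {link}")'

print(is_valid_python(bad_source), is_valid_python(good_source))
```

This is why the model's first reply failed: inside the JSON blob the newline escape was decoded into an actual line break before the code reached `exec`, so the string literal was cut off at the end of the line.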
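By now some of you are probably eager to try this out yourselves. Next, let's talk about how to blend Agent and Tool into ordinary programming.
In distributed computing, programming paradigms such as MapReduce made distributed computation simple. Without understanding how distributed communication and synchronization work, you write a mapper and a reducer and your program runs on a cluster of thousands of servers, with no need to worry about machine failures and similar problems. Today you do not even have to write the mapper and reducer: a single SELECT statement in SQL is automatically split into a number of mappers and reducers and executed.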
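For readers who have not written one, a mapper and a reducer can be as small as the following word-count sketch. It runs in a single process, so it only illustrates the programming model and not the distribution or fault tolerance; all function names here are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def mapper(line: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, the step a real framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: List[str]) -> Dict[str, int]:
    mapped = [pair for line in lines for pair in mapper(line)]
    return dict(reducer(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["agent tool agent", "tool agent"])
```

The author of the job writes only `mapper` and `reducer`; everything else belongs to the framework.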
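Now look again at the "tool person" we built above. Couldn't it be slotted straight into the MapReduce model? Would it even be much of a problem to put this "tool person" at a higher level, as the decision-maker? After all, its experience and knowledge may far exceed those of some professional managers.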
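def hello_world():
    return 'Hello, World!'
So can Agent and Tool also be brought into our programs through decorators? Yes: langchain provides the @tool decorator for this.
The decorator in the framework used below works the same way as langchain's: once a function is wrapped with it, the function becomes a tool. An agent is declared with @agent(tools=[...], llm=..., ...).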
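To make the decorator idea concrete, here is a minimal sketch of how a decorator can turn a plain function into a registered tool. It is not langchain's or aibond's implementation: `ToolRegistry` and its methods are hypothetical names, and the tool description is simply built from the function's signature and docstring.

```python
import inspect
from typing import Callable, Dict


class ToolRegistry:
    """Hypothetical registry that collects decorated functions as tools."""

    def __init__(self) -> None:
        self.tools: Dict[str, Callable] = {}

    def tool(self, func: Callable) -> Callable:
        """Decorator: register func under its own name and return it unchanged."""
        self.tools[func.__name__] = func
        return func

    def describe(self) -> str:
        """Render one line per tool, suitable for pasting into a prompt."""
        lines = []
        for name, func in self.tools.items():
            signature = inspect.signature(func)
            doc = inspect.getdoc(func) or ""
            lines.append(f"{name}{signature} - {doc}")
        return "\n".join(lines)


registry = ToolRegistry()


@registry.tool
def hello(name: str) -> str:
    """Greet someone by name."""
    return f"Hello, {name}!"


description = registry.describe()
```

Because the decorator returns the function unchanged, `hello("World")` still works as ordinary code, while `registry.tools` lets an agent look the function up by name.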
import os
import sys
from subprocess import Popen, PIPE
sys.path.insert(0, os.path.split(os.path.realpath(__file__))[0] + "/../../")
from aibond import AI
from langchain import OpenAI

ai = AI()

def popen(command):
    # Run a shell command and return (exit code, stdout, stderr)
    child = Popen(command, stdin=PIPE, stdout=PIPE, stderr=PIPE, shell=True)
    out, err = child.communicate()
    ret = child.wait()
    return (ret, out.strip(), err.strip())

def k8sLabel(name: str, kind: str, namespace: str) -> str:
    """This tool can fetch the labels of Kubernetes objects."""
    cmd = "kubectl get " + kind + " " + name + " -n " + namespace + " -o jsonpath='{.metadata.labels}'"
    (ret, out, err) = popen(cmd)
    return out

def k8sServiceSelectorList(namespace: str) -> str:
    """This tool can find all services within a namespace in Kubernetes and retrieve the label selectors for each service."""
    # The inner double quotes and the backslash are escaped so the jsonpath reaches kubectl intact
    cmd = "kubectl get svc -n " + namespace + " -o jsonpath=\"{range .items[*]}{@.metadata.name}:{@.spec.selector}{'\\n'}{end}\""
    (ret, out, err) = popen(cmd)
    return out

@ai.agent(tools=["k8sLabel", "k8sServiceSelectorList"], llm=OpenAI(temperature=0.2), verbose=True)
def k8sPodServiceFinder(name: str, namespace: str) -> str:
    """This tool can find the services associated with a Kubernetes pod resource."""
    return f"List all services in the {namespace} namespace, then find the services in that list that relate to the labels of pod {name}. Return only the service names."

a = ai.run("Use all the tools to find the k8s resources associated with the pod prod-health-health-6cbc46567-s6dqp in the sreworks namespace", llm=OpenAI(temperature=0.2), agents=["k8sPodServiceFinder"], verbose=True)
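The decorators above are a framework made of a little syntactic sugar, with langchain at its core. If you want to try it hands-on, see the framework code at https://github.com/alibaba/sreworks-ext/tree/master/aibond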
def demo_class_tool(func: str, args: Dict, instance_id: str = None) -> Dict:
    """
    This is a tool that requires instantiation. You can first call the '__init__' function to instantiate, this call will return an 'instance_id'. Subsequently, you can use this 'instance_id' to continue operating on this instance.
    Below are the available funcs for this tool:
    - func: __init__ args: {{'url': {{'title': 'Url', 'type': 'string'}}}}
    - func: read args: {{'limit': {{'title': 'Limit', 'type': 'integer', 'default': '1000'}}}}
    """
Get the instance_id after instantiation.
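One possible way to implement such an instantiable tool is to keep live instances in a dictionary keyed by a generated ID. The sketch below is an assumption about the general technique, not the aibond implementation: `make_class_tool`, the `Counter` demo class, and the use of `uuid` for IDs are all hypothetical.

```python
import uuid
from typing import Any, Callable, Dict, Optional


def make_class_tool(cls: type) -> Callable[..., Dict[str, Any]]:
    """Wrap a class as a single function tool that tracks instances by ID."""
    instances: Dict[str, Any] = {}

    def class_tool(func: str, args: Dict[str, Any], instance_id: Optional[str] = None) -> Dict[str, Any]:
        if func == "__init__":
            new_id = uuid.uuid4().hex[:12]
            instances[new_id] = cls(**args)
            return {"instance_id": new_id}
        if instance_id not in instances:
            return {"error": f"unknown instance_id: {instance_id}"}
        if func.startswith("_"):
            return {"error": f"private method not allowed: {func}"}
        method = getattr(instances[instance_id], func)
        return {"result": method(**args)}

    return class_tool


class Counter:
    """Demo class with state that must survive between tool calls."""

    def __init__(self, start: int = 0):
        self.value = start

    def increment(self, by: int = 1) -> int:
        self.value += by
        return self.value


counter_tool = make_class_tool(Counter)
created = counter_tool("__init__", {"start": 10})
outcome = counter_tool("increment", {"by": 5}, instance_id=created["instance_id"])
```

Returning errors as dictionaries instead of raising them is a deliberate choice in this sketch: the model sees the message as an observation and can correct its next call.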
import os
import sys
from typing import Dict
import paramiko
sys.path.insert(0, os.path.split(os.path.realpath(__file__))[0] + "/../../")
from aibond import AI
# from tools.Linux.SSH import SshClient  # not needed here: the class is defined inline below
from langchain import OpenAI

ai = AI()

class SshClient():
    """A tool that can connect to a remote server and execute commands to retrieve returned content."""
    _client = None

    def __init__(self, host: str, username: str = "root", password: str = None):
        self._client = paramiko.SSHClient()
        self._client.connect(host, username=username, password=password)

    def exec_command(self, command: str) -> Dict:
        stdin, stdout, stderr = self._client.exec_command(command)
        # Wait for the remote command to finish and collect its exit status
        retcode = stdout.channel.recv_exit_status()
        output_stdout = stdout.read().decode('utf-8')
        output_stderr = stderr.read().decode('utf-8')
        stdin = None
        stdout = None
        stderr = None
        return {"stdout": output_stdout, "stderr": output_stderr, "exitStatus": retcode}

resp = ai.run("Help me check how long this machine has been up", llm=OpenAI(temperature=0.2), tools=[SshClient], verbose=True)
print("=== resp ===")
print(resp)
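Using this tool means instantiating the class first and then calling exec_command to execute the command. Let's see whether the large model can work that out.
> Entering new AgentExecutor chain...
Action:
{
  "action": "SshClient",
  "action_input": {
    "sub_func": "__init__",
    "sub_args": {
      "host": "",
      "username": "root",
      "password": ""
    }
  }
}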
Observation: {'instance_id': 'cbbb660c0bc3'}
Thought: I need to use the instance_id to execute a command
Action:
{
  "action": "SshClient",
  "action_input": {
    "sub_func": "exec_command",
    "sub_args": {
      "command": "uptime"
    },
    "instance_id": "cbbb660c0bc3"
  }
}
Observation: {'stdout': ' 23:18:55 up 25 days, 8:11, 0 users, load average: 0.29, 0.55, 0.84\n', 'stderr': '', 'exitStatus': 0}
Thought: I have the answer
Action:
{
  "action": "Final Answer",
  "action_input": "This machine has been up for 25 days, 8 hours, and 11 minutes."
}
> Finished chain.
=== resp ===
This machine has been up for 25 days, 8 hours, and 11 minutes.
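The implementation of calling class objects as tools has been pushed to the framework. You are welcome to try it: https://github.com/alibaba/sreworks-ext/blob/master/aibond/aibond/core.py
Yes, as mentioned earlier, our "tool person" can do exactly this. The example above shows that a class object can also become a tool for an agent to use. So could we embed a large-model driver inside such a class object and make it truly come alive, so that when one agent calls another it is not just asking for a result but engaging in something more like result-oriented communication?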
● ReAct: Synergizing Reasoning and Acting in Language Models, https://arxiv.org/abs/2210.03629