Boost LLM Query Analysis with Examples!

Introduction

As query analysis grows more complex, large language models (LLMs) can struggle to decide how to respond in certain scenarios. To improve performance, we can add examples to the prompt to better steer the model. This article walks through adding examples to the LangChain YouTube video query analyzer we built in the Quickstart, in order to improve the accuracy of its responses.

Main Content

Setting Up the Environment

Installing Dependencies

We need to install the langchain-core and langchain-openai packages.

%pip install -qU langchain-core langchain-openai

Setting Environment Variables

We'll be using OpenAI, so you'll need to configure an API key.

import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()
# Optional: route requests through an API proxy service for more stable access
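
If you do route requests through an OpenAI-compatible API proxy, one way to configure it is to point the client at the proxy's base URL. A minimal sketch, where the URL below is a placeholder for your own service:

from langchain_openai import ChatOpenAI

# Assumption: an OpenAI-compatible proxy; replace the placeholder URL with your service.
llm = ChatOpenAI(
    model="gpt-3.5-turbo-0125",
    temperature=0,
    base_url="https://your-api-proxy.example.com/v1",
)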

Defining the Query Schema

We need to define a query schema so the model outputs more structured data. We'll add a sub_queries field to hold the more specific questions derived from the top-level question.

from typing import List, Optional
from langchain_core.pydantic_v1 import BaseModel, Field

sub_queries_description = """\
If the original question contains multiple distinct sub-questions, \
or if there are more generic questions that would be helpful to answer in \
order to answer the original question, write a list of all relevant sub-questions. \
Make sure this list is comprehensive and covers all parts of the original question. \
It's ok if there's redundancy in the sub-questions. \
Make sure the sub-questions are as narrowly focused as possible."""

class Search(BaseModel):
    """Search over a database of tutorial videos about a software library."""
    query: str = Field(
        ...,
        description="Primary similarity search query applied to video transcripts.",
    )
    sub_queries: List[str] = Field(
        default_factory=list, description=sub_queries_description
    )
    publish_year: Optional[int] = Field(None, description="Year video was published")
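
Since Search is a Pydantic model, you can construct an instance by hand to sanity-check the schema before wiring it into a chain. A quick illustration, not part of the original walkthrough:

# Build a Search object manually to confirm the fields behave as expected.
example_search = Search(
    query="how to use langchain agents",
    sub_queries=["what are langchain agents"],
    publish_year=2023,
)
print(example_search)
# e.g. query='how to use langchain agents' sub_queries=['what are langchain agents'] publish_year=2023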

Query Generation

Defining the Prompt Template and Chain

We'll use OpenAI function calling to create a prompt template and the chain that runs it.

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

system = """You are an expert at converting user questions into database queries. \
You have access to a database of tutorial videos about a software library for building LLM-powered applications. \
Given a question, return a list of database queries optimized to retrieve the most relevant results.

If there are acronyms or words you are not familiar with, do not try to rephrase them."""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        MessagesPlaceholder("examples", optional=True),
        ("human", "{question}"),
    ]
)
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
structured_llm = llm.with_structured_output(Search)
query_analyzer = {"question": RunnablePassthrough()} | prompt | structured_llm

Initial Test

First, let's test the query analyzer without any examples:

query_analyzer.invoke(
    "what's the difference between web voyager and reflection agents? do both use langgraph?"
)
  • Resulting search output:
Search(query='web voyager vs reflection agents', sub_queries=['difference between web voyager and reflection agents', 'do web voyager and reflection agents use langgraph'], publish_year=None)

Code Examples

Adding Examples and Tuning the Prompt

To improve query generation, we can add examples of input questions and their target output queries to the prompt.

import uuid
from typing import Dict

from langchain_core.messages import (
    AIMessage,
    BaseMessage,
    HumanMessage,
    SystemMessage,
    ToolMessage,
)

# Convert an example into the list of messages we can feed into the prompt
def tool_example_to_messages(example: Dict) -> List[BaseMessage]:
    messages: List[BaseMessage] = [HumanMessage(content=example["input"])]
    openai_tool_calls = []
    for tool_call in example["tool_calls"]:
        openai_tool_calls.append(
            {
                "id": str(uuid.uuid4()),
                "type": "function",
                "function": {
                    "name": tool_call.__class__.__name__,
                    "arguments": tool_call.json(),
                },
            }
        )
    messages.append(
        AIMessage(content="", additional_kwargs={"tool_calls": openai_tool_calls})
    )
    tool_outputs = example.get("tool_outputs") or [
        "You have correctly called this tool."
    ] * len(openai_tool_calls)
    for output, tool_call in zip(tool_outputs, openai_tool_calls):
        messages.append(ToolMessage(content=output, tool_call_id=tool_call["id"]))
    return messages

examples = [
    {
        "input": "What's chat langchain, is it a langchain template?",
        "tool_calls": [
            Search(
                query="What is chat langchain and is it a langchain template?",
                sub_queries=["What is chat langchain", "What is a langchain template"],
            )
        ],
    },
    {
        "input": "How to build a multi-agent system and stream intermediate steps from it",
        "tool_calls": [
            Search(
                query="How to build multi-agent system and stream intermediate steps from it",
                sub_queries=[
                    "How to build multi-agent system",
                    "How to stream intermediate steps from multi-agent system",
                    "How to stream intermediate steps",
                ],
            )
        ],
    },
]

example_msgs = [msg for ex in examples for msg in tool_example_to_messages(ex)]
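
# Each example expands into three messages: a HumanMessage, an AIMessage carrying
# the tool call, and a ToolMessage. This is the few-shot pattern the model sees.
# Quick sanity check (illustrative, not part of the original walkthrough):
for msg in example_msgs:
    print(type(msg).__name__)
# Prints: HumanMessage, AIMessage, ToolMessage, HumanMessage, AIMessage, ToolMessage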

# Update the prompt template to include these examples
query_analyzer_with_examples = (
    {"question": RunnablePassthrough()}
    | prompt.partial(examples=example_msgs)
    | structured_llm
)

# Test the query analyzer with the examples included
query_analyzer_with_examples.invoke(
    "what's the difference between web voyager and reflection agents? do both use langgraph?"
)
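
To see the effect of the examples, you can run the same question through both analyzers and compare the decompositions. A small comparison sketch (exact outputs will vary between runs):

question = (
    "what's the difference between web voyager and reflection agents? "
    "do both use langgraph?"
)
# Compare the analyzer with and without few-shot examples on the same question.
for name, analyzer in [
    ("without examples", query_analyzer),
    ("with examples", query_analyzer_with_examples),
]:
    print(name, "->", analyzer.invoke(question))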

Common Issues and Solutions

  1. Prompt not specific enough: if the model's decomposition is too coarse, add more (and more detailed) examples.

  2. Slow response times: make sure your network connection to the API service is stable; an API proxy service can improve access stability.

  3. Examples are cumbersome to manage: automating example generation and management with a script is a good idea; see the sketch after this list.
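
As a starting point for that last item, here is a minimal sketch that keeps examples in a JSON file and rebuilds the prompt messages from it. The examples.json filename and its structure are assumptions made for illustration:

import json

# Assumed file layout: a list of {"input": str, "tool_calls": [<Search fields>]}.
with open("examples.json") as f:
    raw_examples = json.load(f)

examples = [
    {"input": ex["input"], "tool_calls": [Search(**tc) for tc in ex["tool_calls"]]}
    for ex in raw_examples
]
example_msgs = [msg for ex in examples for msg in tool_example_to_messages(ex)]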

Summary and Further Learning Resources

This article showed how adding examples can improve an LLM's performance on complex query analysis. Well-chosen examples effectively steer the model toward more precise output, and further prompt engineering and example tuning can improve results even more.

Further Learning Resources

  • OpenAI API documentation
  • LangChain introduction and documentation


