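Reference:
GitHub - mayooear/gpt4-pdf-chatbot-langchain: GPT4 & LangChain Chatbot for large PDF docs
Use the new GPT-4 API to build a ChatGPT-style chatbot for multiple large PDF files.
The tech stack includes LangChain, Pinecone, TypeScript, OpenAI, and Next.js. LangChain is a framework that makes it easier to build scalable AI/LLM applications and chatbots. Pinecone is a vector store that holds the embeddings and the text of your PDFs so that similar documents can be retrieved later.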
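To make "retrieving similar documents" concrete, here is a minimal sketch of similarity search over embeddings. It is a toy model, not Pinecone's implementation: the names `StoredChunk`, `cosineSimilarity`, and `topK` are made up for illustration, and the 3-dimensional vectors stand in for real embeddings, which are much larger.

```typescript
// Toy similarity search: rank stored chunks by cosine similarity to a query vector.
// All names and vectors here are illustrative assumptions, not part of the repo.
type StoredChunk = { text: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks whose embeddings are closest to the query.
function topK(query: number[], store: StoredChunk[], k: number): StoredChunk[] {
  return store
    .slice() // copy so the caller's array is not reordered
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.embedding) -
        cosineSimilarity(query, x.embedding),
    )
    .slice(0, k);
}

const store: StoredChunk[] = [
  { text: 'Refund policy', embedding: [0.9, 0.1, 0.0] },
  { text: 'Shipping times', embedding: [0.1, 0.9, 0.0] },
  { text: 'Return an item', embedding: [0.8, 0.2, 0.1] },
];

const hits = topK([1, 0, 0], store, 2);
console.log(hits.map((h) => h.text));
```

In the real application the embeddings come from OpenAI and the ranking is done by Pinecone; the sketch only shows the idea of "closest vectors win".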
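OpenAI API key for GPT-3.5 or GPT-4 (from openai)
Pinecone API key, environment, and index (from pinecone)
Indexes belonging to users on the Pinecone Starter (free) plan are deleted after 7 days. To prevent this, send an API request to Pinecone before the 7 days are up to reset the counter; you can then keep using the free plan.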
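One way to stay inside the 7-day window is to track the last request and send a keep-alive shortly before the limit. The sketch below only covers the timing logic. `keepAliveDue`, `keepAliveIfDue`, and the one-day safety margin are assumptions for illustration, and `sendRequest` is a placeholder for whichever Pinecone call you choose, since the source does not say which request resets the counter.

```typescript
// Timing logic for a keep-alive, based on the 7-day rule described above.
const INACTIVITY_LIMIT_DAYS = 7;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function daysSince(last: Date, now: Date): number {
  return (now.getTime() - last.getTime()) / MS_PER_DAY;
}

// True once we are within `marginDays` of the 7-day limit (margin is an assumption).
function keepAliveDue(last: Date, now: Date, marginDays = 1): boolean {
  return daysSince(last, now) >= INACTIVITY_LIMIT_DAYS - marginDays;
}

// `sendRequest` is a hypothetical callback wrapping your actual Pinecone request.
async function keepAliveIfDue(
  last: Date,
  now: Date,
  sendRequest: () => Promise<void>,
): Promise<boolean> {
  if (!keepAliveDue(last, now)) {
    return false; // nothing to do yet
  }
  await sendRequest();
  return true;
}
```

Running `keepAliveIfDue` from a daily scheduled job would be one way to use it; storing the timestamp of the last request is left to the caller.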
git clone https://github.com/mayooear/gpt4-pdf-chatbot-langchain.git
Install yarn with npm. If you do not have npm, see this installation guide:
npm/Node.js introduction and quick installation - Linux CentOS (Entropy-Go's blog on CSDN)
npm install yarn -g
Then use yarn to install the dependencies.
Go to the project root directory and run:
yarn install
After a successful install, you should see the node_modules directory:
gpt4-pdf-chatbot-langchain-main$ ls -a
. declarations .eslintrc.json node_modules .prettierrc styles utils yarn.lock
.. docs .gitignore package.json public tailwind.config.cjs venv
components .env .idea pages README.md tsconfig.json visual-guide
config .env.example next.config.js postcss.config.cjs scripts types yarn-error.log
Copy .env.example to a .env configuration file and fill in your values:
OPENAI_API_KEY=sk-xxx
# Update these with your pinecone details from your dashboard.
# PINECONE_INDEX_NAME is in the indexes tab under "index name" in blue
# PINECONE_ENVIRONMENT is in indexes tab under "Environment". Example: "us-east1-gcp"
PINECONE_API_KEY=xxx
PINECONE_ENVIRONMENT=us-west1-gcp-free
PINECONE_INDEX_NAME=xxx
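A missing or empty value in .env is an easy mistake to make, so a small check at startup can help. The following is a sketch, not code from the repo: `REQUIRED_KEYS` and `missingEnvVars` are hypothetical names, and the key list simply mirrors the four variables shown above.

```typescript
// Report which of the expected variables are missing or blank.
// The function takes the environment as a parameter so it is easy to test.
const REQUIRED_KEYS = [
  'OPENAI_API_KEY',
  'PINECONE_API_KEY',
  'PINECONE_ENVIRONMENT',
  'PINECONE_INDEX_NAME',
];

function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === '';
  });
}

// Example with a made-up environment that lacks the index name.
const missing = missingEnvVars({
  OPENAI_API_KEY: 'sk-xxx',
  PINECONE_API_KEY: 'xxx',
  PINECONE_ENVIRONMENT: 'us-west1-gcp-free',
});
console.log(missing); // reports PINECONE_INDEX_NAME as missing
```

In a Node.js process you would call `missingEnvVars(process.env)` after the .env file has been loaded and stop early if the result is not empty.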
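Edit config/pinecone.ts
In the config folder, replace PINECONE_NAME_SPACE with the namespace in which you want to store your embeddings on Pinecone when you run npm run ingest. The same namespace is used later for querying and retrieval.
Change the chatbot's prompt and the OpenAI model
In utils/makechain.ts, change QA_PROMPT to suit your own use case.
If you have access to the gpt-4 API, change modelName in new OpenAI to gpt-4. Verify outside this repo that you really have access to the gpt-4 API; otherwise the application will not work.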
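The reason the namespace matters is that ingestion and querying must agree on it. The sketch below models this with a toy in-memory index. `ToyIndex` is not the Pinecone client, and the value `'my-pdf-docs'` is a hypothetical namespace chosen only for illustration.

```typescript
// Toy model of namespaces: vectors stored under one namespace are only
// visible when the same namespace is used for lookup.
class ToyIndex {
  private vectors = new Map<string, string[]>();

  upsert(namespace: string, ids: string[]): void {
    const existing = this.vectors.get(namespace) ?? [];
    this.vectors.set(namespace, existing.concat(ids));
  }

  list(namespace: string): string[] {
    return this.vectors.get(namespace) ?? [];
  }
}

const PINECONE_NAME_SPACE = 'my-pdf-docs'; // hypothetical value

const index = new ToyIndex();
index.upsert(PINECONE_NAME_SPACE, ['chunk-1', 'chunk-2']); // what ingest would do

const sameNamespace = index.list(PINECONE_NAME_SPACE); // finds both chunks
const otherNamespace = index.list('another-namespace'); // finds nothing
console.log(sameNamespace.length, otherNamespace.length);
```

If the chat returns no sources after a successful ingest, a namespace mismatch between the two steps is one thing worth checking.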
import { OpenAI } from 'langchain/llms/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { ConversationalRetrievalQAChain } from 'langchain/chains';
const CONDENSE_PROMPT = `Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:`;
const QA_PROMPT = `You are a helpful AI assistant. Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say you don't know. DO NOT try to make up an answer.
If the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.
{context}
Question: {question}
Helpful answer in markdown:`;
export const makeChain = (vectorstore: PineconeStore) => {
  const model = new OpenAI({
    temperature: 0, // increase temperature to get more creative answers
    modelName: 'gpt-3.5-turbo', // change this to gpt-4 if you have access
  });

  const chain = ConversationalRetrievalQAChain.fromLLM(
    model,
    vectorstore.asRetriever(),
    {
      qaTemplate: QA_PROMPT,
      questionGeneratorTemplate: CONDENSE_PROMPT,
      returnSourceDocuments: true, // the number of source documents returned is 4 by default
    },
  );
  return chain;
};
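Before switching modelName to gpt-4, it can help to confirm access outside the app. The sketch below assumes the OpenAI REST endpoint for retrieving a single model (GET /v1/models/{model}) and treats HTTP 200 as "accessible". `modelCheckRequest`, `hasModelAccess`, and the `FetchLike` type are names invented here, and the fetch function is passed in so the example does not depend on a particular runtime.

```typescript
// Minimal shape of the fetch function this sketch needs.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ status: number }>;

// Build the request for looking up one model by name.
function modelCheckRequest(apiKey: string, model: string) {
  return {
    url: `https://api.openai.com/v1/models/${encodeURIComponent(model)}`,
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}

// Resolves to true when the API answers 200 for the model (assumed to mean
// the key can use it); any other status is treated as "no access".
async function hasModelAccess(
  apiKey: string,
  model: string,
  fetchFn: FetchLike,
): Promise<boolean> {
  const { url, headers } = modelCheckRequest(apiKey, model);
  const response = await fetchFn(url, { headers });
  return response.status === 200;
}
```

On Node 18 or later, where a global `fetch` is available, a call could look like `hasModelAccess(apiKey, 'gpt-4', fetch)`, with `apiKey` read from your own environment.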
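Because data is exchanged with OpenAI and Pinecone, think carefully about data privacy and security before uploading any documents.
Put one or more PDF documents in the docs directory.
Run the ingest command: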
npm run ingest
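The source does not describe what the ingest step does internally. Assuming it follows the usual pattern of splitting the PDF text into chunks before embedding them, the splitting part can be sketched as below. `splitWithOverlap` is a hypothetical helper, not the splitter the repo uses, and it works on characters only.

```typescript
// Split text into fixed-size chunks where each chunk repeats the last
// `overlap` characters of the previous one, so context is not cut abruptly.
function splitWithOverlap(
  text: string,
  chunkSize: number,
  overlap: number,
): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error('chunkSize must be positive and overlap must be smaller');
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) {
      break; // the last chunk already reaches the end of the text
    }
  }
  return chunks;
}

const chunks = splitWithOverlap('abcdefghij', 4, 1);
console.log(chunks); // three chunks: 'abcd', 'defg', 'ghij'
```

Each chunk would then be embedded and stored in the namespace configured earlier; larger chunks carry more context per match, smaller ones give more precise matches.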
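Check in Pinecone that the upload succeeded.
Once you have verified that the embeddings and content were added to your Pinecone index, run npm run dev to start the local development environment, then type a question into the chat interface to start a conversation.
Run the command: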
npm run dev
https://github.com/mayooear/gpt4-pdf-chatbot-langchain#troubleshooting
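In general, keep an eye on the issues and discussions sections of this repo for solutions.
General errors
Make sure you are running the latest Node version: run node -v.
Console.log the env variables and make sure they are exposed.
Check that you have created a .env file that contains your valid (and working) API keys, environment and index name.
If you change modelName in OpenAI, make sure you have access to the API for the appropriate model.
If an API key is also set in your global environment, the local env file from the project will be overwritten by the system's env variable.
Try hard-coding your API keys into the process.env variables if there are still issues.
Pinecone errors
Make sure the environment and index in your Pinecone dashboard match the ones in the pinecone.ts and .env files.
Check that the index's vector dimension is set to 1536.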
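Assuming the 1536 in the Pinecone checks refers to the index's vector dimension (the size of OpenAI's text-embedding-ada-002 embeddings), a mismatch can be caught before anything is sent to Pinecone. This is a sketch: `EXPECTED_DIMENSION` and `assertDimension` are hypothetical names, not part of the repo.

```typescript
// Fail early if an embedding does not have the dimension the index expects.
const EXPECTED_DIMENSION = 1536;

function assertDimension(
  embedding: number[],
  expected: number = EXPECTED_DIMENSION,
): void {
  if (embedding.length !== expected) {
    throw new Error(
      `Embedding has ${embedding.length} dimensions, expected ${expected}`,
    );
  }
}

// A zero vector of the right size passes silently.
assertDimension(new Array<number>(EXPECTED_DIMENSION).fill(0));
```

If you switch to a different embedding model, both the constant and the Pinecone index would need to match that model's output size.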