import chromadb.utils.embedding_functions as embedding_functions
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
api_key="your api key",
api_base="https://open.bigmodel.cn/api/paas/v4",
model_name="embedding-3"
)
import chromadb
client = chromadb.Client()
collection = client.get_or_create_collection(name="my_collection", embedding_function=openai_ef)
collection.count()
0
collection.add(
documents=["lorem ipsum...", "doc2", "doc3"],
metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}],
ids=["id1", "id2", "id3"]
)
collection.get(
ids=["id1", "id2", "id3"]
)
{'ids': ['id1', 'id2', 'id3'],
'embeddings': None,
'documents': ['lorem ipsum...', 'doc2', 'doc3'],
'uris': None,
'data': None,
'metadatas': [{'chapter': '3', 'verse': '16'},
{'chapter': '3', 'verse': '5'},
{'chapter': '29', 'verse': '11'}],
'included': ['documents', 'metadatas']}
collection.update(
ids=["id1", "id2", "id3"],
metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}],
documents=["doc1", "doc2", "doc3"]
)
collection.get(
ids=["id1", "id2", "id3"]
)
{'ids': ['id1', 'id2', 'id3'],
'embeddings': None,
'documents': ['doc1', 'doc2', 'doc3'],
'uris': None,
'data': None,
'metadatas': [{'chapter': '3', 'verse': '16'},
{'chapter': '3', 'verse': '5'},
{'chapter': '29', 'verse': '11'}],
'included': ['documents', 'metadatas']}
collection.delete(
ids=["id1"]
)
collection.get(
ids=["id1"]
)
{'ids': [],
'embeddings': None,
'documents': [],
'uris': None,
'data': None,
'metadatas': [],
'included': ['documents', 'metadatas']}
import chromadb.utils.embedding_functions as embedding_functions
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
api_key="your api key",
api_base="https://open.bigmodel.cn/api/paas/v4",
model_name="embedding-3"
)
Explanation:
embedding_functions.OpenAIEmbeddingFunction creates the text embedding function for chatGLM. API requests are sent to https://open.bigmodel.cn/api/paas/v4, and embedding-3 is used as the embedding model. Replace api_key with your actual API key.
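Before wiring the function into a collection, it can help to call it directly on a sample string to confirm the key and endpoint work. A minimal sketch, assuming the embedding function accepts a plain list of strings (the exact call signature can vary between Chroma versions):
# Sanity check: embed one string and inspect the result
vectors = openai_ef(["hello world"])       # one embedding per input string
print(len(vectors), len(vectors[0]))       # number of vectors, vector dimension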
import chromadb
client = chromadb.Client()
collection = client.get_or_create_collection(name="my_collection", embedding_function=openai_ef)
Explanation:
chromadb.Client() creates an in-memory ChromaDB client. client.get_or_create_collection() gets or creates a collection named "my_collection" and attaches openai_ef as its embedding function.
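chromadb.Client() keeps everything in memory, so the data is gone when the process exits. If the collection should survive restarts, a persistent client can be used instead; a minimal sketch, assuming a local directory ./chroma_db (hypothetical path):
import chromadb
# Store the collection on disk under ./chroma_db instead of in memory
persistent_client = chromadb.PersistentClient(path="./chroma_db")
persistent_collection = persistent_client.get_or_create_collection(name="my_collection", embedding_function=openai_ef)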
collection.count()
Output:
0
Explanation:
collection.count() returns the number of documents in the collection. Nothing has been added yet, so it returns 0.
collection.add(
documents=["lorem ipsum...", "doc2", "doc3"],
metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}],
ids=["id1", "id2", "id3"]
)
Explanation:
documents: the document contents to store, such as "lorem ipsum...".
metadatas: metadata attached to each document, for example "chapter" and "verse".
ids: a unique id for each document (id1, id2, id3).
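Because each document carries metadata, records can also be fetched by metadata filter rather than by id. A minimal sketch using a where clause (the filter value is simply taken from this example's metadata):
# Returns every document whose "chapter" metadata equals "3" (id1 and id2 here)
collection.get(where={"chapter": "3"})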
collection.get(
ids=["id1", "id2", "id3"]
)
Output:
{
'ids': ['id1', 'id2', 'id3'],
'embeddings': None,
'documents': ['lorem ipsum...', 'doc2', 'doc3'],
'uris': None,
'data': None,
'metadatas': [{'chapter': '3', 'verse': '16'},
{'chapter': '3', 'verse': '5'},
{'chapter': '29', 'verse': '11'}],
'included': ['documents', 'metadatas']
}
Explanation:
documents: the stored document contents.
metadatas: the corresponding metadata.
ids: the requested ids.
embeddings is None not because no embeddings were computed, but because get() excludes embeddings by default; only 'documents' and 'metadatas' appear in the 'included' field.
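To get the vectors back, request them explicitly via the include parameter. A minimal sketch:
# Ask get() to also return the stored embedding vectors
result = collection.get(ids=["id1"], include=["embeddings", "documents", "metadatas"])
print(len(result["embeddings"][0]))  # dimension of the embedding-3 vector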
collection.update(
ids=["id1", "id2", "id3"],
metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}],
documents=["doc1", "doc2", "doc3"]
)
Explanation:
The document for id1 changes from "lorem ipsum..." to "doc1"; the id itself stays the same.
collection.get(
ids=["id1", "id2", "id3"]
)
Output:
{
'ids': ['id1', 'id2', 'id3'],
'embeddings': None,
'documents': ['doc1', 'doc2', 'doc3'],
'uris': None,
'data': None,
'metadatas': [{'chapter': '3', 'verse': '16'},
{'chapter': '3', 'verse': '5'},
{'chapter': '29', 'verse': '11'}],
'included': ['documents', 'metadatas']
}
Explanation:
The document for id1 has been successfully updated to "doc1"; the other data is unchanged.
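Note that update() only modifies records whose ids already exist. For add-or-update semantics in a single call, Chroma also provides upsert(); a minimal sketch (id4 is a hypothetical new id, not part of this walkthrough):
# Updates id1 if present and inserts id4 if it does not exist yet
collection.upsert(
    ids=["id1", "id4"],
    documents=["doc1 revised", "doc4"],
    metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "1", "verse": "1"}]
)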
collection.delete(
ids=["id1"]
)
Explanation:
Deletes the document corresponding to id1.
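delete() also accepts a metadata filter instead of explicit ids, which is useful for clearing out a whole group of records. A minimal sketch (the filter value is again taken from this example's metadata):
# Removes every document whose "chapter" metadata equals "29" (id3 here)
collection.delete(where={"chapter": "29"})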
collection.get(
ids=["id1"]
)
Output:
{
'ids': [],
'embeddings': None,
'documents': [],
'uris': None,
'data': None,
'metadatas': [],
'included': ['documents', 'metadatas']
}
Explanation:
The data for id1 has been deleted, so the result lists are empty.
This code demonstrates how to use ChromaDB for the following operations: creating a collection with a custom embedding function, adding documents with metadata, retrieving them by id, updating them, and deleting them.
In this way, you can use ChromaDB as a lightweight vector database, together with chatGLM's embedding model, to store and query information.
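The walkthrough above covers the storage side; the query side is handled by collection.query(), which embeds the query text with the same chatGLM embedding function and returns the most similar documents. A minimal sketch (the query text and n_results are illustrative):
# Embeds the query text with openai_ef and returns the 2 nearest documents
results = collection.query(
    query_texts=["lorem ipsum"],
    n_results=2
)
print(results["documents"], results["distances"])  # nearest documents and their distances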