Vector similarity search in Elasticsearch works by representing documents as dense vectors (dense_vector) and measuring how similar they are with cosine similarity. The basic principles are as follows:
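Represent documents as vectors: each document is converted into a dense vector (an embedding).
Store the vectors in Elasticsearch: the vectors are stored in a field of type dense_vector.
Represent the query as a vector: the query is converted into a vector in the same way as the documents.
Compute similarity: the cosine similarity between the query vector and each stored document vector is calculated.
Return ranked results: documents are returned sorted by their similarity score.
Script score (Script Score): a script_score query can be used to compute the similarity inside a script and use it as the document score.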
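The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Elasticsearch is implemented internally: the tiny three-dimensional vectors, the document ids, and the helper names (`cosine_similarity`, `rank_documents`, `build_script_score_query`) are made up for the example. The last helper builds a request body for a `script_score` query that uses Elasticsearch's `cosineSimilarity` script function against a field assumed to be called `embedding`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, documents):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def build_script_score_query(query_vector, field="embedding"):
    """Build a script_score search body (field name is an assumption)."""
    return {
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # Adding 1.0 shifts the cosine range [-1, 1] to [0, 2],
                    # because Elasticsearch does not accept negative scores.
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": list(query_vector)},
                },
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical toy data: three documents with 3-dimensional vectors.
    docs = {
        "doc_a": [1.0, 0.0, 0.0],
        "doc_b": [0.7, 0.7, 0.0],
        "doc_c": [0.0, 1.0, 0.0],
    }
    for doc_id, score in rank_documents([1.0, 0.1, 0.0], docs):
        print(doc_id, round(score, 3))
```

The body returned by `build_script_score_query` would be sent to the `_search` endpoint of the index, with a query vector that has the same number of dimensions as the stored vectors.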
1. Install Elasticsearch 8.x. The docker-compose.yml is as follows:
version: '2.2'
services:
  elasticsearch:
    container_name: es01
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.3
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    ports:
      - "8088:9200"
    volumes:
      - ./elasticsearch-data:/usr/share/elasticsearch/data
    mem_limit: 2g
    networks:
      - my-network
    restart: always
networks:
  my-network:
    name: my-network-1
2. After installation, check that Elasticsearch is running:
http://localhost:8088/_cat/health?v
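The same check can be scripted instead of opened in a browser. The sketch below uses only the Python standard library and assumes the host port 8088 from the docker-compose.yml; the helper names (`health_url`, `parse_cat_health`, `fetch_health`) are made up for this example. It relies on the `?v` flag, which makes the `_cat/health` response start with a header row of column names.

```python
import urllib.request


def health_url(base_url="http://localhost:8088"):
    """Build the _cat/health URL with the verbose (?v) header row enabled."""
    return base_url.rstrip("/") + "/_cat/health?v"


def parse_cat_health(text):
    """Turn the header row and first data row of the table into a dict."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header row and a data row")
    header = lines[0].split()
    row = lines[1].split()
    return dict(zip(header, row))


def fetch_health(base_url="http://localhost:8088", timeout=5):
    """Request cluster health from a running node and return it as a dict."""
    with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
        return parse_cat_health(resp.read().decode("utf-8"))
```

Calling `fetch_health()["status"]` against a running node should give `green`, `yellow`, or `red`.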
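3. Create the index mapping:
In Postman, send an HTTP PUT request to create the index together with its mapping. The steps below assume that Elasticsearch is reachable at http://localhost:9200 and create an index named your_index. (With the docker-compose.yml above, which maps host port 8088 to container port 9200, use http://localhost:8088 instead.)
Request type: PUT
URL: http://localhost:9200/your_index
Body (select raw and JSON (application/json)):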
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      },
      "embedding": {
        "type": "dense_vector",
        "dims": 768
      }
    }
  }
}
The dims value of 768 is a placeholder: replace it with the actual dimension of your embedding vectors.
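For readers who prefer a script to Postman, the same request can be built with the Python standard library. This is a minimal sketch, not the only way to do it: the helper names (`build_mapping`, `build_create_index_request`, `create_index`) are invented for the example, and the default base URL follows the text above (with the docker-compose.yml from step 1 it would be http://localhost:8088).

```python
import json
import urllib.request


def build_mapping(dims=768):
    """Mapping with a text field and a dense_vector field of `dims` dimensions."""
    return {
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "embedding": {"type": "dense_vector", "dims": dims},
            }
        }
    }


def build_create_index_request(index="your_index",
                               base_url="http://localhost:9200",
                               dims=768):
    """Prepare (but do not send) the PUT request that creates the index."""
    body = json.dumps(build_mapping(dims)).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def create_index(index="your_index", base_url="http://localhost:9200", dims=768):
    """Send the request to a running node and return the decoded JSON reply."""
    request = build_create_index_request(index, base_url, dims)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `create_index(dims=768)` sends the request; the `dims` value has to match the length of every vector that is later stored in the `embedding` field.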
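4. Insert a document:
Request type: POST
URL: http://localhost:9200/your_index/_doc/1
Body (select raw and JSON (application/json)). The "text" value is a Chinese sample sentence, roughly: "A newly built 100-mu barbecue complex in Zibo was completed in just 20 days and has drawn crowds of barbecue lovers; a grill spot is now hard to come by."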
{
"text": "淄博新建的一座占地100亩的烧烤城在短短20天内建成,吸引了众多烧烤爱好者,如今“烤位”已是一位难求。",
"embedding": [
0.24153212,0.20880528,0.030148063,-0.53177595,-0.16311283,-0.48528185,0.8071734,-0.5603691,-0.034782775,-0.010840773,0.20591497,-0.190546,0.0939277,-0.31472996,0.41703156,-0.31428546,0.32904455,-0.1818271,0.0828045,0.2891722,-0.12507804,0.44376546,-0.10610913,0.2950189,0.34206498,0.54851073,0.33173296,-0.50768775,-0.22573504,0.09621267,1.1528952,-0.13125856,0.06805208,0.75444174,0.28983256,-0.058324914,0.029754816,0.28223705,0.017140139,-0.20847563,-0.3175143,-0.6432414,0.13734575,-0.34154043,-0.7852689,-0.7646187,-0.08415885,0.27589658,0.037415426,-0.111104995,-0.7493051,0.13488679,-0.0021623205,-0.4228744,-0.5692682,0.37095323,-0.17621705,-0.029115338,0.41395468,-0.36694804,-0.21973066,-0.0684685,-0.4107971,0.17953752,-0.6013466,0.4058221,0.088796705,0.39943227,-0.0005312811,-0.011339925,-0.20651253,0.113913804,0.0025909252,0.3519917,-0.34478262,0.45721626,-0.75878835,0.13280198,-0.09654277,0.5451904,-0.5389396,0.2736914,0.07034891,0.002583282,0.075424306,0.33698198,0.7679384,0.46068242,-0.08456434,0.5998018,0.2