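How does Elasticsearch store index files on disk?
The main goal is to find out at what granularity FST files are created.
First, pick an index shard to inspect via Kibana. This walkthrough uses the logstash-2023.05.30 index as the example.
Check how its shards are distributed: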
GET /_cat/shards/logstash-2023.05.30?v
index shard prirep state docs store ip node
logstash-2023.05.30 3 p STARTED 1520736 408.1mb 10.138.40.73 10.138.40.73-node1
logstash-2023.05.30 5 p STARTED 1520888 409.9mb 10.138.40.74 10.138.40.74-node1
logstash-2023.05.30 6 p STARTED 1518331 408.2mb 10.138.40.221 10.138.40.221-node1
logstash-2023.05.30 4 p STARTED 1518186 409.3mb 10.138.204.194 10.138.204.194-node1
logstash-2023.05.30 1 p STARTED 1519231 408.8mb 10.138.40.220 10.138.40.220-node1
logstash-2023.05.30 2 p STARTED 1519970 409.9mb 10.138.204.195 10.138.204.195-node1
logstash-2023.05.30 0 p STARTED 1520024 410.6mb 10.138.204.193 10.138.204.193-node1
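Picking the node that hosts a given shard can also be done programmatically. The following is a minimal sketch that parses a few rows copied from the output above; the helper names `parse_cat_shards` and `locate_shard` are hypothetical, and the parser assumes every row has the same number of columns as the header (rows for unassigned shards would not).

```python
# Sample copied from the `_cat/shards` output above (three rows for brevity).
CAT_SHARDS = """\
index shard prirep state docs store ip node
logstash-2023.05.30 3 p STARTED 1520736 408.1mb 10.138.40.73 10.138.40.73-node1
logstash-2023.05.30 1 p STARTED 1519231 408.8mb 10.138.40.220 10.138.40.220-node1
logstash-2023.05.30 0 p STARTED 1520024 410.6mb 10.138.204.193 10.138.204.193-node1
"""


def parse_cat_shards(text):
    """Turn whitespace-separated `_cat` output into dicts keyed by the header row."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def locate_shard(text, shard, prirep="p"):
    """Return the row for the given shard number and copy type, or None."""
    for row in parse_cat_shards(text):
        if row["shard"] == str(shard) and row["prirep"] == prirep:
            return row
    return None


row = locate_shard(CAT_SHARDS, 0)
print(row["ip"], row["node"])
```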
The analysis below uses shard 0, which lives on 10.138.204.193.
To find the storage directory, first look up the index's ID (its UUID):
GET /logstash-2023.05.30/_settings
{
"logstash-2023.05.30" : {
"settings" : {
"index" : {
"codec" : "best_compression",
"routing" : {
"allocation" : {
"include" : {
"_tier_preference" : "data_content"
}
}
},
"refresh_interval" : "60s",
"number_of_shards" : "7",
"provided_name" : "logstash-2023.05.30",
"creation_date" : "1685376005206",
"number_of_replicas" : "0",
"uuid" : "FYWtFGTIS2CLB8yJhFXG9g",// this is the index ID
"version" : {
"created" : "7130499"
}
}
}
}
}
Log in to the machine and locate the directory that stores the index files:
/data3/10.138.204.193-node1/nodes/0/indices/FYWtFGTIS2CLB8yJhFXG9g
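The path is assembled from the node's data path, the node ordinal, the index UUID and the shard number. A minimal sketch of that composition follows; the function `shard_index_dir` is a hypothetical helper, and the layout is only the one observed on this 7.13.4 node, so other Elasticsearch versions may differ.

```python
from pathlib import PurePosixPath


def shard_index_dir(data_path, index_uuid, shard, node_ordinal=0):
    """Build <data_path>/nodes/<ordinal>/indices/<uuid>/<shard>/index.

    Assumption: this mirrors the directory layout seen in this walkthrough.
    """
    return (PurePosixPath(data_path) / "nodes" / str(node_ordinal)
            / "indices" / index_uuid / str(shard) / "index")


path = shard_index_dir("/data3/10.138.204.193-node1",
                       "FYWtFGTIS2CLB8yJhFXG9g", 0)
print(path)
```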
List the files under that directory:
root@prd-paas-es-01:/data3/10.138.204.193-node1/nodes/0/indices/FYWtFGTIS2CLB8yJhFXG9g# tree -C -s
.
├── [ 4096] 0
│ ├── [ 20480] index
│ │ ├── [ 158] _17f.fdm
│ │ ├── [ 25578562] _17f.fdt
│ │ ├── [ 1939] _17f.fdx
│ │ ├── [ 4636] _17f.fnm
│ │ ├── [ 7981735] _17f.kdd
│ │ ├── [ 20898] _17f.kdi
│ │ ├── [ 716] _17f.kdm
│ │ ├── [ 7945983] _17f_Lucene80_0.dvd
│ │ ├── [ 3916] _17f_Lucene80_0.dvm
│ │ ├── [ 6230127] _17f_Lucene84_0.doc
│ │ ├── [ 3875001] _17f_Lucene84_0.pos
│ │ ├── [ 7448815] _17f_Lucene84_0.tim
│ │ ├── [ 108786] _17f_Lucene84_0.tip
│ │ ├── [ 1637] _17f_Lucene84_0.tmd
│ │ ├── [ 593] _17f.si
│ │ ├── [ 158] _3uv.fdm
│ │ ├── [ 33652243] _3uv.fdt
│ │ ├── [ 2555] _3uv.fdx
│ │ ├── [ 4636] _3uv.fnm
│ │ ├── [ 10520395] _3uv.kdd
│ │ ├── [ 27689] _3uv.kdi
│ │ ├── [ 716] _3uv.kdm
│ │ ├── [ 10573208] _3uv_Lucene80_0.dvd
│ │ ├── [ 3916] _3uv_Lucene80_0.dvm
│ │ ├── [ 8298061] _3uv_Lucene84_0.doc
│ │ ├── [ 5154427] _3uv_Lucene84_0.pos
│ │ ├── [ 9716222] _3uv_Lucene84_0.tim
│ │ ├── [ 142063] _3uv_Lucene84_0.tip
│ │ ├── [ 1620] _3uv_Lucene84_0.tmd
│ │ ├── [ 593] _3uv.si
│ │ ├── [ 158] _5bg.fdm
│ │ ├── [ 16433011] _5bg.fdt
│ │ ├── [ 1259] _5bg.fdx
│ │ ├── [ 4636] _5bg.fnm
│ │ ├── [ 5158094] _5bg.kdd
│ │ ├── [ 13396] _5bg.kdi
│ │ ├── [ 716] _5bg.kdm
│ │ ├── [ 5140762] _5bg_Lucene80_0.dvd
│ │ ├── [ 3916] _5bg_Lucene80_0.dvm
│ │ ├── [ 4005897] _5bg_Lucene84_0.doc
│ │ ├── [ 2583880] _5bg_Lucene84_0.pos
│ │ ├── [ 4873082] _5bg_Lucene84_0.tim
│ │ ├── [ 70979] _5bg_Lucene84_0.tip
│ │ ├── [ 1593] _5bg_Lucene84_0.tmd
│ │ ├── [ 593] _5bg.si
│ │ ├── [ 158] _60h.fdm
│ │ ├── [ 24664753] _60h.fdt
│ │ ├── [ 1886] _60h.fdx
│ │ ├── [ 4636] _60h.fnm
│ │ ├── [ 7640438] _60h.kdd
│ │ ├── [ 19996] _60h.kdi
│ │ ├── [ 716] _60h.kdm
│ │ ├── [ 7754954] _60h_Lucene80_0.dvd
│ │ ├── [ 3916] _60h_Lucene80_0.dvm
│ │ ├── [ 6147241] _60h_Lucene84_0.doc
│ │ ├── [ 3998559] _60h_Lucene84_0.pos
│ │ ├── [ 7254035] _60h_Lucene84_0.tim
│ │ ├── [ 105673] _60h_Lucene84_0.tip
│ │ ├── [ 1719] _60h_Lucene84_0.tmd
│ │ ├── [ 593] _60h.si
│ │ ├── [ 200] _7jq.fdm
│ │ ├── [ 63208093] _7jq.fdt
│ │ ├── [ 4692] _7jq.fdx
│ │ ├── [ 4636] _7jq.fnm
│ │ ├── [ 19306117] _7jq.kdd
│ │ ├── [ 51562] _7jq.kdi
│ │ ├── [ 716] _7jq.kdm
│ │ ├── [ 20228561] _7jq_Lucene80_0.dvd
│ │ ├── [ 3916] _7jq_Lucene80_0.dvm
│ │ ├── [ 15606568] _7jq_Lucene84_0.doc
│ │ ├── [ 9581341] _7jq_Lucene84_0.pos
│ │ ├── [ 17383473] _7jq_Lucene84_0.tim
│ │ ├── [ 272615] _7jq_Lucene84_0.tip
│ │ ├── [ 1592] _7jq_Lucene84_0.tmd
│ │ ├── [ 593] _7jq.si
│ │ ├── [ 437] _82w.cfe
│ │ ├── [ 4489379] _82w.cfs
│ │ ├── [ 408] _82w.si
│ │ ├── [ 437] _87w.cfe
│ │ ├── [ 4932636] _87w.cfs
│ │ ├── [ 408] _87w.si
│ │ ├── [ 437] _8ao.cfe
│ │ ├── [ 13905317] _8ao.cfs
│ │ ├── [ 408] _8ao.si
│ │ ├── [ 437] _8ls.cfe
│ │ ├── [ 20181047] _8ls.cfs
│ │ ├── [ 408] _8ls.si
│ │ ├── [ 437] _8nq.cfe
│ │ ├── [ 1234712] _8nq.cfs
│ │ ├── [ 408] _8nq.si
│ │ ├── [ 437] _8oa.cfe
│ │ ├── [ 872798] _8oa.cfs
│ │ ├── [ 408] _8oa.si
│ │ ├── [ 437] _8pp.cfe
│ │ ├── [ 1593677] _8pp.cfs
│ │ ├── [ 408] _8pp.si
│ │ ├── [ 437] _8r5.cfe
│ │ ├── [ 914008] _8r5.cfs
│ │ ├── [ 408] _8r5.si
│ │ ├── [ 437] _8rf.cfe
│ │ ├── [ 940473] _8rf.cfs
│ │ ├── [ 408] _8rf.si
│ │ ├── [ 437] _8rz.cfe
│ │ ├── [ 1315312] _8rz.cfs
│ │ ├── [ 408] _8rz.si
│ │ ├── [ 437] _8s9.cfe
│ │ ├── [ 1121692] _8s9.cfs
│ │ ├── [ 408] _8s9.si
│ │ ├── [ 437] _8sk.cfe
│ │ ├── [ 243476] _8sk.cfs
│ │ ├── [ 408] _8sk.si
│ │ ├── [ 1678] segments_6
│ │ └── [ 0] write.lock
│ ├── [ 4096] _state
│ │ ├── [ 186] retention-leases-2865.st
│ │ └── [ 125] state-0.st
│ └── [ 4096] translog
│ ├── [ 55] translog-29.tlog
│ └── [ 88] translog.ckp
└── [ 4096] _state
└── [ 1230] state-2.st
5 directories, 118 files
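To read a listing like this, it helps to group the files by their segment prefix. The sketch below does that for a handful of names copied from the listing; `segment_of`, `group_by_segment` and `is_compound` are hypothetical helpers, and the sketch assumes that per-segment files start with an underscore followed by the segment name.

```python
from collections import defaultdict

# A few file names copied from the listing above.
FILES = [
    "_17f.fdt", "_17f_Lucene84_0.tim", "_17f_Lucene84_0.tip", "_17f.si",
    "_82w.cfe", "_82w.cfs", "_82w.si",
    "segments_6", "write.lock",
]


def segment_of(filename):
    """Return the segment name of a per-segment file, or None for shard-level files."""
    if not filename.startswith("_"):
        return None  # e.g. segments_N, write.lock
    stem = filename.split(".", 1)[0]        # "_17f_Lucene84_0" or "_17f"
    return "_" + stem[1:].split("_", 1)[0]  # keep only the leading "_17f"


def group_by_segment(filenames):
    """Map each segment name to the list of files that belong to it."""
    groups = defaultdict(list)
    for name in filenames:
        seg = segment_of(name)
        if seg is not None:
            groups[seg].append(name)
    return dict(groups)


def is_compound(files):
    """A compound segment keeps its data in a single .cfs file."""
    return any(f.endswith(".cfs") for f in files)


for seg, files in group_by_segment(FILES).items():
    kind = "compound" if is_compound(files) else "non-compound"
    print(seg, kind, len(files))
```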
With the file listing in hand, look at the segment information next:
GET /logstash-2023.05.30/_segments
// For readability, only the segments of shard 0 are shown
{
"_shards": {
"total": 7,
"successful": 7,
"failed": 0
},
"indices": {
"logstash-2023.05.30": {
"shards": {
"0": [
{
"routing": {
"state": "STARTED",
"primary": true,
"node": "4hEWcF8hRFWTEkQxlKQmqg"
},
"num_committed_segments": 17,
"num_search_segments": 17,
"segments": {
"_17f": {
"generation": 1563,
"num_docs": 210331,
"deleted_docs": 0,
"size_in_bytes": 59203502,
"memory_in_bytes": 5140,
"committed": true,
"search": true,
"version": "8.8.2",
"compound": false,
"attributes": {
"Lucene87StoredFieldsFormat.mode": "BEST_COMPRESSION"
}
},
"_3uv": {
"generation": 4999,
"num_docs": 278411,
"deleted_docs": 0,
"size_in_bytes": 78098502,
"memory_in_bytes": 5140,
"committed": true,
"search": true,
"version": "8.8.2",
"compound": false,
"attributes": {
"Lucene87StoredFieldsFormat.mode": "BEST_COMPRESSION"
}
},
"_5bg": {
"generation": 6892,
"num_docs": 132645,
"deleted_docs": 0,
"size_in_bytes": 38291972,
"memory_in_bytes": 5140,
"committed": true,
"search": true,
"version": "8.8.2",
"compound": false,
"attributes": {
"Lucene87StoredFieldsFormat.mode": "BEST_COMPRESSION"
}
},
"_60h": {
"generation": 7793,
"num_docs": 199809,
"deleted_docs": 0,
"size_in_bytes": 57599273,
"memory_in_bytes": 5140,
"committed": true,
"search": true,
"version": "8.8.2",
"compound": false,
"attributes": {
"Lucene87StoredFieldsFormat.mode": "BEST_COMPRESSION"
}
},
"_7jq": {
"generation": 9782,
"num_docs": 520420,
"deleted_docs": 0,
"size_in_bytes": 145654675,
"memory_in_bytes": 5204,
"committed": true,
"search": true,
"version": "8.8.2",
"compound": false,
"attributes": {
"Lucene87StoredFieldsFormat.mode": "BEST_COMPRESSION"
}
},
"_82w": {
"generation": 10472,
"num_docs": 15416,
"deleted_docs": 0,
"size_in_bytes": 4490224,
"memory_in_bytes": 5140,
"committed": true,
"search": true,
"version": "8.8.2",
"compound": true,
"attributes": {
"Lucene87StoredFieldsFormat.mode": "BEST_COMPRESSION"
}
},
"_87w": {
"generation": 10652,
"num_docs": 16837,
"deleted_docs": 0,
"size_in_bytes": 4933481,
"memory_in_bytes": 5140,
"committed": true,
"search": true,
"version": "8.8.2",
"compound": true,
"attributes": {
"Lucene87StoredFieldsFormat.mode": "BEST_COMPRESSION"
}
},
"_8ao": {
"generation": 10752,
"num_docs": 48855,
"deleted_docs": 0,
"size_in_bytes": 13906162,
"memory_in_bytes": 5140,
"committed": true,
"search": true,
"version": "8.8.2",
"compound": true,
"attributes": {
"Lucene87StoredFieldsFormat.mode": "BEST_COMPRESSION"
}
},
"_8ls": {
"generation": 11152,
"num_docs": 70903,
"deleted_docs": 0,
"size_in_bytes": 20181892,
"memory_in_bytes": 5140,
"committed": true,
"search": true,
"version": "8.8.2",
"compound": true,
"attributes": {
"Lucene87StoredFieldsFormat.mode": "BEST_COMPRESSION"
}
},
"_8nq": {
"generation": 11222,
"num_docs": 3954,
"deleted_docs": 0,
"size_in_bytes": 1235557,
"memory_in_bytes": 6924,
"committed": true,
"search": true,
"version": "8.8.2",
"compound": true,
"attributes": {
"Lucene87StoredFieldsFormat.mode": "BEST_COMPRESSION"
}
},
"_8oa": {
"generation": 11242,
"num_docs": 2785,
"deleted_docs": 0,
"size_in_bytes": 873643,
"memory_in_bytes": 6820,
"committed": true,
"search": true,
"version": "8.8.2",
"compound": true,
"attributes": {
"Lucene87StoredFieldsFormat.mode": "BEST_COMPRESSION"
}
},
"_8pp": {
"generation": 11293,
"num_docs": 5194,
"deleted_docs": 0,
"size_in_bytes": 1594522,
"memory_in_bytes": 7060,
"committed": true,
"search": true,
"version": "8.8.2",
"compound": true,
"attributes": {
"Lucene87StoredFieldsFormat.mode": "BEST_COMPRESSION"
}
},
"_8r5": {
"generation": 11345,
"num_docs": 2936,
"deleted_docs": 0,
"size_in_bytes": 914853,
"memory_in_bytes": 6748,
"committed": true,
"search": true,
"version": "8.8.2",
"compound": true,
"attributes": {
"Lucene87StoredFieldsFormat.mode": "BEST_COMPRESSION"
}
},
"_8rf": {
"generation": 11355,
"num_docs": 2920,
"deleted_docs": 0,
"size_in_bytes": 941318,
"memory_in_bytes": 6836,
"committed": true,
"search": true,
"version": "8.8.2",
"compound": true,
"attributes": {
"Lucene87StoredFieldsFormat.mode": "BEST_COMPRESSION"
}
},
"_8rz": {
"generation": 11375,
"num_docs": 4304,
"deleted_docs": 0,
"size_in_bytes": 1316157,
"memory_in_bytes": 6820,
"committed": true,
"search": true,
"version": "8.8.2",
"compound": true,
"attributes": {
"Lucene87StoredFieldsFormat.mode": "BEST_COMPRESSION"
}
},
"_8s9": {
"generation": 11385,
"num_docs": 3647,
"deleted_docs": 0,
"size_in_bytes": 1122537,
"memory_in_bytes": 6892,
"committed": true,
"search": true,
"version": "8.8.2",
"compound": true,
"attributes": {
"Lucene87StoredFieldsFormat.mode": "BEST_COMPRESSION"
}
},
"_8sk": {
"generation": 11396,
"num_docs": 657,
"deleted_docs": 0,
"size_in_bytes": 244321,
"memory_in_bytes": 7620,
"committed": true,
"search": true,
"version": "8.8.2",
"compound": true,
"attributes": {
"Lucene87StoredFieldsFormat.mode": "BEST_COMPRESSION"
}
}
}
}
]
}
}
}
}
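Comparing the segment list with the files in the shard directory shows a one-to-one correspondence: each of the 17 segments appears as a file-name prefix under index/. The 5 segments with "compound": false are stored as separate per-format files, while the 12 segments with "compound": true are each packed into a .cfs/.cfe pair plus an .si file.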
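The correspondence can be checked with the numbers already shown. The sketch below uses file sizes copied from the tree listing and values copied from the _segments response for one non-compound and one compound segment. It checks two things: that the file sizes add up to size_in_bytes, and that the segment name is the generation written in base 36 (Lucene's segment naming scheme). The helper `generation_from_name` is a hypothetical name used for illustration.

```python
# File sizes in bytes, copied from the tree listing.
FILE_SIZES = {
    "_17f": [158, 25578562, 1939, 4636, 7981735, 20898, 716, 7945983,
             3916, 6230127, 3875001, 7448815, 108786, 1637, 593],
    "_82w": [437, 4489379, 408],
}

# Values copied from the _segments response.
SEGMENTS = {
    "_17f": {"generation": 1563, "size_in_bytes": 59203502},
    "_82w": {"generation": 10472, "size_in_bytes": 4490224},
}


def generation_from_name(segment_name):
    """Decode a segment name such as "_17f" as a base-36 number."""
    return int(segment_name.lstrip("_"), 36)


for name, info in SEGMENTS.items():
    size_matches = sum(FILE_SIZES[name]) == info["size_in_bytes"]
    gen_matches = generation_from_name(name) == info["generation"]
    print(name, "size matches:", size_matches, "generation matches:", gen_matches)
```

For both sample segments the two checks hold, which supports reading each file-name prefix as one segment.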
Check the Elasticsearch version and the corresponding Lucene version:
GET /
{
"name" : "10.138.204.193-node1",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "XWDyVuo6TgK4yUp2XWD3lw",
"version" : {
"number" : "7.13.4",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "c5f60e894ca0c61cdbae4f5a686d9f08bcefc942",
"build_date" : "2021-07-14T18:33:36.673943207Z",
"build_snapshot" : false,
"lucene_version" : "8.8.2",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
So what do the various file extensions in the shard directory mean? Let's take a look.
Source of the screenshot:
https://lucene.apache.org/core/8_8_2/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description
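The table shows that the FST-related file extensions are .tip (term index) and .tim (term dictionary). Because these files exist once per segment, FST files are created at segment granularity. For compound segments these files are packed inside the .cfs file, which is why only the five non-compound segments show .tip and .tim files on disk.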
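As a rough stand-in for the table, the sketch below maps some of the extensions seen in the listing to short descriptions and picks out the term dictionary and term index files. The descriptions are paraphrased from the Lucene file-format documentation linked above, the mapping is deliberately partial, and the names `EXTENSIONS`, `TERM_FILES`, `describe` and `term_files` are hypothetical.

```python
# Partial mapping of file extensions to their role (paraphrased, not exhaustive).
EXTENSIONS = {
    "si": "Segment Info",
    "cfs": "Compound File",
    "cfe": "Compound File",
    "fnm": "Fields",
    "fdx": "Field Index",
    "fdt": "Field Data",
    "tim": "Term Dictionary",
    "tip": "Term Index",
    "doc": "Frequencies",
    "pos": "Positions",
    "dvd": "Per-Document Values",
    "dvm": "Per-Document Values",
}

# The extensions this walkthrough treats as FST-related.
TERM_FILES = {"tim", "tip"}


def describe(filename):
    """Return a short description for a file based on its extension."""
    ext = filename.rsplit(".", 1)[-1]
    return EXTENSIONS.get(ext, "not covered by this sketch")


def term_files(filenames):
    """Keep only the term dictionary / term index files."""
    return [f for f in filenames if f.rsplit(".", 1)[-1] in TERM_FILES]


sample = ["_17f.fdt", "_17f_Lucene84_0.tim", "_17f_Lucene84_0.tip", "_17f.si"]
for name in sample:
    print(name, "->", describe(name))
print(term_files(sample))
```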