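How do you set up custom analyzers in ES so that each analyzer uses its own dictionary?
1. First, configure the dictionaries in ansj.cfg.yml.
Then add the dictionary paths to the ansj-library.properties file. Keep ansj-library.properties and the library files in the same path.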
curl -XPUT 'http://localhost:9200/fencitest3?pretty' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_xm_analyzer": {
          "type": "custom",
          "tokenizer": "xm_dic"
        }
      },
      "tokenizer": {
        "xm_dic": {
          "type": "dic_ansj",
          "dic": "dicxm",
          "stop": "stop",
          "ambiguity": "ambiguity",
          "synonyms": "synonyms",
          "isNameRecognition": "true",
          "isNumRecognition": true,
          "isQuantifierRecognition": true,
          "isRealName": false
        }
      }
    }
  },
  "mappings": {
    "fencitest3": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "my_xm_analyzer"
        }
      }
    }
  }
}'
curl -XGET 'http://localhost:9200/fencitest3/_analyze?pretty&analyzer=my_xm_analyzer' -d '网五河是一个名字'
To define several custom analyzers in a single ES instance, give each analyzer its own tokenizer entry, as follows:
curl -XPUT 'http://localhost:9200/fencitest3?pretty' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_xm_analyzer": {
          "type": "custom",
          "tokenizer": "xm_dic"
        },
        "my_bm_analyzer": {
          "type": "custom",
          "tokenizer": "bm_dic"
        }
      },
      "tokenizer": {
        "xm_dic": {
          "type": "dic_ansj",
          "dic": "dicxm",
          "stop": "stop",
          "ambiguity": "ambiguity",
          "synonyms": "synonyms",
          "isNameRecognition": "true",
          "isNumRecognition": true,
          "isQuantifierRecognition": true,
          "isRealName": false
        },
        "bm_dic": {
          "type": "dic_ansj",
          "dic": "dicbm",
          "stop": "stop",
          "ambiguity": "ambiguity",
          "synonyms": "synonyms",
          "isNameRecognition": "true",
          "isNumRecognition": true,
          "isQuantifierRecognition": true,
          "isRealName": false
        }
      }
    }
  },
  "mappings": {
    "fencitest3": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "my_xm_analyzer"
        },
        "name": {
          "type": "string",
          "analyzer": "my_bm_analyzer"
        }
      }
    }
  }
}'
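To check that each analyzer really picks up its own dictionary, one option is to run the same text through both and compare the tokens. The sketch below reuses the _analyze request form from the single-analyzer test. The helper names analyze_url and compare_analyzers and the ES_URL variable are made up for illustration, and the query-string form of _analyze is assumed to match the ES version used in this post.

```shell
#!/bin/sh
# Base URL of the node; the post uses localhost:9200.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the _analyze URL for an index ($1) and an analyzer ($2),
# in the same query-string form as the single-analyzer test.
analyze_url() {
  printf '%s/%s/_analyze?pretty&analyzer=%s' "$ES_URL" "$1" "$2"
}

# Run one piece of text ($2) through both analyzers defined on index $1.
compare_analyzers() {
  index="$1"
  text="$2"
  for analyzer in my_xm_analyzer my_bm_analyzer; do
    echo "== $analyzer =="
    curl -s -XGET "$(analyze_url "$index" "$analyzer")" -d "$text"
    echo
  done
}

# Example call (needs a running node with the index already created):
# compare_analyzers fencitest3 '网五河是一个名字'
```

If a word exists only in the dictionary behind dicxm, it should come back as a single token from my_xm_analyzer and be split differently by my_bm_analyzer.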
curl -XPUT 'http://localhost:9200/fencitest4?pretty' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_xm_analyzer": {
          "type": "custom",
          "tokenizer": "xm_dic"
        },
        "my_bm_analyzer": {
          "type": "custom",
          "tokenizer": "bm_dic"
        }
      },
      "tokenizer": {
        "xm_dic": {
          "type": "dic_ansj",
          "dic": "dicxm",
          "stop": "stop",
          "ambiguity": "ambiguity",
          "synonyms": "synonyms",
          "isNameRecognition": "true",
          "isNumRecognition": true,
          "isQuantifierRecognition": true,
          "isRealName": false
        },
        "bm_dic": {
          "type": "dic_ansj",
          "dic": "dicbm",
          "stop": "stop",
          "ambiguity": "ambiguity",
          "synonyms": "synonyms",
          "isNameRecognition": "true",
          "isNumRecognition": true,
          "isQuantifierRecognition": true,
          "isRealName": false
        }
      }
    }
  },
  "mappings": {
    "fencitest4": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "my_xm_analyzer"
        },
        "name": {
          "type": "string",
          "analyzer": "my_bm_analyzer"
        }
      }
    }
  }
}'
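The requests for fencitest3 and fencitest4 differ only in the index name and the mapping type, so the body can be generated instead of copied. The following is a minimal sketch of that idea, not part of the ansj plugin. The function names tokenizer_json, index_body and create_index and the ES_URL variable are hypothetical, and the dictionary keys dicxm and dicbm are assumed to be configured already as described at the top.

```shell
#!/bin/sh
ES_URL="${ES_URL:-http://localhost:9200}"

# Print one tokenizer definition.
# $1 = tokenizer name, $2 = dictionary key (for example dicxm).
tokenizer_json() {
  cat <<EOF
        "$1": {
          "type": "dic_ansj",
          "dic": "$2",
          "stop": "stop",
          "ambiguity": "ambiguity",
          "synonyms": "synonyms",
          "isNameRecognition": "true",
          "isNumRecognition": true,
          "isQuantifierRecognition": true,
          "isRealName": false
        }
EOF
}

# Print the full index body; $1 is used as the mapping type name.
index_body() {
  cat <<EOF
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_xm_analyzer": { "type": "custom", "tokenizer": "xm_dic" },
        "my_bm_analyzer": { "type": "custom", "tokenizer": "bm_dic" }
      },
      "tokenizer": {
$(tokenizer_json xm_dic dicxm),
$(tokenizer_json bm_dic dicbm)
      }
    }
  },
  "mappings": {
    "$1": {
      "properties": {
        "title": { "type": "string", "analyzer": "my_xm_analyzer" },
        "name": { "type": "string", "analyzer": "my_bm_analyzer" }
      }
    }
  }
}
EOF
}

# Create an index whose name and mapping type are both $1.
create_index() {
  curl -s -XPUT "$ES_URL/$1?pretty" -d "$(index_body "$1")"
}

# Example call (needs a running node with the ansj plugin installed):
# create_index fencitest4
```

Adding a third dictionary would then only need one more tokenizer_json line and one more analyzer entry, rather than another copy of the whole request.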