ES: setting up multiple custom analyzers, each using a different dictionary

How do you set up custom analyzers in ES so that each one uses its own dictionary?
1. First, configure ansj.cfg.yml:

[Figure 1]

Then add the dictionary paths to ansj-library.properties. Keep ansj-library.properties and the library files in the same directory.

[Figure 2]

[Figure 3]
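The original shows this step only as screenshots, so here is a hedged sketch under assumptions rather than the author's exact files. It creates a library directory holding two separate dictionaries and an ansj-library.properties that maps the library keys referenced later in the index settings (dicxm, dicbm, stop, ambiguity, synonyms) to those files. The CONF_DIR variable, every file name, the sample words, and the word/nature/frequency line format are assumptions, not taken from the original.

```shell
#!/usr/bin/env bash
# Sketch only: creates a dictionary directory and ansj-library.properties.
# CONF_DIR, all file names and the sample words are hypothetical;
# point CONF_DIR at the directory your ansj plugin actually reads.
set -eu
CONF_DIR="${CONF_DIR:-./ansj-conf}"
mkdir -p "$CONF_DIR/library"

# Dictionary for the xm_dic tokenizer (assumed format: word<TAB>nature<TAB>frequency)
printf '网五河\tnr\t1000\n' > "$CONF_DIR/library/xm.dic"
# Dictionary for the bm_dic tokenizer (sample entry is illustrative)
printf '自定义分词器\tn\t1000\n' > "$CONF_DIR/library/bm.dic"
# Empty placeholders for the other libraries the tokenizers reference
touch "$CONF_DIR/library/stop.dic" "$CONF_DIR/library/ambiguity.dic" "$CONF_DIR/library/synonyms.dic"

# Map each library key used in the index settings to a file in the library directory
cat > "$CONF_DIR/ansj-library.properties" <<'EOF'
dicxm=library/xm.dic
dicbm=library/bm.dic
stop=library/stop.dic
ambiguity=library/ambiguity.dic
synonyms=library/synonyms.dic
EOF
```

The point of the layout is that "dicxm" and "dicbm" are distinct keys pointing at distinct files; that separation is what lets each tokenizer below load a different dictionary.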

curl -XPUT 'http://localhost:9200/fencitest3?pretty' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_xm_analyzer": {
          "type": "custom",
          "tokenizer": "xm_dic"
        }
      },
      "tokenizer": {
        "xm_dic": {
          "type": "dic_ansj",
          "dic": "dicxm",
          "stop": "stop",
          "ambiguity": "ambiguity",
          "synonyms": "synonyms",
          "isNameRecognition": "true",
          "isNumRecognition": true,
          "isQuantifierRecognition": true,
          "isRealName": false
        }
      }
    }
  },
  "mappings": {
    "fencitest3": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "my_xm_analyzer"
        }
      }
    }
  }
}'

curl -XGET 'http://localhost:9200/fencitest3/_analyze?pretty&analyzer=my_xm_analyzer' -d '网五河是一个名字'
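To try each analyzer without retyping the request, the test call can be wrapped in a small shell function. This sketch sends the analyzer name and text as a JSON body, the form of _analyze that newer Elasticsearch versions expect (they also require the Content-Type header). The function name es_analyze, the ES_URL variable and the DRY_RUN switch are illustrative additions, not part of the original.

```shell
#!/usr/bin/env bash
# Sketch: run _analyze against one analyzer of one index.
# ES_URL, es_analyze and DRY_RUN are illustrative names, not part of the original setup.
ES_URL="${ES_URL:-http://localhost:9200}"

es_analyze() {
  local index="$1" analyzer="$2" text="$3"
  local body
  # Assumes $text contains no double quotes or backslashes (no JSON escaping is done)
  body=$(printf '{"analyzer": "%s", "text": "%s"}' "$analyzer" "$text")
  if [ -n "${DRY_RUN:-}" ]; then
    # Print the request instead of sending it, to inspect the body
    printf 'GET %s/%s/_analyze\n%s\n' "$ES_URL" "$index" "$body"
    return 0
  fi
  curl -s -XGET "$ES_URL/$index/_analyze?pretty" \
    -H 'Content-Type: application/json' \
    -d "$body"
}
```

For example, `es_analyze fencitest3 my_xm_analyzer '网五河是一个名字'` sends the same test sentence as above; prefixing the call with `DRY_RUN=1` only prints the request.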

[Figure 4]

To define multiple custom analyzers in a single ES index, do the following:

curl -XPUT 'http://localhost:9200/fencitest3?pretty' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_xm_analyzer": {
          "type": "custom",
          "tokenizer": "xm_dic"
        },
        "my_bm_analyzer": {
          "type": "custom",
          "tokenizer": "bm_dic"
        }
      },
      "tokenizer": {
        "xm_dic": {
          "type": "dic_ansj",
          "dic": "dicxm",
          "stop": "stop",
          "ambiguity": "ambiguity",
          "synonyms": "synonyms",
          "isNameRecognition": "true",
          "isNumRecognition": true,
          "isQuantifierRecognition": true,
          "isRealName": false
        },
        "bm_dic": {
          "type": "dic_ansj",
          "dic": "dicbm",
          "stop": "stop",
          "ambiguity": "ambiguity",
          "synonyms": "synonyms",
          "isNameRecognition": "true",
          "isNumRecognition": true,
          "isQuantifierRecognition": true,
          "isRealName": false
        }
      }
    }
  },
  "mappings": {
    "fencitest3": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "my_xm_analyzer"
        },
        "name": {
          "type": "string",
          "analyzer": "my_bm_analyzer"
        }
      }
    }
  }
}'
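
Because fencitest3 was already created in the previous step, sending another create request for it fails since the index already exists. Either delete it first (curl -XDELETE 'http://localhost:9200/fencitest3') or create a new index with the same settings, for example fencitest4:
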
curl -XPUT 'http://localhost:9200/fencitest4?pretty' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_xm_analyzer": {
          "type": "custom",
          "tokenizer": "xm_dic"
        },
        "my_bm_analyzer": {
          "type": "custom",
          "tokenizer": "bm_dic"
        }
      },
      "tokenizer": {
        "xm_dic": {
          "type": "dic_ansj",
          "dic": "dicxm",
          "stop": "stop",
          "ambiguity": "ambiguity",
          "synonyms": "synonyms",
          "isNameRecognition": "true",
          "isNumRecognition": true,
          "isQuantifierRecognition": true,
          "isRealName": false
        },
        "bm_dic": {
          "type": "dic_ansj",
          "dic": "dicbm",
          "stop": "stop",
          "ambiguity": "ambiguity",
          "synonyms": "synonyms",
          "isNameRecognition": "true",
          "isNumRecognition": true,
          "isQuantifierRecognition": true,
          "isRealName": false
        }
      }
    }
  },
  "mappings": {
    "fencitest4": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "my_xm_analyzer"
        },
        "name": {
          "type": "string",
          "analyzer": "my_bm_analyzer"
        }
      }
    }
  }
}'
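The analyzer/tokenizer pairs follow a fixed pattern: analyzer my_X_analyzer uses its own tokenizer X_dic, whose "dic" points at the library key dicX. When more dictionaries are needed, that analysis block can be generated instead of copied by hand. This is a hedged sketch; the function build_analysis and the naming scheme are assumptions that merely mirror the example above, and neither ES nor ansj requires them.

```shell
#!/usr/bin/env bash
# Sketch: build the "analysis" settings for any number of analyzers,
# one dic_ansj tokenizer (and therefore one dictionary key) per analyzer.
# build_analysis and the my_<name>_analyzer / <name>_dic / dic<name> scheme are hypothetical.
build_analysis() {
  local analyzers="" tokenizers="" sep="" name
  for name in "$@"; do
    # Custom analyzer that delegates to this name's tokenizer
    analyzers+="${sep}\"my_${name}_analyzer\": {\"type\": \"custom\", \"tokenizer\": \"${name}_dic\"}"
    # dic_ansj tokenizer bound to this name's dictionary key; other libraries are shared
    tokenizers+="${sep}\"${name}_dic\": {\"type\": \"dic_ansj\", \"dic\": \"dic${name}\","
    tokenizers+=" \"stop\": \"stop\", \"ambiguity\": \"ambiguity\", \"synonyms\": \"synonyms\","
    tokenizers+=" \"isNameRecognition\": true, \"isNumRecognition\": true,"
    tokenizers+=" \"isQuantifierRecognition\": true, \"isRealName\": false}"
    sep=", "
  done
  printf '{"analysis": {"analyzer": {%s}, "tokenizer": {%s}}}\n' "$analyzers" "$tokenizers"
}
```

For example, `build_analysis xm bm` yields analyzers and tokenizers equivalent to the fencitest4 request (with boolean values throughout); its output can be used as the value of "settings" in a create-index request, with the mappings added alongside.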
