扩展字典中的词会被筛选出来,扩展停止词中的词会被过滤掉
1.没有加入扩展字典 停止词字典用法
ik
[root@localhost custom]# curl -i -X GET -H 'Content-type:application/json' -d '{"analyzer":"ik","text":"自古刀扇过背刺"}' http://192.168.0.110:9200/_analyze?pretty
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 725
{
"tokens" : [ {
"token" : "自古",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
}, {
"token" : "刀",
"start_offset" : 2,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 1
}, {
"token" : "扇",
"start_offset" : 3,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 2
}, {
"token" : "过",
"start_offset" : 4,
"end_offset" : 5,
"type" : "CN_CHAR",
"position" : 3
}, {
"token" : "背",
"start_offset" : 5,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 4
}, {
"token" : "刺",
"start_offset" : 6,
"end_offset" : 7,
"type" : "CN_CHAR",
"position" : 5
} ]
}
ik_smart
[root@localhost custom]# curl -i -X GET -H 'Content-type:application/json' -d '{"analyzer":"ik_smart","text":"自古刀扇过背刺"}' http://192.168.0.110:9200/_analyze?pretty HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 725
{
"tokens" : [ {
"token" : "自古",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
}, {
"token" : "刀",
"start_offset" : 2,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 1
}, {
"token" : "扇",
"start_offset" : 3,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 2
}, {
"token" : "过",
"start_offset" : 4,
"end_offset" : 5,
"type" : "CN_CHAR",
"position" : 3
}, {
"token" : "背",
"start_offset" : 5,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 4
}, {
"token" : "刺",
"start_offset" : 6,
"end_offset" : 7,
"type" : "CN_CHAR",
"position" : 5
} ]
}
ik_max_word
[root@localhost custom]# curl -i -X GET -H 'Content-type:application/json' -d '{"analyzer":"ik_max_word","text":"自古刀扇过背刺"}' http://192.168.0.110:9200/_analyze?pretty
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 725
{
"tokens" : [ {
"token" : "自古",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
}, {
"token" : "刀",
"start_offset" : 2,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 1
}, {
"token" : "扇",
"start_offset" : 3,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 2
}, {
"token" : "过",
"start_offset" : 4,
"end_offset" : 5,
"type" : "CN_CHAR",
"position" : 3
}, {
"token" : "背",
"start_offset" : 5,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 4
}, {
"token" : "刺",
"start_offset" : 6,
"end_offset" : 7,
"type" : "CN_CHAR",
"position" : 5
} ]
}
2.加入自定义字典
test.dic
刀扇
背刺
teststop.dic
自古
过
/analysis-ik/config/IKAnalyzer.cfg.xml
[root@localhost config]# curl -i -X GET -H 'Content-type:application/json' -d '{"analyzer":"ik","text":"自古刀扇过背刺"}' http://192.168.0.110:9200/_analyze?pretty
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 728
{
"tokens" : [ {
"token" : "刀扇",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 0
}, {
"token" : "刀",
"start_offset" : 2,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 1
}, {
"token" : "扇",
"start_offset" : 3,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 2
}, {
"token" : "背刺",
"start_offset" : 5,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 3
}, {
"token" : "背",
"start_offset" : 5,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 4
}, {
"token" : "刺",
"start_offset" : 6,
"end_offset" : 7,
"type" : "CN_CHAR",
"position" : 5
} ]
}
ik_smart
[root@localhost config]# curl -i -X GET -H 'Content-type:application/json' -d '{"analyzer":"ik_smart","text":"自古刀扇过背刺"}' http://192.168.0.110:9200/_analyze?pretty HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 260
{
"tokens" : [ {
"token" : "刀扇",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 0
}, {
"token" : "背刺",
"start_offset" : 5,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 1
} ]
}
ik_max_word
[root@localhost config]# curl -i -X GET -H 'Content-type:application/json' -d '{"analyzer":"ik_max_word","text":"自古刀扇过背刺"}' http://192.168.0.110:9200/_analyze?pretty
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 728
{
"tokens" : [ {
"token" : "刀扇",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 0
}, {
"token" : "刀",
"start_offset" : 2,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 1
}, {
"token" : "扇",
"start_offset" : 3,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 2
}, {
"token" : "背刺",
"start_offset" : 5,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 3
}, {
"token" : "背",
"start_offset" : 5,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 4
}, {
"token" : "刺",
"start_offset" : 6,
"end_offset" : 7,
"type" : "CN_CHAR",
"position" : 5
} ]
}