ElasticSearch 6.x Study Notes: 4. The IK Analyzer Plugin

Tags: ElasticSearch ik Chinese word segmentation

Posted 2018-01-06 21:04:56, 1456 views

Category: ElasticSearch Study Notes (40)

Copyright notice: this is an original post by the author; reposting is welcome. //blog.csdn.net/chengyuqiang/article/details/78991570

4.1 elasticsearch-analysis-ik 6.1.1

(1) Source code
https://github.com/medcl/elasticsearch-analysis-ik

(2) Releases
https://github.com/medcl/elasticsearch-analysis-ik/releases

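(3) Copy the zip download URL. The plugin version must match the Elasticsearch version exactly, so use the 6.1.1 release with Elasticsearch 6.1.1: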
https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.1/elasticsearch-analysis-ik-6.1.1.zip
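
The download URL contains the version twice: once in the release tag and once in the file name. Below is a minimal Python sketch of that pattern. It assumes the pattern of the 6.1.1 URL above also holds for the release you need, so check the releases page for other versions. ik_release_url is a hypothetical helper, not part of any tool:

```python
def ik_release_url(es_version: str) -> str:
    """Build the IK plugin zip URL for a given Elasticsearch version.

    Assumes the GitHub release layout used for v6.1.1:
    .../releases/download/v<version>/elasticsearch-analysis-ik-<version>.zip
    """
    base = "https://github.com/medcl/elasticsearch-analysis-ik/releases/download"
    return f"{base}/v{es_version}/elasticsearch-analysis-ik-{es_version}.zip"


if __name__ == "__main__":
    # The IK version must equal the Elasticsearch version of the node.
    print(ik_release_url("6.1.1"))
```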

4.2 Install the plugin

(1) Install with elasticsearch-plugin

[es@node1 elasticsearch-6.1.1]$ bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.1/elasticsearch-analysis-ik-6.1.1.zip
-> Downloading https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.1/elasticsearch-analysis-ik-6.1.1.zip
[=================================================] 100%   
-> Installed analysis-ik
[es@node1 elasticsearch-6.1.1]$ ll plugins/
total 0
drwxr-xr-x 2 es es 199 Jan  7 08:52 analysis-ik
[es@node1 elasticsearch-6.1.1]$ 

(2) Check the plugin directory

[es@node1 elasticsearch-6.1.1]$ ll plugins/analysis-ik/
total 1420
-rw-r--r-- 1 es es 263965 Jan  7 08:52 commons-codec-1.9.jar
-rw-r--r-- 1 es es  61829 Jan  7 08:52 commons-logging-1.2.jar
-rw-r--r-- 1 es es  51658 Jan  7 08:52 elasticsearch-analysis-ik-6.1.1.jar
-rw-r--r-- 1 es es 736658 Jan  7 08:52 httpclient-4.5.2.jar
-rw-r--r-- 1 es es 326724 Jan  7 08:52 httpcore-4.4.4.jar
-rw-r--r-- 1 es es   2666 Jan  7 08:52 plugin-descriptor.properties
[es@node1 elasticsearch-6.1.1]$ 
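
Elasticsearch loads a plugin only if plugin-descriptor.properties declares the same elasticsearch.version as the node. This Python sketch checks that field. It assumes simple key=value lines and does not handle Java properties escapes or line continuations. The SAMPLE text is illustrative, not the real contents of the descriptor:

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def matches_node(descriptor_text: str, node_version: str) -> bool:
    """True if the plugin was built for exactly this Elasticsearch version."""
    props = parse_properties(descriptor_text)
    return props.get("elasticsearch.version") == node_version


# Illustrative excerpt only; read the real file from plugins/analysis-ik/.
SAMPLE = """\
# hypothetical excerpt of plugin-descriptor.properties
name=analysis-ik
version=6.1.1
elasticsearch.version=6.1.1
"""

if __name__ == "__main__":
    print(matches_node(SAMPLE, "6.1.1"))  # True
    print(matches_node(SAMPLE, "6.2.0"))  # False
```

On the node itself you would pass the real file's contents, for example open("plugins/analysis-ik/plugin-descriptor.properties", encoding="utf-8").read().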

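4.3 Restart Elasticsearch

Plugins are loaded only at startup, so restart the node. The log includes loaded plugin [analysis-ik], and IK reports that it is reading config/analysis-ik/IKAnalyzer.cfg.xml.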

[es@node1 elasticsearch-6.1.1]$ bin/elasticsearch
[2018-01-07T09:01:17,283][INFO ][o.e.n.Node               ] [] initializing ...
[2018-01-07T09:01:17,421][INFO ][o.e.e.NodeEnvironment    ] [cNWkQjt] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [14.3gb], net total_space [21.9gb], types [rootfs]
[2018-01-07T09:01:17,422][INFO ][o.e.e.NodeEnvironment    ] [cNWkQjt] heap size [1007.3mb], compressed ordinary object pointers [true]
[2018-01-07T09:01:17,484][INFO ][o.e.n.Node               ] node name [cNWkQjt] derived from node ID [cNWkQjt9SzKFNtyx8IIu-A]; set [node.name] to override
[2018-01-07T09:01:17,484][INFO ][o.e.n.Node               ] version[6.1.1], pid[3445], build[bd92e7f/2017-12-17T20:23:25.338Z], OS[Linux/3.10.0-514.el7.x86_64/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_112/25.112-b15]
[2018-01-07T09:01:17,485][INFO ][o.e.n.Node               ] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/opt/elasticsearch-6.1.1, -Des.path.conf=/opt/elasticsearch-6.1.1/config]
[2018-01-07T09:01:19,000][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [aggs-matrix-stats]
[2018-01-07T09:01:19,000][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [analysis-common]
[2018-01-07T09:01:19,000][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [ingest-common]
[2018-01-07T09:01:19,001][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [lang-expression]
[2018-01-07T09:01:19,001][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [lang-mustache]
[2018-01-07T09:01:19,001][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [lang-painless]
[2018-01-07T09:01:19,001][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [mapper-extras]
[2018-01-07T09:01:19,001][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [parent-join]
[2018-01-07T09:01:19,002][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [percolator]
[2018-01-07T09:01:19,002][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [reindex]
[2018-01-07T09:01:19,002][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [repository-url]
[2018-01-07T09:01:19,002][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [transport-netty4]
[2018-01-07T09:01:19,002][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [tribe]
[2018-01-07T09:01:19,003][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded plugin [analysis-ik]
[2018-01-07T09:01:21,678][INFO ][o.e.d.DiscoveryModule    ] [cNWkQjt] using discovery type [zen]
[2018-01-07T09:01:22,567][INFO ][o.e.n.Node               ] initialized
[2018-01-07T09:01:22,568][INFO ][o.e.n.Node               ] [cNWkQjt] starting ...
[2018-01-07T09:01:22,803][INFO ][o.e.t.TransportService   ] [cNWkQjt] publish_address {192.168.80.131:9300}, bound_addresses {192.168.80.131:9300}
[2018-01-07T09:01:22,837][INFO ][o.e.b.BootstrapChecks    ] [cNWkQjt] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2018-01-07T09:01:25,940][INFO ][o.e.c.s.MasterService    ] [cNWkQjt] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {cNWkQjt}{cNWkQjt9SzKFNtyx8IIu-A}{Xvho5gpPTuavakz227C_uA}{192.168.80.131}{192.168.80.131:9300}
[2018-01-07T09:01:25,949][INFO ][o.e.c.s.ClusterApplierService] [cNWkQjt] new_master {cNWkQjt}{cNWkQjt9SzKFNtyx8IIu-A}{Xvho5gpPTuavakz227C_uA}{192.168.80.131}{192.168.80.131:9300}, reason: apply cluster state (from master [master {cNWkQjt}{cNWkQjt9SzKFNtyx8IIu-A}{Xvho5gpPTuavakz227C_uA}{192.168.80.131}{192.168.80.131:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2018-01-07T09:01:25,993][INFO ][o.e.h.n.Netty4HttpServerTransport] [cNWkQjt] publish_address {192.168.80.131:9200}, bound_addresses {192.168.80.131:9200}
[2018-01-07T09:01:25,993][INFO ][o.e.n.Node               ] [cNWkQjt] started
[2018-01-07T09:01:26,077][INFO ][o.w.a.d.Monitor          ] try load config from /opt/elasticsearch-6.1.1/config/analysis-ik/IKAnalyzer.cfg.xml
[2018-01-07T09:01:26,799][INFO ][o.e.g.GatewayService     ] [cNWkQjt] recovered [2] indices into cluster_state
[2018-01-07T09:01:27,526][INFO ][o.e.c.r.a.AllocationService] [cNWkQjt] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[test][2], [test][0]] ...]).
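
You don't have to read the whole startup log by eye. Scan it for the lines that matter here: loaded plugin [analysis-ik] from PluginsService, and the [Dict Loading] lines that IK prints for extension dictionaries (these appear after the custom dictionary is configured in section 4.5 of this post). The regular expressions in this Python sketch are modeled only on the log lines shown in this post. They may need adjusting for other versions or log formats:

```python
import re

# Patterns modeled on the log lines shown in this post.
PLUGIN_RE = re.compile(r"loaded plugin \[([^\]]+)\]")
DICT_RE = re.compile(r"\[Dict Loading\] (\S+)")


def scan_log(lines):
    """Return (loaded plugin names, dictionary paths IK reported loading)."""
    plugins, dicts = [], []
    for line in lines:
        m = PLUGIN_RE.search(line)
        if m:
            plugins.append(m.group(1))
        m = DICT_RE.search(line)
        if m:
            dicts.append(m.group(1))
    return plugins, dicts


if __name__ == "__main__":
    sample = [
        "[2018-01-07T09:01:19,002][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [tribe]",
        "[2018-01-07T09:01:19,003][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded plugin [analysis-ik]",
        "[2018-01-07T10:00:32,905][INFO ][o.w.a.d.Monitor          ] [Dict Loading] custom/new_word.dic",
    ]
    print(scan_log(sample))  # (['analysis-ik'], ['custom/new_word.dic'])
```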

4.4 Test the basic features of the IK Chinese analyzer

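(1) ik_smart
ik_smart performs the coarsest-grained segmentation. The pretty parameter asks Elasticsearch to print the JSON response in a human-readable layout.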

GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text":"安徽省长江流域"
}

Tokenization result:

{
  "tokens": [
    {
      "token": "安徽省",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "长江流域",
      "start_offset": 3,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}
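
The Kibana console request above is plain HTTP, so any client can send it. This Python sketch uses only the standard library. It assumes Elasticsearch is reachable at http://192.168.80.131:9200, taken from the HTTP publish_address in the startup log; adjust this for your setup. Elasticsearch 6.x rejects request bodies without a Content-Type header, so the sketch sets application/json explicitly. It uses POST because _analyze accepts both GET and POST, and not every client sends a body with GET:

```python
import json
import urllib.request

# Assumed address: the HTTP publish_address from the startup log in this post.
ES_URL = "http://192.168.80.131:9200"


def analyze_request(analyzer: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST /_analyze?pretty request."""
    body = json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)
    return urllib.request.Request(
        f"{ES_URL}/_analyze?pretty",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = analyze_request("ik_smart", "安徽省长江流域")
    print(req.get_method(), req.full_url)
    # To send it against a running node:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```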

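(2) ik_max_word
ik_max_word performs the finest-grained segmentation. It emits every dictionary word it finds, so tokens can overlap (for example 安徽省 and 安徽, or 长江流域 and 长江).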

GET _analyze?pretty
{
  "analyzer": "ik_max_word",
  "text":"安徽省长江流域"
}

Tokenization result:

{
  "tokens": [
    {
      "token": "安徽省",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "安徽",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "省长",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "长江流域",
      "start_offset": 3,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "长江",
      "start_offset": 3,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 4
    },
    {
      "token": "江流",
      "start_offset": 4,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 5
    },
    {
      "token": "流域",
      "start_offset": 5,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 6
    }
  ]
}
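
To get a feel for coarse versus exhaustive output, here is a toy Python illustration. It is not IK's algorithm: IK has its own sub-segmenters and an ambiguity-resolution step for ik_smart. The toy uses forward maximum matching to stand in for ik_smart and "every dictionary word at every position, longest first" to stand in for ik_max_word. TOY_DICT is a hypothetical, hand-picked dictionary that happens to reproduce the two responses above:

```python
# Hypothetical toy dictionary: only the words needed for this example.
TOY_DICT = {"安徽省", "安徽", "省长", "长江流域", "长江", "江流", "流域"}


def max_word(text, dictionary):
    """Exhaustive: every dictionary word at every start, longest first."""
    max_len = max(map(len, dictionary), default=1)
    out = []
    for start in range(len(text)):
        for end in range(min(len(text), start + max_len), start, -1):
            if text[start:end] in dictionary:
                out.append(text[start:end])
    return out


def smart(text, dictionary):
    """Forward maximum matching: take the longest match, then skip past it.

    Falls back to a single character when nothing in the dictionary matches.
    """
    max_len = max(map(len, dictionary), default=1)
    out, start = [], 0
    while start < len(text):
        for end in range(min(len(text), start + max_len), start, -1):
            if text[start:end] in dictionary or end == start + 1:
                out.append(text[start:end])
                start = end
                break
    return out


if __name__ == "__main__":
    print(smart("安徽省长江流域", TOY_DICT))
    # ['安徽省', '长江流域']
    print(max_word("安徽省长江流域", TOY_DICT))
    # ['安徽省', '安徽', '省长', '长江流域', '长江', '江流', '流域']
```

The same toy also mimics the new-word case in (3). With a dictionary of only 王者 and 荣耀, smart returns ['王者', '荣耀']. After 王者荣耀 is added, it returns the single token ['王者荣耀'].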

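(3) New words
王者荣耀 (Honor of Kings, a mobile game) is not in the default dictionary, so ik_smart splits it into two ordinary words: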

GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text": "王者荣耀"
}

Tokenization result:

{
  "tokens": [
    {
      "token": "王者",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "荣耀",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}
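
In each token, start_offset and end_offset locate the token in the input text, and end_offset is exclusive. This quick Python check tests that reading against the response above. Elasticsearch counts offsets in Java chars (UTF-16 code units). For common Chinese characters like these, which lie in the Basic Multilingual Plane, Java chars coincide with Python string indices. Characters outside the BMP would make the two diverge:

```python
# Tokens copied from the ik_smart response for "王者荣耀" above.
TOKENS = [
    {"token": "王者", "start_offset": 0, "end_offset": 2, "position": 0},
    {"token": "荣耀", "start_offset": 2, "end_offset": 4, "position": 1},
]


def offsets_match(text, tokens):
    """True if every token equals text[start_offset:end_offset]."""
    return all(text[t["start_offset"]:t["end_offset"]] == t["token"] for t in tokens)


if __name__ == "__main__":
    print(offsets_match("王者荣耀", TOKENS))  # True
```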

4.5 Extending the dictionary

(1) View the existing dictionaries

[es@node1 analysis-ik]$ pwd
/opt/elasticsearch-6.1.1/config/analysis-ik
[es@node1 analysis-ik]$ ll
total 8260
-rw-rw---- 1 es bigdata 5225922 Jan  7 08:52 extra_main.dic
-rw-rw---- 1 es bigdata   63188 Jan  7 08:52 extra_single_word.dic
-rw-rw---- 1 es bigdata   63188 Jan  7 08:52 extra_single_word_full.dic
-rw-rw---- 1 es bigdata   10855 Jan  7 08:52 extra_single_word_low_freq.dic
-rw-rw---- 1 es bigdata     156 Jan  7 08:52 extra_stopword.dic
-rw-rw---- 1 es bigdata     625 Jan  7 08:52 IKAnalyzer.cfg.xml
-rw-rw---- 1 es bigdata 3058510 Jan  7 08:52 main.dic
-rw-rw---- 1 es bigdata     123 Jan  7 08:52 preposition.dic
-rw-rw---- 1 es bigdata    1824 Jan  7 08:52 quantifier.dic
-rw-rw---- 1 es bigdata     164 Jan  7 08:52 stopword.dic
-rw-rw---- 1 es bigdata     192 Jan  7 08:52 suffix.dic
-rw-rw---- 1 es bigdata     752 Jan  7 08:52 surname.dic
[es@node1 analysis-ik]$

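(2) Create a custom dictionary
Create a custom directory and put the new words in a UTF-8 text file, one word per line: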

[es@node1 analysis-ik]$ mkdir custom
[es@node1 analysis-ik]$ vi custom/new_word.dic
[es@node1 analysis-ik]$ cat custom/new_word.dic 
老铁
王者荣耀
洪荒之力
共有产权房
一带一路
[es@node1 analysis-ik]$ 
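
To double-check what the file actually contains, this Python sketch reads a .dic file the way it is laid out here: one word per line, surrounding whitespace stripped, blank lines skipped. As a defensive choice it also drops a leading UTF-8 byte-order mark, which some editors add. load_dic is a hypothetical helper, not part of IK:

```python
import io


def load_dic(stream):
    """Return the words in a .dic stream: one per line, blanks skipped."""
    words = []
    for line in stream:
        word = line.lstrip("\ufeff").strip()  # drop an optional BOM, then whitespace
        if word:
            words.append(word)
    return words


if __name__ == "__main__":
    sample = io.StringIO("老铁\n王者荣耀\n\n洪荒之力\n共有产权房\n一带一路\n")
    print(load_dic(sample))
    # ['老铁', '王者荣耀', '洪荒之力', '共有产权房', '一带一路']
```

For the real file, open("custom/new_word.dic", encoding="utf-8-sig") also strips a BOM automatically.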

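(3) Update the configuration
Point the ext_dict entry at the new file. The path is relative to the config/analysis-ik directory: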

[es@node1 analysis-ik]$ vi IKAnalyzer.cfg.xml 
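[es@node1 analysis-ik]$ cat IKAnalyzer.cfg.xml 
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer 扩展配置</comment>
    <!-- users can configure their own extension dictionary here -->
    <entry key="ext_dict">custom/new_word.dic</entry>
    <!-- users can configure their own extension stopword dictionary here -->
    <entry key="ext_stopwords"></entry>
    <!-- users can configure a remote extension dictionary here -->
    <!-- <entry key="remote_ext_dict">words_location</entry> -->
    <!-- users can configure a remote extension stopword dictionary here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>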
[es@node1 analysis-ik]$
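
IKAnalyzer.cfg.xml is in the Java XML properties format, so a typo in a tag or key silently leaves the dictionary unused. This Python sketch pulls the ext_dict value out of such a file with xml.etree.ElementTree. It assumes that several files may be listed in one entry separated by semicolons, as the IK README shows. Commented-out entries are ignored because the parser drops XML comments. ext_dict_files is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the configuration shown above.
CFG = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer 扩展配置</comment>
    <entry key="ext_dict">custom/new_word.dic</entry>
    <entry key="ext_stopwords"></entry>
    <!-- <entry key="remote_ext_dict">words_location</entry> -->
</properties>
"""


def ext_dict_files(xml_bytes):
    """Return the ext_dict paths (relative to config/analysis-ik)."""
    root = ET.fromstring(xml_bytes)
    entries = {e.get("key"): (e.text or "").strip() for e in root.iter("entry")}
    return [p.strip() for p in entries.get("ext_dict", "").split(";") if p.strip()]


if __name__ == "__main__":
    print(ext_dict_files(CFG.encode("utf-8")))  # ['custom/new_word.dic']
```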

(4) Restart Elasticsearch

[es@node1 elasticsearch-6.1.1]$ bin/elasticsearch
[2018-01-07T10:00:23,032][INFO ][o.e.n.Node               ] [] initializing ...
[2018-01-07T10:00:23,170][INFO ][o.e.e.NodeEnvironment    ] [cNWkQjt] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [14.3gb], net total_space [21.9gb], types [rootfs]
[2018-01-07T10:00:23,171][INFO ][o.e.e.NodeEnvironment    ] [cNWkQjt] heap size [1007.3mb], compressed ordinary object pointers [true]
[2018-01-07T10:00:23,209][INFO ][o.e.n.Node               ] node name [cNWkQjt] derived from node ID [cNWkQjt9SzKFNtyx8IIu-A]; set [node.name] to override
[2018-01-07T10:00:23,210][INFO ][o.e.n.Node               ] version[6.1.1], pid[3574], build[bd92e7f/2017-12-17T20:23:25.338Z], OS[Linux/3.10.0-514.el7.x86_64/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_112/25.112-b15]
[2018-01-07T10:00:23,210][INFO ][o.e.n.Node               ] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/opt/elasticsearch-6.1.1, -Des.path.conf=/opt/elasticsearch-6.1.1/config]
[2018-01-07T10:00:24,717][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [aggs-matrix-stats]
[2018-01-07T10:00:24,717][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [analysis-common]
[2018-01-07T10:00:24,718][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [ingest-common]
[2018-01-07T10:00:24,718][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [lang-expression]
[2018-01-07T10:00:24,718][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [lang-mustache]
[2018-01-07T10:00:24,718][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [lang-painless]
[2018-01-07T10:00:24,718][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [mapper-extras]
[2018-01-07T10:00:24,719][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [parent-join]
[2018-01-07T10:00:24,719][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [percolator]
[2018-01-07T10:00:24,719][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [reindex]
[2018-01-07T10:00:24,719][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [repository-url]
[2018-01-07T10:00:24,719][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [transport-netty4]
[2018-01-07T10:00:24,720][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded module [tribe]
[2018-01-07T10:00:24,720][INFO ][o.e.p.PluginsService     ] [cNWkQjt] loaded plugin [analysis-ik]
[2018-01-07T10:00:27,866][INFO ][o.e.d.DiscoveryModule    ] [cNWkQjt] using discovery type [zen]
[2018-01-07T10:00:28,794][INFO ][o.e.n.Node               ] initialized
[2018-01-07T10:00:28,795][INFO ][o.e.n.Node               ] [cNWkQjt] starting ...
[2018-01-07T10:00:29,047][INFO ][o.e.t.TransportService   ] [cNWkQjt] publish_address {192.168.80.131:9300}, bound_addresses {192.168.80.131:9300}
[2018-01-07T10:00:29,093][INFO ][o.e.b.BootstrapChecks    ] [cNWkQjt] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2018-01-07T10:00:32,210][INFO ][o.e.c.s.MasterService    ] [cNWkQjt] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {cNWkQjt}{cNWkQjt9SzKFNtyx8IIu-A}{N6t0NiDmQp2vlrbx-FtcUQ}{192.168.80.131}{192.168.80.131:9300}
[2018-01-07T10:00:32,217][INFO ][o.e.c.s.ClusterApplierService] [cNWkQjt] new_master {cNWkQjt}{cNWkQjt9SzKFNtyx8IIu-A}{N6t0NiDmQp2vlrbx-FtcUQ}{192.168.80.131}{192.168.80.131:9300}, reason: apply cluster state (from master [master {cNWkQjt}{cNWkQjt9SzKFNtyx8IIu-A}{N6t0NiDmQp2vlrbx-FtcUQ}{192.168.80.131}{192.168.80.131:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2018-01-07T10:00:32,285][INFO ][o.e.h.n.Netty4HttpServerTransport] [cNWkQjt] publish_address {192.168.80.131:9200}, bound_addresses {192.168.80.131:9200}
[2018-01-07T10:00:32,286][INFO ][o.e.n.Node               ] [cNWkQjt] started
[2018-01-07T10:00:32,326][INFO ][o.w.a.d.Monitor          ] try load config from /opt/elasticsearch-6.1.1/config/analysis-ik/IKAnalyzer.cfg.xml
[2018-01-07T10:00:32,905][INFO ][o.w.a.d.Monitor          ] [Dict Loading] custom/new_word.dic
[2018-01-07T10:00:33,279][INFO ][o.e.g.GatewayService     ] [cNWkQjt] recovered [2] indices into cluster_state
[2018-01-07T10:00:34,092][INFO ][o.e.c.r.a.AllocationService] [cNWkQjt] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[test][3]] ...]).

The output contains the line

[Dict Loading] custom/new_word.dic

which shows that the custom dictionary has been loaded.

(5) Restart Kibana
After restarting Kibana, run the same request again:

GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text":"王者荣耀"
}

Tokenization result:

{
  "tokens": [
    {
      "token": "王者荣耀",
      "start_offset": 0,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 0
    }
  ]
}

Reposted from: https://my.oschina.net/u/3367404/blog/1635003