Preparation
Create a new user elk
# create the new user elk (these commands need root)
$ useradd elk
$ passwd elk
enter password
# create the elk directory and make elk its owner
$ cd /opt
$ mkdir elk
$ chown elk:elk elk
# switch to the elk user
$ su - elk
enter password
elasticsearch
Installation
$ cd /opt/elk
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.10.tar.gz
$ mkdir elasticsearch && cd elasticsearch
$ tar -zxvf ../elasticsearch-5.6.10.tar.gz
$ ln -s elasticsearch-5.6.10/ default
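If you later unpack another version into the same directory, rerunning ln -s elasticsearch-X/ default creates the new link inside the directory that default already points to, instead of replacing default. A minimal sketch of a switch helper, assuming the directory layout above (switch_es_default is a hypothetical name, not part of Elasticsearch):

```shell
#!/usr/bin/env bash
# Point the "default" symlink in $1 at elasticsearch-$2.
# -s: symbolic link, -f: replace an existing link,
# -n: do not descend into the directory an existing "default" link points to.
switch_es_default() {
  local base_dir="$1" version="$2"
  local target="elasticsearch-${version}"
  if [ ! -d "${base_dir}/${target}" ]; then
    echo "missing ${base_dir}/${target}" >&2
    return 1
  fi
  ln -sfn "${target}" "${base_dir}/default"
}

# Usage: switch_es_default /opt/elk/elasticsearch 5.6.10
```

The -n flag is what makes the replacement work when default is already a symlink to a directory.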
Configure environment variables
Edit /etc/profile as root:
$ su -
enter the root password
$ vim /etc/profile
Add the following at the end of the file:
# env variable of elasticsearch
export ES_HOME=/opt/elk/elasticsearch/default
export PATH=$ES_HOME/bin:$PATH
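Running the setup twice would append the same block twice and stack duplicate PATH entries. A minimal sketch that appends it only once, assuming the comment line above is used as the marker (add_es_env is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Append the Elasticsearch env block to profile file $1 (ES_HOME=$2)
# unless the marker comment is already present.
add_es_env() {
  local profile="$1" es_home="$2"
  local marker="# env variable of elasticsearch"
  if grep -qxF "${marker}" "${profile}" 2>/dev/null; then
    return 0  # already configured, nothing to do
  fi
  {
    echo "${marker}"
    echo "export ES_HOME=${es_home}"
    echo 'export PATH=$ES_HOME/bin:$PATH'  # single quotes: written literally
  } >> "${profile}"
}

# Usage (as root): add_es_env /etc/profile /opt/elk/elasticsearch/default
```

On most distributions login shells also read /etc/profile.d/*.sh, so passing a path such as /etc/profile.d/elasticsearch.sh instead keeps /etc/profile itself untouched.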
Install the Chinese word-segmentation plugin (analysis-ik)
$ source /etc/profile
$ elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.10/elasticsearch-analysis-ik-5.6.10.zip
-> Downloading https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.10/elasticsearch-analysis-ik-5.6.10.zip
[=================================================] 100%
-> Installed analysis-ik
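The plugin version in the URL has to match the Elasticsearch version exactly, otherwise the installation is rejected. A minimal sketch that reads the version from the default symlink created earlier and builds the URL following the pattern of the command above (both function names are hypothetical, and the URL pattern is only assumed to hold for other releases published the same way):

```shell
#!/usr/bin/env bash
# Derive the version from a symlink such as default -> elasticsearch-5.6.10/
es_version_from_home() {
  local target
  target=$(readlink "$1") || return 1
  target=$(basename "${target}")    # drops the trailing "/"
  echo "${target#elasticsearch-}"   # strips the "elasticsearch-" prefix
}

# Build the analysis-ik release URL for a given Elasticsearch version,
# following the pattern of the v5.6.10 download link.
ik_plugin_url() {
  local v="$1"
  echo "https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v${v}/elasticsearch-analysis-ik-${v}.zip"
}

# Usage (after source /etc/profile):
#   elasticsearch-plugin install "$(ik_plugin_url "$(es_version_from_home "$ES_HOME")")"
```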
ik is a word-segmentation (analysis) plugin that offers two granularities: ik_max_word and ik_smart.
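Difference between ik_max_word and ik_smart
ik_max_word: splits the text at the finest granularity and exhausts every possible combination. For example, "中华人民共和国国歌" (the national anthem of the People's Republic of China) is split into "中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌". The exact tokens depend on the plugin version and its dictionary; the 5.6.10 test below returns a shorter list.
ik_smart: splits the text at the coarsest granularity. For example, "中华人民共和国国歌" is split into "中华人民共和国, 国歌".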
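Testing the difference between ik_max_word and ik_smart
Note: Elasticsearch must already be running (start it as the elk user, as shown in the quick example section below).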
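The node needs a few seconds after startup before it answers HTTP requests. A minimal polling sketch, assuming curl is installed and the node listens on the default http://127.0.0.1:9200 (wait_for_es is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Poll the Elasticsearch root endpoint until it answers with a 2xx status.
# $1: base URL (default http://127.0.0.1:9200), $2: attempts (default 30).
# Returns 0 as soon as the node responds, 1 if every attempt fails.
wait_for_es() {
  local url="${1:-http://127.0.0.1:9200}" tries="${2:-30}" i
  for ((i = 1; i <= tries; i++)); do
    # -s: no progress output, -f: treat HTTP errors (>= 400) as failures
    if curl -sf --max-time 5 -o /dev/null "${url}"; then
      return 0
    fi
    if (( i < tries )); then
      sleep 1
    fi
  done
  return 1
}

# Usage: wait_for_es && echo "elasticsearch is up"
```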
ik_max_word
curl -X POST "http://127.0.0.1:9200/_analyze?pretty" -H 'Content-Type: application/json' -d '{
"analyzer": "ik_max_word",
"text":"中华人民共和国国歌"
}'
{
"tokens" : [
{
"token" : "中华人民共和国",
"start_offset" : 0,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "中华人民",
"start_offset" : 0,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "中华",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "华人",
"start_offset" : 1,
"end_offset" : 3,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "人民共和国",
"start_offset" : 2,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 4
},
{
"token" : "人民",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "共和国",
"start_offset" : 4,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 6
},
{
"token" : "共和",
"start_offset" : 4,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 7
},
{
"token" : "国",
"start_offset" : 6,
"end_offset" : 7,
"type" : "CN_CHAR",
"position" : 8
},
{
"token" : "国歌",
"start_offset" : 7,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 9
}
]
}
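The pretty-printed response is long; to see only the token strings, a small filter can pull them out. A minimal sketch that relies on the "token" : "..." line layout shown above (extract_tokens is a hypothetical helper; if jq is installed, jq -r '.tokens[].token' is a sturdier alternative):

```shell
#!/usr/bin/env bash
# Read a pretty-printed _analyze response on stdin and print one token per line.
# Only lines of the form   "token" : "...",   are matched.
extract_tokens() {
  sed -n 's/^[[:space:]]*"token" : "\(.*\)",$/\1/p'
}

# Usage:
#   curl -s -X POST "http://127.0.0.1:9200/_analyze?pretty" \
#     -H 'Content-Type: application/json' \
#     -d '{"analyzer": "ik_max_word", "text": "中华人民共和国国歌"}' | extract_tokens
```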
ik_smart
curl -X POST "http://127.0.0.1:9200/_analyze?pretty" -H 'Content-Type: application/json' -d '{
"analyzer": "ik_smart",
"text":"中华人民共和国国歌"
}'
{
"tokens" : [
{
"token" : "中华人民共和国",
"start_offset" : 0,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "国歌",
"start_offset" : 7,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 1
}
]
}
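The two requests above differ only in the analyzer name. A minimal sketch of a helper that sends the same text through either analyzer, assuming the node runs at http://127.0.0.1:9200 (analyze_body, analyze and the ES_URL variable are hypothetical names; no JSON escaping is done, so the text must not contain double quotes or backslashes):

```shell
#!/usr/bin/env bash
# Build the JSON body for the _analyze API.
# No escaping is done: the text must not contain double quotes or backslashes.
analyze_body() {
  printf '{"analyzer": "%s", "text": "%s"}' "$1" "$2"
}

# POST the text to _analyze; ES_URL (hypothetical) overrides the node address.
analyze() {
  curl -s -X POST "${ES_URL:-http://127.0.0.1:9200}/_analyze?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(analyze_body "$1" "$2")"
}

# Usage: compare both granularities on the same text
#   for a in ik_max_word ik_smart; do analyze "$a" "中华人民共和国国歌"; done
```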
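Note: the pretty parameter at the end of the URL formats the response for readability; without it, the response comes back as one long unformatted string.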
quick example
Start elasticsearch as the elk user (Elasticsearch refuses to run as root; add -d to run it in the background)
$ elasticsearch
The following example is taken from the examples on the ik plugin's GitHub page.
create an index
curl -XPUT http://localhost:9200/index
create a mapping
curl -XPOST http://localhost:9200/index/fulltext/_mapping -H 'Content-Type:application/json' -d'
{
"properties": {
"content": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_max_word"
}
}
}'
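When creating the type for an index, it is better to name it _doc, i.e. change fulltext above to _doc. Starting with 6.x an index can only have one type, so the type part of a request URL becomes _doc, e.g. PUT {index}/_doc/{id} and POST {index}/_doc, which is the same format used for indexing in 7.x. Be aware, though, that versions before 6.2 (including the 5.6.10 used here) reject type names that start with an underscore, so on 5.x keep a regular name such as fulltext and switch to _doc once you create indices on 6.2 or later. In 7.x types go away; "removed" is perhaps not quite the right word: for users, the concept of a type becomes blurred and is no longer as concrete as before (a type used to correspond to a table in an RDBMS). For the reasons behind this change, see: Why are mapping types being removed?
If you are interested in what changes substantially in the upcoming Elasticsearch versions, see Appendix A at the end of this article.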
index some docs
curl -XPOST http://localhost:9200/index/fulltext/1 -H 'Content-Type:application/json' -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}
'
curl -XPOST http://localhost:9200/index/fulltext/2 -H 'Content-Type:application/json' -d'
{"content":"公安部:各地校车将享最高路权"}
'
curl -XPOST http://localhost:9200/index/fulltext/3 -H 'Content-Type:application/json' -d'
{"content":"中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"}
'
curl -XPOST http://localhost:9200/index/fulltext/4 -H 'Content-Type:application/json' -d'
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
'
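Indexing one document per request works, but the _bulk API can send all four in a single request. A minimal sketch that builds the newline-delimited body, assuming the index and fulltext names from the example and ids assigned 1, 2, 3, ... in argument order (bulk_body is a hypothetical helper; no JSON escaping is done, so the texts must not contain double quotes or backslashes):

```shell
#!/usr/bin/env bash
# Print an NDJSON body for the _bulk API: one action line and one source line
# per document. $1: index, $2: type, remaining args: content strings.
bulk_body() {
  local index="$1" type="$2" id=0 text
  shift 2
  for text in "$@"; do
    id=$((id + 1))
    printf '{"index":{"_index":"%s","_type":"%s","_id":"%s"}}\n' "${index}" "${type}" "${id}"
    printf '{"content":"%s"}\n' "${text}"
  done
}

# Usage:
#   bulk_body index fulltext "美国留给伊拉克的是个烂摊子吗" "公安部:各地校车将享最高路权" |
#     curl -s -XPOST 'http://localhost:9200/_bulk?pretty' \
#       -H 'Content-Type: application/x-ndjson' --data-binary @-
```

--data-binary is used because -d @- strips the newlines that the bulk format depends on.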
query with highlighting
curl -XPOST http://localhost:9200/index/fulltext/_search -H 'Content-Type:application/json' -d'
{
"query" : { "match" : { "content" : "中国" }},
"highlight" : {
"pre_tags" : ["<tag1>", "<tag2>"],
"post_tags" : ["</tag1>", "</tag2>"],
"fields" : {
"content" : {}
}
}
}
'
result
{
"took": 14,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 2,
"hits": [
{
"_index": "index",
"_type": "fulltext",
"_id": "4",
"_score": 2,
"_source": {
"content": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
},
"highlight": {
"content": [
"<tag1>中国</tag1>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首 "
]
}
},
{
"_index": "index",
"_type": "fulltext",
"_id": "3",
"_score": 2,
"_source": {
"content": "中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"
},
"highlight": {
"content": [
"均每天扣1艘<tag1>中国</tag1>渔船 "
]
}
}
]
}
}
Appendix A
Below is the official text describing the major changes in upcoming versions:
Schedule for removal of mapping types
This is a big change for our users, so we have tried to make it as painless as possible. The change will roll out as follows:
Elasticsearch 5.6.0
- Setting index.mapping.single_type: true on an index will enable the single-type-per-index behaviour which will be enforced in 6.0.
- The join field replacement for parent-child is available on indices created in 5.6.
Elasticsearch 6.x
- Indices created in 5.x will continue to function in 6.x as they did in 5.x.
- Indices created in 6.x only allow a single-type per index. Any name can be used for the type, but there can be only one. The preferred type name is _doc, so that index APIs have the same path as they will have in 7.0: PUT {index}/_doc/{id} and POST {index}/_doc
- The _type name can no longer be combined with the _id to form the _uid field. The _uid field has become an alias for the _id field.
- New indices no longer support the old-style of parent/child and should use the join field instead.
- The _default_ mapping type is deprecated.
Elasticsearch 7.x
- The type parameter in URLs are deprecated. For instance, indexing a document no longer requires a document type. The new index APIs are PUT {index}/_doc/{id} in case of explicit ids and POST {index}/_doc for auto-generated ids.
- The index creation, GET|PUT _mapping and document APIs support a query string parameter (include_type_name) which indicates whether requests and responses should include a type name. It defaults to true. 7.x indices which don’t have an explicit type will use the dummy type name _doc. Not setting include_type_name=false will result in a deprecation warning.
- The _default_ mapping type is removed.
Elasticsearch 8.x
- The type parameter is no longer supported in URLs.
- The include_type_name parameter is deprecated, default to false and fails the request when set to true.
Elasticsearch 9.x
- The include_type_name parameter is removed.
Original source
The end!