Installing Elasticsearch on Linux

Preparation

Create a new user, elk (Elasticsearch refuses to run as root, so it needs a dedicated non-root user)

# create the new user elk
$ useradd elk
$ passwd elk
  enter password

# create the elk directory and make elk its owner
$ cd /opt
$ mkdir elk
$ chown elk:elk elk

# switch to the elk user
$ su - elk
  enter password

elasticsearch

Installation

$ cd /opt/elk
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.10.tar.gz

$ mkdir elasticsearch && cd elasticsearch
$ tar -zxvf ../elasticsearch-5.6.10.tar.gz

$ ln -s elasticsearch-5.6.10/ default
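The default symlink lets ES_HOME (configured in the next step) stay the same across upgrades: you unpack a newer release next to the old one and re-point the link. A minimal sketch, assuming the directory layout above; switch_es_default is a hypothetical helper name, and elasticsearch-5.6.16 in the usage line only stands in for whatever release you actually unpack.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-point the "default" symlink at another unpacked
# release, so ES_HOME (which points at .../default) never has to change.
# Usage: switch_es_default <base dir> <release dir name>
switch_es_default() {
  local base="$1" release="$2"
  if [ ! -d "$base/$release" ]; then
    echo "no such release directory: $base/$release" >&2
    return 1
  fi
  # -f replaces the existing link; -n stops ln from following the old
  # "default" link and creating the new link inside the old release dir
  ln -sfn "$release" "$base/default"
}

# Example (assumes a second release was unpacked next to the first):
# switch_es_default /opt/elk/elasticsearch elasticsearch-5.6.16
```

Without -n, ln would dereference the existing default link and create the new link inside elasticsearch-5.6.10/ instead of replacing it.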

Configure environment variables

Edit /etc/profile as root:

$ su -
  enter password of root
$ vim /etc/profile

Append the following to the end of the file:

# env variable of elasticsearch
export ES_HOME=/opt/elk/elasticsearch/default
export PATH=$ES_HOME/bin:$PATH
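Appending these lines by hand a second time leaves duplicates in /etc/profile. A minimal sketch of an idempotent alternative, assuming a bash or POSIX-style shell; add_es_env is a hypothetical helper, not something shipped with Elasticsearch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the Elasticsearch variables to a profile file
# only if ES_HOME is not exported there yet, so re-running it is harmless.
# Usage: add_es_env <profile file> [es home]
add_es_env() {
  local profile="$1"
  local es_home="${2:-/opt/elk/elasticsearch/default}"
  if grep -q '^export ES_HOME=' "$profile" 2>/dev/null; then
    return 0  # already configured, nothing to do
  fi
  {
    echo '# env variable of elasticsearch'
    echo "export ES_HOME=$es_home"
    # single quotes keep $ES_HOME and $PATH unexpanded in the written file
    echo 'export PATH=$ES_HOME/bin:$PATH'
  } >> "$profile"
}

# As root:
# add_es_env /etc/profile
# source /etc/profile
```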

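Install the Chinese word-segmentation plugin (ik)

The plugin release must match the Elasticsearch version exactly, which is why the URL below uses v5.6.10.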

$ source /etc/profile
$ elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.10/elasticsearch-analysis-ik-5.6.10.zip
-> Downloading https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.10/elasticsearch-analysis-ik-5.6.10.zip
[=================================================] 100%
-> Installed analysis-ik

The ik plugin offers two segmentation granularities: ik_max_word and ik_smart.

Differences between ik_max_word and ik_smart

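ik_max_word: splits the text at the finest granularity and exhausts every possible combination; for example, "中华人民共和国国歌" is split into "中华人民共和国,中华人民,中华,华人,人民共和国,人民,人,民,共和国,共和,和,国国,国歌" (the exact tokens depend on the dictionary version; the test output below differs slightly);
ik_smart: splits at the coarsest granularity; for example, "中华人民共和国国歌" is split into "中华人民共和国,国歌".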

Testing the difference between ik_max_word and ik_smart

Note: Elasticsearch must already be running (starting it is covered under "quick example" below).

ik_max_word

curl -X POST "http://127.0.0.1:9200/_analyze?pretty" -H 'Content-Type: application/json' -d '{
    "analyzer": "ik_max_word",
    "text":"中华人民共和国国歌"
}'
{
  "tokens" : [
    {
      "token" : "中华人民共和国",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "中华人民",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "中华",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "华人",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "人民共和国",
      "start_offset" : 2,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "人民",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "共和国",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "共和",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "国",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "CN_CHAR",
      "position" : 8
    },
    {
      "token" : "国歌",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 9
    }
  ]
}

ik_smart

curl -X POST "http://127.0.0.1:9200/_analyze?pretty" -H 'Content-Type: application/json' -d '{
    "analyzer": "ik_smart",
    "text":"中华人民共和国国歌"
}'
{
  "tokens" : [
    {
      "token" : "中华人民共和国",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "国歌",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 1
    }
  ]
}

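Note: the pretty parameter at the end of the URL pretty-prints the JSON response; without it, the response comes back as one long, hard-to-read string.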
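To compare the two analyzers without reading through offsets and positions, the token strings can be pulled out of the _analyze response. A minimal sketch; extract_tokens is a hypothetical name, and it uses grep and sed rather than a real JSON parser, so it assumes no token contains a double quote. It matches both the pretty-printed form ("token" : "...") and the compact form ("token":"...").

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an _analyze response on stdin and print one
# token per line. Not a JSON parser: assumes tokens contain no '"'.
extract_tokens() {
  grep -o '"token" *: *"[^"]*"' | sed 's/^"token" *: *"//; s/"$//'
}

# Example against a running node:
# curl -s -X POST "http://127.0.0.1:9200/_analyze?pretty" \
#   -H 'Content-Type: application/json' \
#   -d '{"analyzer": "ik_smart", "text": "中华人民共和国国歌"}' | extract_tokens
```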

quick example

Start Elasticsearch as the elk user, not root (leave the root shell opened earlier with su - first):

$ elasticsearch
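Run this way, the node stays in the foreground, so the curl commands need a second terminal, and a node that is still starting up refuses connections. A minimal sketch of the alternative, assuming the 5.x startup script's -d (daemonize) and -p (pid file) options, that curl is installed, and that no authentication is configured; wait_for_es is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the HTTP endpoint once per second until
# Elasticsearch answers. Returns 0 when it answers, 1 after max attempts.
# Usage: wait_for_es [url] [max attempts]
wait_for_es() {
  local url="${1:-http://127.0.0.1:9200}" tries="${2:-30}" i=1
  while [ "$i" -le "$tries" ]; do
    # -s: no progress output; -f: treat HTTP status >= 400 as failure
    if curl -sf -o /dev/null "$url"; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "Elasticsearch did not answer at $url" >&2
  return 1
}

# Example: start the node in the background as the elk user, then wait.
# elasticsearch -d -p /opt/elk/elasticsearch/es.pid
# wait_for_es && echo "ready"
```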

The following examples are taken from the examples on the ik plugin's GitHub page.

create an index

curl -XPUT http://localhost:9200/index

create a mapping

curl -XPOST http://localhost:9200/index/fulltext/_mapping -H 'Content-Type:application/json' -d'
{
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_max_word"
            }
        }
}'

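When creating the type for an index, it is best to name it _doc, i.e. change fulltext above to _doc. Indices created in 6.x can have only one type; naming it _doc makes the type part of request URLs _doc, e.g. PUT {index}/_doc/{id} and POST {index}/_doc, which is the same format the document APIs use in 7.x. Saying that 7.x "removes" types is not quite accurate: rather, the concept of a type (which used to correspond to a table in an RDBMS) becomes blurred for users and is no longer as concrete as before. For the reasons behind this change, see: Why are mapping types being removed?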

If you are interested in the major changes planned for upcoming Elasticsearch versions, see Appendix A at the end of this article.

index some docs

curl -XPOST http://localhost:9200/index/fulltext/1 -H 'Content-Type:application/json' -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}
'
curl -XPOST http://localhost:9200/index/fulltext/2 -H 'Content-Type:application/json' -d'
{"content":"公安部：各地校车将享最高路权"}
'
curl -XPOST http://localhost:9200/index/fulltext/3 -H 'Content-Type:application/json' -d'
{"content":"中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"}
'
curl -XPOST http://localhost:9200/index/fulltext/4 -H 'Content-Type:application/json' -d'
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
'
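The four requests above can also be sent as one _bulk request, whose body is newline-delimited JSON: an action line followed by the document source. A minimal sketch; build_bulk_body is a hypothetical helper, and it does no JSON escaping, so it assumes the texts contain no double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a _bulk body that indexes each remaining
# argument as {"content": ...} with ids 1, 2, 3, ...
# Assumes the texts need no JSON escaping (no '"' or '\').
# Usage: build_bulk_body <index> <type> <text>...
build_bulk_body() {
  local index="$1" doc_type="$2" id=1 text
  shift 2
  for text in "$@"; do
    # action line, then the document source on its own line
    printf '{"index":{"_index":"%s","_type":"%s","_id":"%s"}}\n' "$index" "$doc_type" "$id"
    printf '{"content":"%s"}\n' "$text"
    id=$((id + 1))
  done
}

# The body must end with a newline; --data-binary keeps the newlines
# that -d @- would strip.
# build_bulk_body index fulltext \
#   "美国留给伊拉克的是个烂摊子吗" \
#   "公安部：各地校车将享最高路权" \
#   "中韩渔警冲突调查：韩警平均每天扣1艘中国渔船" \
#   "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首" |
#   curl -s -XPOST http://localhost:9200/_bulk \
#     -H 'Content-Type: application/x-ndjson' --data-binary @-
```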

query with highlighting

curl -XPOST http://localhost:9200/index/fulltext/_search  -H 'Content-Type:application/json' -d'
{
    "query" : { "match" : { "content" : "中国" }},
    "highlight" : {
        "pre_tags" : ["<tag1>", "<tag2>"],
        "post_tags" : ["</tag1>", "</tag2>"],
        "fields" : {
            "content" : {}
        }
    }
}
'

result

{
    "took": 14,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 2,
        "hits": [
            {
                "_index": "index",
                "_type": "fulltext",
                "_id": "4",
                "_score": 2,
                "_source": {
                    "content": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
                },
                "highlight": {
                    "content": [
                        "<tag1>中国</tag1>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首 "
                    ]
                }
            },
            {
                "_index": "index",
                "_type": "fulltext",
                "_id": "3",
                "_score": 2,
                "_source": {
                    "content": "中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"
                },
                "highlight": {
                    "content": [
                        "均每天扣1艘<tag1>中国</tag1>渔船 "
                    ]
                }
                    ]
                }
            }
        ]
    }
}

Appendix A

Below is the official text on these upcoming major changes:

Schedule for removal of mapping types

This is a big change for our users, so we have tried to make it as painless as possible. The change will roll out as follows:

Elasticsearch 5.6.0

Setting index.mapping.single_type: true on an index will enable the single-type-per-index behaviour which will be enforced in 6.0.
The join field replacement for parent-child is available on indices created in 5.6.

Elasticsearch 6.x

Indices created in 5.x will continue to function in 6.x as they did in 5.x.

Indices created in 6.x only allow a single-type per index. Any name can be used for the type, but there can be only one. The preferred type name is _doc, so that index APIs have the same path as they will have in 7.0: PUT {index}/_doc/{id} and POST {index}/_doc

The _type name can no longer be combined with the _id to form the _uid field. The _uid field has become an alias for the _id field.

New indices no longer support the old-style of parent/child and should use the join field instead.
The default mapping type is deprecated.

Elasticsearch 7.x

The type parameter in URLs are deprecated. For instance, indexing a document no longer requires a document type. The new index APIs are PUT {index}/_doc/{id} in case of explicit ids and POST {index}/_doc for auto-generated ids.

The index creation, GET|PUT _mapping and document APIs support a query string parameter (include_type_name) which indicates whether requests and responses should include a type name. It defaults to true. 7.x indices which don’t have an explicit type will use the dummy type name _doc. Not setting include_type_name=false will result in a deprecation warning.

The default mapping type is removed.

Elasticsearch 8.x

The type parameter is no longer supported in URLs.

The include_type_name parameter is deprecated, default to false and fails the request when set to true.

Elasticsearch 9.x

The include_type_name parameter is removed.

Source: Elasticsearch Reference, "Removal of mapping types"

The end!