Installing Elasticsearch on Linux

Preparation

Create the elk user

# create a new user named elk
$ useradd elk
$ passwd elk
  enter password

# create the elk directory and make elk its owner
$ cd /opt
$ mkdir elk
$ chown elk:elk elk

# switch to the elk user
$ su - elk
  enter password

elasticsearch

Installation

$ cd /opt/elk
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.10.tar.gz

$ mkdir elasticsearch && cd elasticsearch
$ tar -zxvf ../elasticsearch-5.6.10.tar.gz

$ ln -s elasticsearch-5.6.10/ default
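
At this point the directory should contain the extracted release plus the default symlink pointing at it (a quick sanity check, not part of the original steps):

$ ls /opt/elk/elasticsearch
default  elasticsearch-5.6.10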

Configure environment variables

Edit the file /etc/profile

$ su -
  enter root password
$ vim /etc/profile

Add the following at the end of the file:

# elasticsearch environment variables
export ES_HOME=/opt/elk/elasticsearch/default
export PATH=$ES_HOME/bin:$PATH
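
After saving the file, the change can be verified in a new login shell (a quick sanity check, not part of the original steps; the expected path follows from ES_HOME above):

$ which elasticsearch
/opt/elk/elasticsearch/default/bin/elasticsearch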

Install the Chinese (IK) analysis plugin

$ source /etc/profile
$ elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.10/elasticsearch-analysis-ik-5.6.10.zip
-> Downloading https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.10/elasticsearch-analysis-ik-5.6.10.zip
[=================================================] 100%
-> Installed analysis-ik
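
To double-check that the plugin was installed, list the installed plugins (a quick verification, not in the original; the list should include analysis-ik):

$ elasticsearch-plugin list
analysis-ik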

The IK plugin offers two analyzers with different segmentation granularities: ik_max_word and ik_smart.

Difference between ik_max_word and ik_smart

  • ik_max_word: splits the text at the finest possible granularity. For example, “中华人民共和国国歌” is split into “中华人民共和国,中华人民,中华,华人,人民共和国,人民,人,民,共和国,共和,和,国国,国歌”, exhausting every possible combination.

  • ik_smart: splits the text at the coarsest granularity. For example, “中华人民共和国国歌” is split into “中华人民共和国,国歌”.

Testing the difference between ik_max_word and ik_smart

Note: make sure Elasticsearch is already running.

ik_max_word
curl -X POST "http://127.0.0.1:9200/_analyze?pretty" -H 'Content-Type: application/json' -d '{
    "analyzer": "ik_max_word",
    "text":"中华人民共和国国歌"
}'
{
  "tokens" : [
    {
      "token" : "中华人民共和国",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "中华人民",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "中华",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "华人",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "人民共和国",
      "start_offset" : 2,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "人民",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "共和国",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "共和",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "国",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "CN_CHAR",
      "position" : 8
    },
    {
      "token" : "国歌",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 9
    }
  ]
}
ik_smart
curl -X POST "http://127.0.0.1:9200/_analyze?pretty" -H 'Content-Type: application/json' -d '{
    "analyzer": "ik_smart",
    "text":"中华人民共和国国歌"
}'
{
  "tokens" : [
    {
      "token" : "中华人民共和国",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "国歌",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 1
    }
  ]
}

Note: the trailing pretty parameter formats the response so it is easier to read; without it the result comes back as one long string.

quick example

Start Elasticsearch

$ elasticsearch
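
If you prefer not to tie up a terminal, Elasticsearch can also be started as a daemon; -d daemonizes the process and -p writes a pid file (an optional aside, not part of the original steps):

$ elasticsearch -d -p /tmp/elasticsearch.pid

# confirm the node is responding
$ curl "http://127.0.0.1:9200?pretty"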

The following examples are taken from the IK plugin's README on GitHub.

create an index

curl -XPUT http://localhost:9200/index

create a mapping

curl -XPOST http://localhost:9200/index/fulltext/_mapping -H 'Content-Type:application/json' -d'
{
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_max_word"
            }
        }
}'

When creating the index type, it is best to use _doc, i.e. replace fulltext above with _doc. Starting with 6.x an index can only have a single type, so the type part of the request URL becomes _doc, for example PUT {index}/_doc/{id} and POST {index}/_doc, which is the same format used in 7.x. In 7.x the type is removed, although "removed" may not be quite accurate: for users the type concept is blurred and is no longer as concrete as it used to be (a type mapping to a table in an RDBMS). For the reasoning behind this change, see: Why are mapping types being removed?
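
As a concrete illustration of those URL shapes (a hedged sketch for 6.x or later, not for the 5.6.10 release installed above, which keeps using the fulltext type from the README):

# 6.x+: index a document under the _doc type name with an explicit id
curl -XPUT http://localhost:9200/index/_doc/1 -H 'Content-Type:application/json' -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}
'
# 6.x+: let elasticsearch generate the id
curl -XPOST http://localhost:9200/index/_doc -H 'Content-Type:application/json' -d'
{"content":"公安部:各地校车将享最高路权"}
'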

If you are interested in the major changes coming in upcoming Elasticsearch versions, see Appendix A of this article.

index some docs

curl -XPOST http://localhost:9200/index/fulltext/1 -H 'Content-Type:application/json' -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}
'
curl -XPOST http://localhost:9200/index/fulltext/2 -H 'Content-Type:application/json' -d'
{"content":"公安部:各地校车将享最高路权"}
'
curl -XPOST http://localhost:9200/index/fulltext/3 -H 'Content-Type:application/json' -d'
{"content":"中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"}
'
curl -XPOST http://localhost:9200/index/fulltext/4 -H 'Content-Type:application/json' -d'
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
'
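
To confirm the documents were indexed, any of them can be fetched back by id (a quick check, not part of the README example):

curl -XGET 'http://localhost:9200/index/fulltext/1?pretty'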

query with highlighting

curl -XPOST http://localhost:9200/index/fulltext/_search  -H 'Content-Type:application/json' -d'
{
    "query" : { "match" : { "content" : "中国" }},
    "highlight" : {
        "pre_tags" : ["", ""],
        "post_tags" : ["", ""],
        "fields" : {
            "content" : {}
        }
    }
}
'

result

{
    "took": 14,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 2,
        "hits": [
            {
                "_index": "index",
                "_type": "fulltext",
                "_id": "4",
                "_score": 2,
                "_source": {
                    "content": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
                },
                "highlight": {
                    "content": [
                        "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首 "
                    ]
                }
            },
            {
                "_index": "index",
                "_type": "fulltext",
                "_id": "3",
                "_score": 2,
                "_source": {
                    "content": "中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"
                },
                "highlight": {
                    "content": [
                        "均每天扣1艘中国渔船 "
                    ]
                }
            }
        ]
    }
}
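
When you are done experimenting, the test index can be removed (optional cleanup, not part of the README example):

curl -XDELETE http://localhost:9200/index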





Appendix A

Below is the original official text about the major changes in upcoming versions:

Schedule for removal of mapping types

This is a big change for our users, so we have tried to make it as painless as possible. The change will roll out as follows:

Elasticsearch 5.6.0

  • Setting index.mapping.single_type: true on an index will enable the single-type-per-index behaviour which will be enforced in 6.0.
  • The join field replacement for parent-child is available on indices created in 5.6.

Elasticsearch 6.x

  • Indices created in 5.x will continue to function in 6.x as they did in 5.x.
  • Indices created in 6.x only allow a single-type per index. Any name can be used for the type, but there can be only one. The preferred type name is _doc, so that index APIs have the same path as they will have in 7.0: PUT {index}/_doc/{id} and POST {index}/_doc
  • The _type name can no longer be combined with the _id to form the _uid field. The _uid field has become an alias for the _id field.
  • New indices no longer support the old-style of parent/child and should use the join field instead.
  • The default mapping type is deprecated.

Elasticsearch 7.x

  • The type parameter in URLs are deprecated. For instance, indexing a document no longer requires a document type. The new index APIs are PUT {index}/_doc/{id} in case of explicit ids and POST {index}/_doc for auto-generated ids.
  • The index creation, GET|PUT _mapping and document APIs support a query string parameter (include_type_name) which indicates whether requests and responses should include a type name. It defaults to true. 7.x indices which don’t have an explicit type will use the dummy type name _doc. Not setting include_type_name=false will result in a deprecation warning.
  • The default mapping type is removed.

Elasticsearch 8.x

  • The type parameter is no longer supported in URLs.
  • The include_type_name parameter is deprecated, default to false and fails the request when set to true.

Elasticsearch 9.x

  • The include_type_name parameter is removed.

Original source

Done!!!
