Learning Elasticsearch (2): Installation and First Steps

Reference documentation

1 Installation

1.1 Check the Java version

  1. ES 6.0.1, the latest release at the time of writing, requires at least Java 8
  2. The manual recommends Oracle JDK version 1.8.0_131
java -version
echo $JAVA_HOME
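
The version check above can also be scripted. The following is a minimal sketch, not part of the original tutorial: the function name `java_major_version` and the sample output string are assumptions for illustration. It handles both the legacy `1.8.0_131` numbering and the newer `11.0.2` numbering.

```python
import re

def java_major_version(version_output: str) -> int:
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    first, second = int(match.group(1)), match.group(2)
    # Legacy scheme: "1.8.0_131" means Java 8; newer scheme: "11.0.2" means Java 11.
    if first == 1 and second is not None:
        return int(second)
    return first

# Hypothetical sample; note that `java -version` writes to stderr, not stdout.
sample = 'java version "1.8.0_131"\nJava(TM) SE Runtime Environment (build 1.8.0_131-b11)'
print(java_major_version(sample) >= 8)
```

In a real script the text would come from running `java -version` and capturing its stderr.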

1.2 Installing on Linux

curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.0.1.tar.gz
tar -xvf elasticsearch-6.0.1.tar.gz
cd elasticsearch-6.0.1/bin
./elasticsearch

1.3 Installing on Windows

  1. Download the installer package
  2. Run the installation through the GUI

1.4 Starting the cluster

# Windows cmd
cd %PROGRAMFILES%\Elastic\Elasticsearch\bin
# PowerShell
cd $env:PROGRAMFILES\Elastic\Elasticsearch\bin
.\elasticsearch.exe
# Set the cluster name and node name at startup
./elasticsearch -Ecluster.name=my_cluster_name -Enode.name=my_node_name

2 Exploring the cluster

2.1 REST API

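What the REST API lets you do:

  1. Check cluster, node, and index health, status, and statistics
  2. Administer cluster, node, and index data and metadata
  3. Perform CRUD and search operations against your indexes
  4. Run advanced search operations such as paging, sorting, filtering, scripting, and aggregations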

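2.2 Installing Kibana

Reference documentation

  1. Download and unpack Kibana
  2. Edit config/kibana.yml and set elasticsearch.url to point at your ES instance
  3. Run Kibana
# Linux
bin/kibana
# Windows
bin\kibana.bat
  4. Open http://localhost:5601 in a browser
  5. See the Kibana User Guide for further details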

2.3 Cluster health

2.3.1 Checking cluster health

GET /_cat/health?v

The same query can be run with Postman
To get the result back as JSON:

GET 127.0.0.1:9200/_cat/health?format=json&pretty
# response
[
    {
        "epoch": "1541249930",
        "timestamp": "20:58:50",
        "cluster": "elasticsearch",
        "status": "green",
        "node.total": "1",
        "node.data": "1",
        "shards": "0",
        "pri": "0",
        "relo": "0",
        "init": "0",
        "unassign": "0",
        "pending_tasks": "0",
        "max_task_wait_time": "-",
        "active_shards_percent": "100.0%"
    }
]

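The three health states:

  1. green: everything is fine, the cluster is fully functional
  2. yellow: all data is available but some replicas are not yet allocated; the cluster is still fully functional
  3. red: some data is unavailable; the cluster is only partially functional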
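
As a small illustration of reading the health response, here is a hedged sketch that parses a `_cat/health?format=json` payload and turns the status into a one-line summary. The helper name `summarize_health` and the trimmed sample payload are assumptions; the field names are taken from the response shown above.

```python
import json

HEALTH_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primaries are allocated, some replicas are not",
    "red": "at least one primary shard is not allocated",
}

def summarize_health(cat_health_json: str) -> str:
    """Summarize the first row of a _cat/health JSON response."""
    row = json.loads(cat_health_json)[0]
    status = row["status"]
    return f'{row["cluster"]}: {status} ({HEALTH_MEANING[status]}), {row["node.total"]} node(s)'

# Trimmed copy of the response above; _cat APIs return every value as a string.
sample = '[{"cluster": "elasticsearch", "status": "green", "node.total": "1", "unassign": "0"}]'
print(summarize_health(sample))
```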

2.3.2 Listing the nodes in the cluster

GET /_cat/nodes?v
GET 127.0.0.1:9200/_cat/nodes?format=json&pretty
# response
[
    {
        "ip": "127.0.0.1",
        "heap.percent": "11",
        "ram.percent": "41",
        "cpu": "8",
        "load_1m": null,
        "load_5m": null,
        "load_15m": null,
        "node.role": "mdi",
        "master": "*",
        "name": "my_first_node"
    }
]
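
The `node.role` and `master` columns are compact codes. A minimal sketch of decoding them, assuming the 6.x single-letter flags (`m` master-eligible, `d` data, `i` ingest); the helper names are made up for this example:

```python
ROLE_FLAGS = {"m": "master-eligible", "d": "data", "i": "ingest"}

def decode_roles(node_role: str) -> list:
    """Expand a node.role string such as "mdi" into readable role names."""
    return [ROLE_FLAGS.get(flag, f"unknown({flag})") for flag in node_role]

def is_elected_master(row: dict) -> bool:
    """In _cat/nodes, "*" marks the elected master and "-" any other node."""
    return row["master"] == "*"

row = {"node.role": "mdi", "master": "*", "name": "my_first_node"}
print(row["name"], decode_roles(row["node.role"]), is_elected_master(row))
```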

2.4 Listing all indices

GET /_cat/indices?v

2.5 Creating an index

Create an index named customer, then list all indices

PUT /customer?pretty
GET /_cat/indices?v
PUT 127.0.0.1:9200/customer?pretty
# response
{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "customer"
}
GET 127.0.0.1:9200/_cat/indices?format=json&pretty
# response
[
    {
        "health": "yellow",	# only one node exists, so the replica cannot be allocated, hence yellow
        "status": "open",
        "index": "customer",
        "uuid": "BpesQm0kRhWBauTfht4UZg",
        "pri": "5",					# 5 primary shards
        "rep": "1",					# 1 replica
        "docs.count": "0",	# 0 documents
        "docs.deleted": "0",
        "store.size": "1.1kb",
        "pri.store.size": "1.1kb"
    }
]
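
Why the index is yellow can be worked out with simple arithmetic. The sketch below is a deliberately simplified model (it only encodes the rule that a node never holds two copies of the same shard and ignores every other allocation setting); the function name is hypothetical.

```python
def shard_allocation(primaries: int, replicas: int, data_nodes: int) -> dict:
    """Toy model: copies of the same shard must live on different nodes."""
    copies_per_shard = 1 + replicas
    placeable_per_shard = min(copies_per_shard, data_nodes)
    total = primaries * copies_per_shard
    assigned = primaries * placeable_per_shard
    if placeable_per_shard == 0:
        status = "red"       # not even the primaries can be placed
    elif assigned < total:
        status = "yellow"    # primaries placed, some replicas are not
    else:
        status = "green"
    return {"total": total, "assigned": assigned,
            "unassigned": total - assigned, "status": status}

print(shard_allocation(primaries=5, replicas=1, data_nodes=1))
print(shard_allocation(primaries=5, replicas=1, data_nodes=2))
```

With 5 primaries, 1 replica, and a single node, 5 of the 10 shard copies stay unassigned, which is the yellow state shown above; in this model a second node is enough to turn the index green.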

2.5 Indexing and querying a document

2.5.1 index

Index a customer document with ID 1 into the customer index

PUT /customer/_doc/1?pretty
{
  "name": "John Doe"
}
# response
{
    "_index": "customer",
    "_type": "doc",
    "_id": "1",
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 0,
    "_primary_term": 1
}

2.5.2 query

GET /customer/_doc/1?pretty
# response
{
    "_index": "customer",
    "_type": "doc",
    "_id": "1",
    "_version": 1,
    "found": true,
    "_source": {								# the full JSON document is returned
        "name": "John Doe"
    }
}

2.6 Deleting an index

DELETE /customer?pretty
# response
{
    "acknowledged": true
}
GET /_cat/indices?v
# response
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size

2.7 The pattern for accessing data in ES

<REST Verb> /<Index>/<Type>/<ID>
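
The document requests in this section all share one shape: an HTTP verb followed by a path made of index, type, and document ID. A minimal sketch of assembling such a request line, assuming a hypothetical helper `build_request` (not an Elasticsearch client API):

```python
from urllib.parse import quote, urlencode

def build_request(verb, index, doc_type=None, doc_id=None, **params):
    """Return a console-style request line such as "PUT /customer/_doc/1?pretty=true"."""
    parts = [p for p in (index, doc_type, doc_id) if p is not None]
    path = "/" + "/".join(quote(str(p), safe="") for p in parts)
    query = urlencode(params)
    return f"{verb.upper()} {path}" + (f"?{query}" if query else "")

print(build_request("put", "customer", "_doc", 1, pretty="true"))
print(build_request("delete", "customer"))
```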

3 Modifying data

3.1 Modifying data: replacing a document

A PUT to the ID of an existing document makes ES replace that document with the new body

# Create the index and a document
PUT /customer
PUT /customer/_doc/1?pretty
{
  "name": "John Doe"
}
GET  /customer/_doc/1?pretty
# Replace the previous document
PUT /customer/_doc/1?pretty
{
  "name": "Tom Hu"
}
GET  /customer/_doc/1?pretty
# Create a new document
PUT /customer/_doc/2?pretty
{
  "name": "Yaping Leaf"
}

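The ID is optional when creating a document: if none is given, ES generates a random one. In that case the request uses POST instead of PUT, since no ID is specified.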

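3.2 Modifying data: updating a document

  1. Besides indexing and replacing documents, you can also update them
  2. An update is not performed in place: ES deletes the old document and indexes a new one with the update applied
# Change the name and add an age field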
POST /customer/_doc/1/_update?pretty
{
	"doc":{"name":"Xiaotong Who","age":20}
}
  3. Updates can also use simple scripts
    ctx._source refers to the source document that is about to be updated
# Increase the age by 5
POST /customer/_doc/1/_update?pretty
{
	"script":"ctx._source.age += 5"
}
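
The difference between replacing a document (3.1) and updating it (3.2) can be modeled with a dictionary. This is a toy in-memory sketch, not how ES stores data: `put_document` and `update_document` are hypothetical names, and the merge is shallow, which is a simplification.

```python
import copy

def put_document(store, doc_id, body):
    """Replace: the stored source becomes exactly `body` and the version goes up."""
    version = store.get(doc_id, {}).get("_version", 0) + 1
    store[doc_id] = {"_version": version, "_source": copy.deepcopy(body)}

def update_document(store, doc_id, partial):
    """Partial update: merge into the current source, then store the result as a new version."""
    merged = {**store[doc_id]["_source"], **partial}
    put_document(store, doc_id, merged)

store = {}
put_document(store, "1", {"name": "John Doe"})
put_document(store, "1", {"name": "Tom Hu"})                       # replaces the whole document
update_document(store, "1", {"name": "Xiaotong Who", "age": 20})   # merges fields
# Local equivalent of the script "ctx._source.age += 5"
update_document(store, "1", {"age": store["1"]["_source"]["age"] + 5})
print(store["1"])
```

Every write, whether a replace or an update, produces a new version of the whole document, which mirrors the delete-then-reindex behavior described above.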

3.3 Deleting documents

DELETE /customer/_doc/2?pretty

3.4 Batch processing

# Index two documents in one request
POST /customer/_doc/_bulk?pretty
{"index":{"_id":"1"}}
{"name": "Tom Hu" }
{"index":{"_id":"2"}}
{"name": "Yaping Leaf" }
# Update one document and delete another
POST /customer/_doc/_bulk?pretty
{"update":{"_id":"1"}}
{"doc": { "name": "Tom Hu becomes Xiaotong Who" } }
{"delete":{"_id":"2"}}
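
The bulk body is newline-delimited JSON: one action line, optionally followed by one payload line, and the whole body must end with a newline. A minimal sketch of building it, assuming a hypothetical helper `bulk_body` (sending it over HTTP is left out):

```python
import json

def bulk_body(operations):
    """Build an NDJSON bulk body from (action, metadata, payload-or-None) tuples."""
    lines = []
    for action, meta, payload in operations:
        lines.append(json.dumps({action: meta}))
        if payload is not None:          # delete actions carry no payload line
            lines.append(json.dumps(payload))
    return "\n".join(lines) + "\n"       # the trailing newline is required

body = bulk_body([
    ("update", {"_id": "1"}, {"doc": {"name": "Tom Hu becomes Xiaotong Who"}}),
    ("delete", {"_id": "2"}, None),
])
print(body, end="")
```

When posting this body, the request's Content-Type should be application/x-ndjson.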

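4 Exploring data

4.1 The search API

4.1.1 Sending requests

Sending the request via the URL

GET /bank/_search?q=*&sort=account_number:asc&pretty
  1. Uses the _search endpoint
  2. The q=* parameter matches all documents in the index
  3. The sort=account_number:asc parameter sorts the results by the account_number field in ascending order
  4. The pretty parameter makes ES pretty-print the JSON response so it is easier to read
# response (partial)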
{
  "took" : 63,				# the query took 63 ms
  "timed_out" : false,		# the query did not time out
  "_shards" : {				# 5 shards were searched: 5 succeeded, 0 skipped, 0 failed
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {				# search results
    "total" : 1000,			# total number of documents matching the query
    "max_score" : null,
    "hits" : [ {			# the documents actually returned (the first 10 by default)
      "_index" : "bank",
      "_type" : "_doc",
      "_id" : "0",
      "sort": [0],			# sort key
      "_score" : null,
      "_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
    }, {
      "_index" : "bank",
      "_type" : "_doc",
      "_id" : "1",
      "sort": [1],
      "_score" : null,
      "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
    }, ...
    ]
  }
}
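
To show how the parts of the search response fit together, here is a hedged sketch that pulls the key numbers out of a response that has already been parsed into a dict. The helper `summarize_search` and the trimmed sample are assumptions. `hits.total` is a plain integer in 6.x (as above) and an object with a `value` field from 7.0 on, so the sketch accepts both.

```python
def summarize_search(response: dict) -> dict:
    """Extract timing, total hit count, and returned document IDs from a search response."""
    hits = response["hits"]
    total = hits["total"]
    if isinstance(total, dict):          # 7.x shape: {"value": ..., "relation": ...}
        total = total["value"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "total": total,
        "returned_ids": [hit["_id"] for hit in hits["hits"]],
    }

# Trimmed version of the response above.
sample = {
    "took": 63, "timed_out": False,
    "hits": {"total": 1000, "hits": [{"_id": "0", "sort": [0]}, {"_id": "1", "sort": [1]}]},
}
print(summarize_search(sample))
```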

Sending the request via the request body

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}

4.2 The query language

Example 1: a paged query

GET /bank/_search
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 10
}
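  1. query: defines the query
  2. match_all: matches every document
  3. from: the 0-based offset of the first document to return, so 10 skips the first ten documents (default 0)
  4. size: the number of documents to return (default 10)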
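
Because from is a 0-based offset, turning a page number into from/size is a one-line calculation. A minimal sketch with a hypothetical helper name:

```python
def page_params(page: int, page_size: int = 10) -> dict:
    """Translate a 1-based page number into the from/size pair of a search body."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}

body = {"query": {"match_all": {}}, **page_params(page=2)}
print(body)
```

Page 2 with a page size of 10 yields exactly the from/size values used in Example 1. By default ES rejects requests where from + size exceeds 10000 (the index.max_result_window setting), so this style is only suitable for shallow paging.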

Example 2: sort by the balance field in descending order

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": { "balance": { "order": "desc" } }
}

4.3 Running searches

4.3.1 Returning only some fields of a document

GET /bank/_search
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"] 	# return only the account_number and balance fields
}

4.3.2 Conditional queries

Return the document whose account_number is 20

GET /bank/_search
{
  "query": { "match": { "account_number": 20 } }
}

Return the documents whose address field contains the term "mill" or the term "lane"

GET /bank/_search
{
  "query": { "match": { "address": "mill lane" } }
}

Return the documents whose address field contains the phrase "mill lane"

GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
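
The difference between match and match_phrase can be imitated locally. This is a toy model, not the ES implementation: it lowercases and splits on whitespace instead of running a real analyzer, ignores scoring, and the function names and the first two sample addresses are made up.

```python
def tokenize(text: str) -> list:
    """Crude stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def match(field_value: str, query: str) -> bool:
    """match: true if ANY query term occurs in the field (OR semantics)."""
    tokens = set(tokenize(field_value))
    return any(term in tokens for term in tokenize(query))

def match_phrase(field_value: str, query: str) -> bool:
    """match_phrase: true only if the query terms occur consecutively and in order."""
    tokens, phrase = tokenize(field_value), tokenize(query)
    return any(tokens[i:i + len(phrase)] == phrase
               for i in range(len(tokens) - len(phrase) + 1))

addresses = ["990 Mill Road", "198 Mill Lane", "880 Holmes Lane"]
print([a for a in addresses if match(a, "mill lane")])
print([a for a in addresses if match_phrase(a, "mill lane")])
```

In this model all three addresses satisfy match, while only "198 Mill Lane" satisfies match_phrase.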

4.3.3 Bool queries

4.3.3.1 The must clause: every match query must be true for a document to match

Find the documents whose address field contains both "mill" and "lane"

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

4.3.3.2 The should clause: a document matches if at least one match query is true

Find the documents whose address field contains "mill" or "lane"

GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

4.3.3.3 The must_not clause: a document matches only if every match query is false

Find the documents whose address field contains neither "mill" nor "lane"

GET /bank/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

4.3.3.4 Combining bool clauses

Find the accounts whose age is 40 and whose state is not ID:

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
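
The way must, should, and must_not combine can be written out as plain boolean logic. The sketch below is a simplified local model (no scoring, no minimum_should_match tuning); `field_is` and `bool_matches` are hypothetical names and the three accounts are made-up sample data.

```python
def field_is(field, value):
    """Toy stand-in for a match clause on an exact value."""
    return lambda doc: doc.get(field) == value

def bool_matches(doc, must=(), should=(), must_not=()):
    """Return True if `doc` satisfies the combined clauses."""
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # With no must clause present, at least one should clause has to match.
    if should and not must:
        return any(clause(doc) for clause in should)
    return True

accounts = [
    {"account_number": 1, "age": 40, "state": "ID"},
    {"account_number": 2, "age": 40, "state": "TX"},
    {"account_number": 3, "age": 32, "state": "IL"},
]
selected = [a["account_number"] for a in accounts
            if bool_matches(a, must=[field_is("age", 40)],
                            must_not=[field_is("state", "ID")])]
print(selected)
```

Only account 2 is selected: account 1 is excluded by must_not and account 3 fails must.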

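4.4 Running filters

4.4.1 The document score (the _score field)

  1. The score is a numeric value that indicates how well a document matches the query: the higher the score, the more relevant the document; the lower the score, the less relevant.
  2. Queries do not always need to produce scores, in particular when they are only used to filter the document set; ES detects these cases and skips the score computation.
  3. The bool query also supports a filter clause, which restricts the documents matched by the other clauses without changing how scores are computed.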

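4.4.2 The range query

A range query filters documents by a range of values and is typically used on numeric or date fields.
Example: return the accounts with a balance between 20000 and 30000 (inclusive)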

GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },	# match all documents
      "filter": {					# filter clause
        "range": {					# range query
          "balance": {				# the balance field
            "gte": 20000,			# greater than or equal to 20000
            "lte": 30000			# less than or equal to 30000
          }
        }
      }
    }
  }
}
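
The request above can be generated for any field and bounds, and the inclusive gte/lte semantics are easy to check locally. A minimal sketch with hypothetical helper names; the sample balances other than 16623 are invented.

```python
import json

def range_filter_query(field, gte, lte):
    """Build a bool query that matches everything and filters on an inclusive range."""
    return {"query": {"bool": {
        "must": {"match_all": {}},
        "filter": {"range": {field: {"gte": gte, "lte": lte}}},
    }}}

def in_range(doc, field, gte, lte):
    """Local equivalent of the range filter: both bounds are inclusive."""
    return gte <= doc[field] <= lte

print(json.dumps(range_filter_query("balance", 20000, 30000), indent=2))
sample = [{"balance": 16623}, {"balance": 20000}, {"balance": 30001}]
print([d["balance"] for d in sample if in_range(d, "balance", 20000, 30000)])
```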

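4.5 Running aggregations

  1. Aggregations provide functionality similar to SQL GROUP BY and the SQL aggregate functions
  2. A single response can return both the search hits and the aggregation results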

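4.5.1 Group By

Group the accounts by state and return the top 10 states ordered by count descending (the default order).
In SQL this would be:

SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC LIMIT 10;

The equivalent ES request:

GET /bank/_search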
{
  "size": 0,	# do not return the matching documents, only the aggregation results
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}

# response
{
  "took": 29,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped" : 0,
    "failed": 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound": 20,
      "sum_other_doc_count": 770,
      "buckets" : [ {
        "key" : "ID",
        "doc_count" : 27
      }, {
        "key" : "TX",
        "doc_count" : 27
      }, {
        "key" : "AL",
        "doc_count" : 25
      }, {
        "key" : "MD",
        "doc_count" : 25
      }, {
        "key" : "TN",
        "doc_count" : 23
      }, {
        "key" : "MA",
        "doc_count" : 21
      }, {
        "key" : "NC",
        "doc_count" : 21
      }, {
        "key" : "ND",
        "doc_count" : 21
      }, {
        "key" : "ME",
        "doc_count" : 20
      }, {
        "key" : "MO",
        "doc_count" : 20
      } ]
    }
  }
}
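
A terms aggregation is essentially a counted group-by. The sketch below reproduces the idea locally on made-up data; `terms_agg` is a hypothetical helper, and ties are broken by key, which matches the bucket order in the response above.

```python
from collections import Counter

def terms_agg(docs, field, size=10):
    """Count documents per field value and keep the `size` largest buckets."""
    counts = Counter(doc[field] for doc in docs)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top = ordered[:size]
    other = sum(counts.values()) - sum(n for _, n in top)
    return {"buckets": [{"key": k, "doc_count": n} for k, n in top],
            "sum_other_doc_count": other}

# Hypothetical sample: 3 accounts in TX, 2 in ID, 1 in CO.
docs = [{"state": s} for s in ["TX", "ID", "TX", "CO", "ID", "TX"]]
print(terms_agg(docs, "state", size=2))
```

The same bookkeeping holds in the response above: the ten bucket counts add up to 230, and together with sum_other_doc_count (770) they give the 1000 total hits.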

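4.5.2 Nesting an aggregation inside an aggregation

Nesting is typically used to compute further summaries over the aggregated data.
Compute the average balance per state (still only for the top 10 states by account count):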

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

Building on the previous aggregation, order the state buckets by average balance in descending order:

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
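
Ordering the buckets by a nested metric corresponds to a group-by, an average per group, and a sort on that average. A local sketch on made-up accounts (the helper name and the sample balances are assumptions):

```python
from collections import defaultdict

def avg_balance_by_state(accounts, size=10):
    """Group by state, average the balance per group, sort by that average descending."""
    groups = defaultdict(list)
    for account in accounts:
        groups[account["state"]].append(account["balance"])
    buckets = [{"key": state, "doc_count": len(values),
                "average_balance": sum(values) / len(values)}
               for state, values in groups.items()]
    buckets.sort(key=lambda b: b["average_balance"], reverse=True)
    return buckets[:size]

accounts = [
    {"state": "CO", "balance": 16000}, {"state": "CO", "balance": 17000},
    {"state": "IL", "balance": 39225}, {"state": "TX", "balance": 20000},
]
for bucket in avg_balance_by_state(accounts):
    print(bucket)
```

Note that once the buckets are ordered by average balance, the ten buckets returned are the states with the highest averages, which are not necessarily the states with the most accounts.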

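Group by age bracket (20-29, 30-39, 40-49), then by gender within each bracket, and finally compute the average account balance for each gender in each age bracket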

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}
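
The three-level aggregation above can be mirrored with nested loops. This is a local sketch on invented accounts with a hypothetical function name; it follows the range aggregation's convention that from is inclusive and to is exclusive, so an account aged 30 lands in the 30-40 bucket.

```python
def age_gender_avg(accounts, ranges=((20, 30), (30, 40), (40, 50))):
    """Bucket by age range, then by gender, then average the balance per bucket."""
    result = {}
    for low, high in ranges:
        by_gender = {}
        for account in accounts:
            if low <= account["age"] < high:      # from inclusive, to exclusive
                by_gender.setdefault(account["gender"], []).append(account["balance"])
        result[f"{low}-{high}"] = {gender: sum(values) / len(values)
                                   for gender, values in by_gender.items()}
    return result

accounts = [
    {"age": 29, "gender": "F", "balance": 16623},
    {"age": 30, "gender": "M", "balance": 30000},
    {"age": 32, "gender": "M", "balance": 39225},
    {"age": 45, "gender": "F", "balance": 10000},
]
print(age_gender_avg(accounts))
```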
