ElasticSearch(以下简称ES)是一个基于Apache Lucene(TM)的开源搜索引擎。无论在开源还是专有领域,Lucene可以被认为是迄今为止最先进、性能最好的、功能最全的搜索引擎库。其使用Java开发并使用Lucene作为其核心来实现所有索引和搜索的功能,但是它的目的是通过简单的RESTful API来隐藏Lucene的复杂性,从而让全文搜索变得简单。
一、安装与启动(windows)
首先在官网下载zip包,下载地址:https://www.elastic.co/downloads/elasticsearch#ga-release,下载后解压,启动bin目录下的elasticsearch.bat,ElasticSearch便启动了。这时在浏览器中输入网址http://localhost:9200/?pretty,可以看到一个Json(如下),显示的是ES的版本等信息。
{
"name": "x62D3ht",
"cluster_name": "elasticsearch",
"cluster_uuid": "yDPE_WTBQE6Hp5ZBydgjSw",
"version": {
"number": "5.6.2",
"build_hash": "57e20f3",
"build_date": "2017-09-23T13:16:45.703Z",
"build_snapshot": false,
"lucene_version": "6.6.1"
},
"tagline": "You Know, for Search"
}
二、索引(index)与查询
在Elasticsearch中存储数据的行为就叫做索引(indexing),不过在索引之前,我们需要明确数据应该存储在哪里。在Elasticsearch中,文档归属于一种类型(type),而这些类型存在于索引(index)中,我们可以拿ES和传统关系型数据库做一个对比:
传统数据库 | ES | 说明 |
---|---|---|
Databases | Indices | 数据库 |
Tables | Types | 表 |
Rows | Documents | 记录 |
Columns | Fields | 字段 |
Elasticsearch集群可以包含多个索引(indices)(数据库),每一个索引可以包含多个类型(types)(表),每一个类型包含多个文档(documents)(行),然后每个文档包含多个字段(Fields)(列)。
在这里要特别说明一下索引(index)在ES中的不同含义。
- 索引(名词) 如上文所述,一个索引(index)就像是传统关系数据库中的数据库,它是相关文档存储的地方,index的复数是indices 或indexes。
- 索引(动词) 「索引一个文档」表示把一个文档存储到索引(名词)里,以便它可以被检索或者查询。这很像SQL中的INSERT关键字,差别是,如果文档已经存在,新的文档将覆盖旧的文档。
- 倒排索引 传统数据库为特定列增加一个索引,例如B-Tree索引来加速检索。Elasticsearch和Lucene使用一种叫做倒排索引(inverted index)的数据结构来达到相同目的。
索引
接下来我们通过建立一个员工目录,并对其进行索引和搜索(可以使用Postman发送请求),首先我们要创建员工目录,大概有如下操作:
- 为每个员工的文档(document)建立索引,每个文档包含了相应员工的所有信息。
- 每个文档的类型为employee。
- employee类型归属于索引megacorp。
- megacorp索引存储在Elasticsearch集群中。
我们只需要一个命令就能完成这些操作:
在Postman中发送PUT请求:localhost:9200//megacorp/employee/1
在body中加入如下参数(Json格式):
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
发送请求后就会将一条员工记录加入到ES中,在Postman中发送GET请求:localhost:9200//megacorp/employee/1就会查询到这一条记录。返回信息如下:
{
"_index": "megacorp",
"_type": "employee",
"_id": "1",
"_version": 1,
"found": true,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [
"sports",
"music"
]
}
}
接下来,让我们在目录中加入更多员工信息:
发送PUT请求:localhost:9200//megacorp/employee/2,并设置body索引第二个员工文档。
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
发送PUT请求:localhost:9200//megacorp/employee/3,并设置body索引第三个员工文档。
{
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}
搜索
上边我们录入了3条员工信息,可以通过如下请求搜索全部员工。
发送GET请求:localhost:9200//megacorp/employee/_search
返回信息如下:
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "megacorp",
"_type": "employee",
"_id": "2",
"_score": 1,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [
"music"
]
}
},
{
"_index": "megacorp",
"_type": "employee",
"_id": "1",
"_score": 1,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [
"sports",
"music"
]
}
},
{
"_index": "megacorp",
"_type": "employee",
"_id": "3",
"_score": 1,
"_source": {
"first_name": "Douglas",
"last_name": "Fir",
"age": 35,
"about": "I like to build cabinets",
"interests": [
"forestry"
]
}
}
]
}
}
可以看到我们使用_search代替原来的文档id,响应内容的数组中包含所有的3个文档,默认情况下此搜索会返回前10条结果。
查询字符串
查询字符串就像传递URL参数一样去传递查询语句,比如查询last_name为"Smith"的文档,可以发送GET请求:localhost:9200//megacorp/employee/_search?q=last_name:Smith
返回的结果如下:
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.2876821,
"hits": [
{
"_index": "megacorp",
"_type": "employee",
"_id": "2",
"_score": 0.2876821,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [
"music"
]
}
},
{
"_index": "megacorp",
"_type": "employee",
"_id": "1",
"_score": 0.2876821,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [
"sports",
"music"
]
}
}
]
}
}
DSL语句查询
查询字符串便于通过命令进行特定的查询,但是也有一定的局限性,ES提供的更加强大的查询语言(DSL查询),DSL是以Json作为请求体进行查询,这样上面的查询可以使用如下方法:
发送POST请求:localhost:9200//megacorp/employee/_search,并设置body参数:
{
"query" : {
"match" : {
"last_name" : "Smith"
}
}
}
返回的结果与之前用查询字符串查询的结果一样,
更复杂的搜索
eg.查询last_name为"smith" 并且年龄大于30的员工,发送POST请求:localhost:9200//megacorp/employee/_search,设置如下body参数:
{
"query": {
"bool": {
"filter": {
"range": {
"age": {"gt": 30}
}
},
"must": {
"match": {"last_name": "Smith"}
}
}
}
}
响应的内容为:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.2876821,
"hits": [
{
"_index": "megacorp",
"_type": "employee",
"_id": "2",
"_score": 0.2876821,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [
"music"
]
}
}
]
}
}