Search API
- URI Search
  - Pass query parameters in the URL
- Request Body Search
  - Use Elasticsearch's JSON-based, more fully featured Query Domain Specific Language (DSL)
Specifying the index to query
Syntax | Scope |
---|---|
/_search | all indices in the cluster |
/index1/_search | index1 |
/index1,index2/_search | index1 and index2 |
/index*/_search | indices whose names start with index |
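For example, a wildcard scope (a sketch; assumes the Kibana sample indices are present):
GET /kibana*/_search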
URI search
- Use "q" to specify the query string
- "query string syntax": key-value pairs
For example:
curl -XGET "http://elasticsearch:9200/kibana_sample_data_ecommerce/_search?q=customer_first_name:Eddie"
Here kibana_sample_data_ecommerce is the index being queried; q carries the query itself (search for customers named Eddie), with customer_first_name as the key and Eddie as the value.
Request Body
For example:
curl -XGET "http://elasticsearch:9200/kibana_sample_data_ecommerce/_search" -H
'Content-Type: application/json' -d '
{
"query":{
"match_all":{}
}
}'
Here kibana_sample_data_ecommerce is the index name, _search is the search operation, query introduces the query itself, and match_all matches everything, so all documents are returned.
Search Response
{
  "took": 10,                --- took: time taken, in milliseconds
  "timed_out": false,
  "_shards": {
    "total": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 4675,           --- total: number of matching documents
    "max_score": 1,
    "hits": [                --- result set, first 10 documents by default
      {
        "_index": "kibana_sample_data_ecommerce",   --- index name
        "_type": "_doc",
        "_id": "CbLRW2kBi-meog",                    --- document ID
        "_score": 1,                                --- relevance score
        "_source": {                                --- the original document
          "category": ["Men's Clothing"],
          "currency": "EUR",
          "customer_first_name": "Eddie"
        }
      }
    ]
  }
}
URI Search - searching via URI query parameters
GET /movies/_search?q=2012&df=title&sort=year:desc&from=0&size=10&timeout=1s
{
"profile":true
}
// Same as the query above; when df is not used to name the field, write q as k:v
GET /movies/_search?q=title:2012&sort=year:desc&from=0&size=10&timeout=1s
{
"profile":true
}
- q specifies the query string, using Query String Syntax
- df sets the default field; when absent, all fields are queried
- sort for sorting; from and size for pagination
- profile shows how the query was executed
Query String Syntax (1)
- Specific field v.s. generic query
  - q=title:2012 / q=2012
- Term v.s. Phrase
  - Beautiful Mind is equivalent to Beautiful OR Mind (group with parentheses when scoped to a field)
  - "Beautiful Mind" is equivalent to Beautiful AND Mind; a Phrase query also requires the terms to keep their order
- Grouping and quotes
  - title:(Beautiful AND Mind)
  - title="Beautiful Mind"
Query String Syntax (2)
- Boolean operators
  - AND / OR / NOT, or && / || / !
    - Must be uppercase
    - title:(matrix NOT reloaded)
- Grouping
  - + means must (may also be URL-encoded as %2B)
  - - means must_not
  - title:(+matrix -reloaded)
Query String Syntax (3)
- Range queries
  - Interval notation: [] closed interval, {} open interval
  - year:{2019 TO 2018}
  - year:[* TO 2018]
- Arithmetic operators
  - year:>2010
  - year:(>2010 && <=2018)
  - year:(+>2010 +<=2018)
Query String Syntax (4)
- Wildcard queries (inefficient and memory-hungry; not recommended, especially with a leading wildcard)
  - ? matches one character, * matches zero or more
  - title:mi?d
  - title:be*
- Regular expressions
  - title:[bt]oy
- Fuzzy matching and proximity queries (see the sketch after this list)
  - title:befutifl~1
  - title:"lord rings"~2
Request Body Search
- Send the query to Elasticsearch in the HTTP request body
- Query DSL
POST /movies,404_idx/_search?ignore_unavailable=true
{
  "profile": true,
  "_source": ["a", "b", "c"],        --- fields to return; supports wildcards. If the content is not stored in _source, only the document metadata comes back
  "sort": [{"order_date": "desc"}],  --- sorting
  "from": 10,                        --- pagination
  "size": 20,
  "query": {
    "match_all": {}
  }
}
- Prefer sorting on "numeric" and "date" fields (a keyword sub-field sketch follows below)
- For multi-valued or analyzed fields the system sorts on a single value it picks, and you cannot know which one
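Sorting directly on an analyzed text field either fails or gives arbitrary results; a common workaround is its keyword sub-field. A sketch, assuming the sample mapping gives customer_last_name a keyword sub-field:
POST kibana_sample_data_ecommerce/_search
{
  "sort": [{"customer_last_name.keyword": "desc"}],
  "query": {"match_all": {}}
}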
Script fields
GET kibana_sample_data_ecommerce/_search
{
"script_fields": {
"new_field":{
"script": {
"lang": "painless",
"source":"doc['order_date'].value+'_hello'"
}
}
},
"from":10,
"size":5,
"query":{
"match_all":{}
}
}
Use case: orders carry different exchange rates; combine the rate with the order price to sort, or to compute a new value (a script_fields sketch follows below).
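A sketch of that use case with script_fields, assuming a caller-supplied rate in params and the sample's taxful_total_price field:
GET kibana_sample_data_ecommerce/_search
{
  "script_fields": {
    "converted_price": {
      "script": {
        "lang": "painless",
        "source": "doc['taxful_total_price'].value * params.rate",
        "params": {"rate": 1.1}     --- hypothetical exchange rate
      }
    }
  },
  "query": {"match_all": {}}
}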
Using query expressions - Match
// matches documents containing Last or Christmas (or both)
GET /comments/_doc/_search
{
"query":{
"match": {
"comment": "Last Christmas"
}
}
}
// requires both Last and Christmas to appear
GET /comments/_doc/_search
{
"query":{
"match":{
"comment": {
"query":"Last Christmas",
"operator": "AND"
}
}
}
}
Phrase search - Match Phrase
- With slop set to 1, the phrase query tolerates one position of movement between terms, so "Song Last Christmas" can still match (a contrasting sketch without slop follows the example)
GET /comments/_doc/_search
{
"query":{
"match_phrase":{
"comment": {
"query":"SongLast Chrismas",
"slop": 1
}
}
}
}
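For contrast, a plain match_phrase without slop requires the terms to be strictly adjacent and in order:
GET /comments/_doc/_search
{
  "query": {
    "match_phrase": {
      "comment": "Last Christmas"
    }
  }
}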
Query String Query
- Similar to the URI query
POST users/_search
{
"query": {
"query_string" : {
"default_field":"name",
"query":"Ruan AND Yiming"
}
}
}
POST users/_search
{
"query": {
"query_string": {
"fields":["name", "about"],
"query" : "(Ruan AND Yiming) OR (Java AND Elasticsearch)"
}
}
}
Simple Query String Query
- Similar to Query String, but ignores invalid syntax and supports only part of the query syntax
- Does not support AND OR NOT; they are treated as plain terms. Specify default_operator to get the equivalent behavior
- Terms are combined with OR by default; an Operator can be specified
- Supports part of the logic (examples below):
  - + replaces AND
  - | replaces OR
  - - replaces NOT
POST users/_search
{
"query": {
"simple_query_string": {
"query": "Ruan - Yiming",
"fields": ["name"],
"default_operator": "AND"
}
}
}
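A companion sketch using + for a required term (the literal word AND would just be searched as a term here):
POST users/_search
{
  "query": {
    "simple_query_string": {
      "query": "Ruan +Yiming",
      "fields": ["name"]
    }
  }
}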
Mapping
- A Mapping is similar to a schema definition in a database. It:
  - defines the names of the fields in an index
  - defines each field's data type, e.g. string, number, boolean...
  - configures the field and its inverted index (Analyzed or Not Analyzed, which Analyzer)
- A Mapping maps a JSON document into the flat format Lucene requires
- A Mapping belongs to an index's Type
  - every document belongs to a Type
  - a Type has one Mapping definition
- From 7.0 on, type information no longer needs to be specified in the Mapping definition
Field data types
- Simple types (a mapping sketch covering several of these follows this list)
  - Text / Keyword
  - Date
  - Integer / Floating point
  - Boolean
  - IPv4 & IPv6
- Complex types - object and nested
  - object type / nested type
- Special types
  - geo_point & geo_shape / percolator
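A sketch declaring several of these types explicitly (hypothetical blogs index and fields):
PUT blogs
{
  "mappings": {
    "properties": {
      "title":     {"type": "text"},      --- analyzed full text
      "tag":       {"type": "keyword"},   --- exact value
      "published": {"type": "date"},
      "views":     {"type": "integer"},
      "location":  {"type": "geo_point"}
    }
  }
}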
Dynamic Mapping
- When a document is written and the index does not exist, the index is created automatically
- The Dynamic Mapping mechanism means we don't have to define Mappings by hand; Elasticsearch infers the field types from the document
- Sometimes the inference is wrong, e.g. for geo location data
- When a type is set incorrectly, some features stop working, e.g. Range queries
Automatic type detection
JSON type | Elasticsearch type |
---|---|
string | if it matches a date format: Date; if numeric detection is enabled (off by default): float or long; otherwise: Text, plus a keyword sub-field |
boolean | boolean |
floating point | float |
integer | long |
object | Object |
array | determined by the type of the first non-null value |
null | ignored |
- demo
// index a document
PUT mapping_test/_doc/1
{
"firstName": "Chan",
"lastName": "Jackie",
"loginDate": "2018-07-24T10:29:48.103Z"
}
// view the mapping
GET mapping_test/_mapping
// GET response
{
"mapping_test" : {
"mappings" : {
"properties" : {
"firstName" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"lastName" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"loginDate" : {
"type" : "date"
}
}
}
}
}
// Delete index
DELETE mapping_test
// dynamic mapping infers the field types
PUT mapping_test/_doc/1
{
"uid":"123",
"isVip": false,
"isAdmin": "true",
"age":19,
"heigh":180
}
GET mapping_test/_mapping
// response
{
"mapping_test" : {
"mappings" : {
"properties" : {
"age" : {
"type" : "long"
},
"heigh" : {
"type" : "long"
},
"isAdmin" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"isVip" : {
"type" : "boolean"
},
"uid" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
Can a Mapping's field types be changed?
- Two cases
  - Newly added fields
    - Dynamic true: as soon as a document containing a new field is written, the Mapping is updated as well
    - Dynamic false: the Mapping is not updated and the new field's data is not indexed, but the value still appears in _source
    - Dynamic strict: the document write fails
  - Existing fields: once data has been written, changing the field definition is no longer supported
    - Lucene's inverted index, once generated, cannot be modified
    - To change a field's type you must rebuild the index with the Reindex API (see the sketch after this list)
- Why
  - Changing a field's data type would make the already-indexed data unsearchable
  - Adding a new field has no such impact
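A sketch of such a rebuild, assuming we want uid from the demo above to become keyword (the target index name is hypothetical):
PUT mapping_test_v2
{
  "mappings": {
    "properties": {
      "uid": {"type": "keyword"}
    }
  }
}
POST _reindex
{
  "source": {"index": "mapping_test"},
  "dest": {"index": "mapping_test_v2"}
}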
Controlling Dynamic Mappings
 | "true" | "false" | "strict" |
---|---|---|---|
Document can be indexed | YES | YES | NO |
New field is indexed | YES | NO | NO |
Mapping is updated | YES | NO | NO |
PUT movies
{
  "mappings": {
    "dynamic": "false"
  }
}
- With dynamic set to false, documents containing new fields can still be written and indexed, but the new fields themselves are discarded from the index
- With strict, the write fails outright (see the sketch below)
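A sketch of the strict case (hypothetical index; the second request is rejected with a strict_dynamic_mapping_exception):
PUT strict_test
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "title": {"type": "text"}
    }
  }
}
PUT strict_test/_doc/1
{
  "title": "ok",
  "extra": "unmapped field, so this write fails"
}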
Explicitly defining a Mapping
PUT movies
{
"mappings": {
//...
}
}
Controlling whether a field is indexed
- index controls whether the field is indexed. Defaults to true; when set to false, the field cannot be searched
PUT users
{
"mappings": {
"properties": {
"firstName": {
"type": "text"
},
"lastName": {
"type": "text"
},
"mobile": {
"type": "text",
"index": false --- 无法通过手机号进行搜索
}
}
}
}
Index Options
- Four levels of Index Options control what the inverted index records (see the sketch after this list)
  - docs - doc id only
  - freqs - doc id / term frequencies
  - positions - doc id / term frequencies / term positions
  - offsets - doc id / term frequencies / term positions / character offsets
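A sketch of setting it on a field (hypothetical index; offsets records the most information):
PUT demo_index_options
{
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "index_options": "offsets"
      }
    }
  }
}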
null_value
- Makes null values searchable
- Only the Keyword type supports null_value
GET users/_search?q=mobile:NULL
PUT users
{
"mappings": {
"properties": {
"firstName": {
"type": "text"
},
"lastName": {
"type": "text"
},
"mobile": {
"type": "keyword",
"null_value": "NULL"
}
}
}
}
// response (excerpt)
"_source": {
"firstName": "Ruan",
"lastName": "Yiming",
"mobile": null
}
The copy_to setting
- _all is replaced by copy_to in version 7
- Satisfies some specific search needs
- copy_to copies the field's value into the target field, giving an effect similar to _all
- The copy_to target field does not appear in _source
PUT users
{
"mappings": {
"properties": {
"firstName": {
"type": "text",
"copy_to":"fullName"
},
"lastName": {
"type": "text",
"copy_to": "fullName"
}
}
}
}
// at query time
GET users/_search?q=fullName:(Ruan Yiming)
Multi-field types
- The multi-field feature
  - Exact matching on a vendor name: add a keyword sub-field (mapping and query sketches below)
  - Using different analyzers
    - different languages
    - a pinyin field for search
    - specifying different analyzers for indexing and for search is also supported
PUT products
{
"mappings" : {
"properties": {
"company": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"comment": {
"type": "text",
"fields": {
"english_comment": {
"type" : "text",
"analyzer": "english",
"search_analyzer": "english"
}
}
}
}
}
}
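Using the sub-fields at query time (a sketch; assumes products has been populated):
POST products/_search
{
  "query": {
    "term": {
      "company.keyword": "Apple Store"    --- exact match on the raw, un-analyzed value
    }
  }
}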
Exact Values v.s. Full Text
- Exact values: numbers / dates / a specific string (e.g. "Apple Store")
  - keyword in Elasticsearch
  - no special analysis needed
- Full Text: unstructured text data
  - text in Elasticsearch
Custom analyzers
- When Elasticsearch's built-in analyzers don't fit, you can define your own analyzer by combining components:
  - Character Filter
  - Tokenizer
  - Token Filter
Character Filters
- Process the text before the Tokenizer, e.g. adding, removing, or replacing characters. Multiple Character Filters can be configured; they affect the position and offset information the Tokenizer produces
- Some built-in Character Filters
  - HTML strip - remove HTML tags
POST _analyze
{
"tokenizer": "keyword",
"char_filter": ["html_strip"],
"text": "<b>hello world</b>"
}
// response
{
"tokens": [
{
"token": "hello world",
"start_offset": 3,
"end_offset": 18,
"type": "word",
"position": 0
}
]
}
- Mapping - character replacement
POST _analyze
{
"tokenizer": "standard",
"char_filter": [
{
"type": "mapping",
"mappings": ["- => _"]
}
],
"text": "123-456, I-test! test-990 650-555-1234"
}
// response
{
"tokens" : [
{
"token" : "123_456",
"start_offset" : 0,
"end_offset" : 7,
"type" : "",
"position" : 0
},
{
"token" : "I_test",
"start_offset" : 9,
"end_offset" : 15,
"type" : "",
"position" : 1
},
{
"token" : "test_990",
"start_offset" : 17,
"end_offset" : 25,
"type" : "",
"position" : 2
},
{
"token" : "650_555_1234",
"start_offset" : 26,
"end_offset" : 38,
"type" : "",
"position" : 3
}
]
}
// replace emoticons
POST _analyze
{
"tokenizer": "standard",
"char_filter": [
{
"type": "mapping",
"mappings":[":) => happy", ":( => sad"]
}
],
"text": ["I am felling :)", "Feeling :( today"]
}
// response
{
"tokens" : [
{
"token" : "I",
"start_offset" : 0,
"end_offset" : 1,
"type" : "",
"position" : 0
},
{
"token" : "am",
"start_offset" : 2,
"end_offset" : 4,
"type" : "",
"position" : 1
},
{
"token" : "felling",
"start_offset" : 5,
"end_offset" : 12,
"type" : "",
"position" : 2
},
{
"token" : "happy",
"start_offset" : 13,
"end_offset" : 15,
"type" : "",
"position" : 3
},
{
"token" : "Feeling",
"start_offset" : 16,
"end_offset" : 23,
"type" : "",
"position" : 104
},
{
"token" : "sad",
"start_offset" : 24,
"end_offset" : 26,
"type" : "",
"position" : 105
},
{
"token" : "today",
"start_offset" : 27,
"end_offset" : 32,
"type" : "",
"position" : 106
}
]
}
- Pattern replace - regex match and replace
GET _analyze
{
"tokenizer": "standard",
"char_filter": [
{
"type": "pattern_replace",
"pattern": "http://(.*)",
"replacement": "$1"
}
],
"text": "http://www.elastic.co"
}
// response
{
"tokens" : [
{
"token" : "www.elastic.co",
"start_offset" : 0,
"end_offset" : 21,
"type" : "",
"position" : 0
}
]
}
Tokenizer
- Splits the raw text into terms (tokens) according to certain rules
- Elasticsearch's built-in Tokenizers
  - whitespace / standard / uax_url_email / pattern / keyword / path_hierarchy
- You can also implement your own Tokenizer
POST _analyze
{
"tokenizer": "path_hierarchy",
"text": "/user/ymruan/a/b/c/d/e"
}
// response
{
"tokens" : [
{
"token" : "/user",
"start_offset" : 0,
"end_offset" : 5,
"type" : "word",
"position" : 0
},
{
"token" : "/user/ymruan",
"start_offset" : 0,
"end_offset" : 12,
"type" : "word",
"position" : 0
},
{
"token" : "/user/ymruan/a",
"start_offset" : 0,
"end_offset" : 14,
"type" : "word",
"position" : 0
},
{
"token" : "/user/ymruan/a/b",
"start_offset" : 0,
"end_offset" : 16,
"type" : "word",
"position" : 0
},
{
"token" : "/user/ymruan/a/b/c",
"start_offset" : 0,
"end_offset" : 18,
"type" : "word",
"position" : 0
},
{
"token" : "/user/ymruan/a/b/c/d",
"start_offset" : 0,
"end_offset" : 20,
"type" : "word",
"position" : 0
},
{
"token" : "/user/ymruan/a/b/c/d/e",
"start_offset" : 0,
"end_offset" : 22,
"type" : "word",
"position" : 0
}
]
}
Token Filters
- Add to, modify, or remove the terms output by the Tokenizer
- Built-in Token Filters
  - lowercase / stop / synonym (adds synonyms; a sketch follows the stop demo below)
GET _analyze
{
"tokenizer": "whitespace",
"filter": ["stop"],
"text": ["The rain in Spain falls mainly on the plain."]
}
// response
{
"tokens" : [
{
"token" : "The",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 0
},
{
"token" : "rain",
"start_offset" : 4,
"end_offset" : 8,
"type" : "word",
"position" : 1
},
{
"token" : "Spain",
"start_offset" : 12,
"end_offset" : 17,
"type" : "word",
"position" : 3
},
{
"token" : "falls",
"start_offset" : 18,
"end_offset" : 23,
"type" : "word",
"position" : 4
},
{
"token" : "mainly",
"start_offset" : 24,
"end_offset" : 30,
"type" : "word",
"position" : 5
},
{
"token" : "plain.",
"start_offset" : 38,
"end_offset" : 44,
"type" : "word",
"position" : 8
}
]
}
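And the synonym filter, defined inline for a quick test (hypothetical synonym pair):
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {"type": "synonym", "synonyms": ["quick, fast"]}
  ],
  "text": "quick brown fox"
}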
Setting up a Custom Analyzer
{
"settings": {
"analysis": {
"analyzer": {
"my_english": {
"type": "english",
"stem_exclusion": ["organization", "organizations"],
"stopwords": [
"a","an","and","are","as","at","be","but","by","for","if","in","into","is","it","of","on","or",
"such", "that", "the", "their", "then", "there", "these", "they", "this", "to","was","will","with"
]
}
}
}
}
}
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"type": "custom",
"char_filter": [
"emoticons"
],
"tokenizer":"punctuation",
"filter":[
"lowercase",
"english_stop"
]
}
},
"tokenizer": {
"punctuation": {
"type": "pattern",
"pattern": "[.,!?]"
}
},
"char_filter": {
"emoticons": {
"type": "mapping",
"mappings": [
":) => _happy_",
":( => _sad_"
]
}
},
"filter": {
"english_stop": {
"type": "stop",
"stopwords": "_english_"
}
}
}
}
}
// response
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "my_index"
}
POST my_index/_analyze
{
"analyzer": "my_custom_analyzer",
"text": "I'm a :) person, and you ?"
}
// response
{
"tokens" : [
{
"token" : "i'm a _happy_ person",
"start_offset" : 0,
"end_offset" : 15,
"type" : "word",
"position" : 0
},
{
"token" : " and you ",
"start_offset" : 16,
"end_offset" : 25,
"type" : "word",
"position" : 1
}
]
}