ElasticSearch
ElasticSearch is a very powerful open-source search engine that helps us quickly find the content we need in massive amounts of data. For example, when searching for 4090显卡, the matched keyword is highlighted in red in the results.

Combined with Kibana, Logstash, and Beats, ElasticSearch forms the elastic stack (ELK), which is widely used for log analysis, real-time monitoring, and similar workloads. ElasticSearch is the core of the elastic stack, responsible for storing, searching, and analyzing data.

Under the hood, ElasticSearch is implemented on top of Lucene.
Lucene is a search-engine class library written in Java and a top-level Apache project, developed by Doug Cutting in 1999. Official site: https://lucene.apache.org/
Advantages of Lucene

Drawbacks of Lucene

The history of ElasticSearch

Compared with Lucene, ElasticSearch has the following advantages: it supports distributed deployment and scales horizontally, and it exposes a Restful interface callable from any language.

ElasticSearch's speed comes from its inverted index. To contrast it with a forward index, consider the following goods table:
id | title | price |
---|---|---|
1 | 小米手机 | 3499 |
2 | 华为手机 | 4999 |
3 | 华为小米充电器 | 49 |
4 | 小米手环 | 49 |
select id, title, price from tb_goods where title like '%手机%'

Because the pattern '%手机%' begins with a wildcard, the index on title cannot be used and the query degrades to a full-table scan, which is slow on large tables. An inverted index avoids this: every document's title is first tokenized into terms, and for each term we record which documents contain it:
Term | Document ids |
---|---|
小米 | 1,3,4 |
手机 | 1,2 |
华为 | 2,3 |
充电器 | 3 |
手环 | 4 |
Take a search for 华为手机 as an example: the input is tokenized into the terms 华为 and 手机, and the search is carried out with those terms.

A forward index is the most traditional approach: documents are indexed by id. To query by term, every document must be fetched one by one and checked for the term, a document-to-term process.

An inverted index works the other way around: first find the user's search terms, use them to get the ids of the documents containing those terms, then fetch the documents by id, a term-to-document process.
- Forward index: from document id to the document, then scan its content for the term
- Inverted index: from term to document ids, then to the documents
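To make the contrast concrete, below is a toy sketch of an inverted index in Java; the hard-coded terms stand in for an analyzer's output, and the data mirrors the goods table above (this is just the idea, not ES internals):

import java.util.*;

public class InvertedIndexDemo {
    // term -> ids of the documents containing that term
    private final Map<String, Set<Integer>> index = new HashMap<>();

    public void add(int docId, String... terms) {
        for (String term : terms) {
            index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // term -> document ids: the "inverted" direction
    public Set<Integer> search(String term) {
        return index.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexDemo idx = new InvertedIndexDemo();
        idx.add(1, "小米", "手机");
        idx.add(2, "华为", "手机");
        idx.add(3, "华为", "小米", "充电器");
        idx.add(4, "小米", "手环");
        System.out.println(idx.search("手机")); // [1, 2]
        System.out.println(idx.search("小米")); // [1, 3, 4]
    }
}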
ElasticSearch has many concepts of its own; they differ somewhat from MySQL's, though there are also similarities.

ElasticSearch is document-oriented: each piece of data is a document, serialized to JSON before being stored. For example:
{
"id": 1,
"title": "小米手机",
"price": 3499
}
{
"id": 2,
"title": "华为手机",
"price": 4999
}
{
"id": 3,
"title": "华为小米充电器",
"price": 49
}
{
"id": 4,
"title": "小米手环",
"price ": 299
}
An index is a collection of documents of the same type. For example, all user documents can be grouped into a user index, all product documents into a product index, and all order documents into an order index:
{
"id": 101,
"name": "张三",
"age": 39
}
{
"id": 102,
"name": "李四",
"age": 49
}
{
"id": 103,
"name": "王五",
"age": 69
}
{
"id": 1,
"title": "小米手机",
"price": 3499
}
{
"id": 2,
"title": "华为手机",
"price": 4999
}
{
"id": 3,
"title": "苹果手机",
"price": 6999
}
{
"id": 11,
"userId": 101,
"goodsId": 1,
"totalFee": 3999
}
{
"id": 12,
"userId": 102,
"goodsId": 2,
"totalFee": 4999
}
{
"id": 13,
"userId": 103,
"goodsId": 3,
"totalFee": 6999
}
Therefore, we can think of an index as a table in a database.

A database table has schema constraints that define its structure, field names, and types. Likewise, an index has a mapping: the field constraints on the documents in the index, analogous to a table's schema.

A comparison of MySQL and Elasticsearch concepts:
MySQL | Elasticsearch | Description |
---|---|---|
Table | Index | An index is a collection of documents, similar to a database table (Table) |
Row | Document | A document is one piece of data, similar to a row (Row) in a database; documents are in JSON format |
Column | Field | A field is a key in the JSON document, similar to a column (Column) in a database |
Schema | Mapping | A mapping is the constraint on documents in an index, e.g. field types, similar to a table's schema (Schema) |
SQL | DSL | DSL is the JSON-style request language elasticsearch provides for operating elasticsearch and implementing CRUD |
Each excels at its own thing:

- MySQL: good at transactional operations, guaranteeing data safety and consistency
- ElasticSearch: good at search, analysis, and computation over massive data

So in enterprises the two are usually combined: MySQL remains the source of truth for writes, while the data is synchronized to ElasticSearch for searching.

To deploy a single-node elasticsearch with Docker, first create a network (so that ES and kibana can reach each other), then pull and run the image:
docker network create es-net
docker pull elasticsearch:7.12.1
docker run -d \
--name es \
-e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
-e "discovery.type=single-node" \
-v es-data:/usr/share/elasticsearch/data \
-v es-plugins:/usr/share/elasticsearch/plugins \
--privileged \
--network es-net \
-p 9200:9200 \
elasticsearch:7.12.1
Command explanation:

- `-e "ES_JAVA_OPTS=-Xms512m -Xmx512m"`: sets the JVM heap size; the default is 1G, and it's best not to go below 512m
- `-e "discovery.type=single-node"`: single-node deployment
- `-v es-data:/usr/share/elasticsearch/data`: volume mount for es's data directory
- `-v es-plugins:/usr/share/elasticsearch/plugins`: volume mount for es's plugins directory
- `--privileged`: grants access to the mounted volumes
- `--network es-net`: joins ES to the es-net network
- `-p 9200:9200`: exposes the HTTP port for users to access

Once it starts successfully, open a browser and visit http://192.168.128.130:9200/ to see elasticsearch's response.
docker pull kibana:7.12.1
docker run -d \
--name kibana \
-e ELASTICSEARCH_HOSTS=http://es:9200 \
--network=es-net \
-p 5601:5601 \
kibana:7.12.1
- `--network=es-net`: joins kibana to the es-net network, the same network as ES
- `-e ELASTICSEARCH_HOSTS=http://es:9200`: the ES address; since kibana and ES are on the same network, the container name resolves directly
- `-p 5601:5601`: port mapping

Next, install the IK analyzer inside the ES container:

# enter the container
docker exec -it es /bin/bash
# 在线下载并安装
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.12.1/elasticsearch-analysis-ik-7.12.1.zip
# exit
exit
# restart the container
docker restart es
The IK analyzer comes with two modes:

- ik_smart: coarsest segmentation, fewest tokens
- ik_max_word: finest segmentation

Test it in kibana's Dev Tools:

GET /_analyze
{
"analyzer": "ik_smart",
"text": "青春猪头G7人马文不会梦到JK黑丝兔女郎铁驭艾许"
}
{
"tokens" : [
{
"token" : "青春",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "猪头",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "G7",
"start_offset" : 4,
"end_offset" : 6,
"type" : "LETTER",
"position" : 2
},
{
"token" : "人",
"start_offset" : 6,
"end_offset" : 7,
"type" : "COUNT",
"position" : 3
},
{
"token" : "不会",
"start_offset" : 7,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 4
},
{
"token" : "梦到",
"start_offset" : 9,
"end_offset" : 11,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "jk",
"start_offset" : 11,
"end_offset" : 13,
"type" : "ENGLISH",
"position" : 6
},
{
"token" : "黑",
"start_offset" : 13,
"end_offset" : 14,
"type" : "CN_CHAR",
"position" : 7
},
{
"token" : "丝",
"start_offset" : 14,
"end_offset" : 15,
"type" : "CN_CHAR",
"position" : 8
},
{
"token" : "兔女郎",
"start_offset" : 15,
"end_offset" : 18,
"type" : "CN_WORD",
"position" : 9
},
{
"token" : "铁",
"start_offset" : 18,
"end_offset" : 19,
"type" : "CN_CHAR",
"position" : 10
},
{
"token" : "驭",
"start_offset" : 19,
"end_offset" : 20,
"type" : "CN_CHAR",
"position" : 11
},
{
"token" : "艾",
"start_offset" : 20,
"end_offset" : 21,
"type" : "CN_CHAR",
"position" : 12
},
{
"token" : "许",
"start_offset" : 21,
"end_offset" : 22,
"type" : "CN_CHAR",
"position" : 13
}
]
}
The same text with ik_max_word:

GET /_analyze
{
"analyzer": "ik_max_word",
"text": "青春猪头G7人马文不会梦到JK黑丝兔女郎铁驭艾许"
}
{
"tokens" : [
{
"token" : "青春",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "猪头",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "G7",
"start_offset" : 4,
"end_offset" : 6,
"type" : "LETTER",
"position" : 2
},
{
"token" : "G",
"start_offset" : 4,
"end_offset" : 5,
"type" : "ENGLISH",
"position" : 3
},
{
"token" : "7",
"start_offset" : 5,
"end_offset" : 6,
"type" : "ARABIC",
"position" : 4
},
{
"token" : "人马",
"start_offset" : 6,
"end_offset" : 8,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "人",
"start_offset" : 6,
"end_offset" : 7,
"type" : "COUNT",
"position" : 6
},
{
"token" : "马文",
"start_offset" : 7,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 7
},
{
"token" : "不会",
"start_offset" : 9,
"end_offset" : 11,
"type" : "CN_WORD",
"position" : 8
},
{
"token" : "梦到",
"start_offset" : 11,
"end_offset" : 13,
"type" : "CN_WORD",
"position" : 9
},
{
"token" : "jk",
"start_offset" : 13,
"end_offset" : 15,
"type" : "ENGLISH",
"position" : 10
},
{
"token" : "黑",
"start_offset" : 15,
"end_offset" : 16,
"type" : "CN_CHAR",
"position" : 11
},
{
"token" : "丝",
"start_offset" : 16,
"end_offset" : 17,
"type" : "CN_CHAR",
"position" : 12
},
{
"token" : "兔女郎",
"start_offset" : 17,
"end_offset" : 20,
"type" : "CN_WORD",
"position" : 13
},
{
"token" : "女郎",
"start_offset" : 18,
"end_offset" : 20,
"type" : "CN_WORD",
"position" : 14
},
{
"token" : "铁",
"start_offset" : 20,
"end_offset" : 21,
"type" : "CN_CHAR",
"position" : 15
},
{
"token" : "驭",
"start_offset" : 21,
"end_offset" : 22,
"type" : "CN_CHAR",
"position" : 16
},
{
"token" : "艾",
"start_offset" : 22,
"end_offset" : 23,
"type" : "CN_CHAR",
"position" : 17
},
{
"token" : "许",
"start_offset" : 23,
"end_offset" : 24,
"type" : "CN_CHAR",
"position" : 18
}
]
}
Notice that in finest-grained mode, 人马文 is split into both 人马 and 马文, and neither mode currently recognizes words like 黑丝, 铁驭, or 艾许, so we need to extend the dictionary ourselves.

{% note info no-icon %}
As the internet evolves, word coinage grows ever more frequent. Many new words appear that don't exist in the stock vocabulary, e.g. 白给, 白嫖.
So the vocabulary needs continual updating; the IK analyzer provides a mechanism for extending its dictionary.
{% endnote %}
To extend it:

1. Open IKAnalyzer.cfg.xml in the plugin's config directory and declare both dictionaries under the "IK Analyzer 扩展配置" comment:

<entry key="ext_dict">ext.dic</entry>
<entry key="ext_stopwords">stopword.dic</entry>

2. Create an ext.dic file in the same directory and add the words to recognize:

艾许
铁驭

3. Add the words that should be dropped to stopword.dic:

黑丝
兔女郎
4. Restart es and rerun the analysis:
docker restart es
The result: 铁驭 and 艾许 are now recognized as words, while the stopwords are dropped entirely:

{
"tokens" : [
{
"token" : "青春",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "猪头",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "g7",
"start_offset" : 4,
"end_offset" : 6,
"type" : "LETTER",
"position" : 2
},
{
"token" : "人",
"start_offset" : 6,
"end_offset" : 7,
"type" : "COUNT",
"position" : 3
},
{
"token" : "不会",
"start_offset" : 7,
"end_offset" : 9,
"type" : "CN_WORD",
"position" : 4
},
{
"token" : "梦到",
"start_offset" : 9,
"end_offset" : 11,
"type" : "CN_WORD",
"position" : 5
},
{
"token" : "jk",
"start_offset" : 11,
"end_offset" : 13,
"type" : "ENGLISH",
"position" : 6
},
{
"token" : "铁驭",
"start_offset" : 18,
"end_offset" : 20,
"type" : "CN_WORD",
"position" : 7
},
{
"token" : "艾许",
"start_offset" : 20,
"end_offset" : 22,
"type" : "CN_WORD",
"position" : 8
}
]
}
{% note info no-icon %}
An index is like a database table, and the mapping is like the table's schema. Before we can store data in ES, we first need to create the "database" and the "table".
{% endnote %}

Mapping is the constraint on the documents in an index. Common mapping properties include:

- type: the field's data type; common simple types are text (tokenized text), keyword (exact values such as brand, country, ip address), numeric types (long, integer, short, byte, double, float), boolean, date, and object
- index: whether to build an inverted index for the field, true by default, meaning every field is searchable. Some fields are meaningless to search, e.g. an email address or an image URL fragment; when creating a field mapping, always ask whether the field participates in search, and set index to false if not
- analyzer: which analyzer to use
- properties: the field's sub-fields

For example, the following JSON document:

{
"age": 32,
"weight": 48,
"isMarried": false,
"info": "次元游击兵--恶灵",
"email": "[email protected]",
"score": [99.1, 99.5, 98.9],
"name": {
"firstName": "雷尼",
"lastName": "布莱希"
}
}
Field | Type | index | analyzer |
---|---|---|---|
age | integer | true | null |
weight | float | true | null |
isMarried | boolean | true | null |
info | text | true | ik_smart |
email | keyword | false | null |
score | float | true | null |
name | object | | |
name.firstName | keyword | true | null |
name.lastName | keyword | true | null |
Notes:

- score: although it's an array, we only consider the element type, which is float
- email: doesn't participate in search, so index is false
- info: participates in search and needs tokenization, so an analyzer must be set

{% note info no-icon %}
Summary: the common mapping properties are type, index, analyzer, and properties.
{% endnote %}
Creating an index library:

- Request method: PUT
- Request path: /{索引库名}, customizable
- Request body: the mapping

Syntax:
PUT /{索引库名}
{
"mappings": {
"properties": {
"字段名1": {
"type": "text ",
"analyzer": "standard"
},
"字段名2": {
"type": "text",
"index": true
},
"字段名3": {
"type": "text",
"properties": {
"子字段1": {
"type": "keyword"
},
"子字段2": {
"type": "keyword"
}
}
}
}
}
}
Example:

PUT /test001
{
"mappings": {
"properties": {
"info": {
"type": "text",
"analyzer": "ik_smart"
},
"email": {
"type": "keyword",
"index": false
},
"name": {
"type": "object",
"properties": {
"firstName": {
"type": "keyword"
},
"lastName": {
"type": "keyword"
}
}
}
}
}
}
Viewing an index library:

- Request method: GET
- Request path: /{索引库名}
- Request body: none

Syntax:

GET /{索引库名}

Example:

GET /test001
Modifying an index library:

- Request method: PUT
- Request path: /{索引库名}/_mapping
- Request body: the mapping for the new fields

Syntax:
PUT /{索引库名}/_mapping
{
"properties": {
"新字段名":{
"type": "integer"
}
}
}
Once created, an index library's existing fields cannot be modified (that would invalidate the inverted index already built on them), but adding new fields is allowed. Example:
PUT /test001/_mapping
{
"properties": {
"age": {
"type": "integer"
}
}
}
Deleting an index library:

- Request method: DELETE
- Request path: /{索引库名}
- Request body: none

Syntax:

DELETE /{索引库名}
Adding a document:

POST /{索引库名}/_doc/{文档id}
{
"字段1": "值1",
"字段2": "值2",
"字段3": {
"子属性1": "值3",
"子属性2": "值4"
},
// ...
}
Example:

POST /test001/_doc/1
{
"info": "次元游记兵--恶灵",
"email": "[email protected]",
"name": {
"firstName": "雷尼",
"lastName": "布莱希"
}
}
Viewing a document:

GET /{索引库名}/_doc/{id}

Example:
GET /test001/_doc/1
Deleting a document:

DELETE /{索引库名}/_doc/{id}

Example:
DELETE /test001/_doc/1
There are two ways to modify a document. A full update overwrites the original document; in essence it deletes the document with the given id and then adds a new document with the same id (note: if the id doesn't exist, the update simply becomes an insert).

Syntax:
PUT /{索引库名}/_doc/{文档id}
{
"字段1": "值1",
"字段2": "值2",
// ... 略
}
Example:

PUT /test001/_doc/1
{
"info": "爆破专家--暴雷",
"email": "@Apex.net",
"name": {
"firstName": "沃尔特",
"lastName": "菲茨罗伊"
}
}
An incremental update modifies only the specified fields.

Syntax:

POST /{索引库名}/_update/{文档id}
{
"doc": {
"字段名": "新的值",
...
}
}
Example:

POST /test001/_update/1
{
"doc":{
"email":"[email protected]",
"info":"恐怖G7人--马文"
}
}
Case study: write the index library mapping for the hotel table below.

Field | Type | Length | Comment |
---|---|---|---|
id | bigint | 20 | hotel id |
name | varchar | 255 | hotel name |
address | varchar | 255 | hotel address |
price | int | 10 | hotel price |
score | int | 2 | hotel rating |
brand | varchar | 32 | hotel brand |
city | varchar | 32 | city |
star_name | varchar | 16 | hotel star level, 1 to 5 stars, 1 to 5 diamonds |
business | varchar | 255 | business district |
latitude | varchar | 32 | latitude |
longitude | varchar | 32 | longitude |
pic | varchar | 255 | hotel picture |
Field-by-field analysis (tokenized text uses the ik_max_word analyzer):

- id: the type is special, not long but keyword, since document ids in ES are strings; id is also involved in all our later CRUD, so it participates in search
- name: participates in search and needs tokenization, so it is text with the ik_max_word analyzer
- address: a string, but in my view it doesn't need tokenization (so keyword here) and doesn't need to participate in search, so index is false; you could choose to tokenize it instead
- price: type integer; participates in search (range sorting)
- score: type integer; participates in search (range sorting)
- brand: type keyword, no tokenization (a tokenized brand name is meaningless); participates in search
- city: type keyword, tokenization is meaningless; participates in search
- star_name: type keyword; participates in search
- business: type keyword; participates in search
- latitude and longitude: geographic coordinates are special in ES, which supports two geo data types:
  - geo_point: a point determined by latitude and longitude, e.g. "32.8752345, 120.2981576"
  - geo_shape: a complex geometry composed of multiple geo_points, e.g. a line "LINESTRING (-77.03653 38.897676,-77.009051 38.889939)"
- pic: type keyword; doesn't participate in search, so index is false

The resulting statement:

PUT /hotel
{
"mappings": {
"properties": {
"id": {
"type": "keyword"
},
"name": {
"type": "text",
"analyzer": "ik_max_word"
},
"address": {
"type": "keyword",
"index": false
},
"price": {
"type": "integer"
},
"score": {
"type": "integer"
},
"brand": {
"type": "keyword"
},
"city": {
"type": "keyword"
},
"starName": {
"type": "keyword"
},
"business": {
"type": "keyword"
},
"location": {
"type": "geo_point"
},
"pic": {
"type": "keyword",
"index": false
}
}
}
}
{% note info no-icon %}
Users often search several fields at once, e.g. 上海虹桥希尔顿五星酒店 (city + business district + brand + star level), and searching many fields hurts performance. The trick: use the copy_to property to copy the current field into a combined field, then search that single field. Example:

"all": {
"type": "text",
"analyzer": "ik_max_word"
},
"brand": {
"type": "keyword",
"copy_to": "all"
}
{% endnote %}
PUT /hotel
{
"mappings": {
"properties": {
"id": {
"type": "keyword"
},
"name": {
"type": "text",
"analyzer": "ik_max_word",
"copy_to": "all"
},
"address": {
"type": "keyword",
"index": false
},
"price": {
"type": "integer"
},
"score": {
"type": "integer"
},
"brand": {
"type": "keyword",
"copy_to": "all"
},
"city": {
"type": "keyword"
},
"starName": {
"type": "keyword"
},
"business": {
"type": "keyword"
, "copy_to": "all"
},
"location": {
"type": "geo_point"
},
"pic": {
"type": "keyword",
"index": false
},
"all":{
"type": "text",
"analyzer": "ik_max_word"
}
}
}
}
In the API ElasticSearch provides, all interaction with ElasticSearch is encapsulated in a class named RestHighLevelClient; we must initialize this object first to establish the connection to ES.

1. Introduce the es dependency, and pin the version to 7.12.1 in the properties (SpringBoot manages a different default ES version):

<properties>
    <java.version>1.8</java.version>
    <elasticsearch.version>7.12.1</elasticsearch.version>
</properties>

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>

2. Initialize the RestHighLevelClient:
RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
HttpHost.create("http://192.168.150.101:9200")
));
For convenience in unit tests, we put the initialization into @BeforeEach:

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;
import java.io.IOException;
@SpringBootTest
class HotelDemoApplicationTests {
private RestHighLevelClient client;
@Test
void contextLoads() {
}
@BeforeEach
public void setup() {
this.client = new RestHighLevelClient(RestClient.builder(
HttpHost.create("http://192.168.128.130:9200")
));
}
@AfterEach
void tearDown() throws IOException {
this.client.close();
}
}
Creating the index library:

@Test
void testCreateHotelIndex() throws IOException {
CreateIndexRequest request = new CreateIndexRequest("hotel");
request.source(MAPPING_TEMPLATE, XContentType.JSON);
client.indices().create(request, RequestOptions.DEFAULT);
}
The request body is the same DSL we wrote for PUT /hotel; we extract it into a static constant MAPPING_TEMPLATE to keep the code tidy:

public static final String MAPPING_TEMPLATE = "{\n" +
" \"mappings\": {\n" +
" \"properties\": {\n" +
" \"id\": {\n" +
" \"type\": \"keyword\"\n" +
" },\n" +
" \"name\": {\n" +
" \"type\": \"text\",\n" +
" \"analyzer\": \"ik_max_word\",\n" +
" \"copy_to\": \"all\"\n" +
" },\n" +
" \"address\": {\n" +
" \"type\": \"keyword\",\n" +
" \"index\": false\n" +
" },\n" +
" \"price\": {\n" +
" \"type\": \"integer\"\n" +
" },\n" +
" \"score\": {\n" +
" \"type\": \"integer\"\n" +
" },\n" +
" \"brand\": {\n" +
" \"type\": \"keyword\",\n" +
" \"copy_to\": \"all\"\n" +
" },\n" +
" \"city\": {\n" +
" \"type\": \"keyword\"\n" +
" },\n" +
" \"starName\": {\n" +
" \"type\": \"keyword\"\n" +
" },\n" +
" \"business\": {\n" +
" \"type\": \"keyword\"\n" +
" , \"copy_to\": \"all\"\n" +
" },\n" +
" \"location\": {\n" +
" \"type\": \"geo_point\"\n" +
" },\n" +
" \"pic\": {\n" +
" \"type\": \"keyword\",\n" +
" \"index\": false\n" +
" },\n" +
" \"all\":{\n" +
" \"type\": \"text\",\n" +
" \"analyzer\": \"ik_max_word\"\n" +
" }\n" +
" }\n" +
" }\n" +
"}";
Deleting the index library corresponds to DELETE /hotel:
@Test
void testDeleteHotelIndex() throws IOException {
DeleteIndexRequest request = new DeleteIndexRequest("hotel");
client.indices().delete(request, RequestOptions.DEFAULT);
}
Checking whether the index library exists is essentially a query, corresponding to GET /hotel:
@Test
void testGetHotelIndex() throws IOException {
GetIndexRequest request = new GetIndexRequest("hotel");
boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
System.out.println(exists ? "索引库已存在" : "索引库不存在");
}
Document operations use the same test scaffolding:

import cn.blog.hotel.service.IHotelService;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import java.io.IOException;
@SpringBootTest
public class HotelDocumentTest {
@Autowired
private IHotelService hotelService;
private RestHighLevelClient client;
@BeforeEach
void setUp() {
client = new RestHighLevelClient(RestClient.builder(
HttpHost.create("http://192.168.128.130:9200")
));
}
@AfterEach
void tearDown() throws IOException {
client.close();
}
}
The result of the database query is a Hotel object:

@Data
@TableName("tb_hotel")
public class Hotel {
@TableId(type = IdType.INPUT)
private Long id;
private String name;
private String address;
private Integer price;
private Integer score;
private String brand;
private String city;
private String starName;
private String business;
private String longitude;
private String latitude;
private String pic;
}
In ES, longitude and latitude must be merged into a single location field to match the geo_point mapping, so we define a separate document type:

import lombok.Data;
import lombok.NoArgsConstructor;
@Data
@NoArgsConstructor
public class HotelDoc {
private Long id;
private String name;
private String address;
private Integer price;
private Integer score;
private String brand;
private String city;
private String starName;
private String business;
private String location;
private String pic;
public HotelDoc(Hotel hotel) {
this.id = hotel.getId();
this.name = hotel.getName();
this.address = hotel.getAddress();
this.price = hotel.getPrice();
this.score = hotel.getScore();
this.brand = hotel.getBrand();
this.city = hotel.getCity();
this.starName = hotel.getStarName();
this.business = hotel.getBusiness();
this.location = hotel.getLatitude() + ", " + hotel.getLongitude();
this.pic = hotel.getPic();
}
}
The DSL for adding a document was:

POST /{索引库名}/_doc/{id}
{
"name": "Jack",
"age": 21
}
The corresponding Java code:

@Test
void testIndexDocument() throws IOException {
IndexRequest request = new IndexRequest("indexName").id("1");
request.source("{\"name\":\"Jack\",\"age\":21}", XContentType.JSON);
client.index(request, RequestOptions.DEFAULT);
}
Importing the hotel data follows the same basic flow, with a few changes:

- the hotel data comes from the database, so we must query it first, getting a Hotel object
- the Hotel object needs to be converted to a HotelDoc object
- the HotelDoc needs to be serialized to json format

@Test
void testAddDocument() throws IOException {
    // 1. Query the hotel data by id
    Hotel hotel = hotelService.getById(61083L);
    // 2. Convert to the document type
    HotelDoc hotelDoc = new HotelDoc(hotel);
    // 3. Convert to a JSON string
    String jsonString = JSON.toJSONString(hotelDoc);
    // 4. Prepare the request object, specifying the index and document id
    IndexRequest request = new IndexRequest("hotel").id(hotelDoc.getId().toString());
    // 5. Prepare the JSON document
    request.source(jsonString, XContentType.JSON);
    // 6. Send the request
    client.index(request, RequestOptions.DEFAULT);
}
Verify in kibana with GET /hotel/_doc/61083; the response:

{
"_index" : "hotel",
"_type" : "_doc",
"_id" : "61083",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"address" : "自由贸易试验区临港新片区南岛1号",
"brand" : "皇冠假日",
"business" : "滴水湖临港地区",
"city" : "上海",
"id" : 61083,
"location" : "30.890867, 121.937241",
"name" : "上海滴水湖皇冠假日酒店",
"pic" : "https://m.tuniucdn.com/fb3/s1/2n9c/312e971Rnj9qFyR3pPv4bTtpj1hX_w200_h200_c1_t0.jpg",
"price" : 971,
"score" : 44,
"starName" : "五钻"
}
}
Querying a document corresponds to:

GET /hotel/_doc/{id}

The document data sits in the _source property of the response, so we extract that part and convert it to a HotelDoc:

@Test
void testGetDocumentById() throws IOException {
    // 1. Prepare the request object
    GetRequest request = new GetRequest("hotel").id("61083");
    // 2. Send the request and get the response
    GetResponse response = client.get(request, RequestOptions.DEFAULT);
    // 3. Parse the result
String jsonStr = response.getSourceAsString();
HotelDoc hotelDoc = JSON.parseObject(jsonStr, HotelDoc.class);
System.out.println(hotelDoc);
}
Modifying a document: a full update reuses the add API (same id = replace). An incremental update corresponds to:

POST /test001/_update/1
{
"doc":{
"email":"[email protected]",
"info":"恐怖G7人--马文"
}
}
The corresponding Java code:

@Test
void testUpdateDocumentById() throws IOException {
    // 1. Prepare the request object
    UpdateRequest request = new UpdateRequest("hotel","61083");
    // 2. Prepare the parameters: alternating field names and values
    request.doc(
        "city","北京",
        "price",1888);
    // 3. Send the request
client.update(request,RequestOptions.DEFAULT);
}
Deleting a document corresponds to:

DELETE /hotel/_doc/{id}
@Test
void testDeleteDocumentById() throws IOException {
    // 1. Prepare the request object
    DeleteRequest request = new DeleteRequest("hotel","61083");
    // 2. Send the request
client.delete(request,RequestOptions.DEFAULT);
}
{% note info no-icon %}
Bulk import: a BulkRequest is essentially several ordinary CRUD requests bundled into one call; its add() method accepts requests such as IndexRequest, UpdateRequest, or DeleteRequest. The bare-bones shape:

@Test
void testBulkAddDoc() throws IOException {
BulkRequest request = new BulkRequest();
request.add(new IndexRequest("hotel").id("101").source("json source1", XContentType.JSON));
request.add(new IndexRequest("hotel").id("102").source("json source2", XContentType.JSON));
client.bulk(request, RequestOptions.DEFAULT);
}
{% endnote %}
To actually import all the hotel rows from the database:

@Test
void testBulkAddDoc() throws IOException {
BulkRequest request = new BulkRequest();
    List<Hotel> hotels = hotelService.list();
for (Hotel hotel : hotels) {
HotelDoc hotelDoc = new HotelDoc(hotel);
request.add(new IndexRequest("hotel").
id(hotelDoc.getId().toString()).
source(JSON.toJSONString(hotelDoc), XContentType.JSON));
}
client.bulk(request, RequestOptions.DEFAULT);
}
Or the same thing in stream style:

@Test
void testBulkAddDoc() throws IOException {
BulkRequest request = new BulkRequest();
hotelService.list().stream().forEach(hotel ->
request.add(new IndexRequest("hotel")
.id(hotel.getId().toString())
.source(JSON.toJSONString(new HotelDoc(hotel)), XContentType.JSON)));
client.bulk(request, RequestOptions.DEFAULT);
}
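One caveat worth adding, as a hedged sketch: a bulk call can partially fail without throwing, since errors are reported per item. BulkResponse exposes this directly:

BulkResponse bulkResponse = client.bulk(request, RequestOptions.DEFAULT);
if (bulkResponse.hasFailures()) {
    // concatenated description of every failed item
    System.out.println(bulkResponse.buildFailureMessage());
}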
ElasticSearch offers many kinds of queries. Common ones include:

- Query all: retrieves all data, generally for testing, e.g. match_all
- Full-text queries: tokenize the user input with an analyzer, then match it against the inverted index, e.g. match, multi_match
- Exact-value queries: find data by exact term values, usually keyword, numeric, date, or boolean fields, e.g. term, range
- Geo queries: query by latitude and longitude, e.g. geo_bounding_box, geo_distance
- Compound queries: combine the query conditions above, e.g. bool, function_score

They all share a common basic syntax:

GET /indexName/_search
{
"query": {
"查询类型": {
"查询条件": "条件值"
}
}
}
For example, match_all:
GET /indexName/_search
{
"query": {
"match_all": {
}
}
}
The other queries are just variations in the query type and query conditions.

Full-text queries come in two common flavors: match (a single field) and multi_match (the same logic over multiple fields).

match syntax:

GET /indexName/_search
{
"query": {
"match": {
"FIELD": "TEXT"
}
}
}
multi_match syntax (note the query text plus a list of fields):

GET /indexName/_search
{
"query": {
"multi_match": {
"query": "TEXT",
"fields": ["FIELD1", "FIELD2"]
}
}
}
Example: search 上海外滩 against the all field, which we built earlier by copying the name, brand, and business fields into it:

GET /hotel/_search
{
"query": {
"match": {
"all": "上海外滩"
}
}
}
The equivalent multi_match:

GET /hotel/_search
{
"query": {
"multi_match": {
"query": "上海外滩",
"fields": ["brand", "city", "business"]
}
}
}
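For reference, this DSL maps directly onto the Java client's QueryBuilders. A minimal sketch reusing the client and test setup from earlier (imports assumed: SearchRequest/SearchResponse from org.elasticsearch.action.search, QueryBuilders from org.elasticsearch.index.query):

@Test
void testMatch() throws IOException {
    // the DSL's "query" part is built by QueryBuilders
    SearchRequest request = new SearchRequest("hotel");
    request.source().query(QueryBuilders.matchQuery("all", "上海外滩"));
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // print the total hit count and each document's _source
    System.out.println("total: " + response.getHits().getTotalHits().value);
    response.getHits().forEach(hit -> System.out.println(hit.getSourceAsString()));
}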
Because the values of name, brand, and business were all copied into the all field via copy_to, searching those three fields and searching the all field naturally return the same results. However, the more fields you search, the worse the query performance, so the recommendation is to use copy_to and then query a single field.

Exact-value queries look up keyword, numeric, date, or boolean fields, so the search input is not tokenized. Common ones:

- term: query by an exact term value
- range: query by a value range

term syntax:

GET /indexName/_search
{
"query": {
"term": {
"FIELD": {
"value": "VALUE"
}
}
}
}
Example:

GET /hotel/_search
{
"query": {
"term": {
"city": {
"value": "北京"
}
}
}
}
range syntax:

GET /indexName/_search
{
"query": {
"range": {
"FIELD": {
"gte": 10, //这里的gte表示大于等于,gt表示大于
"lte": 20 //这里的let表示小于等于,lt表示小于
}
}
}
}
Example:

GET /hotel/_search
{
"query": {
"range": {
"price": {
"gte": 1000,
"lte": 3000
}
}
}
}
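In Java, only the QueryBuilders call changes relative to the match sketch above; a hedged fragment assuming the same request setup:

// term: exact match on a keyword field
request.source().query(QueryBuilders.termQuery("city", "北京"));

// range: numeric range with gte/lte
request.source().query(QueryBuilders.rangeQuery("price").gte(1000).lte(3000));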
Geo queries filter on geographic coordinates. Two variants are common:

- geo_bounding_box: matches documents whose geo_point falls inside a rectangle
- geo_distance: matches documents within a given distance of a center point

geo_bounding_box syntax:

GET /indexName/_search
{
"query": {
"geo_bounding_box": {
"FIELD": {
"top_left": { // 左上点
"lat": 31.1, // lat: latitude 纬度
"lon": 121.5 // lon: longitude 经度
},
"bottom_right": { // 右下点
"lat": 30.9, // lat: latitude 纬度
"lon": 121.7 // lon: longitude 经度
}
}
}
}
}
Example:

GET /hotel/_search
{
"query": {
"geo_bounding_box": {
"location": {
"top_left": {
"lat": 31.1,
"lon": 121.5
},
"bottom_right": {
"lat": 30.9,
"lon": 121.7
}
}
}
}
}
geo_distance syntax:

GET /indexName/_search
{
"query": {
"geo_distance": {
"distance": "3km", // 半径
"location": "39.9, 116.4" // 圆心
}
}
}
Example:

GET /hotel/_search
{
"query": {
"geo_distance": {
"distance": "3km",
"location": "39.9, 116.4"
}
}
}
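The Java counterparts, again as a hedged fragment assuming the search setup from the match sketch:

// geo_distance: documents within 3km of the center point
request.source().query(QueryBuilders.geoDistanceQuery("location")
        .point(39.9, 116.4)
        .distance("3km"));

// geo_bounding_box: corners given as (top, left, bottom, right)
request.source().query(QueryBuilders.geoBoundingBoxQuery("location")
        .setCorners(31.1, 121.5, 30.9, 121.7));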
When we use a match query, documents are scored by their relevance (_score) to the search terms and returned in descending score order. For example, searching 如家 (only name shown):

[
{
"_score" : 17.850193,
"_source" : {
"name" : "虹桥如家酒店真不错",
}
},
{
"_score" : 12.259849,
"_source" : {
"name" : "外滩如家酒店真不错",
}
},
{
"_score" : 11.91091,
"_source" : {
"name" : "迪士尼如家酒店真不错",
}
}
]
Early versions scored with TF-IDF. TF-IDF has a flaw: the higher a term's frequency, the higher the document score, without bound, so a single term can dominate the score. BM25, the default in current versions, caps each term's contribution, giving a smoother curve.

Summary: ES scores documents by term-document relevance, with two algorithms:

- TF-IDF: the older algorithm
- BM25: the current default
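In simplified form (Lucene's actual implementation adds boosts and normalization, so treat these as sketches):

$$ \mathrm{TF\text{-}IDF}(t,d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}} \cdot \log\frac{N}{n_t} $$

$$ \mathrm{BM25}(d,Q) = \sum_{t\in Q} \log\!\left(1+\frac{N-n_t+0.5}{n_t+0.5}\right) \cdot \frac{f_{t,d}\,(k_1+1)}{f_{t,d}+k_1\left(1-b+b\,\frac{|d|}{\mathrm{avgdl}}\right)} $$

Here $f_{t,d}$ is term $t$'s frequency in document $d$, $N$ the total document count, $n_t$ the number of documents containing $t$, $|d|$ the document length, $\mathrm{avgdl}$ the average document length, and $k_1$, $b$ tuning constants. The TF factor grows without bound, while the BM25 fraction saturates at $k_1+1$: exactly the capped, smoother curve described above.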
Relevance-based ranking is sensible, but sometimes we need to influence it by hand, e.g. to push certain results up. For that, elasticsearch offers the function score query.

Syntax:

GET /indexName/_search
{
"query": {
"function_score": {
"query": {
"match": {
"all": "外滩"
}
},
"functions": [
{
"filter": {
"term": {
"id": "1"
}
},
"weight": 10
}
],
"boost_mode": "multiply"
}
}
}
The function score query has four parts:

- the original query condition, which searches documents and computes the relevance score, called the query score
- a filter condition; only documents matching it are rescored
- a score function, whose output for those documents is the function score
- a boost mode, which combines the query score with the function score into the final relevance score

Example requirement: rank hotels of the 如家 brand a bit higher. The filter condition is "brand": "如家"; for the score function and boost mode we can be blunt: a fixed weight, multiplied in:

GET /hotel/_search
{
"query": {
"function_score": {
"query": {
"match": {
"all": "外滩"
}
},
"functions": [
{
"filter": {
"term": {
"brand": "如家"
}
},
"weight": 10
}
],
"boost_mode": "multiply"
}
}
}
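The Java translation of this function_score query, as a hedged sketch (FunctionScoreQueryBuilder and ScoreFunctionBuilders live in org.elasticsearch.index.query.functionscore, CombineFunction in org.elasticsearch.common.lucene.search.function):

FunctionScoreQueryBuilder query = QueryBuilders.functionScoreQuery(
        // original query: computes the query score
        QueryBuilders.matchQuery("all", "外滩"),
        new FunctionScoreQueryBuilder.FilterFunctionBuilder[]{
                new FunctionScoreQueryBuilder.FilterFunctionBuilder(
                        QueryBuilders.termQuery("brand", "如家"),       // filter
                        ScoreFunctionBuilders.weightFactorFunction(10)) // score function
        })
        .boostMode(CombineFunction.MULTIPLY);                           // boost mode
request.source().query(query);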
A boolean query is a combination of one or more query clauses. The combinators are:

- must: every sub-query must match; like AND
- should: optional matching; like OR
- must_not: must not match, does not contribute to the score; like NOT
- filter: must match, but does not contribute to the score

Example requirement: search for hotels whose name contains 如家, with price no higher than 400, within 10km of the coordinate 39.9, 116.4.

- the name search should affect relevance, so it goes in must
- the price condition only excludes documents and shouldn't affect scoring, so it goes in must_not
- the location condition likewise shouldn't affect scoring, so it goes in filter

GET /hotel/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"name": {
"value": "如家"
}
}
}
],
"must_not": [
{
"range": {
"price": {
"gt": 400
}
}
}
],
"filter": [
{
"geo_distance": {
"distance": "10km",
"location": {
"lat": 39.9,
"lon": 116.4
}
}
}
]
}
}
}
{% note info no-icon %}
Requirement: search for hotels in Shanghai, with brand 皇冠假日 or 华美达, price no lower than 500, and a user rating of at least 45.
{% endnote %}
GET /hotel/_search
{
"query": {
"bool": {
"must": [
{"term": {
"city": {
"value": "上海"
}
}}
],
"should": [
{"term": {
"brand": {
"value": "皇冠假日"
}
}},
{"term": {
"brand": {
"value": "华美达"
}
}}
],
"must_not": [
{"range": {
"price": {
"lte": 500
}
}}
],
"filter": [
{"range": {
"score": {
"gte": 45
}
}}
]
}
}
}
{% note warning no-icon %}
If you look closely, the should here is problematic: when must and should are used together, the should clauses stop constraining anything, and the results will include brands other than 皇冠假日 and 华美达.
The DSL fix is somewhat clumsy: nest another bool inside must and put the should clauses inside it. In Java code the change is easier, as shown below.
{% endnote %}
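A sketch of the Java fix: nest a bool carrying the should clauses inside must, so that at least one brand has to match (BoolQueryBuilder is from org.elasticsearch.index.query):

BoolQueryBuilder brandOr = QueryBuilders.boolQuery()
        .should(QueryBuilders.termQuery("brand", "皇冠假日"))
        .should(QueryBuilders.termQuery("brand", "华美达"));
BoolQueryBuilder bool = QueryBuilders.boolQuery()
        .must(QueryBuilders.termQuery("city", "上海"))
        .must(brandOr) // the nested bool has only should clauses, so one of them must match
        .mustNot(QueryBuilders.rangeQuery("price").lte(500))
        .filter(QueryBuilders.rangeQuery("score").gte(45));
request.source().query(bool);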
Summary: the boolean query's logical relations are:

- must: AND, participates in scoring
- should: OR
- must_not: NOT, not scored
- filter: must match, not scored
By default elasticsearch ranks results by relevance score (_score), but it also supports sorting on keyword, numeric, geo-point, and date fields; once you sort explicitly, relevance scoring is skipped.

Ordinary field sort syntax:

GET /indexName/_search
{
"query": {
"match_all": {
}
},
"sort": [
{
"FIELD": {
"order": "desc"
},
"FIELD": {
"order": "asc"
}
}
]
}
Example: sort hotels by user rating (score) descending, then by price ascending:

GET /hotel/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"score": {
"order": "desc"
},
"price": {
"order": "asc"
}
}
]
}
Geo-distance sort syntax:

GET /indexName/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"_geo_distance": {
"FIELD": {
"lat": 40,
"lon": -70
},
"order": "asc",
"unit": "km"
}
}
]
}
Example: sort hotels by distance from the coordinate 39.9, 116.4, ascending, in kilometers:

GET /hotel/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"_geo_distance": {
"location": {
"lat": 39.9,
"lon": 116.4
},
"order": "asc",
"unit": "km"
}
}
]
}
By default elasticsearch returns only the top 10 hits. To fetch more, change the pagination parameters:

- from: offset of the first document to return
- size: number of documents to return

Analogous to limit ?, ? in MySQL.

Syntax:

GET /indexName/_search
{
"query": {
"match_all": {}
},
"from": 0,
"size": 20
}
Now, to fetch documents 990 to 1000:

GET /hotel/_search
{
"query": {
"match_all": {}
},
"from": 990,
"size": 10,
"sort": [
{
"price": {
"order": "asc"
}
}
]
}
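In the Java client, sorting and pagination sit on the same SearchSourceBuilder as the query; a hedged fragment matching the DSL above (SortOrder from org.elasticsearch.search.sort):

request.source()
        .query(QueryBuilders.matchAllQuery())
        .sort("price", SortOrder.ASC) // ordinary field sort
        .from(990)                    // offset
        .size(10);                    // page size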
This asks for documents 990 to 1000, but internally ES must first query documents 0 to 1000 and then cut out the 10 between 990 and 1000.

With a single-node ES, fetching the TOP1000 this way is no big deal. But ES is usually a cluster, with documents spread across shards on many nodes. To get the cluster-wide TOP1000, it is not enough for each node to return its own TOP200: a document in one node's TOP200 may rank far lower globally. Every node must return its own TOP1000; the results are then merged, re-ranked, and cut back to TOP1000.

What about fetching documents 9900 to 10000? Each node would first have to produce its TOP10000, all of which get merged and re-ranked in memory; the deeper the page, the heavier the cost. That is why elasticsearch rejects requests with from + size > 10000.

Highlighting marks the search keywords in the results: ES wraps them in tags so the page can style them (e.g. in red), by default the <em> tag.

Syntax:

GET /indexName/_search
{
"query": {
"match": {
"FIELD": "TEXT"
}
},
"highlight": {
"fields": {
"FIELD": {
"pre_tags": "",
"post_tags": ""
}
}
}
}
{% note warning no-icon %}
Note:

- highlighting applies to keywords, so the query must be a keyword-based search such as match, not e.g. a range query
- by default, the highlighted field must be the same as the searched field, or nothing is highlighted; to highlight a different field, add the property required_field_match=false
{% endnote %}

Example:
GET /hotel/_search
{
"query": {
"match": {
"all": "上海如家"
}
},
"highlight": {
"fields": {
"name": {
"pre_tags": "",
"post_tags": "",
"require_field_match": "false"
}
}
}
}
ES's default highlight tag is precisely the <em> tag, so pre_tags and post_tags can be omitted:

GET /hotel/_search
{
"query": {
"match": {
"all": "上海如家"
}
},
"highlight": {
"fields": {
"name": {
"require_field_match": "false"
}
}
}
}
Finally, query, pagination, sorting, and highlighting combined:

GET /hotel/_search
{
"query": {
"match": {
"all": "上海如家"
}
},
"from": 0,
"size": 20,
"sort": [
{
"_geo_distance": {
"location": {
"lat": 39.9,
"lon": 116.4
},
"order": "asc",
"unit": "km"
},
"price": "asc"
}
],
"highlight": {
"fields": {
"name": {
"require_field_match": "false"
}
}
}
}
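To close the loop, the combined request can also be issued from Java; a hedged sketch reusing the earlier client and HotelDoc (GeoPoint from org.elasticsearch.common.geo, DistanceUnit from org.elasticsearch.common.unit, SortBuilders/SortOrder from org.elasticsearch.search.sort, HighlightBuilder/HighlightField from org.elasticsearch.search.fetch.subphase.highlight, SearchHit from org.elasticsearch.search):

@Test
void testCombinedSearch() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    request.source()
            .query(QueryBuilders.matchQuery("all", "上海如家"))
            .from(0).size(20)
            .sort(SortBuilders.geoDistanceSort("location", new GeoPoint(39.9, 116.4))
                    .order(SortOrder.ASC)
                    .unit(DistanceUnit.KILOMETERS))
            .highlighter(new HighlightBuilder().field("name").requireFieldMatch(false));
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    for (SearchHit hit : response.getHits().getHits()) {
        // _source -> HotelDoc
        HotelDoc doc = JSON.parseObject(hit.getSourceAsString(), HotelDoc.class);
        // swap in the highlighted name fragment if present
        HighlightField hf = hit.getHighlightFields().get("name");
        if (hf != null) {
            doc.setName(hf.getFragments()[0].string());
        }
        System.out.println(doc);
    }
}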