Similar to fuzzy matching, match is aware of the analyzer: the query text is analyzed first and the search is then performed with the resulting terms.
GET 索引名/_search
{
"query":{
"match":{
"FIELD":"text"
}
}
}
Query all documents
GET 索引名/_search
{
"query":{
"match_all":{}
}
}
This is equivalent to
GET 索引名/_search
multi_match queries several fields at once: the same query text is searched across all of the listed fields
GET 索引/_search
{
"query":{
"multi_match":{
"query":"字段值",
"fields":["需要匹配的字段1","需要匹配的字段2"]
}
}
}
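For reference, the same multi-field search can be built with the Java REST high-level client introduced later in these notes; a minimal sketch, where the index name student, the field names and restHighLevelClient (the injected client used in the test methods below) are assumptions:
@Test
public void testMultiMatchFind() throws IOException {
SearchRequest request = new SearchRequest("student"); //hypothetical index name
SearchSourceBuilder builder = new SearchSourceBuilder();
//search the same text across several fields
builder.query(QueryBuilders.multiMatchQuery("text", "field1", "field2"));
request.source(builder);
SearchResponse response = restHighLevelClient.search(request, RequestOptions.DEFAULT);
System.out.println(response);
}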
Phrase match: the engine first analyzes the query string and builds a phrase query from the analyzed text, which means every term in the phrase must match and the terms must keep the same relative positions.
GET 索引名/_search
{
"query":{
"match_pharse":{
"FIELD":"text"
}
}
}
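With the Java client (shown later in these notes) the equivalent builder is a one-liner; field name and text are placeholders:
//phrase match: all terms must occur, in the same relative positions
MatchPhraseQueryBuilder phraseQuery = QueryBuilders.matchPhraseQuery("FIELD", "text");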
term looks up the exact term in the inverted index; it is not aware of the analyzer. This kind of query suits keyword, numeric, date and boolean fields.
GET 索引名/_search
{
"query":{
"term":{
"FIELD":{
"value":"value"
}
}
}
}
terms queries documents whose field contains any of several given values
GET 索引名/_search
{
"query":{
"terms":{
"FEILD":["VALUE1","VALUE2"]
}
}
}
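The Java counterparts, for reference (field names and values are placeholders):
//term / terms bypass analysis and match exact values
TermQueryBuilder termQuery = QueryBuilders.termQuery("FIELD", "value");
TermsQueryBuilder termsQuery = QueryBuilders.termsQuery("FIELD", "VALUE1", "VALUE2");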
Filters cache their results and do not compute relevance (no _score is calculated), so they execute quickly.
How a query clause behaves depends on whether it is used in query context or in filter context:
query context cares about how well a document matches; filter context only cares about whether it matches
query context does not use the cache; filter context can be cached
query context is used for analyzed (full-text) content; filter context is typically used for range queries and exact-value lookups
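A minimal Java sketch of the same idea, putting a range condition in filter context inside a bool query (the index name student and the age bounds are made up for illustration; the client setup is shown further below):
@Test
public void testFilterContextFind() throws IOException {
//filter context: no scoring, results can be cached
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery()
.filter(QueryBuilders.rangeQuery("age").gte(18).lte(30));
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(boolQuery);
SearchRequest request = new SearchRequest("student");
request.source(builder);
SearchResponse response = restHighLevelClient.search(request, RequestOptions.DEFAULT);
System.out.println(response);
}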
bool is a compound query: it accepts other queries/filters as parameters and combines them into arbitrary boolean combinations. It is used to combine other query clauses; a bool clause can contain must, must_not, should and filter clauses as needed.
GET 索引/_search
{
"query":{
"bool":{
"must":[],
"should":[],
"must_not":[],
"filter":[],
"minimum_should_match":0
}
}
}
query context
"query":{
"range":{
"字段":{
"from":起始值,
"to":最终值,
"include_lower":true,#是否允许小写
"include_upper":true,#是否允许大写
}
}
}
filter context
"query":{
"bool":{
"filter":{
"range":{
"字段":{
"gte":起始值,
"lte":最终值,
}
}
}
}
}
"_source":["field1","field2"...]
"from":起点
"size":大小
"sort":{"字段":{"order":"asc/desc"}}
Highlighting simply wraps the matched parts of the results in a piece of HTML so they stand out.
GET 索引/_search
{
"highlight":{
"pre_tags":"", //自定义高亮
"post_tags":"",
"fields":{
"字段":{} //字段内部还可以对指定部分进行自定义高亮
}
}
}
Analysis is the process of turning full text into a series of terms (tokens), also called tokenization. Analysis is carried out by an Analyzer. When a document is indexed, an inverted index may be created for each field (the mapping can be configured so that a field is not indexed, e.g. via dynamic=false).
Building the inverted index means splitting each document into terms with the Analyzer; every term points to the set of documents that contain it. At query time, ES decides, based on the search type, whether to analyze the query, then matches it against the terms in the inverted index to find the relevant documents.
An Analyzer is composed of three building blocks, applied in this order: character filters, a tokenizer, and token filters.
The standard tokenizer splits text into terms on word boundaries, as defined by the Unicode Text Segmentation algorithm. It removes most punctuation and is the best choice for most languages.
max_token_length: the maximum token length; if a token exceeds this length it is split at that interval. Defaults to 255.
Setting an analyzer for an index
PUT test3
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer":{
"tokenizer":"my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer":{
"type":"standard",
"max_token_length":"5"
}
}
}
}
}
Submit text to the analyzer to inspect the result
POST test3/_analyze
{
"analyzer": "my_analyzer",
"text": "qwerqwer qwer"
}
The simple analyzer splits on non-letter characters; symbols and digits are removed, uppercase letters are lowercased, and Chinese is not segmented.
GET _analyze
{
"analyzer": "simple",
"text": "WWWWW WWW-汉字汉字 ,,。、 ’‘_1233213"
}
Result
{
"tokens" : [
{
"token" : "wwwww",
"start_offset" : 0,
"end_offset" : 5,
"type" : "word",
"position" : 0
},
{
"token" : "www",
"start_offset" : 6,
"end_offset" : 9,
"type" : "word",
"position" : 1
},
{
"token" : "汉字汉字",
"start_offset" : 10,
"end_offset" : 14,
"type" : "word",
"position" : 2
}
]
}
The stop analyzer additionally filters stop words (the, a, is); symbols and digits are removed, uppercase letters are lowercased, and Chinese is not segmented.
GET _analyze
{
"analyzer": "stop",
"text": "WWWWW WWW-汉字汉字 ,,。、 ’‘_1233213 the a is"
}
Result
{
"tokens" : [
{
"token" : "wwwww",
"start_offset" : 0,
"end_offset" : 5,
"type" : "word",
"position" : 0
},
{
"token" : "www",
"start_offset" : 6,
"end_offset" : 9,
"type" : "word",
"position" : 1
},
{
"token" : "汉字汉字",
"start_offset" : 10,
"end_offset" : 14,
"type" : "word",
"position" : 2
}
]
}
The whitespace analyzer splits on whitespace only; letters keep their original case (no lowercasing), and Chinese is not segmented.
GET _analyze
{
"analyzer": "whitespace",
"text": "WWWWW WWW-汉字汉字 ,1233213 the a is"
}
Result
{
"tokens" : [
{
"token" : "WWWWW",
"start_offset" : 0,
"end_offset" : 5,
"type" : "word",
"position" : 0
},
{
"token" : "WWW-汉字汉字",
"start_offset" : 6,
"end_offset" : 14,
"type" : "word",
"position" : 1
},
{
"token" : ",1233213",
"start_offset" : 15,
"end_offset" : 23,
"type" : "word",
"position" : 2
},
{
"token" : "the",
"start_offset" : 24,
"end_offset" : 27,
"type" : "word",
"position" : 3
},
{
"token" : "a",
"start_offset" : 28,
"end_offset" : 29,
"type" : "word",
"position" : 4
},
{
"token" : "is",
"start_offset" : 30,
"end_offset" : 32,
"type" : "word",
"position" : 5
}
]
}
The keyword analyzer does not tokenize at all.
GET _analyze
{
"analyzer": "keyword",
"text": "WWWWW WWW-汉字汉字 ,1233213 the a is"
}
Result
{
"tokens" : [
{
"token" : "WWWWW WWW-汉字汉字 ,1233213 the a is",
"start_offset" : 0,
"end_offset" : 32,
"type" : "word",
"position" : 0
}
]
}
The pattern tokenizer splits on a regular expression; the default pattern is \W+ (split on non-word characters).
PUT test4
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer":{
"tokenizer":"my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer":{
"type":"pattern",
"pattern":",,"
}
}
}
}
}
POST test4/_analyze
{
"analyzer": "my_analyzer",
"text": "qwerqwer qwer,, abcd"
}
Result
{
"tokens" : [
{
"token" : "qwerqwer qwer",
"start_offset" : 0,
"end_offset" : 13,
"type" : "word",
"position" : 0
},
{
"token" : " abcd",
"start_offset" : 15,
"end_offset" : 20,
"type" : "word",
"position" : 1
}
]
}
The language analyzers filter stop words (the, is, a) and support tokenization for many languages.
A field can be configured with multiple analyzers: analyzer is used when building the inverted index, while search_analyzer is used to analyze the search input.
PUT test5
{
"mappings": {
"properties": {
"addr":{
"analyzer": "standard",
"search_analyzer": "standard"
}
}
}
}
None of the built-in ES analyzers handle Chinese word segmentation; the most popular Chinese analyzer at the moment is the IK analyzer.
yum -y install zip unzip
unzip elasticsearch-analysis-ik-7.0.0.zip
rm -f elasticsearch-analysis-ik-7.0.0.zip
docker cp /opt/ik my_es:/usr/share/elasticsearch/plugins
IK offers two segmentation granularities: ik_smart (coarse-grained) and ik_max_word (fine-grained, exhaustive):
GET _analyze
{
"text": "我爱北京天安门",
"analyzer": "ik_smart"
}
Result
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "爱",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "北京",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "天安门",
"start_offset" : 4,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 3
}
]
}
GET _analyze
{
"text": "我爱北京天安门",
"analyzer": "ik_max_word"
}
Result
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "爱",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "北京",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "天安门",
"start_offset" : 4,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "天安",
"start_offset" : 4,
"end_offset" : 6,
"type" : "CN_WORD",
"position" : 4
},
{
"token" : "门",
"start_offset" : 6,
"end_offset" : 7,
"type" : "CN_CHAR",
"position" : 5
}
]
}
Be sure to specify the analyzer when creating the index, otherwise the inverted index is still built with the default standard analyzer.
PUT test5
{
"mappings": {
"properties": {
"addr":{
"analyzer": "ik_smart",
"search_analyzer": "ik_smart"
}
}
}
}
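The same mapping can be created from Java by passing it to CreateIndexRequest as a JSON string; a sketch that assumes the IK plugin is installed on the cluster and reuses the test setup shown below:
@Test
public void testCreateIndexWithIk() throws IOException {
CreateIndexRequest request = new CreateIndexRequest("test5");
//same mapping as the console example: index and search both use ik_smart
request.mapping("{\"properties\":{\"addr\":{\"type\":\"text\",\"analyzer\":\"ik_smart\",\"search_analyzer\":\"ik_smart\"}}}", XContentType.JSON);
CreateIndexResponse response = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);
System.out.println(response.isAcknowledged());
}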
Under the IK plugin's config directory there are many .dic files; these are the dictionaries. You can add your own .dic file there to define custom words.
docker exec -it my_es /bin/bash
cd plugins/ik
touch mydic.dic
vi mydic.dic
宇宙超人咚咚强
Then register the new dictionary in IKAnalyzer.cfg.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer extension configuration</comment>
<entry key="ext_dict">mydic.dic</entry>
<entry key="ext_stopwords"></entry>
</properties>
GET _analyze
{
"text": "宇宙超人咚咚强",
"analyzer": "ik_smart"
}
Result
{
"tokens" : [
{
"token" : "宇宙超人咚咚强",
"start_offset" : 0,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 0
}
]
}
spring:
elasticsearch:
rest:
uris: http://192.168.206.128:9200
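The test methods below assume a Spring Boot test class with an injected RestHighLevelClient; with spring.elasticsearch.rest.uris set as above and the high-level client on the classpath, Spring Boot auto-configures that bean. A minimal skeleton (the class name EsClientTests and the nested Student entity are illustrative; the @Test methods from these notes go inside it, together with the Elasticsearch and Jackson imports they use):
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest
public class EsClientTests {
//auto-configured from spring.elasticsearch.rest.uris
@Autowired
private RestHighLevelClient restHighLevelClient;
//simple entity used by the examples
public static class Student {
private String name;
private String sex;
private Integer age;
public String getName() { return name; }
public void setName(String name) { this.name = name; }
public String getSex() { return sex; }
public void setSex(String sex) { this.sex = sex; }
public Integer getAge() { return age; }
public void setAge(Integer age) { this.age = age; }
}
}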
@Test
public void testCreateIndex() throws IOException {
//create-index request
CreateIndexRequest request = new CreateIndexRequest("student");
//execute the request with the client
CreateIndexResponse response = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);
//print the response to check whether creation succeeded
System.out.println(response);
}
@Test
public void testGetIndex() throws IOException {
//get-index request object
GetIndexRequest request = new GetIndexRequest("student");
//execute the request with the client
boolean exists = restHighLevelClient.indices().exists(request, RequestOptions.DEFAULT);
//print the response to check whether the index exists
System.out.println(exists);
}
@Test
public void testDelIndex() throws IOException {
//delete-index request object
DeleteIndexRequest request = new DeleteIndexRequest("student");
//execute the request with the client
AcknowledgedResponse response = restHighLevelClient.indices().delete(request, RequestOptions.DEFAULT);
//print the response to check whether deletion succeeded
System.out.println(response);
}
@Test
public void testCreateDoc() throws IOException {
//create the document index request
IndexRequest request = new IndexRequest("student");
//set request options
request.id("1");
request.timeout(TimeValue.timeValueSeconds(1));
//create the entity
Student student = new Student();
student.setName("张三");
student.setSex("male");
student.setAge(19);
//convert to JSON
ObjectMapper mapper = new ObjectMapper();
String obj = mapper.writeValueAsString(student);
//put the data into the request
request.source(obj, XContentType.JSON);
//send the request
IndexResponse response = restHighLevelClient.index(request,RequestOptions.DEFAULT);
System.out.println(response);
}
@Test
public void testFindAllDoc() throws IOException {
SearchRequest searchRequest = new SearchRequest();
//build the query
SearchSourceBuilder builder = new SearchSourceBuilder();
QueryBuilder queryBuilder = QueryBuilders.matchAllQuery();
builder.query(queryBuilder);
searchRequest.source(builder);
SearchResponse response = restHighLevelClient.search(searchRequest,RequestOptions.DEFAULT);
//inspect the results
long count = response.getHits().getTotalHits().value;
System.out.println("total hits: " + count);
for (SearchHit hit : response.getHits()) {
Map map = hit.getSourceAsMap();
System.out.println(map);
}
}
@Test
public void testFindDocById() throws IOException {
GetRequest request = new GetRequest("student","1");
GetResponse response = restHighLevelClient.get(request,RequestOptions.DEFAULT);
System.out.println(response.getId()+"\t"+response.getIndex()+"\t"+response.getVersion());
System.out.println(response.getSource());
System.out.println(response);
}
@Test
public void testUpdateDoc() throws IOException {
UpdateRequest request = new UpdateRequest("student","1");
Student student = new Student();
student.setName("李四");
student.setSex("male");
student.setAge(19);
ObjectMapper mapper = new ObjectMapper();
String obj = mapper.writeValueAsString(student);
request.doc(obj,XContentType.JSON);
UpdateResponse response = restHighLevelClient.update(request,RequestOptions.DEFAULT);
System.out.println(response);
}
@Test
public void testDelDoc() throws IOException {
DeleteRequest request = new DeleteRequest("student","1");
DeleteResponse response = restHighLevelClient.delete(request,RequestOptions.DEFAULT);
System.out.println(response);
}
@Test
public void testBulkCreateDoc() throws IOException {
BulkRequest request = new BulkRequest();
Student student = new Student();
for (int i=1;i<=20;i++){
student.setName("张三"+i);
student.setSex("male");
student.setAge(18+i);
request.add(new IndexRequest("student").id(i+"").source(new ObjectMapper().writeValueAsString(student)));
}
BulkResponse response = restHighLevelClient.bulk(request,RequestOptions.DEFAULT);
System.out.println(response);
}
@Test
public void testBulkDelDoc() throws IOException {
BulkRequest request = new BulkRequest();
for (int i = 1; i < 21; i++) {
request.add(new DeleteRequest("student",i+""));
}
BulkResponse response = restHighLevelClient.bulk(request,RequestOptions.DEFAULT);
System.out.println(response);
}
@Test
public void testMatchFind() throws IOException {
SearchRequest request = new SearchRequest("student");
//build the query
SearchSourceBuilder builder = new SearchSourceBuilder();
//set from, size, etc. on the SearchSourceBuilder
builder.from(0);
builder.size(100);
//set the match condition with a MatchQueryBuilder
MatchQueryBuilder queryBuilder = QueryBuilders.matchQuery("age","20");
builder.query(queryBuilder);
builder.timeout(TimeValue.timeValueSeconds(1));
request.source(builder);
SearchResponse response = restHighLevelClient.search(request,RequestOptions.DEFAULT);
System.out.println(response);
}
MatchQueryBuilder and TermQueryBuilder are both subclasses of QueryBuilder; SearchSourceBuilder.query() takes a QueryBuilder, so construct the corresponding builder for a match or term query. QueryBuilders provides methods for every DSL query, including phrase matching, multi-field matching and so on.
@Test
public void testMatchFind() throws IOException {
SearchRequest request = new SearchRequest("student");
//build the query
SearchSourceBuilder builder = new SearchSourceBuilder();
//set from, size, etc. on the SearchSourceBuilder
builder.from(0);
builder.size(100);
//set the match condition with a MatchQueryBuilder
MatchQueryBuilder queryBuilder = QueryBuilders.matchQuery("name.keyword","张三2");//FIELD.keyword searches the text as a keyword
//build the bool query
BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
boolQueryBuilder.must(queryBuilder);
builder.query(boolQueryBuilder);
builder.timeout(TimeValue.timeValueSeconds(1));
request.source(builder);
SearchResponse response = restHighLevelClient.search(request,RequestOptions.DEFAULT);
System.out.println(response);
}
BoolQueryBuilder is also a subclass of QueryBuilder; its must, should, etc. methods correspond to the must and should clauses in the DSL. Appending .keyword (or .text) to the field name controls whether the query text is matched as a keyword or as analyzed text.
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery()
.must(QueryBuilders.termQuery("name","张三2"))
.must(QueryBuilders.termQuery("sex","male"));
Compound query conditions can be built with chained calls like this.
//build the highlighter
HighlightBuilder highlightBuilder = new HighlightBuilder();
HighlightBuilder.Field highlightTitle = new HighlightBuilder.Field("name");//the field to highlight
//set the highlight tags, e.g. "<b>" and "</b>"
highlightBuilder.preTags("");
highlightBuilder.postTags("");
highlightBuilder.field(highlightTitle);
builder.highlighter(highlightBuilder);
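To read the highlighted fragments back, each SearchHit exposes its highlight fields; a short sketch, assuming response is the SearchResponse returned by the search that carried the highlighter above (HighlightField is org.elasticsearch.search.fetch.subphase.highlight.HighlightField, Text is org.elasticsearch.common.text.Text):
for (SearchHit hit : response.getHits()) {
HighlightField nameHighlight = hit.getHighlightFields().get("name");
if (nameHighlight != null) {
//fragments() returns the highlighted snippets with the pre/post tags applied
for (Text fragment : nameHighlight.fragments()) {
System.out.println(fragment.string());
}
}
}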
Plain from/size paging:
{
"from":0,
"size":10
}
Scroll paging (the scroll_id comes from the previous search response; scroll keeps the search context alive):
GET _search/scroll
{
"scroll_id": "<scroll_id returned by the previous request>",
"scroll":"5m"
}
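The console snippet above only shows the follow-up request; from Java the full scroll flow looks roughly like this (a sketch: the index name student is a placeholder, and in practice you keep calling scroll until no hits are returned and then clear the scroll context):
@Test
public void testScroll() throws IOException {
SearchRequest request = new SearchRequest("student");
request.scroll(TimeValue.timeValueMinutes(5)); //keep the scroll context alive for 5 minutes
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(QueryBuilders.matchAllQuery());
builder.size(10);
request.source(builder);
SearchResponse response = restHighLevelClient.search(request, RequestOptions.DEFAULT);
String scrollId = response.getScrollId();
//fetch the next page with the scroll id returned by the previous request
SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
scrollRequest.scroll(TimeValue.timeValueMinutes(5));
SearchResponse nextPage = restHighLevelClient.scroll(scrollRequest, RequestOptions.DEFAULT);
System.out.println(nextPage.getHits().getHits().length);
}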
search_after paging: fetch the first page with an ordinary query (a sort clause is required so the sort values are well defined),
{
"from":0,
"size":10
}
then pass the sort value of the last hit (here its id) as search_after to get the next page:
{
"from":0,
"size":10,
"search_after":[id]
}
ES is a full-text search engine and everything it does serves full-text search; data that must be persisted should still be written to MySQL, and ES then pulls its data from MySQL.
Logstash can ingest data of any shape, size and origin; it supports many kinds of inputs and can capture events from numerous common sources simultaneously.
It parses and transforms data in real time:
as data travels from the source to the store, Logstash filters parse each event, identify named fields to build structure, and transform them into a common format for more powerful analysis.
Logstash can send its output not only to ES but to any destination you specify.
docker pull logstash:7.0.0
docker run -d --restart=always --log-driver json-file --log-opt max-size=100m --log-opt max-file=2 -p 5044:5044 --name logstash logstash:7.0.0
logstash.yml:
path.config: /usr/share/logstash/conf.d/*.conf
path.logs: /var/log/logstash
input{
beats{
port => 5044
codec => "json"
}
}
output{
elasticsearch {
hosts => ["192.168.206.128:9200"]
}
stdout{
codec => rubydebug
}
}
input{
jdbc{
jdbc_driver_library => "/root/mysql-connector-java-8.0.21.jar"
jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
jdbc_connection_string => "jdbc connection url"
jdbc_user =>""
jdbc_password =>""
schedule => "cron expression (how often to run)"
statement => "select * from person"  # the SQL used to read the table
type => "jdbc"
}
}
output{
elasticsearch{
hosts =>["127.0.0.1:9200"]
index =>"person"
document_id=>"%{id}"
document_type=>"_doc"
}
stdout{
codec=>json_lines
}
}
bin/logstash-plugin install logstash-input-jdbc
bin/logstash-plugin install logstash-output-elasticsearch
bin/logstash -f config/mysql.conf
mysql.conf:
input{
jdbc{
jdbc_driver_library => "/root/mysql-connector-java-8.0.21.jar"
jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
jdbc_connection_string => "jdbc connection url"
jdbc_user =>""
jdbc_password =>""
schedule => "cron expression (how often to run)"
statement => "select * from person where id > :sql_last_value"
# set to true to track the value of a column from the query results; otherwise tracking_column defaults to the last run timestamp
use_column_value => true
tracking_column => "id"
# record_last_run: whether to remember where the last run stopped
record_last_run => true
# path of the file that stores the previous sql_last_value; the column's initial value should be set in this file
last_run_metadata_path => "/opt/logstash/config/person_1.txt"
# whether to clear the record in last_run_metadata_path; must be false for incremental sync
clean_run => false
type => "jdbc"
}
}
output{
elasticsearch{
hosts =>["127.0.0.1:9200"]
index =>"person"
document_id=>"%(uid)"
document_type=>"_doc"
}
stdout{
codec=>json_lines
}
}
Using id as the incremental-sync condition only catches newly inserted rows; updates to existing rows are missed. Adding a column such as optime that stores each row's last update time and syncing incrementally on that column solves the problem.
input{
jdbc{
jdbc_driver_library => "/root/mysql-connector-java-8.0.21.jar"
jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
jdbc_connection_string => "jdbc connection url"
jdbc_user =>""
jdbc_password =>""
schedule => "* * * * *"
statement => "select * from person where optime> date_add(:sql_last_value,interval 8 hour) AND optime,interval 8 hour)"
# set to true to track the value of a column from the query results; otherwise tracking_column defaults to the last run timestamp
use_column_value => true
tracking_column => "optime"
tracking_column_type => "timestamp"
# record_last_run: whether to remember where the last run stopped
record_last_run => true
# path of the file that stores the previous sql_last_value; the column's initial value should be set in this file
last_run_metadata_path => "/opt/logstash/config/person_1.txt"
# whether to clear the record in last_run_metadata_path; must be false for incremental sync
clean_run => false
type => "jdbc"
}
}
output{
elasticsearch{
hosts =>["127.0.0.1:9200"]
index =>"person"
document_id=>"%{id}"
document_type=>"_doc"
}
stdout{
codec=>json_lines
}
}
With time-based incremental sync, the database uses UTC while Logstash uses CST, an 8-hour difference. Besides adjusting the SQL as above, this can also be solved by specifying UTC as the time zone via a startup parameter.
Add an entry for each table's .conf file to pipelines.yml in the config directory and the configuration files will be loaded automatically; note that the path.config setting in logstash.yml must be removed, otherwise pipelines.yml is not loaded.
- pipeline.id: person
path.config: "/opt/logstash/config/mysql.conf"
Starting a single ES node already creates an ES cluster, just one with a single node; if no index has been created yet, the cluster is in an "empty" state.
The status field only tells you whether the cluster as a whole is working properly: green (all shards allocated), yellow (all primary shards allocated but some replicas are not), red (some primary shards are unallocated).
mkdir /ES/config
vi es1.yml
#cluster name
cluster.name: ES-Cluster
#node name
node.name: node-1
#bind/publish address, default 0.0.0.0
network.publish_host:
#HTTP port for client access, default 9200
http.port: 9200
#TCP port for inter-node communication, default 9300
transport.tcp.port: 9300
#whether to allow cross-origin REST requests
http.cors.enabled: true
#origins allowed for cross-origin REST requests
http.cors.allow-origin: "*"
#node roles
node.master: true
node.data: true
#list of master-eligible hosts for discovery
discovery.seed_hosts: ["",""]
cluster.initial_master_nodes: ["",""]
#minimum number of master-eligible nodes that must be running in the cluster at all times, default 1
discovery.zen.minimum_master_nodes: 1
docker run -e ES_JAVA_OPTS="-Xms256m -Xmx256m" -d -p 9200:9200 -p 9300:9300 -v /ES/config/es1.yml:/usr/share/elasticsearch/config/elasticsearch.yml --name node-1 elasticsearch:7.0.0
Much like a Redis cluster, only the primary shard accepts writes; replica shards copy data from the primary. A read can be served from any copy of the shard, and when a primary shard goes down, another copy takes over its role.
ELK comprises ES, Kibana and Logstash; deploying ELK with Docker saves setting up the common runtime environment for three separate containers.
docker pull sebp/elk:700    # pull the ELK image
echo "vm.max_map_count=262144" >> /etc/sysctl.conf
sysctl -p
docker run -dit --name elk -p 5601:5601 -p 9200:9200 -p 5044:5044 -v /opt/elk-data:/var/lib/elasticsearch -v /etc/localtime:/etc/localtime sebp/elk:700