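A search request returns a single "page" of results, whereas the scroll API can be used to retrieve a large number of results (or even all of them) from a single search request, much as you would use a cursor in a traditional database.
Scrolling is not intended for real-time user requests but for processing large amounts of data, for example reading most or all of the documents in an index.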
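To use scrolling, the initial search request should specify the scroll parameter in the query string. This parameter tells Elasticsearch how long it should keep the "search context" alive (see "Keeping the search context alive"). With the Java High Level REST Client this is written as, for example, Scroll scroll = new Scroll(TimeValue.timeValueMinutes(2L));.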
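For orientation, this is roughly what the scroll parameter looks like at the REST level: the initial request is a POST to /<index>/_search?scroll=<keep-alive>, and each later batch is requested with a POST to /_search/scroll whose JSON body carries the scroll ID and a fresh keep-alive. The sketch below only builds those strings. The class and method names are hypothetical helpers invented for illustration, and it assumes the scroll ID contains no characters that need JSON escaping.

```java
/** Hypothetical helper that only builds request strings; it does not talk to a cluster. */
final class ScrollRestSketch {
    /** Path of the follow-up scroll requests (POST) and of the clear request (DELETE). */
    static final String SCROLL_PATH = "/_search/scroll";

    /** Path and query string of the initial search, e.g. /my-index/_search?scroll=2m */
    static String initialSearchPath(String index, String keepAlive) {
        return "/" + index + "/_search?scroll=" + keepAlive;
    }

    /** JSON body of a follow-up scroll request; assumes scrollId needs no escaping. */
    static String nextPageBody(String scrollId, String keepAlive) {
        return "{\"scroll\":\"" + keepAlive + "\",\"scroll_id\":\"" + scrollId + "\"}";
    }
}
```

Each follow-up request sends the keep-alive again, so the value only has to cover the time needed to process one batch, not the whole export.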
The complete code is shown below.
package com.xxx.xxx.service.common;
import org.apache.commons.lang3.StringUtils;
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.elasticsearch.action.admin.indices.alias.get.GetAliasesRequest;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.search.*;
import org.elasticsearch.action.support.IndicesOptions;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.*;
import org.elasticsearch.cluster.metadata.AliasMetadata;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.search.Scroll;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;
import java.text.SimpleDateFormat;
import java.util.*;
@Component
public class Elasticsearch {
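    private static final Logger LOG = LoggerFactory.getLogger(Elasticsearch.class);
    /**
     * Reads data from Elasticsearch with the scroll API, which avoids the size limit on a single search request.
     *
     * @param index      name of the index
     * @param ip         IP address of the Elasticsearch host
     * @param port       port of the Elasticsearch host
     * @param username   user name; leave blank if the cluster does not require authentication
     * @param password   password for the user
     * @param protocol   request scheme ("http" or "https")
     * @param time       timestamp of the last record returned by the previous query; 0 disables the range filter
     * @param rangeQuery name of the field the range query is applied to
     * @return all hits collected from the initial search and the subsequent scroll requests
     */
    public static List<SearchHit> scrollQueryElasticsearch(String index, String ip, int port, String username, String password, String protocol, Long time, String rangeQuery) {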
        RestHighLevelClient restHighLevelClient = null;
        try {
            // A non-blank user name means the cluster requires authentication
            if (StringUtils.isNotBlank(username)) {
                HttpHost httpHost = new HttpHost(ip, port, protocol);
                RestClientBuilder restClientBuilder = RestClient.builder(httpHost);
                // Set the user name and password
                CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
                credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(username, password));
                restClientBuilder.setHttpClientConfigCallback(f -> f.setDefaultCredentialsProvider(credentialsProvider));
                restHighLevelClient = new RestHighLevelClient(restClientBuilder);
            } else {
                restHighLevelClient = new RestHighLevelClient(RestClient.builder(new HttpHost(ip, port, protocol)));
            }
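            // Build the search request
            SearchRequest searchRequest = new SearchRequest(index);
            SearchSourceBuilder builder = new SearchSourceBuilder();
            // How long Elasticsearch keeps the search context alive between scroll requests (not a query timeout)
            Scroll scroll = new Scroll(TimeValue.timeValueMinutes(2L));
            // Only return documents whose range field is greater than or equal to the given time
            if (time != null && time > 0) {
                RangeQueryBuilder rangequerybuilder = QueryBuilders
                        .rangeQuery(rangeQuery)
                        .gte(time);
                builder.query(rangequerybuilder);
            }
            // Return at most 10,000 documents per batch; the remaining documents are fetched by the scroll requests below
            // Note: the scroll requests belong to the same search; the result is simply read in several batches
            builder.size(10000);
            builder.sort("@timestamp", SortOrder.ASC);
            searchRequest.source(builder);
            // Attach the scroll to the search request
            searchRequest.scroll(scroll);
            // Arguments of IndicesOptions.fromOptions, in order:
            // ignoreUnavailable: ignore indices that are unavailable
            // allowNoIndices: allow the request to match no index at all
            // expandToOpenIndices: expand wildcard expressions to open indices
            // expandToClosedIndices: expand wildcard expressions to closed indices
            searchRequest.indicesOptions(IndicesOptions.fromOptions(true, true, true, false));
            SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
            SearchHits hits = searchResponse.getHits();
            SearchHit[] hit = hits.getHits();
            List<SearchHit> searchHitList = new ArrayList<>();
            for (SearchHit h : hit) {
                searchHitList.add(h);
            }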
            /*
             * The first batch (up to 10,000 documents) has been read at this point and can be processed here.
             * The remaining documents are fetched by scrolling below.
             */
            // Remember the scroll ID
            String scrollId = searchResponse.getScrollId();
            // Scrolling starts with the batch that follows the first one (document 10,001 onwards)
            SearchHit[] hitsScroll = hits.getHits();
            SearchScrollRequest searchScrollRequest = null;
            while (hitsScroll != null && hitsScroll.length > 0) {
                // Build the scroll request
                searchScrollRequest = new SearchScrollRequest(scrollId);
                searchScrollRequest.scroll(scroll);
                // Overwrite the previous response with the new one
                searchResponse = restHighLevelClient.scroll(searchScrollRequest, RequestOptions.DEFAULT);
                scrollId = searchResponse.getScrollId();
                hits = searchResponse.getHits();
                hitsScroll = hits.getHits();
                /*
                 * Batches after the first one can be processed here.
                 */
                for (SearchHit h : hitsScroll) {
                    searchHitList.add(h);
                }
            }
            // Clear the scroll context once finished so that the cluster can release its resources
            if (scrollId != null) {
                ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
                clearScrollRequest.addScrollId(scrollId);
                ClearScrollResponse clearScrollResponse = restHighLevelClient.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
                // Check whether the scroll context was cleared successfully
                if (!clearScrollResponse.isSucceeded()) {
                    LOG.warn("Failed to clear scroll context {}", scrollId);
                }
            }
            return searchHitList;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            try {
                // Close the client; otherwise its connections and threads are never released
                if (restHighLevelClient != null) {
                    restHighLevelClient.close();
                }
            } catch (Exception e) {
                // Log instead of rethrowing so that an exception from the try block is not masked
                LOG.warn("Failed to close the Elasticsearch client", e);
            }
        }
    }
    public static void main(String[] args) throws Exception {
        List<SearchHit> hitsList = scrollQueryElasticsearch("index_2022-10-27", "127.0.0.1", 9200, "elastic", "123456@", "http", 0L, "@timestamp");
        if (hitsList != null && hitsList.size() > 0) {
            // Put your business logic here, i.e. the processing of each hit
            for (int x = 0; x < hitsList.size(); x++) {
                Map<String, Object> sourceAsMap = hitsList.get(x).getSourceAsMap();
                if (x == (hitsList.size() - 1)) {
                    // Parse the timestamp of the last hit
                    String timeStr = sourceAsMap.get("@timestamp").toString();
                    timeStr = timeStr.replace("Z", " UTC"); // the trailing "Z" denotes UTC, not local time
                    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS Z");
                    Date d = format.parse(timeStr);
                }
            }
        }
    }
}
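As a side note on the timestamp handling in main: if the @timestamp value is an ISO-8601 string in UTC ending in "Z" (which the replace("Z", " UTC") call suggests, but which is an assumption here), java.time can parse it directly without rewriting the string. A minimal sketch, with a hypothetical class and method name:

```java
import java.time.Instant;

/** Hypothetical helper; assumes the timestamp is an ISO-8601 instant such as 2022-10-27T00:00:00.000Z. */
final class TimestampSketch {
    /** Converts an ISO-8601 UTC timestamp to epoch milliseconds. */
    static long toEpochMillis(String isoUtcTimestamp) {
        // Instant.parse understands the trailing "Z" as UTC, so no string replacement is needed
        return Instant.parse(isoUtcTimestamp).toEpochMilli();
    }
}
```

The resulting epoch milliseconds could then serve as the time argument of the next call to scrollQueryElasticsearch, provided the range field is compared in epoch milliseconds.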
Request result (if a single log entry is too large, Map
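Note:
searchRequest.source(builder.size(10000)); sets the number of documents returned by each scroll request.
As you can see, scrolling is straightforward. A regular search can return at most 10,000 documents by default (the index.max_result_window setting). With scrolling, the initial search returns the first batch, and each subsequent scroll request returns the next batch until an empty batch comes back. Each batch is therefore processed in exactly the same way as the result of a regular search, because a scroll is built on top of a regular search request. (This is my first time querying Elasticsearch; if I have missed anything, corrections are welcome. Thanks!)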
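The control flow described above (read the first batch, then keep requesting the next batch until an empty one comes back) can be tried out without a cluster. The sketch below uses an in-memory stand-in for the scroll context; FakeScrollSource and ScrollLoopSketch are hypothetical names made up for illustration and are not part of the Elasticsearch client.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory stand-in for a scroll context; NOT the Elasticsearch client. */
final class FakeScrollSource {
    private final List<Integer> docs;
    private final int pageSize;
    private int cursor = 0;

    FakeScrollSource(List<Integer> docs, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.docs = docs;
        this.pageSize = pageSize;
    }

    /** Returns the next batch, or an empty list once everything has been read. */
    List<Integer> nextPage() {
        int end = Math.min(cursor + pageSize, docs.size());
        List<Integer> page = new ArrayList<>(docs.subList(cursor, end));
        cursor = end;
        return page;
    }
}

final class ScrollLoopSketch {
    /** Collects every document by reading batch after batch until an empty batch arrives. */
    static List<Integer> drain(FakeScrollSource source) {
        List<Integer> all = new ArrayList<>();
        List<Integer> page = source.nextPage();   // plays the role of the initial search
        while (!page.isEmpty()) {
            all.addAll(page);                     // process the batch just received
            page = source.nextPage();             // plays the role of one scroll request
        }
        return all;
    }
}
```

With 25 documents and a batch size of 10, drain reads batches of 10, 10 and 5 documents and stops on the fourth, empty batch. Processing each batch inside the loop, rather than collecting everything first, keeps memory usage bounded when the result set is large.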