Querying ES with the scroll cursor

What is the problem with from + size?

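By default, ES only lets a from + size query reach the first 10,000 hits (the index.max_result_window setting). Once normal paging goes past 10,000 hits, the request is rejected and the program fails. ES provides the scroll API to solve this.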
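
To make the limit concrete, here is a minimal sketch in plain Java with no Elasticsearch dependency. It assumes the default index.max_result_window of 10,000 and only mimics the rule that a request is rejected when from + size goes past the window. The class and method names are hypothetical and are not part of any ES API.

```java
// Hypothetical illustration only: mimics the from + size window rule, it does not call ES.
final class ResultWindowCheck {

    // Assumed default value of the index.max_result_window setting.
    static final int MAX_RESULT_WINDOW = 10_000;

    // A from + size request is accepted only if its last hit still falls inside the window.
    static boolean isAllowed(int from, int size) {
        return (long) from + size <= MAX_RESULT_WINDOW;
    }

    public static void main(String[] args) {
        int pageSize = 1000;
        // Pages 9 and 10 end inside the window, page 11 would need hits 10,001 to 11,000.
        for (int page = 9; page <= 11; page++) {
            int from = (page - 1) * pageSize;
            String verdict = isAllowed(from, pageSize) ? "ok" : "rejected";
            System.out.println("page " + page + " (from=" + from + "): " + verdict);
        }
    }
}
```

With a page size of 1000, the tenth page is the last one that from + size can reach. Anything deeper needs a cursor such as scroll.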

Example: batch download of monthly sales data using scroll

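The index holds 2 matching documents. As a demonstration, each batch downloads one 宝马 (BMW) sales record, so the download completes in 2 batches.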

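SearchResponse scrollResp = client.prepareSearch("car_shop")
        .setTypes("sales")
        .setScroll(new TimeValue(60000))  // keep the scroll context alive for 60 seconds
        .setQuery(termQuery("brand.keyword", "宝马"))
        .setSize(1)                       // number of hits returned per batch
        .get();
do {
    for (SearchHit hit : scrollResp.getHits().getHits()) {
        // handle each hit here
    }

    // fetch the next batch using the scroll id from the previous response
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
            .setScroll(new TimeValue(60000))
            .execute()
            .actionGet();
} while(scrollResp.getHits().getHits().length != 0); // zero hits means the scroll is exhausted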

The full source code is as follows:

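import java.net.InetAddress;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

public class ScollDownloadSalesDataApp {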
    
    @SuppressWarnings({ "resource", "unchecked" })
    public static void main(String[] args) throws Exception {
        Settings settings = Settings.builder()
                .put("cluster.name", "elasticsearch")
                .build();
        
        TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300)); 
    
        SearchResponse searchResponse = client.prepareSearch("car_shop") 
                .setTypes("sales")
                .setQuery(QueryBuilders.termQuery("brand.keyword", "宝马"))
                .setScroll(new TimeValue(60000))
                .setSize(1)
                .get();
        
        int batchCount = 0;
        
        do {
            for(SearchHit searchHit : searchResponse.getHits().getHits()) {
                System.out.println("batch: " + ++batchCount); 
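                // Print the source of the retrieved document
                System.out.println(searchHit.getSourceAsString());  
                
                // In practice each batch would be larger, e.g. 1000 rows, and would be written to a local Excel file
                // Fetching hundreds of thousands of documents in a single request is not realistic; the JVM could run out of memory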
            }
            
            searchResponse = client.prepareSearchScroll(searchResponse.getScrollId())
                    .setScroll(new TimeValue(60000))
                    .execute()
                    .actionGet();
        } while(searchResponse.getHits().getHits().length != 0);
        
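        // Release the scroll context on the server instead of waiting for the 60-second timeout
        client.prepareClearScroll().addScrollId(searchResponse.getScrollId()).get();
        client.close();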
    }
    
}
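
The comments in the loop describe the intended pattern: fetch one batch, write it out, then fetch the next, so that only one batch is in memory at a time. The following is a minimal sketch of that pattern in plain Java, with no Elasticsearch calls. The in-memory iterator is only a hypothetical stand-in for the scroll cursor, and BatchExportSketch, batches and export are illustrative names. Rows are written as plain lines rather than to a real Excel file.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for scroll-based export: plain Java, no Elasticsearch calls.
final class BatchExportSketch {

    // Splits rows into consecutive batches of at most batchSize, like successive scroll pages.
    static Iterator<List<String>> batches(final List<String> rows, final int batchSize) {
        return new Iterator<List<String>>() {
            private int position = 0;

            @Override
            public boolean hasNext() {
                return position < rows.size();
            }

            @Override
            public List<String> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int end = Math.min(position + batchSize, rows.size());
                List<String> batch = rows.subList(position, end);
                position = end;
                return batch;
            }
        };
    }

    // Writes each batch as soon as it arrives and returns the number of batches written.
    static int export(Iterator<List<String>> cursor, PrintWriter out) {
        int batchCount = 0;
        while (cursor.hasNext()) {
            for (String row : cursor.next()) {
                out.print(row);
                out.print('\n');
            }
            out.flush(); // push the finished batch to the target before fetching the next one
            batchCount++;
        }
        return batchCount;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("sale-1", "sale-2", "sale-3");
        StringWriter target = new StringWriter();
        int batchCount = export(batches(rows, 2), new PrintWriter(target));
        System.out.println(batchCount + " batches written");
        System.out.print(target);
    }
}
```

In a real export the iterator would be backed by the scroll loop above, and the PrintWriter would be replaced by whatever writes the target file.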

References:

86_Mastering the ES Java API: batch download of monthly sales data using scroll (Jianshu): https://www.jianshu.com/p/cba2b4019bd4
Integrating Elasticsearch with Spring: paging through the full data set with elasticTemplate scroll queries (GarveyWang, CSDN blog): https://blog.csdn.net/garvey_wong/article/details/86624016?spm=1001.2101.3001.6661.1&utm_medium=distribute.pc_relevant_t0.none-task-blog-2~default~CTRLIST~default-1.no_search_link&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-2~default~CTRLIST~default-1.no_search_link

