Using Elasticsearch Scroll Queries in Java to Fetch Large Result Sets

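Fetching a large result set in one request, say 100,000 documents, performs very poorly. The usual alternative is a scroll query, which retrieves the data batch by batch until everything has been fetched and processed.
With scroll search you request one batch, then the next, and so on, until all matching documents have been returned.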

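On the first request, a scroll search saves a snapshot of the index as it was at that moment. Every later batch is served from that snapshot, so changes made to the data in the meantime are not visible to the caller.
Sorting by _doc gives the best performance.
Each scroll request also carries a scroll parameter that specifies a time window, that is, how long Elasticsearch keeps the search context alive. The window is renewed by every request, so it only needs to be long enough to handle one batch, not the whole result set.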
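The snapshot and time-window behaviour described above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `ScrollModel` and `ScrollModelDemo` are hypothetical names, nothing here is the Elasticsearch API, and the clock is passed in explicitly so the example stays deterministic.

```java
import java.util.ArrayList;
import java.util.List;

final class ScrollModelDemo {
    public static void main(String[] args) {
        List<Integer> live = new ArrayList<>(List.of(1, 2, 3, 4, 5));
        // Open a "scroll": batch size 2, keep-alive window of 10 seconds, clock at 0
        ScrollModel<Integer> scroll = new ScrollModel<>(live, 2, 10_000, 0);
        live.add(6); // a write made after the scroll was opened

        List<Integer> seen = new ArrayList<>();
        long clock = 0;
        for (List<Integer> batch = scroll.next(clock); !batch.isEmpty(); batch = scroll.next(clock)) {
            seen.addAll(batch);
            clock += 1_000; // each batch takes 1 second, well inside the window
        }
        System.out.println(seen);
    }
}

/** Toy in-memory model of a scroll context (hypothetical, NOT the Elasticsearch API). */
final class ScrollModel<T> {
    private final List<T> snapshot;      // frozen copy taken when the scroll is opened
    private final int batchSize;
    private final long keepAliveMillis;
    private int position = 0;
    private long expiresAt;

    ScrollModel(List<T> liveData, int batchSize, long keepAliveMillis, long nowMillis) {
        this.snapshot = new ArrayList<>(liveData); // later changes to liveData are not seen
        this.batchSize = batchSize;
        this.keepAliveMillis = keepAliveMillis;
        this.expiresAt = nowMillis + keepAliveMillis;
    }

    /** Returns the next batch (empty once exhausted) and renews the keep-alive window. */
    List<T> next(long nowMillis) {
        if (nowMillis > expiresAt) {
            throw new IllegalStateException("scroll context expired");
        }
        int end = Math.min(position + batchSize, snapshot.size());
        List<T> batch = new ArrayList<>(snapshot.subList(position, end));
        position = end;
        expiresAt = nowMillis + keepAliveMillis; // every request restarts the window
        return batch;
    }
}
```

Running the demo prints `[1, 2, 3, 4, 5]`: the element added after the scroll was opened never shows up, and the loop ends on the first empty batch.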

    @Autowired
    private RestHighLevelClientUtil restHighLevelClientUtil;

    @Value("${es.index-list.search-keywords-index}")
    String indexName;

    VideoQuery changeQuery(VideoQuery query) {
        query.setCreateBy(getUsername());
        // Look up video_origin.id values via the search keywords
        if (query.getSearchKeywords() != null && !"".equals(query.getSearchKeywords())) {
            try {
                String[] includeFields = new String[] {"video_id"};
                String[] excludeFields = new String[] {"content"};

                // Match on content, wildcard on title; both return only video_id
                List<SearchHit> hitListContent = restHighLevelClientUtil.searchAllByMatch("content", query.getSearchKeywords(), includeFields, excludeFields, indexName);
                List<SearchHit> hitListTitle = restHighLevelClientUtil.searchAllByWildcard("title", query.getSearchKeywords(), includeFields, excludeFields, indexName);

                List<SearchHit> hitList = new ArrayList<>();
                hitList.addAll(hitListTitle);
                hitList.addAll(hitListContent);

                // Distinct video ids, collected into a mutable list
                List<Object> videoIds = hitList.stream()
                        .map(hit -> hit.getSourceAsMap().get("video_id"))
                        .distinct()
                        .collect(Collectors.toCollection(ArrayList::new));
                // Append -1 so the id list is never empty
                videoIds.add(-1);
                // "[1, 2, -1]" -> "1,2,-1"
                String listText = videoIds.toString();
                String ids = listText.substring(1, listText.length() - 1).replace(" ", "");
                query.setIds(ids);
                query.setSearchKeywords("");
            } catch (IOException e) {
                log.error("search keywords failed", e);
            }
        }
        return query;
    }
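The id string that changeQuery builds from `toString()` and `substring` could also be produced with `Collectors.joining`. The following is a minimal sketch, assuming each hit's source is available as a `Map<String, Object>`; `VideoIdJoiner` and `joinVideoIds` are hypothetical names introduced only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class VideoIdJoiner {
    /** Collects the distinct "video_id" values and joins them with commas, ending in -1. */
    static String joinVideoIds(List<Map<String, Object>> sources) {
        Stream<String> ids = sources.stream()
                .map(source -> source.get("video_id"))
                .filter(Objects::nonNull)   // skip hits without a video_id
                .map(String::valueOf)
                .distinct();                // keeps first-seen order
        // The trailing -1 keeps the list non-empty, as in the original code
        return Stream.concat(ids, Stream.of("-1")).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> sources = List.of(
                Map.of("video_id", 7),
                Map.of("video_id", 3),
                Map.of("video_id", 7));
        System.out.println(joinVideoIds(sources));
    }
}
```

For the three sample sources this prints `7,3,-1`. Joining the values directly avoids depending on the exact format of `List.toString()`.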


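    /**
     * Fetches every hit for a match query via scroll, without pagination.
     * @param field         field to match against
     * @param key           text to search for
     * @param includeFields _source fields to include
     * @param excludeFields _source fields to exclude
     * @param indexNames    indices to search
     * @return all matching hits
     * @throws IOException if a request to Elasticsearch fails
     */
    public List<SearchHit> searchAllByMatch(String field, String key, String[] includeFields, String[] excludeFields, String... indexNames) throws IOException {
        // Build the search request
        SearchRequest request = new SearchRequest(indexNames);
        SearchSourceBuilder builder = new SearchSourceBuilder();
        // Hits per scroll batch; must not exceed index.max_result_window (10000 by default)
        builder.size(9000);
        builder.query(new MatchQueryBuilder(field, key));
        // Apply source filtering when either list is non-empty
        // (the original required both to be non-empty, which skipped include-only filters)
        if (includeFields.length != 0 || excludeFields.length != 0) {
            builder.fetchSource(includeFields, excludeFields);
        }
        request.source(builder);
        return this.searchAllByScroll(request);
    }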
    /**
     * Runs the given search request with scroll and returns every hit, without pagination.
     * @param request the search request
     * @return all matching hits
     * @throws IOException if a request to Elasticsearch fails
     */
    public List<SearchHit> searchAllByScroll(SearchRequest request) throws IOException {
        // Keep the search context alive for 10 seconds between batches
        request.scroll(TimeValue.timeValueSeconds(10));

        List<SearchHit> searchHitList = new ArrayList<>();

        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        String scrollId = response.getScrollId();
        try {
            SearchHit[] searchHits = response.getHits().getHits();

            // An empty batch means the scroll is exhausted
            while (searchHits != null && searchHits.length > 0) {
                searchHitList.addAll(Arrays.asList(searchHits));
                response = client.scroll(new SearchScrollRequest(scrollId).scroll(TimeValue.timeValueSeconds(10)), RequestOptions.DEFAULT);
                scrollId = response.getScrollId();
                searchHits = response.getHits().getHits();
            }
        } finally {
            // Release the search context even if a scroll request fails
            ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
            clearScrollRequest.addScrollId(scrollId);
            ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
            boolean succeeded = clearScrollResponse.isSucceeded();
            System.out.println("clearScroll: " + succeeded);
        }
        return searchHitList;
    }
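searchAllByScroll collects every hit into one list before returning. When the result set is very large, handing each batch to a callback as it arrives may use less memory. The sketch below shows that loop shape against a hypothetical `BatchCursor` interface; `BatchCursor`, `BatchDrainer`, and `ListCursor` are illustrative names and not part of the Elasticsearch client. In real code `nextBatch()` would wrap `client.scroll(...)` and `close()` would wrap `client.clearScroll(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchDrainerDemo {
    public static void main(String[] args) {
        ListCursor<String> cursor = new ListCursor<>(List.of("a", "b", "c", "d", "e"), 2);
        long total = BatchDrainer.drain(cursor, batch -> System.out.println("processing " + batch));
        System.out.println("total=" + total + ", closed=" + cursor.closed);
    }
}

/** Hypothetical abstraction over a scroll cursor. */
interface BatchCursor<T> extends AutoCloseable {
    List<T> nextBatch();      // an empty list means the cursor is exhausted

    @Override
    void close();             // releases the server-side context
}

final class BatchDrainer {
    /** Feeds each batch to the handler, always closes the cursor, returns the item count. */
    static <T> long drain(BatchCursor<T> cursor, Consumer<List<T>> handler) {
        long total = 0;
        try (BatchCursor<T> c = cursor) {   // close() runs even if the handler throws
            for (List<T> batch = c.nextBatch(); !batch.isEmpty(); batch = c.nextBatch()) {
                handler.accept(batch);
                total += batch.size();
            }
        }
        return total;
    }
}

/** In-memory stand-in for a real scroll, used only to make the sketch runnable. */
final class ListCursor<T> implements BatchCursor<T> {
    private final List<T> items;
    private final int batchSize;
    private int position = 0;
    boolean closed = false;

    ListCursor(List<T> items, int batchSize) {
        this.items = items;
        this.batchSize = batchSize;
    }

    @Override
    public List<T> nextBatch() {
        int end = Math.min(position + batchSize, items.size());
        List<T> batch = new ArrayList<>(items.subList(position, end));
        position = end;
        return batch;
    }

    @Override
    public void close() {
        closed = true;
    }
}
```

The try-with-resources block plays the same role as clearing the scroll in a `finally` clause: the context is released whether the loop finishes normally or a batch handler fails.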


 
  

