A Short Note on Solving the ES Deep Pagination Problem

Problem description

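In a paginated query, once from + size exceeds 10000, ES rejects the request. To avoid loading a large number of hits into memory and running out of heap, ES caps the result window at 10000 by default (the index.max_result_window index setting).

When the limit is exceeded, the following exception is thrown: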

org.springframework.data.elasticsearch.RestStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]; nested exception is ElasticsearchStatusException[Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Result window is too large, from + size must be less than or equal to: [10000] but was [52030]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Result window is too large, from + size must be less than or equal to: [10000] but was [52030]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]];
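
The numbers in the message above follow from how a page request maps onto from and size. Below is a minimal sketch of that arithmetic, assuming a zero-based page index and 10 rows per page (one combination that yields 52030; the original request parameters are not shown in the log). The class and method names are made up for illustration and are not part of Elasticsearch or Spring Data.

```java
/** Hypothetical helper for illustration only. */
final class ResultWindow {
    /** Default value of the index.max_result_window setting. */
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    /** Offset of the first hit on a zero-based page. */
    static long from(int page, int size) {
        return (long) page * size;
    }

    /** True when from + size would be rejected under the given limit. */
    static boolean exceeds(int page, int size, int maxResultWindow) {
        return from(page, size) + size > maxResultWindow;
    }

    public static void main(String[] args) {
        // Assumed request: page 5202, 10 rows per page, so from + size = 52030.
        System.out.println(from(5202, 10) + 10);
        System.out.println(exceeds(5202, 10, DEFAULT_MAX_RESULT_WINDOW));
        // Page 999 gives from = 9990 and from + size = 10000, which still fits.
        System.out.println(exceeds(999, 10, DEFAULT_MAX_RESULT_WINDOW));
    }
}
```

With 10 rows per page, page index 999 is the last one an ordinary from/size query can reach under the default limit.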

Solution

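The Scroll API can be used to solve this problem. The principle is simple: the first request returns a scroll ID together with the first batch of hits, each following request passes the latest scroll ID back to fetch the next batch, and the loop keeps going until a request returns no data.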

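import java.util.Collections;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.elasticsearch.core.SearchScrollHits;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;

public void page(Req req) {

    // How long ES keeps the scroll context alive, in ms (example value)
    long scrollTimeInMillis = 60 * 1000;

    NativeSearchQuery nsq = new NativeSearchQueryBuilder()
            // Report the exact total hit count instead of capping the count at 10000
            .withTrackTotalHits(Boolean.TRUE)
            .withQuery(assemblePageBoolQueryBuilder(req))
            .withSorts(SortBuilders.fieldSort("created_date").order(SortOrder.DESC))
            // A scroll always starts at the first hit, so the page index is fixed at 0
            // and only the page size is used
            .withPageable(PageRequest.of(0, req.getRows())).build();

    SearchScrollHits<MonitorOrderSearchEsDto> search = elasticsearchRestTemplate.searchScrollStart(scrollTimeInMillis, nsq, MonitorOrderSearchEsDto.class, ES_INDEX);
    // Scroll ID, which records where the current query stopped
    String scrollId = search.getScrollId();
    try {
        // Number of scrolls so far, emulating the page number
        int scrollTimes = 0;
        // Stop when a scroll returns no data or the requested page has been reached
        while (search.hasSearchHits() && scrollTimes < req.getPage()) {
            search = elasticsearchRestTemplate.searchScrollContinue(scrollId, scrollTimeInMillis, MonitorOrderSearchEsDto.class, ES_INDEX);
            // Keep the scroll ID returned by each call
            scrollId = search.getScrollId();
            scrollTimes = scrollTimes + 1;
        }
    } finally {
        // Each scroll holds a snapshot in ES, so clear it even if a call above fails
        elasticsearchRestTemplate.searchScrollClear(Collections.singletonList(scrollId));
    }

    // Business processing of the hits in search ...
}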

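60000 (60 * 1000 ms) is only an example value; this parameter is how long the query results are kept alive in ES.
Three methods have to be called in turn: searchScrollStart, searchScrollContinue, and searchScrollClear.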
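
The start / continue / clear sequence can also be looked at in isolation from Elasticsearch. The sketch below is only an illustration under stated assumptions: ScrollSource, ScrollPager, and ListScrollSource are hypothetical names invented here (they are not Spring Data types), and the in-memory source exists purely so the loop can be run without a cluster.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the three scroll calls. */
interface ScrollSource<T> {
    List<T> start();   // plays the role of searchScrollStart
    List<T> next();    // plays the role of searchScrollContinue
    void clear();      // plays the role of searchScrollClear
}

final class ScrollPager {
    /** Returns the batch at the zero-based page index, or an empty list past the end. */
    static <T> List<T> fetchPage(ScrollSource<T> source, int page) {
        try {
            List<T> batch = source.start();
            // Scroll forward until the requested page or until a batch comes back empty.
            for (int i = 0; i < page && !batch.isEmpty(); i++) {
                batch = source.next();
            }
            return batch;
        } finally {
            // Always release the context, even when a call above throws.
            source.clear();
        }
    }
}

/** In-memory source used only to exercise the loop. */
final class ListScrollSource<T> implements ScrollSource<T> {
    private final List<T> data;
    private final int size;
    private int offset = 0;
    boolean cleared = false;

    ListScrollSource(List<T> data, int size) {
        this.data = new ArrayList<>(data);
        this.size = size;
    }

    public List<T> start() {
        offset = 0;
        return next();
    }

    public List<T> next() {
        int end = Math.min(offset + size, data.size());
        List<T> batch = new ArrayList<>(data.subList(offset, end));
        offset = end;
        return batch;
    }

    public void clear() {
        cleared = true;
    }
}
```

Note that reaching page N this way still walks through every earlier batch, so the cost grows with the page number; the sketch only shows the call order and the cleanup in finally.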

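Summary:

Scroll queries are built on top of ordinary queries.
A scroll query behaves like a snapshot: if documents are added, deleted, or updated while the scroll is in progress, the results will not reflect those changes.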
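
The snapshot behaviour can be pictured with a toy in-memory model. This is only an analogy under an obvious assumption: a plain list copy stands in for the frozen view, which is not how Elasticsearch stores or snapshots data, and SnapshotDemo is a name made up for this note.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a point-in-time view; for illustration only. */
final class SnapshotDemo {
    /** Copies the current contents, the way a scroll freezes its view when it starts. */
    static <T> List<T> openSnapshot(List<T> liveIndex) {
        return new ArrayList<>(liveIndex);
    }

    public static void main(String[] args) {
        List<String> liveIndex = new ArrayList<>(Arrays.asList("order-1", "order-2"));
        List<String> snapshot = openSnapshot(liveIndex);

        // Writes that happen after the scroll has started.
        liveIndex.add("order-3");
        liveIndex.remove("order-1");

        // The snapshot still holds order-1 and order-2; the live list holds order-2 and order-3.
        System.out.println(snapshot);
        System.out.println(liveIndex);
    }
}
```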