
Issue with elasticsearch pagination; index.max_result_window #83

Open
@nazrulworld

Description


Background

I am doing a portal catalog search that returns more than 100K brains. When I iterate over all of the brains, I get the error below:

2020-05-18 17:45:57,857 INFO    [elasticsearch:83][waitress] POST http://127.0.0.1:9200/danbioapp-backend-portal_catalog/portal_catalog/_bulk [status:200 request:0.024s]
2020-05-18 17:45:58,861 INFO    [elasticsearch:83][waitress] GET http://127.0.0.1:9200/_nodes/_all/http [status:200 request:0.002s]
2020-05-18 17:45:58,912 WARNING [elasticsearch:97][waitress] GET http://127.0.0.1:9200/danbioapp-backend-portal_catalog/portal_catalog/_search?from=10000&stored_fields=path.path&size=50 [status:500 request:0.050s]
2020-05-18 17:45:59,971 ERROR   [Zope.SiteErrorLog:251][waitress] 1589816759.920.727241015445 http://localhost:9090/danbioapp-backend/f....
Traceback (innermost last):
  Module ZPublisher.WSGIPublisher, line 156, in transaction_pubevents
  Module ZPublisher.WSGIPublisher, line 338, in publish_module
  Module ZPublisher.WSGIPublisher, line 256, in publish
  Module ZPublisher.mapply, line 85, in mapply
  Module ZPublisher.WSGIPublisher, line 62, in call_object
  Module Products.ExternalMethod.ExternalMethod, line 230, in __call__
   - __traceback_info__: ((<PloneSite at /danbioapp-backend>,), {}, None)
  Module <string>, line 32, in main
  Module ZTUtils.Lazy, line 201, in __getitem__
  Module collective.elasticsearch.es, line 104, in __getitem__
  Module collective.elasticsearch.es, line 170, in _search
  Module elasticsearch.client.utils, line 76, in _wrapped
  Module elasticsearch.client, line 660, in search
  Module elasticsearch.transport, line 318, in perform_request
  Module elasticsearch.connection.http_urllib3, line 186, in perform_request
  Module elasticsearch.connection.base, line 125, in _raise_error
TransportError: TransportError(500, u'search_phase_execution_exception',
u'Result window is too large, from + size must be less than or equal to: [10000]
but was [10050]. See the scroll api for a more efficient way to request large data sets.
This limit can be set by changing the [index.max_result_window] index level setting.')

I found a related discussion: https://stackoverflow.com/questions/35206409/elasticsearch-2-1-result-window-is-too-large-index-max-result-window

I know it is possible to increase index.max_result_window, but that would increase memory usage.
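For reference, this is the kind of per-index settings change the Stack Overflow thread describes. This is only a sketch: the helper function is hypothetical, and the actual call (commented out) assumes the elasticsearch-py client and a running cluster.

```python
# Hypothetical helper: build the settings payload for index.max_result_window.
def max_result_window_settings(new_limit):
    return {"index": {"max_result_window": new_limit}}

# Applying it would look roughly like this (requires elasticsearch-py and a
# running cluster; the index name is taken from the log output above):
#
#   es.indices.put_settings(
#       index="danbioapp-backend-portal_catalog",
#       body=max_result_window_settings(200000),
#   )

print(max_result_window_settings(200000))
# → {'index': {'max_result_window': 200000}}
```

Note that every deep from/size page still has to be collected and sorted on the coordinating node, which is why raising this limit costs memory rather than fixing the underlying pagination pattern.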

My idea: would it be possible to use the Elasticsearch scroll API here? https://github.com/collective/collective.elasticsearch/blob/master/src/collective/elasticsearch/es.py#L48
I am not sure whether that would actually solve the problem; your expert opinion is requested.
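To make the suggestion concrete, here is a minimal sketch of what scroll-based iteration could look like. The function name and parameters are my own placeholders, not collective.elasticsearch API; the client is assumed to follow the elasticsearch-py interface (search / scroll / clear_scroll). elasticsearch-py also ships a helpers.scan wrapper that does essentially this.

```python
def scan_all(client, index, query, page_size=500, scroll="5m"):
    """Yield every hit for `query` using the scroll API instead of
    from/size paging, so index.max_result_window is never exceeded."""
    # Open a scroll context; the first page comes back with the search itself.
    resp = client.search(index=index, body=query, size=page_size, scroll=scroll)
    scroll_id = resp.get("_scroll_id")
    try:
        while True:
            hits = resp["hits"]["hits"]
            if not hits:
                break  # an empty page means the scroll is exhausted
            for hit in hits:
                yield hit
            # Fetch the next page, keeping the scroll context alive.
            resp = client.scroll(scroll_id=scroll_id, scroll=scroll)
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Free server-side scroll resources as soon as we are done.
        if scroll_id:
            client.clear_scroll(scroll_id=scroll_id)
```

The catch for collective.elasticsearch is that its result object supports random access via __getitem__, while a scroll is strictly forward-only, so the integration would presumably only apply when results are consumed sequentially.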
Thanks
@vangheem
