The upstream Globus Search index limits result fetching to 10,000 using limit and offset, due to the resource intensity of deep pagination. This API is intended to mimic the old Solr API that all community clients used, and it therefore also uses limit and offset for pagination.
The recommended solution to this problem is to use Scroll Queries wherein the first page of results returns an opaque string used as a "bookmark" to the next page of results. The Globus Search interface for Scroll Queries is not subject to the 10,000 result limit.
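To illustrate the scroll mechanism described above, here is a minimal sketch of following "bookmark" markers until a result set is exhausted. The fetch_page function is a stand-in for the real scroll endpoint (its name, the page size, and the integer-string markers are illustrative assumptions, not the actual Globus Search API):

```python
def fetch_page(index, marker=None, page_size=3):
    """Fake scroll endpoint over an in-memory list. A real client would
    POST to the Globus Search scroll API; the marker is opaque to callers."""
    start = 0 if marker is None else int(marker)
    page = index[start:start + page_size]
    # The response carries a marker for the next page, or None when done.
    nxt = str(start + page_size) if start + page_size < len(index) else None
    return {"results": page, "marker": nxt}

def scroll_all(index):
    """Collect every result by following markers until exhausted.
    Unlike limit/offset, this walk is not capped at 10,000 results."""
    results, marker = [], None
    while True:
        resp = fetch_page(index, marker)
        results.extend(resp["results"])
        marker = resp["marker"]
        if marker is None:
            return results

docs = [f"doc-{i}" for i in range(10)]
print(scroll_all(docs) == docs)  # True
```

The key property is that each marker is only valid as a continuation of one specific search, which is what makes it hard to reconcile with stateless limit/offset requests.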
The problems with this solution are:
- All existing clients expect to provide two ints (limit and offset) to define what subset of results they are interested in. There is no way to map this pair of ints to a bookmark string, as each bookmark is unique to a search.
- Scroll Queries do not allow for "slicing" into a result set. For example, if the client knows a priori that they need results N through N+10 (even if N is less than 10,000), it is not possible to select those results directly; the client must page through all results from the beginning.
- These bookmarks have a shelf life. Elasticsearch must store the results server side for the life of the bookmark. If a client uses an expired bookmark (e.g., continuing a search after a week vacation), it is not clear how that would be handled in either Elasticsearch or the enclosing Globus Search API.
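The first two problems above can be made concrete with a sketch of what a limit/offset-to-scroll adapter would have to do: since a bookmark cannot be derived from an offset, the only way to serve "offset N" is to scroll from the beginning and discard results. All names here (fetch_page, the in-memory index, integer-string markers) are illustrative assumptions, not the real API:

```python
def fetch_page(index, marker=None, page_size=3):
    """Fake scroll endpoint over an in-memory list; stands in for the
    real Globus Search scroll API."""
    start = 0 if marker is None else int(marker)
    page = index[start:start + page_size]
    nxt = str(start + page_size) if start + page_size < len(index) else None
    return {"results": page, "marker": nxt}

def offset_limit_via_scroll(index, offset, limit):
    """Emulate limit/offset on top of a scroll API by paging from the
    start and discarding everything before the offset. Returns the
    requested slice plus the number of pages fetched, to show the cost."""
    seen, out, marker, pages_fetched = 0, [], None, 0
    while len(out) < limit:
        resp = fetch_page(index, marker)
        pages_fetched += 1
        for r in resp["results"]:
            if seen >= offset and len(out) < limit:
                out.append(r)
            seen += 1  # results before the offset are fetched then thrown away
        marker = resp["marker"]
        if marker is None:
            break
    return out, pages_fetched

docs = [f"doc-{i}" for i in range(10)]
print(offset_limit_via_scroll(docs, 6, 2))  # (['doc-6', 'doc-7'], 3)
```

Even for a slice well under the 10,000-result limit, the adapter must fetch every page up to the offset, which is exactly the deep-pagination cost the scroll design was meant to avoid.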
Possible solutions:
- The Globus Search team removes the 10,000 result limit. This is the most desirable option from our standpoint, as there are no workarounds or adaptations to document for future evolutions, but it is the most resource-intensive on the Globus side.
- A breaking change in the bridge API. This is the least desirable from our standpoint as the whole point of the bridge API was to keep existing community clients working while removing the aging Solr infrastructure.
- Document and accept the limitation. This is undesirable since the Solr infrastructure did not have this limitation, so it could be considered a breaking change. However, it should be rare that clients have an actual need to retrieve (vs. search) more than 10,000 results. It should also be easier for clients to adapt by breaking their searches into more segments via other facets, so that each search returns fewer than 10,000 results. We've already had at least one user hit this limit.