The upstream Globus Search index limits result fetching to 10,000 using limit and offset, due to the resource intensity of deep pagination. This API is intended to mimic the old Solr API that all community clients used, and it therefore also uses limit and offset for pagination.
The recommended solution to this problem is to use Scroll Queries wherein the first page of results returns an opaque string used as a "bookmark" to the next page of results. The Globus Search interface for Scroll Queries is not subject to the 10,000 result limit.
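To illustrate the scroll mechanism described above, here is a minimal sketch of following "bookmark" markers until a result set is exhausted. The fetch_page function is a stand-in for the real scroll endpoint (its name, the page size, and the integer-string markers are illustrative assumptions, not the actual Globus Search API):

```python
def fetch_page(index, marker=None, page_size=3):
    """Fake scroll endpoint over an in-memory list. A real client would
    POST to the Globus Search scroll API; the marker is opaque to callers."""
    start = 0 if marker is None else int(marker)
    page = index[start:start + page_size]
    # The response carries a marker for the next page, or None when done.
    nxt = str(start + page_size) if start + page_size < len(index) else None
    return {"results": page, "marker": nxt}

def scroll_all(index):
    """Collect every result by following markers until exhausted.
    Unlike limit/offset, this walk is not capped at 10,000 results."""
    results, marker = [], None
    while True:
        resp = fetch_page(index, marker)
        results.extend(resp["results"])
        marker = resp["marker"]
        if marker is None:
            return results

docs = [f"doc-{i}" for i in range(10)]
print(scroll_all(docs) == docs)  # True
```

The key property is that each marker is only valid as a continuation of one specific search, which is what makes it hard to reconcile with stateless limit/offset requests.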
The problems with this solution are:
- All existing clients expect to provide two ints (limit and offset) to define what subset of results they are interested in. There is no way to map this pair of ints to a bookmark string, as each bookmark is unique to a search.
- Scroll Queries do not allow for "slicing" into a result set. For example, if the client knows a priori that they need results N through N+10 (even if N is less than 10,000), it is not possible to select those results directly; the client must page through all results from the beginning.
- These bookmarks have a shelf life. Elasticsearch must store the results server side for the life of the bookmark. If a client uses an expired bookmark (e.g., continuing a search after a week vacation), it is not clear how that would be handled in either Elasticsearch or the enclosing Globus Search API.
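The first two problems above can be made concrete with a sketch of what a limit/offset-to-scroll adapter would have to do: since a bookmark cannot be derived from an offset, the only way to serve "offset N" is to scroll from the beginning and discard results. All names here (fetch_page, the in-memory index, integer-string markers) are illustrative assumptions, not the real API:

```python
def fetch_page(index, marker=None, page_size=3):
    """Fake scroll endpoint over an in-memory list; stands in for the
    real Globus Search scroll API."""
    start = 0 if marker is None else int(marker)
    page = index[start:start + page_size]
    nxt = str(start + page_size) if start + page_size < len(index) else None
    return {"results": page, "marker": nxt}

def offset_limit_via_scroll(index, offset, limit):
    """Emulate limit/offset on top of a scroll API by paging from the
    start and discarding everything before the offset. Returns the
    requested slice plus the number of pages fetched, to show the cost."""
    seen, out, marker, pages_fetched = 0, [], None, 0
    while len(out) < limit:
        resp = fetch_page(index, marker)
        pages_fetched += 1
        for r in resp["results"]:
            if seen >= offset and len(out) < limit:
                out.append(r)
            seen += 1  # results before the offset are fetched then thrown away
        marker = resp["marker"]
        if marker is None:
            break
    return out, pages_fetched

docs = [f"doc-{i}" for i in range(10)]
print(offset_limit_via_scroll(docs, 6, 2))  # (['doc-6', 'doc-7'], 3)
```

Even for a slice well under the 10,000-result limit, the adapter must fetch every page up to the offset, which is exactly the deep-pagination cost the scroll design was meant to avoid.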
Possible solutions:
- The Globus Search team removes the 10,000 result limit. This is the most desirable option from our standpoint, as there are no workarounds or adaptations to document for future evolutions, but it is the most resource-intensive on the Globus side.
- A breaking change in the bridge API. This is the least desirable from our standpoint as the whole point of the bridge API was to keep existing community clients working while removing the aging Solr infrastructure.
- Document and accept the limitation. This is undesirable since the Solr infrastructure did not have this limitation, so it could be considered a breaking change. However, it should be rare that clients have an actual need to retrieve (vs. search) more than 10,000 results. It should also be easier for clients to adapt by breaking their searches into more segments via other facets, so that each search returns fewer than 10,000 results. We've already had at least one user hit this limit.