#1645: add support for "search after" pagination#2299
#1645: add support for "search after" pagination#2299klimov-paul wants to merge 2 commits intoruflin:9.xfrom
Conversation
📝 WalkthroughWalkthroughAdds search_after pagination: a Query setter, a Document getter for sort values, two generator-based Index iterators (each, batch) implementing search_after traversal, a changelog entry, and functional tests validating the new behavior. Changes
Sequence Diagram(s)sequenceDiagram
actor Client
participant Index
participant Query
participant ES as Elasticsearch
participant Doc as Document
Client->>Index: each(query, batchSize, options)
activate Index
Index->>Query: Query::create(clone)
Index->>Query: setSize(batchSize)
Index->>Query: setSearchAfter([])
loop while page count == batchSize
Index->>ES: search(query, options)
activate ES
ES-->>Index: results[]
deactivate ES
alt results not empty
Index->>Doc: lastDoc.getSort()
activate Doc
Doc-->>Index: sort values
deactivate Doc
Index->>Query: setSearchAfter(lastDocSort)
Index->>Client: yield Document(s)
end
end
deactivate Index
sequenceDiagram
actor Client
participant Index
participant Query
participant ES as Elasticsearch
participant Doc as Document
Client->>Index: batch(query, batchSize, options)
activate Index
Index->>Query: Query::create(clone)
Index->>Query: setSize(batchSize)
Index->>Query: setSearchAfter([])
loop until empty or underfilled page
Index->>ES: search(query, options)
activate ES
ES-->>Index: results[]
deactivate ES
alt results not empty
Index->>Doc: lastDoc.getSort()
activate Doc
Doc-->>Index: sort values
deactivate Doc
Index->>Query: setSearchAfter(lastDocSort)
end
Index->>Client: yield docs[]
end
deactivate Index
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (5)
tests/QueryTest.php (1)
652-663: Strengthen the second-page assertion to verify continuity, not only count.Current check at Line 662 only validates size. Also assert the expected document ID and remove the no-op call at Line 654.
✅ Suggested test tightening
/** `@var` Document $lastDocument */ $lastDocument = array_pop($documents); -$lastDocument->getParam('sort'); $this->assertNotEmpty($lastDocument->getSort()); $query->setSearchAfter($lastDocument->getSort()); $secondPageResultSet = $index->search($query); $documents = $secondPageResultSet->getDocuments(); $this->assertCount(1, $documents); +$this->assertSame('2', $documents[0]->getId());🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/QueryTest.php` around lines 652 - 663, Remove the no-op call $lastDocument->getParam('sort'); and instead assert that the second page continues from the last document by using $lastDocument->getSort() to setSearchAfter on $query (via $query->setSearchAfter($lastDocument->getSort())), then after $secondPageResultSet = $index->search($query) and $documents = $secondPageResultSet->getDocuments() keep the assertCount(1, $documents) and add an assertion that the returned document's id equals the expected id (e.g. compare $documents[0]->getId() to the known next document id or to a stored expected value), ensuring continuity rather than only count.src/Index.php (1)
566-576: Avoid rebuilding documents twice per page ineach().
$resultSet->getDocuments()is called multiple times in the same loop. Cache once, reuse count, and compute the last document from that array.♻️ Proposed refactor
while (true) { - $resultSet = $this->search($query, $options); - foreach ($resultSet->getDocuments() as $document) { + $documents = $this->search($query, $options)->getDocuments(); + $count = \count($documents); + if (0 === $count) { + break; + } + + foreach ($documents as $document) { yield $document; } - if (count($resultSet->getDocuments()) < $batchSize) { + if ($count < $batchSize) { break; } - $query->setSearchAfter($document->getSort()); + $lastDocument = $documents[$count - 1]; + $query->setSearchAfter($lastDocument->getSort()); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Index.php` around lines 566 - 576, The loop in each() repeatedly calls $resultSet->getDocuments() and reuses $document from the inner foreach after the loop; fix by caching the documents once per iteration: assign $docs = $resultSet->getDocuments(), iterate over $docs to yield each document, use count($docs) for the batchSize check, and compute the last document with $last = end($docs) (or equivalent) then call $query->setSearchAfter($last->getSort()); update references to $resultSet->getDocuments(), $document, $batchSize and $query->setSearchAfter accordingly.tests/IndexTest.php (1)
861-921: Add guard-path tests for iterator validation branches.Happy-path coverage is good; consider adding cases for missing
sortandbatchSize < 1so both exception branches inIndex::each()/Index::batch()are protected.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/IndexTest.php` around lines 861 - 921, Add negative-path unit tests to cover the validation branches in Index::each() and Index::batch(): create tests that call $index->each($query, $batchSize) and $index->batch($query, $batchSize) with (a) a Query that has no sort set and (b) an invalid batchSize < 1 (e.g. 0 or -1), and assert the appropriate exceptions (e.g. InvalidArgumentException) are thrown. Reference the existing test methods testIterateEach and testIterateBatch as templates—reuse _createIndex(), addDocuments(), refresh(), and construct Query objects—then use PHPUnit's expectException to verify both guard paths in each() and batch() are covered.src/Document.php (1)
244-247: Make missingsorthandling explicit ingetSort().At Line 246,
getParam('sort')throws if the document has no sort metadata. Consider a guard and a clearer exception message so failures are explicit for non-search-hit documents.♻️ Suggested tweak
public function getSort(): array { + if (!$this->hasParam('sort')) { + throw new InvalidException('Sort values are not available on this document.'); + } + return $this->getParam('sort'); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Document.php` around lines 244 - 247, The getSort() method currently calls getParam('sort') which will throw a generic error when sort metadata is absent; add an explicit guard in Document::getSort (use $this->hasParam('sort') or equivalent) and if missing throw a clear RuntimeException (e.g. "Document has no sort metadata; getSort is only valid for search-hit documents") or otherwise return the sort as an array (cast to array) so callers get an explicit, informative failure or a consistent type; update references to getParam('sort') accordingly.src/Query.php (1)
489-493: Consider validating non-emptysearch_afterinput at setter time for improved fail-fast behavior.Line 491 accepts any array without validation. While Elasticsearch will reject an empty
search_afterat request time, validating at the setter provides clearer early feedback during development.Proposed guard
public function setSearchAfter(array $searchAfter): self { - $this->setParam('search_after', $searchAfter); - - return $this; + if ([] === $searchAfter) { + throw new InvalidException('Search after values must not be empty.'); + } + + return $this->setParam('search_after', $searchAfter); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Query.php` around lines 489 - 493, The setSearchAfter method currently allows an empty array which Elasticsearch will later reject; add a guard at the start of setSearchAfter(array $searchAfter): self to validate that $searchAfter is not empty (e.g., if (empty($searchAfter)) throw new InvalidArgumentException('search_after must be a non-empty array')); then call $this->setParam('search_after', $searchAfter) and return $this; this provides immediate, fail-fast feedback in the setSearchAfter method.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Index.php`:
- Around line 560-562: Both each() and batch() perform search_after iteration
but only validate presence of 'sort'; add a check that 'from' is either absent
or zero and throw InvalidException if it's > 0. In the Index.php methods where
you currently check if (!$query->hasParam('sort')) (refer to the existing check
in each() and the analogous block around the 598–600 region), inspect
$query->getParam('from', 0) (or cast/int) and if that value !== 0 throw
InvalidException('When using "search_after" iteration, "from" must be 0 or
omitted.'); to prevent invalid pagination.
---
Nitpick comments:
In `@src/Document.php`:
- Around line 244-247: The getSort() method currently calls getParam('sort')
which will throw a generic error when sort metadata is absent; add an explicit
guard in Document::getSort (use $this->hasParam('sort') or equivalent) and if
missing throw a clear RuntimeException (e.g. "Document has no sort metadata;
getSort is only valid for search-hit documents") or otherwise return the sort as
an array (cast to array) so callers get an explicit, informative failure or a
consistent type; update references to getParam('sort') accordingly.
In `@src/Index.php`:
- Around line 566-576: The loop in each() repeatedly calls
$resultSet->getDocuments() and reuses $document from the inner foreach after the
loop; fix by caching the documents once per iteration: assign $docs =
$resultSet->getDocuments(), iterate over $docs to yield each document, use
count($docs) for the batchSize check, and compute the last document with $last =
end($docs) (or equivalent) then call $query->setSearchAfter($last->getSort());
update references to $resultSet->getDocuments(), $document, $batchSize and
$query->setSearchAfter accordingly.
In `@src/Query.php`:
- Around line 489-493: The setSearchAfter method currently allows an empty array
which Elasticsearch will later reject; add a guard at the start of
setSearchAfter(array $searchAfter): self to validate that $searchAfter is not
empty (e.g., if (empty($searchAfter)) throw new
InvalidArgumentException('search_after must be a non-empty array')); then call
$this->setParam('search_after', $searchAfter) and return $this; this provides
immediate, fail-fast feedback in the setSearchAfter method.
In `@tests/IndexTest.php`:
- Around line 861-921: Add negative-path unit tests to cover the validation
branches in Index::each() and Index::batch(): create tests that call
$index->each($query, $batchSize) and $index->batch($query, $batchSize) with (a)
a Query that has no sort set and (b) an invalid batchSize < 1 (e.g. 0 or -1),
and assert the appropriate exceptions (e.g. InvalidArgumentException) are
thrown. Reference the existing test methods testIterateEach and testIterateBatch
as templates—reuse _createIndex(), addDocuments(), refresh(), and construct
Query objects—then use PHPUnit's expectException to verify both guard paths in
each() and batch() are covered.
In `@tests/QueryTest.php`:
- Around line 652-663: Remove the no-op call $lastDocument->getParam('sort');
and instead assert that the second page continues from the last document by
using $lastDocument->getSort() to setSearchAfter on $query (via
$query->setSearchAfter($lastDocument->getSort())), then after
$secondPageResultSet = $index->search($query) and $documents =
$secondPageResultSet->getDocuments() keep the assertCount(1, $documents) and add
an assertion that the returned document's id equals the expected id (e.g.
compare $documents[0]->getId() to the known next document id or to a stored
expected value), ensuring continuity rather than only count.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e82fd2c3-027b-42c8-b17f-e671a04b11c0
📒 Files selected for processing (6)
CHANGELOG.mdsrc/Document.phpsrc/Index.phpsrc/Query.phptests/IndexTest.phptests/QueryTest.php
There was a problem hiding this comment.
🧹 Nitpick comments (2)
src/Index.php (2)
598-600: Useend()instead ofarray_pop()to avoid unnecessary mutation.
array_pop()modifies the array to retrieve the last element. Since the array isn't used after this point,end()achieves the same result without mutation and better conveys intent.♻️ Suggested refactor
- $lastDocument = array_pop($documents); - - $query->setSearchAfter($lastDocument->getSort()); + $query->setSearchAfter(end($documents)->getSort());🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Index.php` around lines 598 - 600, The code currently uses array_pop($documents) which mutates $documents to get the last element; replace that with $lastDocument = end($documents) to avoid modifying the array (keep intent clear) and then call $query->setSearchAfter($lastDocument->getSort()); ensure $lastDocument is checked for falsy (e.g., when $documents is empty) before calling getSort() to avoid errors; update references to $lastDocument and $documents accordingly in the surrounding scope (look for the array_pop usage around the $lastDocument variable and the $query->setSearchAfter call).
556-567: Consider cachinggetDocuments()result to avoid redundant calls.
$resultSet->getDocuments()is called twice per iteration (once inforeach, once incount). Storing the result in a variable improves clarity and avoids potential redundant computation.♻️ Suggested refactor
while (true) { $resultSet = $this->search($query, $options); - foreach ($resultSet->getDocuments() as $document) { + $documents = $resultSet->getDocuments(); + foreach ($documents as $document) { yield $document; } - if (count($resultSet->getDocuments()) < $batchSize) { + if (count($documents) < $batchSize) { break; } $query->setSearchAfter($document->getSort()); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Index.php` around lines 556 - 567, Cache $resultSet->getDocuments() into a local variable (e.g. $documents = $resultSet->getDocuments()) before iterating so you only call getDocuments() once; iterate over $documents in the foreach, use count($documents) for the batch-size check, and ensure you set the search-after using the last yielded document’s sort value (either by using end($documents)->getSort() or by tracking the last $document inside the foreach) when calling $query->setSearchAfter(...).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/Index.php`:
- Around line 598-600: The code currently uses array_pop($documents) which
mutates $documents to get the last element; replace that with $lastDocument =
end($documents) to avoid modifying the array (keep intent clear) and then call
$query->setSearchAfter($lastDocument->getSort()); ensure $lastDocument is
checked for falsy (e.g., when $documents is empty) before calling getSort() to
avoid errors; update references to $lastDocument and $documents accordingly in
the surrounding scope (look for the array_pop usage around the $lastDocument
variable and the $query->setSearchAfter call).
- Around line 556-567: Cache $resultSet->getDocuments() into a local variable
(e.g. $documents = $resultSet->getDocuments()) before iterating so you only call
getDocuments() once; iterate over $documents in the foreach, use
count($documents) for the batch-size check, and ensure you set the search-after
using the last yielded document’s sort value (either by using
end($documents)->getSort() or by tracking the last $document inside the foreach)
when calling $query->setSearchAfter(...).
Resolves #1645.
Adds support for "search after" pagination and infinite documents scroll.
See: https://www.elastic.co/guide/en/elasticsearch/reference/8.18/paginate-search-results.html#search-after
Covers "9.x" branch.
Should be ported to "8.x" separately, if PR is accepted.
Migrated from #2298.
Summary by CodeRabbit
New Features
Documentation
Tests