
#1645: add support for "search after" pagination#2299

Open
klimov-paul wants to merge 2 commits into ruflin:9.x from klimov-paul:1645-search-after-9x

Conversation

@klimov-paul klimov-paul commented Apr 6, 2026

Resolves #1645.

Adds support for "search after" pagination and infinite scrolling through documents.

See: https://www.elastic.co/guide/en/elasticsearch/reference/8.18/paginate-search-results.html#search-after

Covers "9.x" branch.
Should be ported to "8.x" separately, if PR is accepted.

Migrated from #2298.

Summary by CodeRabbit

  • New Features

    • Added "search after" pagination and new iteration APIs for processing results as individual items or batches with configurable batch sizes
  • Documentation

    • Updated changelog to note the new pagination support
  • Tests

    • Added functional tests validating search-after pagination and iteration behavior

coderabbitai Bot commented Apr 6, 2026

📝 Walkthrough

Adds search_after pagination: a Query setter, a Document getter for sort values, two generator-based Index iterators (each, batch) implementing search_after traversal, a changelog entry, and functional tests validating the new behavior.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation: CHANGELOG.md | Added an Unreleased entry documenting support for search_after pagination. |
| Query API: src/Query.php | Added public function setSearchAfter(array $searchAfter): self to set the search_after request parameter (fluent). |
| Document API: src/Document.php | Added public function getSort(): array to return the document's sort values. |
| Index Pagination: src/Index.php | Added each($query, int $batchSize, ?array $options): \Generator and batch($query, int $batchSize, ?array $options): \Generator that implement search_after pagination with query cloning, validation (requires sort, forbids non-zero from, enforces batchSize >= 1), size management, and cursor advancement using last document sort values. |
| Tests: tests/IndexTest.php, tests/QueryTest.php | Added functional tests: testIterateEach(), testIterateBatch() verifying iteration order, and testSearchAfter() verifying search_after behavior across pages. |
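
The cursor-advancing loop summarized above can be sketched without Elasticsearch. This is a minimal illustration, not the PR's code: `searchPage()`, `batchAll()` and `eachHit()` are hypothetical names, hits are plain arrays, and an in-memory list stands in for the index. The sort and from validation is left out here.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for one search request: returns up to $size hits whose
 * sort values come strictly after $searchAfter. $hits must already be ordered
 * by their 'sort' values.
 */
function searchPage(array $hits, int $size, array $searchAfter): array
{
    $page = [];
    foreach ($hits as $hit) {
        // Skip everything at or before the cursor (arrays compare element by element).
        if ([] !== $searchAfter && ($hit['sort'] <=> $searchAfter) <= 0) {
            continue;
        }
        $page[] = $hit;
        if (\count($page) === $size) {
            break;
        }
    }

    return $page;
}

/** Yields pages of hits, moving the cursor to the last hit of each full page. */
function batchAll(array $hits, int $batchSize): \Generator
{
    if ($batchSize < 1) {
        throw new \InvalidArgumentException('Batch size must be at least 1.');
    }

    $searchAfter = [];
    while (true) {
        $page = searchPage($hits, $batchSize, $searchAfter);
        if ([] === $page) {
            return;
        }

        yield $page;

        if (\count($page) < $batchSize) {
            return; // underfilled page: nothing left to fetch
        }

        $searchAfter = $page[\count($page) - 1]['sort'];
    }
}

/** Flattens the pages into a stream of single hits. */
function eachHit(array $hits, int $batchSize): \Generator
{
    foreach (batchAll($hits, $batchSize) as $page) {
        yield from $page;
    }
}

$hits = [
    ['id' => 'a', 'sort' => [1]],
    ['id' => 'b', 'sort' => [2]],
    ['id' => 'c', 'sort' => [3]],
    ['id' => 'd', 'sort' => [4]],
    ['id' => 'e', 'sort' => [5]],
];

$ids = [];
foreach (eachHit($hits, 2) as $hit) {
    $ids[] = $hit['id'];
}
echo implode(',', $ids), PHP_EOL; // a,b,c,d,e
```

Because `yield from` reuses the keys of each page, pass `false` as the second argument when collecting the flattened generator with `iterator_to_array()`.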

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant Index
    participant Query
    participant ES as Elasticsearch
    participant Doc as Document

    Client->>Index: each(query, batchSize, options)
    activate Index
    Index->>Query: Query::create(clone)
    Index->>Query: setSize(batchSize)
    Index->>Query: setSearchAfter([])

    loop while page count == batchSize
        Index->>ES: search(query, options)
        activate ES
        ES-->>Index: results[]
        deactivate ES

        alt results not empty
            Index->>Doc: lastDoc.getSort()
            activate Doc
            Doc-->>Index: sort values
            deactivate Doc

            Index->>Query: setSearchAfter(lastDocSort)
            Index->>Client: yield Document(s)
        end
    end
    deactivate Index
sequenceDiagram
    actor Client
    participant Index
    participant Query
    participant ES as Elasticsearch
    participant Doc as Document

    Client->>Index: batch(query, batchSize, options)
    activate Index
    Index->>Query: Query::create(clone)
    Index->>Query: setSize(batchSize)
    Index->>Query: setSearchAfter([])

    loop until empty or underfilled page
        Index->>ES: search(query, options)
        activate ES
        ES-->>Index: results[]
        deactivate ES

        alt results not empty
            Index->>Doc: lastDoc.getSort()
            activate Doc
            Doc-->>Index: sort values
            deactivate Doc

            Index->>Query: setSearchAfter(lastDocSort)
        end

        Index->>Client: yield docs[]
    end
    deactivate Index

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰
Hop, hop, through sorted lands I prance,
search_after gives my paws a chance,
Each page I gather, batch by batch,
trailing sort-tails, no cursor to scratch.
A tiny rabbit cheers this dance! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 38.46% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding support for 'search after' pagination, which aligns with the primary objective of implementing Elasticsearch's search-after feature. |
| Linked Issues check | ✅ Passed | The PR fully addresses both objectives from #1645: (1) adds setSearchAfter() setter for Query's search_after parameter, and (2) implements iteration methods (each() and batch()) for deep scrolling with search_after. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing search-after pagination: Query setter, Document.getSort() helper, Index iterator methods, and corresponding tests. No unrelated changes detected. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (5)
tests/QueryTest.php (1)

652-663: Strengthen the second-page assertion to verify continuity, not only count.

Current check at Line 662 only validates size. Also assert the expected document ID and remove the no-op call at Line 654.

✅ Suggested test tightening
 /** @var Document $lastDocument */
 $lastDocument = array_pop($documents);
-$lastDocument->getParam('sort');

 $this->assertNotEmpty($lastDocument->getSort());

 $query->setSearchAfter($lastDocument->getSort());

 $secondPageResultSet = $index->search($query);
 $documents = $secondPageResultSet->getDocuments();
 $this->assertCount(1, $documents);
+$this->assertSame('2', $documents[0]->getId());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/QueryTest.php` around lines 652 - 663, Remove the no-op call
$lastDocument->getParam('sort'); and instead assert that the second page
continues from the last document by using $lastDocument->getSort() to
setSearchAfter on $query (via $query->setSearchAfter($lastDocument->getSort())),
then after $secondPageResultSet = $index->search($query) and $documents =
$secondPageResultSet->getDocuments() keep the assertCount(1, $documents) and add
an assertion that the returned document's id equals the expected id (e.g.
compare $documents[0]->getId() to the known next document id or to a stored
expected value), ensuring continuity rather than only count.
src/Index.php (1)

566-576: Avoid rebuilding documents twice per page in each().

$resultSet->getDocuments() is called multiple times in the same loop. Cache once, reuse count, and compute the last document from that array.

♻️ Proposed refactor
 while (true) {
-    $resultSet = $this->search($query, $options);
-    foreach ($resultSet->getDocuments() as $document) {
+    $documents = $this->search($query, $options)->getDocuments();
+    $count = \count($documents);
+    if (0 === $count) {
+        break;
+    }
+
+    foreach ($documents as $document) {
         yield $document;
     }

-    if (count($resultSet->getDocuments()) < $batchSize) {
+    if ($count < $batchSize) {
         break;
     }

-    $query->setSearchAfter($document->getSort());
+    $lastDocument = $documents[$count - 1];
+    $query->setSearchAfter($lastDocument->getSort());
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Index.php` around lines 566 - 576, The loop in each() repeatedly calls
$resultSet->getDocuments() and reuses $document from the inner foreach after the
loop; fix by caching the documents once per iteration: assign $docs =
$resultSet->getDocuments(), iterate over $docs to yield each document, use
count($docs) for the batchSize check, and compute the last document with $last =
end($docs) (or equivalent) then call $query->setSearchAfter($last->getSort());
update references to $resultSet->getDocuments(), $document, $batchSize and
$query->setSearchAfter accordingly.
tests/IndexTest.php (1)

861-921: Add guard-path tests for iterator validation branches.

Happy-path coverage is good; consider adding cases for missing sort and batchSize < 1 so both exception branches in Index::each() / Index::batch() are protected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/IndexTest.php` around lines 861 - 921, Add negative-path unit tests to
cover the validation branches in Index::each() and Index::batch(): create tests
that call $index->each($query, $batchSize) and $index->batch($query, $batchSize)
with (a) a Query that has no sort set and (b) an invalid batchSize < 1 (e.g. 0
or -1), and assert the appropriate exceptions (e.g. InvalidArgumentException)
are thrown. Reference the existing test methods testIterateEach and
testIterateBatch as templates—reuse _createIndex(), addDocuments(), refresh(),
and construct Query objects—then use PHPUnit's expectException to verify both
guard paths in each() and batch() are covered.
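
A guard-path check of this kind can be sketched without PHPUnit or a running cluster. The sketch below is illustrative only: `assertIterable()` and `rejects()` are hypothetical helpers, a plain array stands in for the query parameters, and `\InvalidArgumentException` is used so the example is self-contained (the PR itself may throw a different exception class).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical guard mirroring the checks described for the iterators:
 * a sort is required, "from" must be 0 or omitted, and the batch size must be >= 1.
 */
function assertIterable(array $params, int $batchSize): void
{
    if ($batchSize < 1) {
        throw new \InvalidArgumentException('Batch size must be at least 1.');
    }
    if (!isset($params['sort']) || [] === $params['sort']) {
        throw new \InvalidArgumentException('A sort is required for "search_after" iteration.');
    }
    if (0 !== (int) ($params['from'] ?? 0)) {
        throw new \InvalidArgumentException('"from" must be 0 or omitted for "search_after" iteration.');
    }
}

/** Returns true if calling $fn throws an InvalidArgumentException. */
function rejects(callable $fn): bool
{
    try {
        $fn();
    } catch (\InvalidArgumentException $e) {
        return true;
    }

    return false;
}

$sorted = ['sort' => [['id' => 'asc']]];

var_dump(rejects(fn () => assertIterable([], 10)));                      // missing sort: true
var_dump(rejects(fn () => assertIterable($sorted, 0)));                  // batch size < 1: true
var_dump(rejects(fn () => assertIterable($sorted + ['from' => 5], 10))); // non-zero from: true
var_dump(rejects(fn () => assertIterable($sorted, 10)));                 // valid input: false
```

In a PHPUnit test the same cases would typically use `expectException()` with one case per test method, since execution stops at the first thrown exception.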
src/Document.php (1)

244-247: Make missing sort handling explicit in getSort().

At Line 246, getParam('sort') throws if the document has no sort metadata. Consider a guard and a clearer exception message so failures are explicit for non-search-hit documents.

♻️ Suggested tweak
 public function getSort(): array
 {
+    if (!$this->hasParam('sort')) {
+        throw new InvalidException('Sort values are not available on this document.');
+    }
+
     return $this->getParam('sort');
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Document.php` around lines 244 - 247, The getSort() method currently
calls getParam('sort') which will throw a generic error when sort metadata is
absent; add an explicit guard in Document::getSort (use $this->hasParam('sort')
or equivalent) and if missing throw a clear InvalidException (e.g. "Document has
no sort metadata; getSort is only valid for search-hit documents") or otherwise
return the sort as an array (cast to array) so callers get an explicit,
informative failure or a consistent type; update references to getParam('sort')
accordingly.
src/Query.php (1)

489-493: Consider validating non-empty search_after input at setter time for improved fail-fast behavior.

Line 491 accepts any array without validation. While Elasticsearch will reject an empty search_after at request time, validating at the setter provides clearer early feedback during development.

Proposed guard
 public function setSearchAfter(array $searchAfter): self
 {
-    $this->setParam('search_after', $searchAfter);
-
-    return $this;
+    if ([] === $searchAfter) {
+        throw new InvalidException('Search after values must not be empty.');
+    }
+
+    return $this->setParam('search_after', $searchAfter);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Query.php` around lines 489 - 493, The setSearchAfter method currently
allows an empty array which Elasticsearch will later reject; add a guard at the
start of setSearchAfter(array $searchAfter): self to validate that $searchAfter
is not empty (e.g., if ([] === $searchAfter) throw new
InvalidException('Search after values must not be empty.')); then call
$this->setParam('search_after', $searchAfter) and return $this; this provides
immediate, fail-fast feedback in the setSearchAfter method.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Index.php`:
- Around line 560-562: Both each() and batch() perform search_after iteration
but only validate the presence of 'sort'; also check that 'from' is either absent
or zero, and throw InvalidException if it is greater than 0. In the Index.php
methods where you currently check if (!$query->hasParam('sort')) (the existing
check in each() and the analogous block around lines 598-600), read 'from' only
when $query->hasParam('from') is true, cast it to int, and if that value !== 0
throw InvalidException('When using "search_after" iteration, "from" must be 0 or
omitted.') to prevent invalid pagination.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e82fd2c3-027b-42c8-b17f-e671a04b11c0

📥 Commits

Reviewing files that changed from the base of the PR and between bc9eeaf and b466e82.

📒 Files selected for processing (6)
  • CHANGELOG.md
  • src/Document.php
  • src/Index.php
  • src/Query.php
  • tests/IndexTest.php
  • tests/QueryTest.php

Comment thread src/Index.php Outdated

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
src/Index.php (2)

598-600: Use end() instead of array_pop() to avoid unnecessary mutation.

array_pop() modifies the array to retrieve the last element. Since the array isn't used after this point, end() achieves the same result without mutation and better conveys intent.

♻️ Suggested refactor
-            $lastDocument = array_pop($documents);
-
-            $query->setSearchAfter($lastDocument->getSort());
+            $query->setSearchAfter(end($documents)->getSort());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Index.php` around lines 598 - 600, The code currently uses
array_pop($documents) which mutates $documents to get the last element; replace
that with $lastDocument = end($documents) to avoid modifying the array (keep
intent clear) and then call $query->setSearchAfter($lastDocument->getSort());
ensure $lastDocument is checked for falsy (e.g., when $documents is empty)
before calling getSort() to avoid errors; update references to $lastDocument and
$documents accordingly in the surrounding scope (look for the array_pop usage
around the $lastDocument variable and the $query->setSearchAfter call).
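
The difference between the two calls, and the reason for the falsy check, can be seen in a small standalone sketch. Plain strings stand in for documents here.

```php
<?php
declare(strict_types=1);

$documents = ['doc-1', 'doc-2', 'doc-3'];

// end() returns the last element and only moves the array's internal pointer.
$last = end($documents);
$countAfterEnd = \count($documents);

// array_pop() returns the last element and removes it from the array.
$popped = array_pop($documents);
$countAfterPop = \count($documents);

// end() returns false on an empty array, so guard before calling a method on the result.
$empty = [];
$lastOfEmpty = end($empty);

printf(
    "%s %d %s %d %s\n",
    $last,
    $countAfterEnd,
    $popped,
    $countAfterPop,
    var_export($lastOfEmpty, true)
); // doc-3 3 doc-3 2 false
```

Note that `end()` takes its argument by reference, so it needs a variable rather than a function call result.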

556-567: Consider caching getDocuments() result to avoid redundant calls.

$resultSet->getDocuments() is called twice per iteration (once in foreach, once in count). Storing the result in a variable improves clarity and avoids potential redundant computation.

♻️ Suggested refactor
     while (true) {
         $resultSet = $this->search($query, $options);
-        foreach ($resultSet->getDocuments() as $document) {
+        $documents = $resultSet->getDocuments();
+        foreach ($documents as $document) {
             yield $document;
         }

-        if (count($resultSet->getDocuments()) < $batchSize) {
+        if (count($documents) < $batchSize) {
             break;
         }

         $query->setSearchAfter($document->getSort());
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Index.php` around lines 556 - 567, Cache $resultSet->getDocuments() into
a local variable (e.g. $documents = $resultSet->getDocuments()) before iterating
so you only call getDocuments() once; iterate over $documents in the foreach,
use count($documents) for the batch-size check, and ensure you set the
search-after using the last yielded document’s sort value (either by using
end($documents)->getSort() or by tracking the last $document inside the foreach)
when calling $query->setSearchAfter(...).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f14c0d53-a33b-4ed9-b471-de59bc07d74d

📥 Commits

Reviewing files that changed from the base of the PR and between b466e82 and d350c9c.

📒 Files selected for processing (1)
  • src/Index.php


Development

Successfully merging this pull request may close these issues.

Implement Search After. With an iterator ?

1 participant