Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Backward Compatibility Breaks
### Added
* Added support for "search after" based pagination [#1645](https://github.com/ruflin/Elastica/issues/1645)

### Changed
### Deprecated
### Removed
Expand Down
1 change: 1 addition & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ tools/phive.phar:

vendor/autoload.php:
# Installing Symfony Flex: parallel download of dependency libs
composer global config --no-plugins allow-plugins.symfony/flex true
composer global require --no-progress --no-scripts --no-plugins symfony/flex
composer install --prefer-dist --no-interaction ${COMPOSER_FLAGS}

Expand Down
12 changes: 12 additions & 0 deletions src/Document.php
Original file line number Diff line number Diff line change
Expand Up @@ -234,6 +234,18 @@ public function hasPipeline(): bool
return $this->hasParam('pipeline');
}

/**
* @see \Elastica\Query::setSearchAfter()
*
* Returns the sort position for this document, which can be used as starting offset to search next hits page.
*
* @return array
*/
public function getSort(): array
{
return $this->getParam('sort');
}

/**
* Returns the document as an array.
*/
Expand Down
76 changes: 76 additions & 0 deletions src/Index.php
Original file line number Diff line number Diff line change
Expand Up @@ -539,6 +539,82 @@ public function count($query = '', string $method = Request::POST): int
return $search->count('', false, $method);
}

/**
* Iterates over all documents, matching given query, using "search after" based pagination.
*
* @see \Elastica\Query::setSearchAfter()
*
* @param mixed $query search query with sort by unique document field.
* @param int $batchSize the number of rows to be returned in each batch (e.g. each query size).
* @param array|null $options
* @param string $method request method.
* @return \Generator|\Elastica\Document[] list of all documents matched the given query as iterator.
*/
public function each($query = '', int $batchSize = 100, ?array $options = null, string $method = Request::POST): \Generator
{
Comment on lines +553 to +554
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Validate $batchSize before entering the loop.

With 0, each() reaches Line 573 without ever binding $document; with negative values, both methods forward an invalid size downstream. This should fail fast.

🛡️ Proposed fix
     public function each($query = '', int $batchSize = 100, ?array $options = null, string $method = Request::POST): \Generator
     {
+        if ($batchSize < 1) {
+            throw new InvalidException('Batch size must be greater than 0.');
+        }
+
         $query = Query::create($query);
@@
     public function batch($query = '', int $batchSize = 100, ?array $options = null, string $method = Request::POST): \Generator
     {
+        if ($batchSize < 1) {
+            throw new InvalidException('Batch size must be greater than 0.');
+        }
+
         $query = Query::create($query);

Also applies to: 588-589

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Index.php` around lines 553 - 554, Validate the $batchSize argument at
the start of the each() method and fail fast: if $batchSize is less than or
equal to 0 throw an InvalidArgumentException with a clear message (e.g.,
"batchSize must be a positive integer"); do this before entering the loop or
binding $document so you never forward an invalid size or iterate with zero-size
batches. Apply the same check to the other method in this class that accepts a
$batchSize parameter (the similar batched method around the adjacent block) so
both paths validate input consistently.

$query = Query::create($query);

if (!$query->hasParam('sort')) {
throw new \LogicException('Query must have "sort" parameter in order to use "search after" based iteration.');
}

$query->setSize($batchSize);

while (true) {
$resultSet = $this->search($query, $options, $method);
foreach ($resultSet->getDocuments() as $document) {
yield $document;
}

if (count($resultSet->getDocuments()) < $batchSize) {
break;
}

$query->setSearchAfter($document->getSort());
}
}
Comment on lines +553 to +575
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid mutating the caller's Query instance.

Query::create() returns the same object when $query is already a Query, so Lines 561/573 and 596/614 permanently overwrite the caller's size and search_after. Reusing that query after each() or batch() will resume from the last page instead of the original request.

♻️ Proposed fix
-        $query = Query::create($query);
+        $query = clone Query::create($query);
@@
-        $query = Query::create($query);
+        $query = clone Query::create($query);

Also applies to: 588-615

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Index.php` around lines 553 - 575, The issue is that
Query::create($query) can return the same Query instance passed by the caller
and each() (and batch()) mutates it via setSize and setSearchAfter; clone the
Query before mutating to avoid changing the caller's object: in each() (and
batch()) call Query::create($query) into a local variable and if the original
$query is an instance of Query, replace it with a cloned copy (e.g., $query =
clone $query) before calling setSize() and setSearchAfter(); then proceed to use
that cloned local query with search() so the caller's Query remains unchanged.


/**
* Iterates over all documents in batches, matching given query, using "search after" based pagination.
*
* @see \Elastica\Query::setSearchAfter()
*
* @param mixed $query search query with sort by unique document field.
* @param int $batchSize the number of rows to be returned in each batch (e.g. each query size).
* @param array|null $options
* @param string $method request method.
* @return \Generator|\Elastica\Document[][] list of document batches matched the given query as iterator.
*/
public function batch($query = '', int $batchSize = 100, ?array $options = null, string $method = Request::POST): \Generator
{
$query = Query::create($query);

if (!$query->hasParam('sort')) {
throw new \LogicException('Query must have "sort" parameter in order to use "search after" based iteration.');
}

$query->setSize($batchSize);

while (true) {
$resultSet = $this->search($query, $options, $method);

$documents = $resultSet->getDocuments();
if (empty($documents)) {
break;
}

yield $documents;

if (count($documents) < $batchSize) {
break;
}

$lastDocument = array_pop($documents);

$query->setSearchAfter($lastDocument->getSort());
}
}

/**
* Opens an index.
*
Expand Down
18 changes: 18 additions & 0 deletions src/Query.php
Original file line number Diff line number Diff line change
Expand Up @@ -476,4 +476,22 @@ public function setTrackTotalHits($trackTotalHits = true): self

return $this->setParam('track_total_hits', $trackTotalHits);
}

/**
* @see https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#search-after
*
* Allows retrieval of the next page of hits using sort values from previous page.
* The value of {@see \Elastica\Document::getSort()} should be passed as argument here.
*
* @param array $searchAfter the sort of the last document from previous search result set.
* @return static self reference.
*/
public function setSearchAfter(array $searchAfter): self
{
foreach ($searchAfter as $value) {
$this->addParam('search_after', $value);
}

return $this;
Comment on lines +489 to +495
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Replace search_after instead of appending it.

Line 492 uses addParam(), so every new page keeps the previous cursor values and appends the new sort tuple. Index::each() / batch() call this repeatedly, which breaks pagination as soon as a result set spans more than two pages.

🐛 Proposed fix
     public function setSearchAfter(array $searchAfter): self
     {
-        foreach ($searchAfter as $value) {
-            $this->addParam('search_after', $value);
-        }
-
-        return $this;
+        return $this->setParam('search_after', \array_values($searchAfter));
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public function setSearchAfter(array $searchAfter): self
{
foreach ($searchAfter as $value) {
$this->addParam('search_after', $value);
}
return $this;
public function setSearchAfter(array $searchAfter): self
{
return $this->setParam('search_after', \array_values($searchAfter));
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Query.php` around lines 489 - 495, setSearchAfter currently calls
addParam('search_after', $value) in a loop which appends values across pages and
breaks pagination when Index::each()/batch() call it repeatedly; change
setSearchAfter to replace the entire search_after parameter instead of appending
by setting the full tuple at once (e.g. use a single setParam/setParameter-like
call or assign $this->params['search_after'] = $searchAfter) so the
'search_after' key is overwritten with the provided array rather than
accumulating values.

}
}
67 changes: 67 additions & 0 deletions tests/IndexTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
use Elastica\Document;
use Elastica\Index;
use Elastica\Mapping;
use Elastica\Query;
use Elastica\Query\QueryString;
use Elastica\Query\SimpleQueryString;
use Elastica\Query\Term;
Expand Down Expand Up @@ -926,4 +927,70 @@ public function testGetEmptyAliases(): void

$this->assertEquals([], $index->getAliases());
}

/**
* @group functional
*/
public function testIterateEach(): void
{
$index = $this->_createIndex();
$index->setMapping(new Mapping([
'country_id' => ['type' => 'integer'],
'region_id' => ['type' => 'integer'],
]));

$index->addDocuments([
new Document('1', ['country_id' => 1, 'region_id' => 1]),
new Document('2', ['country_id' => 1, 'region_id' => 2]),
new Document('3', ['country_id' => 2, 'region_id' => 3]),
]);
$index->refresh();

$query = new Query();
$query->setSize(2);
$query->setSort([
'region_id' => 'desc',
]);

$documentIds = [];
foreach ($index->each($query, 2) as $document) {
$documentIds[] = $document->getId();
}

$this->assertEquals(['3', '2', '1'], $documentIds);
}

/**
* @group functional
*/
public function testIterateBatch(): void
{
$index = $this->_createIndex();
$index->setMapping(new Mapping([
'country_id' => ['type' => 'integer'],
'region_id' => ['type' => 'integer'],
]));

$index->addDocuments([
new Document('1', ['country_id' => 1, 'region_id' => 1]),
new Document('2', ['country_id' => 1, 'region_id' => 2]),
new Document('3', ['country_id' => 2, 'region_id' => 3]),
]);
$index->refresh();

$query = new Query();
$query->setSize(2);
$query->setSort([
'region_id' => 'desc',
]);

$documentIds = [];
foreach ($index->batch($query, 2) as $documents) {
foreach ($documents as $document) {
$documentIds[] = $document->getId();
}
}

$this->assertEquals(['3', '2', '1'], $documentIds);
}
}
42 changes: 42 additions & 0 deletions tests/QueryTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -680,4 +680,46 @@ public function testSetTrackTotalHits(): void
$this->assertEquals(25, $resultSet->getTotalHits());
$this->assertEquals('gte', $resultSet->getTotalHitsRelation());
}

/**
* @group functional
*/
public function testSearchAfter(): void
{
$index = $this->_createIndex();
$index->setMapping(new Mapping([
'country_id' => ['type' => 'integer'],
'region_id' => ['type' => 'integer'],
]));

$index->addDocuments([
new Document('1', ['country_id' => 1, 'region_id' => 1]),
new Document('2', ['country_id' => 1, 'region_id' => 2]),
new Document('3', ['country_id' => 2, 'region_id' => 3]),
]);
$index->refresh();

$query = new Query();
$query->setSize(2);
$query->setSort([
'country_id' => 'desc',
'region_id' => 'asc',
]);
$firstPageResultSet = $index->search($query);

$documents = $firstPageResultSet->getDocuments();
$this->assertCount(2, $documents);

/** @var Document $lastDocument */
$lastDocument = array_pop($documents);
$lastDocument->getParam('sort');

$this->assertNotEmpty($lastDocument->getSort());

$query->setSearchAfter($lastDocument->getSort());

$secondPageResultSet = $index->search($query);
$documents = $secondPageResultSet->getDocuments();
$this->assertCount(1, $documents);
}
}