Changes from 1 commit
Commits
20 commits
2e43e14
[Cosmos Spark] Surface empty change feed pages to avoid end-to-end ti…
tvaron3 May 26, 2026
8616057
Add PR link to CHANGELOG entries
tvaron3 May 26, 2026
08f690b
Restore noChanges short-circuit in isFullyDrained when emptyPagesAllo…
tvaron3 May 26, 2026
48a9e99
Address pass-3 review: defense-in-depth on NO_RETRY + DRY cleanup
tvaron3 May 27, 2026
0304dfe
Address pass-4 review: lock contract on data-page non-termination
tvaron3 May 27, 2026
34d3da4
Shrink bridge accessor surface; reclassify CHANGELOG entry
tvaron3 May 27, 2026
255bad9
Address pass-5 review (Kushagra + Annie): drift hazard, test coverage…
tvaron3 May 28, 2026
5683b08
De-flake CosmosContainerChangeFeedTest.asyncChangeFeedPrefetching
tvaron3 May 28, 2026
635d86d
change
xinlian12 May 28, 2026
0fd65d3
add one more change
xinlian12 May 28, 2026
277db10
Fix asyncChangeFeedPrefetching FULL_FIDELITY: insert AFTER subscribe
tvaron3 May 29, 2026
2e62c35
Merge pull request #1 from xinlian12/tvaron3/spark-allow-empty-pages
tvaron3 May 29, 2026
7fa6a1a
Address @xinlian12 review: rename changefeed flag, fix broken test, a…
tvaron3 May 29, 2026
c7478c9
Stabilize asyncChangeFeedPrefetching on Windows EmulatorTcp
tvaron3 May 29, 2026
578ddc8
Revert asyncChangeFeedPrefetching to original Thread.sleep shape + ad…
tvaron3 May 29, 2026
098f378
Remove stale TODO-style comment on surfaceOrRetryNoChangesPage
tvaron3 May 29, 2026
87a90bb
Merge branch 'main' into tvaron3/spark-allow-empty-pages
tvaron3 May 29, 2026
e8804d2
Merge remote-tracking branch 'upstream/main' into tvaron3/spark-allow…
tvaron3 Jun 2, 2026
3985e2f
Revert change-feed half; keep query-path empty-pages fix and CF optio…
tvaron3 Jun 4, 2026
92c1e94
Merge customOptions in inheritNonContinuationFieldsFrom to preserve t…
tvaron3 Jun 5, 2026
1 change: 1 addition & 0 deletions sdk/cosmos/azure-cosmos-spark_3-3_2-12/CHANGELOG.md
@@ -8,6 +8,7 @@

#### Bugs Fixed
* Improved partition planning performance for change feed with large number of feed ranges. - See [PR 49086](https://github.com/Azure/azure-sdk-for-java/pull/49086)
* Fixed `OperationCancelledException` ("End-to-end timeout hit when trying to retrieve the next page") for query and change feed reads against workloads that produce long runs of empty pages (for example a cross-partition query that effectively performs a full-table scan and returns only a few documents). The connector now opts into the SDK's `emptyPagesAllowed` behavior so the per-page end-to-end timeout applies to each individual page rather than being exceeded by serial empty-page drains.
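
The failure mode this entry describes comes down to simple arithmetic. Below is a minimal model of it, not the SDK's implementation. The class name `EmptyPageBudget`, its methods, and the numbers used (a 5-second end-to-end timeout, 50 ms per backend page, 200 empty pages) are assumptions chosen for illustration only.

```java
import java.time.Duration;

// Hypothetical model: compares the time charged against one end-to-end timeout
// when empty pages are drained internally versus surfaced to the caller.
final class EmptyPageBudget {
    private EmptyPageBudget() { }

    // Draining: a single "next page" call absorbs every empty page plus the
    // first page that carries data, so the latencies add up serially.
    static Duration drainedCallLatency(int emptyPages, Duration perPageLatency) {
        return perPageLatency.multipliedBy(emptyPages + 1L);
    }

    // Surfacing: each "next page" call covers exactly one backend page.
    static Duration surfacedCallLatency(Duration perPageLatency) {
        return perPageLatency;
    }

    static boolean exceeds(Duration callLatency, Duration endToEndTimeout) {
        return callLatency.compareTo(endToEndTimeout) > 0;
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofSeconds(5);   // assumed timeout
        Duration perPage = Duration.ofMillis(50);   // assumed per-page latency
        int emptyPages = 200;                       // assumed run of empty pages

        System.out.println("drained call exceeds timeout: "
            + exceeds(drainedCallLatency(emptyPages, perPage), timeout));
        System.out.println("surfaced call exceeds timeout: "
            + exceeds(surfacedCallLatency(perPage), timeout));
    }
}
```

Under these assumed numbers the drained call costs 201 × 50 ms = 10,050 ms and exceeds the timeout, while each surfaced call costs 50 ms and does not.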

#### Other Changes

1 change: 1 addition & 0 deletions sdk/cosmos/azure-cosmos-spark_3-4_2-12/CHANGELOG.md
@@ -8,6 +8,7 @@

#### Bugs Fixed
* Improved partition planning performance for change feed with large number of feed ranges. - See [PR 49086](https://github.com/Azure/azure-sdk-for-java/pull/49086)
* Fixed `OperationCancelledException` ("End-to-end timeout hit when trying to retrieve the next page") for query and change feed reads against workloads that produce long runs of empty pages (for example a cross-partition query that effectively performs a full-table scan and returns only a few documents). The connector now opts into the SDK's `emptyPagesAllowed` behavior so the per-page end-to-end timeout applies to each individual page rather than being exceeded by serial empty-page drains.

#### Other Changes

1 change: 1 addition & 0 deletions sdk/cosmos/azure-cosmos-spark_3-5_2-12/CHANGELOG.md
@@ -8,6 +8,7 @@

#### Bugs Fixed
* Improved partition planning performance for change feed with large number of feed ranges. - See [PR 49086](https://github.com/Azure/azure-sdk-for-java/pull/49086)
* Fixed `OperationCancelledException` ("End-to-end timeout hit when trying to retrieve the next page") for query and change feed reads against workloads that produce long runs of empty pages (for example a cross-partition query that effectively performs a full-table scan and returns only a few documents). The connector now opts into the SDK's `emptyPagesAllowed` behavior so the per-page end-to-end timeout applies to each individual page rather than being exceeded by serial empty-page drains.

#### Other Changes

1 change: 1 addition & 0 deletions sdk/cosmos/azure-cosmos-spark_3-5_2-13/CHANGELOG.md
@@ -8,6 +8,7 @@

#### Bugs Fixed
* Improved partition planning performance for change feed with large number of feed ranges. - See [PR 49086](https://github.com/Azure/azure-sdk-for-java/pull/49086)
* Fixed `OperationCancelledException` ("End-to-end timeout hit when trying to retrieve the next page") for query and change feed reads against workloads that produce long runs of empty pages (for example a cross-partition query that effectively performs a full-table scan and returns only a few documents). The connector now opts into the SDK's `emptyPagesAllowed` behavior so the per-page end-to-end timeout applies to each individual page rather than being exceeded by serial empty-page drains.

#### Other Changes

@@ -214,6 +214,12 @@ private case class ChangeFeedPartitionReader
      .setEndLSN(options, this.partition.endLsn.get)
    }

    // Bubble empty pages up to the iterator so the per-page end-to-end timeout
    // applies to each individual page rather than being exceeded by serial
    // empty-page drains inside ChangeFeedFetcher.
    ImplementationBridgeHelpers.CosmosChangeFeedRequestOptionsHelper.getCosmosChangeFeedRequestOptionsAccessor
      .setAllowEmptyPages(options, true)

    options.setCustomItemSerializer(itemDeserializer)
  }
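
The call in this hunk goes through an accessor obtained from `ImplementationBridgeHelpers` rather than a public setter on the options class. For readers unfamiliar with that style, here is a generic sketch of such a bridge. It assumes the SDK follows roughly this shape; `RequestOptions`, `OptionsBridge`, and `initialize()` are hypothetical stand-ins, not the real azure-cosmos types.

```java
// Hypothetical stand-in for a public options class with an internal-only flag.
final class RequestOptions {
    private boolean emptyPagesAllowed; // deliberately has no public setter

    static {
        // Register the accessor once, when the class is initialized. The
        // anonymous class is nested here, so it may touch the private field.
        OptionsBridge.setAccessor(new OptionsBridge.Accessor() {
            @Override
            public void setAllowEmptyPages(RequestOptions options, boolean value) {
                options.emptyPagesAllowed = value;
            }

            @Override
            public boolean getAllowEmptyPages(RequestOptions options) {
                return options.emptyPagesAllowed;
            }
        });
    }

    // Calling this forces the static initializer above to run.
    static void initialize() { }
}

// Hypothetical bridge that internal callers use instead of a public API.
final class OptionsBridge {
    interface Accessor {
        void setAllowEmptyPages(RequestOptions options, boolean value);
        boolean getAllowEmptyPages(RequestOptions options);
    }

    private static volatile Accessor accessor;

    private OptionsBridge() { }

    static void setAccessor(Accessor newAccessor) {
        accessor = newAccessor;
    }

    static Accessor getAccessor() {
        if (accessor == null) {
            RequestOptions.initialize(); // make sure the accessor is registered
        }
        return accessor;
    }

    public static void main(String[] args) {
        RequestOptions options = new RequestOptions();
        getAccessor().setAllowEmptyPages(options, true);
        System.out.println(getAccessor().getAllowEmptyPages(options));
    }
}
```

The design choice is that the flag stays off the public surface of the options class while internal modules, such as a connector, can still set it.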

@@ -45,6 +45,14 @@ private case class ItemsPartitionReader
    .getCosmosQueryRequestOptionsAccessor
    .disallowQueryPlanRetrieval(new CosmosQueryRequestOptions())

  // Bubble empty pages up to the iterator so the per-page end-to-end timeout
  // applies to each individual page rather than being exceeded by serial
  // empty-page drains inside ParallelDocumentQueryExecutionContext.
  ImplementationBridgeHelpers
    .CosmosQueryRequestOptionsHelper
    .getCosmosQueryRequestOptionsAccessor
    .setAllowEmptyPages(queryOptions, true)

  private val readConfig = CosmosReadConfig.parseCosmosReadConfig(config)
  ThroughputControlHelper.populateThroughputControlGroupName(
    ImplementationBridgeHelpers
@@ -180,6 +180,68 @@ class TransientIOErrorsRetryingIteratorSpec extends UnitSpec with BasicLoggingTr
    factoryCallCount.get shouldEqual 1
  }

  "TransientIOErrors" should "drain long runs of empty pages without hitting the end-to-end timeout" in {
    // Regression test for the empty-page drain scenario: when the SDK is configured with
    // emptyPagesAllowed=true the iterator must surface many consecutive empty
    // pages without busy-waiting beyond the per-page end-to-end timeout. Even
    // with hundreds of empty pages followed by real data, the iterator should
    // return all real rows.
    val emptyLeadingPages = 200
    val realPages = 5
    val totalPages = emptyLeadingPages + realPages
    val iterator = new TransientIOErrorsRetryingIterator(
      continuationToken => generateMockedCosmosPagedFluxWithEmptyPrefix(
        continuationToken, totalPages, emptyLeadingPages),
      pageSize,
      1,
      None,
      None
    )
    iterator.maxRetryIntervalInMs = 5

    // 2 producers (Left/Right) each emit realPages * pageSize rows
    iterator.count(_ => true) shouldEqual (realPages * pageSize * 2)
  }

  private def generateMockedCosmosPagedFluxWithEmptyPrefix
  (
    continuationToken: String,
    initialPageCount: Int,
    leadingEmptyPageCount: Int
  ) = {

    val leftProducer = generateFeedResponseFluxWithEmptyPrefix(
      "Left", initialPageCount, leadingEmptyPageCount, Option.apply(continuationToken))
    val rightProducer = generateFeedResponseFluxWithEmptyPrefix(
      "Right", initialPageCount, leadingEmptyPageCount, Option.apply(continuationToken))
    val toBeMerged = Array(leftProducer, rightProducer).toIterable.asJava
    val mergedFlux = Flux.mergeSequential(toBeMerged, 1, 2)
    UtilBridgeInternal.createCosmosPagedFlux(_ => mergedFlux)
  }

  private def generateFeedResponseFluxWithEmptyPrefix
  (
    prefix: String,
    pageCount: Int,
    leadingEmptyPageCount: Int,
    requestContinuationToken: Option[String]
  ): Flux[FeedResponse[SparkRowItem]] = {

    // generateFeedResponse uses documentStartIndex=-1 as the "emit an empty page" sentinel.
    val emptyPageSentinel = -1
    val firstDataPageStartIndex = 1

    val responses = Array.range(1, pageCount + 1)
      .map(i => generateFeedResponse(
        prefix,
        i,
        if (i <= leadingEmptyPageCount) emptyPageSentinel else firstDataPageStartIndex))
      .filter(response => requestContinuationToken.isEmpty ||
        requestContinuationToken.get < response.getContinuationToken)

    Flux.fromArray(responses)
  }
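
The property this regression test checks is that an empty page means "keep going", not "end of data". A dependency-free sketch of that idea follows. It is not the `TransientIOErrorsRetryingIterator` implementation: `RowIterator`, `pagesWithEmptyPrefix`, and `countRows` are hypothetical names, pages are plain lists instead of `FeedResponse` objects, and the two producers are simply concatenated rather than merged through Reactor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: flattens pages into rows and skips over empty pages.
final class RowIterator implements Iterator<String> {
    private final Iterator<List<String>> pages;
    private Iterator<String> current = Collections.emptyIterator();

    RowIterator(Iterator<List<String>> pages) {
        this.pages = pages;
    }

    @Override
    public boolean hasNext() {
        // Advance past empty pages; only running out of pages ends iteration.
        while (!current.hasNext() && pages.hasNext()) {
            current = pages.next().iterator();
        }
        return current.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    // Builds `emptyPrefix` empty pages followed by `dataPages` pages of `pageSize` rows.
    static List<List<String>> pagesWithEmptyPrefix(
        String prefix, int emptyPrefix, int dataPages, int pageSize) {

        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < emptyPrefix; i++) {
            result.add(Collections.emptyList());
        }
        for (int page = 0; page < dataPages; page++) {
            List<String> rows = new ArrayList<>();
            for (int row = 0; row < pageSize; row++) {
                rows.add(prefix + "_" + page + "_" + row);
            }
            result.add(rows);
        }
        return result;
    }

    static int countRows(List<List<String>> pages) {
        RowIterator rows = new RowIterator(pages.iterator());
        int count = 0;
        while (rows.hasNext()) {
            rows.next();
            count++;
        }
        return count;
    }
}
```

With two producers of 200 empty pages and 5 data pages each, `countRows` over the concatenated pages returns `5 * pageSize * 2`, which mirrors the shape of the assertion in the spec.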

  private val objectMapper = new ObjectMapper

  @throws[JsonProcessingException]
@@ -0,0 +1,103 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

package com.azure.cosmos.implementation;

import com.azure.cosmos.implementation.changefeed.common.ChangeFeedMode;
import com.azure.cosmos.implementation.changefeed.common.ChangeFeedStartFromInternal;
import com.azure.cosmos.implementation.changefeed.common.ChangeFeedStateV1;
import com.azure.cosmos.implementation.feedranges.FeedRangeEpkImpl;
import com.azure.cosmos.models.CosmosChangeFeedRequestOptions;
import com.azure.cosmos.models.ModelBridgeInternal;
import org.testng.annotations.Test;

import static org.assertj.core.api.Assertions.assertThat;

/**
 * Unit tests for the paged-flux pull continuation path on
 * {@link CosmosChangeFeedRequestOptions#withCosmosPagedFluxOptions(CosmosPagedFluxOptions)} (package-visible via
 * {@link ModelBridgeInternal#getEffectiveChangeFeedRequestOptions(CosmosChangeFeedRequestOptions, CosmosPagedFluxOptions)}).
 *
 * <p>That method silently builds a brand-new {@code CosmosChangeFeedRequestOptionsImpl} when the caller supplies a
 * continuation token via {@link CosmosPagedFluxOptions}, so any field NOT explicitly copied is dropped. These tests
 * lock in the propagation of fields whose loss would silently break a feature.
 */
public class CosmosChangeFeedRequestOptionsWithPagedFluxOptionsTest {

    @Test(groups = { "unit" })
    public void emptyPagesAllowed_isPropagated_whenContinuationTokenSupplied() {
        // GIVEN a CosmosChangeFeedRequestOptions with emptyPagesAllowed=true (the value the Spark connector sets)
        CosmosChangeFeedRequestOptions src = CosmosChangeFeedRequestOptions
            .createForProcessingFromBeginning(FeedRangeEpkImpl.forFullRange());
        ImplementationBridgeHelpers.CosmosChangeFeedRequestOptionsHelper
            .getCosmosChangeFeedRequestOptionsAccessor()
            .setAllowEmptyPages(src, true);

        // AND a continuation token supplied via the paged-flux pull mechanism
        CosmosPagedFluxOptions pagedFluxOptions = new CosmosPagedFluxOptions();
        pagedFluxOptions.setRequestContinuation(buildContinuationToken());

        // WHEN computing the effective options
        CosmosChangeFeedRequestOptions effective = ModelBridgeInternal
            .getEffectiveChangeFeedRequestOptions(src, pagedFluxOptions);

        // THEN emptyPagesAllowed must be preserved on the freshly-built impl
        assertThat(ImplementationBridgeHelpers.CosmosChangeFeedRequestOptionsHelper
            .getCosmosChangeFeedRequestOptionsAccessor()
            .getAllowEmptyPages(effective))
            .describedAs("emptyPagesAllowed must survive the paged-flux pull continuation rebuild")
            .isTrue();
    }

    @Test(groups = { "unit" })
    public void emptyPagesAllowedFalse_isPropagated_whenContinuationTokenSupplied() {
        // The default value should also round-trip cleanly (sanity check that we're not just hard-coding true).
        CosmosChangeFeedRequestOptions src = CosmosChangeFeedRequestOptions
            .createForProcessingFromBeginning(FeedRangeEpkImpl.forFullRange());

        CosmosPagedFluxOptions pagedFluxOptions = new CosmosPagedFluxOptions();
        pagedFluxOptions.setRequestContinuation(buildContinuationToken());

        CosmosChangeFeedRequestOptions effective = ModelBridgeInternal
            .getEffectiveChangeFeedRequestOptions(src, pagedFluxOptions);

        assertThat(ImplementationBridgeHelpers.CosmosChangeFeedRequestOptionsHelper
            .getCosmosChangeFeedRequestOptionsAccessor()
            .getAllowEmptyPages(effective))
            .describedAs("emptyPagesAllowed default (false) must survive the paged-flux pull continuation rebuild")
            .isFalse();
    }

    @Test(groups = { "unit" })
    public void emptyPagesAllowed_isPreserved_whenNoContinuationTokenSupplied() {
        // No continuation → withCosmosPagedFluxOptions returns `this` unchanged.
        CosmosChangeFeedRequestOptions src = CosmosChangeFeedRequestOptions
            .createForProcessingFromBeginning(FeedRangeEpkImpl.forFullRange());
        ImplementationBridgeHelpers.CosmosChangeFeedRequestOptionsHelper
            .getCosmosChangeFeedRequestOptionsAccessor()
            .setAllowEmptyPages(src, true);

        CosmosPagedFluxOptions pagedFluxOptions = new CosmosPagedFluxOptions();
        pagedFluxOptions.setMaxItemCount(50);

        CosmosChangeFeedRequestOptions effective = ModelBridgeInternal
            .getEffectiveChangeFeedRequestOptions(src, pagedFluxOptions);

        assertThat(ImplementationBridgeHelpers.CosmosChangeFeedRequestOptionsHelper
            .getCosmosChangeFeedRequestOptionsAccessor()
            .getAllowEmptyPages(effective))
            .isTrue();
    }

    private static String buildContinuationToken() {
        // Build a real ChangeFeedState so we can serialize a valid (base64-encoded) continuation token.
        // We use the state's own toString() which round-trips through createForProcessingFromContinuation.
        ChangeFeedStateV1 state = new ChangeFeedStateV1(
            "someContainerRid",
            FeedRangeEpkImpl.forFullRange(),
            ChangeFeedMode.INCREMENTAL,
            ChangeFeedStartFromInternal.createFromBeginning(),
            null);
        return state.toString();
    }
}
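
The hazard described in the Javadoc above, a rebuild that drops any field not explicitly copied, can be illustrated without the SDK. The sketch below is a minimal stand-in under stated assumptions: `FeedOptions` and `withContinuation` are hypothetical and only mimic the shape of the behavior the tests lock in, not the actual `CosmosChangeFeedRequestOptionsImpl` code.

```java
// Hypothetical stand-in for an options class that rebuilds itself when a
// continuation token is supplied.
final class FeedOptions {
    private final String continuation;
    private int maxItemCount = 100;
    private boolean emptyPagesAllowed;

    FeedOptions(String continuation) {
        this.continuation = continuation;
    }

    FeedOptions setMaxItemCount(int value) {
        this.maxItemCount = value;
        return this;
    }

    FeedOptions setEmptyPagesAllowed(boolean value) {
        this.emptyPagesAllowed = value;
        return this;
    }

    String getContinuation() {
        return continuation;
    }

    int getMaxItemCount() {
        return maxItemCount;
    }

    boolean isEmptyPagesAllowed() {
        return emptyPagesAllowed;
    }

    // Returns `this` when no continuation is supplied; otherwise builds a new
    // instance and copies every non-continuation field explicitly.
    FeedOptions withContinuation(String newContinuation) {
        if (newContinuation == null) {
            return this;
        }
        FeedOptions rebuilt = new FeedOptions(newContinuation);
        rebuilt.maxItemCount = this.maxItemCount;
        // Omitting the next line would silently reset the flag to false.
        rebuilt.emptyPagesAllowed = this.emptyPagesAllowed;
        return rebuilt;
    }

    public static void main(String[] args) {
        FeedOptions src = new FeedOptions(null).setEmptyPagesAllowed(true);
        FeedOptions rebuilt = src.withContinuation("token-1");
        System.out.println(rebuilt.isEmptyPagesAllowed());
    }
}
```

A test per propagated field, covering both the rebuilt and the pass-through path as the three tests above do, is one way to catch a future field that gets added without a matching copy line.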