Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
2e43e14
[Cosmos Spark] Surface empty change feed pages to avoid end-to-end ti…
tvaron3 May 26, 2026
8616057
Add PR link to CHANGELOG entries
tvaron3 May 26, 2026
08f690b
Restore noChanges short-circuit in isFullyDrained when emptyPagesAllo…
tvaron3 May 26, 2026
48a9e99
Address pass-3 review: defense-in-depth on NO_RETRY + DRY cleanup
tvaron3 May 27, 2026
0304dfe
Address pass-4 review: lock contract on data-page non-termination
tvaron3 May 27, 2026
34d3da4
Shrink bridge accessor surface; reclassify CHANGELOG entry
tvaron3 May 27, 2026
255bad9
Address pass-5 review (Kushagra + Annie): drift hazard, test coverage…
tvaron3 May 28, 2026
5683b08
De-flake CosmosContainerChangeFeedTest.asyncChangeFeedPrefetching
tvaron3 May 28, 2026
635d86d
change
xinlian12 May 28, 2026
0fd65d3
add one more change
xinlian12 May 28, 2026
277db10
Fix asyncChangeFeedPrefetching FULL_FIDELITY: insert AFTER subscribe
tvaron3 May 29, 2026
2e62c35
Merge pull request #1 from xinlian12/tvaron3/spark-allow-empty-pages
tvaron3 May 29, 2026
7fa6a1a
Address @xinlian12 review: rename changefeed flag, fix broken test, a…
tvaron3 May 29, 2026
c7478c9
Stabilize asyncChangeFeedPrefetching on Windows EmulatorTcp
tvaron3 May 29, 2026
578ddc8
Revert asyncChangeFeedPrefetching to original Thread.sleep shape + ad…
tvaron3 May 29, 2026
098f378
Remove stale TODO-style comment on surfaceOrRetryNoChangesPage
tvaron3 May 29, 2026
87a90bb
Merge branch 'main' into tvaron3/spark-allow-empty-pages
tvaron3 May 29, 2026
e8804d2
Merge remote-tracking branch 'upstream/main' into tvaron3/spark-allow…
tvaron3 Jun 2, 2026
3985e2f
Revert change-feed half; keep query-path empty-pages fix and CF optio…
tvaron3 Jun 4, 2026
92c1e94
Merge customOptions in inheritNonContinuationFieldsFrom to preserve t…
tvaron3 Jun 5, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions sdk/cosmos/azure-cosmos-spark_3-3_2-12/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
#### Bugs Fixed
* Improved partition planning performance for change feed with large number of feed ranges. - See [PR 49086](https://github.com/Azure/azure-sdk-for-java/pull/49086)
* Fixed `UnsupportedOperationException` when using `readManyByPartitionKeys` for empty pages. - See [PR 49311](https://github.com/Azure/azure-sdk-for-java/pull/49311)
* Fixed `OperationCancelledException` ("End-to-end timeout hit") on sparse cross-partition queries by opting into the SDK's `allowEmptyPages` behavior, so the per-page timeout applies per page instead of being exceeded by serial empty-page drains. Note: this surfaces one iterator callback per empty page where previously a single callback could drain many. - See [PR 49276](https://github.com/Azure/azure-sdk-for-java/pull/49276)

#### Other Changes

Expand Down
1 change: 1 addition & 0 deletions sdk/cosmos/azure-cosmos-spark_3-4_2-12/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
#### Bugs Fixed
* Improved partition planning performance for change feed with large number of feed ranges. - See [PR 49086](https://github.com/Azure/azure-sdk-for-java/pull/49086)
* Fixed `UnsupportedOperationException` when using `readManyByPartitionKeys` for empty pages. - See [PR 49311](https://github.com/Azure/azure-sdk-for-java/pull/49311)
* Fixed `OperationCancelledException` ("End-to-end timeout hit") on sparse cross-partition queries by opting into the SDK's `allowEmptyPages` behavior, so the per-page timeout applies per page instead of being exceeded by serial empty-page drains. Note: this surfaces one iterator callback per empty page where previously a single callback could drain many. - See [PR 49276](https://github.com/Azure/azure-sdk-for-java/pull/49276)

#### Other Changes

Expand Down
1 change: 1 addition & 0 deletions sdk/cosmos/azure-cosmos-spark_3-5_2-12/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
#### Bugs Fixed
* Improved partition planning performance for change feed with large number of feed ranges. - See [PR 49086](https://github.com/Azure/azure-sdk-for-java/pull/49086)
* Fixed `UnsupportedOperationException` when using `readManyByPartitionKeys` for empty pages. - See [PR 49311](https://github.com/Azure/azure-sdk-for-java/pull/49311)
* Fixed `OperationCancelledException` ("End-to-end timeout hit") on sparse cross-partition queries by opting into the SDK's `allowEmptyPages` behavior, so the per-page timeout applies per page instead of being exceeded by serial empty-page drains. Note: this surfaces one iterator callback per empty page where previously a single callback could drain many. - See [PR 49276](https://github.com/Azure/azure-sdk-for-java/pull/49276)

#### Other Changes

Expand Down
1 change: 1 addition & 0 deletions sdk/cosmos/azure-cosmos-spark_3-5_2-13/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
#### Bugs Fixed
* Improved partition planning performance for change feed with large number of feed ranges. - See [PR 49086](https://github.com/Azure/azure-sdk-for-java/pull/49086)
* Fixed `UnsupportedOperationException` when using `readManyByPartitionKeys` for empty pages. - See [PR 49311](https://github.com/Azure/azure-sdk-for-java/pull/49311)
* Fixed `OperationCancelledException` ("End-to-end timeout hit") on sparse cross-partition queries by opting into the SDK's `allowEmptyPages` behavior, so the per-page timeout applies per page instead of being exceeded by serial empty-page drains. Note: this surfaces one iterator callback per empty page where previously a single callback could drain many. - See [PR 49276](https://github.com/Azure/azure-sdk-for-java/pull/49276)

#### Other Changes

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,14 @@ private case class ItemsPartitionReader
.getCosmosQueryRequestOptionsAccessor
.disallowQueryPlanRetrieval(new CosmosQueryRequestOptions())

// Bubble empty pages up to the iterator so the per-page end-to-end timeout
// applies to each individual page rather than being exceeded by serial
// empty-page drains inside ParallelDocumentQueryExecutionContext.
ImplementationBridgeHelpers
.CosmosQueryRequestOptionsHelper
.getCosmosQueryRequestOptionsAccessor
.setAllowEmptyPages(queryOptions, true)

private val readConfig = CosmosReadConfig.parseCosmosReadConfig(config)
ThroughputControlHelper.populateThroughputControlGroupName(
ImplementationBridgeHelpers
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -180,6 +180,68 @@ class TransientIOErrorsRetryingIteratorSpec extends UnitSpec with BasicLoggingTr
factoryCallCount.get shouldEqual 1
}

"TransientIOErrors" should "drain long runs of empty pages without hitting the end-to-end timeout" in {
// Regression test for the empty-page drain scenario: when the SDK is configured with
// emptyPagesAllowed=true the iterator must surface many consecutive empty
// pages without busy-waiting beyond the per-page end-to-end timeout. Even
// with hundreds of empty pages followed by real data, the iterator should
// return all real rows.
val emptyLeadingPages = 200
val realPages = 5
val totalPages = emptyLeadingPages + realPages
val iterator = new TransientIOErrorsRetryingIterator(
continuationToken => generateMockedCosmosPagedFluxWithEmptyPrefix(
continuationToken, totalPages, emptyLeadingPages),
pageSize,
1,
None,
None
)
iterator.maxRetryIntervalInMs = 5

// 2 producers (Left/Right) each emit realPages * pageSize rows
iterator.count(_ => true) shouldEqual (realPages * pageSize * 2)
}

private def generateMockedCosmosPagedFluxWithEmptyPrefix
(
continuationToken: String,
initialPageCount: Int,
leadingEmptyPageCount: Int
) = {

val leftProducer = generateFeedResponseFluxWithEmptyPrefix(
"Left", initialPageCount, leadingEmptyPageCount, Option.apply(continuationToken))
val rightProducer = generateFeedResponseFluxWithEmptyPrefix(
"Right", initialPageCount, leadingEmptyPageCount, Option.apply(continuationToken))
val toBeMerged = Array(leftProducer, rightProducer).toIterable.asJava
val mergedFlux = Flux.mergeSequential(toBeMerged, 1, 2)
UtilBridgeInternal.createCosmosPagedFlux(_ => mergedFlux)
}

private def generateFeedResponseFluxWithEmptyPrefix
(
prefix: String,
pageCount: Int,
leadingEmptyPageCount: Int,
requestContinuationToken: Option[String]
): Flux[FeedResponse[SparkRowItem]] = {

// generateFeedResponse uses documentStartIndex=-1 as the "emit an empty page" sentinel.
val emptyPageSentinel = -1
val firstDataPageStartIndex = 1

val responses = Array.range(1, pageCount + 1)
.map(i => generateFeedResponse(
prefix,
i,
if (i <= leadingEmptyPageCount) emptyPageSentinel else firstDataPageStartIndex))
.filter(response => requestContinuationToken.isEmpty ||
requestContinuationToken.get < response.getContinuationToken)

Flux.fromArray(responses)
}

private val objectMapper = new ObjectMapper

@throws[JsonProcessingException]
Expand Down
1 change: 1 addition & 0 deletions sdk/cosmos/azure-cosmos-spark_4-0_2-13/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
#### Bugs Fixed
* Improved partition planning performance for change feed with large number of feed ranges. - See [PR 49086](https://github.com/Azure/azure-sdk-for-java/pull/49086)
* Fixed `UnsupportedOperationException` when using `readManyByPartitionKeys` for empty pages. - See [PR 49311](https://github.com/Azure/azure-sdk-for-java/pull/49311)
* Fixed `OperationCancelledException` ("End-to-end timeout hit") on sparse cross-partition queries by opting into the SDK's `allowEmptyPages` behavior, so the per-page timeout applies per page instead of being exceeded by serial empty-page drains. Note: this surfaces one iterator callback per empty page where previously a single callback could drain many. - See [PR 49276](https://github.com/Azure/azure-sdk-for-java/pull/49276)

#### Other Changes

Expand Down
1 change: 1 addition & 0 deletions sdk/cosmos/azure-cosmos-spark_4-1_2-13/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
#### Bugs Fixed
* Improved partition planning performance for change feed with large number of feed ranges. - See [PR 49086](https://github.com/Azure/azure-sdk-for-java/pull/49086)
* Fixed `UnsupportedOperationException` when using `readManyByPartitionKeys` for empty pages. - See [PR 49311](https://github.com/Azure/azure-sdk-for-java/pull/49311)
* Fixed `OperationCancelledException` ("End-to-end timeout hit") on sparse cross-partition queries by opting into the SDK's `allowEmptyPages` behavior, so the per-page timeout applies per page instead of being exceeded by serial empty-page drains. Note: this surfaces one iterator callback per empty page where previously a single callback could drain many. - See [PR 49276](https://github.com/Azure/azure-sdk-for-java/pull/49276)

#### Other Changes

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -332,8 +332,14 @@ public void asyncChangeFeed_fromBeginning_incremental_forLogicalPartition() thro
}
}

@Test(groups = { "emulator" }, dataProvider = "changeFeedQueryPrefetchingDataProvider", timeOut = TIMEOUT)
@Test(groups = { "emulator" }, dataProvider = "changeFeedQueryPrefetchingDataProvider",
timeOut = TIMEOUT, retryAnalyzer = FlakyTestRetryAnalyzer.class)
public void asyncChangeFeedPrefetching(ChangeFeedMode changeFeedMode) throws Exception {
// Note on shape: this test verifies Reactor's prefetch behavior on the change-feed
// byPage stream. The two fire-and-forget `.subscribe()` calls + `Thread.sleep(3000)`
// are intentional — they exercise the prefetch path without backpressure-bounded
// collection. retryAnalyzer = FlakyTestRetryAnalyzer absorbs occasional slow-runner
// jitter (Windows EmulatorTcp Java 8 can take >3s to deliver the first 3 pages).
this.createContainer(
(cp) -> {
if (changeFeedMode.equals(ChangeFeedMode.INCREMENTAL)) {
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,189 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

package com.azure.cosmos.implementation;

import com.azure.cosmos.CosmosItemSerializer;
import com.azure.cosmos.implementation.changefeed.common.ChangeFeedMode;
import com.azure.cosmos.implementation.changefeed.common.ChangeFeedStartFromInternal;
import com.azure.cosmos.implementation.changefeed.common.ChangeFeedStateV1;
import com.azure.cosmos.implementation.feedranges.FeedRangeEpkImpl;
import com.azure.cosmos.models.CosmosChangeFeedRequestOptions;
import com.azure.cosmos.models.ModelBridgeInternal;
import org.testng.annotations.Test;

import static org.assertj.core.api.Assertions.assertThat;

/**
* Unit tests for the paged-flux pull continuation path on
* {@link CosmosChangeFeedRequestOptions#withCosmosPagedFluxOptions(CosmosPagedFluxOptions)} (package-visible via
* {@link ModelBridgeInternal#getEffectiveChangeFeedRequestOptions(CosmosChangeFeedRequestOptions, CosmosPagedFluxOptions)}).
*
* <p>That method silently builds a brand-new {@code CosmosChangeFeedRequestOptionsImpl} when the caller supplies a
* continuation token via {@link CosmosPagedFluxOptions}, so any field NOT explicitly copied is dropped. These tests
* lock in the propagation of fields whose loss would silently break a feature.
*/
public class CosmosChangeFeedRequestOptionsWithPagedFluxOptionsTest {

@Test(groups = { "unit" })
public void endLSN_isPropagated_whenContinuationTokenSupplied() {
// Locks in the bounded-change-feed contract across a byPage(savedContinuation) round-trip:
// a caller who set endLSN=42 must continue to see iteration bounded by LSN 42 after resume.
// Before the inheritNonContinuationFieldsFrom fix, endLSN was silently dropped on the rebuild
// path, turning a bounded change feed into an unbounded one.
CosmosChangeFeedRequestOptions src = CosmosChangeFeedRequestOptions
.createForProcessingFromBeginning(FeedRangeEpkImpl.forFullRange());
ImplementationBridgeHelpers.CosmosChangeFeedRequestOptionsHelper
.getCosmosChangeFeedRequestOptionsAccessor()
.setEndLSN(src, 42L);

CosmosPagedFluxOptions pagedFluxOptions = new CosmosPagedFluxOptions();
pagedFluxOptions.setRequestContinuation(buildContinuationToken());

CosmosChangeFeedRequestOptions effective = ModelBridgeInternal
.getEffectiveChangeFeedRequestOptions(src, pagedFluxOptions);

assertThat(ImplementationBridgeHelpers.CosmosChangeFeedRequestOptionsHelper
.getCosmosChangeFeedRequestOptionsAccessor()
.getEndLSN(effective))
.describedAs("endLSN must survive the paged-flux pull continuation rebuild")
.isEqualTo(42L);
}

@Test(groups = { "unit" })
public void customSerializer_isPropagated_whenContinuationTokenSupplied() {
// Locks in custom-serializer preservation across a byPage(savedContinuation) round-trip:
// a caller who registered a custom CosmosItemSerializer must continue to see items
// deserialized through that serializer after resume. Before the inheritNonContinuationFieldsFrom
// fix, the customSerializer was silently dropped on the rebuild path, falling back to the
// SDK's internal default serializer and potentially producing wrong field values.
CosmosItemSerializer sentinel = new CosmosItemSerializer() {
@Override
public <T> java.util.Map<String, Object> serialize(T item) { return null; }

@Override
public <T> T deserialize(java.util.Map<String, Object> jsonNodeMap, Class<T> classType) { return null; }
};
CosmosChangeFeedRequestOptions src = CosmosChangeFeedRequestOptions
.createForProcessingFromBeginning(FeedRangeEpkImpl.forFullRange());
src.setCustomItemSerializer(sentinel);

CosmosPagedFluxOptions pagedFluxOptions = new CosmosPagedFluxOptions();
pagedFluxOptions.setRequestContinuation(buildContinuationToken());

CosmosChangeFeedRequestOptions effective = ModelBridgeInternal
.getEffectiveChangeFeedRequestOptions(src, pagedFluxOptions);

assertThat(effective.getCustomItemSerializer())
.describedAs("customSerializer must survive the paged-flux pull continuation rebuild")
.isSameAs(sentinel);
}

@Test(groups = { "unit" })
public void tokenEncodedFields_overrideCallerSuppliedValues_whenContinuationTokenSupplied() {
// Negative pin: the four token-encoded fields (continuationState, feedRangeInternal, mode,
// startFromInternal) MUST come from the token, not from the caller's pre-resume options.
// The caller's options here have continuationState=null (createForProcessingFromBeginning),
// but the resulting effective options must have a non-null continuationState parsed from
// the supplied token. If a future refactor accidentally inherits the token-encoded fields
// from the source impl (e.g. moving them into inheritNonContinuationFieldsFrom), this test
// catches the regression because the source's continuationState would clobber the token's.
CosmosChangeFeedRequestOptions src = CosmosChangeFeedRequestOptions
.createForProcessingFromBeginning(FeedRangeEpkImpl.forFullRange());

CosmosPagedFluxOptions pagedFluxOptions = new CosmosPagedFluxOptions();
pagedFluxOptions.setRequestContinuation(buildContinuationToken());

CosmosChangeFeedRequestOptions effective = ModelBridgeInternal
.getEffectiveChangeFeedRequestOptions(src, pagedFluxOptions);

assertThat(ImplementationBridgeHelpers.CosmosChangeFeedRequestOptionsHelper
.getCosmosChangeFeedRequestOptionsAccessor()
.getImpl(effective)
.getContinuation())
.describedAs("continuationState is encoded in the token and MUST override the caller's pre-resume value")
.isNotNull();
}

@Test(groups = { "unit" })
public void fullFidelityWireFormatHeader_isPreserved_whenSourceHasNoCustomHeaders() {
// Reviewer-found bug: inheritNonContinuationFieldsFrom used to do
// `this.customOptions = source.customOptions`, which clobbered the
// CHANGE_FEED_WIRE_FORMAT_VERSION header set by the constructor when mode==FULL_FIDELITY.
// If the source's customOptions is null (typical for callers who only set high-level
// options), the resume would produce a FULL_FIDELITY request WITHOUT the required wire
// format header. The merge-don't-clobber fix preserves token-mode-derived headers.
CosmosChangeFeedRequestOptions src = CosmosChangeFeedRequestOptions
.createForProcessingFromNow(FeedRangeEpkImpl.forFullRange());

CosmosPagedFluxOptions pagedFluxOptions = new CosmosPagedFluxOptions();
pagedFluxOptions.setRequestContinuation(buildFullFidelityContinuationToken());

CosmosChangeFeedRequestOptions effective = ModelBridgeInternal
.getEffectiveChangeFeedRequestOptions(src, pagedFluxOptions);

java.util.Map<String, String> headers = ImplementationBridgeHelpers
.CosmosChangeFeedRequestOptionsHelper
.getCosmosChangeFeedRequestOptionsAccessor()
.getHeaders(effective);

assertThat(headers)
.describedAs("token-derived FULL_FIDELITY wire format header must survive the rebuild")
.isNotNull()
.containsEntry(
HttpConstants.HttpHeaders.CHANGE_FEED_WIRE_FORMAT_VERSION,
HttpConstants.ChangeFeedWireFormatVersions.SEPARATE_METADATA_WITH_CRTS);
}

@Test(groups = { "unit" })
public void callerSuppliedCustomHeaders_areMergedWith_tokenDerivedHeaders() {
// Companion to the above: when the source HAS its own custom headers AND the token's
// mode triggers constructor-set headers, both must coexist after inherit. The merge
// semantics: token-mode headers win on key collision; source headers are added otherwise.
CosmosChangeFeedRequestOptions src = CosmosChangeFeedRequestOptions
.createForProcessingFromNow(FeedRangeEpkImpl.forFullRange());
ImplementationBridgeHelpers.CosmosChangeFeedRequestOptionsHelper
.getCosmosChangeFeedRequestOptionsAccessor()
.setHeader(src, "X-Caller-Header", "caller-value");

CosmosPagedFluxOptions pagedFluxOptions = new CosmosPagedFluxOptions();
pagedFluxOptions.setRequestContinuation(buildFullFidelityContinuationToken());

CosmosChangeFeedRequestOptions effective = ModelBridgeInternal
.getEffectiveChangeFeedRequestOptions(src, pagedFluxOptions);

java.util.Map<String, String> headers = ImplementationBridgeHelpers
.CosmosChangeFeedRequestOptionsHelper
.getCosmosChangeFeedRequestOptionsAccessor()
.getHeaders(effective);

assertThat(headers)
.describedAs("both caller-supplied and token-mode-derived headers must be present")
.containsEntry("X-Caller-Header", "caller-value")
.containsEntry(
HttpConstants.HttpHeaders.CHANGE_FEED_WIRE_FORMAT_VERSION,
HttpConstants.ChangeFeedWireFormatVersions.SEPARATE_METADATA_WITH_CRTS);
}

private static String buildContinuationToken() {
// Build a real ChangeFeedState so we can serialize a valid (base64-encoded) continuation token.
// We use the state's own toString() which round-trips through createForProcessingFromContinuation.
ChangeFeedStateV1 state = new ChangeFeedStateV1(
"someContainerRid",
FeedRangeEpkImpl.forFullRange(),
ChangeFeedMode.INCREMENTAL,
ChangeFeedStartFromInternal.createFromBeginning(),
null);
return state.toString();
}

private static String buildFullFidelityContinuationToken() {
ChangeFeedStateV1 state = new ChangeFeedStateV1(
"someContainerRid",
FeedRangeEpkImpl.forFullRange(),
ChangeFeedMode.FULL_FIDELITY,
ChangeFeedStartFromInternal.createFromNow(),
null);
return state.toString();
}
}
Loading
Loading