Skip to content

[Cosmos] Session container fixes #40366

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 65 commits into
base: main
Choose a base branch
from
Open

[Cosmos] Session container fixes #40366

wants to merge 65 commits into from

Conversation

simorenoh
Copy link
Member

@simorenoh simorenoh commented Apr 3, 2025

This PR initially aimed at closing several gaps in the session token handling logic of the Python SDK's session container, specifically sending the entire compound session token for a container for every request, but as a result of this has now grown beyond that in scope. This PR now also addresses and closes the following issues:

Pending items

  • [Cosmos] make queries fetch query plan in every query #38577 - ensuring we send query plan calls for every cross-partition query. this was addressed here since cross-partition queries that are sent without query plans do not provide partition information that can be used by the session container, and as such would be forced to either 1.) be sent without a continuation token, basically running in Eventual consistency, or 2.) be sent with the entire compound token for the container (which breaks users that have too many partitions due to request header size length). This is also the first follow up item marked in the PPCB PR, and would be needed to ensure we are fully covered on that front: Per Partition Circuit Breaker #40302. This is pending because pagination logic seemed to have completely broken after splitting this up. Further work will be needed to ensure our query pipeline can handle this scenario properly.

Current state

The Python SDK currently does several things that should be improved upon for session consistency behaviors:

  • We currently send out a session token for every single request so long as the default account consistency is Session, which is undesired behavior for write operations in single-write region scenarios.
  • The session token that we send out with our requests is a compound session token including every single partition in the container, which is unfeasible for large accounts since these can become large enough to cause request size issues.
  • The SDK had no pk cache refreshing logic for partition split scenarios since we don't receive 410/1002 status codes to react to for normal requests sending out a partition key value and not a partition key range id.
  • The SDK was not updating session tokens after read requests, allowing stale reads for workloads if other clients are interacting with the same container resource.

Changes introduced

In order to address the above issues, the following changes have been made:

  • We will now only send out session tokens for the relevant requests under session consistency - read operations, batch operations, or requests sent by multi-write configured accounts.
  • The session token that now gets sent out will only have the relevant information for its partition the same way the .NET and Java SDKs do, only sending the minimum information: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/SessionTokenHelper.java#L45
  • Now, once we receive a partition key range id in the response headers that is unaccounted for in the partition key range cache, we will force a refresh to the cache in order to obtain all the new ids to be used in session token computing for subsequent requests.
  • We now update session tokens on read requests as well, ensuring all requests are fetching the newest available session token.

Caveats

We currently only initiate the session container within a client if the user properly initializes their client. While this is not a problem for the sync client, it means that users that are not directly initializing their asynchronous clients as outlined in our README will not be able to leverage the session container, and will have to implement their own session token handling logic to achieve session consistency.

@azure-sdk
Copy link
Collaborator

azure-sdk commented Apr 3, 2025

API Change Check

APIView identified API level changes in this PR and created the following API reviews

azure-cosmos

@azure-sdk
Copy link
Collaborator

API change check

APIView has identified API level changes in this PR and created following API reviews.

azure-cosmos

@simorenoh simorenoh marked this pull request as ready for review May 19, 2025 18:42
@Copilot Copilot AI review requested due to automatic review settings May 19, 2025 18:42
@simorenoh simorenoh requested a review from a team as a code owner May 19, 2025 18:42
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors how session tokens are managed and sent for Azure Cosmos DB operations, introducing targeted token sending, caching improvements, and automatic refresh on partition splits.

  • Only relevant requests (reads, batches, multi-write scenarios) now include session tokens.
  • SessionContainer API expanded to handle partition key and routing info, with both sync/async support.
  • Header-building logic moved to _base.set_session_token_header(_async) and propagated throughout client connection flows.

Reviewed Changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
sdk/cosmos/azure-cosmos/azure/cosmos/_session.py Overhauled SessionContainer: changed get/set signatures and logic
sdk/cosmos/azure-cosmos/azure/cosmos/_base.py Introduced set_session_token_header(_async) helper, removed old logic
sdk/cosmos/azure-cosmos/azure/cosmos/_cosmos_client_connection.py Added session header injection in sync client methods
sdk/cosmos/azure-cosmos/azure/cosmos/aio/_cosmos_client_connection_async.py Added session header injection in async client methods
sdk/cosmos/azure-cosmos/tests/test_session_container.py Deleted legacy session-container tests
Comments suppressed due to low confidence (3)

sdk/cosmos/azure-cosmos/azure/cosmos/_session.py:80

  • The variable collection_name is not defined in the non-name-based branch, leading to an UnboundLocalError. Ensure collection_name is initialized in both branches before use.
if collection_rid in self.rid_to_session_token and collection_name in container_properties_cache:

sdk/cosmos/azure-cosmos/azure/cosmos/_session.py:135

  • In the async get_session_token_async, collection_name is also undefined in the non-name-based path. Define collection_name consistently in both branches.
if collection_rid in self.rid_to_session_token and collection_name in container_properties_cache:

@simorenoh
Copy link
Member Author

/azp run python - cosmos - tests

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@simorenoh
Copy link
Member Author

/azp run python - cosmos - tests

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@simorenoh
Copy link
Member Author

/azp run python - cosmos - tests

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@simorenoh
Copy link
Member Author

/azp run python - cosmos - tests

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bambriz
Copy link
Member

bambriz commented Jun 11, 2025

/azp run python - cosmos - tests

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

5 participants