[Internal]Msdata/direct: Refactors msdata/direct refresh July 2025#5315

Closed
ananth7592 wants to merge 687 commits into msdata/direct from users/nalutripician/updateMsDataDirect07-25

Conversation

@ananth7592
Member

Pull Request Template

Description

Please include a summary of the change and which issue is fixed. Include samples if adding new API, and include relevant motivation and context. List any dependencies that are required for this change.

Type of change

Please delete options that are not relevant.

  • [] Bug fix (non-breaking change which fixes an issue)
  • [] New feature (non-breaking change which adds functionality)
  • [] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [] This change requires a documentation update

Closing issues

To automatically close an issue: closes #IssueNumber

kirankumarkolli and others added 30 commits October 4, 2024 15:57
- Tests are updated to clean up the emulator for every class of tests
- This addresses the tests' non-graceful handling of orphaned containers
- Tests that created an unnecessarily high number of partitions are optimized
…4750)

[INTERNAL] CI : Fixes Quarantine StreamOperationsTestDirectModeAsync
[INTERNAL] CI: Fixes adding auto-retries for tests

Only the failed tests are re-run. It is still possible that the re-run
tests see the state the container was left in at that time.
The tests need to be made robust against this in the future.
[INTERNAL] CI: Fixes weekend rolling build at 2 hour cadence
A weekend cadence of every 2 hours helps measure pipeline reliability
[INTERNAL] CI: Moving flaky tests into separate groups
# Pull Request Template

## Description

On the encryption path we are switching to the MDE API with offsets, which
reduces allocations.
To preserve compatibility, the enhanced API is also implemented on top of the
obsolete local copy of the encryption algorithm. No performance
optimizations are made on the obsolete code paths.
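
To illustrate why an offset-based API can reduce allocations, here is a minimal sketch. All names (`OffsetApiSketch`, `Transform`, `RoundTripWithPool`) are hypothetical, and a XOR transform stands in for the real algorithm purely so the example runs; it is not the MDE API.

```csharp
using System;
using System.Buffers;

// Hypothetical stand-in for an encryption routine. XOR is used only so the
// sketch is runnable; it is NOT the MDE algorithm.
public static class OffsetApiSketch
{
    // Allocating style: every call returns a freshly allocated array.
    public static byte[] Transform(byte[] input, byte key)
    {
        byte[] output = new byte[input.Length];
        Transform(input, 0, input.Length, output, 0, key);
        return output;
    }

    // Offset style: the caller supplies the destination buffer and offsets,
    // so a pooled or reused buffer can be written to directly.
    public static int Transform(
        byte[] input, int inputOffset, int inputLength,
        byte[] output, int outputOffset, byte key)
    {
        if (output.Length - outputOffset < inputLength)
        {
            throw new ArgumentException("Output buffer is too small.", nameof(output));
        }

        for (int i = 0; i < inputLength; i++)
        {
            output[outputOffset + i] = (byte)(input[inputOffset + i] ^ key);
        }

        return inputLength; // number of bytes written
    }

    // Usage: rent the intermediate buffer from the shared pool instead of
    // allocating it on every call, then transform back to the original bytes.
    public static byte[] RoundTripWithPool(byte[] payload, byte key)
    {
        byte[] rented = ArrayPool<byte>.Shared.Rent(payload.Length);
        try
        {
            int written = Transform(payload, 0, payload.Length, rented, 0, key);
            byte[] restored = new byte[written];
            Transform(rented, 0, written, restored, 0, key);
            return restored;
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

In the offset style the intermediate buffer comes from `ArrayPool<byte>.Shared`, so only the final result is allocated.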

## Type of change

Please delete options that are not relevant.

- [] Bug fix (non-breaking change which fixes an issue)

## Closing issues

Contributes to #4678

---------

Co-authored-by: Juraj Blazek <jublazek@microsoft.com>
Co-authored-by: juraj-blazek <53177060+juraj-blazek@users.noreply.github.com>
Co-authored-by: Santosh Kulkarni <66682828+kr-santosh@users.noreply.github.com>
Co-authored-by: Kiran Kumar Kolli <kirankk@microsoft.com>
## Description

1. Added `db.query.text` attribute to record queries in Traces.
2. Introduced new `QueryTextMode`, with the following valid values:
        a) `None`: Do not show the query.
        b) `ParameterizedOnly`: Print parameterized queries only.
        c) `All`
3. It can be set as part of `CosmosClientTelemetryOptions` and
`RequestOptions` (i.e. `QueryRequestOptions` and
`ChangeFeedRequestOptions`)
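
The three modes listed above could gate the `db.query.text` attribute roughly as sketched below. The enum and helper are local stand-ins (`QueryTextModeSketch`, `GetTraceQueryText`), not the SDK types, and the behavior of `All` (record every query) is an assumption, since the description does not spell it out.

```csharp
using System;

// Local stand-in mirroring the three values named in the description.
// This is not the SDK's QueryTextMode type.
public enum QueryTextModeSketch
{
    None,
    ParameterizedOnly,
    All
}

public static class QueryTextSketch
{
    // Hypothetical helper: returns the text to record in the db.query.text
    // attribute, or null when the query text should be left out of the trace.
    public static string GetTraceQueryText(
        QueryTextModeSketch mode, string queryText, bool isParameterized)
    {
        switch (mode)
        {
            case QueryTextModeSketch.All:
                // Assumption: "All" records parameterized and literal queries alike.
                return queryText;
            case QueryTextModeSketch.ParameterizedOnly:
                // Literal queries may embed user data, so they are skipped.
                return isParameterized ? queryText : null;
            default:
                return null;
        }
    }
}
```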

## Type of change

- [] New feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kiran Kumar Kolli <kirankk@microsoft.com>
Co-authored-by: Justine Cocchi <jucocchi@microsoft.com>
# Pull Request Template

## Description

Cleanup of code styling issues in Cosmos.Encryption.Custom.* projects as
set up by editorconfig.

## Type of change

Please delete options that are not relevant.

- [] Bug fix (non-breaking change which fixes an issue)

## Closing issues

Contributes to #4678

---------

Co-authored-by: Juraj Blazek <jublazek@microsoft.com>
Co-authored-by: juraj-blazek <53177060+juraj-blazek@users.noreply.github.com>
Co-authored-by: Santosh Kulkarni <66682828+kr-santosh@users.noreply.github.com>
Co-authored-by: Kiran Kumar Kolli <kirankk@microsoft.com>
# Pull Request Template

## Description

Adds a new benchmark that simulates users inserting a document and
immediately reading it. Together with
#4478, this will test
replication latency between different geo regions and how the SDK handles
various error scenarios.

## Type of change

Please delete options that are not relevant.

- [] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [] This change requires a documentation update

## Closing issues

To automatically close an issue: closes #IssueNumber

Co-authored-by: jakewilley_microsoft <--global>
Co-authored-by: Kiran Kumar Kolli <kirankk@microsoft.com>
…3987)

## Description

Add a remarks section to `AllowBulkExecution` explaining that using it
with Resource Token authentication is not recommended.

## Type of change

Documentation fix

Fixes #1783
Closes #1783

---------

Co-authored-by: iain holmes <iaholmes@microsoft.com>
Co-authored-by: Ruben Bartelink <ruben@bartelink.com>
Co-authored-by: Kiran Kumar Kolli <kirankk@microsoft.com>
- Rolling runs: Flaky tests are excluded (apart from some emulator issues,
all remaining failures come from them)
- EndToEndTraceWriterBaselineTests: Making them idempotent (retries are
failing)
# Pull Request Template

## Description

Switch both MDE encryption and encryption paths to use MDE2.0 api

## Type of change

Please delete options that are not relevant.

- [] New feature (non-breaking change which adds functionality)

## Closing issues

Contributes to #4678

---------

Co-authored-by: Juraj Blazek <jublazek@microsoft.com>
Co-authored-by: juraj-blazek <53177060+juraj-blazek@users.noreply.github.com>
Co-authored-by: Santosh Kulkarni <66682828+kr-santosh@users.noreply.github.com>
## Description

Removing dependency on `Task.Delay()`

## Type of change
- [] Bug fix (non-breaking change which fixes an issue)

## Closing issues

To automatically close an issue: closes #4731
## Description

Removing dependency on `Task.Delay()`

## Type of change
- [] Bug fix (non-breaking change which fixes an issue)

## Closing issues

To automatically close an issue: closes #4732
…on failure status codes (#4762)

## Description
Added a check on the Cosmos exception to make sure events are generated only
for the allowed status and sub-status codes.

## Type of change
- [] Bug fix (non-breaking change which fixes an issue)

## Closing issues

To automatically close an issue: closes #4735
[INTERNAL] CI: Fixes disabling BinSkim as NuGet repo is failing
[INTERNAL] Build: Fix warnings in encryption project
[INTERNAL] STJ: Upgrade STJ to 8.0.5 as per Dependabot
[INTERNAL] CI: Fixes styling and documentation issues
# [Internal] Usage: Add README.md for ApplicationInsights usage sample

## Description

Add README.md for ApplicationInsights usage sample

## Type of change

Please delete options that are not relevant.

- [] Bug fix (non-breaking change which fixes an issue)
- [] New feature (non-breaking change which adds functionality)
- [] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [] This change requires a documentation update

## Closing issues

To automatically close an issue: closes #IssueNumber

---------

Co-authored-by: David Chaava <v-dchaava@microsoft.com>
Co-authored-by: Ruben Bartelink <ruben@bartelink.com>
Co-authored-by: Matias Quaranta <ealsur@users.noreply.github.com>
Co-authored-by: Kiran Kumar Kolli <kirankk@microsoft.com>
…4725)

# Pull Request Template

## Description

This PR makes several regions available for public usage.

Austria East
France Central
France South
Indonesia Central
Southeast US
Southwest US
Malaysia West
Germany Central
Germany North
Chile Central
South Central US 2



## Type of change

Please delete options that are not relevant.

- [] New feature (non-breaking change which adds functionality)


## Closing issues

To automatically close an issue: closes #IssueNumber

---------

Co-authored-by: Kiran Kumar Kolli <kirankk@microsoft.com>
[INTERNAL] CI: Fixes timeout for tasks to 60M
…nsights sdk (#4781)

## Description

This PR adds back the attributes required by the Application Insights SDK.
Their removal broke the customer experience with Application Insights, because
its SDK supports only a very specific set of attributes.

This implementation will change in a future release, where OpenTelemetry
attributes will be controlled by an environment variable as described here.

https://github.com/open-telemetry/semantic-conventions/blob/main/docs/database/database-spans.md#semantic-conventions-for-database-client-calls

**4.42.3**

![image](https://github.com/user-attachments/assets/6c19eab0-4884-4648-b6d9-c941d58e8e0b)

**v4.43.0**

![image](https://github.com/user-attachments/assets/fd2a70b4-ed37-4618-b1a5-f654587bf6ab)

**v4.43.1**

![image](https://github.com/user-attachments/assets/5bbe54ec-34f3-4538-a7b4-abff1b1853ea)

After this PR:

![image](https://github.com/user-attachments/assets/2ea64a34-4e57-4dec-a29e-32ffcd77919a)

After this change, customers have to set `OTEL_SEMCONV_STABILITY_OPT_IN`
to `database/dup` in order to see the latest attributes. Otherwise the SDK
emits only the classic attributes, which are also compatible with the
Application Insights SDK.
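
The opt-in check described above could look roughly like the sketch below. The class and method names are hypothetical and this is not the SDK's implementation. It assumes the variable holds a comma-separated list of values, as in the OpenTelemetry semantic-conventions migration guidance, and that matching is exact and case-sensitive.

```csharp
using System;
using System.Linq;

public static class SemConvOptInSketch
{
    public const string VariableName = "OTEL_SEMCONV_STABILITY_OPT_IN";

    // Hypothetical helper: true when the opt-in value contains "database/dup",
    // meaning the latest attributes are emitted alongside the classic ones.
    public static bool EmitsLatestAttributes(string optInValue)
    {
        if (string.IsNullOrWhiteSpace(optInValue))
        {
            // Variable not set: only the classic attributes are emitted.
            return false;
        }

        return optInValue
            .Split(',')
            .Select(value => value.Trim())
            .Any(value => string.Equals(value, "database/dup", StringComparison.Ordinal));
    }

    // Reads the value from the process environment.
    public static bool EmitsLatestAttributesFromEnvironment()
    {
        return EmitsLatestAttributes(Environment.GetEnvironmentVariable(VariableName));
    }
}
```

With the variable unset or empty, the helper returns `false`, which corresponds to the classic-attributes-only default described above.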

## Type of change
- [] Bug fix (non-breaking change which fixes an issue)

## Closing issues

To automatically close an issue: closes #IssueNumber
# Pull Request Template

## Description

Refactor EncryptionProcessor - split code paths for deprecated local
implementation of Aes, split individual transformers to improve
maintainability and testability, split preview/stable

To be processed after #4753 

## Type of change

Please delete options that are not relevant.

- [] New feature (non-breaking change which adds functionality)

## Closing issues

Contributes to #4678

---------

Co-authored-by: Juraj Blazek <jublazek@microsoft.com>
Co-authored-by: juraj-blazek <53177060+juraj-blazek@users.noreply.github.com>
Co-authored-by: Santosh Kulkarni <66682828+kr-santosh@users.noreply.github.com>
Co-authored-by: Kiran Kumar Kolli <kirankk@microsoft.com>
[INTERNAL] CI: Adds .NET8 SDK support

RISK: not all pipelines are exercised as part of the CI gates, so there might be
surprises later.
# Pull Request Template

## Description

- This is a preliminary step to enable Brotli compression in
Cosmos.Encryption.Custom.
- A .NET 8.0 target was added to the project.
- New compiler complaints were addressed.

To be processed after #4757 

## Type of change

Please delete options that are not relevant.

- [] New feature (non-breaking change which adds functionality)

## Closing issues

Contributes to #4678

---------

Co-authored-by: Juraj Blazek <jublazek@microsoft.com>
Co-authored-by: juraj-blazek <53177060+juraj-blazek@users.noreply.github.com>
Co-authored-by: Santosh Kulkarni <66682828+kr-santosh@users.noreply.github.com>
Regions: Fixes decommissioned regions

Two decommissioned regions are removed:
```c#
        public const string GermanyCentral = "Germany Central";
        public const string GermanyNortheast = "Germany Northeast";
```

---------

Co-authored-by: Debdatta Kunda <87335885+kundadebdatta@users.noreply.github.com>
# Pull Request Template

## Description

Adds hedging documentation

## Type of change

Please delete options that are not relevant.

- [] Bug fix (non-breaking change which fixes an issue)
- [] New feature (non-breaking change which adds functionality)
- [] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [] This change requires a documentation update

## Closing issues

To automatically close an issue: closes #IssueNumber

---------

Co-authored-by: Justine Cocchi <jucocchi@microsoft.com>
Co-authored-by: Kevin Pilch <kevinpi@microsoft.com>
Co-authored-by: Kiran Kumar Kolli <kirankk@microsoft.com>
Maya-Painter and others added 14 commits June 27, 2025 16:35
…cates (#5246)

## Description

This change updates the JsonStringDictionary class to allow duplicate
entries and removes size limit to match the server-side implementation.
…for Thin Client Feature (#5265)

# Pull Request Template

## Description

This PR adds the following new features in the
[UserAgentFeatureFlags](https://github.com/Azure/azure-cosmos-dotnet-v3/blob/master/Microsoft.Azure.Cosmos/src/Diagnostics/UserAgentFeatureFlags.cs),
so that the diagnostics user agent can contain the thin client
enablement flag:

- A new Feature Flag for Thin Client.
- A new Feature Flag for Binary Encoding.
- A new Feature Flag for Http 2.0.

```csharp
internal enum UserAgentFeatureFlags
{
    PerPartitionAutomaticFailover = 1,
    PerPartitionCircuitBreaker = 2,
    ThinClient = 4,
    BinaryEncoding = 8,
    Http2 = 16
}
```
An updated user agent will look like the following: 

`"User Agent": "cosmos-netstandard-sdk/3.51.0|1|X64|Microsoft Windows
10.0.26100|.NET 6.0.36|L|F4|"` if only thin-client is enabled.
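
Because each flag is a distinct power of two, several features can be combined into one number and tested individually. The sketch below uses a local copy of the enum (`UserAgentFeatureFlagsSketch`) and hypothetical helpers. How a combined value is rendered in the user agent beyond the single `F4` example is not stated here, so the sketch only computes the numeric value.

```csharp
using System;

// Local copy of the values shown above, marked [Flags] for illustration.
[Flags]
internal enum UserAgentFeatureFlagsSketch
{
    None = 0,
    PerPartitionAutomaticFailover = 1,
    PerPartitionCircuitBreaker = 2,
    ThinClient = 4,
    BinaryEncoding = 8,
    Http2 = 16
}

internal static class FeatureFlagSketch
{
    // Hypothetical helper: ORs the enabled features into one integer.
    public static int Combine(params UserAgentFeatureFlagsSketch[] enabled)
    {
        int value = 0;
        foreach (UserAgentFeatureFlagsSketch flag in enabled)
        {
            value |= (int)flag;
        }

        return value;
    }

    // Hypothetical helper: checks whether one feature bit is set.
    public static bool IsEnabled(int value, UserAgentFeatureFlagsSketch flag)
    {
        return (value & (int)flag) != 0;
    }
}
```

With only `ThinClient` enabled, `Combine` yields 4, which matches the `F4` segment in the example user agent.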

## Type of change

Please delete options that are not relevant.

- [x] New feature (non-breaking change which adds functionality)

## Closing issues

To automatically close an issue: closes #4615
…in Client Proxy. (#5264)

# Pull Request Template

## Description

This PR allows FaultInjection rules that target gateway requests to also
work with the ThinClient Proxy.

## Type of change

Please delete options that are not relevant.

- [x] New feature (non-breaking change which adds functionality)

## Closing issues

To automatically close an issue: closes #5256
## Description

Adds a new class for trace datum keys so that trace parsing in query
benchmarks stays up to date.
…lure on primary region (#5267)

# Pull Request Template

## Description


This pull request introduces a new test method to validate the behavior
of the Cosmos DB SDK under specific fault injection scenarios. The test
ensures that the SDK handles internal server errors during address
refresh operations correctly.

### New Test Implementation:

*
[`Microsoft.Azure.Cosmos/tests/Microsoft.Azure.Cosmos.EmulatorTests/CosmosItemIntegrationTests.cs`](diffhunk://#diff-16d429adf686a32936696d2014afab3fc8faf91f10c880850fb8b30f8b96bb33R1452-R1501):
Added a new test method, `AddressRefreshInternalServerErrorTest`, to
verify the SDK's behavior when an internal server error occurs during
metadata address refresh operations. The test uses fault injection to
simulate the error and validates that the SDK retries and successfully
completes the operation.

## Type of change

Please delete options that are not relevant.

- [] New feature (non-breaking change which adds functionality)

## Closing issues

To automatically close an issue: closes #IssueNumber
…ests (#5274)

# Pull Request Template

## Description

This PR fixes the account properties parsing logic for failing PPAF
tests

## Type of change

Please delete options that are not relevant.

- [x] Bug fix (non-breaking change which fixes an issue)

## Closing issues

To automatically close an issue: closes #IssueNumber
# Pull Request Template

## Description

This PR adds the Flaky tag to the TypedPointOperationsAsync test in the
EndToEndTraceWriterBaselineTests.cs file. This test has a failure rate
of 70% in our nightly rolling pipeline runs. Until the cause is
found, we will mark it as flaky so that the pipeline passes.

## Type of change

Please delete options that are not relevant.

- [] Bug fix (non-breaking change which fixes an issue)
- [] New feature (non-breaking change which adds functionality)
- [] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [] This change requires a documentation update

## Closing issues

To automatically close an issue: closes #IssueNumber

Co-authored-by: Kiran Kumar Kolli <kirankk@microsoft.com>
…hrough queries (#5273)

## Description

Since the addition of non-streaming order by, it is no longer safe to
treat single partition order by queries as passthrough, as it can lead
to incorrect results for `ORDER BY VectorDistance` and `ORDER BY RANK`
queries. This PR fixes the issue by adding a check for non-streaming
order by queries in the passthrough query creation logic.

## Type of change

Please delete options that are not relevant.

- [x] Bug fix (non-breaking change which fixes an issue)
…OM (#5262)

# Pull Request Template

## Description
This PR adds missing constants for function names in the `SqlIdentifier`
class.
## Type of change

Please delete options that are not relevant.

- [x] Bug fix (non-breaking change which fixes an issue)
- [] New feature (non-breaking change which adds functionality)
- [] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [] This change requires a documentation update

## Closing issues

To automatically close an issue: closes #IssueNumber

---------

Co-authored-by: Minh Le <leminh@microsoft.com>
…x Cross Regional Retry Logic (#5276)

# Pull Request Template

## Description

This PR adds the support for the below:

- **Adds new HttpTimeoutPolicy:** Creates a new HTTP timeout
policy that retries proxy requests faster than regular Gateway
requests. An aggressive timeout policy is needed to meet the SLA bar.

- **Adds/Fixes Cross Regional Retry Logic:** Currently the cross
regional retry logic for thin client has a gap: it does not resolve
the thin proxy endpoints correctly. This PR fixes that behavior.

## Type of change

Please delete options that are not relevant.

- [x] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)

## Closing issues

To automatically close an issue: closes #5275
## Description

This bumps the SDK GA version from 3.52.0 to 3.52.1 and the preview version
from 3.53.0-preview.0 to 3.53.0-preview.1. It is a patch upgrade to the
previous release.

**There are no contract changes**


## Type of change

Please delete options that are not relevant.

- [] Bug fix (non-breaking change which fixes an issue)
- [] New feature (non-breaking change which adds functionality)
- [] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [] This change requires a documentation update

## Closing issues

To automatically close an issue: closes #IssueNumber
…t.CreateDocumentClientExceptionAsync (#5291)

## Problem

During write region failover, Gateway returns a 403 response with
`application/json` content type but invalid JSON content. This triggers
a bug in `GatewayStoreClient.CreateDocumentClientExceptionAsync` where
the HTTP response stream is consumed twice:

1. First consumption happens when attempting to deserialize the response
as JSON (line 176)
2. When deserialization fails and the catch block is entered, execution
continues to the fallback logic (line 195) which tries to read the same
stream again
3. This results in an unhandled `InvalidOperationException: The stream
was already consumed. It cannot be read again.`

The original `DocumentClientException` with proper diagnostics is lost,
making debugging difficult.

## Root Cause

The issue was introduced when an `else` clause was removed and an empty
`catch` block was added to the JSON deserialization logic, causing the
same stream to be processed twice if deserialization fails.

## Solution

This PR implements a minimal fix that:

1. **Buffers the HTTP response content once** using
`ReadAsStringAsync()` before attempting JSON deserialization
2. **Creates a new MemoryStream** from the buffered content for the JSON
deserialization attempt
3. **Reuses the buffered content** in the fallback logic instead of
trying to read from the response stream again
4. **Fixes a typo** in the generic type parameter from `<e>` to
`<Error>`

## Changes Made

- Modified `CreateDocumentClientExceptionAsync` method to buffer content
once and reuse it
- Added explanatory comments for the stream consumption fix
- Added test case
`TestStreamConsumptionBugFixWhenJsonDeserializationFails` to verify the
fix
- All changes are surgical and preserve existing functionality

## Code Example

**Before (buggy):**
```csharp
// First read - consumes the stream
Stream contentAsStream = await responseMessage.Content.ReadAsStreamAsync();
Error error = JsonSerializable.LoadFrom<Error>(stream: contentAsStream);
// ... if this fails and throws, execution continues to:

// Second read - fails with "stream already consumed"
contextBuilder.AppendLine(await responseMessage.Content.ReadAsStringAsync());
```

**After (fixed):**
```csharp
// Buffer content once
contentString = await responseMessage.Content.ReadAsStringAsync();
using (MemoryStream contentStream = new MemoryStream(Encoding.UTF8.GetBytes(contentString)))
{
    Error error = JsonSerializable.LoadFrom<Error>(stream: contentStream);
    // ... 
}
// ... if this fails and throws, execution continues to:

// Reuse buffered content - no stream re-reading
contextBuilder.AppendLine(contentString ?? await responseMessage.Content.ReadAsStringAsync());
```

This ensures that when Gateway returns invalid JSON during failover
scenarios, clients get proper `DocumentClientException` instances with
diagnostics instead of unhandled `InvalidOperationException` errors.

Fixes #5243.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: kirankumarkolli <6880899+kirankumarkolli@users.noreply.github.com>
Co-authored-by: Kiran Kumar Kolli <kirankk@microsoft.com>

@github-actions github-actions Bot left a comment

All good!

@ananth7592 ananth7592 force-pushed the users/nalutripician/updateMsDataDirect07-25 branch from db40b58 to fb2b494 Compare July 24, 2025 19:10
@NaluTripician NaluTripician changed the title from "[Internal]Msdata/direct: Refreshes msdata/direct refresh July 2025" to "[Internal]Msdata/direct: Refactors msdata/direct refresh July 2025" on Jul 24, 2025
@ananth7592 ananth7592 force-pushed the users/nalutripician/updateMsDataDirect07-25 branch from fb2b494 to 4f47eaa Compare July 25, 2025 17:19
@ananth7592 ananth7592 closed this Jul 29, 2025