Skip to content

Commit 6f3a581

Browse files
RamjotSinghCopilot
andcommitted
Diagnostics: Fixes GetQueryMetrics returning null inside custom RequestHandler
Fixes #5117. `response.Diagnostics.GetQueryMetrics()` previously returned `null` when invoked inside a custom `RequestHandler.SendAsync` for query operations. The `QueryMetricsTraceDatum` was attached to the trace tree in `CosmosQueryClientCore.GetCosmosElementResponse` -- i.e. after the response had finished unwinding back through the handler pipeline -- so handlers walking the trace saw nothing. Changes: - TransportHandler.ProcessMessageAsync now attaches the QueryMetricsTraceDatum to the per-request transport trace immediately after the ResponseMessage is constructed, guarded on OperationType (Query / SqlQuery) so non-query ops that may ever surface a query-metrics header are not mis-tagged. - Removes the now-redundant attach in CosmosQueryClientCore.GetCosmosElementResponse and the now-unused Query.Core.Metrics using. - Cross-references the two attach sites (REST/RNTBD via TransportHandler; thin-client / distributed-gateway via CosmosDistributedQueryClient.CreatePage) with code comments so future query transports stay in sync. - CosmosTraceDiagnostics is unchanged: it continues to climb to the operation root, so response.Diagnostics retains the full operation context for every typed Response<T>, ReadFeedResponse<T>, FeedResponse<T>, and CosmosException.Diagnostics. Inside a custom handler the walk also climbs to root, so GetQueryMetrics returns the per-partition metrics for the current response plus any sibling partition responses that have already completed. Adds emulator tests covering both single-partition and cross-partition fan-out scenarios. The cross-partition test verifies via GetFeedRangesAsync that the emulator actually split into >= 2 physical partitions (Assert.Inconclusive otherwise) and uses a ConcurrentBag to safely capture handler observations across parallel partition responses. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent ce8eaa4 commit 6f3a581

5 files changed

Lines changed: 235 additions & 12 deletions

File tree

Microsoft.Azure.Cosmos/src/Handler/TransportHandler.cs

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@ namespace Microsoft.Azure.Cosmos.Handlers
99
using System.Linq;
1010
using System.Threading;
1111
using System.Threading.Tasks;
12+
using Microsoft.Azure.Cosmos.Query.Core.Metrics;
1213
using Microsoft.Azure.Cosmos.Resource.CosmosExceptions;
1314
using Microsoft.Azure.Cosmos.Tracing;
1415
using Microsoft.Azure.Cosmos.Tracing.TraceData;
@@ -155,6 +156,29 @@ internal async Task<ResponseMessage> ProcessMessageAsync(
155156
request,
156157
serviceRequest.RequestContext.RequestChargeTracker);
157158

159+
// Attach query metrics to the per-request transport trace so they are
160+
// reachable from a custom RequestHandler.SendAsync as the response unwinds
161+
// back through the handler pipeline (issue #5117). Before this, the datum
162+
// was added in CosmosQueryClientCore.GetCosmosElementResponse, which runs
163+
// after the handler pipeline has fully unwound, so handlers always saw
164+
// a null GetQueryMetrics result. Guarded on operation type to avoid
165+
// mis-tagging non-query operations that may ever surface a query-metrics
166+
// header. The thin-client / distributed-gateway path adds the same datum
167+
// from CosmosDistributedQueryClient.CreatePage (that path bypasses
168+
// TransportHandler).
169+
if ((request.OperationType == OperationType.Query
170+
|| request.OperationType == OperationType.SqlQuery)
171+
&& !string.IsNullOrEmpty(responseMessage?.Headers?.QueryMetricsText))
172+
{
173+
string queryMetricsText = responseMessage.Headers.QueryMetricsText;
174+
QueryMetricsTraceDatum queryMetricsDatum = new QueryMetricsTraceDatum(
175+
new Lazy<QueryMetrics>(() => new QueryMetrics(
176+
queryMetricsText,
177+
IndexUtilizationInfo.Empty,
178+
ClientSideMetrics.Empty)));
179+
processMessageAsyncTrace.AddDatum(TraceDatumKeys.QueryMetrics, queryMetricsDatum);
180+
}
181+
158182
// Enrich diagnostics context in-case of auth failures
159183
if (responseMessage?.StatusCode == System.Net.HttpStatusCode.Unauthorized || responseMessage?.StatusCode == System.Net.HttpStatusCode.Forbidden)
160184
{

Microsoft.Azure.Cosmos/src/Query/Core/QueryClient/CosmosDistributedQueryClient.cs

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -120,8 +120,11 @@ private DocumentServiceRequest CreateRequest(
120120
return request;
121121
}
122122

123-
private static TryCatch<QueryPage> CreatePage(DocumentServiceResponse response, Tracing.ITrace trace)
124-
{
123+
private static TryCatch<QueryPage> CreatePage(DocumentServiceResponse response, Tracing.ITrace trace)
124+
{
125+
// Attach QueryMetricsTraceDatum for the thin-client / distributed-gateway path
126+
// (this path bypasses Handler/TransportHandler.cs which writes the same datum
127+
// for the REST/RNTBD path). Keep both call sites in sync — see issue #5117.
125128
string queryMetricsText = response.Headers[HttpConstants.HttpHeaders.QueryMetrics];
126129
if (queryMetricsText != null)
127130
{

Microsoft.Azure.Cosmos/src/Query/v3Query/CosmosQueryClientCore.cs

Lines changed: 5 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,6 @@ namespace Microsoft.Azure.Cosmos
1616
using Microsoft.Azure.Cosmos.CosmosElements;
1717
using Microsoft.Azure.Cosmos.Json;
1818
using Microsoft.Azure.Cosmos.Query.Core;
19-
using Microsoft.Azure.Cosmos.Query.Core.Metrics;
2019
using Microsoft.Azure.Cosmos.Query.Core.Monads;
2120
using Microsoft.Azure.Cosmos.Query.Core.Pipeline.Pagination;
2221
using Microsoft.Azure.Cosmos.Query.Core.QueryClient;
@@ -322,15 +321,11 @@ private static TryCatch<QueryPage> GetCosmosElementResponse(
322321
{
323322
using (cosmosResponseMessage)
324323
{
325-
if (cosmosResponseMessage.Headers.QueryMetricsText != null)
326-
{
327-
QueryMetricsTraceDatum datum = new QueryMetricsTraceDatum(
328-
new Lazy<QueryMetrics>(() => new QueryMetrics(
329-
cosmosResponseMessage.Headers.QueryMetricsText,
330-
IndexUtilizationInfo.Empty,
331-
ClientSideMetrics.Empty)));
332-
trace.AddDatum(TraceDatumKeys.QueryMetrics, datum);
333-
}
324+
// Note: QueryMetricsTraceDatum is attached at the transport layer
325+
// (Handler/TransportHandler.cs) for the REST/RNTBD path so the datum
326+
// is reachable from custom RequestHandlers as the response unwinds.
327+
// The thin-client path attaches it in
328+
// Query/Core/QueryClient/CosmosDistributedQueryClient.CreatePage.
334329

335330
if (!cosmosResponseMessage.IsSuccessStatusCode)
336331
{

Microsoft.Azure.Cosmos/tests/Microsoft.Azure.Cosmos.EmulatorTests/CosmosHandlersTests.cs

Lines changed: 200 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,9 @@
55
namespace Microsoft.Azure.Cosmos.SDK.EmulatorTests
66
{
77
using System;
8+
using System.Collections.Concurrent;
89
using System.Collections.Generic;
10+
using System.Linq;
911
using System.Threading;
1012
using System.Threading.Tasks;
1113
using Microsoft.Azure.Documents;
@@ -95,6 +97,197 @@ public async Task TestBatchRequiredHeadersWithHandler()
9597
}
9698
}
9799

100+
[TestMethod]
101+
public async Task QueryMetricsAvailableInsideRequestHandler_SinglePartition()
102+
{
103+
// Regression test for issue #5117:
104+
// response.Diagnostics.GetQueryMetrics() must be non-null when invoked inside
105+
// a custom RequestHandler.SendAsync for a query operation.
106+
ConcurrentBag<ServerSideCumulativeMetrics> observedMetrics = new ConcurrentBag<ServerSideCumulativeMetrics>();
107+
108+
RequestHandlerHelper testHandler = new RequestHandlerHelper
109+
{
110+
CallBackOnResponse = (request, response) =>
111+
{
112+
if (request.OperationType == OperationType.Query
113+
&& response.IsSuccessStatusCode
114+
&& !string.IsNullOrEmpty(response.Headers?.QueryMetricsText))
115+
{
116+
ServerSideCumulativeMetrics metrics = response.Diagnostics.GetQueryMetrics();
117+
Assert.IsNotNull(metrics, "GetQueryMetrics() must be non-null inside the handler (issue #5117).");
118+
observedMetrics.Add(metrics);
119+
}
120+
121+
return response;
122+
}
123+
};
124+
125+
CosmosClient customClient = TestCommon.CreateCosmosClient(
126+
builder => builder.AddCustomHandlers(testHandler));
127+
128+
try
129+
{
130+
Container container = customClient.GetContainer(this.database.Id, this.Container.Id);
131+
QueryMetricsItem item = new QueryMetricsItem
132+
{
133+
id = Guid.NewGuid().ToString(),
134+
pk = Guid.NewGuid().ToString(),
135+
description = "single-partition"
136+
};
137+
await container.CreateItemAsync(item, new Cosmos.PartitionKey(item.pk));
138+
139+
FeedIterator<QueryMetricsItem> iterator = container.GetItemQueryIterator<QueryMetricsItem>(
140+
$"SELECT * FROM c WHERE c.id = '{item.id}'",
141+
requestOptions: new QueryRequestOptions
142+
{
143+
PartitionKey = new Cosmos.PartitionKey(item.pk)
144+
});
145+
146+
int totalItems = 0;
147+
while (iterator.HasMoreResults)
148+
{
149+
FeedResponse<QueryMetricsItem> page = await iterator.ReadNextAsync();
150+
totalItems += page.Count;
151+
}
152+
153+
Assert.AreEqual(1, totalItems);
154+
Assert.IsTrue(
155+
observedMetrics.Count >= 1,
156+
"Handler must be invoked for at least one query response.");
157+
158+
foreach (ServerSideCumulativeMetrics metrics in observedMetrics)
159+
{
160+
Assert.IsNotNull(metrics);
161+
Assert.AreEqual(
162+
1,
163+
metrics.PartitionedMetrics.Count,
164+
"Single-partition query should report exactly one PartitionedMetrics entry.");
165+
}
166+
}
167+
finally
168+
{
169+
customClient.Dispose();
170+
}
171+
}
172+
173+
[TestMethod]
174+
public async Task QueryMetricsAvailableInsideRequestHandler_CrossPartition()
175+
{
176+
// Companion to the single-partition test. Verifies that for cross-partition
177+
// queries the handler still sees non-null metrics on every query response.
178+
// Option C semantics: response.Diagnostics walks from the operation root, so
179+
// each handler invocation sees its own partition's metrics plus any sibling
180+
// partitions that have already completed at that point. The exact per-page
181+
// distribution is timing-dependent; what we guarantee is (a) metrics are not
182+
// null inside the handler and (b) the iterator-level aggregated view exposes
183+
// metrics from multiple partitions.
184+
ConcurrentBag<int> observedPartitionCounts = new ConcurrentBag<int>();
185+
186+
RequestHandlerHelper testHandler = new RequestHandlerHelper
187+
{
188+
CallBackOnResponse = (request, response) =>
189+
{
190+
if (request.OperationType == OperationType.Query
191+
&& response.IsSuccessStatusCode
192+
&& !string.IsNullOrEmpty(response.Headers?.QueryMetricsText))
193+
{
194+
ServerSideCumulativeMetrics metrics = response.Diagnostics.GetQueryMetrics();
195+
Assert.IsNotNull(metrics, "GetQueryMetrics() must be non-null inside the handler (issue #5117).");
196+
observedPartitionCounts.Add(metrics.PartitionedMetrics.Count);
197+
}
198+
199+
return response;
200+
}
201+
};
202+
203+
CosmosClient customClient = TestCommon.CreateCosmosClient(
204+
builder => builder.AddCustomHandlers(testHandler));
205+
206+
Cosmos.Database multiPartitionDatabase = null;
207+
try
208+
{
209+
multiPartitionDatabase = await customClient.CreateDatabaseAsync(
210+
"MultiPkDb_" + Guid.NewGuid().ToString());
211+
212+
Container multiPartitionContainer = await multiPartitionDatabase.CreateContainerAsync(
213+
new ContainerProperties(id: Guid.NewGuid().ToString(), partitionKeyPath: "/pk"),
214+
throughput: 15000);
215+
216+
IReadOnlyList<FeedRange> feedRanges = await multiPartitionContainer.GetFeedRangesAsync();
217+
if (feedRanges.Count < 2)
218+
{
219+
Assert.Inconclusive(
220+
$"Emulator did not split the 15000 RU container into >= 2 physical partitions (got {feedRanges.Count}); cannot exercise cross-partition fan-out.");
221+
}
222+
223+
for (int i = 0; i < 30; i++)
224+
{
225+
QueryMetricsItem item = new QueryMetricsItem
226+
{
227+
id = Guid.NewGuid().ToString(),
228+
pk = Guid.NewGuid().ToString(),
229+
description = "cross-partition-fanout"
230+
};
231+
await multiPartitionContainer.CreateItemAsync(item, new Cosmos.PartitionKey(item.pk));
232+
}
233+
234+
FeedIterator<QueryMetricsItem> iterator = multiPartitionContainer.GetItemQueryIterator<QueryMetricsItem>(
235+
"SELECT * FROM c",
236+
requestOptions: new QueryRequestOptions
237+
{
238+
MaxConcurrency = -1
239+
});
240+
241+
int totalItems = 0;
242+
int maxPartitionedCountOnIterator = 0;
243+
bool iteratorSawMetrics = false;
244+
while (iterator.HasMoreResults)
245+
{
246+
FeedResponse<QueryMetricsItem> page = await iterator.ReadNextAsync();
247+
totalItems += page.Count;
248+
249+
ServerSideCumulativeMetrics pageMetrics = page.Diagnostics.GetQueryMetrics();
250+
if (pageMetrics != null)
251+
{
252+
iteratorSawMetrics = true;
253+
if (pageMetrics.PartitionedMetrics.Count > maxPartitionedCountOnIterator)
254+
{
255+
maxPartitionedCountOnIterator = pageMetrics.PartitionedMetrics.Count;
256+
}
257+
}
258+
}
259+
260+
Assert.AreEqual(30, totalItems);
261+
262+
Assert.IsTrue(
263+
observedPartitionCounts.Count >= 2,
264+
$"Cross-partition query should invoke the handler for at least 2 partition responses, got {observedPartitionCounts.Count}.");
265+
266+
foreach (int count in observedPartitionCounts)
267+
{
268+
Assert.IsTrue(
269+
count >= 1,
270+
"Each handler invocation should see at least one partition's metrics (issue #5117).");
271+
}
272+
273+
Assert.IsTrue(
274+
iteratorSawMetrics,
275+
"Expected at least one page's iterator-level diagnostics to expose query metrics.");
276+
Assert.IsTrue(
277+
maxPartitionedCountOnIterator >= 2,
278+
$"Iterator-level diagnostics should aggregate metrics from multiple partitions on at least one page (max observed was {maxPartitionedCountOnIterator}).");
279+
}
280+
finally
281+
{
282+
if (multiPartitionDatabase != null)
283+
{
284+
await multiPartitionDatabase.DeleteAsync();
285+
}
286+
287+
customClient.Dispose();
288+
}
289+
}
290+
98291
private async Task<IList<ToDoActivity>> CreateRandomItems(int pkCount, int perPKItemCount = 1, bool randomPartitionKey = true)
99292
{
100293
Assert.IsFalse(!randomPartitionKey && perPKItemCount > 1);
@@ -146,5 +339,12 @@ public class ToDoActivity
146339
public string description { get; set; }
147340
public string status { get; set; }
148341
}
342+
343+
private class QueryMetricsItem
344+
{
345+
public string id { get; set; }
346+
public string pk { get; set; }
347+
public string description { get; set; }
348+
}
149349
}
150350
}

changelog.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
2626
#### Bugs Fixed
2727

2828
- [5827](https://github.com/Azure/azure-cosmos-dotnet-v3/pull/5827) ChangeFeedEstimator: Change feed estimator threw `ArgumentNullException` when an inmemory lease container was being used. Update validations so in-memory lease containers work with change feed estimator
29+
- [5879](https://github.com/Azure/azure-cosmos-dotnet-v3/pull/5879) Diagnostics: Fixes `response.Diagnostics.GetQueryMetrics()` returning `null` when called inside a custom `RequestHandler.SendAsync` for query operations. Query metrics for the REST/RNTBD path are now attached at the transport layer, so they are visible to handlers as the response unwinds through the pipeline. `response.Diagnostics` continues to expose the full operation context (handlers see metrics from any sibling partition responses that have already completed). `FeedResponse.Diagnostics.GetQueryMetrics()` at the iterator level is unchanged.
2930

3031
#### Other Changes
3132

0 commit comments

Comments
 (0)