Commit d6d01d1

Annie Liang and Copilot committed
docs: clarify JFR allocation weight is cumulative alloc, not heap usage
JFR ObjectAllocationSample weight = estimated cumulative bytes allocated over the recording (10 min), not heap residency. Heap was 8 GB committed. The ~271 GB 'targeted' is cumulative allocation, not live heap (total alloc rate was ~4 GB/s); most objects are immediately GC'd.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 7f9a136 commit d6d01d1

1 file changed

Lines changed: 32 additions & 30 deletions

sdk/cosmos/azure-cosmos/benchmark-results/PR-DESCRIPTION.md

@@ -2,31 +2,33 @@
22

33
### Motivation
44

5-
JFR profiling of the baseline (`main`) under high-concurrency gateway workloads revealed that `HashMap`-related allocations (`HashMap$Node`, `HashMap`, `HashMap$ValueIterator`) and HTTP header collections (`DefaultHeaders$HeaderEntry`, `HttpHeader`) account for a significant portion of total allocation pressure, approximately **8–15%** of total sampled allocation weight depending on concurrency.
5+
JFR profiling of the baseline (`main`) under high-concurrency gateway workloads revealed that `HashMap`-related allocations (`HashMap$Node`, `HashMap`, `HashMap$ValueIterator`) and HTTP header collections (`DefaultHeaders$HeaderEntry`, `HttpHeader`) account for a significant portion of total allocation pressure -- approximately **8-15%** of total sampled allocation weight depending on concurrency.
66

7-
**Key findings from baseline JFR recordings** (c128 Read HTTP/1, `ObjectAllocationSample`):
7+
**Key findings from baseline JFR recordings** (c128 Read HTTP/1, `ObjectAllocationSample`, 10-min recording):
88

9-
| Class | Alloc Weight | % of Total |
10-
|-------|:-----------:|:----------:|
11-
| `HashMap$Node` | 171.0 GB | 6.9% |
12-
| `DefaultHeaders$HeaderEntry` | 169.9 GB | 6.8% |
13-
| `HashMap$ValueIterator` | 31.4 GB | 1.3% |
14-
| `HttpHeader` | 21.4 GB | 0.9% |
15-
| `HashMap` | 18.2 GB | 0.7% |
16-
| `HttpHeaders` | 15.6 GB | 0.6% |
17-
| `HashMap$Node[]` | 13.7 GB | 0.5% |
18-
| **Total targeted** | **271.1 GB** | **10.9%** |
9+
> **What is "Alloc Weight"?** JFR `ObjectAllocationSample` uses statistical sampling. The `weight` field is the **estimated cumulative bytes allocated** over the recording duration -- NOT heap residency. Most objects are short-lived and immediately GC'd. For reference, the JVM heap was 8 GB committed / ~5 GB used, while total allocation throughput was ~4 GB/s (typical for reactive workloads with high object churn).
10+
11+
| Class | Cumulative Alloc (10 min) | % of Total Alloc |
12+
|-------|:-------------------------:|:----------------:|
13+
| `HashMap$Node` | 171 GB | 6.9% |
14+
| `DefaultHeaders$HeaderEntry` | 170 GB | 6.8% |
15+
| `HashMap$ValueIterator` | 31 GB | 1.3% |
16+
| `HttpHeader` | 21 GB | 0.9% |
17+
| `HashMap` | 18 GB | 0.7% |
18+
| `HttpHeaders` | 16 GB | 0.6% |
19+
| `HashMap$Node[]` | 14 GB | 0.5% |
20+
| **Total targeted** | **~271 GB** | **~10.9%** |
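
The relationship between a class's cumulative weight, its share of the total, and the overall allocation rate can be checked with simple arithmetic. The sketch below is illustrative only: the class and method names are hypothetical, and the inputs are the rounded `HashMap$Node` figures from the table together with the stated 10-minute recording length.

```java
// Illustrative arithmetic only; names are hypothetical and not part of the SDK.
final class AllocRateEstimate {

    /** Total sampled allocation implied by one class's weight and its share of the total. */
    static double impliedTotalGb(double classWeightGb, double percentOfTotal) {
        return classWeightGb / (percentOfTotal / 100.0);
    }

    /** Average allocation rate in GB/s over a recording of the given length. */
    static double ratePerSecondGb(double totalGb, double recordingSeconds) {
        return totalGb / recordingSeconds;
    }

    public static void main(String[] args) {
        // HashMap$Node row: 171 GB at 6.9% of the total sampled allocation weight.
        double totalGb = impliedTotalGb(171.0, 6.9);
        // The recording covered 10 minutes.
        double rate = ratePerSecondGb(totalGb, 10 * 60);
        System.out.printf("implied total: %.0f GB, average rate: %.1f GB/s%n", totalGb, rate);
    }
}
```

With these inputs the implied total is roughly 2,480 GB over the recording, which works out to about 4.1 GB/s. That is consistent with the ~4 GB/s allocation throughput quoted above and is far larger than the 8 GB heap, which is only possible because the weight counts allocations rather than live objects.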
1921

2022
Root causes identified:
21-
1. `HashMap<>()` default initial capacity (16) forces 1–2 resize+rehash cycles for typical gateway responses with 20–30 headers
22-
2. `StoreResponse` constructor converts `HttpHeaders` → `Map<String, String>` via `HttpUtils.asMap()` on every response, allocating a throwaway `HashMap$ValueIterator` and rebuilding `HashMap$Node` entries
23+
1. `HashMap<>()` default initial capacity (16) forces 1-2 resize+rehash cycles for typical gateway responses with 20-30 headers, creating throwaway `HashMap$Node[]` arrays and re-hashed `HashMap$Node` entries
24+
2. `StoreResponse` constructor converts `HttpHeaders` to `Map<String, String>` via `HttpUtils.asMap()` on every response, allocating a throwaway `HashMap$ValueIterator` and rebuilding `HashMap$Node` entries
2325
3. `HttpHeaders` in `RxGatewayStoreModel.getHttpRequestHeaders()` is undersized, causing internal HashMap resize
2426
4. Redundant `toLowerCase()` calls on header keys that are already normalized
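
The "1-2 resize cycles for 20-30 headers" figure in root cause 1 follows from `HashMap`'s documented defaults (initial capacity 16, load factor 0.75, table doubling on resize). The sketch below models that growth arithmetically. It does not inspect a real `HashMap`, the class and method names are hypothetical, and the lazy allocation of the first table is not counted as a resize.

```java
// Arithmetic model of HashMap table growth; names are hypothetical.
final class HashMapResizeCount {

    /** Rounds a requested capacity up to the next power of two, as HashMap does. */
    static int tableSizeFor(int requestedCapacity) {
        int n = 1;
        while (n < requestedCapacity) {
            n <<= 1;
        }
        return n;
    }

    /** Counts table doublings needed to hold `entries` distinct keys. */
    static int resizesFor(int entries, int initialCapacity) {
        int capacity = tableSizeFor(initialCapacity);
        int resizes = 0;
        // A resize happens once the entry count exceeds capacity * 0.75.
        while (entries > (int) (capacity * 0.75f)) {
            capacity *= 2;
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        for (int headers : new int[] {12, 20, 24, 25, 30}) {
            System.out.println(headers + " headers: default=" + resizesFor(headers, 16)
                + " resizes, presized(32)=" + resizesFor(headers, 32) + " resizes");
        }
    }
}
```

Under this model a default map crosses its first threshold at the 13th entry and its second at the 25th, so 20 headers cost one resize and 30 headers cost two. A map created with capacity 32 holds up to 24 entries without resizing.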
2527

2628
### Changes
2729

2830
1. **Right-sized HashMap initial capacity**: `HashMap<>(32)` instead of `HashMap<>()` in `RxDocumentServiceRequest`, and `mapCapacityForSize()` helper in `HttpUtils` to avoid rehashing
29-
2. **Eliminate HashMap → HttpHeaders → HashMap round-trip**: `StoreResponse` now accepts `HttpHeaders` directly, removing intermediate `asMap()` conversion that created throwaway `HashMap$ValueIterator` and `HashMap$Node` arrays
31+
2. **Eliminate HashMap to HttpHeaders to HashMap round-trip**: `StoreResponse` now accepts `HttpHeaders` directly, removing intermediate `asMap()` conversion that created throwaway `HashMap$ValueIterator` and `HashMap$Node` arrays
3032
3. **Pre-sized HttpHeaders in `RxGatewayStoreModel`**: sized to `defaultHeaders.size() + headers.size()` to avoid internal HashMap resize
3133
4. **Remove redundant `toLowerCase()` calls**: `HttpHeaders.set()` already normalizes keys, so callers no longer double-normalize, which previously created extra `String` objects
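
The body of the `mapCapacityForSize()` helper mentioned in change 1 is not shown in this diff. As a rough illustration of the idea, a capacity helper of this kind usually divides the expected entry count by the load factor. The names `MapSizing`, `capacityFor`, and `copyHeaders` below are hypothetical and should not be read as the SDK's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a sizing helper; not the SDK's HttpUtils code.
final class MapSizing {

    /** Smallest initial capacity that holds `expectedSize` entries without a resize. */
    static int capacityFor(int expectedSize) {
        if (expectedSize < 0) {
            throw new IllegalArgumentException("expectedSize must be non-negative");
        }
        // HashMap resizes once size exceeds capacity * 0.75 (the default load factor).
        return (int) Math.ceil(expectedSize / 0.75);
    }

    /** Copies a header map into a HashMap sized up front for its contents. */
    static Map<String, String> copyHeaders(Map<String, String> source) {
        Map<String, String> copy = new HashMap<>(capacityFor(source.size()));
        copy.putAll(source);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        for (int i = 0; i < 30; i++) {
            headers.put("x-header-" + i, "value-" + i);
        }
        System.out.println("capacity for 30 entries: " + capacityFor(30));
        System.out.println("copied entries: " + copyHeaders(headers).size());
    }
}
```

`HashMap` rounds the requested capacity up to a power of two, so a request of 40 becomes a 64-slot table. On Java 19 and later, `HashMap.newHashMap(int numMappings)` performs the same calculation internally.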
3234

@@ -71,49 +73,49 @@ Root causes identified:
7173
7274
#### JFR Allocation Comparison (c128 Read HTTP/1, r1)
7375

74-
`ObjectAllocationSample` weight comparison for HashMap/header-related classes:
76+
`ObjectAllocationSample` cumulative allocation weight comparison (10-min recording, 8 GB heap):
7577

76-
| Class | main (GB) | hashmap-alloc (GB) | Change |
77-
|-------|:---------:|:------------------:|:------:|
78-
| `HashMap$Node` | 171.0 | 131.1 | -23% |
79-
| `HashMap$ValueIterator` | 31.4 | 0.0 | -100% |
80-
| `DefaultHeaders$HeaderEntry` | 169.9 | 110.8 | -35% |
81-
| `DefaultHeadersImpl` | 32.7 | 1.0 | -97% |
82-
| `HttpHeader` | 21.4 | 11.2 | -48% |
78+
| Class | main | hashmap-alloc | Reduction |
79+
|-------|:----:|:------------:|:---------:|
80+
| `HashMap$Node` | 171 GB | 131 GB | -23% |
81+
| `HashMap$ValueIterator` | 31 GB | 0 GB | -100% |
82+
| `DefaultHeaders$HeaderEntry` | 170 GB | 111 GB | -35% |
83+
| `DefaultHeadersImpl` | 33 GB | 1 GB | -97% |
84+
| `HttpHeader` | 21 GB | 11 GB | -48% |
8385

84-
> Note: `HashMap` object weight increased (18 -> 99 GB) due to JFR `ObjectAllocationSample` being statistical — pre-sized HashMap objects are sampled at a different rate than resize-triggered ones. The overall `HashMap$Node` reduction (23%) confirms fewer resize/rehash operations.
86+
> Note: `HashMap` object allocation weight increased (18 to 99 GB) -- this is a JFR sampling artifact. Pre-sized HashMap objects are sampled at a different rate than resize-triggered ones. The `HashMap$Node` reduction (23%) confirms fewer resize/rehash operations, which is the actual goal.
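
For readers who want to reproduce per-class totals like those in the table, the sketch below sums the `weight` field of `jdk.ObjectAllocationSample` events using the JDK's `jdk.jfr.consumer` API (the event type is available from JDK 16). The class name, method names, and command-line arguments are assumptions made for illustration. The recordings and tooling actually used for this benchmark are not described in the source.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import jdk.jfr.consumer.RecordedClass;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical helper for comparing two recordings; not part of the benchmark suite.
final class AllocWeightByClass {

    /** Sums the estimated allocated bytes (`weight`) per object class in one recording. */
    static Map<String, Long> sumWeights(Path jfrFile) throws IOException {
        Map<String, Long> bytesByClass = new HashMap<>();
        try (RecordingFile recording = new RecordingFile(jfrFile)) {
            while (recording.hasMoreEvents()) {
                RecordedEvent event = recording.readEvent();
                if (!"jdk.ObjectAllocationSample".equals(event.getEventType().getName())) {
                    continue;
                }
                RecordedClass type = event.getClass("objectClass");
                if (type != null) {
                    bytesByClass.merge(type.getName(), event.getLong("weight"), Long::sum);
                }
            }
        }
        return bytesByClass;
    }

    /** Relative change from baseline to candidate, in percent (negative means a reduction). */
    static double percentChange(long baseline, long candidate) {
        return 100.0 * (candidate - baseline) / baseline;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: baseline recording, args[1]: candidate recording (paths supplied by the caller).
        Map<String, Long> baseline = sumWeights(Path.of(args[0]));
        Map<String, Long> candidate = sumWeights(Path.of(args[1]));
        String cls = "java.util.HashMap$Node";
        long before = baseline.getOrDefault(cls, 0L);
        long after = candidate.getOrDefault(cls, 0L);
        if (before > 0) {
            System.out.printf("%s: %.1f%%%n", cls, percentChange(before, after));
        }
    }
}
```

Because the event is sampled, the sums are estimates. Small differences between runs are expected, and comparing recordings of equal duration keeps the cumulative figures comparable.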
8587
8688
![JFR Allocation Comparison](https://raw.githubusercontent.com/xinlian12/azure-sdk-for-java/perf/hashmap-collection-allocation/sdk/cosmos/azure-cosmos/benchmark-results/1t-c128-ReadThroughput-http1-jfr-alloc.png)
8789

8890
#### Timeline Charts
8991

9092
Each chart shows throughput (ops/s) and P99 latency over time, with individual rounds (thin lines) and 3-round average (bold).
9193

92-
<details><summary><b>Read HTTP/1 c1 (low concurrency)</b></summary>
94+
<details><summary><b>Read HTTP/1 -- c1 (low concurrency)</b></summary>
9395

9496
![Read HTTP/1 c1](https://raw.githubusercontent.com/xinlian12/azure-sdk-for-java/perf/hashmap-collection-allocation/sdk/cosmos/azure-cosmos/benchmark-results/1t-c1-ReadThroughput-http1-timeline.png)
9597

9698
</details>
9799

98-
<details><summary><b>Read HTTP/1 c32 (mid concurrency, shows outlier pattern)</b></summary>
100+
<details><summary><b>Read HTTP/1 -- c32 (mid concurrency, shows outlier pattern)</b></summary>
99101

100102
![Read HTTP/1 c32](https://raw.githubusercontent.com/xinlian12/azure-sdk-for-java/perf/hashmap-collection-allocation/sdk/cosmos/azure-cosmos/benchmark-results/1t-c32-ReadThroughput-http1-timeline.png)
101103

102104
</details>
103105

104-
<details><summary><b>Read HTTP/1 c128 (high concurrency)</b></summary>
106+
<details><summary><b>Read HTTP/1 -- c128 (high concurrency)</b></summary>
105107

106108
![Read HTTP/1 c128](https://raw.githubusercontent.com/xinlian12/azure-sdk-for-java/perf/hashmap-collection-allocation/sdk/cosmos/azure-cosmos/benchmark-results/1t-c128-ReadThroughput-http1-timeline.png)
107109

108110
</details>
109111

110-
<details><summary><b>Read HTTP/2 c128 (shows +3.7% improvement)</b></summary>
112+
<details><summary><b>Read HTTP/2 -- c128 (shows +3.7% improvement)</b></summary>
111113

112114
![Read HTTP/2 c128](https://raw.githubusercontent.com/xinlian12/azure-sdk-for-java/perf/hashmap-collection-allocation/sdk/cosmos/azure-cosmos/benchmark-results/1t-c128-ReadThroughput-http2-timeline.png)
113115

114116
</details>
115117

116-
<details><summary><b>Write HTTP/1 c128</b></summary>
118+
<details><summary><b>Write HTTP/1 -- c128</b></summary>
117119

118120
![Write HTTP/1 c128](https://raw.githubusercontent.com/xinlian12/azure-sdk-for-java/perf/hashmap-collection-allocation/sdk/cosmos/azure-cosmos/benchmark-results/1t-c128-WriteThroughput-http1-timeline.png)
119121

@@ -127,7 +129,7 @@ Each chart shows throughput (ops/s) and P99 latency over time, with individual r
127129

128130
- **Overall average throughput change**: -1.0% (within noise; driven by main r1 outliers at mid-concurrency)
129131
- **Excluding outlier rounds**: essentially tied across all configurations
130-
- **Allocation reduction**: 23-100% reduction in targeted HashMap/header classes
132+
- **Allocation reduction**: 23-100% reduction in targeted HashMap/header allocation throughput
131133
- **Variance improvement**: hashmap-alloc consistently shows tighter round-to-round variance
132134
- **Write throughput**: neutral (+/-0.2% at high concurrency), confirming no regression on the write path
133135
- The changes are a **net improvement in allocation efficiency** with **no measurable throughput regression** once run-order artifacts are accounted for.
