Commit f76003f
[Client Encryption] Add ArrayPool-backed pooled streams for reduced allocations in streaming encrypt/decrypt operations (#5479)
# [Internal] Client Encryption: Refactors stream-processor to reduce decrypt-path allocations
## What this PR changes
Surgical optimisations on the `JsonProcessor.Stream` path in
`Microsoft.Azure.Cosmos.Encryption.Custom`. No public-API changes, no
changes to the `Newtonsoft` path.
1. **Decrypt writer routed through `IBufferWriter<byte>`.**
`Utf8JsonWriter(Stream)` on .NET 8 eagerly constructs an internal
`ArrayBufferWriter<byte>` that is GC-heap backed (`Array.Resize`
doubling from 256 B), producing ~2× the final JSON size in short-lived
GC garbage per op. The decrypt core now uses
`Utf8JsonWriter(IBufferWriter<byte>)` over a pooled
`RentArrayBufferWriter`, eliminating that internal buffer entirely. The
new-output adapter path returns a `ReadOnlyBufferWriterStream` that owns
the rented buffer (cleared on dispose for defense-in-depth). The
caller-provided-output path shares the same core and copies the result out
to the user stream at the end.
2. **`_ei` metadata subtree extraction streams via `Utf8JsonReader`**
instead of
`JsonSerializer.DeserializeAsync<EncryptionPropertiesWrapper>(stream)`.
The old call forces `ReadBufferState` to grow `16K → 32K → 64K → 128K`
(final rental lands on the LOH) because `Skip()` over every unknown root
property needs the complete value in-buffer. The replacement
(`EncryptionPropertiesStreamReader`) uses `Utf8JsonReader` with
`TrySkip` and `isFinalBlock: false`, staying in a 4 KB pooled buffer and
only growing when a single value exceeds the current buffer. Safe-rewind
snapshotting handles `_ei` truncation across chunk boundaries; an
explicit guard prevents pathological growth on partial-read (trickle)
transports.
3. **Property-name matching uses `Utf8JsonReader.ValueTextEquals`** with
pre-encoded UTF-8 path bytes instead of allocating `"/" +
reader.GetString()` per property.
4. **`PooledMemoryStream` rents its backing buffer lazily** on first
write.
5. **`ArrayPoolManager.rentedBuffers` is pre-sized** to cover the
typical decrypt rent count.
6. **`PooledStreamConfiguration` XML docs** clarify that
`SetConfiguration` is configure-once-before-first-use.
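The streaming `_ei` scan in items 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the PR's `EncryptionPropertiesStreamReader`: the `RootPropertyScanner` class, its `FindRootString` method, and the assumption that the target property holds a JSON string are all hypothetical. It shows `Utf8JsonReader` fed in chunks with `isFinalBlock: false`, `TrySkip` over unknown root properties, `ValueTextEquals` against pre-encoded UTF-8 bytes, and a rewind to a snapshot taken before each property name when a value is cut off at a chunk boundary.

```csharp
#nullable enable
using System;
using System.Buffers;
using System.IO;
using System.Text.Json;

// Hypothetical helper for illustration; names are not from the PR.
public static class RootPropertyScanner
{
    // Returns the string value of a root-level property, or null if it is absent.
    // Assumes a JSON object at the root and a positive initialBufferSize.
    public static string? FindRootString(Stream input, ReadOnlySpan<byte> utf8Name, int initialBufferSize = 4096)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(initialBufferSize);
        try
        {
            int filled = 0;
            bool isFinal = false;
            JsonReaderState state = default;

            while (true)
            {
                if (!isFinal)
                {
                    int read = input.Read(buffer, filled, buffer.Length - filled);
                    isFinal = read == 0;
                    filled += read;
                }

                var reader = new Utf8JsonReader(buffer.AsSpan(0, filled), isFinal, state);
                long safeConsumed = 0;

                while (true)
                {
                    // Snapshot a position we can resume from if the next value is truncated.
                    safeConsumed = reader.BytesConsumed;
                    state = reader.CurrentState;

                    if (!reader.Read()) break;
                    if (reader.TokenType != JsonTokenType.PropertyName || reader.CurrentDepth != 1) continue;

                    if (reader.ValueTextEquals(utf8Name))
                    {
                        // No string allocation for the name comparison.
                        if (!reader.Read()) break;
                        return reader.GetString();
                    }

                    // Skip the whole value; returns false if it is not fully buffered yet.
                    if (!reader.TrySkip()) break;
                }

                if (isFinal) return null;

                // Keep only the unconsumed tail, starting at the last safe snapshot.
                int leftover = filled - (int)safeConsumed;
                buffer.AsSpan((int)safeConsumed, leftover).CopyTo(buffer);
                filled = leftover;

                // Grow only when a single property does not fit in the current buffer.
                if (filled == buffer.Length)
                {
                    byte[] bigger = ArrayPool<byte>.Shared.Rent(buffer.Length * 2);
                    buffer.AsSpan(0, filled).CopyTo(bigger);
                    ArrayPool<byte>.Shared.Return(buffer, clearArray: true);
                    buffer = bigger;
                }
            }
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer, clearArray: true);
        }
    }
}
```

A caller would pass the name as a UTF-8 literal, for example `RootPropertyScanner.FindRootString(stream, "_ei"u8)`. The sketch reads synchronously because `Utf8JsonReader` is a ref struct and cannot live across an `await`; an async version would need to hold only the `JsonReaderState` between reads.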
## How `PooledMemoryStream` and friends are actually used
All pooling types introduced by this PR are `internal sealed` and live
under `Microsoft.Azure.Cosmos.Encryption.Custom.Common`. No public API
surface changes.
### The types
| Type | Role | Lifetime |
|------|------|----------|
| `PooledMemoryStream` | `Stream` backed by an `ArrayPool<byte>.Shared` rental that grows by doubling. Clears the buffer before return by default. | Constructed by product code, handed back to user as the `Stream` return of `EncryptAsync` / `DecryptStreamAsync`; disposal returns the rental. |
| `ReadOnlyBufferWriterStream` *(new)* | Read-only, seekable `Stream` wrapper around a `RentArrayBufferWriter`. Owns the writer and disposes it (clearing the rented buffer) on `Dispose`. | Constructed only inside the decrypt adapter; handed back to user as the returned `Stream`; disposal returns the rental. |
| `RentArrayBufferWriter` | `IBufferWriter<byte>` backed by `ArrayPool<byte>.Shared`. Internal-only. | Constructed per operation for `Utf8JsonWriter` to write through; disposed at end of op (or transferred into `ReadOnlyBufferWriterStream`). |
| `ArrayPoolManager` | Bookkeeping for multiple `ArrayPool<byte>` rentals that all return at end of op. | One instance per encrypt/decrypt op; disposed at end. |
| `PooledJsonSerializer` | Static helpers: shared `JsonSerializerOptions`, serialize-to-pooled-stream, deserialize-from-stream. | Stateless. |
| `PooledStreamConfiguration` | Static configure-once knobs: `StreamInitialCapacity` (4 KB), `BufferWriterInitialCapacity` (256 B), `StreamProcessorBufferSize` (16 KB). | Process-lifetime singleton. |
### The three real call-flows
All external traffic enters through the public `EncryptionProcessor`
static class, which is driven by `EncryptionContainer` on the user's
behalf. Every returned pooled stream is eventually disposed by the
Cosmos SDK (`ResponseMessage.Dispose()` chains to `Content.Dispose()`,
which walks into our `Dispose` that returns the rental to the pool).
#### 1. Encrypt to a new stream: `EncryptAsync(Stream input, …) → Stream`
```
user Cosmos call
 └─ EncryptionContainer.CreateItemStreamAsync / ReadItemStreamAsync (ETM-enabled path)
     └─ EncryptionProcessor.EncryptAsync (public static, dispatches on algorithm)
         └─ MdeEncryptionProcessor.EncryptAsync
             └─ IMdeJsonProcessorAdapter.EncryptAsync
                 ├─ SystemTextJsonStreamAdapter (JsonProcessor.Stream)
                 │    [1] new PooledMemoryStream() ◄── rents from pool on first write
                 │    [2] streamProcessor.EncryptStreamAsync(input, ms, …) ── writes via Utf8JsonWriter(Stream)
                 │    [3] return ms as Stream ◄── ownership transfers to caller
                 └─ NewtonsoftAdapter (JsonProcessor.Newtonsoft), unchanged by this PR
```
Caller (`EncryptionContainer.cs:78–166`) wraps the returned stream in
`using (Stream streamPayload = …)` so the pool rental is returned after
`CreateItemStreamAsync` sends the payload.
**Pool-managed allocations in this flow:** 1 × `PooledMemoryStream`
output + per-operation ArrayPool rents for intermediate buffers.
All are returned to the pool.
#### 2. Encrypt into a caller-supplied stream: `EncryptAsync(Stream input, Stream output, …)`
```
EncryptionProcessor.EncryptAsync(input, output, …) (NET8+ overload)
 └─ MdeEncryptionProcessor.EncryptAsync(input, output, …)
     └─ SystemTextJsonStreamAdapter.EncryptAsync(input, output, …)
         └─ StreamProcessor.EncryptStreamAsync(input, output, …)
              ── writes directly to caller's output Stream via Utf8JsonWriter(Stream)
```
No `PooledMemoryStream` is constructed in this path — the caller already
supplied a stream. Used by performance-critical callers that want to
bring their own pooled output (the benchmark harness supplies
`RecyclableMemoryStream`; production can use any `Stream`).
**Pool-managed allocations in this flow:** 0 × `PooledMemoryStream`;
only the intermediate `RentArrayBufferWriter` instances that the stream
processor uses for per-property encryption buffers.
#### 3. Decrypt to a new stream: `DecryptAsync(Stream input, …) → (Stream, DecryptionContext)`
This is the path this PR reshapes the most.
```
EncryptionContainer.*StreamAsync post-processing (decryptResponse branch)
 └─ EncryptionProcessor.DecryptAsync (public static, peeks for legacy algo)
     └─ MdeEncryptionProcessor.DecryptAsync (dispatches on JsonProcessor)
         └─ IMdeJsonProcessorAdapter.DecryptAsync
             └─ SystemTextJsonStreamAdapter.DecryptAsync (JsonProcessor.Stream)
                  [1] EncryptionPropertiesStreamReader.ReadAsync(input) ◄── NEW: streaming _ei extractor
                  [2] new RentArrayBufferWriter(StreamInitialCapacity) ◄── NEW: writes through IBufferWriter
                  [3] streamProcessor.DecryptStreamAsync(input, bufferWriter, …) ── Utf8JsonWriter(IBufferWriter<byte>)
                  [4] return (new ReadOnlyBufferWriterStream(bufferWriter), context)
                       │
                       └── the ReadOnlyBufferWriterStream takes ownership;
                           user's ResponseMessage.Dispose() disposes the stream,
                           which disposes the RentArrayBufferWriter,
                           which returns the rented buffer to ArrayPool<byte>.Shared.
```
`EncryptionContainer.cs` assigns the returned stream directly to
`responseMessage.Content`. When the application disposes the
`ResponseMessage`, the pool rental is returned.
**Pool-managed allocations in this flow:** 1 × `RentArrayBufferWriter`
(output) + per-op rents for decrypt buffers and `_ei` scan. All returned
on `Dispose`.
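To make the "write through `IBufferWriter<byte>`, then hand ownership to the caller" shape of this flow concrete, here is a minimal sketch. `PooledBufferWriter` and `DecryptWriterSketch` are hypothetical stand-ins written for this example; they are not the PR's `RentArrayBufferWriter` or adapter code, and the document shape is made up. The point is only that `Utf8JsonWriter` writes straight into the rented array, and that disposing the writer object is what returns (and clears) the rental.

```csharp
using System;
using System.Buffers;
using System.Text.Json;

// Hypothetical stand-in for a pooled IBufferWriter<byte>; not the PR's implementation.
public sealed class PooledBufferWriter : IBufferWriter<byte>, IDisposable
{
    private byte[] buffer;
    private int written;

    public PooledBufferWriter(int initialCapacity = 256)
    {
        this.buffer = ArrayPool<byte>.Shared.Rent(Math.Max(initialCapacity, 1));
    }

    public ReadOnlySpan<byte> WrittenSpan => this.buffer.AsSpan(0, this.written);

    public void Advance(int count)
    {
        if (count < 0 || this.written + count > this.buffer.Length)
        {
            throw new ArgumentOutOfRangeException(nameof(count));
        }

        this.written += count;
    }

    public Memory<byte> GetMemory(int sizeHint = 0)
    {
        this.EnsureCapacity(sizeHint);
        return this.buffer.AsMemory(this.written);
    }

    public Span<byte> GetSpan(int sizeHint = 0)
    {
        this.EnsureCapacity(sizeHint);
        return this.buffer.AsSpan(this.written);
    }

    public void Dispose()
    {
        byte[] toReturn = this.buffer;
        this.buffer = Array.Empty<byte>();
        this.written = 0;
        if (toReturn.Length > 0)
        {
            // Zero the contents before the array goes back to the shared pool.
            ArrayPool<byte>.Shared.Return(toReturn, clearArray: true);
        }
    }

    private void EnsureCapacity(int sizeHint)
    {
        int needed = Math.Max(sizeHint, 1);
        if (this.buffer.Length - this.written >= needed) return;

        // Grow by doubling, or more if the hint requires it.
        int newSize = Math.Max(this.buffer.Length * 2, this.written + needed);
        byte[] bigger = ArrayPool<byte>.Shared.Rent(newSize);
        this.buffer.AsSpan(0, this.written).CopyTo(bigger);
        if (this.buffer.Length > 0)
        {
            ArrayPool<byte>.Shared.Return(this.buffer, clearArray: true);
        }

        this.buffer = bigger;
    }
}

public static class DecryptWriterSketch
{
    // Returns the writer to the caller, who must dispose it to return the rental.
    public static PooledBufferWriter WriteDocument(string id, string secret)
    {
        var output = new PooledBufferWriter();
        try
        {
            using (var writer = new Utf8JsonWriter(output))
            {
                writer.WriteStartObject();
                writer.WriteString("id", id);
                writer.WriteString("secret", secret);
                writer.WriteEndObject();
                writer.Flush();
            }

            return output;
        }
        catch
        {
            // On failure nothing escapes, so return the rental here.
            output.Dispose();
            throw;
        }
    }
}
```

In the flow above, the role of the returned object is played by `ReadOnlyBufferWriterStream`, which wraps the writer in a `Stream` so the Cosmos pipeline can dispose it through `ResponseMessage.Dispose()`.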
#### 3b. Decrypt into a caller-supplied stream: `DecryptAsync(Stream input, Stream output, …)`
Same as (3), but `SystemTextJsonStreamAdapter.DecryptAsync(input, output,
…)` calls `StreamProcessor.DecryptStreamAsync(Stream, Stream, …)`, which
internally uses the same `IBufferWriter<byte>` core and then copies the
result to the caller's stream with `CopyToAsync`. No
`ReadOnlyBufferWriterStream` is constructed. This is the
lowest-allocation path: the caller owns and disposes their output stream,
and everything else stays inside our pool.
### Ownership & disposal contract (summary)
- Everything the library hands back to the caller via a `Stream` return
value is owned by the caller and MUST be disposed to return the rental.
In the Cosmos pipeline this happens transparently via
`ResponseMessage.Dispose()`.
- Methods taking `Stream output` do NOT take ownership of the caller's
stream — the caller disposes it.
- Every internal `ArrayPool` rental is scoped to `using
RentArrayBufferWriter`/`using ArrayPoolManager` blocks so exceptions
never leak pool memory.
- `clearOnReturn: true` is the default on every rental path; decrypted
plaintext and ciphertext buffers are zeroed before they go back to the
shared pool.
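The contract above (rent lazily, scope the rental with `using`, clear on return) can be illustrated with a small sketch. `LazyRental` and `RentalDemo` are hypothetical names invented for this example and are not types from the PR; the only real API relied on is `ArrayPool<byte>.Shared.Rent` / `Return(array, clearArray: true)`.

```csharp
#nullable enable
using System;
using System.Buffers;

// Hypothetical helper for illustration; not a type from this PR.
public sealed class LazyRental : IDisposable
{
    private readonly int length;
    private byte[]? array;

    public LazyRental(int length)
    {
        if (length <= 0) throw new ArgumentOutOfRangeException(nameof(length));
        this.length = length;
    }

    public bool IsRented => this.array != null;

    // Rents on first access, so an unused instance never touches the pool.
    public Span<byte> Span
    {
        get
        {
            this.array ??= ArrayPool<byte>.Shared.Rent(this.length);
            return this.array.AsSpan(0, this.length);
        }
    }

    public void Dispose()
    {
        byte[]? toReturn = this.array;
        this.array = null;
        if (toReturn != null)
        {
            // clearArray: true zeroes the buffer before it becomes visible to other renters.
            ArrayPool<byte>.Shared.Return(toReturn, clearArray: true);
        }
    }
}

public static class RentalDemo
{
    // Shows that a using block returns the rental even when the body throws.
    public static bool ReturnsOnException()
    {
        var rental = new LazyRental(64);
        try
        {
            using (rental)
            {
                rental.Span.Fill(0xAB); // stand-in for decrypted plaintext
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            return !rental.IsRented;
        }
    }
}
```

`Dispose` is idempotent here, which matters when ownership can be transferred: whichever owner disposes last finds nothing left to return.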
### Should these types be public?
**Short answer: not in this PR, and likely not worth doing at all.**
Detailed reasoning:
| Type | Case for making public | Case against |
|------|------------------------|--------------|
| `PooledMemoryStream` | Callers could supply it to `EncryptAsync(input, output, …)` instead of `MemoryStream` or `RecyclableMemoryStream`. | `Microsoft.IO.RecyclableMemoryStream` already solves the same problem with a mature API, multi-pool support, and telemetry; we'd be reinventing a slightly worse version. The class is deliberately minimal (lazy rent, clear-on-return); it is *tightly coupled* to our security contract, not a general-purpose pooled stream. |
| `ReadOnlyBufferWriterStream` | Callers could reuse it to wrap their own `IBufferWriter<byte>`. | `System.IO.Pipelines` + `PipeReader.AsStream()` already cover this case in the BCL. Our wrapper is narrowly scoped to "take ownership of one `RentArrayBufferWriter` and dispose it"; that ownership model isn't general. |
| `RentArrayBufferWriter` | Callers could drive `Utf8JsonWriter` through it to eliminate STJ's internal buffer, same trick we apply. | Duplicate of an existing gist-quality `ArrayPoolBufferWriter` that various libraries ship; a public version in this library is not where downstream consumers would look for it. |
| `PooledStreamConfiguration` | Hosts that run many tenants with very different doc sizes could tune `StreamInitialCapacity`. | The defaults (4 KB / 16 KB / 256 B) are well-chosen for encrypted Cosmos documents. The comment from @kirankumarkolli asked for an **opt-in** story for the new pooled behavior as a whole, which is a different question than exposing the knobs. |
**Recommendation (separate from this PR):** keep the pooling types
`internal` and, if we want an opt-in, instead surface a single new
`RequestOptions`-like property such as `EncryptionAllocationMode = {
Default, PooledStream }` gated behind a feature flag. That addresses
@kirankumarkolli's opt-in concern without committing to public
pool-helper types we'd have to maintain forever.
## Benchmark methodology
Both runs executed on the same idle machine (Windows 11, .NET SDK
10.0.202, .NET 8.0.26 X64 RyuJIT AVX2, BenchmarkDotNet v0.13.3).
`MediumRun` job: 15 iterations × 2 launches × 10 warmup,
`OperationsPerInvoke = 16`, `InProcessEmitToolchain`, `MemoryDiagnoser`.
The benchmark harness (`EncryptionBenchmark.cs`) is **byte-identical**
between the two runs — only product code differs.
- **`master`**: commit `79d18b73` with this PR's harness back-ported
(harness-only changes: concrete `BenchmarkKeyStoreProvider`, drop of the
spurious `ENCRYPTION_CUSTOM_PREVIEW` gate, `OperationsPerInvoke = 16`).
- **`this PR`**: commit `9fea360f` on
`feature/stream-processor-optimizations` (subsequent commits add tests,
docs, and comment cleanup only — measured numbers unchanged).
Run with:
```powershell
cd Microsoft.Azure.Cosmos.Encryption.Custom/tests/Microsoft.Azure.Cosmos.Encryption.Custom.Performance.Tests
dotnet run -c Release --framework net8.0 -- --filter '*EncryptionBenchmark*'
```
## Newtonsoft paths — sanity check (not touched by this PR)
All Newtonsoft scenarios agree within run-to-run noise (≤ 0.52 %),
confirming the comparison is apples-to-apples.
| Scenario | master Alloc | this PR Alloc | Δ |
|---|---:|---:|---:|
| Encrypt 1 KB | 36,552 B | 36,648 B | +0.26% |
| Decrypt 1 KB | 54,768 B | 54,960 B | +0.35% |
| DecryptToProvidedStream 1 KB | 36,688 B | 36,880 B | +0.52% |
| Encrypt 10 KB | 171,353 B | 171,449 B | +0.06% |
| Decrypt 10 KB | 198,722 B | 198,913 B | +0.10% |
| DecryptToProvidedStream 10 KB | 125,401 B | 125,593 B | +0.15% |
| Encrypt 100 KB | 1,693,159 B | 1,693,133 B | −0.00% |
| Decrypt 100 KB | 1,584,352 B | 1,584,568 B | +0.01% |
| DecryptToProvidedStream 100 KB | 1,013,492 B | 1,013,660 B | +0.02% |
## Stream paths — where this PR moves the numbers
| Scenario | master Alloc | this PR Alloc | Alloc Δ | master Mean | this PR Mean | Time Δ |
|---|---:|---:|---:|---:|---:|---:|
| 1 KB Encrypt | 16,552 B | 13,816 B | **−16.5%** | 27.54 μs | 19.50 μs | **−29.2%** |
| 1 KB EncryptToProvidedStream | 10,392 B | 9,704 B | **−6.6%** | 24.24 μs | 18.96 μs | **−21.8%** |
| 1 KB Decrypt | 27,328 B | 24,704 B | **−9.6%** | 51.53 μs | 42.25 μs | **−18.0%** |
| 1 KB DecryptToProvidedStream | 11,072 B | 5,720 B | **−48.3%** | 26.12 μs | 25.75 μs | −1.4% |
| 10 KB Encrypt | 81,953 B | 45,873 B | **−44.0%** | 69.84 μs | 64.00 μs | **−8.4%** |
| 10 KB EncryptToProvidedStream | 36,048 B | 29,473 B | **−18.2%** | 62.92 μs | 65.16 μs | +3.6% |
| 10 KB Decrypt | 70,673 B | 63,489 B | **−10.2%** | 129.35 μs | 128.67 μs | −0.5% |
| 10 KB DecryptToProvidedStream | 17,984 B | 5,720 B | **−68.2%** | 59.03 μs | 49.01 μs | **−17.0%** |
| 100 KB Encrypt | 677,078 B | 425,690 B | **−37.1%** | 976.77 μs | 647.31 μs | **−33.7%** |
| 100 KB EncryptToProvidedStream | 229,131 B | 163,545 B | **−28.6%** | 637.89 μs | 484.90 μs | **−24.0%** |
| 100 KB Decrypt | 539,936 B | 446,452 B | **−17.3%** | 1,065.20 μs | 946.17 μs | **−11.2%** |
| 100 KB DecryptToProvidedStream | 118,682 B | 5,906 B | **−95.0%** | 468.04 μs | 386.53 μs | **−17.4%** |
### Takeaway
All 12 Stream-processor scenarios **allocate less than master**, and 11 of
12 are also faster on wall time; the exception is 10 KB
EncryptToProvidedStream at +3.6 % (62.92 μs → 65.16 μs). Highlights:
- **100 KB DecryptToProvidedStream: −95 % allocations** (118 KB → 6 KB)
and **−17 % wall time**. Gen2 collections per 1 000 ops drop from 2.93
to 0.
- **100 KB Encrypt: −37 % allocations and −34 % wall time** (677 KB →
426 KB; 977 μs → 647 μs).
- **100 KB Decrypt: −17 % allocations and −11 % wall time** (540 KB →
446 KB; 1065 μs → 946 μs).
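For anyone re-deriving the Δ columns, the percentages are plain relative change against `master`. The helper below is a hypothetical convenience written for this note (it is not part of the benchmark harness); the inputs are taken from the 100 KB DecryptToProvidedStream and 10 KB EncryptToProvidedStream rows of the Stream table.

```csharp
using System;

// Hypothetical helper for checking the table arithmetic; not part of the harness.
public static class BenchmarkDelta
{
    // Relative change of candidate versus baseline, in percent.
    public static double PercentChange(double baseline, double candidate)
    {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void PrintExamples()
    {
        // 100 KB DecryptToProvidedStream allocations: 118,682 B -> 5,906 B (about -95.0 %)
        Console.WriteLine(Math.Round(PercentChange(118682, 5906), 1));

        // 10 KB EncryptToProvidedStream mean time: 62.92 us -> 65.16 us (about +3.6 %)
        Console.WriteLine(Math.Round(PercentChange(62.92, 65.16), 1));
    }
}
```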
## Testing
- 375 unit tests pass (302 pre-existing + 73 new). The new tests cover
100 % of the lines this PR touches in the new and modified types.
- Newtonsoft path is untouched and allocation numbers confirm parity (≤
0.52 % noise).
- Integration tests require the Cosmos emulator, which is not available
in this environment — not run.
## Risks
- The pooled output streams are only safe as long as callers dispose
them. In the Cosmos pipeline this is automatic
(`ResponseMessage.Dispose()` chains through). Any hypothetical direct
caller of `EncryptionProcessor.EncryptAsync` / `DecryptAsync` that
forgets to dispose the returned stream leaks one `ArrayPool` rental —
identical risk surface to the pre-existing `PooledMemoryStream`, so no
new leak class.
- `EncryptionPropertiesStreamReader` fails fast on non-seekable input
(`ArgumentException`). Non-seekable input was already rejected before this
PR, which threw `NotSupportedException` from `input.Position = 0` on the
same input; only the exception type changes.
## Checklist
- [x] Branch name matches `users/<user>/...`
- [x] PR title regex `(\[Internal\]|\[v4\] )?.{3}.+:
(Adds|Fixes|Refactors|Removes) .{3}.+`
- [x] `Co-authored-by: Copilot` trailer on every commit
- [x] No `Directory.Build.props` / versioning / packaging changes
- [x] Unit tests added; full suite passes on net8.0
- [x] Benchmark README updated with apples-to-apples numbers
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Kiran Kumar Kolli <kirankk@microsoft.com>
1 parent 6cb33e3 commit f76003f
27 files changed
Lines changed: 5909 additions & 213 deletions
File tree
- Microsoft.Azure.Cosmos.Encryption.Custom
- src
- Common
- Transformation
- tests
- Microsoft.Azure.Cosmos.Encryption.Custom.Performance.Tests
- Microsoft.Azure.Cosmos.Encryption.Custom.Tests
- Transformation
- Adapters