Skip to content

Encryption.Custom: Refactors Stream-opt-in legacy detection to Utf8JsonReader peek#5903

Draft
adamnova wants to merge 11 commits into
Azure:mainfrom
adamnova:users/adamnovak/encryption-custom-stream-fastpath-v2
Draft

Encryption.Custom: Refactors Stream-opt-in legacy detection to Utf8JsonReader peek#5903
adamnova wants to merge 11 commits into
Azure:mainfrom
adamnova:users/adamnovak/encryption-custom-stream-fastpath-v2

Conversation

@adamnova
Copy link
Copy Markdown
Contributor

Description

Follow-up to #5902. Replaces the exception-driven legacy AE-AES detection (try/catch over NotSupportedException from SystemTextJsonStreamAdapter) with a purpose-built LegacyAlgorithmDetector (Utf8JsonReader scan) for JsonProcessor.Stream opt-in callers.

The original detector triggered a full JsonSerializer.DeserializeAsync<EncryptionPropertiesWrapper> (base64-decoding the entire _ed payload) before throwing, so legacy documents paid for two full parses. The new detector classifies the document in a single pass over the top-level _ei._ea property without allocating either a JObject or an EncryptionPropertiesWrapper.

Strict behavioural parity

Stream-opt-in callers see byte-for-byte the same output as the non-opt-in (Newtonsoft) path on every input that does not match the MDE fast-path classification. Two correctness fixes landed alongside the detector to make this guarantee hold:

  1. LegacyAlgorithmDetector exact-matches both the legacy AND the MDE algorithm names. Empty strings, unknown algorithms, case variants, prefixes, and suffixes all classify as Unknown (routed through the JObject peek path) instead of being optimistically forwarded to the streaming MDE adapter.
  2. DecryptViaJObjectPeekAsync gained stripStreamOverrideOnSuccessFallthrough: when the synchronous JObject parse succeeded, the post-peek MDE fallthrough strips the JsonProcessor.Stream override (parity with non-opt-in). When the parse threw (async-only stream, malformed JSON), the override is preserved so the streaming MDE adapter still gets a shot - preserves the pre-existing Decrypt_Cancellation_Aborts contract on SlowCancelableStream.

Coverage

  • 44 direct LegacyAlgorithmDetector unit tests (empty/whitespace/non-object inputs, _ei shape variants, _ea position variants, JSON escapes, case-sensitivity, prefix/suffix collisions, MDE exact-match enforcement).
  • 10 Stream-opt-in legacy round-trip integration tests in EncryptionProcessorTests (full property parity, DecryptionContext path lists, large docs, repeated round-trips, null sensitive values, non-MemoryStream wrappers, byte-for-byte parity with Newtonsoft, unknown-algorithm propagation).
  • 37 strict-parity tests in EncryptionProcessorStreamParityTests that run every input through BOTH paths and assert agreement on: thrown vs succeeded, exception family, output bytes, PathsDecrypted. Sections: round-trip, legacy, edge-case payloads (empty/null/non-string _ea, _ei shape variants, truncated/empty/whitespace JSON, deep nesting), input-stream variants, cancellation, repeated calls.
  • All 285 tests pass on net8.0 (up from 194 on master). Zero warnings under TreatWarningsAsErrors=true.

Benchmarks

New LegacyDetectionBenchmark with [MemoryDiagnoser] measures the detector against an inlined replica of the PR #5902 try/catch baseline (MediumRun, InProcessEmit, 15 iters x 2 launches):

Scenario Doc New (us / KB alloc) Baseline (us / KB alloc) Time Alloc
Legacy 1 KB 62 / 45.3 146 / 52.3 -58% -13%
Legacy 10 KB 316 / 215.4 401 / 232.1 -21% -7%
Legacy 100 KB 2847 / 1859 3266 / 1974 -13% -6%
MDE 1 KB 25 / 12.3 24 / 12.3 neutral +-0
MDE 10 KB 60 / 28.7 58 / 28.7 neutral +-0
MDE 100 KB 677 / 225.3 680 / 225.2 neutral +-0

Strictly equal-or-better on allocations in every configuration; removes the exception-throw + wasted MDE deserialize cost from the legacy path entirely.

Also fixed the perf project so BenchmarkDotNet can load the existing EncryptionBenchmark (BDN refuses [Params] on non-public members like the internal JsonProcessor enum). Added a public JsonProcessorOption surrogate enum + BenchmarkSwitcher in Program.cs.

Gating

The detector and routing changes are gated behind NET8_0_OR_GREATER (the framework the Stream opt-in path requires).

Type of change

  • Refactor (no functional changes, no api changes)

adamnova and others added 2 commits May 25, 2026 11:44
…onReader peek

PR Azure#5902 added a fast path for callers that opt into JsonProcessor.Stream by
wrapping MdeEncryptionProcessor.DecryptAsync in try/catch(NotSupportedException)
to detect legacy AE-AES documents. That detector is exception-driven and, more
importantly, SystemTextJsonStreamAdapter.ReadMdeEncryptionPropertiesStreamingAsync
performs a full JsonSerializer.DeserializeAsync<EncryptionPropertiesWrapper> over
the entire stream (including base64-decoding the _ed payload) *before* throwing
NotSupportedException - so the "fast path" allocates the encrypted payload only
to discard it when the document is legacy. Legacy documents are then re-parsed
via JObject on the rewound stream, paying for two full parses.

This change replaces the try/catch with a purpose-built Utf8JsonReader scan
(LegacyAlgorithmDetector) that classifies a document as
NotEncrypted / MdeAlgorithm / LegacyAlgorithm / Unknown by reading only the
top-level _ei._ea property, without allocating either a JObject or an
EncryptionPropertiesWrapper. The detector is wired in only for the Stream
opt-in path; non-opt-in callers continue to use the original Newtonsoft
JObject peek (extracted verbatim into DecryptViaJObjectPeekAsync) with
byte-identical behavior.

Strict behavioural parity for Stream-opt-in callers
---------------------------------------------------
The Stream-opt-in path is now byte-for-byte equivalent to the non-opt-in path
on every input that does not match the MDE fast-path classification. Two
correctness fixes were required to land that guarantee:

1. LegacyAlgorithmDetector now exact-matches BOTH the legacy and MDE algorithm
   names. Empty strings, unknown algorithms, case variants, prefixes, and
   suffixes all classify as Unknown (not MdeAlgorithm), routing them through
   the permissive JObject peek path instead of straight into the streaming MDE
   adapter (which would have thrown NotSupportedException / JsonException on
   shapes the Newtonsoft path handles gracefully).

2. DecryptViaJObjectPeekAsync gained a stripStreamOverrideOnSuccessFallthrough
   parameter. When set (only the Stream-opt-in router sets it), the JObject
   path's post-peek MDE fallthrough swaps the caller's Properties dictionary
   for one with the JsonProcessor.Stream override removed - but only when the
   JObject parse SUCCEEDED synchronously. If the parse threw (async-only
   stream, malformed JSON), the original requestOptions is preserved so MDE's
   streaming adapter still gets a chance at the stream (preserves the
   pre-existing Decrypt_Cancellation_Aborts contract on SlowCancelableStream).
   The Properties reference is restored in a finally block; observable state
   on the caller's RequestOptions is unchanged after the call returns.

Behavior contract preserved:
- Stream opt-in + legacy AE-AES documents still transparently fall back to
  the Newtonsoft path
  (regression: Decrypt_StreamSelection_LegacyAlgorithm_FallsBackToNewtonsoft).
- Stream opt-in + malformed JSON still throws a JSON exception (collapsed to
  the same family as the non-opt-in path; new
  Decrypt_StreamSelection_MalformedJson_FallsBackSafely asserts parity).
- Stream opt-in + unencrypted JSON still returns null DecryptionContext at
  position 0 (covered by pre-existing Decrypt_StreamSelection_FallbackWhenUnencrypted
  and new Decrypt_StreamSelection_MdeAlgorithm_RoundTrips).
- Stream opt-in + async-only stream + cancellation still throws
  OperationCanceledException via the streaming MDE adapter
  (regression: Decrypt_Cancellation_Aborts in StreamProcessorConcurrencyAndCancellationTests).
- Non-MemoryStream inputs route through ArrayPool<byte> (covered by new
  Decrypt_StreamSelection_NonMemoryStream_StillRoundTrips).

The detector and routing helper are gated behind NET8_0_OR_GREATER, matching
the gating already used for the Stream opt-in pathways in the project.

Coverage added:
- 44 direct unit tests for LegacyAlgorithmDetector in
  Transformation/LegacyAlgorithmDetectorTests.cs covering: empty / whitespace /
  non-object inputs, _ei shape variants (missing / nested / non-string _ea),
  _ea position variants (first / last / surrounded by sibling keys),
  JSON escape sequences in the algorithm string, case-sensitivity, prefix
  and suffix collisions, MDE exact-match enforcement, and malformed-JSON
  short-circuit semantics.
- 10 Stream-opt-in legacy round-trip integration tests in EncryptionProcessorTests
  that encrypt with the legacy algorithm via JsonProcessor.Newtonsoft then
  decrypt via the new Stream-opt-in detector path, asserting full property
  parity, DecryptionContext path lists, large documents, repeated round-trips,
  null sensitive values, non-MemoryStream wrappers, byte-for-byte parity with
  the Newtonsoft path, and unknown-algorithm error propagation.
- 37 strict-parity tests in EncryptionProcessorStreamParityTests.cs that
  run every input through BOTH the Newtonsoft path and the Stream-opt-in
  path and assert agreement on: thrown vs succeeded, exception family,
  output bytes (byte-for-byte), and DecryptionContext.PathsDecrypted.
  Sections: round-trip parity, legacy parity, edge-case payload parity
  (empty/null/non-string _ea, _ei shape variants, truncated/empty/whitespace
  JSON, deep nesting), input-stream variants (MemoryStream vs
  NonMemoryStreamWrapper), cancellation behaviour, repeated calls.
- All 285 tests pass on net8.0 (up from 194 on master, then 246 in the
  first iteration of this commit); zero warnings under
  TreatWarningsAsErrors=true.

Benchmark added:
- New LegacyDetectionBenchmark in the perf project with [MemoryDiagnoser]
  measures the new detector against an inlined replica of PR Azure#5902's
  try/catch baseline across {Legacy, MDE} x {1, 10, 100} KB documents.
- Measured (Job=MediumRun, InProcessEmit, 15 iterations x 2 launches):
    Legacy 1KB:    62 us /  45.3 KB  vs  146 us /  52.3 KB  (-58% time, -13% alloc)
    Legacy 10KB:  316 us / 215.4 KB  vs  401 us / 232.1 KB  (-21% time,  -7% alloc)
    Legacy 100KB: 2847 us / 1859 KB  vs 3266 us / 1974 KB   (-13% time,  -6% alloc)
    MDE 1KB:       25 us /  12.3 KB  vs   24 us /  12.3 KB  (neutral allocations)
    MDE 10KB:      60 us /  28.7 KB  vs   58 us /  28.7 KB  (neutral allocations)
    MDE 100KB:    677 us / 225.3 KB  vs  680 us / 225.2 KB  (neutral allocations)
  The detector is strictly equal-or-better on allocations in every configuration
  and removes the exception-throw + wasted MDE deserialize cost from the legacy
  path entirely.
- Made the existing EncryptionBenchmark loadable via BenchmarkDotNet by adding
  a public JsonProcessorOption surrogate enum (BDN refuses [Params] on
  non-public members, blocking the whole perf assembly from loading on master).
  Program.cs now uses BenchmarkSwitcher so individual benchmarks can be
  selected with --filter.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ure#5902 baseline

Drops the LegacyDetectionBenchmark added in d716418 along with the perf-project
scaffolding (JsonProcessorOption surrogate enum, BenchmarkSwitcher in Program.cs,
EncryptionBenchmark refactor) that existed solely to host it. The benchmark
embedded an inlined copy of the PR Azure#5902 try/catch baseline as a stable
comparison anchor, which left a duplicate of pre-refactor production logic
living in the test tree. Removing the benchmark eliminates the inlined legacy
code entirely.

The benchmark numbers it produced are preserved in the parent commit message
for review purposes.

All 285 tests still pass on net8.0; perf project still builds cleanly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
adamnova and others added 9 commits May 25, 2026 23:38
…-opt-in path

Triples-down on the parity guarantee by adding 36 new edge-case tests

informed by three independent adversary code-review passes.

Adds (LegacyAlgorithmDetectorTests, +19 tests):

  * top-level null/true/false literals

  * duplicate _ei / _ea at root (first-wins semantics)

  * UTF-8 BOM prefix

  * decoy root-level _ea before real _ei

  * leading/trailing/internal whitespace and tab in algorithm string

  * near-miss property names (_eaa, __ea)

  * invalid UTF-8 bytes

  * MDE algorithm all-uppercase (not matched)

  * MDE algorithm with first three chars escaped (matched)

  * 1000-properties-before-_ei stress

  * 1 MB _ed before _ea stress

  * idempotent repeated calls

  * nested _ei inside _ed (outer wins)

Adds (EncryptionProcessorStreamParityTests, +10 tests):

  * duplicate _ei at root parity

  * decoy root-level _ea parity

  * UTF-8 BOM prefix parity

  * non-ASCII property values (Japanese, Chinese, emoji)

  * algorithm name case variants (legacy upper/lower, MDE upper)

  * leading/trailing whitespace in algorithm name

  * legacy-encrypted stream starting at non-zero Position

  * enhanced helper now asserts DecryptionInfoList.Count, per-entry

    DataEncryptionKeyId, and ordered PathsDecrypted

Adds (EncryptionProcessorTests, +7 tests):

  * Properties == null falls back to non-opt-in path

  * JsonProcessor.Newtonsoft override falls back to non-opt-in path

  * MDE fast path does NOT mutate RequestOptions.Properties

  * strip-restore fallthrough restores Properties after call

  * 32 concurrent decrypts on shared RequestOptions all round-trip

  * unencrypted document returns stream at Position 0

  * pre-cancelled token: both paths agree on succeed/throw

Two adversary-audit candidates were intentionally NOT added because they

test divergences between the underlying Newtonsoft and STJ JSON adapters

rather than router behaviour — both divergences exist on master for any

caller that toggles the Stream override and are out of scope for this PR.

See the inline comment in EncryptionProcessorStreamParityTests.cs for

the full rationale.

All 322 tests pass on net8.0 (was 285). netstandard2.0 build clean.

Zero warnings under TreatWarningsAsErrors=true.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ion-type contract

The JsonProcessor.Stream opt-in path routes decrypt through
SystemTextJsonStreamAdapter. For malformed inputs the streaming adapter
surfaces System.Text.Json.JsonException where the default Newtonsoft path
surfaces Newtonsoft.Json.JsonException or System.FormatException. Both
paths reject the same set of inputs - only the exception type differs
because the two adapters fail at different layers. This is a property of
the underlying adapter selection (pre-exists since JsonProcessor.Stream
was introduced), not of this PR.

Documenting in three places to make the contract discoverable for
adopters and SDK contributors:

- Microsoft.Azure.Cosmos.Encryption.Custom/changelog.md: new `Unreleased`
  section calling out the optimization and the observable exception-type
  difference for Stream-opt-in adopters.
- src/JsonProcessor.cs: `<remarks>` block on the `Stream` enum value
  spelling out the exception-type contract vs `Newtonsoft`.
- src/JsonProcessorRequestOptionsExtensions.cs: `<remarks>` block on the
  `JsonProcessorPropertyBagKey` constant - this is the only surface most
  adopters touch, because the enum itself is internal and they reach it
  via `RequestOptions.Properties[�ncryption-json-processor]`.

Avoids cref to `JsonProcessor.Stream` from the netstandard2.0-visible
constant (Stream is NET8+ only) - documents the member by name in
`<c>` instead so the doc compiles for all target frameworks.

No production-code change; no test change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Builds the full (encrypt-adapter x decrypt-adapter x algorithm) matrix on
top of the existing parity sweep. The earlier sweep only exercised
Newtonsoft-encrypted documents; this fills in the missing legs and
asserts that any document encrypted by one adapter is decryptable by the
other.

Coverage (NET8 only, 31 active tests + 1 [Ignore] regression-tracker):

- Per-shape round-trips for all 4 MDE combinations
  (Newtonsoft/Stream encrypt x NewtonsoftDefault/StreamOptIn decrypt)
  across 6 document shapes: standard, null sensitive string, empty dict,
  large payload, unicode/emoji, numeric extremes.
- Convergence: all 4 combinations recover the same plaintext bytes from
  the same source document.
- Encrypter-semantic equivalence: root-key set, encrypted property byte
  values, and _ei body shape agree between the Newtonsoft and Stream
  encrypters (relies on the deterministic test mock encryptor).
- Legacy AE-AES contracts: Encrypt(Stream) throws NotSupportedException;
  Encrypt(Newtonsoft) round-trips via both decrypt paths (including the
  Stream-opt-in fallback to DecryptViaJObjectPeekAsync).
- Stability under repeated encrypt/decrypt cycles.

The [Ignore]-flagged Mde_StreamEncrypt_DictWithJsonEscapedChars_KnownLimitation
test documents a pre-existing master bug (verified on commit 79d18b7):
the Stream encrypter double-escapes JSON metacharacters (\" or \) when
they appear inside a dictionary that is a whole-property encryption
target. The recovered key/value gains an extra backslash. This is in the
encrypt path, NOT touched by PR Azure#5903 (which only optimizes decrypt
routing). The Unicode/emoji round-trip test is intentionally narrowed to
pure-ASCII dict keys/values to avoid this pre-existing bug.

Test results:
- 31/31 cross-adapter tests pass on net8.0; 1 skipped.
- Full encryption test suite: 353/354 pass on net8.0; 0 failures.
- src builds clean on both net8.0 and netstandard2.0; 0 warnings.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Companion to the cross-adapter encrypt/decrypt matrix added in
0f700e0. The matrix tests enforce structural coverage; these probes
hunt for behavioural inconsistencies between the Newtonsoft and Stream
encrypt/decrypt adapters along known JSON fault lines.

Coverage (NET8 only, 37 active tests + 1 [Ignore] regression-tracker):

- Input-JSON shape invariance: property ordering, whitespace
  (compact vs pretty-printed), and unicode escapes (\uXXXX vs literal)
  in source JSON must not change the recovered plaintext.
- Numeric edge values: int.MinValue, int.MaxValue, 0, -1 round-trip
  through every encrypt-adapter x decrypt-adapter combination.
- Degenerate containers: empty SensitiveArr and empty SensitiveDict
  round-trip as empty containers (not nulls) under every combination.
- No-sensitive-paths contract: both adapters agree on the empty-result
  shape they emit (_ei present, _ep:[], _ed:null) when the source has
  no sensitive paths configured.
- Plaintext decrypt is a no-op via both decrypt paths.
- Encrypt determinism: same input -> same ciphertext for both adapters
  under the deterministic mock encryptor (b => b + 1).
- Adapter-switched cycles: Newtonsoft encrypt -> Stream decrypt ->
  Stream encrypt -> Newtonsoft decrypt recovers the original.
- Cross-adapter concurrency: 32 concurrent workers mixing Newtonsoft
  and Stream encrypt/decrypt on the same Encryptor; every worker's
  round-trip recovers its per-worker input.

New finding (pre-existing master bug, verified on commit 79d18b7):

The Stream encrypter does NOT decode JSON \uXXXX escapes when
serialising values inside an array or dictionary that is a
whole-property encryption target. The recovered string contains the
literal source bytes (e.g. \u0041 round-trips as the 6-char string
'\u0041' instead of 'A'). Top-level string fields are handled
correctly; the bug is specific to nested container elements.

This is the same root cause as the previously-documented dict-key
double-escape bug: the Stream encrypter copies source bytes verbatim
for nested-container encryption targets instead of normalising through
a canonical JSON writer. Captured as
StreamEncrypt_UnicodeEscapesInArrayOrDict_KnownLimitation with [Ignore]
and a citation to master.

Test results:
- 37/38 adversarial tests pass on net8.0; 1 skipped (regression-tracker).
- Full encryption suite: 390/392 pass on net8.0; 0 failures; 2 skipped
  (both [Ignore] regression-trackers for pre-existing Stream-encrypter
  bugs surfaced by this PR's cross-adapter testing).
- src builds clean on both net8.0 and netstandard2.0; 0 warnings.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…t limitations

The cross-adapter parity tests added in 0f700e0 and 30c5eef surfaced
two pre-existing JsonProcessor.Stream encrypt-side limitations that are
not introduced by this PR but are now executable as [Ignore]-flagged
regression-tracker tests. Adds a section to the Unreleased changelog
entry so callers using JsonProcessor.Stream for encrypt operations can
discover these limitations without reading test source.

Both limitations share a single root cause: the Stream encrypter copies
source bytes verbatim for whole-property encryption targets that are
arrays or dictionaries, rather than normalising through a canonical JSON
writer. The Newtonsoft encrypter handles both cases correctly, and
either decrypt path reads the resulting documents.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extends the cross-adapter parity matrix established in 0f700e0 / 30c5eef
with four further fault lines that the existing tests did not exercise:

1. PathsToEncrypt edge cases - same document with ascending vs. reversed
   PathsToEncrypt round-trips identically; configured paths missing from the
   source are not invented and not listed in _ep; all-missing configured
   paths still emit an empty _ep array correctly.

2. _ei wire-format parity - Newtonsoft and Stream encrypters agree on the
   field set, the _ef/_en/_ea scalars, AND the _ed byte content (with the
   deterministic mock encryptor). Same input through the same adapter also
   produces a stable _ei key order across encrypt calls (downstream
   signature-verification dependency).

3. Null-vs-missing distinction - an explicit null SensitiveStr round-trips
   as null; a missing SensitiveStr stays missing (not invented as null) -
   verified across all four (encrypt-adapter x decrypt-adapter) combinations.

4. Large-data scaling - 256 KB SensitiveStr and 5,000-entry SensitiveDict
   round-trip intact across all four combinations.

26 tests total (NET8 only). All pass on first run; no new pre-existing
master bug surfaced this round. Two known limitations tracked previously
(commits 0f700e0 / 30c5eef) remain the only [Ignore]-flagged regressions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…n on Stream-opt-in decrypt

Dispatched 5 GPT-5.5 audit subagents in parallel to find further inconsistencies
in the Stream-opt-in decrypt path (concurrency, diagnostics, algorithm routing,
buffer lifecycle, version-skew). Two of the audits surfaced PR-introduced bugs
that this commit fixes; the audits also documented two pre-existing master
adapter divergences that are now tracked.

Two PR-introduced bug fixes (EncryptionProcessor.cs, MdeEncryptionProcessor.cs)

1. ArrayPool buffer return without clearArray:true (hygiene/security).
   The detector buffer rented from ArrayPool<byte>.Shared for non-MemoryStream
   inputs was returned without clearing. Every other pool consumer in the
   package (ArrayPoolManager:33, JsonArrayPool:25, RentArrayBufferWriter:149,
   RentArrayBufferWriter:204) uses clearArray:true. The detector reads
   plaintext document bytes including encrypted payloads, so the previous
   return would have left those bytes visible to subsequent tenants of the
   shared pool. Fix: pass clearArray:true at the Return call site, with a
   comment citing the package-wide convention.

2. RequestOptions.Properties mutation under await (concurrency).
   The Stream-opt-in legacy-fallthrough temporarily reassigned
   requestOptions.Properties to a stripped copy before awaiting the
   Newtonsoft-routed MDE decrypt, then restored in finally. Under
   concurrency, another decrypt call sharing the same ItemRequestOptions
   could observe the stripped state during the await window and route to
   the wrong adapter. Fix: added internal MdeEncryptionProcessor.DecryptAsync
   overload that takes a JsonProcessor argument directly, and call it with
   JsonProcessor.Newtonsoft from the fallthrough site instead of mutating
   caller-owned state. The public RequestOptions-taking overload now
   delegates to the new internal one after calling GetRequestedJsonProcessor.

New audit-findings test file
(CrossAdapterAuditFindingsTests.cs, NET8 only, 15 tests)

Sections:
- Concurrency (2 tests): RequestOptions.Properties postcondition after legacy
  decrypt + 32-parallel stress on shared RequestOptions instance.
- Output-shape parity (1 DataRow-driven test, 2 rows): adapters agree on
  output stream concrete type + Length for MDE and unencrypted inputs.
- Wire-format adversarial (4 DataRow-driven tests, 9 rows total): _ef=999
  unknown future format, _ef as non-integer / missing values
  ([bad, true, null, missing]), unknown extra fields under _ei, _ep
  referencing paths missing from the document.
- Pre-existing master divergences ([Ignore], 2 tests): see below.

Two newly-tracked pre-existing master adapter divergences
(both predate this PR and live in the adapters themselves, not the detector;
verified on origin/master at commit 79d18b7 via worktree probe)

3. _ei._ef as JSON-numeric-string ("3") vs JSON number (3).
   Newtonsoft decrypt silently coerces "3" to int 3 via JsonSerializer and
   decrypts. Stream decrypt strictly rejects the type mismatch and throws
   System.Text.Json.JsonException. Captured as
   WireFormat_EfAsStringTypedThree_AdapterDivergence_KnownLimitation.

4. Input stream not disposed on successful MDE decrypt via Stream adapter.
   NewtonsoftAdapter.DecryptAsync disposes the input stream before returning.
   SystemTextJsonStreamAdapter.DecryptAsync does not. Captured as
   StreamOptIn_MdeDecrypt_InputDisposalMismatchWithNewtonsoft_KnownLimitation.

Both [Ignore] reasons cite master commit 79d18b7.

Changelog (changelog.md)

Unreleased section expanded with: this PR's two fixes (with documented
observable behaviour), and the two new pre-existing-divergence entries
alongside the existing exception-type contract and Stream-encrypt
limitations.

Verification

- src builds clean on net8.0 and netstandard2.0; 0 warnings / 0 errors.
- Full test suite on net8.0: 429 passed, 4 skipped (the 4 [Ignore]
  regression-trackers across CrossAdapterAuditFindingsTests,
  CrossAdapterEncryptDecryptParityTests, and CrossAdapterAdversarialParityTests),
  0 failed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…or chunked-stream regressions

Five production-source fixes plus regression-tracker tests, driven by a 5-agent GPT-5.5 audit
of PR Azure#5903 (encoding-edge-cases / numeric-fidelity / chunked-stream-parity / blob-roundtrip /
deserialization-strictness, dispatched in parallel against the worktree).

1. EncryptionProcessor.TryDetectAlgorithm: short-circuits to Unknown for any non-MemoryStream
   input. The original detector pre-buffered the entire payload synchronously into a rented
   ArrayPool buffer (up to int.MaxValue bytes) before classifying, defeating the streaming
   property of large FileStream/NetworkStream inputs and risking OOM on multi-MB documents.
   It also called .Position / .Length on streams that may not support them. The fast path
   remains for the common Cosmos SDK MemoryStream case; non-MemoryStream inputs route through
   the original JObject-peek path, which handles them via incremental StreamReader /
   JsonTextReader reads. Three new regression tests in CrossAdapterAuditFindingsTests cover
   non-MemoryStream MDE round-trip, non-MemoryStream legacy round-trip, and a metadata-
   throwing wrapper that proves the detector does not touch .Length / .Position.

2. EncryptionProperties.EncryptionFormatVersion: annotated with
   [JsonNumberHandling(JsonNumberHandling.AllowReadingFromString)] on net8.0 so the
   System.Text.Json deserializer accepts _ei._ef = "3" (numeric string) identically to
   Newtonsoft's permissive coercion. Documents written with the string form now decrypt on
   both adapters.

3. SystemTextJsonStreamAdapter.DecryptAsync(Stream, Encryptor, ...): now disposes the
   caller's input stream on successful decrypt, matching NewtonsoftAdapter.DecryptAsync's
   documented stream-ownership contract.

4. StreamProcessor encrypt path: re-encodes JsonTokenType.PropertyName and
   JsonTokenType.String values through Utf8JsonWriter's canonical escape handling when
   reader.ValueIsEscaped is true. Previously the raw ValueSpan was copied verbatim, causing
   uXXXX escapes and escaped metacharacters in property names / top-level string values to
   be re-emitted as doubled escape sequences on the wire.

5. CrossAdapterAuditFindingsTests: 2 [Ignore] regression-trackers converted to positive
   assertions; 3 new chunked-stream tests added (non-MemoryStream MDE, non-MemoryStream
   legacy, ThrowOnMetadataAccessStream).

Changelog updated to document the 5 fixes and the remaining pre-existing master divergences
(non-finite numbers, BigInteger > Int64, UTF-8 BOM, lone surrogates, decimal exponent lexical
form) that audit subagents surfaced but that are out of scope for this PR.

Test results: 437 passed / 0 failed / 0 skipped on net8.0 (up from 429 passed + 4 [Ignore]).
src builds clean on both net8.0 and netstandard2.0 (0 warnings, 0 errors).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…known limitations

Fifth GPT-5.5 audit subagent (deserialization-strictness-aud) surfaced 6
asymmetries between the Newtonsoft and System.Text.Json adapters in how
they handle malformed/tampered `_ei` shapes. Triaged:

- Findings 4, 5 are masked at the public API by the JObject-fallback
  routing (detector returns Unknown, request routes through the
  Newtonsoft adapter), so they are not observable from real callers.
- Findings 1, 2, 3, 6 are reachable on the Stream-opt-in fast path
  because the detector routes valid-shape `_ea` strings directly to
  the streaming adapter.

None of the reachable findings is producible by `EncryptionProcessor.EncryptAsync`
(real encrypters always write canonical wire-format values), so no real
round-trip is affected. They only fire on tampered or hand-crafted
documents.

Closing each fully would require either tightening the Newtonsoft path
(breaking change for documents in production) or attaching a custom
JsonConverter that defeats the streaming optimisation. Both are out of
scope for this PR.

Documented in two places so callers and future maintainers can discover
the divergences without reading test source:

- changelog.md: extended the existing 'Known pre-existing JsonProcessor.Stream
  decrypt adapter divergences' section with five new bullets covering
  whitespace-padded `_ef`, non-string `_en`, non-string `_ep` entries,
  and duplicate `_ei` keys. Closing note clarifies real-encrypter output
  is unaffected.
- CrossAdapterAuditFindingsTests.cs: four `[Ignore]`-flagged regression-tracker
  tests with `[DataRow]` variants and detailed XML doc comments explaining
  why each divergence exists and why it is not fixed.

Test count: 437 passed, 4 skipped (new regression-trackers), 0 failed.
Source build clean on net8.0 and netstandard2.0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant