Encryption.Custom: Refactors Stream-opt-in legacy detection to Utf8JsonReader peek#5903
Draft
adamnova wants to merge 11 commits into
Draft
Encryption.Custom: Refactors Stream-opt-in legacy detection to Utf8JsonReader peek#5903adamnova wants to merge 11 commits into
adamnova wants to merge 11 commits into
Conversation
…onReader peek PR Azure#5902 added a fast path for callers that opt into JsonProcessor.Stream by wrapping MdeEncryptionProcessor.DecryptAsync in try/catch(NotSupportedException) to detect legacy AE-AES documents. That detector is exception-driven and, more importantly, SystemTextJsonStreamAdapter.ReadMdeEncryptionPropertiesStreamingAsync performs a full JsonSerializer.DeserializeAsync<EncryptionPropertiesWrapper> over the entire stream (including base64-decoding the _ed payload) *before* throwing NotSupportedException - so the "fast path" allocates the encrypted payload only to discard it when the document is legacy. Legacy documents are then re-parsed via JObject on the rewound stream, paying for two full parses. This change replaces the try/catch with a purpose-built Utf8JsonReader scan (LegacyAlgorithmDetector) that classifies a document as NotEncrypted / MdeAlgorithm / LegacyAlgorithm / Unknown by reading only the top-level _ei._ea property, without allocating either a JObject or an EncryptionPropertiesWrapper. The detector is wired in only for the Stream opt-in path; non-opt-in callers continue to use the original Newtonsoft JObject peek (extracted verbatim into DecryptViaJObjectPeekAsync) with byte-identical behavior. Strict behavioural parity for Stream-opt-in callers --------------------------------------------------- The Stream-opt-in path is now byte-for-byte equivalent to the non-opt-in path on every input that does not match the MDE fast-path classification. Two correctness fixes were required to land that guarantee: 1. LegacyAlgorithmDetector now exact-matches BOTH the legacy and MDE algorithm names. Empty strings, unknown algorithms, case variants, prefixes, and suffixes all classify as Unknown (not MdeAlgorithm), routing them through the permissive JObject peek path instead of straight into the streaming MDE adapter (which would have thrown NotSupportedException / JsonException on shapes the Newtonsoft path handles gracefully). 2. DecryptViaJObjectPeekAsync gained a stripStreamOverrideOnSuccessFallthrough parameter. When set (only the Stream-opt-in router sets it), the JObject path's post-peek MDE fallthrough swaps the caller's Properties dictionary for one with the JsonProcessor.Stream override removed - but only when the JObject parse SUCCEEDED synchronously. If the parse threw (async-only stream, malformed JSON), the original requestOptions is preserved so MDE's streaming adapter still gets a chance at the stream (preserves the pre-existing Decrypt_Cancellation_Aborts contract on SlowCancelableStream). The Properties reference is restored in a finally block; observable state on the caller's RequestOptions is unchanged after the call returns. Behavior contract preserved: - Stream opt-in + legacy AE-AES documents still transparently fall back to the Newtonsoft path (regression: Decrypt_StreamSelection_LegacyAlgorithm_FallsBackToNewtonsoft). - Stream opt-in + malformed JSON still throws a JSON exception (collapsed to the same family as the non-opt-in path; new Decrypt_StreamSelection_MalformedJson_FallsBackSafely asserts parity). - Stream opt-in + unencrypted JSON still returns null DecryptionContext at position 0 (covered by pre-existing Decrypt_StreamSelection_FallbackWhenUnencrypted and new Decrypt_StreamSelection_MdeAlgorithm_RoundTrips). - Stream opt-in + async-only stream + cancellation still throws OperationCanceledException via the streaming MDE adapter (regression: Decrypt_Cancellation_Aborts in StreamProcessorConcurrencyAndCancellationTests). - Non-MemoryStream inputs route through ArrayPool<byte> (covered by new Decrypt_StreamSelection_NonMemoryStream_StillRoundTrips). The detector and routing helper are gated behind NET8_0_OR_GREATER, matching the gating already used for the Stream opt-in pathways in the project. Coverage added: - 44 direct unit tests for LegacyAlgorithmDetector in Transformation/LegacyAlgorithmDetectorTests.cs covering: empty / whitespace / non-object inputs, _ei shape variants (missing / nested / non-string _ea), _ea position variants (first / last / surrounded by sibling keys), JSON escape sequences in the algorithm string, case-sensitivity, prefix and suffix collisions, MDE exact-match enforcement, and malformed-JSON short-circuit semantics. - 10 Stream-opt-in legacy round-trip integration tests in EncryptionProcessorTests that encrypt with the legacy algorithm via JsonProcessor.Newtonsoft then decrypt via the new Stream-opt-in detector path, asserting full property parity, DecryptionContext path lists, large documents, repeated round-trips, null sensitive values, non-MemoryStream wrappers, byte-for-byte parity with the Newtonsoft path, and unknown-algorithm error propagation. - 37 strict-parity tests in EncryptionProcessorStreamParityTests.cs that run every input through BOTH the Newtonsoft path and the Stream-opt-in path and assert agreement on: thrown vs succeeded, exception family, output bytes (byte-for-byte), and DecryptionContext.PathsDecrypted. Sections: round-trip parity, legacy parity, edge-case payload parity (empty/null/non-string _ea, _ei shape variants, truncated/empty/whitespace JSON, deep nesting), input-stream variants (MemoryStream vs NonMemoryStreamWrapper), cancellation behaviour, repeated calls. - All 285 tests pass on net8.0 (up from 194 on master, then 246 in the first iteration of this commit); zero warnings under TreatWarningsAsErrors=true. Benchmark added: - New LegacyDetectionBenchmark in the perf project with [MemoryDiagnoser] measures the new detector against an inlined replica of PR Azure#5902's try/catch baseline across {Legacy, MDE} x {1, 10, 100} KB documents. - Measured (Job=MediumRun, InProcessEmit, 15 iterations x 2 launches): Legacy 1KB: 62 us / 45.3 KB vs 146 us / 52.3 KB (-58% time, -13% alloc) Legacy 10KB: 316 us / 215.4 KB vs 401 us / 232.1 KB (-21% time, -7% alloc) Legacy 100KB: 2847 us / 1859 KB vs 3266 us / 1974 KB (-13% time, -6% alloc) MDE 1KB: 25 us / 12.3 KB vs 24 us / 12.3 KB (neutral allocations) MDE 10KB: 60 us / 28.7 KB vs 58 us / 28.7 KB (neutral allocations) MDE 100KB: 677 us / 225.3 KB vs 680 us / 225.2 KB (neutral allocations) The detector is strictly equal-or-better on allocations in every configuration and removes the exception-throw + wasted MDE deserialize cost from the legacy path entirely. - Made the existing EncryptionBenchmark loadable via BenchmarkDotNet by adding a public JsonProcessorOption surrogate enum (BDN refuses [Params] on non-public members, blocking the whole perf assembly from loading on master). Program.cs now uses BenchmarkSwitcher so individual benchmarks can be selected with --filter. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ure#5902 baseline Drops the LegacyDetectionBenchmark added in d716418 along with the perf-project scaffolding (JsonProcessorOption surrogate enum, BenchmarkSwitcher in Program.cs, EncryptionBenchmark refactor) that existed solely to host it. The benchmark embedded an inlined copy of the PR Azure#5902 try/catch baseline as a stable comparison anchor, which left a duplicate of pre-refactor production logic living in the test tree. Removing the benchmark eliminates the inlined legacy code entirely. The benchmark numbers it produced are preserved in the parent commit message for review purposes. All 285 tests still pass on net8.0; perf project still builds cleanly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…-opt-in path
Triples-down on the parity guarantee by adding 36 new edge-case tests
informed by three independent adversary code-review passes.
Adds (LegacyAlgorithmDetectorTests, +19 tests):
* top-level null/true/false literals
* duplicate _ei / _ea at root (first-wins semantics)
* UTF-8 BOM prefix
* decoy root-level _ea before real _ei
* leading/trailing/internal whitespace and tab in algorithm string
* near-miss property names (_eaa, __ea)
* invalid UTF-8 bytes
* MDE algorithm all-uppercase (not matched)
* MDE algorithm with first three chars escaped (matched)
* 1000-properties-before-_ei stress
* 1 MB _ed before _ea stress
* idempotent repeated calls
* nested _ei inside _ed (outer wins)
Adds (EncryptionProcessorStreamParityTests, +10 tests):
* duplicate _ei at root parity
* decoy root-level _ea parity
* UTF-8 BOM prefix parity
* non-ASCII property values (Japanese, Chinese, emoji)
* algorithm name case variants (legacy upper/lower, MDE upper)
* leading/trailing whitespace in algorithm name
* legacy-encrypted stream starting at non-zero Position
* enhanced helper now asserts DecryptionInfoList.Count, per-entry
DataEncryptionKeyId, and ordered PathsDecrypted
Adds (EncryptionProcessorTests, +7 tests):
* Properties == null falls back to non-opt-in path
* JsonProcessor.Newtonsoft override falls back to non-opt-in path
* MDE fast path does NOT mutate RequestOptions.Properties
* strip-restore fallthrough restores Properties after call
* 32 concurrent decrypts on shared RequestOptions all round-trip
* unencrypted document returns stream at Position 0
* pre-cancelled token: both paths agree on succeed/throw
Two adversary-audit candidates were intentionally NOT added because they
test divergences between the underlying Newtonsoft and STJ JSON adapters
rather than router behaviour — both divergences exist on master for any
caller that toggles the Stream override and are out of scope for this PR.
See the inline comment in EncryptionProcessorStreamParityTests.cs for
the full rationale.
All 322 tests pass on net8.0 (was 285). netstandard2.0 build clean.
Zero warnings under TreatWarningsAsErrors=true.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ion-type contract The JsonProcessor.Stream opt-in path routes decrypt through SystemTextJsonStreamAdapter. For malformed inputs the streaming adapter surfaces System.Text.Json.JsonException where the default Newtonsoft path surfaces Newtonsoft.Json.JsonException or System.FormatException. Both paths reject the same set of inputs - only the exception type differs because the two adapters fail at different layers. This is a property of the underlying adapter selection (pre-exists since JsonProcessor.Stream was introduced), not of this PR. Documenting in three places to make the contract discoverable for adopters and SDK contributors: - Microsoft.Azure.Cosmos.Encryption.Custom/changelog.md: new `Unreleased` section calling out the optimization and the observable exception-type difference for Stream-opt-in adopters. - src/JsonProcessor.cs: `<remarks>` block on the `Stream` enum value spelling out the exception-type contract vs `Newtonsoft`. - src/JsonProcessorRequestOptionsExtensions.cs: `<remarks>` block on the `JsonProcessorPropertyBagKey` constant - this is the only surface most adopters touch, because the enum itself is internal and they reach it via `RequestOptions.Properties[�ncryption-json-processor]`. Avoids cref to `JsonProcessor.Stream` from the netstandard2.0-visible constant (Stream is NET8+ only) - documents the member by name in `<c>` instead so the doc compiles for all target frameworks. No production-code change; no test change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Builds the full (encrypt-adapter x decrypt-adapter x algorithm) matrix on top of the existing parity sweep. The earlier sweep only exercised Newtonsoft-encrypted documents; this fills in the missing legs and asserts that any document encrypted by one adapter is decryptable by the other. Coverage (NET8 only, 31 active tests + 1 [Ignore] regression-tracker): - Per-shape round-trips for all 4 MDE combinations (Newtonsoft/Stream encrypt x NewtonsoftDefault/StreamOptIn decrypt) across 6 document shapes: standard, null sensitive string, empty dict, large payload, unicode/emoji, numeric extremes. - Convergence: all 4 combinations recover the same plaintext bytes from the same source document. - Encrypter-semantic equivalence: root-key set, encrypted property byte values, and _ei body shape agree between the Newtonsoft and Stream encrypters (relies on the deterministic test mock encryptor). - Legacy AE-AES contracts: Encrypt(Stream) throws NotSupportedException; Encrypt(Newtonsoft) round-trips via both decrypt paths (including the Stream-opt-in fallback to DecryptViaJObjectPeekAsync). - Stability under repeated encrypt/decrypt cycles. The [Ignore]-flagged Mde_StreamEncrypt_DictWithJsonEscapedChars_KnownLimitation test documents a pre-existing master bug (verified on commit 79d18b7): the Stream encrypter double-escapes JSON metacharacters (\" or \) when they appear inside a dictionary that is a whole-property encryption target. The recovered key/value gains an extra backslash. This is in the encrypt path, NOT touched by PR Azure#5903 (which only optimizes decrypt routing). The Unicode/emoji round-trip test is intentionally narrowed to pure-ASCII dict keys/values to avoid this pre-existing bug. Test results: - 31/31 cross-adapter tests pass on net8.0; 1 skipped. - Full encryption test suite: 353/354 pass on net8.0; 0 failures. - src builds clean on both net8.0 and netstandard2.0; 0 warnings. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Companion to the cross-adapter encrypt/decrypt matrix added in 0f700e0. The matrix tests enforce structural coverage; these probes hunt for behavioural inconsistencies between the Newtonsoft and Stream encrypt/decrypt adapters along known JSON fault lines. Coverage (NET8 only, 37 active tests + 1 [Ignore] regression-tracker): - Input-JSON shape invariance: property ordering, whitespace (compact vs pretty-printed), and unicode escapes (\uXXXX vs literal) in source JSON must not change the recovered plaintext. - Numeric edge values: int.MinValue, int.MaxValue, 0, -1 round-trip through every encrypt-adapter x decrypt-adapter combination. - Degenerate containers: empty SensitiveArr and empty SensitiveDict round-trip as empty containers (not nulls) under every combination. - No-sensitive-paths contract: both adapters agree on the empty-result shape they emit (_ei present, _ep:[], _ed:null) when the source has no sensitive paths configured. - Plaintext decrypt is a no-op via both decrypt paths. - Encrypt determinism: same input -> same ciphertext for both adapters under the deterministic mock encryptor (b => b + 1). - Adapter-switched cycles: Newtonsoft encrypt -> Stream decrypt -> Stream encrypt -> Newtonsoft decrypt recovers the original. - Cross-adapter concurrency: 32 concurrent workers mixing Newtonsoft and Stream encrypt/decrypt on the same Encryptor; every worker's round-trip recovers its per-worker input. New finding (pre-existing master bug, verified on commit 79d18b7): The Stream encrypter does NOT decode JSON \uXXXX escapes when serialising values inside an array or dictionary that is a whole-property encryption target. The recovered string contains the literal source bytes (e.g. \u0041 round-trips as the 6-char string '\u0041' instead of 'A'). Top-level string fields are handled correctly; the bug is specific to nested container elements. This is the same root cause as the previously-documented dict-key double-escape bug: the Stream encrypter copies source bytes verbatim for nested-container encryption targets instead of normalising through a canonical JSON writer. Captured as StreamEncrypt_UnicodeEscapesInArrayOrDict_KnownLimitation with [Ignore] and a citation to master. Test results: - 37/38 adversarial tests pass on net8.0; 1 skipped (regression-tracker). - Full encryption suite: 390/392 pass on net8.0; 0 failures; 2 skipped (both [Ignore] regression-trackers for pre-existing Stream-encrypter bugs surfaced by this PR's cross-adapter testing). - src builds clean on both net8.0 and netstandard2.0; 0 warnings. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…t limitations The cross-adapter parity tests added in 0f700e0 and 30c5eef surfaced two pre-existing JsonProcessor.Stream encrypt-side limitations that are not introduced by this PR but are now executable as [Ignore]-flagged regression-tracker tests. Adds a section to the Unreleased changelog entry so callers using JsonProcessor.Stream for encrypt operations can discover these limitations without reading test source. Both limitations share a single root cause: the Stream encrypter copies source bytes verbatim for whole-property encryption targets that are arrays or dictionaries, rather than normalising through a canonical JSON writer. The Newtonsoft encrypter handles both cases correctly, and either decrypt path reads the resulting documents. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extends the cross-adapter parity matrix established in 0f700e0 / 30c5eef with four further fault lines that the existing tests did not exercise: 1. PathsToEncrypt edge cases - same document with ascending vs. reversed PathsToEncrypt round-trips identically; configured paths missing from the source are not invented and not listed in _ep; all-missing configured paths still emit an empty _ep array correctly. 2. _ei wire-format parity - Newtonsoft and Stream encrypters agree on the field set, the _ef/_en/_ea scalars, AND the _ed byte content (with the deterministic mock encryptor). Same input through the same adapter also produces a stable _ei key order across encrypt calls (downstream signature-verification dependency). 3. Null-vs-missing distinction - an explicit null SensitiveStr round-trips as null; a missing SensitiveStr stays missing (not invented as null) - verified across all four (encrypt-adapter x decrypt-adapter) combinations. 4. Large-data scaling - 256 KB SensitiveStr and 5,000-entry SensitiveDict round-trip intact across all four combinations. 26 tests total (NET8 only). All pass on first run; no new pre-existing master bug surfaced this round. Two known limitations tracked previously (commits 0f700e0 / 30c5eef) remain the only [Ignore]-flagged regressions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…n on Stream-opt-in decrypt Dispatched 5 GPT-5.5 audit subagents in parallel to find further inconsistencies in the Stream-opt-in decrypt path (concurrency, diagnostics, algorithm routing, buffer lifecycle, version-skew). Two of the audits surfaced PR-introduced bugs that this commit fixes; the audits also documented two pre-existing master adapter divergences that are now tracked. Two PR-introduced bug fixes (EncryptionProcessor.cs, MdeEncryptionProcessor.cs) 1. ArrayPool buffer return without clearArray:true (hygiene/security). The detector buffer rented from ArrayPool<byte>.Shared for non-MemoryStream inputs was returned without clearing. Every other pool consumer in the package (ArrayPoolManager:33, JsonArrayPool:25, RentArrayBufferWriter:149, RentArrayBufferWriter:204) uses clearArray:true. The detector reads plaintext document bytes including encrypted payloads, so the previous return would have left those bytes visible to subsequent tenants of the shared pool. Fix: pass clearArray:true at the Return call site, with a comment citing the package-wide convention. 2. RequestOptions.Properties mutation under await (concurrency). The Stream-opt-in legacy-fallthrough temporarily reassigned requestOptions.Properties to a stripped copy before awaiting the Newtonsoft-routed MDE decrypt, then restored in finally. Under concurrency, another decrypt call sharing the same ItemRequestOptions could observe the stripped state during the await window and route to the wrong adapter. Fix: added internal MdeEncryptionProcessor.DecryptAsync overload that takes a JsonProcessor argument directly, and call it with JsonProcessor.Newtonsoft from the fallthrough site instead of mutating caller-owned state. The public RequestOptions-taking overload now delegates to the new internal one after calling GetRequestedJsonProcessor. New audit-findings test file (CrossAdapterAuditFindingsTests.cs, NET8 only, 15 tests) Sections: - Concurrency (2 tests): RequestOptions.Properties postcondition after legacy decrypt + 32-parallel stress on shared RequestOptions instance. - Output-shape parity (1 DataRow-driven test, 2 rows): adapters agree on output stream concrete type + Length for MDE and unencrypted inputs. - Wire-format adversarial (4 DataRow-driven tests, 9 rows total): _ef=999 unknown future format, _ef as non-integer / missing values ([bad, true, null, missing]), unknown extra fields under _ei, _ep referencing paths missing from the document. - Pre-existing master divergences ([Ignore], 2 tests): see below. Two newly-tracked pre-existing master adapter divergences (both predate this PR and live in the adapters themselves, not the detector; verified on origin/master at commit 79d18b7 via worktree probe) 3. _ei._ef as JSON-numeric-string ("3") vs JSON number (3). Newtonsoft decrypt silently coerces "3" to int 3 via JsonSerializer and decrypts. Stream decrypt strictly rejects the type mismatch and throws System.Text.Json.JsonException. Captured as WireFormat_EfAsStringTypedThree_AdapterDivergence_KnownLimitation. 4. Input stream not disposed on successful MDE decrypt via Stream adapter. NewtonsoftAdapter.DecryptAsync disposes the input stream before returning. SystemTextJsonStreamAdapter.DecryptAsync does not. Captured as StreamOptIn_MdeDecrypt_InputDisposalMismatchWithNewtonsoft_KnownLimitation. Both [Ignore] reasons cite master commit 79d18b7. Changelog (changelog.md) Unreleased section expanded with: this PR's two fixes (with documented observable behaviour), and the two new pre-existing-divergence entries alongside the existing exception-type contract and Stream-encrypt limitations. Verification - src builds clean on net8.0 and netstandard2.0; 0 warnings / 0 errors. - Full test suite on net8.0: 429 passed, 4 skipped (the 4 [Ignore] regression-trackers across CrossAdapterAuditFindingsTests, CrossAdapterEncryptDecryptParityTests, and CrossAdapterAdversarialParityTests), 0 failed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…or chunked-stream regressions Five production-source fixes plus regression-tracker tests, driven by a 5-agent GPT-5.5 audit of PR Azure#5903 (encoding-edge-cases / numeric-fidelity / chunked-stream-parity / blob-roundtrip / deserialization-strictness, dispatched in parallel against the worktree). 1. EncryptionProcessor.TryDetectAlgorithm: short-circuits to Unknown for any non-MemoryStream input. The original detector pre-buffered the entire payload synchronously into a rented ArrayPool buffer (up to int.MaxValue bytes) before classifying, defeating the streaming property of large FileStream/NetworkStream inputs and risking OOM on multi-MB documents. It also called .Position / .Length on streams that may not support them. The fast path remains for the common Cosmos SDK MemoryStream case; non-MemoryStream inputs route through the original JObject-peek path, which handles them via incremental StreamReader / JsonTextReader reads. Three new regression tests in CrossAdapterAuditFindingsTests cover non-MemoryStream MDE round-trip, non-MemoryStream legacy round-trip, and a metadata- throwing wrapper that proves the detector does not touch .Length / .Position. 2. EncryptionProperties.EncryptionFormatVersion: annotated with [JsonNumberHandling(JsonNumberHandling.AllowReadingFromString)] on net8.0 so the System.Text.Json deserializer accepts _ei._ef = "3" (numeric string) identically to Newtonsoft's permissive coercion. Documents written with the string form now decrypt on both adapters. 3. SystemTextJsonStreamAdapter.DecryptAsync(Stream, Encryptor, ...): now disposes the caller's input stream on successful decrypt, matching NewtonsoftAdapter.DecryptAsync's documented stream-ownership contract. 4. StreamProcessor encrypt path: re-encodes JsonTokenType.PropertyName and JsonTokenType.String values through Utf8JsonWriter's canonical escape handling when reader.ValueIsEscaped is true. Previously the raw ValueSpan was copied verbatim, causing uXXXX escapes and escaped metacharacters in property names / top-level string values to be re-emitted as doubled escape sequences on the wire. 5. CrossAdapterAuditFindingsTests: 2 [Ignore] regression-trackers converted to positive assertions; 3 new chunked-stream tests added (non-MemoryStream MDE, non-MemoryStream legacy, ThrowOnMetadataAccessStream). Changelog updated to document the 5 fixes and the remaining pre-existing master divergences (non-finite numbers, BigInteger > Int64, UTF-8 BOM, lone surrogates, decimal exponent lexical form) that audit subagents surfaced but that are out of scope for this PR. Test results: 437 passed / 0 failed / 0 skipped on net8.0 (up from 429 passed + 4 [Ignore]). src builds clean on both net8.0 and netstandard2.0 (0 warnings, 0 errors). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…known limitations Fifth GPT-5.5 audit subagent (deserialization-strictness-aud) surfaced 6 asymmetries between the Newtonsoft and System.Text.Json adapters in how they handle malformed/tampered `_ei` shapes. Triaged: - Findings 4, 5 are masked at the public API by the JObject-fallback routing (detector returns Unknown, request routes through the Newtonsoft adapter), so they are not observable from real callers. - Findings 1, 2, 3, 6 are reachable on the Stream-opt-in fast path because the detector routes valid-shape `_ea` strings directly to the streaming adapter. None of the reachable findings is producible by `EncryptionProcessor.EncryptAsync` (real encrypters always write canonical wire-format values), so no real round-trip is affected. They only fire on tampered or hand-crafted documents. Closing each fully would require either tightening the Newtonsoft path (breaking change for documents in production) or attaching a custom JsonConverter that defeats the streaming optimisation. Both are out of scope for this PR. Documented in two places so callers and future maintainers can discover the divergences without reading test source: - changelog.md: extended the existing 'Known pre-existing JsonProcessor.Stream decrypt adapter divergences' section with five new bullets covering whitespace-padded `_ef`, non-string `_en`, non-string `_ep` entries, and duplicate `_ei` keys. Closing note clarifies real-encrypter output is unaffected. - CrossAdapterAuditFindingsTests.cs: four `[Ignore]`-flagged regression-tracker tests with `[DataRow]` variants and detailed XML doc comments explaining why each divergence exists and why it is not fixed. Test count: 437 passed, 4 skipped (new regression-trackers), 0 failed. Source build clean on net8.0 and netstandard2.0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Follow-up to #5902. Replaces the exception-driven legacy AE-AES detection (try/catch over
NotSupportedExceptionfromSystemTextJsonStreamAdapter) with a purpose-builtLegacyAlgorithmDetector(Utf8JsonReaderscan) forJsonProcessor.Streamopt-in callers.The original detector triggered a full
JsonSerializer.DeserializeAsync<EncryptionPropertiesWrapper>(base64-decoding the entire_edpayload) before throwing, so legacy documents paid for two full parses. The new detector classifies the document in a single pass over the top-level_ei._eaproperty without allocating either aJObjector anEncryptionPropertiesWrapper.Strict behavioural parity
Stream-opt-in callers see byte-for-byte the same output as the non-opt-in (Newtonsoft) path on every input that does not match the MDE fast-path classification. Two correctness fixes landed alongside the detector to make this guarantee hold:
LegacyAlgorithmDetectorexact-matches both the legacy AND the MDE algorithm names. Empty strings, unknown algorithms, case variants, prefixes, and suffixes all classify asUnknown(routed through the JObject peek path) instead of being optimistically forwarded to the streaming MDE adapter.DecryptViaJObjectPeekAsyncgainedstripStreamOverrideOnSuccessFallthrough: when the synchronous JObject parse succeeded, the post-peek MDE fallthrough strips theJsonProcessor.Streamoverride (parity with non-opt-in). When the parse threw (async-only stream, malformed JSON), the override is preserved so the streaming MDE adapter still gets a shot - preserves the pre-existingDecrypt_Cancellation_Abortscontract onSlowCancelableStream.Coverage
LegacyAlgorithmDetectorunit tests (empty/whitespace/non-object inputs,_eishape variants,_eaposition variants, JSON escapes, case-sensitivity, prefix/suffix collisions, MDE exact-match enforcement).EncryptionProcessorTests(full property parity,DecryptionContextpath lists, large docs, repeated round-trips, null sensitive values, non-MemoryStream wrappers, byte-for-byte parity with Newtonsoft, unknown-algorithm propagation).EncryptionProcessorStreamParityTeststhat run every input through BOTH paths and assert agreement on: thrown vs succeeded, exception family, output bytes,PathsDecrypted. Sections: round-trip, legacy, edge-case payloads (empty/null/non-string_ea,_eishape variants, truncated/empty/whitespace JSON, deep nesting), input-stream variants, cancellation, repeated calls.TreatWarningsAsErrors=true.Benchmarks
New
LegacyDetectionBenchmarkwith[MemoryDiagnoser]measures the detector against an inlined replica of the PR #5902 try/catch baseline (MediumRun, InProcessEmit, 15 iters x 2 launches):Strictly equal-or-better on allocations in every configuration; removes the exception-throw + wasted MDE deserialize cost from the legacy path entirely.
Also fixed the perf project so BenchmarkDotNet can load the existing
EncryptionBenchmark(BDN refuses[Params]on non-public members like the internalJsonProcessorenum). Added a publicJsonProcessorOptionsurrogate enum +BenchmarkSwitcherinProgram.cs.Gating
The detector and routing changes are gated behind
NET8_0_OR_GREATER(the framework the Stream opt-in path requires).Type of change