perf: improve ManagedAuthenticatedEncryptor Decrypt() and Encrypt() flow #59424

DeagleGross · 2024-12-10T22:59:01Z

The goal of this PR is to bring Linux performance closer to Windows performance for the DataProtection scenario. Below is a picture of the Antiforgery benchmarks on Windows vs. Linux machines.

Results: DataProtection Benchmark

The benchmark I am relying on for the result numbers builds a default ServiceProvider, adds DataProtection via .AddDataProtection(), and calls IDataProtector.Protect() or IDataProtector.Unprotect().

example crank run:

crank --config https://raw.githubusercontent.com/DeagleGross/Baraholka/refs/heads/main/Benchmarks/Benchmarks/crank/data-protector.yml --scenario dataprotector --profile aspnet-perf-lin --application.framework net10.0 --application.runtimeVersion "10.0.0-preview.3.25156.13" --application.options.outputFiles "D:\code\aspnetcore\artifacts\bin\Microsoft.AspNetCore.DataProtection\Release\net10.0\.dll" --application.options.outputFiles "D:\code\aspnetcore\artifacts\bin\Microsoft.AspNetCore.DataProtection.Abstractions\Release\net10.0\.dll" --application.variables.filterArg "DataProtectionBenchmarks"

net	Code	Method	Mean	Error	StdDev	Gen0	Allocated	Diff (% µs)
net10	main branch	Unprotect	9.706 µs	0.0143 µs	0.0134 µs	0.1068	3.34 KB	-
net10	changed	Unprotect	7.033 µs	0.0104 µs	0.0098 µs	0.2518	1.57 KB	38.5%
net10	main branch	Protect	10.88 µs	0.015 µs	0.014 µs	0.6256	3.91 KB	-
net10	changed	Protect	7.888 µs	0.0141 µs	0.0132 µs	0.3357	2.15 KB	37.93%

Results: Antiforgery Benchmark

However, since the original aim is to improve Antiforgery performance on Linux, I also ran the Antiforgery benchmark with locally built DLLs from this PR.

example crank run:

crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/refs/heads/main/scenarios/antiforgery.benchmarks.yml --scenario antiforgery-validation --profile aspnet-perf-lin --application.framework net10.0 --application.runtimeVersion "10.0.0-preview.3.25156.13" --application.options.outputFiles "D:\code\aspnetcore\artifacts\bin\Microsoft.AspNetCore.DataProtection\Release\net10.0\.dll" --application.options.outputFiles "D:\code\aspnetcore\artifacts\bin\Microsoft.AspNetCore.DataProtection.Abstractions\Release\net10.0\.dll"

Current ASP.NET Core gives these stats on the benchmark (average of 4 runs):

Stat	Value
Requests/sec	111,771
Max Working Set (MB)	141.5
Max Private Memory (MB)	491

The RPS with the changed DLLs varies from run to run, so I ran the benchmark 10 times:

#	Stat	Value
1	Requests/sec	163,286
2	Requests/sec	163,648
3	Requests/sec	164,287
4	Requests/sec	162,611
5	Requests/sec	163,820
6	Requests/sec	164,098
7	Requests/sec	163,480
8	Requests/sec	162,668
9	Requests/sec	163,302
10	Requests/sec	161,953
Mean: 163,315.3	Mean Diff %: 46.12%

Memory usage, however, is stable at these values:

Stat	Value	diff
Max Working Set (MB)	127	9.3%
Max Private Memory (MB)	442	10%

This is evidence of a ~10% reduction in maximum app memory and a ~46% RPS improvement.

Optimization details

Further improvement is possible once new APIs are introduced in dotnet/runtime: dotnet/runtime#111154

I looked into the Unprotect method of ManagedAuthenticatedEncryptor and spotted MemoryStream usage and multiple Buffer.BlockCopy calls. I also saw some shuffling of byte[] data, which I think can be restructured so that some allocations are skipped.

To be as safe as possible, I created a separate DataProtectionPool, which provides an API to rent and return byte arrays. It does not share buffers with ArrayPool<byte>.Shared.
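The DataProtectionPool implementation is not shown in this description. As a minimal sketch of the idea only, assuming a private ArrayPool<byte> instance created with ArrayPool<byte>.Create(), a dedicated pool that clears buffers on return could look like this (DedicatedPoolSketch and its members are hypothetical names, not the PR's code):

```csharp
using System;
using System.Buffers;

// Hypothetical sketch: a pool instance that is separate from ArrayPool<byte>.Shared.
public static class DedicatedPoolSketch
{
    // ArrayPool<byte>.Create() returns a new pool, so buffers rented here
    // are never handed out to callers of ArrayPool<byte>.Shared.
    private static readonly ArrayPool<byte> s_pool = ArrayPool<byte>.Create();

    // The returned array may be longer than minimumLength.
    public static byte[] Rent(int minimumLength) => s_pool.Rent(minimumLength);

    // clearArray: true zeroes the buffer before it goes back into the pool,
    // so sensitive bytes do not survive until the next rental.
    public static void Return(byte[] buffer) => s_pool.Return(buffer, clearArray: true);
}
```

Callers must slice the rented array to the length they need, since Rent only guarantees a minimum size.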

1. ManagedSP800_108_CTR_HMACSHA512.DeriveKeys now calls ManagedSP800_108_CTR_HMACSHA512.DeriveKeysHMACSHA512 explicitly, because _kdkPrfFactory is hardcoded to use HMACSHA512 anyway. There is a static API that hashes without allocating a kdk byte[], which is instead rented from the buffer: HMACSHA512.TryHashData(kdk, prfInput, prfOutput, out _);
2. Avoided DeriveKeysWithContextHeader, which allocates a separate intermediate array for contextHeader and context. Instead, the operationSubkey and validationSubkey spans are passed directly into ManagedSP800_108_CTR_HMACSHA512.DeriveKeys.
3. ManagedSP800_108_CTR_HMACSHA512.DeriveKeysHMACSHA512 had 2 more arrays (prfInput and prfOutput), which I now rent (via DataProtectionPool) or even stackalloc. They are returned to the pool with the clearArray: true flag so that key material is removed from memory after use.
4. In the Decrypt() flow I again use the HashAlgorithm.TryComputeHash overload, which works on Span<byte>, instead of the previously used HashAlgorithm.ComputeHash.
5. In the Decrypt() flow, switched to SymmetricAlgorithm.DecryptCbc() instead of CryptoTransform.TransformBlock(), with the same idea: use a Span<byte> API instead of another byte[] allocation.
6. The Encrypt() flow reuses optimizations №1, №2 and №3 as well.
7. Encrypt() previously relied on MemoryStream and CryptoStream to write data into the result buffer. I now pre-calculate the length and do a single allocation of the result array: var outputArray = new byte[keyModifierLength + ivLength + cipherTextLength + macLength]; All required data is copied into outputArray via APIs supporting Span<byte>.

All listed optimizations are included in the net10 TFM, but only some (№2, №3 and №6) are used in the netstandard2.0 and netFx TFMs, which DataProtection also targets.
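To illustrate the single-allocation idea from the list above, here is a minimal sketch that is not the PR's code. It assumes AES-CBC with HMACSHA512 in an encrypt-then-MAC layout and deliberately leaves out the key modifier and subkey derivation that DataProtection performs; SingleAllocationSketch and its members are hypothetical names. It also sets SymmetricAlgorithm.Key for brevity.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical sketch: output layout is iv || cipherText || mac, allocated once.
public static class SingleAllocationSketch
{
    private const int IvLength = 16;   // AES block size in bytes
    private const int MacLength = 64;  // HMACSHA512 output size in bytes

    public static byte[] Encrypt(byte[] encryptionKey, byte[] validationKey, ReadOnlySpan<byte> plaintext)
    {
        using var aes = Aes.Create();
        aes.Key = encryptionKey; // simplification: the setter copies the key

        // Pre-calculate every segment length so the result array is allocated exactly once.
        int cipherTextLength = aes.GetCiphertextLengthCbc(plaintext.Length, PaddingMode.PKCS7);
        var output = new byte[IvLength + cipherTextLength + MacLength];

        Span<byte> iv = output.AsSpan(0, IvLength);
        Span<byte> cipherText = output.AsSpan(IvLength, cipherTextLength);
        Span<byte> mac = output.AsSpan(IvLength + cipherTextLength, MacLength);

        RandomNumberGenerator.Fill(iv);
        aes.EncryptCbc(plaintext, iv, cipherText, PaddingMode.PKCS7);

        // The MAC covers iv || cipherText, which are contiguous in the output array.
        HMACSHA512.TryHashData(validationKey, output.AsSpan(0, IvLength + cipherTextLength), mac, out _);
        return output;
    }

    public static byte[] Decrypt(byte[] encryptionKey, byte[] validationKey, ReadOnlySpan<byte> protectedData)
    {
        if (protectedData.Length < IvLength + MacLength)
        {
            throw new CryptographicException("Payload too short.");
        }

        int cipherTextLength = protectedData.Length - IvLength - MacLength;
        ReadOnlySpan<byte> iv = protectedData.Slice(0, IvLength);
        ReadOnlySpan<byte> cipherText = protectedData.Slice(IvLength, cipherTextLength);
        ReadOnlySpan<byte> mac = protectedData.Slice(IvLength + cipherTextLength);

        // Recompute the MAC into a stack buffer and compare in constant time.
        Span<byte> expectedMac = stackalloc byte[MacLength];
        HMACSHA512.TryHashData(validationKey, protectedData.Slice(0, IvLength + cipherTextLength), expectedMac, out _);
        if (!CryptographicOperations.FixedTimeEquals(expectedMac, mac))
        {
            throw new CryptographicException("MAC validation failed.");
        }

        using var aes = Aes.Create();
        aes.Key = encryptionKey;
        return aes.DecryptCbc(cipherText, iv, PaddingMode.PKCS7);
    }
}
```

The only heap allocations in Encrypt are the result array and the Aes instance; no MemoryStream or CryptoStream is involved.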

Related to #59287

src/DataProtection/DataProtection/src/Managed/ManagedAuthenticatedEncryptor.cs

src/DataProtection/DataProtection/src/KeyManagement/KeyManagementOptions.cs

src/DataProtection/DataProtection/src/SP800_108/ManagedSP800_108_CTR_HMACSHA512.cs

GrabYourPitchforks · 2024-12-19T19:06:55Z

Code review notes

1. It's generally not advisable to pool buffers for sensitive cryptographic operations, such as those which perform key storage or manipulation. This tends to increase the attack surface of the application and should only be performed if it's absolutely required to meet some performance goal. If you absolutely must use pooled buffers, use different pooled buffers for key material specifically vs. (all other data). In dataprotection, key material would be the KDK, KEK, and individual decryption + validation keys.
2. This PR introduces a call to the SymmetricAlgorithm.Key property setter. The original code intentionally avoided calling this property setter because it duplicates the sensitive key material in such a way that the caller has no control over the new lifetime, and this undermines other protections present in the system (the use of the Secret type and the widespread use of pinning within the core crypto logic). Of course, the EG can always say that the new behavior is preferred over the old behavior, but there needs to be an explicit acknowledgement that there is a security tradeoff here. The tradeoff shouldn't be a mere side effect that is likely to go unnoticed.

   It's possible we can make changes to the underlying SymmetricAlgorithm type to improve the perf without reducing the security stance, but this would require new API to be exposed within corelib. Since you're targeting these changes for net10 that's probably acceptable? Work with Jeff's team to give them your requirements and they can add the work to the backlog.
3. The pattern if (len < CONST) { foo = stackalloc[len]; } else { foo = Rent(len).Slice(len); } is typically considered an antipattern. Prefer a pattern like if (len < CONST) { foo = stackalloc[CONST]; } else { foo = Rent(len); } foo = foo.Slice(len); instead. (Basically, the stackalloc should be a const, not variable-length.)

Scenario notes

Is the goal to improve the performance of the [real-world?] AntiForgery benchmark or to improve the performance of DataProtection in a standalone benchmark? The PR description (and attached graph) make it sound like improving the performance of the crank-based benchmark is the goal, but no throughput measurement is provided for the changes in this PR. Please provide that graph. It would supply evidence that these changes have real-world impact and aren't just microbenchmark improvements.

DeagleGross · 2024-12-20T13:10:09Z

Thanks for the detailed answer, @GrabYourPitchforks! Firstly, I ran the Antiforgery benchmark multiple times and provided the results in the PR description.

Re №3: I ran a BenchmarkDotNet comparison of stackalloc with dynamic vs. constant length (with and without [SkipLocalsInit]). Without [SkipLocalsInit] the dynamic length is even faster, but with [SkipLocalsInit] you are correct. I will probably enable [SkipLocalsInit] on the method, and then we can use a constant length. What do you think?

Re №2: Thanks for clarifying. I will create issues on dotnet/runtime explaining which APIs I would like to have so that DataProtection's flow doesn't use byte[] directly.

Re №1: Could you please describe how the attack surface of the application increases when pooled buffers are used? Does it mean that pooling is easier to inject into, via reflection for example? Actually, even if we don't pool byte arrays, if I work with corelib to introduce APIs supporting Span<byte>, we can choose to stackalloc or allocate a new array via new byte[], meaning we would never touch pooled buffers. Does that make sense, and is it more secure? I think most of the code does not use "big" arrays (say, longer than 128 bytes), so we would hit stackalloc in most cases.

Span<byte> arr = length <= 128
    ? stackalloc byte[128]
    : new byte[length];
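As a rough sketch of how the snippet above combines with the constant-size stackalloc advice from the review: the buffer is allocated with a constant stack size (or on the heap for larger inputs), then sliced to the real length, and zeroed after use. ScratchBufferSketch, HashWithScratch and the SHA256 workload are hypothetical and only there to give the buffer something to do; the threshold of 128 is taken from the snippet.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical sketch of the "constant stackalloc, then slice" pattern.
public static class ScratchBufferSketch
{
    private const int StackallocThreshold = 128;

    // Concatenates prefix and payload in a scratch buffer and hashes the result.
    public static byte[] HashWithScratch(ReadOnlySpan<byte> prefix, ReadOnlySpan<byte> payload)
    {
        int length = checked(prefix.Length + payload.Length);

        // The stackalloc size is a constant, never the variable length.
        Span<byte> scratch = length <= StackallocThreshold
            ? stackalloc byte[StackallocThreshold]
            : new byte[length];

        // Slice once, after the allocation decision, so both branches behave the same.
        scratch = scratch.Slice(0, length);
        try
        {
            prefix.CopyTo(scratch);
            payload.CopyTo(scratch.Slice(prefix.Length));
            return SHA256.HashData(scratch);
        }
        finally
        {
            // Wipe the scratch data whether it lived on the stack or the heap.
            CryptographicOperations.ZeroMemory(scratch);
        }
    }
}
```

No pooled buffer is touched on either branch, which matches the direction proposed in the reply to №1.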

src/DataProtection/DataProtection/src/SP800_108/ManagedSP800_108_CTR_HMACSHA512.cs

Copilot

Copilot reviewed 5 out of 15 changed files in this pull request and generated no comments.

Files not reviewed (10)

eng/Dependencies.props: Language not supported
eng/Version.Details.xml: Language not supported
eng/Versions.props: Language not supported
src/DataProtection/Cryptography.Internal/src/Microsoft.AspNetCore.Cryptography.Internal.csproj: Language not supported
src/DataProtection/DataProtection/test/Microsoft.AspNetCore.DataProtection.Tests/KeyManagement/KeyRingBasedDataProtectorTests.cs: Evaluated as low risk
src/DataProtection/DataProtection/src/Managed/AesGcmAuthenticatedEncryptor.cs: Evaluated as low risk
src/DataProtection/Cryptography.Internal/test/CryptoUtilTests.cs: Evaluated as low risk
src/DataProtection/DataProtection/test/Microsoft.AspNetCore.DataProtection.Tests/SP800_108/SP800_108Tests.cs: Evaluated as low risk
src/DataProtection/DataProtection/test/Microsoft.AspNetCore.DataProtection.Tests/SequentialGenRandom.cs: Evaluated as low risk
src/DataProtection/Cryptography.Internal/src/CryptoUtil.cs: Evaluated as low risk

adityamandaleeka · 2025-03-12T20:13:34Z

Just saw the latest update:

Which provide an evidence of ~10% of max app allocation size and ~46% RPS improvement.

👀 nice work!

src/DataProtection/DataProtection/src/SP800_108/ManagedSP800_108_CTR_HMACSHA512.cs

BrennanConroy · 2025-03-17T20:44:20Z

src/DataProtection/DataProtection/test/Microsoft.AspNetCore.DataProtection.Tests/E2ETests.cs

+public class E2ETests
+{
+    [Fact]
+    public void ProtectAndUnprotect_ForSampleAntiforgeryToken()


BrennanConroy: What is this test doing that's different from the current unit tests?

DeagleGross: Just another check that Antiforgery tokens specifically are parsed correctly. I thought it's better to check than not to. Let me know if it makes sense.

BrennanConroy: But you're just calling protect then unprotect on the value, so all you're really testing is that a random string is round-trippable, which I'm sure is already tested.

src/DataProtection/DataProtection/src/Managed/ManagedAuthenticatedEncryptor.cs

src/DataProtection/DataProtection/src/SP800_108/ManagedSP800_108_CTR_HMACSHA512.cs

BrennanConroy · 2025-03-18T17:36:52Z

src/DataProtection/DataProtection/src/Managed/ManagedAuthenticatedEncryptor.cs

DeagleGross · 2025-03-20T16:28:20Z

Just as a history note: @BrennanConroy noticed that the 111k RPS baseline was measured before the OpenSSL update, so we don't see a 45% RPS improvement, but >10%.

However, the allocation rate still drops significantly, even for a 15-second run:

cc @adityamandaleeka

DeagleGross added 4 commits December 10, 2024 00:04

a bit of improvement

e47b41d

another buffer.blockCopy removal

bb6d3c0

use DecryptCbc instead?

384c3d5

tests and remove another array

90bf754

DeagleGross self-assigned this Dec 10, 2024

ghost added the area-dataprotection Includes: DataProtection label Dec 10, 2024

This was referenced Dec 11, 2024

slow macOS - "##[error]The job running on agent Azure Pipelines 9 ran longer than the maximum time of 60 minutes." dotnet/dnceng#1883

Open

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

DeagleGross added 2 commits December 11, 2024 21:56

fix the slicing of IV

a37b0d5

slice correctly!

268ee44

mgravell reviewed Dec 13, 2024

DeagleGross added 5 commits December 16, 2024 12:28

try with encrypt

eb6cb1b

finish encrypt

d090882

use same overload in decrypt()

6ec76d3

dont allocate for prf-output as well

49eab48

address PR comments

4ba9cc2

DeagleGross changed the title ~~[DRAFT] perf: improve ManagedAuthenticatedEncryptor.Decrypt flow~~ [DRAFT] perf: improve ManagedAuthenticatedEncryptor Decrypt() and Encrypt() flow Dec 17, 2024

DeagleGross added 6 commits December 17, 2024 18:11

remove spaces

919d003

prfInput and prfOutput rented

87d118b

correctHash as rented

2524a15

more details

ec70b6e

use static TryHashData for specific hash implementation

0d30a5b

prettify

bab196a

DeagleGross marked this pull request as ready for review December 19, 2024 14:31

DeagleGross changed the title ~~[DRAFT] perf: improve ManagedAuthenticatedEncryptor Decrypt() and Encrypt() flow~~ perf: improve ManagedAuthenticatedEncryptor Decrypt() and Encrypt() flow Dec 19, 2024

DeagleGross requested a review from GrabYourPitchforks December 19, 2024 14:38

dotnet-policy-service bot added the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Dec 27, 2024

DeagleGross mentioned this pull request Jan 7, 2025

[API Proposal]: add possibility to pass key as Span<byte> in SymmetricAlgorithm and KeyedHashAlgorithm dotnet/runtime#111154

Closed

DeagleGross added 5 commits January 27, 2025 22:27

no pool at all

cf48cc8

remove leftover

0f3d1aa

oops stackoverflow

51bfe77

merge main

ab25e0f

correct merge

6f035d4

BrennanConroy reviewed Jan 30, 2025

dotnet-policy-service bot added the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Feb 6, 2025

danmoseley requested a review from Copilot February 14, 2025 04:00

Copilot AI reviewed Feb 14, 2025

DeagleGross added 5 commits March 7, 2025 14:37

merge main

22224dc

no-alloc setKey for algorithm

c8019fb

raise stackalloc to 256

d87aeaf

Revert "raise stackalloc to 256"

489c5cd

This reverts commit d87aeaf.

address ManagedSP800_108_CTR_HMACSHA512

26ee6ad

adityamandaleeka added the blog-candidate Consider mentioning this in the release blog post label Mar 12, 2025

Rick-Anderson approved these changes Mar 13, 2025

DeagleGross removed the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Mar 17, 2025

Merge branch 'main' into dmkorolev/dataprotection/lin-perf-2

9ce2b17

BrennanConroy reviewed Mar 17, 2025

DeagleGross added 3 commits March 18, 2025 13:55

merga main

f50c7fb

address PR comments

f45a948

merge main

66b2170

BrennanConroy approved these changes Mar 18, 2025

remove test + address nits

66cd115

BrennanConroy added the Perf label Mar 18, 2025

BrennanConroy merged commit f83c97f into dotnet:main Mar 18, 2025
27 checks passed

dotnet-policy-service bot added this to the 10.0-preview3 milestone Mar 18, 2025

DeagleGross deleted the dmkorolev/dataprotection/lin-perf-2 branch March 20, 2025 16:34
