Don't enable GCM by default, even on Java >8? #1867

Open
henryptung opened this issue Jan 21, 2021 · 6 comments

@henryptung

henryptung commented Jan 21, 2021

What happened?

JRE-default GCM ciphers (even on Java >8) seem to cause a lot of heap churn. This contributed heavily to a Horizon performance regression with Conjure enabled (an operation that took 2-3 sec went to 15-20 sec, with about 10x more GC activity; see PDS-134017).

JRE version in use (from IL): zulu11.41.23-ca-jdk11.0.8-linux_x64

Example request load, 100 requests each 5MB in size with TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, tracking all allocations >100B in size:
[image: allocation profile, GCM run]

Same request load, 100 5MB requests with TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384, all allocations >100B:
[image: allocation profile, CBC run]

Given that this load seems to scale directly with network traffic, I would expect it to cause more GCs in general for any server using GCM with nontrivial request or response sizes.

What did you want to happen?

Ideally, don't enable GCM by default unless a provider such as Conscrypt or Bouncy Castle is in use.
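A minimal sketch of what such a default could look like on the JDK's own TLS stack, assuming the alternative providers register under the names "Conscrypt" and "BCJSSE" (those names, and the class and method names below, are assumptions for illustration, not part of any project's API). It reorders an engine's enabled suites so CBC suites come first when neither provider is installed, rather than removing GCM outright: on JDK 11 the only TLS 1.3 suites are AES-GCM suites, so dropping every GCM suite would also disable TLS 1.3. Conversely, a connection that negotiates TLS 1.3 still ends up on an AEAD suite, so this ordering only influences TLS 1.2 handshakes.

```java
import java.security.Security;
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

public class CipherSuitePolicy {
    // Assumed provider names for Conscrypt and Bouncy Castle's JSSE provider.
    static boolean alternativeTlsProviderInstalled() {
        return Security.getProvider("Conscrypt") != null
                || Security.getProvider("BCJSSE") != null;
    }

    // Stable partition: GCM suites first if preferGcm, otherwise non-GCM suites first.
    static String[] orderSuites(String[] suites, boolean preferGcm) {
        List<String> gcm = new ArrayList<>();
        List<String> other = new ArrayList<>();
        for (String suite : suites) {
            (suite.contains("_GCM_") ? gcm : other).add(suite);
        }
        List<String> ordered = new ArrayList<>(preferGcm ? gcm : other);
        ordered.addAll(preferGcm ? other : gcm);
        return ordered.toArray(new String[0]);
    }

    // Applies the ordering to an engine and asks it to honor its own preference order.
    static void applyDefaultPolicy(SSLEngine engine) {
        SSLParameters params = engine.getSSLParameters();
        params.setCipherSuites(orderSuites(engine.getEnabledCipherSuites(),
                alternativeTlsProviderInstalled()));
        params.setUseCipherSuitesOrder(true);
        engine.setSSLParameters(params);
    }

    public static void main(String[] args) throws Exception {
        SSLEngine engine = SSLContext.getDefault().createSSLEngine();
        applyDefaultPolicy(engine);
        System.out.println(String.join("\n", engine.getEnabledCipherSuites()));
    }
}
```

Reordering plus `setUseCipherSuitesOrder(true)` keeps GCM available for peers that only offer GCM while making CBC the server's first choice on JDK-only deployments.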

@carterkozak
Contributor

This is interesting, thanks for the report! We've found that GCM ciphers on Java 11+ provide substantially more throughput in the conditions we've tested, but we may be hitting an edge case with excessively large (or small) buffers that results in unexpected GC churn.
There have recently been a few OpenJDK changes that make the GCM implementation more efficient in newer releases, for example reducing allocation overhead (https://bugs.openjdk.java.net/browse/JDK-8253821, delivered in Java 16).

@henryptung
Author

We've found that GCM ciphers on java 11+ provide substantially more throughput in the conditions we've tested

I'm honestly curious to see those results, if you have them around.

for example reducing allocation overhead https://bugs.openjdk.java.net/browse/JDK-8253821 delivered in java 16.

Heh, it would be nice to get a backport of that, though I don't know what the JDK backport policy is.

@carterkozak
Contributor

I see: this appears to be an issue with receiving data on the server, which uses the SSLEngine directly; the SSLSocket API doesn't appear to be affected. I've rerun the benchmarks, which show that GCM provides about 28% more throughput while creating a great deal more garbage than CBC. Tomorrow I can do some validation and comparison with JDK 16 and see what we can do to reduce the overhead.
It's not clear that using CBC ciphers would be better overall, but in some scenarios GCM could exacerbate GC problems and degrade performance.
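To look at the allocation difference at the cipher layer alone, the sketch below decrypts 16 KiB records (the TLS maximum record plaintext size) with `AES/GCM/NoPadding` and `AES/CBC/NoPadding` through `javax.crypto.Cipher`, and reports the bytes allocated by the current thread via the HotSpot-specific `com.sun.management.ThreadMXBean`. It is a rough approximation under stated assumptions, not the benchmark described in this thread: it bypasses `SSLEngine` entirely, the CBC side skips the HMAC that a TLS CBC suite also computes, and both loops include per-record `Cipher.init` overhead. The class name is illustrative; running it on JDK 11 and on JDK 16+ is one way to compare versions.

```java
import java.lang.management.ManagementFactory;
import java.security.SecureRandom;
import java.security.spec.AlgorithmParameterSpec;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.IvParameterSpec;

public class RecordDecryptAllocation {
    // TLS caps record plaintext at 2^14 bytes, so a 5 MB body arrives as ~320 records.
    static final int RECORD_SIZE = 16 * 1024;

    // HotSpot-specific: bytes allocated so far by the current thread.
    static long allocatedBytes() {
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        return bean.getThreadAllocatedBytes(Thread.currentThread().getId());
    }

    // Decrypts one record into a caller-supplied, reused output buffer.
    static void decrypt(Cipher cipher, SecretKey key, AlgorithmParameterSpec spec,
                        byte[] in, byte[] out) throws Exception {
        cipher.init(Cipher.DECRYPT_MODE, key, spec);
        cipher.doFinal(in, 0, in.length, out, 0);
    }

    // Bytes allocated while decrypting `records` records with the given transformation.
    static long measure(String transformation, int records) throws Exception {
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey key = keyGen.generateKey();
        boolean gcm = transformation.contains("GCM");
        byte[] iv = new byte[gcm ? 12 : 16];
        new SecureRandom().nextBytes(iv);
        AlgorithmParameterSpec spec =
                gcm ? new GCMParameterSpec(128, iv) : new IvParameterSpec(iv);

        Cipher cipher = Cipher.getInstance(transformation);
        cipher.init(Cipher.ENCRYPT_MODE, key, spec);
        byte[] ciphertext = cipher.doFinal(new byte[RECORD_SIZE]);
        byte[] plaintext = new byte[ciphertext.length]; // large enough for either mode

        for (int i = 0; i < 200; i++) { // warm-up, excluded from the measurement
            decrypt(cipher, key, spec, ciphertext, plaintext);
        }
        long before = allocatedBytes();
        for (int i = 0; i < records; i++) {
            decrypt(cipher, key, spec, ciphertext, plaintext);
        }
        return allocatedBytes() - before;
    }

    public static void main(String[] args) throws Exception {
        int records = 5 * 1024 * 1024 / RECORD_SIZE; // one 5 MB request body
        System.out.println("AES/GCM bytes allocated: " + measure("AES/GCM/NoPadding", records));
        System.out.println("AES/CBC bytes allocated: " + measure("AES/CBC/NoPadding", records));
    }
}
```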

@carterkozak
Contributor

Using zulu16.0.79-ea-jdk16.0.0-ea.31, those allocations are effectively nonexistent and we see about 30% greater throughput. This compares JDK 11 with JDK 16-ea, so I expect the throughput difference is partly due to other improvements in JDK 14 and JDK 15, but this looks promising.

JDK 16 will be tricky to roll out due to reflective access exceptions that we haven't begun to fix, but it was relatively straightforward to get our RPC stack working.

@henryptung
Author

@carterkozak Yeah, I think I remember seeing some JDK tickets about reducing GCM allocation overhead. Glad to know that it still produces greater throughput. We made some improvements to our network request sizes anyway, so hopefully we can make do until we get to JDK 14+.

That said, even if we don't change the default, it's probably worth noting somewhere that large network requests can be particularly hard on GC until then.

@carterkozak
Contributor

@henryptung I'm curious whether overall performance is actually better with the CBC ciphers than with GCM, or only the amount of garbage created. How confident are we that the GC numbers are a root cause rather than a symptom of another problem?
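One way to start answering that is to record GC count and accumulated collection time next to wall-clock time for the same workload under each cipher suite: if time spent in collections accounts for only a small part of the slowdown, GC churn is more likely a symptom than the root cause. A minimal sketch using the standard `java.lang.management` beans follows; the class name and the placeholder workload are hypothetical, and `getCollectionTime()` only reports what each collector's bean accounts for, so concurrent collector work may be under-represented.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeProfile {
    // Totals across all collectors: {collection count, accumulated collection time in ms}.
    static long[] gcTotals() {
        long count = 0;
        long timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            count += Math.max(0, gc.getCollectionCount()); // -1 means "not available"
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, timeMs};
    }

    // Runs the workload once and returns {GC count, GC time ms, wall time ms}.
    static long[] profile(Runnable workload) {
        long[] before = gcTotals();
        long start = System.nanoTime();
        workload.run();
        long wallMs = (System.nanoTime() - start) / 1_000_000;
        long[] after = gcTotals();
        return new long[] {after[0] - before[0], after[1] - before[1], wallMs};
    }

    public static void main(String[] args) {
        // Hypothetical stand-in: replace with the real request load under each cipher suite.
        long[] r = profile(() -> {
            for (int i = 0; i < 100; i++) {
                byte[] body = new byte[5 * 1024 * 1024];
                body[i] = 1;
            }
        });
        System.out.printf("GCs: %d, GC time: %d ms, wall time: %d ms%n", r[0], r[1], r[2]);
    }
}
```

Comparing the GC-time share of wall time between the GCM and CBC runs gives a first-order answer; a GC log would be needed to separate pause time from concurrent work.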
