Possible improvements to the benchmarking suite

This is just one issue to track the many different ways the benchmarking suite at `bindings/rust/bench` could be improved, roughly in order from most to least impactful/important. This list is not complete, nor is it a checklist meant to be completed.

- [ ] **Incorporate into CI.** This would allow PRs with performance regressions to be caught before they even get merged into the codebase. This could be achieved by running benchmarks both before and after a PR (likely on CodeBuild), then comparing the two runs with Criterion’s baselines feature.
- [ ] **Test different versions or variants of OpenSSL and other TLS libraries.** To use another local build of OpenSSL, set the environment variables described in the [openssl crate docs](https://docs.rs/openssl/latest/openssl/index.html#manual).
- [ ] **Test async and parallelism.** Most customers run TLS multithreaded, so seeing how concurrency affects performance is important. This would likely require major code changes.
- [ ] **Nightly dashboard.** The benchmarks would run nightly, and a UI would display past and present results, allowing for easy comparison and analysis of historical performance data for all three libraries.
- [ ] **TLS 1.2.** Most of the internet still supports TLS 1.2, and a lot of traffic still is TLS 1.2, even though it’s not as fast or secure. TLS 1.2 benchmarks would require significant API changes and possibly refactoring to be added to the benchmarking suite.
- [ ] **Separate client and server for handshake benches.** This would involve partially handshaking a connection pair and measuring the time it would take for the next step, which is relatively easy in Criterion. This would also match customer usage.
- [x] **Session resumption benches.** A significant portion of connections (1-30+%) are resumed, so this would be important to test. These are already being worked on by [James Mayclin](https://quip-amazon.com/YUD9EAwTjMr).
- [ ] **Separate send/receive and client/server.** This would give results that would be more comparable with common customer uses. 
- [x] **Vary certificate chain structure.** Currently, the server has a cert chain of only length 2: the root cert directly signs the server cert. Both certs also have the same key size. However, most normal use cases have a cert chain of at least 3 certs, with the root cert having a larger key size than the server cert. 
- [ ] **Bare-metal benching.** Benching on bare-metal EC2 instances removes a lot of noise on the machine and allows for higher repeatability.
- [ ] **Vary amount of data sent for throughput benching.** Varying the amount of data sent at once allows us to test both constant-time operations and asymptotic time complexity for throughput.
- [ ] **Test memory usage before/after a large data send/receive or handshake.** It would be useful to customers to know when a connection takes up the most memory and by how much. 
- [ ] **Test OpenSSL memory with RELEASE_BUFFERS.** OpenSSL currently doesn’t get the benefits of its RELEASE_BUFFERS API in the benches due to it not fitting in the benching API. 
- [ ] **Do historical benching with more past versions.** Currently, historical benches are run on the bindings for each past version; however, it’s probably possible to use the current bindings with a static library built from a past version, similar to how a custom build of s2n-tls is used to bench with AWS-LC. This might provide more backwards compatibility and allow for benching more past versions. Alternatively (and less elegantly), additional compile-time `use` (or other) statements could be added to accommodate the old Rust binding APIs.
- [ ] **Reduce noise and variability.** This could be achieved by running all benchmarks multiple times and not benchmarking everything sequentially. The average of all of the runs or the lowest time of all of the runs could be taken as the main metric of performance for a set of parameters and version. The historical benchmarks would then take longer, but the variability would be a lot less.
- [ ] **Test more cryptographic parameters (e.g. signature algorithm, key shares).** The current benchmarks are limited especially by s2n-tls’s security policies, but more commonly used parameters (e.g. ChaCha20-Poly1305) could be tested.
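
As a rough illustration of the "reduce noise and variability" item, the sketch below times a workload several times and reduces the samples to their minimum and mean. It uses only the standard library rather than Criterion, and the names `Summary`, `summarize`, and `measure` are hypothetical helpers, not part of the existing bench crate.

```rust
use std::time::{Duration, Instant};

/// Summary of repeated timings for one benchmark configuration.
/// (Hypothetical type for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Summary {
    pub min: Duration,
    pub mean: Duration,
}

/// Reduce a set of per-run timings to the minimum and the mean.
/// Returns `None` when no samples were collected.
pub fn summarize(samples: &[Duration]) -> Option<Summary> {
    let min = *samples.iter().min()?;
    let total: Duration = samples.iter().sum();
    let mean = total / samples.len() as u32;
    Some(Summary { min, mean })
}

/// Run `work` exactly `runs` times, timing each run, and summarize.
pub fn measure<F: FnMut()>(runs: usize, mut work: F) -> Option<Summary> {
    let samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            work();
            start.elapsed()
        })
        .collect();
    summarize(&samples)
}
```

Either the minimum or the mean could then serve as the main metric for a given set of parameters and version; the minimum tends to be less sensitive to occasional slow runs, while the mean reflects typical behavior.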

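For the CI item, the before/after comparison could come down to a threshold check like the one sketched here. This is not how Criterion itself reports changes; `relative_change`, `is_regression`, and the threshold value are assumptions made for illustration, and the timings are assumed to be mean times in nanoseconds taken from the two runs.

```rust
/// Relative change of `candidate_ns` with respect to `baseline_ns`
/// (+0.10 means the candidate is 10% slower).
/// Returns `None` for a non-positive baseline.
pub fn relative_change(baseline_ns: f64, candidate_ns: f64) -> Option<f64> {
    if baseline_ns <= 0.0 {
        return None;
    }
    Some((candidate_ns - baseline_ns) / baseline_ns)
}

/// Flag a regression when the candidate is slower than the baseline
/// by more than `threshold` (e.g. 0.05 for 5%).
pub fn is_regression(baseline_ns: f64, candidate_ns: f64, threshold: f64) -> bool {
    matches!(relative_change(baseline_ns, candidate_ns), Some(c) if c > threshold)
}
```

A CI job could call `is_regression` per benchmark and fail the build if any benchmark exceeds the threshold. The threshold would need to be chosen with the suite's run-to-run noise in mind.
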
Possible improvements to the benchmarking suite #4157