Description
Describe the bug
We tried to fully migrate one application from MongoDB to DynamoDB. While testing our changes in production, we saw a lot of timeouts (> 100 ms) in our logs. These timeouts also affected DynamoDB requests that already existed before the migration.
There were no spikes in CPU usage whatsoever.
To debug the issue, we created a flame graph on a live instance with the feature toggle enabled. You can see that most of the CPU time (30%-50%) is spent in the call stack below the `send` function of the DynamoDB client (marked in purple).
The following image shows a zoomed-in view of the previous one:

A lot of time is spent in `aws_sigv4::sign`. We assume that the CPU is kept busy by the number of DynamoDB requests and the signature calculations that come with each of them, so tasks cannot yield. As a result, when tokio finally switches back to a task, it finds that the timeout has already elapsed and cancels the task, even though a response is probably already available but has not been read yet.
The following images show a few areas of the flame graph in detail:
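To make the hypothesis concrete, here is a toy, std-only simulation of a single worker thread that handles a burst of requests. It is not a model of tokio's real scheduler or of the SDK's timeout implementation; the function `simulate`, the burst size, the per-request signing cost, the server latency and the 100 ms deadline are all hypothetical. The key assumption is that CPU-bound signing never yields, so a completed response is only handled once the worker has finished signing the whole burst.

```rust
/// Result of the toy simulation (all inputs are hypothetical).
#[derive(Debug, PartialEq)]
struct Outcome {
    /// Requests whose completion was handled after the deadline.
    late: usize,
    /// Of those, requests whose server response was ready before the deadline.
    late_but_ready_in_time: usize,
}

/// One worker, `requests` requests all started at t = 0 (times in microseconds).
fn simulate(requests: u64, sign_us: u64, server_us: u64, timeout_us: u64) -> Outcome {
    // The worker signs every request back to back before it can poll anything else.
    let worker_free_at = requests * sign_us;
    let mut late = 0;
    let mut late_but_ready_in_time = 0;
    for i in 0..requests {
        // Request i is sent as soon as its own signature is done...
        let sent_at = (i + 1) * sign_us;
        // ...and the server's response is ready after the server latency.
        let ready_at = sent_at + server_us;
        // The completion is only handled once the worker is free again.
        let handled_at = ready_at.max(worker_free_at);
        if handled_at > timeout_us {
            late += 1;
            if ready_at <= timeout_us {
                late_but_ready_in_time += 1;
            }
        }
    }
    Outcome { late, late_but_ready_in_time }
}

fn main() {
    // Hypothetical: 400 requests, 300 us signing each, 5 ms server latency, 100 ms deadline.
    let outcome = simulate(400, 300, 5_000, 100_000);
    println!("{outcome:?}"); // Outcome { late: 400, late_but_ready_in_time: 316 }
}
```

With these made-up numbers, 316 of the 400 requests have a response ready before the deadline but are still handled after it, which is the pattern described above. With a burst of 100 requests the worker is free after 30 ms and nothing is late.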
Regression Issue
- Select this option if this issue appears to be a regression.
Expected Behavior
Signing should not take up 10% of CPU time.
Current Behavior
Signing takes 10% of CPU cycles.
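Shares such as the 10% for signing or the 30%-50% below `send` can be computed directly from the collapsed ("folded") stack file that flame graph tools generate, where each line is `frame1;frame2;...;frameN count`. The sketch below assumes that format. The function `share_of_samples` is hypothetical, and the frame names in `main` are made up; the search string would need to match the symbol names in the real profile (a short string like `send` can also match unrelated frames).

```rust
/// Fraction of samples whose stack contains a frame matching `needle`.
/// Input: collapsed stacks, one `frame1;frame2;...;frameN <count>` per line.
fn share_of_samples(folded: &str, needle: &str) -> f64 {
    let mut total: u64 = 0;
    let mut matching: u64 = 0;
    for line in folded.lines() {
        // The sample count is the last space-separated field; frames may contain spaces.
        let Some((stack, count)) = line.trim().rsplit_once(' ') else { continue };
        let Ok(count) = count.parse::<u64>() else { continue };
        total += count;
        // Count each sample once, even if the frame appears several times (recursion).
        if stack.split(';').any(|frame| frame.contains(needle)) {
            matching += count;
        }
    }
    if total == 0 { 0.0 } else { matching as f64 / total as f64 }
}

fn main() {
    // Hypothetical three-line profile with 100 samples in total.
    let folded = "\
        main;tokio::run;aws_sdk_dynamodb::send;aws_sigv4::sign 10
        main;tokio::run;aws_sdk_dynamodb::send;hyper::poll 20
        main;tokio::run;app::handler 70";
    println!("signing: {:.0}%", 100.0 * share_of_samples(folded, "aws_sigv4::sign"));
    println!("send:    {:.0}%", 100.0 * share_of_samples(folded, "send"));
}
```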
Reproduction Steps
We can't share any business code. The service runs on a c8g.medium instance with 512 CPU units and uses small DynamoDB payloads (a few KB).
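For scale: in ECS task definitions, 1,024 CPU units correspond to one vCPU, so (assuming "512 cpu" refers to ECS CPU units) the task gets half of the single vCPU of a c8g.medium. A back-of-envelope estimate of how much of that share a fixed per-request CPU cost consumes is sketched below; `cpu_utilization`, the request rate and the per-request cost are hypothetical, not measured values.

```rust
/// ECS expresses task CPU in units where 1024 units = 1 vCPU.
const UNITS_PER_VCPU: f64 = 1024.0;

/// Fraction of the task's CPU allocation used by a fixed per-request CPU cost.
fn cpu_utilization(cpu_units: u32, requests_per_sec: f64, cpu_us_per_request: f64) -> f64 {
    let vcpus = cpu_units as f64 / UNITS_PER_VCPU;
    // CPU-seconds spent per wall-clock second, divided by the CPU available.
    requests_per_sec * cpu_us_per_request / 1_000_000.0 / vcpus
}

fn main() {
    // Hypothetical: 200 requests/s, 50 us of signing CPU each, 512 CPU units.
    let u = cpu_utilization(512, 200.0, 50.0);
    println!("{:.0}% of the task's CPU", u * 100.0);
}
```

An average like this can look harmless while short bursts of requests still keep the worker busy long enough to push individual requests past a 100 ms timeout.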
Possible Solution
No response
Additional Information/Context
This issue might be related to issue #1349 by @russfellows
We measured the request latency on the instance, and it is much higher than the latency reported in the DynamoDB metrics. The latency of the existing requests also increased after we activated the feature toggle.
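One way to quantify the gap is to record wall-clock latency per request on the client and compare its percentiles with DynamoDB's server-side `SuccessfulRequestLatency` CloudWatch metric. A std-only sketch of the client half follows; `percentile` is a hypothetical helper using the nearest-rank method, and the `sleep` stands in for the awaited `send()` call.

```rust
use std::time::{Duration, Instant};

/// Nearest-rank percentile (p in 0..=100) of recorded latencies.
fn percentile(samples: &mut [Duration], p: f64) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    samples.sort_unstable();
    // Smallest sample with at least p% of all samples at or below it.
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    Some(samples[rank.clamp(1, samples.len()) - 1])
}

fn main() {
    let mut samples = Vec::new();
    for _ in 0..5 {
        // In the service, the timer would wrap the awaited DynamoDB `send()` call.
        let start = Instant::now();
        std::thread::sleep(Duration::from_millis(2)); // stand-in for the request
        samples.push(start.elapsed());
    }
    println!("p99 = {:?}", percentile(&mut samples, 99.0));
}
```

Because the client-side timer also includes time spent waiting for the runtime to poll the task, a large gap between these percentiles and the server-side metric points at queueing on the client rather than at DynamoDB itself.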
DynamoDB client config:
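```rust
use aws_config::SdkConfig;
use aws_smithy_types::{retry::RetryConfigBuilder, timeout::TimeoutConfig};

// Per-attempt timeout plus at most two attempts (initial request + one retry).
pub fn config_with_timeout(aws_config: SdkConfig, duration: core::time::Duration) -> SdkConfig {
    let timeout_config = TimeoutConfig::builder().operation_attempt_timeout(duration).build();
    let retry_config = RetryConfigBuilder::new().max_attempts(2).build();
    aws_config.into_builder().timeout_config(timeout_config).retry_config(retry_config).build()
}
```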
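With only `operation_attempt_timeout` set (no `operation_timeout`) and `max_attempts(2)`, which counts the initial request, a single call can take up to roughly two attempt timeouts plus the retry backoff, assuming a timed-out attempt is retried and each timer fires on time. The sketch below computes that bound; `worst_case_operation_time` is hypothetical, and the backoff value is an assumption passed in by hand, not read from the SDK's retry strategy.

```rust
use std::time::Duration;

/// Upper bound on one operation's wall-clock time when only an attempt timeout is set.
/// `max_backoff_per_retry` is an assumed upper bound on the sleep between attempts.
fn worst_case_operation_time(
    attempt_timeout: Duration,
    max_attempts: u32,
    max_backoff_per_retry: Duration,
) -> Duration {
    // max_attempts counts the initial attempt, so there are (attempts - 1) retries.
    let attempts = max_attempts.max(1);
    attempt_timeout * attempts + max_backoff_per_retry * (attempts - 1)
}

fn main() {
    // Hypothetical: 100 ms attempt timeout, 2 attempts, at most 1 s backoff.
    let bound = worst_case_operation_time(Duration::from_millis(100), 2, Duration::from_secs(1));
    println!("worst case: {bound:?}");
}
```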
Version
aws-sdk-dynamodb v1.102.0
├── aws-runtime v1.5.18
├── aws-smithy-async v1.2.7
├── aws-smithy-http v0.62.6
├── aws-smithy-json v0.61.9
├── aws-smithy-observability v0.2.0
├── aws-smithy-runtime v1.9.8
├── aws-smithy-runtime-api v1.10.0
├── aws-smithy-types v1.3.6
aws-config v1.8.12
Environment details (OS name and version, etc.)
chainguard/wolfi-base on Bottlerocket (aarch64), AWS ECS with EC2
Logs
No response