
Performance issues with AWS DynamoDB (AWS sig v4) #1400

@NicoSchmidt1703

Description

Describe the bug

We tried to fully migrate an application from MongoDB to DynamoDB. While testing our changes in production we saw a lot of timeouts (> 100 ms) in our logs. These timeouts were also visible on DynamoDB requests that existed before the migration.

There weren't any spikes in CPU usage whatsoever.

To debug the issue, we created a flame graph on a live instance with the feature toggle enabled (screenshot attached). Most (30%-50%) of the CPU time is spent in the call stack below the `send` function of the DynamoDB client (marked in purple).

Zooming into the flame graph, a lot of time is spent in `aws_sigv4::sign`. We assume the CPU is kept busy by the number of DynamoDB requests and the associated signature calculations, so tasks cannot yield. By the time tokio can switch back to a task, it determines that the maximum attempt time has been exceeded and cancels the task (even though a response is probably already available, it can no longer be retrieved).
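
To illustrate the suspected mechanism, here is a minimal sketch (our assumption about tokio's general behavior, not the SDK's actual code path): CPU-bound work between await points keeps the worker busy, so other tasks are only polled once it yields.

```rust
use std::time::{Duration, Instant};

// Sketch of the suspected starvation effect (assumption, not SDK code):
// a CPU-bound stretch with no .await prevents the runtime from polling
// other tasks, so a 10 ms timer is only observed ~200 ms later.
#[tokio::main(flavor = "current_thread")]
async fn main() {
    let start = Instant::now();
    let timer = tokio::spawn(async move {
        tokio::time::sleep(Duration::from_millis(10)).await;
        println!("timer observed after {:?}", start.elapsed());
    });

    // Let the timer task run up to its first await point.
    tokio::task::yield_now().await;

    // Stand-in for the signing work: pure CPU, no yield.
    let busy_until = Instant::now() + Duration::from_millis(200);
    let mut x: u64 = 0;
    while Instant::now() < busy_until {
        x = x.wrapping_add(1);
    }
    std::hint::black_box(x);

    timer.await.unwrap();
}
```

With only 512 CPU units per task there is effectively no spare worker to absorb such busy stretches, which would match the timeouts we see.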

Further screenshots (attached) show a few areas of the flame graph in detail.

Regression Issue

- [ ] Select this option if this issue appears to be a regression.

Expected Behavior

Signing should not take 10% of CPU cycle time.

Current Behavior

Signing takes 10% of CPU cycles.

Reproduction Steps

We can't share any business code. The instance is a c8g.medium, the ECS task runs with 512 CPU units, and the DynamoDB payloads are small (a few KB).
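
For reference, a hypothetical stand-in for our traffic shape (table name, key schema, and request count are made up, not our business code): many small GetItem calls in flight at once, each of which has to be SigV4-signed.

```rust
use aws_sdk_dynamodb::{types::AttributeValue, Client};

// Hypothetical load loop: spawn many small GetItem requests concurrently.
// Every request goes through aws_sigv4::sign before hitting the wire.
async fn load(client: Client) {
    let mut handles = Vec::new();
    for i in 0..1_000 {
        let client = client.clone();
        handles.push(tokio::spawn(async move {
            client
                .get_item()
                .table_name("example-table") // placeholder
                .key("pk", AttributeValue::S(format!("item-{i}")))
                .send()
                .await
        }));
    }
    for handle in handles {
        let _ = handle.await;
    }
}
```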

Possible Solution

No response

Additional Information/Context

This issue might be related to issue #1349 by @russfellows.

We measured the duration of requests on the instance, which is much higher than the latency metric reported by DynamoDB. We also saw an increase for the pre-existing requests after activating the feature toggle.

DynamoDB client config:

```rust
use aws_config::retry::RetryConfigBuilder;
use aws_config::timeout::TimeoutConfig;
use aws_config::SdkConfig;

// Per-attempt timeout plus a maximum of two attempts (one retry).
pub fn config_with_timeout(aws_config: SdkConfig, duration: core::time::Duration) -> SdkConfig {
    let timeout_config = TimeoutConfig::builder().operation_attempt_timeout(duration).build();
    let retry_config = RetryConfigBuilder::new().max_attempts(2).build();

    aws_config.into_builder().timeout_config(timeout_config).retry_config(retry_config).build()
}
```
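
For context, a sketch of how a client built from this config is used; the table name, key, and timeout value here are placeholders:

```rust
use aws_sdk_dynamodb::{types::AttributeValue, Client};

// Build a client from the adjusted config and time a single GetItem call,
// mirroring the on-instance measurement described above.
async fn timed_get(aws_config: aws_config::SdkConfig) -> Result<(), aws_sdk_dynamodb::Error> {
    let config = config_with_timeout(aws_config, core::time::Duration::from_millis(100));
    let client = Client::new(&config);

    let start = std::time::Instant::now();
    client
        .get_item()
        .table_name("example-table") // placeholder
        .key("pk", AttributeValue::S("example-id".into()))
        .send()
        .await?;
    println!("on-instance duration: {:?}", start.elapsed());
    Ok(())
}
```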

Version

aws-sdk-dynamodb v1.102.0
├── aws-runtime v1.5.18
├── aws-smithy-async v1.2.7
├── aws-smithy-http v0.62.6
├── aws-smithy-json v0.61.9
├── aws-smithy-observability v0.2.0
├── aws-smithy-runtime v1.9.8
├── aws-smithy-runtime-api v1.10.0
├── aws-smithy-types v1.3.6

aws-config v1.8.12

Environment details (OS name and version, etc.)

chainguard/wolfi-base on Bottlerocket (aarch64), AWS ECS with EC2

Logs

No response
