@pront pront commented Oct 13, 2025

Summary

This commit adds a retry delay between consecutive calls to aws_sqs run_once when a call returns an error. Without the delay, a failure to receive messages (e.g. due to invalid IAM permissions or an SQS outage) results in spamming the SQS API.
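As a rough illustration of the idea, the loop below sleeps only after a failed call and resets the delay after a success. This is a minimal sketch, not the code in this PR: `poll_with_backoff`, its parameters, and the injected `sleep` callback are hypothetical names chosen so the example runs without real waiting. The 500ms starting delay and 30s cap are taken from the changelog text in this PR.

```rust
use std::time::Duration;

/// Hypothetical polling loop (not Vector's implementation).
/// Calls `run_once` up to `iterations` times and sleeps after each failure.
/// `sleep` is injected so the sketch can be exercised without real waiting.
pub fn poll_with_backoff<E>(
    iterations: usize,
    mut run_once: impl FnMut() -> Result<(), E>,
    mut sleep: impl FnMut(Duration),
) {
    // Assumed values: 500ms initial delay, 30s cap.
    let initial = Duration::from_millis(500);
    let max = Duration::from_secs(30);
    let mut delay = initial;

    for _ in 0..iterations {
        match run_once() {
            // A successful call resets the delay for the next failure.
            Ok(()) => delay = initial,
            // A failed call waits, then doubles the delay up to the cap.
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2).min(max);
            }
        }
    }
}
```

In an async source the `sleep` callback would presumably be replaced by an awaited timer; the control flow stays the same.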

Vector configuration

How did you test this PR?

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

closes: #22947

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

medzin and others added 7 commits May 6, 2025 23:37
This commit adds logic to wait 500ms between subsequent calls of
run_once if it returns an error. This is required to prevent spamming
the SQS API if it fails to receive messages, e.g. due to invalid IAM
permissions or an SQS outage.
@pront pront requested a review from a team as a code owner October 13, 2025 13:59
@github-actions github-actions bot added the domain: sources Anything related to the Vector's sources label Oct 13, 2025

The retry delay now starts at 500ms and doubles with each consecutive failure, capping at 30 seconds. This prevents excessive API calls during prolonged AWS SQS outages, invalid IAM permissions, or throttling scenarios, while still recovering quickly once the service is available again.

authors: @medzin @pront
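The delay schedule described above (500ms, doubling per consecutive failure, capped at 30 seconds) can be sketched as follows. This is an illustrative sketch only: `ExponentialBackoff`, `next_delay`, and `reset` are hypothetical names and are not claimed to match the type in src/common/backoff.rs.

```rust
use std::time::Duration;

/// Hypothetical exponential backoff (not Vector's `common::backoff` type).
#[derive(Debug, Clone)]
pub struct ExponentialBackoff {
    initial: Duration,
    max: Duration,
    next: Duration,
}

impl ExponentialBackoff {
    pub fn new(initial: Duration, max: Duration) -> Self {
        // Clamp so the first delay never exceeds the cap.
        Self { initial, max, next: initial.min(max) }
    }

    /// Returns the delay for the current failure, then doubles the stored
    /// delay for the next failure, saturating at `max`.
    pub fn next_delay(&mut self) -> Duration {
        let delay = self.next;
        self.next = self.next.saturating_mul(2).min(self.max);
        delay
    }

    /// Restores the initial delay, e.g. after a successful call.
    pub fn reset(&mut self) {
        self.next = self.initial.min(self.max);
    }
}
```

With an initial delay of 500ms and a cap of 30s, successive calls to `next_delay` yield 500ms, 1s, 2s, 4s, 8s, 16s, and then 30s for every further failure until `reset` is called.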

Check failure

Code scanning / check-spelling

Unrecognized Spelling Error

medzin is not a recognized word. (unrecognized-spelling)
@datadog-vectordotdev

⚠️ Tests

⚠️ Warnings

🧪 2 Tests failed

common::backoff::tests::test_backoff_reset from vector (Datadog)
thread 'common::backoff::tests::test_backoff_reset' panicked at src/common/backoff.rs:114:9:
assertion `left == right` failed
  left: 30s
 right: 2s
stack backtrace:
   0:     0x55b107e39372 - std::backtrace_rs::backtrace::libunwind::trace::h2d45396358f41939
                               at /rustc/1159e78c4747b02ef996e55082b704c09b970588/library/std/src/../../backtrace/src/backtrace/libunwind.rs:117:9
   1:     0x55b107e39372 - std::backtrace_rs::backtrace::trace_unsynchronized::hffcefc0b67f1d6e2
...
common::backoff::tests::test_exponential_backoff_sequence from vector (Datadog)
thread 'common::backoff::tests::test_exponential_backoff_sequence' panicked at src/common/backoff.rs:104:13:
assertion `left == right` failed
  left: 30s
 right: 1s
stack backtrace:
   0:     0x563b44fe0372 - std::backtrace_rs::backtrace::libunwind::trace::h2d45396358f41939
                               at /rustc/1159e78c4747b02ef996e55082b704c09b970588/library/std/src/../../backtrace/src/backtrace/libunwind.rs:117:9
   1:     0x563b44fe0372 - std::backtrace_rs::backtrace::trace_unsynchronized::hffcefc0b67f1d6e2
...

ℹ️ Info

❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: f719d4c

Development

Successfully merging this pull request may close these issues.

No retry backoff for sqs in aws_s3 source
