@andrewkroh
Member

Proposed commit message

The httpjson input was performing unwanted automatic retries when
receiving HTTP 429 (Too Many Requests) responses on initial requests,
despite being configured to honor X-Rate-Limit-Reset headers. This
occurred because the HTTP client was initialized without a custom retry
policy, causing it to fall back to the default retryablehttp library
behavior, which automatically retries on 429 status codes.

The problem manifested specifically during the first request of an input
execution. When the server returned a 429 response with rate limit
headers indicating when to retry, the retryablehttp layer would
intercept this response and perform multiple retry attempts with
exponential backoff before the rate limiter component could read the
response headers. This defeated the purpose of the rate limiter's logic,
which is designed to wait precisely until the time specified in the
X-Rate-Limit-Reset header rather than making blind retry attempts.

The fix introduces a custom retry policy whenever rate limiting is
configured in the input settings. This policy distinguishes between
different types of failures, only retrying on genuine connection errors
and server errors in the 5xx range, while allowing 4xx client errors
like 429 to pass through immediately to the rate limiter. This ensures
that the rate limiter can properly parse the reset time from response
headers and wait the appropriate duration before making a single,
well-timed retry request.

A new test case validates that when retry attempts are configured
alongside rate limiting, and the server returns a 429 response, exactly
two HTTP requests are made in total: the initial request that receives
the 429, followed by a single retry after honoring the reset period,
rather than the five or more attempts that would occur with automatic
retries enabled.
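
The request-counting idea behind that test can be sketched with a stub server. This is an illustration only, not the test added in this PR: `fetchHonoringReset` and `countRequests` are hypothetical names, the client is a plain `http.Client` rather than the httpjson input, and the reset header is again assumed to hold Unix seconds.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"sync/atomic"
	"time"
)

// fetchHonoringReset issues a GET and, on a 429, waits until the advertised
// reset time before making exactly one more request.
func fetchHonoringReset(client *http.Client, url string) (int, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	resp.Body.Close()
	if resp.StatusCode != http.StatusTooManyRequests {
		return resp.StatusCode, nil
	}
	epoch, err := strconv.ParseInt(resp.Header.Get("X-Rate-Limit-Reset"), 10, 64)
	if err != nil {
		return 0, err
	}
	if wait := time.Until(time.Unix(epoch, 0)); wait > 0 {
		time.Sleep(wait)
	}
	resp, err = client.Get(url)
	if err != nil {
		return 0, err
	}
	resp.Body.Close()
	return resp.StatusCode, nil
}

// countRequests runs one fetch against a stub server that answers 429 once
// and 200 afterwards, and reports how many requests the server saw.
func countRequests() (int, int32, error) {
	var n int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&n, 1) == 1 {
			reset := strconv.FormatInt(time.Now().Unix()+1, 10)
			w.Header().Set("X-Rate-Limit-Reset", reset)
			w.WriteHeader(http.StatusTooManyRequests)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	status, err := fetchHonoringReset(srv.Client(), srv.URL)
	return status, atomic.LoadInt32(&n), err
}

func main() {
	status, requests, err := countRequests()
	fmt.Println(status, requests, err)
}
```

Asserting on the server-side counter, rather than on the final status alone, is what separates one well-timed retry from a burst of automatic ones.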

Note

This PR was written primarily by Cursor. The issue was discovered while manually testing Elastic Agent 8.18.0 with the Okta integration.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works. Where relevant, I have used the stresstest.sh script to run them under stress conditions and race detector to verify their stability.
  • I have added an entry in ./changelog/fragments using the changelog tool.

How to test this PR locally

@botelastic (bot) added the needs_team label (Indicates that the issue/PR needs a Team:* label) Oct 31, 2025
@github-actions
Contributor

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@mergify
Contributor

mergify bot commented Oct 31, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @andrewkroh? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@andrewkroh added the Filebeat, Team:Service-Integrations, and bugfix labels Oct 31, 2025
@botelastic (bot) removed the needs_team label Oct 31, 2025
@andrewkroh added the backport-active-9 and backport-8.19 labels Oct 31, 2025
@andrewkroh force-pushed the fix/httpjson-429-rate-limit-retry branch from ff53f6e to 62984f8 on October 31, 2025 23:37
@andrewkroh force-pushed the fix/httpjson-429-rate-limit-retry branch from 62984f8 to a619862 on November 1, 2025 03:14
Labels

  • backport-8.19: Automated backport to the 8.19 branch
  • backport-active-9: Automated backport with mergify to all the active 9.[0-9]+ branches
  • bugfix
  • Filebeat
  • Team:Service-Integrations: Label for the Service Integrations team
