fix(paddle): retry rate-limit and 5xx responses with backoff#65041
fix(paddle): retry rate-limit and 5xx responses with backoff#65041Gilbert09 wants to merge 1 commit into
Conversation
The Paddle source built its HTTP session with `Retry(total=0)`, so a single 429 Too Many Requests surfaced straight out of `response.raise_for_status()` and failed the whole sync. Rate limits are transient and should be backed off and retried, not treated as terminal. Switch the session to the framework's `DEFAULT_RETRY`, which backs off on 429/5xx and honors `Retry-After`, while still letting auth/4xx errors fail fast so they reach `get_non_retryable_errors`. Generated-By: PostHog Code Task-Id: 8cbc7a7e-838f-4c6c-b5ac-9f9d60f2ae5d
|
Hey @Gilbert09! 👋 It looks like your git author email on this PR isn't your
You can fix it for this repo with: git config user.email "you@posthog.com"Or set it globally with |
| return make_tracked_session(retry=Retry(total=0)) | ||
| # DEFAULT_RETRY backs off on 429/5xx (honoring Retry-After) but leaves auth/4xx | ||
| # failures to surface immediately, so a transient rate-limit doesn't fail the sync. | ||
| return make_tracked_session(retry=DEFAULT_RETRY) |
There was a problem hiding this comment.
Paddle API key is no longer redacted in tracked HTTP telemetry
_get_paddle_session() now opts into the shared tracked session with DEFAULT_RETRY, but it still doesn't pass redact_values=(api_key,) or set the auth header on the session itself. This source sends the bearer token on every request, and the tracked transport can capture matching requests/responses to object storage for operators. Without value-based redaction, a Paddle API key that appears outside the standard header scrub path (for example in serialized request metadata or retries/debug captures) can be written to logs/sample artifacts, exposing a live third-party credential.
Prompt To Fix With AI
Update the Paddle connector to construct its tracked session with credential redaction enabled, e.g. thread the API key into `_get_paddle_session(api_key)` and call `make_tracked_session(retry=DEFAULT_RETRY, headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}, redact_values=(api_key,))`. Then remove per-request auth headers where possible so both normal logging and sample capture consistently scrub the secret across URLs, headers, and bodies.Severity: medium | Confidence: 79% | React with 👍 if useful or 👎 if not
There was a problem hiding this comment.
The hex-security-app bot raised an unresolved, current-head medium-severity security concern: the Paddle API key (Bearer token) is not passed as redact_values to make_tracked_session(), so it will appear unredacted in any HTTP telemetry captured by the tracked transport. This pattern is established across 82 other sources in the codebase; Paddle is a clear outlier.
Prompt To Fix All With AIFix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
posthog/temporal/data_imports/sources/paddle/tests/test_paddle.py:4-24
The test inspects `status_forcelist` membership directly rather than using `retry.is_retry("GET", status_code)`, which is the established pattern used in `TestConvexRetryPolicy`. Checking `status_forcelist` alone does not account for `allowed_methods` — if the retry's `allowed_methods` excluded GET, the 429 assertion would pass even though no retry would actually fire. Using `is_retry` exercises the full retry-decision logic, which is what matters for Paddle's GET calls.
```suggestion
class TestPaddleSession:
def test_session_retries_rate_limits(self):
session = _get_paddle_session()
retry = session.get_adapter(PADDLE_BASE_URL).max_retries
# A transient 429 must back off and retry rather than failing the whole sync.
assert retry.total is not None and retry.total > 0
assert retry.is_retry("GET", 429) is True
assert retry.respect_retry_after_header is True
# Persistent failures still surface via response.raise_for_status(), not MaxRetryError.
assert retry.raise_on_status is False
def test_auth_failures_are_not_retried(self):
session = _get_paddle_session()
retry = session.get_adapter(PADDLE_BASE_URL).max_retries
# 401/403/400 are credential/config problems handled by get_non_retryable_errors;
# retrying them would only delay surfacing the error to the user.
assert retry.is_retry("GET", 401) is False
assert retry.is_retry("GET", 403) is False
assert retry.is_retry("GET", 400) is False
```
Reviews (1): Last reviewed commit: "fix(paddle): retry rate-limit and 5xx re..." | Re-trigger Greptile |
| class TestPaddleSession: | ||
| def test_session_retries_rate_limits(self): | ||
| session = _get_paddle_session() | ||
| retry = session.get_adapter(PADDLE_BASE_URL).max_retries | ||
|
|
||
| # A transient 429 must back off and retry rather than failing the whole sync. | ||
| assert retry.total is not None and retry.total > 0 | ||
| assert 429 in retry.status_forcelist | ||
| assert retry.respect_retry_after_header is True | ||
| # Persistent failures still surface via response.raise_for_status(), not MaxRetryError. | ||
| assert retry.raise_on_status is False | ||
|
|
||
| def test_auth_failures_are_not_retried(self): | ||
| session = _get_paddle_session() | ||
| retry = session.get_adapter(PADDLE_BASE_URL).max_retries | ||
|
|
||
| # 401/403/400 are credential/config problems handled by get_non_retryable_errors; | ||
| # retrying them would only delay surfacing the error to the user. | ||
| assert 401 not in retry.status_forcelist | ||
| assert 403 not in retry.status_forcelist | ||
| assert 400 not in retry.status_forcelist |
There was a problem hiding this comment.
The test inspects
status_forcelist membership directly rather than using retry.is_retry("GET", status_code), which is the established pattern used in TestConvexRetryPolicy. Checking status_forcelist alone does not account for allowed_methods — if the retry's allowed_methods excluded GET, the 429 assertion would pass even though no retry would actually fire. Using is_retry exercises the full retry-decision logic, which is what matters for Paddle's GET calls.
| class TestPaddleSession: | |
| def test_session_retries_rate_limits(self): | |
| session = _get_paddle_session() | |
| retry = session.get_adapter(PADDLE_BASE_URL).max_retries | |
| # A transient 429 must back off and retry rather than failing the whole sync. | |
| assert retry.total is not None and retry.total > 0 | |
| assert 429 in retry.status_forcelist | |
| assert retry.respect_retry_after_header is True | |
| # Persistent failures still surface via response.raise_for_status(), not MaxRetryError. | |
| assert retry.raise_on_status is False | |
| def test_auth_failures_are_not_retried(self): | |
| session = _get_paddle_session() | |
| retry = session.get_adapter(PADDLE_BASE_URL).max_retries | |
| # 401/403/400 are credential/config problems handled by get_non_retryable_errors; | |
| # retrying them would only delay surfacing the error to the user. | |
| assert 401 not in retry.status_forcelist | |
| assert 403 not in retry.status_forcelist | |
| assert 400 not in retry.status_forcelist | |
| class TestPaddleSession: | |
| def test_session_retries_rate_limits(self): | |
| session = _get_paddle_session() | |
| retry = session.get_adapter(PADDLE_BASE_URL).max_retries | |
| # A transient 429 must back off and retry rather than failing the whole sync. | |
| assert retry.total is not None and retry.total > 0 | |
| assert retry.is_retry("GET", 429) is True | |
| assert retry.respect_retry_after_header is True | |
| # Persistent failures still surface via response.raise_for_status(), not MaxRetryError. | |
| assert retry.raise_on_status is False | |
| def test_auth_failures_are_not_retried(self): | |
| session = _get_paddle_session() | |
| retry = session.get_adapter(PADDLE_BASE_URL).max_retries | |
| # 401/403/400 are credential/config problems handled by get_non_retryable_errors; | |
| # retrying them would only delay surfacing the error to the user. | |
| assert retry.is_retry("GET", 401) is False | |
| assert retry.is_retry("GET", 403) is False | |
| assert retry.is_retry("GET", 400) is False |
Prompt To Fix With AI
This is a comment left during a code review.
Path: posthog/temporal/data_imports/sources/paddle/tests/test_paddle.py
Line: 4-24
Comment:
The test inspects `status_forcelist` membership directly rather than using `retry.is_retry("GET", status_code)`, which is the established pattern used in `TestConvexRetryPolicy`. Checking `status_forcelist` alone does not account for `allowed_methods` — if the retry's `allowed_methods` excluded GET, the 429 assertion would pass even though no retry would actually fire. Using `is_retry` exercises the full retry-decision logic, which is what matters for Paddle's GET calls.
```suggestion
class TestPaddleSession:
def test_session_retries_rate_limits(self):
session = _get_paddle_session()
retry = session.get_adapter(PADDLE_BASE_URL).max_retries
# A transient 429 must back off and retry rather than failing the whole sync.
assert retry.total is not None and retry.total > 0
assert retry.is_retry("GET", 429) is True
assert retry.respect_retry_after_header is True
# Persistent failures still surface via response.raise_for_status(), not MaxRetryError.
assert retry.raise_on_status is False
def test_auth_failures_are_not_retried(self):
session = _get_paddle_session()
retry = session.get_adapter(PADDLE_BASE_URL).max_retries
# 401/403/400 are credential/config problems handled by get_non_retryable_errors;
# retrying them would only delay surfacing the error to the user.
assert retry.is_retry("GET", 401) is False
assert retry.is_retry("GET", 403) is False
assert retry.is_retry("GET", 400) is False
```
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Problem
Error tracking surfaced an
HTTPErrorfrom the Paddle data-warehouse import source:The Paddle source builds its HTTP session with
Retry(total=0), so a single 429 (Paddle rate-limiting a paginatedtransactionsfetch) propagated straight out ofresponse.raise_for_status()inget_rowsand failed the sync. A 429 means "slow down and retry", not "give up" — every sibling source (Brex, Stripe, Asana, Adroll, …) already backs off and retries on429/5xx. Paddle was the outlier.Changes
_get_paddle_session()now uses the framework'sDEFAULT_RETRYinstead of opting out withRetry(total=0):429/5xx, honoring theRetry-Afterheader (urllib3'srespect_retry_after_headerdefaults on).400/401/403) failing fast — those aren't in the retry status list — so they still reachget_non_retryable_errorsand surface a clear, non-retried message to the user.raise_on_status=Falsemeans a persistent 429 still surfaces viaraise_for_status()(retryable at the Temporal layer), rather than being silently swallowed.This is a robustness fix, not a
NonRetryableErrorschange: a rate limit is transient and upstream-recoverable, so the right behavior is to retry it, not to stop retrying.How did you test this code?
I'm an agent. I added a regression test (
tests/test_paddle.py) asserting the session's retry policy retries429(and honorsRetry-After) while leaving400/401/403un-retried. Before the fix the session usedRetry(total=0), so the rate-limit assertion would have failed. Ran the new tests locally (2 passed) and ranruff check/ruff formaton the source.Automatic notifications
🤖 Agent context
Autonomy: Fully autonomous
Triaged from an error-tracking webhook for the Paddle source. Confirmed the stack originates in
get_rowsatpaddle/paddle.py(theraise_for_status()call), then traced the session config to the deliberateRetry(total=0). Decision: a 429 is a transient rate limit, so it belongs in retry/backoff handling, notNonRetryableErrors— matching the established429 or >= 500pattern across the other warehouse sources and the framework's ownDEFAULT_RETRY. Kept the change minimal: swap the retry policy and update the now-stale fail-fast comments; no behavior change for auth errors.