Skip to content

fix(paddle): retry rate-limit and 5xx responses with backoff#65041

Open
Gilbert09 wants to merge 1 commit into
masterfrom
posthog-code/paddle-retry-rate-limits
Open

fix(paddle): retry rate-limit and 5xx responses with backoff#65041
Gilbert09 wants to merge 1 commit into
masterfrom
posthog-code/paddle-retry-rate-limits

Conversation

@Gilbert09

Copy link
Copy Markdown
Member

Problem

Error tracking surfaced an HTTPError from the Paddle data-warehouse import source:

429 Client Error: Too Many Requests for url: https://api.paddle.com/transactions?...

The Paddle source builds its HTTP session with Retry(total=0), so a single 429 (Paddle rate-limiting a paginated transactions fetch) propagated straight out of response.raise_for_status() in get_rows and failed the sync. A 429 means "slow down and retry", not "give up" — every sibling source (Brex, Stripe, Asana, Adroll, …) already backs off and retries on 429/5xx. Paddle was the outlier.

Changes

_get_paddle_session() now uses the framework's DEFAULT_RETRY instead of opting out with Retry(total=0):

  • Backs off and retries on 429/5xx, honoring the Retry-After header (urllib3's respect_retry_after_header defaults on).
  • Leaves auth/4xx (400/401/403) failing fast — those aren't in the retry status list — so they still reach get_non_retryable_errors and surface a clear, non-retried message to the user.
  • raise_on_status=False means a persistent 429 still surfaces via raise_for_status() (retryable at the Temporal layer), rather than being silently swallowed.

This is a robustness fix, not a NonRetryableErrors change: a rate limit is transient and upstream-recoverable, so the right behavior is to retry it, not to stop retrying.

How did you test this code?

I'm an agent. I added a regression test (tests/test_paddle.py) asserting the session's retry policy retries 429 (and honors Retry-After) while leaving 400/401/403 un-retried. Before the fix the session used Retry(total=0), so the rate-limit assertion would have failed. Ran the new tests locally (2 passed) and ran ruff check/ruff format on the source.

Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

🤖 Agent context

Autonomy: Fully autonomous

Triaged from an error-tracking webhook for the Paddle source. Confirmed the stack originates in get_rows at paddle/paddle.py (the raise_for_status() call), then traced the session config to the deliberate Retry(total=0). Decision: a 429 is a transient rate limit, so it belongs in retry/backoff handling, not NonRetryableErrors — matching the established 429 or >= 500 pattern across the other warehouse sources and the framework's own DEFAULT_RETRY. Kept the change minimal: swap the retry policy and update the now-stale fail-fast comments; no behavior change for auth errors.

The Paddle source built its HTTP session with `Retry(total=0)`, so a single
429 Too Many Requests surfaced straight out of `response.raise_for_status()`
and failed the whole sync. Rate limits are transient and should be backed off
and retried, not treated as terminal.

Switch the session to the framework's `DEFAULT_RETRY`, which backs off on
429/5xx and honors `Retry-After`, while still letting auth/4xx errors fail
fast so they reach `get_non_retryable_errors`.

Generated-By: PostHog Code
Task-Id: 8cbc7a7e-838f-4c6c-b5ac-9f9d60f2ae5d
@Gilbert09 Gilbert09 added the stamphog Request AI review from stamphog label Jun 20, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Hey @Gilbert09! 👋

It looks like your git author email on this PR isn't your @posthog.com address (owerstom@gmail.com). Since you're on the PostHog team, it's worth pointing your local git author email at your @posthog.com address. Why it matters:

  • Consistent work identity in git history — internal tooling that attributes commits to team members keys off your @posthog.com address.
  • Keeps team contributions easy to tell apart from external community ones when scanning history.

You can fix it for this repo with:

git config user.email "you@posthog.com"

Or set it globally with git config --global user.email "you@posthog.com". No need to redo this PR — just a nudge for next time. 🙂

@assign-reviewers-posthog assign-reviewers-posthog Bot requested a review from a team June 20, 2026 21:44
return make_tracked_session(retry=Retry(total=0))
# DEFAULT_RETRY backs off on 429/5xx (honoring Retry-After) but leaves auth/4xx
# failures to surface immediately, so a transient rate-limit doesn't fail the sync.
return make_tracked_session(retry=DEFAULT_RETRY)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Paddle API key is no longer redacted in tracked HTTP telemetry

_get_paddle_session() now opts into the shared tracked session with DEFAULT_RETRY, but it still doesn't pass redact_values=(api_key,) or set the auth header on the session itself. This source sends the bearer token on every request, and the tracked transport can capture matching requests/responses to object storage for operators. Without value-based redaction, a Paddle API key that appears outside the standard header scrub path (for example in serialized request metadata or retries/debug captures) can be written to logs/sample artifacts, exposing a live third-party credential.

Prompt To Fix With AI
Update the Paddle connector to construct its tracked session with credential redaction enabled, e.g. thread the API key into `_get_paddle_session(api_key)` and call `make_tracked_session(retry=DEFAULT_RETRY, headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}, redact_values=(api_key,))`. Then remove per-request auth headers where possible so both normal logging and sample capture consistently scrub the secret across URLs, headers, and bodies.

Severity: medium | Confidence: 79% | React with 👍 if useful or 👎 if not

@stamphog stamphog Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hex-security-app bot raised an unresolved, current-head medium-severity security concern: the Paddle API key (Bearer token) is not passed as redact_values to make_tracked_session(), so it will appear unredacted in any HTTP telemetry captured by the tracked transport. This pattern is established across 82 other sources in the codebase; Paddle is a clear outlier.

@stamphog stamphog Bot removed the stamphog Request AI review from stamphog label Jun 20, 2026
@greptile-apps

greptile-apps Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
posthog/temporal/data_imports/sources/paddle/tests/test_paddle.py:4-24
The test inspects `status_forcelist` membership directly rather than using `retry.is_retry("GET", status_code)`, which is the established pattern used in `TestConvexRetryPolicy`. Checking `status_forcelist` alone does not account for `allowed_methods` — if the retry's `allowed_methods` excluded GET, the 429 assertion would pass even though no retry would actually fire. Using `is_retry` exercises the full retry-decision logic, which is what matters for Paddle's GET calls.

```suggestion
class TestPaddleSession:
    def test_session_retries_rate_limits(self):
        session = _get_paddle_session()
        retry = session.get_adapter(PADDLE_BASE_URL).max_retries

        # A transient 429 must back off and retry rather than failing the whole sync.
        assert retry.total is not None and retry.total > 0
        assert retry.is_retry("GET", 429) is True
        assert retry.respect_retry_after_header is True
        # Persistent failures still surface via response.raise_for_status(), not MaxRetryError.
        assert retry.raise_on_status is False

    def test_auth_failures_are_not_retried(self):
        session = _get_paddle_session()
        retry = session.get_adapter(PADDLE_BASE_URL).max_retries

        # 401/403/400 are credential/config problems handled by get_non_retryable_errors;
        # retrying them would only delay surfacing the error to the user.
        assert retry.is_retry("GET", 401) is False
        assert retry.is_retry("GET", 403) is False
        assert retry.is_retry("GET", 400) is False
```

Reviews (1): Last reviewed commit: "fix(paddle): retry rate-limit and 5xx re..." | Re-trigger Greptile

Comment on lines +4 to +24
class TestPaddleSession:
def test_session_retries_rate_limits(self):
session = _get_paddle_session()
retry = session.get_adapter(PADDLE_BASE_URL).max_retries

# A transient 429 must back off and retry rather than failing the whole sync.
assert retry.total is not None and retry.total > 0
assert 429 in retry.status_forcelist
assert retry.respect_retry_after_header is True
# Persistent failures still surface via response.raise_for_status(), not MaxRetryError.
assert retry.raise_on_status is False

def test_auth_failures_are_not_retried(self):
session = _get_paddle_session()
retry = session.get_adapter(PADDLE_BASE_URL).max_retries

# 401/403/400 are credential/config problems handled by get_non_retryable_errors;
# retrying them would only delay surfacing the error to the user.
assert 401 not in retry.status_forcelist
assert 403 not in retry.status_forcelist
assert 400 not in retry.status_forcelist

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The test inspects status_forcelist membership directly rather than using retry.is_retry("GET", status_code), which is the established pattern used in TestConvexRetryPolicy. Checking status_forcelist alone does not account for allowed_methods — if the retry's allowed_methods excluded GET, the 429 assertion would pass even though no retry would actually fire. Using is_retry exercises the full retry-decision logic, which is what matters for Paddle's GET calls.

Suggested change
class TestPaddleSession:
def test_session_retries_rate_limits(self):
session = _get_paddle_session()
retry = session.get_adapter(PADDLE_BASE_URL).max_retries
# A transient 429 must back off and retry rather than failing the whole sync.
assert retry.total is not None and retry.total > 0
assert 429 in retry.status_forcelist
assert retry.respect_retry_after_header is True
# Persistent failures still surface via response.raise_for_status(), not MaxRetryError.
assert retry.raise_on_status is False
def test_auth_failures_are_not_retried(self):
session = _get_paddle_session()
retry = session.get_adapter(PADDLE_BASE_URL).max_retries
# 401/403/400 are credential/config problems handled by get_non_retryable_errors;
# retrying them would only delay surfacing the error to the user.
assert 401 not in retry.status_forcelist
assert 403 not in retry.status_forcelist
assert 400 not in retry.status_forcelist
class TestPaddleSession:
def test_session_retries_rate_limits(self):
session = _get_paddle_session()
retry = session.get_adapter(PADDLE_BASE_URL).max_retries
# A transient 429 must back off and retry rather than failing the whole sync.
assert retry.total is not None and retry.total > 0
assert retry.is_retry("GET", 429) is True
assert retry.respect_retry_after_header is True
# Persistent failures still surface via response.raise_for_status(), not MaxRetryError.
assert retry.raise_on_status is False
def test_auth_failures_are_not_retried(self):
session = _get_paddle_session()
retry = session.get_adapter(PADDLE_BASE_URL).max_retries
# 401/403/400 are credential/config problems handled by get_non_retryable_errors;
# retrying them would only delay surfacing the error to the user.
assert retry.is_retry("GET", 401) is False
assert retry.is_retry("GET", 403) is False
assert retry.is_retry("GET", 400) is False
Prompt To Fix With AI
This is a comment left during a code review.
Path: posthog/temporal/data_imports/sources/paddle/tests/test_paddle.py
Line: 4-24

Comment:
The test inspects `status_forcelist` membership directly rather than using `retry.is_retry("GET", status_code)`, which is the established pattern used in `TestConvexRetryPolicy`. Checking `status_forcelist` alone does not account for `allowed_methods` — if the retry's `allowed_methods` excluded GET, the 429 assertion would pass even though no retry would actually fire. Using `is_retry` exercises the full retry-decision logic, which is what matters for Paddle's GET calls.

```suggestion
class TestPaddleSession:
    def test_session_retries_rate_limits(self):
        session = _get_paddle_session()
        retry = session.get_adapter(PADDLE_BASE_URL).max_retries

        # A transient 429 must back off and retry rather than failing the whole sync.
        assert retry.total is not None and retry.total > 0
        assert retry.is_retry("GET", 429) is True
        assert retry.respect_retry_after_header is True
        # Persistent failures still surface via response.raise_for_status(), not MaxRetryError.
        assert retry.raise_on_status is False

    def test_auth_failures_are_not_retried(self):
        session = _get_paddle_session()
        retry = session.get_adapter(PADDLE_BASE_URL).max_retries

        # 401/403/400 are credential/config problems handled by get_non_retryable_errors;
        # retrying them would only delay surfacing the error to the user.
        assert retry.is_retry("GET", 401) is False
        assert retry.is_retry("GET", 403) is False
        assert retry.is_retry("GET", 400) is False
```

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant