Detect and report GitHub API rate limiting#1609
Detect and report GitHub API rate limiting#1609dguido wants to merge 7 commits intozizmorcore:mainfrom
Conversation
Add a `RateLimitMiddleware` that checks for HTTP 403 responses with `x-ratelimit-remaining: 0` and converts them into a dedicated `ClientError::RateLimited` error. This surfaces a clear, actionable message instead of the confusing "request error while accessing GitHub API" that previously appeared. Also narrows the retry classifier to only retry server errors (5xx), 429, and 408 — previously all client errors including 403 were retried, wasting attempts on rate-limit responses that will keep failing. Closes zizmorcore#1252 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Narrowing retries to only 5xx/429/408 would change existing behavior for 404s and other client errors. Instead, just exclude 403 (rate-limit responses) from retries since those will never succeed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add 5 unit tests covering is_rate_limited() chain traversal:
direct, through Middleware, through ListTags, through Inner,
and false for unrelated errors
- Show human-readable relative time ("~5 minutes") instead of
raw epoch timestamp, using only std::time (no new dependencies)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix error message: "resets at" → "resets in" for relative durations
- Surface reset time in user-facing help ("rate limit resets in ~5 minutes")
- Replace is_rate_limited() with rate_limit_reset() returning Option<&str>
- Extend rate-limit detection to GlobalConfig and RemoteWithoutWorkflows paths
- Flatten RateLimitMiddleware with early returns for clarity
- Extract format_reset_duration() as testable pure function
- Use or_else chaining in main.rs to avoid re-indenting existing match arms
- Add comment explaining 403 retry classifier tradeoff
- Add tests: ListBranches traversal, non-ClientError middleware, reset formatting
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
P2 fixes: - Replace wildcard `_ => None` in `rate_limit_reset()` with exhaustive match arms, so the compiler forces handling new `ClientError` variants (aligns with project style: no wildcard matches) - Add `CollectionError::Config` path to rate-limit extraction in main.rs, so rate limits during per-group config fetching get actionable messaging P3 fixes: - Change `.unwrap_or(0)` to `.ok()?` in `RateLimitMiddleware` when reading `SystemTime`; a broken system clock now falls through to "unknown" instead of producing a misleading duration - Add boundary tests: 59-second threshold, equal timestamps, and nested `Inner(Arc<Inner(...)>)` traversal Dismissed: - P3: tracing::debug! for unparseable x-ratelimit-reset header — adds complexity to a clean chain for an extremely rare edge case; the "unknown" fallback is already honest and actionable - P4: Error::Audit path loses rate-limit info through anyhow — acknowledged limitation requiring deeper refactoring outside PR scope - P4: 403s no longer retried — documented, intentional tradeoff Quality pipeline: cargo fmt, clippy, 317 tests (123 unit + 194 integration) all pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
woodruffw
left a comment
There was a problem hiding this comment.
Mostly minor feedback, but there's one thing here that hard-blocks this approach (not being able to actually pull the ratelimit error out in the most common case).
crates/zizmor/src/github.rs
Outdated
| .and_then(|v| v.to_str().ok()) | ||
| .is_some_and(|v| v == "0"); |
There was a problem hiding this comment.
This will panic if the server sends a malformed header, e.g. x-ratelimit-remaining: <invalid-utf8>. It's probably fine to just compare for the literal byte string b"0" here instead.
(Or we could percolate an error on an invalid header, but that seems overkill.)
There was a problem hiding this comment.
Done — switched to v.as_bytes() == b"0" which avoids the to_str() entirely.
| .and_then(|v| v.to_str().ok()) | ||
| .and_then(|v| v.parse::<u64>().ok()) |
There was a problem hiding this comment.
Same thing re: panics. I think an appropriate failure mode here would be fall back to "unknown" here too.
There was a problem hiding this comment.
Done — the x-ratelimit-reset header now uses .and_then(|v| v.to_str().ok()) followed by .and_then(|v| v.parse::<u64>().ok()), falling back to "unknown" via .unwrap_or_else(|| "unknown".into()) if either step fails.
| if res.status() != StatusCode::FORBIDDEN { | ||
| return Ok(res); | ||
| } | ||
|
|
There was a problem hiding this comment.
I think we should also probably check the origin here, and avoid trying to apply any rate-limit header checks if the host doesn't match the GitHub API host. That'll make it harder for us to panic on differently-shaped headers if we start to reuse our client stack for other requests.
(This matters because some servers like to put relative durations in x-ratelimit-reset and similar, which we'd handle incorrectly below.)
There was a problem hiding this comment.
Done — RateLimitMiddleware now takes an api_host: String field (set from the client's configured API URL at construction) and short-circuits with next.run() if req.url().host_str() doesn't match.
crates/zizmor/src/github.rs
Outdated
| assert_eq!(err.rate_limit_reset(), Some("~5 minutes")); | ||
| } | ||
|
|
||
| #[test] |
There was a problem hiding this comment.
The "through" tests below seem pretty duplicative; I think they should just be removed.
There was a problem hiding this comment.
Removed test_rate_limit_reset_through_list_tags and test_rate_limit_reset_through_list_branches. Kept test_rate_limit_reset_through_middleware since it covers the one wrapping layer the middleware actually produces.
| #[test] | ||
| fn test_rate_limit_reset_none_for_other_errors() { | ||
| let err = ClientError::RepoMissingOrPrivate { | ||
| owner: "foo".into(), | ||
| repo: "bar".into(), | ||
| }; | ||
| assert_eq!(err.rate_limit_reset(), None); | ||
| } | ||
|
|
||
| #[test] | ||
| fn test_rate_limit_reset_none_for_non_client_middleware_error() { | ||
| let err = ClientError::Middleware(reqwest_middleware::Error::Middleware(anyhow::anyhow!( | ||
| "some other middleware error" | ||
| ))); | ||
| assert_eq!(err.rate_limit_reset(), None); | ||
| } |
| #[test] | ||
| fn test_rate_limit_reset_through_nested_inner() { | ||
| let rate_limited = ClientError::RateLimited { | ||
| reset: "~2 minutes".into(), | ||
| }; | ||
| let inner = ClientError::Inner(Arc::new(rate_limited)); | ||
| let outer = ClientError::Inner(Arc::new(inner)); | ||
| assert_eq!(outer.rate_limit_reset(), Some("~2 minutes")); | ||
| } |
crates/zizmor/src/main.rs
Outdated
| .element(Level::HELP.message( | ||
| "ensure GH_TOKEN is set to a valid token for higher rate limits", | ||
| )) |
There was a problem hiding this comment.
This is probably misleading, since zizmor doesn't access GitHub's API unless it has a valid token. What's actually happened in this case is that someone has exceeded their authenticated rate limit, so we should probably just limit the message to telling them to wait.
There was a problem hiding this comment.
Good call — removed the "ensure GH_TOKEN is set" help line. The message now just says the reset time and to wait and try again.
crates/zizmor/src/main.rs
Outdated
|
|
||
| let report = match &err { | ||
| // Check all error paths that can carry a rate-limit error. | ||
| // NOTE: Error::Audit wraps errors through anyhow, making |
There was a problem hiding this comment.
I think this limitation makes this PR unmergeable, at least for now -- most API operations happen during audit steps, so a ratelimit is overwhelmingly likely to be hit during the audit phase. A larger refactor isn't out of the question, but I think it'd need to be human-driven.
There was a problem hiding this comment.
Addressed — AuditError now has a downcast_ref method that reaches through the anyhow::Error wrapper to recover the original ClientError. The Error::Audit arm in main.rs uses it: source.downcast_ref::<ClientError>().and_then(|ce| ce.rate_limit_reset()).
This turned out to be simpler than expected because anyhow::Error already provides downcast_ref<T>() — it preserves the concrete type behind its type-erased storage. So rather than needing to restructure the error hierarchy or change how audit errors propagate, we just needed a one-line delegation method on AuditError that forwards to self.source.downcast_ref::<T>(). No changes to the Audit trait, no changes to how individual audits construct errors, and the existing AuditCore::err() constructor already wraps via Into<anyhow::Error> which preserves the concrete type.
Added a test in audit/mod.rs that round-trips a ClientError::RateLimited through AuditError::new → downcast_ref → rate_limit_reset() to verify the chain works.
- Use byte comparison for x-ratelimit-remaining header - Add host guard to RateLimitMiddleware to skip non-API requests - Add AuditError::downcast_ref to surface rate-limit errors from audits - Handle Error::Audit in rate-limit extraction - Remove misleading GH_TOKEN/--offline help from rate-limit error - Remove 3 duplicative tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pre-submission checks
Summary
RateLimitMiddleware(implementsreqwest_middleware::Middleware) that detects HTTP 403 +x-ratelimit-remaining: 0and converts to a dedicatedClientError::RateLimitederrorx-ratelimit-remaining: 0to distinguish rate-limit 403s from other 403s (auth failures, etc.)Closes #1252
Test Plan
cargo build— clean, zero warningscargo test -p zizmor— 194 passed, 0 failedcargo fmt --check— cleancargo clippy -p zizmor— clean (pre-existing test-only clippy issues unchanged)