Skip to content

Detect and report GitHub API rate limiting#1609

Open
dguido wants to merge 7 commits intozizmorcore:mainfrom
dguido:fix/issue-1252
Open

Detect and report GitHub API rate limiting#1609
dguido wants to merge 7 commits intozizmorcore:mainfrom
dguido:fix/issue-1252

Conversation

@dguido
Copy link
Contributor

@dguido dguido commented Feb 12, 2026

Pre-submission checks

  • This PR corresponds to an issue (if not, please create one first).
  • I hereby disclose the use of an LLM or other AI coding assistant in the creation of this PR.

Summary

  • Add RateLimitMiddleware (implements reqwest_middleware::Middleware) that detects HTTP 403 + x-ratelimit-remaining: 0 and converts to a dedicated ClientError::RateLimited error
  • Show a clear error message with actionable help: wait and retry, check GH_TOKEN, or use --offline
  • Narrow retry classifier to only retry 5xx/429/408 (previously retried all 4xx including 403)
  • Uses middleware approach as suggested by @woodruffw; checks x-ratelimit-remaining: 0 to distinguish rate-limit 403s from other 403s (auth failures, etc.)

Closes #1252

Test Plan

  • cargo build — clean, zero warnings
  • cargo test -p zizmor — 194 passed, 0 failed
  • cargo fmt --check — clean
  • cargo clippy -p zizmor — clean (pre-existing test-only clippy issues unchanged)

dguido and others added 2 commits February 12, 2026 11:38
Add a `RateLimitMiddleware` that checks for HTTP 403 responses with
`x-ratelimit-remaining: 0` and converts them into a dedicated
`ClientError::RateLimited` error. This surfaces a clear, actionable
message instead of the confusing "request error while accessing
GitHub API" that previously appeared.

Also narrows the retry classifier to only retry server errors (5xx),
429, and 408 — previously all client errors including 403 were
retried, wasting attempts on rate-limit responses that will keep
failing.

Closes zizmorcore#1252

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Narrowing retries to only 5xx/429/408 would change existing behavior
for 404s and other client errors. Instead, just exclude 403
(rate-limit responses) from retries since those will never succeed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dguido and others added 3 commits February 12, 2026 11:47
- Add 5 unit tests covering is_rate_limited() chain traversal:
  direct, through Middleware, through ListTags, through Inner,
  and false for unrelated errors
- Show human-readable relative time ("~5 minutes") instead of
  raw epoch timestamp, using only std::time (no new dependencies)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix error message: "resets at" → "resets in" for relative durations
- Surface reset time in user-facing help ("rate limit resets in ~5 minutes")
- Replace is_rate_limited() with rate_limit_reset() returning Option<&str>
- Extend rate-limit detection to GlobalConfig and RemoteWithoutWorkflows paths
- Flatten RateLimitMiddleware with early returns for clarity
- Extract format_reset_duration() as testable pure function
- Use or_else chaining in main.rs to avoid re-indenting existing match arms
- Add comment explaining 403 retry classifier tradeoff
- Add tests: ListBranches traversal, non-ClientError middleware, reset formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
P2 fixes:
- Replace wildcard `_ => None` in `rate_limit_reset()` with exhaustive
  match arms, so the compiler forces handling new `ClientError` variants
  (aligns with project style: no wildcard matches)
- Add `CollectionError::Config` path to rate-limit extraction in main.rs,
  so rate limits during per-group config fetching get actionable messaging

P3 fixes:
- Change `.unwrap_or(0)` to `.ok()?` in `RateLimitMiddleware` when
  reading `SystemTime`; a broken system clock now falls through to
  "unknown" instead of producing a misleading duration
- Add boundary tests: 59-second threshold, equal timestamps, and
  nested `Inner(Arc<Inner(...)>)` traversal

Dismissed:
- P3: tracing::debug! for unparseable x-ratelimit-reset header —
  adds complexity to a clean chain for an extremely rare edge case;
  the "unknown" fallback is already honest and actionable
- P4: Error::Audit path loses rate-limit info through anyhow —
  acknowledged limitation requiring deeper refactoring outside PR scope
- P4: 403s no longer retried — documented, intentional tradeoff

Quality pipeline: cargo fmt, clippy, 317 tests (123 unit + 194
integration) all pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Member

@woodruffw woodruffw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly minor feedback, but there's one thing here that hard-blocks this approach (not being able to actually pull the ratelimit error out in the most common case).

Comment on lines 304 to 305
.and_then(|v| v.to_str().ok())
.is_some_and(|v| v == "0");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will panic if the server sends a malformed header, e.g. x-ratelimit-remaining: <invalid-utf8>. It's probably fine to just compare for the literal byte string b"0" here instead.

(Or we could percolate an error on an invalid header, but that seems overkill.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — switched to v.as_bytes() == b"0" which avoids the to_str() entirely.

Comment on lines +314 to +315
.and_then(|v| v.to_str().ok())
.and_then(|v| v.parse::<u64>().ok())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing re: panics. I think an appropriate failure mode here would be fall back to "unknown" here too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — the x-ratelimit-reset header now uses .and_then(|v| v.to_str().ok()) followed by .and_then(|v| v.parse::<u64>().ok()), falling back to "unknown" via .unwrap_or_else(|| "unknown".into()) if either step fails.

if res.status() != StatusCode::FORBIDDEN {
return Ok(res);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also probably check the origin here, and avoid trying to apply any rate-limit header checks if the host doesn't match the GitHub API host. That'll make it harder for us to panic on differently-shaped headers if we start to reuse our client stack for other requests.

(This matters because some servers like to put relative durations in x-ratelimit-reset and similar, which we'd handle incorrectly below.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — RateLimitMiddleware now takes an api_host: String field (set from the client's configured API URL at construction) and short-circuits with next.run() if req.url().host_str() doesn't match.

assert_eq!(err.rate_limit_reset(), Some("~5 minutes"));
}

#[test]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "through" tests below seem pretty duplicative; I think they should just be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed test_rate_limit_reset_through_list_tags and test_rate_limit_reset_through_list_branches. Kept test_rate_limit_reset_through_middleware since it covers the one wrapping layer the middleware actually produces.

Comment on lines +1150 to +1165
#[test]
fn test_rate_limit_reset_none_for_other_errors() {
let err = ClientError::RepoMissingOrPrivate {
owner: "foo".into(),
repo: "bar".into(),
};
assert_eq!(err.rate_limit_reset(), None);
}

#[test]
fn test_rate_limit_reset_none_for_non_client_middleware_error() {
let err = ClientError::Middleware(reqwest_middleware::Error::Middleware(anyhow::anyhow!(
"some other middleware error"
)));
assert_eq!(err.rate_limit_reset(), None);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Safe to remove too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

Comment on lines +1215 to +1223
#[test]
fn test_rate_limit_reset_through_nested_inner() {
let rate_limited = ClientError::RateLimited {
reset: "~2 minutes".into(),
};
let inner = ClientError::Inner(Arc::new(rate_limited));
let outer = ClientError::Inner(Arc::new(inner));
assert_eq!(outer.rate_limit_reset(), Some("~2 minutes"));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Safe to remove IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

Comment on lines 991 to 993
.element(Level::HELP.message(
"ensure GH_TOKEN is set to a valid token for higher rate limits",
))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably misleading, since zizmor doesn't access GitHub's API unless it has a valid token. What's actually happened in this case is that someone has exceeded their authenticated rate limit, so we should probably just limit the message to telling them to wait.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — removed the "ensure GH_TOKEN is set" help line. The message now just says the reset time and to wait and try again.


let report = match &err {
// Check all error paths that can carry a rate-limit error.
// NOTE: Error::Audit wraps errors through anyhow, making
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this limitation makes this PR unmergeable, at least for now -- most API operations happen during audit steps, so a ratelimit is overwhelmingly likely to be hit during the audit phase. A larger refactor isn't out of the question, but I think it'd need to be human-driven.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — AuditError now has a downcast_ref method that reaches through the anyhow::Error wrapper to recover the original ClientError. The Error::Audit arm in main.rs uses it: source.downcast_ref::<ClientError>().and_then(|ce| ce.rate_limit_reset()).

This turned out to be simpler than expected because anyhow::Error already provides downcast_ref<T>() — it preserves the concrete type behind its type-erased storage. So rather than needing to restructure the error hierarchy or change how audit errors propagate, we just needed a one-line delegation method on AuditError that forwards to self.source.downcast_ref::<T>(). No changes to the Audit trait, no changes to how individual audits construct errors, and the existing AuditCore::err() constructor already wraps via Into<anyhow::Error> which preserves the concrete type.

Added a test in audit/mod.rs that round-trips a ClientError::RateLimited through AuditError::newdowncast_refrate_limit_reset() to verify the chain works.

dguido and others added 2 commits February 18, 2026 23:29
- Use byte comparison for x-ratelimit-remaining header
- Add host guard to RateLimitMiddleware to skip non-API requests
- Add AuditError::downcast_ref to surface rate-limit errors from audits
- Handle Error::Audit in rate-limit extraction
- Remove misleading GH_TOKEN/--offline help from rate-limit error
- Remove 3 duplicative tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CLI: improve error message on rate limiting from GitHub

2 participants

Comments