Skip to content

fix: tolerate transient Uplink manifest errors at startup#767

Open
DaleSeo wants to merge 2 commits into
mainfrom
AMS-524
Open

fix: tolerate transient Uplink manifest errors at startup#767
DaleSeo wants to merge 2 commits into
mainfrom
AMS-524

Conversation

@DaleSeo

@DaleSeo DaleSeo commented Jun 22, 2026

Copy link
Copy Markdown
Member

After #762 the server still treated every persisted-query manifest fetch error as fatal during startup, so a brief Uplink blip at boot would crash the process even though the Uplink poll loop would have recovered on the very next tick. This change classifies manifest errors as transient or permanent in the Uplink stream, where the typed error is still available, so transient errors are logged at warn and left for the poll loop to retry while only non-retryable errors such as an invalid API key remain fatal at startup and surface as an error while running. That brings the manifest path in line with how operation collections already behave.

@DaleSeo DaleSeo self-assigned this Jun 22, 2026
@apollo-librarian

apollo-librarian Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

✅ Docs preview has no changes

The preview was not built because there were no changes.

Build ID: 54ef0a4f22fbc2f99c9e259f
Build Logs: View logs


✅ AI Style Review — No Changes Detected

No MDX files were changed in this pull request.

Review Log: View detailed log

This review is AI-generated. Please use common sense when accepting these suggestions, as they may not always be accurate or appropriate for your specific context.

@DaleSeo DaleSeo marked this pull request as ready for review June 22, 2026 15:40
@DaleSeo DaleSeo requested a review from a team as a code owner June 22, 2026 15:40
@github-actions

github-actions Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Changeset file added - thank you!

@DaleSeo DaleSeo force-pushed the AMS-524 branch 2 times, most recently from fac7272 to 8592c90 Compare June 23, 2026 16:44
Comment thread crates/apollo-mcp-registry/src/uplink.rs Outdated
@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown

Review Summary

The second commit (b0a365e) fully addresses the finding from my initial review. The bundled is_transient_classifies_retryable_errors test was replaced with five focused per-variant tests, including a proper async http_error_is_transient test that uses wiremock to construct a real reqwest::Error — the right approach since reqwest::Error has no public constructor.

Findings:

  • [Should Fix] Missing Error::Http(_) coverage in is_transient tests; bundled assertions in one function — Resolved by second commit: split into individual tests per variant with descriptive names, plus http_error_is_transient covering the Http variant end-to-end through a mock server.

Test coverage: All five is_transient variants are now individually exercised. The four classify_uplink_manifest_result tests in manifest_poller.rs remain well-structured with one assertion per test. The empty_operations_update_clears_operations_while_running test in states.rs usefully distinguishes a network error (keep ops) from an authoritative empty update (clear ops).

Recommendation: Approve — the fix is correct, the log-level escalation in states.rs is the right consequence of upstream filtering, and test coverage is thorough.


Reviewed by Claude Code Sonnet 4.6

@DaleSeo

DaleSeo commented Jun 23, 2026

Copy link
Copy Markdown
Member Author

@claude can you hear me?

@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown

Claude Code is working…

I'll analyze this and get back to you.

View job run

@github-actions

Copy link
Copy Markdown
Contributor

⚠️ Claude Code Review failed to complete. This could be due to API issues, rate limits, or insufficient credits. The PR can still be merged - manual review recommended.

@DaleSeo

DaleSeo commented Jun 23, 2026

Copy link
Copy Markdown
Member Author

@claude can you hear me now?

@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown

Yes, I can hear you! 👋

My previous review is above — the only finding was that is_transient_classifies_retryable_errors in uplink.rs is missing an assertion for Error::Http(_), which is_transient() classifies as retryable. Everything else looks good.

Let me know if you'd like me to implement the fix or if you have any questions!

View job run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant