|
| 1 | +# Case Study: Issue #261 - Rust CI/CD Pipeline Failure (No Crates Release, No GitHub Release) |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +The Rust CI/CD auto-release pipeline successfully published crate `link-assistant-agent@0.9.2` to crates.io but then failed verification due to crates.io propagation delay. This caused the pipeline to exit with error, preventing GitHub Release creation. |
| 6 | + |
| 7 | +## Timeline of Events |
| 8 | + |
| 9 | +**CI Run:** [#24334488286](https://github.com/link-assistant/agent/actions/runs/24334488286/job/71048047065) |
| 10 | +**Date:** 2026-04-13 |
| 11 | + |
| 12 | +| Time (UTC) | Event | |
| 13 | +|---|---| |
| 14 | +| 08:54:25 | Auto Release job starts | |
| 15 | +| 08:54:32 | Detects 2 changelog fragments (patch bump) | |
| 16 | +| 08:54:34 | Bumps version to 0.9.2, commits and tags `rust-v0.9.2` | |
| 17 | +| 08:54:35 | Pushes changes and tags to origin | |
| 18 | +| 08:55:42 | `publish-to-crates.mjs` starts, detects crate doesn't exist yet on crates.io | |
| 19 | +| 08:55:42 | Publish attempt 1 starts (`cargo publish`) | |
| 20 | +| 08:56:19 | `cargo publish` succeeds (exit code 0), waits 5s for propagation | |
| 21 | +| 08:56:24 | Verification fails - crate not found on crates.io API yet (propagation delay) | |
| 22 | +| 08:56:24 | Script treats verification failure as publish failure, waits 10s | |
| 23 | +| 08:56:34 | Publish attempt 2 - gets `error: crate link-assistant-agent@0.9.2 already exists on crates.io index` | |
| 24 | +| 08:56:34 | Script matches `error: ` failure pattern, doesn't recognize "already exists" as success | |
| 25 | +| 08:56:44 | Publish attempt 3 - same "already exists" error | |
| 26 | +| 08:56:44 | Script exits with code 1, `published=false` | |
| 27 | +| 08:56:44 | GitHub Release step skipped (gated on `published == 'true'`) | |
| 28 | + |
| 29 | +## Root Causes |
| 30 | + |
| 31 | +### Root Cause 1: Insufficient crates.io propagation wait time |
| 32 | +- Script waited only 5 seconds for crates.io API propagation |
| 33 | +- The crate was actually published but the API hadn't updated yet |
| 34 | +- crates.io can take 10-30+ seconds to propagate depending on load |
| 35 | + |
| 36 | +### Root Cause 2: "already exists" error not recognized as success |
| 37 | +- On retry, `cargo publish` returns: `error: crate link-assistant-agent@0.9.2 already exists on crates.io index` |
| 38 | +- The `detectPublishFailure()` function matched `error: ` pattern first |
| 39 | +- The `crate already uploaded` pattern didn't match this different error wording |
| 40 | +- The script had no concept of "already exists on index" being a success case |
| 41 | + |
| 42 | +### Root Cause 3: No verification retries |
| 43 | +- Verification was a single check after a fixed 5s delay |
| 44 | +- No retry mechanism for verification (only for the publish command itself) |
| 45 | + |
| 46 | +### Root Cause 4: GitHub Release gated solely on publish output |
| 47 | +- Workflow condition `steps.publish.outputs.published == 'true'` meant any publish failure blocked GitHub Release |
| 48 | +- Even when the crate WAS published, the script exited with error so the output was `false` |
| 49 | + |
| 50 | +### Root Cause 5: No graceful handling of existing GitHub releases |
| 51 | +- `create-github-release.mjs` would fail fatally if the release tag already existed |
| 52 | +- No recovery path for re-running the pipeline |
| 53 | + |
| 54 | +## Solutions Applied |
| 55 | + |
| 56 | +### Fix 1: Recognize "already exists" as successful publish |
| 57 | +- Added `ALREADY_EXISTS_PATTERNS` array with common crates.io "already exists" messages |
| 58 | +- `detectAlreadyExists()` checks these patterns before `detectPublishFailure()` |
| 59 | +- When detected, script sets `published=true` and `already_published=true` |
| 60 | + |
| 61 | +### Fix 2: Improved verification with retries |
| 62 | +- Increased initial propagation delay from 5s to 15s |
| 63 | +- Added verification retry loop (3 attempts with 10s between retries) |
| 64 | +- If cargo publish exits with code 0 but verification can't confirm, treats as success (trusting cargo's exit code) |
| 65 | + |
| 66 | +### Fix 3: Graceful GitHub Release creation |
| 67 | +- `create-github-release.mjs` now catches "already exists" / "Validation Failed" errors |
| 68 | +- Skips creation silently instead of failing |
| 69 | + |
| 70 | +### Fix 4: Decoupled workflow conditions |
| 71 | +- GitHub Release step now runs if `should_release == 'true'` AND either `published == 'true'` OR `publish.outcome == 'success'` |
| 72 | +- This ensures GitHub Release is created even in edge cases |
| 73 | + |
| 74 | +## Best Practices Applied (from reference repos) |
| 75 | + |
| 76 | +Referenced from: |
| 77 | +- [rust-ai-driven-development-pipeline-template](https://github.com/link-foundation/rust-ai-driven-development-pipeline-template) |
| 78 | +- [mem-rs](https://github.com/linksplatform/mem-rs) |
| 79 | + |
| 80 | +1. **Treat crates.io as authoritative source** - check actual API, not just git tags |
| 81 | +2. **"Already exists" is success** - following mem-rs graceful pattern |
| 82 | +3. **Trust cargo exit code** - if `cargo publish` exits 0, the publish succeeded even if API is slow |
| 83 | +4. **Idempotent release creation** - GitHub Release creation handles "already exists" gracefully |
| 84 | +5. **Verification with backoff** - multiple verification attempts with increasing delays |
0 commit comments