| 1 | +# CI/CD Troubleshooting Guide |
| 2 | + |
| 3 | +This guide helps resolve common issues with GitHub Actions workflows. |
| 4 | + |
| 5 | +## 🚨 Current GitHub Service Issues |
| 6 | + |
| 7 | +### Cache Service Down (400 errors) |
| 8 | +**Symptoms:** |
| 9 | +- `Failed to restore: Cache service responded with 400` |
| 10 | +- `Failed to save: <h2>Our services aren't available right now</h2>` |
| 11 | + |
| 12 | +**Solutions:** |
| 13 | +1. **Use Resilient Build**: Run `build-resilient.yml` workflow |
| 14 | + - Go to Actions → "Resilient Build (No External Dependencies)" |
| 15 | + - Check "Skip caching" option |
| 16 | + - This workflow runs without the GitHub cache service |
| 17 | + |
| 18 | +2. **Manual workaround**: |
| 19 | + ```bash |
| 20 | + # Locally pre-download FFmpeg |
| 21 | + ./scripts/download-ffmpeg.sh |
| 22 | + git add src/vendor/ffmpeg |
| 23 | + git commit -m "Add pre-downloaded FFmpeg for CI" |
| 24 | + git push |
| 25 | + ``` |
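The workaround above only helps if CI actually prefers the committed copy. A minimal sketch of such a check, assuming the download script populates `src/vendor/ffmpeg` (implied by the `git add` above, not confirmed) and using a hypothetical helper name `ensure_ffmpeg`:

```bash
#!/usr/bin/env bash
# Prefer a vendored FFmpeg copy; run the downloader only when the
# directory is missing or empty.
# Usage: ensure_ffmpeg [vendor-dir] [download-command...]
# NOTE: ensure_ffmpeg is a hypothetical helper, not part of the repository.
ensure_ffmpeg() {
  ffmpeg_dir="${1:-src/vendor/ffmpeg}"
  if [ "$#" -gt 0 ]; then shift; fi

  # Default downloader is the script named in this guide.
  if [ "$#" -eq 0 ]; then
    set -- ./scripts/download-ffmpeg.sh
  fi

  # A non-empty directory counts as "already vendored".
  if [ -d "$ffmpeg_dir" ] && [ -n "$(ls -A "$ffmpeg_dir" 2>/dev/null)" ]; then
    echo "using vendored FFmpeg in $ffmpeg_dir"
    return 0
  fi

  echo "no vendored FFmpeg in $ffmpeg_dir; running: $*"
  "$@"
}
```

Calling `ensure_ffmpeg` with no arguments in a build step would then skip the network download whenever the committed binaries are present.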
| 26 | + |
| 27 | +### Artifact Upload Failures |
| 28 | +**Symptoms:** |
| 29 | +- `Failed to CreateArtifact: Unable to make request: ENOTFOUND` |
| 30 | +- `The operation was canceled` |
| 31 | + |
| 32 | +**Solutions:** |
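| 33 | +1. **Check build success**: Even if the upload fails, the build itself may have succeeded; check the build step's log before re-running |
| 34 | +2. **Use resilient workflow**: Artifacts are optional in `build-resilient.yml` |
| 35 | +3. **Manual download**: On self-hosted runners, SSH into the runner and copy the build output manually (GitHub-hosted runners do not provide SSH access) |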
| 36 | + |
| 37 | +## 🔧 Common Build Issues |
| 38 | + |
| 39 | +### FFmpeg Download Failures |
| 40 | +**Symptoms:** |
| 41 | +- `tar: Pattern matching characters used in file names` |
| 42 | +- `Download script failed with exit code: 2` |
| 43 | + |
| 44 | +**Solutions:** |
| 45 | +1. **Use PowerShell script** (Windows): |
| 46 | + ```powershell |
| 47 | + .\scripts\download-ffmpeg-windows.ps1 |
| 48 | + ``` |
| 49 | + |
| 50 | +2. **Fix tar patterns** (Unix): |
| 51 | + ```bash |
| 52 | + # Use updated script without wildcards |
| 53 | + chmod +x ./scripts/download-ffmpeg.sh |
| 54 | + ./scripts/download-ffmpeg.sh |
| 55 | + ``` |
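GNU tar prints the `Pattern matching characters used in file names` warning when a member name containing wildcards is passed without `--wildcards`. One way to avoid wildcards entirely is to list the archive first and extract the exact member path. The sketch below illustrates that idea; `extract_member` is a hypothetical helper and may differ from what the updated script actually does:

```bash
#!/usr/bin/env bash
# Extract a single file from an archive without passing wildcards to tar.
# Usage: extract_member <archive> <file-basename> <dest-dir>
# Prints the path of the extracted file on success.
# NOTE: hypothetical helper for illustration only.
extract_member() {
  em_archive="$1"
  em_name="$2"
  em_dest="$3"

  # List the archive and keep the first path whose last component matches.
  em_member="$(tar -tf "$em_archive" |
    awk -F/ -v n="$em_name" '$NF == n && !seen++ { print }')"

  if [ -z "$em_member" ]; then
    echo "extract_member: no '$em_name' in $em_archive" >&2
    return 1
  fi

  mkdir -p "$em_dest"
  # Pass the exact member path, so no pattern matching is involved.
  tar -xf "$em_archive" -C "$em_dest" "$em_member"
  printf '%s\n' "$em_dest/$em_member"
}
```

Because the member path is resolved before extraction, the same call works whether the archive's top-level directory name changes between FFmpeg releases or not.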
| 56 | + |
| 57 | +### Windows Build Slow |
| 58 | +**Symptoms:** |
| 59 | +- Windows builds taking 5+ minutes with a warm cache (2-3 minutes is typical) |
| 60 | +- Timeout errors on Windows |
| 61 | + |
| 62 | +**Solutions:** |
| 63 | +1. **Use optimized Windows workflow**: |
| 64 | + - Actions → "Windows Build (Optimized)" |
| 65 | + - Uses PowerShell and parallel compilation |
| 66 | + |
| 67 | +2. **Tune parallel builds**: |
| 68 | + ```bash |
| 69 | + zig build -Doptimize=ReleaseFast  # recent Zig builds in parallel by default; -j<N> caps the job count |
| 70 | + ``` |
| 71 | + |
| 72 | +### macOS Runner Issues |
| 73 | +**Symptoms:** |
| 74 | +- Jobs running on the wrong architecture because `macos-13` (Intel) and `macos-latest` (ARM64) were mixed up |
| 75 | +- ARM64 vs Intel build issues |
| 76 | + |
| 77 | +**Solutions:** |
| 78 | +1. **Use specific runners**: |
| 79 | + - Intel: `macos-13` |
| 80 | + - ARM64: `macos-latest` (a moving label; pin a versioned ARM64 label such as `macos-14` for reproducible builds) |
| 81 | + |
| 82 | +2. **Check target architecture**: |
| 83 | + ```bash |
| 84 | + zig build -Dtarget=x86_64-macos # Intel |
| 85 | + zig build -Dtarget=aarch64-macos # ARM64 |
| 86 | + ``` |
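When the same script runs on both Intel and ARM64 runners, the target can be derived from the machine architecture instead of being hard-coded. A small sketch, assuming `uname -m` reports `x86_64` on Intel Macs and `arm64` on Apple Silicon; `zig_macos_target` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Map a machine architecture to the matching Zig macOS target triple.
# Usage: zig_macos_target [arch]   (defaults to `uname -m`)
# NOTE: hypothetical helper for illustration only.
zig_macos_target() {
  zmt_arch="${1:-$(uname -m)}"
  case "$zmt_arch" in
    arm64|aarch64) echo "aarch64-macos" ;;
    x86_64)        echo "x86_64-macos" ;;
    *)
      echo "unsupported architecture: $zmt_arch" >&2
      return 1
      ;;
  esac
}
```

A build step could then run `zig build -Dtarget="$(zig_macos_target)"` and fail early on an unexpected runner type.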
| 87 | + |
| 88 | +## 🛠️ Workflow Selection Guide |
| 89 | + |
| 90 | +### When to use each workflow: |
| 91 | + |
| 92 | +| Situation | Recommended Workflow | Reason | |
| 93 | +|-----------|---------------------|---------| |
| 94 | +| **Normal development** | `quick-test.yml` | Fast feedback on PRs | |
| 95 | +| **GitHub cache is down** | `build-resilient.yml` | Does not depend on the cache service | |
| 96 | +| **Windows optimization needed** | `windows-build.yml` | PowerShell + parallel builds | |
| 97 | +| **Full testing** | `test.yml` | Comprehensive platform testing | |
| 98 | +| **Creating releases** | `manual-release.yml` | Version control + artifacts | |
| 99 | +| **Emergency builds** | `build-resilient.yml` | Still works when cache or artifact services are degraded | |
| 100 | + |
| 101 | +## 🔍 Debugging Steps |
| 102 | + |
| 103 | +### 1. Check GitHub Status |
| 104 | +- Visit [GitHub Status](https://www.githubstatus.com/) |
| 105 | +- Look for issues with Actions, Packages, or API |
| 106 | + |
| 107 | +### 2. Identify the Problem |
| 108 | +```bash |
| 109 | +# Check recent workflow runs |
| 110 | +gh run list --limit 5 |
| 111 | + |
| 112 | +# View specific run details |
| 113 | +gh run view <run-id> |
| 114 | + |
| 115 | +# Print logs for failed steps (`gh run download` fetches artifacts, not logs) |
| 116 | +gh run view <run-id> --log-failed |
| 117 | +``` |
| 118 | + |
| 119 | +### 3. Try Alternative Workflows |
| 120 | +```bash |
| 121 | +# If normal build fails, try resilient |
| 122 | +gh workflow run build-resilient.yml |
| 123 | + |
| 124 | +# For Windows-specific issues |
| 125 | +gh workflow run windows-build.yml |
| 126 | + |
| 127 | +# For quick validation |
| 128 | +gh workflow run quick-test.yml |
| 129 | +``` |
| 130 | + |
| 131 | +### 4. Local Reproduction |
| 132 | +```bash |
| 133 | +# Test locally first |
| 134 | +zig build test |
| 135 | +zig build -Dtarget=x86_64-linux -Doptimize=ReleaseSafe |
| 136 | + |
| 137 | +# Test FFmpeg download |
| 138 | +./scripts/download-ffmpeg.sh |
| 139 | +``` |
| 140 | + |
| 141 | +## 📊 Performance Expectations |
| 142 | + |
| 143 | +### Normal Conditions (with cache): |
| 144 | +- **Quick tests**: 1-2 minutes |
| 145 | +- **Full build**: 2-4 minutes |
| 146 | +- **Windows build**: 2-3 minutes |
| 147 | +- **Release**: 8-12 minutes |
| 148 | + |
| 149 | +### Degraded Conditions (no cache): |
| 150 | +- **Quick tests**: 3-4 minutes |
| 151 | +- **Full build**: 5-8 minutes |
| 152 | +- **Windows build**: 4-6 minutes |
| 153 | +- **Release**: 15-20 minutes |
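The ranges above can be turned into a quick sanity check on a measured run time. The sketch below encodes the upper bounds from both lists; `classify_duration` and its `quick`/`full`/`windows`/`release` labels are hypothetical names, and durations falling between the two ranges are treated as degraded:

```bash
#!/usr/bin/env bash
# Classify a run time (whole minutes) against the expectations in this guide.
# Usage: classify_duration <quick|full|windows|release> <minutes>
# Prints: normal | degraded | investigate
# NOTE: hypothetical helper for illustration only.
classify_duration() {
  cd_kind="$1"
  cd_minutes="$2"

  # Upper bounds taken from the "Normal" and "Degraded" lists.
  case "$cd_kind" in
    quick)   cd_normal_max=2;  cd_degraded_max=4 ;;
    full)    cd_normal_max=4;  cd_degraded_max=8 ;;
    windows) cd_normal_max=3;  cd_degraded_max=6 ;;
    release) cd_normal_max=12; cd_degraded_max=20 ;;
    *)
      echo "unknown workflow kind: $cd_kind" >&2
      return 2
      ;;
  esac

  if [ "$cd_minutes" -le "$cd_normal_max" ]; then
    echo normal
  elif [ "$cd_minutes" -le "$cd_degraded_max" ]; then
    echo degraded
  else
    echo investigate
  fi
}
```

For example, `classify_duration windows 5` reports `degraded`, which points at a missing cache rather than a broken build.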
| 154 | + |
| 155 | +## 🆘 Emergency Procedures |
| 156 | + |
| 157 | +### Complete GitHub Actions Outage |
| 158 | +1. **Build locally**: |
| 159 | + ```bash |
| 160 | + ./scripts/download-ffmpeg.sh |
| 161 | + zig build -Doptimize=ReleaseSafe |
| 162 | + ``` |
| 163 | + |
| 164 | +2. **Create manual release**: |
| 165 | + ```bash |
| 166 | + # Build all platforms locally |
| 167 | + zig build -Dtarget=x86_64-linux -Doptimize=ReleaseSafe |
| 168 | + zig build -Dtarget=x86_64-windows -Doptimize=ReleaseSafe |
| 169 | + zig build -Dtarget=x86_64-macos -Doptimize=ReleaseSafe |
| 170 | + zig build -Dtarget=aarch64-macos -Doptimize=ReleaseSafe |
| 171 | + |
| 172 | + # Upload manually via GitHub web interface |
| 173 | + ``` |
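The four build commands above can be wrapped in a loop that stops at the first failing target. This is a sketch only: `build_all` is a hypothetical helper, and its optional first argument (the program to invoke instead of `zig`) exists purely to allow a dry run:

```bash
#!/usr/bin/env bash
# Build every release target listed in this guide, stopping on first failure.
# Usage: build_all [build-program]   (defaults to zig; pass `echo` for a dry run)
# NOTE: hypothetical helper for illustration only.
build_all() {
  ba_runner="${1:-zig}"
  for ba_target in x86_64-linux x86_64-windows x86_64-macos aarch64-macos; do
    echo "==> $ba_target"
    if ! "$ba_runner" build "-Dtarget=$ba_target" -Doptimize=ReleaseSafe; then
      echo "build failed for $ba_target" >&2
      return 1
    fi
  done
}
```

Running `build_all echo` prints the commands without executing them, which is a cheap way to review the target list before a manual release.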
| 174 | + |
| 175 | +### Cache Corruption |
| 176 | +1. **Clear all caches**: |
| 177 | + - Go to the repository's Actions tab → Caches (under "Management"), or use `gh cache delete --all` with a recent `gh` |
| 178 | + - Delete all cache entries |
| 179 | + - Run `build-resilient.yml` to rebuild fresh |
| 180 | + |
| 181 | +2. **Update cache keys**: |
| 182 | + ```yaml |
| 183 | + # Increment version in cache keys |
| 184 | + key: ffmpeg-binaries-${{ matrix.os }}-v2 # was v1 |
| 185 | + ``` |
| 186 | + |
| 187 | +## 📞 Getting Help |
| 188 | + |
| 189 | +1. **Check this guide first** |
| 190 | +2. **Review workflow logs** in GitHub Actions |
| 191 | +3. **Try resilient workflows** before reporting issues |
| 192 | +4. **Check GitHub Status** for service outages |
| 193 | +5. **Open issue** with full error logs and context |
| 194 | + |
| 195 | +## 🔄 Recovery Checklist |
| 196 | + |
| 197 | +- [ ] Identified the failing component (cache/artifacts/build) |
| 198 | +- [ ] Checked GitHub Status for known issues |
| 199 | +- [ ] Tried appropriate alternative workflow |
| 200 | +- [ ] Verified local build works |
| 201 | +- [ ] Documented the issue for future reference |