| 1 | +# Runbook: Migrate to a New Server
| 2 | + |
| 3 | +## When to Use |
| 4 | + |
| 5 | +- Server hardware failure or end-of-life |
| 6 | +- Cloud provider migration |
| 7 | +- Upgrading to a larger instance |
| 8 | + |
| 9 | +## Prerequisites |
| 10 | + |
| 11 | +- SSH access to both old and new servers as the `deploy` user |
| 12 | +- DNS control for all app domains and the ops domain |
| 13 | +- Backup encryption key (if encrypted backups are enabled) |
| 14 | +- GitHub deploy key or SSH key for repo access on the new server |
| 15 | + |
| 16 | +## Steps |
| 17 | + |
| 18 | +### 1. Inventory the old server |
| 19 | + |
| 20 | +SSH into the old server and record what is running: |
| 21 | + |
| 22 | +```bash |
| 23 | +ssh deploy@<old-server-ip> |
| 24 | +``` |
| 25 | + |
| 26 | +List running apps and their deploy slots: |
| 27 | + |
| 28 | +```bash |
| 29 | +for dir in /opt/apps/*/; do |
| 30 | + app=$(basename "$dir") |
| 31 | + slot=$(cat "$dir/.deploy-slot" 2>/dev/null || echo "none") |
| 32 | + echo "$app slot=$slot" |
| 33 | +done |
| 34 | +``` |
| 35 | + |
| 36 | +List app domains from the Caddyfile: |
| 37 | + |
| 38 | +```bash |
| 39 | +cat /opt/platform/Caddyfile |
| 40 | +``` |
| 41 | + |
| 42 | +Record platform environment variables: |
| 43 | + |
| 44 | +```bash |
| 45 | +cat /opt/platform/.env |
| 46 | +``` |
| 47 | + |
| 48 | +List per-app credential files: |
| 49 | + |
| 50 | +```bash |
| 51 | +ls /opt/platform/credentials/ |
| 52 | +``` |
| 53 | + |
| 54 | +List cron jobs: |
| 55 | + |
| 56 | +```bash |
| 57 | +crontab -l |
| 58 | +``` |
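
The inventory is easier to compare against the new server in step 9 if it is saved to files rather than read off the terminal. A minimal sketch, assuming the directory layout shown above; the function names, the output directory, and the file names are illustrative and not part of the platform:

```bash
#!/usr/bin/env bash
# Sketch: save the step 1 inventory to files so it can be compared later.
# Function names and output file names are hypothetical.

# Print "<app> slot=<slot>" for every app directory under the given root.
list_app_slots() {
  local root="${1:-/opt/apps}" dir app slot
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # the glob matched nothing
    app=$(basename "$dir")
    slot=$(cat "${dir}.deploy-slot" 2>/dev/null || echo "none")
    echo "$app slot=$slot"
  done
}

# Write app slots, credential file names, and the crontab into OUT_DIR.
snapshot_inventory() {
  local out="$1" apps_root="${2:-/opt/apps}" cred_dir="${3:-/opt/platform/credentials}"
  mkdir -p "$out"
  list_app_slots "$apps_root" > "$out/apps.txt"
  ls "$cred_dir" > "$out/credentials.txt" 2>/dev/null || true
  crontab -l > "$out/crontab.txt" 2>/dev/null || true
}
```

For example, `snapshot_inventory ~/migration-inventory` on the old server writes three text files that can be copied off with the backups. The platform `.env` and Caddyfile are left out on purpose, since they may contain secrets.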
| 59 | + |
| 60 | +### 2. Create fresh backups |
| 61 | + |
| 62 | +Run the backup script for every app database: |
| 63 | + |
| 64 | +```bash |
| 65 | +bash /opt/platform/infrastructure/backup-postgres.sh |
| 66 | +``` |
| 67 | + |
| 68 | +Verify backups were created: |
| 69 | + |
| 70 | +```bash |
| 71 | +ls -lh /data/backups/postgres/ |
| 72 | +``` |
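
Reading the `ls -lh` output by eye works, but a scripted check is harder to misread. A minimal sketch, assuming backups are regular files somewhere under the backup directory; the function name and the 60-minute default are illustrative:

```bash
#!/usr/bin/env bash
# Sketch: fail if the backup directory has no fresh, non-empty files.
# check_fresh_backups [DIR] [MAX_AGE_MINUTES]  (hypothetical helper)
check_fresh_backups() {
  local dir="${1:-/data/backups/postgres}" max_age_min="${2:-60}" count
  # -size +0 skips empty files; -mmin -N keeps files modified in the last N minutes
  count=$(find "$dir" -type f -size +0 -mmin "-$max_age_min" 2>/dev/null | wc -l | tr -d ' ')
  if [ "$count" -eq 0 ]; then
    echo "no non-empty backups newer than ${max_age_min} min in $dir" >&2
    return 1
  fi
  echo "$count fresh backup file(s) in $dir"
}
```

Compare the reported count with the number of app databases recorded in step 1; the check only confirms that fresh files exist, not that every database was dumped.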
| 73 | + |
| 74 | +### 3. Transfer backups and credentials to your local machine |
| 75 | + |
| 76 | +```bash |
| 77 | +# From your local machine: |
| 78 | +scp -r deploy@<old-server-ip>:/data/backups/postgres/ ./migration-backups/ |
| 79 | +scp deploy@<old-server-ip>:/opt/platform/.env ./migration-platform.env
| 80 | +scp -r deploy@<old-server-ip>:/opt/platform/credentials/ ./migration-credentials/ |
| 81 | +``` |
| 82 | + |
| 83 | +If encrypted backups are enabled, also copy the encryption key: |
| 84 | + |
| 85 | +```bash |
| 86 | +scp deploy@<old-server-ip>:<path-to-encryption-key> ./migration-backup-key |
| 87 | +``` |
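
Because the backups pass through two copies (old server to local machine, then local machine to new server), it may be worth checking that nothing was truncated along the way. A minimal sketch using a checksum manifest; the function names and the `SHA256SUMS` file name are illustrative, and `sha256sum` is assumed to be the GNU coreutils tool (on macOS, `shasum -a 256` is the usual substitute):

```bash
#!/usr/bin/env bash
# Sketch: detect corrupted or missing files after a copy.
# write_manifest and verify_manifest are hypothetical helpers.

# Record a checksum for every file under DIR in DIR/SHA256SUMS.
write_manifest() {
  ( cd "$1" && find . -type f ! -name SHA256SUMS -exec sha256sum {} + > SHA256SUMS )
}

# Recompute checksums under DIR and compare them with the manifest.
# Returns non-zero if any file is missing or differs.
verify_manifest() {
  ( cd "$1" && sha256sum -c SHA256SUMS > /dev/null )
}
```

Run `write_manifest /data/backups/postgres` on the old server before copying, then `verify_manifest` against the local copy, and again against the directory on the new server once the files are uploaded.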
| 88 | + |
| 89 | +### 4. Bootstrap the new server |
| 90 | + |
| 91 | +On the new server, run the bootstrap script with the same env vars used for the old server: |
| 92 | + |
| 93 | +```bash |
| 94 | +sudo ACME_EMAIL=<your-email> OPS_DOMAIN=<ops.example.com> ALERT_REPO=<org/repo> \ |
| 95 | + bash infrastructure/bootstrap-server.sh |
| 96 | +``` |
| 97 | + |
| 98 | +Wait for all platform containers to become healthy: |
| 99 | + |
| 100 | +```bash |
| 101 | +docker ps --format "table {{.Names}}\t{{.Status}}" |
| 102 | +``` |
| 103 | + |
| 104 | +### 5. Copy credentials to the new server |
| 105 | + |
| 106 | +```bash |
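| 107 | +# From your local machine. If the target directory already exists, scp -r nests the source directory inside it, so check the resulting layout: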
| 108 | +scp ./migration-platform.env deploy@<new-server-ip>:/opt/platform/.env |
| 109 | +scp -r ./migration-credentials/ deploy@<new-server-ip>:/opt/platform/credentials/ |
| 110 | +``` |
| 111 | + |
| 112 | +If using backup encryption, copy the key: |
| 113 | + |
| 114 | +```bash |
| 115 | +scp ./migration-backup-key deploy@<new-server-ip>:<path-to-encryption-key> |
| 116 | +``` |
| 117 | + |
| 118 | +Restart platform containers so they pick up the restored credentials: |
| 119 | + |
| 120 | +```bash |
| 121 | +ssh deploy@<new-server-ip> |
| 122 | +cd /opt/platform |
| 123 | +docker compose down && docker compose up -d |
| 124 | +``` |
| 125 | + |
| 126 | +### 6. Restore databases |
| 127 | + |
| 128 | +Copy backup files to the new server: |
| 129 | + |
| 130 | +```bash |
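| 131 | +# From your local machine. If the target directory already exists, scp -r nests the source directory inside it, so check the resulting layout: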
| 132 | +scp -r ./migration-backups/ deploy@<new-server-ip>:/data/backups/postgres/ |
| 133 | +``` |
| 134 | + |
| 135 | +On the new server, restore each app database: |
| 136 | + |
| 137 | +```bash |
| 138 | +ssh deploy@<new-server-ip> |
| 139 | +bash /opt/platform/infrastructure/restore-postgres.sh --yes <backup-file> |
| 140 | +``` |
| 141 | + |
| 142 | +Verify each restored database: |
| 143 | + |
| 144 | +```bash |
| 145 | +bash /opt/platform/infrastructure/verify-backup.sh <database-name> |
| 146 | +``` |
| 147 | + |
| 148 | +### 7. Clone and configure apps |
| 149 | + |
| 150 | +For each app, clone the repo and set up the deploy directory: |
| 151 | + |
| 152 | +```bash |
| 153 | +cd /opt/apps |
| 154 | +git clone git@github.com:towlion/<app-name>.git <app-name> |
| 155 | +cd <app-name> |
| 156 | +``` |
| 157 | + |
| 158 | +Write the app's `deploy/.env` using credentials from `/opt/platform/credentials/<app-name>`: |
| 159 | + |
| 160 | +```bash |
| 161 | +cp deploy/env.template deploy/.env |
| 162 | +# Edit deploy/.env with the correct DATABASE_URL, S3 credentials, JWT_SECRET, etc. |
| 163 | +``` |
| 164 | + |
| 165 | +Set the initial deploy slot: |
| 166 | + |
| 167 | +```bash |
| 168 | +echo "blue" > .deploy-slot |
| 169 | +``` |
| 170 | + |
| 171 | +### 8. Deploy apps |
| 172 | + |
| 173 | +Run the blue-green deploy script for each app: |
| 174 | + |
| 175 | +```bash |
| 176 | +bash /opt/platform/infrastructure/deploy-blue-green.sh \ |
| 177 | + <app-name> /opt/apps/<app-name> <app-domain> "<caddyfile-content>" |
| 178 | +``` |
| 179 | + |
| 180 | +Alternatively, trigger deploys via GitHub Actions once GitHub secrets are updated (step 12). |
| 181 | + |
| 182 | +### 9. Verify on the new server |
| 183 | + |
| 184 | +Check that all platform containers are healthy: |
| 185 | + |
| 186 | +```bash |
| 187 | +docker ps --format "table {{.Names}}\t{{.Status}}" |
| 188 | +``` |
| 189 | + |
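| 190 | +Check health endpoints for each app. DNS still points to the old server, so `--resolve` pins each domain to the new server's IP, and `-k` skips certificate verification because certificates are not provisioned until after the DNS switch (step 11):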
| 191 | + |
| 192 | +```bash |
| 193 | +curl -sk --resolve <app-domain>:443:<new-server-ip> https://<app-domain>/health |
| 194 | +``` |
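
With more than a couple of apps, the same check can be looped over every domain. A minimal sketch, assuming each app exposes `/health` as in the command above and that a healthy app answers with HTTP 200; the function name is illustrative:

```bash
#!/usr/bin/env bash
# Sketch: check every app's health endpoint on the new server before DNS moves.
# Usage: check_health <new-server-ip> <app-domain>...   (hypothetical helper)
check_health() {
  local ip="$1" domain code failed=0
  shift
  for domain in "$@"; do
    # --resolve pins the domain to the new IP; -k skips certificate verification
    code=$(curl -sk -o /dev/null -w '%{http_code}' --max-time 10 \
      --resolve "$domain:443:$ip" "https://$domain/health") || code="000"
    echo "$domain -> $code"
    [ "$code" = "200" ] || failed=1
  done
  return "$failed"
}
```

For example, `check_health <new-server-ip> app.example.com app2.example.com` prints one line per domain and returns non-zero if any of them did not answer 200.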
| 195 | + |
| 196 | +Verify Grafana is accessible: |
| 197 | + |
| 198 | +```bash |
| 199 | +curl -sk --resolve <ops-domain>:443:<new-server-ip> https://<ops-domain>/ |
| 200 | +``` |
| 201 | + |
| 202 | +Verify cron jobs are in place: |
| 203 | + |
| 204 | +```bash |
| 205 | +crontab -l |
| 206 | +``` |
| 207 | + |
| 208 | +### 10. Switch DNS |
| 209 | + |
| 210 | +Update A records for all domains to point to the new server IP: |
| 211 | + |
| 212 | +- Each app domain (e.g., `app.example.com`, `app2.example.com`) |
| 213 | +- The ops domain (e.g., `ops.example.com`) |
| 214 | +- Preview wildcard record (e.g., `*.preview.example.com`) |
| 215 | + |
| 216 | +DNS propagation typically takes minutes but can take up to 48 hours depending on TTL. Consider lowering TTL values a day before the migration. |
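
To see which records have already moved, each domain can be resolved and compared with the new IP. A minimal sketch, assuming `dig` is installed on your local machine; the function name is illustrative:

```bash
#!/usr/bin/env bash
# Sketch: report which domains already resolve to the new server.
# Usage: check_dns <new-server-ip> <domain>...   (hypothetical helper)
check_dns() {
  local ip="$1" domain pending=0
  shift
  for domain in "$@"; do
    # dig +short prints one answer per line; match the IP as a whole line
    if dig +short A "$domain" | grep -Fxq "$ip"; then
      echo "$domain ok"
    else
      echo "$domain pending"
      pending=1
    fi
  done
  return "$pending"
}
```

The result reflects whatever resolver your machine uses, which may still be serving a cached answer. To check the wildcard record, query any name beneath it, for example `anything.preview.example.com`.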
| 217 | + |
| 218 | +### 11. Verify TLS |
| 219 | + |
| 220 | +After DNS propagates, Caddy will automatically provision TLS certificates. Monitor the Caddy logs: |
| 221 | + |
| 222 | +```bash |
| 223 | +docker logs -f platform-caddy-1 |
| 224 | +``` |
| 225 | + |
| 226 | +Test HTTPS on all domains: |
| 227 | + |
| 228 | +```bash |
| 229 | +curl -s https://<app-domain>/health |
| 230 | +curl -s https://<ops-domain>/ |
| 231 | +``` |
| 232 | + |
| 233 | +Verify certificates are valid: |
| 234 | + |
| 235 | +```bash |
| 236 | +echo | openssl s_client -connect <app-domain>:443 -servername <app-domain> 2>/dev/null | openssl x509 -noout -dates |
| 237 | +``` |
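
Instead of reading the dates by eye, `openssl x509 -checkend` can turn the check into an exit status. A minimal sketch building on the command above; the function names and the seven-day window in the usage example are illustrative:

```bash
#!/usr/bin/env bash
# Sketch: scripted certificate validity check (hypothetical helpers).

# Print the leaf certificate served for DOMAIN in PEM form.
fetch_cert() {
  echo | openssl s_client -connect "$1:443" -servername "$1" 2>/dev/null | openssl x509
}

# Read a PEM certificate on stdin; succeed if it is still valid SECONDS from now.
cert_valid_for() {
  openssl x509 -noout -checkend "$1" > /dev/null
}
```

For example, `fetch_cert <app-domain> | cert_valid_for 604800` succeeds only if a certificate was served and it does not expire within the next seven days.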
| 238 | + |
| 239 | +### 12. Update GitHub secrets |
| 240 | + |
| 241 | +In each app repository, update the following secrets to point to the new server: |
| 242 | + |
| 243 | +- `SERVER_HOST` — new server IP |
| 244 | +- `SERVER_SSH_KEY` — SSH private key for the new server's `deploy` user |
| 245 | + |
| 246 | +```bash |
| 247 | +# Using the GitHub CLI: |
| 248 | +gh secret set SERVER_HOST --repo towlion/<app-name> --body "<new-server-ip>" |
| 249 | +gh secret set SERVER_SSH_KEY --repo towlion/<app-name> < ~/.ssh/<new-server-key> |
| 250 | +``` |
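
With several app repositories, the two commands above can be wrapped in a loop. A minimal sketch, assuming the GitHub CLI is installed and authenticated; the function name is illustrative:

```bash
#!/usr/bin/env bash
# Sketch: set both deploy secrets on each listed repository.
# Usage: update_server_secrets <new-server-ip> <private-key-file> <org/repo>...
# (hypothetical helper)
update_server_secrets() {
  local host="$1" key_file="$2" repo
  shift 2
  [ -r "$key_file" ] || { echo "cannot read key file: $key_file" >&2; return 1; }
  for repo in "$@"; do
    gh secret set SERVER_HOST --repo "$repo" --body "$host" || return 1
    # Without --body, gh reads the secret value from standard input
    gh secret set SERVER_SSH_KEY --repo "$repo" < "$key_file" || return 1
    echo "updated $repo"
  done
}
```

For example: `update_server_secrets <new-server-ip> ~/.ssh/<new-server-key> towlion/<app-name> towlion/<other-app-name>`.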
| 251 | + |
| 252 | +Trigger a test deploy on one app to confirm the pipeline works end-to-end. |
| 253 | + |
| 254 | +### 13. Decommission the old server |
| 255 | + |
| 256 | +Keep the old server running for 48-72 hours as a safety net. During this period: |
| 257 | + |
| 258 | +- Monitor the new server for errors |
| 259 | +- Confirm all deploys go to the new server |
| 260 | +- Verify backups run successfully on the new server |
| 261 | + |
| 262 | +Once satisfied, tear down the old server: |
| 263 | + |
| 264 | +```bash |
| 265 | +ssh deploy@<old-server-ip> |
| 266 | +# Stop all containers |
| 267 | +cd /opt/platform && docker compose down |
| 268 | +for dir in /opt/apps/*/; do |
| 269 | + app=$(basename "$dir") |
| 270 | + docker compose -p "$app" -f "$dir/deploy/docker-compose.yml" down |
| 271 | +done |
| 272 | +``` |
| 273 | + |
| 274 | +Then delete or destroy the old server instance through your cloud provider. |
| 275 | + |
| 276 | +## Rollback |
| 277 | + |
| 278 | +If issues arise after the DNS switch: |
| 279 | + |
| 280 | +- **Revert DNS** — Point A records back to the old server IP. The old server remains fully functional until explicitly decommissioned, but data written to the new server after the switch will not be present on it.
| 281 | +- **Investigate** — SSH into the new server and check logs, health endpoints, and container status. |
| 282 | + |
| 283 | +## Verification Checklist |
| 284 | + |
| 285 | +- [ ] All platform containers healthy (`docker ps`) |
| 286 | +- [ ] All app health endpoints return 200 |
| 287 | +- [ ] Grafana accessible at ops domain |
| 288 | +- [ ] Backup cron running (`crontab -l`) |
| 289 | +- [ ] GitHub Actions deploys targeting new server |
| 290 | +- [ ] TLS certificates provisioned for all domains |
| 291 | +- [ ] Preview environment DNS (wildcard record) updated |
| 292 | + |
| 293 | +## Notes |
| 294 | + |
| 295 | +- Plan the migration during a low-traffic window to minimize impact. |
| 296 | +- If you lower DNS TTL before migration, remember to restore it afterward. |
| 297 | +- The old server's backups remain available as an additional safety net during the transition period. |