Commit 4f6a7b7

baijum and claude committed
feat: full platform hardening pass — backup encryption, log rotation, Trivy alerts, platform credential rotation
- Backup encryption: optional AES-256-CBC via BACKUP_ENCRYPTION_KEY env var
- Restore/verify: auto-detect and decrypt .dump.enc files
- Trivy scan-images.sh: create GitHub Issues when vulnerabilities found
- rotate-credentials.sh: add --platform flag for master credential rotation
- bootstrap-server.sh: Loki retention 14d→90d, add logrotate config
- server-contract.md: document all new capabilities
- New runbook: rotate-ssh-keys.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 25f555d commit 4f6a7b7

8 files changed

Lines changed: 415 additions & 40 deletions

docs/runbooks/rotate-ssh-keys.md

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
# Runbook: Rotate SSH Keys

Rotate the SSH key pair used for deployments. Recommended schedule: every 90 days.

## Prerequisites

- Local machine with `ssh-keygen` and `ssh` installed
- Current SSH access to the server as the `deploy` user
- `gh` CLI authenticated with access to all towlion app repos

## Steps

### 1. Generate a new key pair locally

```bash
ssh-keygen -t ed25519 -C "deploy@towlion-$(date +%Y%m%d)" -f ~/.ssh/towlion-deploy-new
```

### 2. Add the new public key to the server

```bash
ssh deploy@<SERVER_HOST> "cat >> ~/.ssh/authorized_keys" < ~/.ssh/towlion-deploy-new.pub
```

### 3. Test SSH access with the new key

```bash
# IdentitiesOnly=yes stops ssh from falling back to other keys (e.g. the old one held by ssh-agent)
ssh -i ~/.ssh/towlion-deploy-new -o IdentitiesOnly=yes deploy@<SERVER_HOST> "echo 'New key works'"
```

### 4. Update GitHub Actions secrets on all app repos

```bash
# Read the key from the file via stdin so its trailing newline is preserved
# ($(cat ...) would strip it) and the private key never sits in a shell variable.
for repo in towlion/todo-app towlion/hello-world towlion/starter-app towlion/wit; do
    gh secret set SERVER_SSH_KEY --repo "$repo" < ~/.ssh/towlion-deploy-new
    echo "Updated $repo"
done
```

### 5. Verify a deployment works

Trigger a deploy on one app (e.g., push a no-op commit) and confirm it succeeds.

### 6. Remove the old public key from the server

```bash
ssh -i ~/.ssh/towlion-deploy-new deploy@<SERVER_HOST>
# On the server:
# Edit ~/.ssh/authorized_keys and remove the old key line
# The old key has a different comment/date than the new one
```
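
Step 6 leaves the edit to be done by hand. A minimal sketch of a non-interactive version, assuming the old key can be identified by its comment string. The `remove_key_by_comment` helper and the sample key lines are hypothetical and not part of the runbook:

```bash
#!/usr/bin/env bash
# Hypothetical helper: drop every authorized_keys line containing the given
# comment string, keeping a .bak copy of the original file.
remove_key_by_comment() {
    local file="$1" comment="$2" tmp
    tmp=$(mktemp) || return 1
    cp -p "$file" "$file.bak"
    # grep -v exits 1 when no lines remain; treat only other failures as errors.
    grep -vF -- "$comment" "$file" > "$tmp" || [ $? -eq 1 ] || { rm -f "$tmp"; return 1; }
    cat "$tmp" > "$file"   # overwrite in place so the file keeps its permissions
    rm -f "$tmp"
}

# Demo on a throwaway file rather than the real ~/.ssh/authorized_keys.
demo=$(mktemp)
printf '%s\n' \
    'ssh-ed25519 AAAAexampleOLD deploy@towlion-20250101' \
    'ssh-ed25519 AAAAexampleNEW deploy@towlion-20250401' > "$demo"
remove_key_by_comment "$demo" 'deploy@towlion-20250101'
remaining=$(cat "$demo")
echo "$remaining"
```

If something like this is run on the server, it seems prudent to keep the current SSH session open until a second login with the new key has succeeded.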

### 7. Replace the local key file

```bash
mv ~/.ssh/towlion-deploy-new ~/.ssh/towlion-deploy
mv ~/.ssh/towlion-deploy-new.pub ~/.ssh/towlion-deploy.pub
```

## Rollback

If the new key doesn't work:
- The old key is still in `authorized_keys` until step 6
- SSH in with the old key and remove the new public key
- Re-set `SERVER_SSH_KEY` secrets to the old private key
69+
## Verification
70+
71+
```bash
72+
# Confirm only one key in authorized_keys
73+
ssh deploy@<SERVER_HOST> "wc -l ~/.ssh/authorized_keys"
74+
# Should output: 1
75+
76+
# Confirm deploy works
77+
gh workflow run deploy.yml --repo towlion/hello-world
78+
```
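
A line count is only a proxy for the number of keys: a blank line or a comment in `authorized_keys` inflates it. A small sketch that counts key entries only, assuming comments start with `#`. The `count_keys` helper and the sample file are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: count key entries, ignoring blank lines and comments.
count_keys() {
    grep -cvE '^[[:space:]]*(#|$)' "$1"
}

# Demo file with one comment, one blank line, and one key.
demo=$(mktemp)
printf '%s\n' \
    '# deploy key, rotated every 90 days' \
    '' \
    'ssh-ed25519 AAAAexample deploy@towlion-20250401' > "$demo"
key_count=$(count_keys "$demo")
echo "$key_count"
rm -f "$demo"
```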

docs/server-contract.md

Lines changed: 12 additions & 4 deletions
@@ -52,7 +52,7 @@ This document defines the contract between the platform infrastructure (bootstra
 loki/ # Loki log storage
 grafana/ # Grafana state
 prometheus/ # Prometheus data (created always, used when metrics enabled)
-backups/postgres/ # pg_dump backup files (7-day retention)
+backups/postgres/ # pg_dump backup files (7-day retention, optional .dump.enc encryption)
 ```

 ## Bootstrap to Deploy Lifecycle
@@ -104,15 +104,15 @@ All scripts live in the platform repo under `infrastructure/` and are copied to
 | `bootstrap-server.sh` | Transform fresh Debian into running platform | Manual (`sudo bash`) |
 | `verify-server.sh` | Read-only health check of server state | Manual (`bash`) |
 | `create-app-credentials.sh` | Provision per-app PostgreSQL user + MinIO bucket | Manual (`bash <script> <app-name>`) |
-| `backup-postgres.sh` | Per-database `pg_dump` with 7-day retention | Cron: daily at 02:00 |
+| `backup-postgres.sh` | Per-database `pg_dump` with 7-day retention, optional AES-256 encryption | Cron: daily at 02:00 |
 | `restore-postgres.sh` | Restore a database from backup | Manual (`bash <script>`) |
 | `check-alerts.sh` | Check container health, disk, memory; create GitHub Issues | Cron: every 5 minutes |
 | `update-images.sh` | Pull latest Docker images and recreate containers | Cron: weekly Sunday at 03:00 |
 | `usage-report.sh` | Generate 6-section resource usage report | Manual (`bash`) |
-| `scan-images.sh` | Scan running container images for vulnerabilities (Trivy) | Cron: weekly Sunday at 04:00 |
+| `scan-images.sh` | Scan running container images for vulnerabilities (Trivy), create GitHub Issues | Cron: weekly Sunday at 04:00 |
 | `deploy-blue-green.sh` | Zero-downtime blue-green deploy with automatic rollback | Called by `deploy.yml` workflow |
 | `verify-backup.sh` | Restore backups to temp DB and verify integrity | Cron: weekly Sunday at 05:00 |
-| `rotate-credentials.sh` | Rotate PostgreSQL/MinIO credentials without downtime | Manual (`bash <script> <app-name>`) |
+| `rotate-credentials.sh` | Rotate per-app or platform master credentials without downtime | Manual (`bash <script> <app-name>` or `--platform`) |

 ## Server Hardening

@@ -140,6 +140,14 @@ The bootstrap script applies several security measures automatically. Self-hoste

 **Docker event audit logging** — A systemd service (`docker-audit.service`) runs `docker events` with JSON output to `/var/log/docker-audit.log`. Promtail scrapes this file and forwards events to Loki (label: `job=docker-audit`). All container start, stop, die, and health_status events are captured.

+**Backup encryption** — Backups can be encrypted at rest using AES-256-CBC. Set the `BACKUP_ENCRYPTION_KEY` environment variable to the path of a key file. When set, `backup-postgres.sh` pipes `pg_dump` output through `openssl enc` and produces `.dump.enc` files. `restore-postgres.sh` and `verify-backup.sh` automatically detect encrypted backups and decrypt them before restoring. If `BACKUP_ENCRYPTION_KEY` is unset, backups are stored unencrypted; if it is set but the key file does not exist, the script logs a warning and stores them unencrypted.
+
+**Log rotation** — A logrotate config at `/etc/logrotate.d/towlion` rotates `/var/log/towlion-*.log` and `/var/log/docker-audit.log` daily, retaining 90 compressed copies. The `docker-audit.service` is restarted after rotation since it holds the log file open.
+
+**Log retention** — Loki retains logs for 90 days (`retention_period: 2160h`). The compactor runs retention enforcement with a 2-hour delete delay.
+
+**Platform credential rotation** — `rotate-credentials.sh --platform` rotates the PostgreSQL superuser password and/or MinIO root password. After rotation, all app health checks are verified. Use `--yes` to skip the confirmation prompt.
+
 **Image vulnerability scanning** — Trivy is installed via the Aqua Security apt repository. Every deploy runs a non-blocking `trivy image` scan of the newly built app image (HIGH/CRITICAL severity). A weekly cron job (`scan-images.sh`, Sunday 04:00) scans all running container images.

 **Mandatory Access Control (AppArmor)** — Debian 12 ships with AppArmor enabled by default. Docker automatically applies the `docker-default` AppArmor profile to all containers, which restricts capabilities like writing to `/proc` and `/sys`, mounting filesystems, and accessing raw sockets. No configuration is needed — this works out of the box.
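
The backup encryption paragraph in this hunk says `BACKUP_ENCRYPTION_KEY` points to a key file but not how such a file might be created. A minimal sketch, assuming a random single-line passphrase is acceptable (`openssl` reads only the first line of a `file:` source) and OpenSSL 1.1.1 or later for `-pbkdf2`. The paths below are temporary stand-ins, not the platform's real locations:

```bash
#!/usr/bin/env bash
# Sketch only: everything lives in a scratch directory.
workdir=$(mktemp -d)
key_file="$workdir/backup.key"

# Create a random passphrase file that only its owner can read.
( umask 077 && openssl rand -base64 32 > "$key_file" )

# Encrypt and decrypt with the same flags the backup and restore scripts use.
printf 'sample dump contents\n' > "$workdir/sample.dump"
openssl enc -aes-256-cbc -pbkdf2 -pass "file:${key_file}" \
    -in "$workdir/sample.dump" -out "$workdir/sample.dump.enc"
openssl enc -d -aes-256-cbc -pbkdf2 -pass "file:${key_file}" \
    -in "$workdir/sample.dump.enc" -out "$workdir/roundtrip.dump"

# Compare the decrypted output with the original input.
if cmp -s "$workdir/sample.dump" "$workdir/roundtrip.dump"; then
    roundtrip=ok
else
    roundtrip=mismatch
fi
echo "round trip: $roundtrip"
```

A round trip like this is a cheap check to run once after creating the key file, since a backup encrypted with a lost or altered key cannot be restored.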

infrastructure/backup-postgres.sh

Lines changed: 34 additions & 3 deletions
@@ -23,13 +23,26 @@ error() {
 # Configuration
 BACKUP_DIR="/data/backups/postgres"
 COMPOSE_FILE="/opt/platform/docker-compose.yml"
+# BACKUP_ENCRYPTION_KEY: path to a key file for AES-256-CBC encryption (optional)
+ENCRYPTION_KEY="${BACKUP_ENCRYPTION_KEY:-}"

 # Create backup directory if it doesn't exist
 mkdir -p "$BACKUP_DIR"

 info "Starting PostgreSQL backup..."
 info "Backup directory: $BACKUP_DIR"

+if [ -n "$ENCRYPTION_KEY" ] && [ -f "$ENCRYPTION_KEY" ]; then
+    info "Encryption enabled (key file: $ENCRYPTION_KEY)"
+    ENCRYPT=true
+elif [ -n "$ENCRYPTION_KEY" ]; then
+    warn "BACKUP_ENCRYPTION_KEY is set but file not found: $ENCRYPTION_KEY"
+    warn "Backups will NOT be encrypted"
+    ENCRYPT=false
+else
+    ENCRYPT=false
+fi
+
 # List all databases except templates and postgres system database
 info "Fetching database list..."
 databases=$(docker compose -f "$COMPOSE_FILE" exec -T postgres psql -U postgres -tc \
@@ -48,12 +61,30 @@ backup_count=0

 # Backup each database
 for db in $databases; do
-    filename="${db}_$(date +%Y%m%d_%H%M%S).dump"
+    timestamp=$(date +%Y%m%d_%H%M%S)
+    if [ "$ENCRYPT" = true ]; then
+        filename="${db}_${timestamp}.dump.enc"
+    else
+        filename="${db}_${timestamp}.dump"
+    fi
     filepath="$BACKUP_DIR/$filename"

     info "Backing up database: $db"

-    if docker compose -f "$COMPOSE_FILE" exec -T postgres pg_dump -U postgres -Fc "$db" > "$filepath"; then
+    if [ "$ENCRYPT" = true ]; then
+        dump_ok=false
+        if docker compose -f "$COMPOSE_FILE" exec -T postgres pg_dump -U postgres -Fc "$db" \
+            | openssl enc -aes-256-cbc -pbkdf2 -pass "file:${ENCRYPTION_KEY}" -out "$filepath"; then
+            dump_ok=true
+        fi
+    else
+        dump_ok=false
+        if docker compose -f "$COMPOSE_FILE" exec -T postgres pg_dump -U postgres -Fc "$db" > "$filepath"; then
+            dump_ok=true
+        fi
+    fi
+
+    if [ "$dump_ok" = true ]; then
         file_size=$(stat -f%z "$filepath" 2>/dev/null || stat -c%s "$filepath" 2>/dev/null || echo "0")
         human_size=$(numfmt --to=iec-i --suffix=B "$file_size" 2>/dev/null || echo "${file_size}B")
         info " ✓ Backed up $db to $filename ($human_size)"
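
In the encrypted branch the `if` tests a pipeline. Unless the script enables `pipefail` elsewhere (this hunk does not show whether it does), bash reports only the exit status of the last command, so a failing `pg_dump` followed by a succeeding `openssl enc` would still set `dump_ok=true`. A self-contained sketch of the difference, with `false` standing in for `pg_dump` and `cat` for `openssl enc`:

```bash
#!/usr/bin/env bash
set +o pipefail   # start from the shell default

# Without pipefail the pipeline status is that of the last command (cat).
if false | cat > /dev/null; then
    without_pipefail=success
else
    without_pipefail=failure
fi

# With pipefail any failing stage makes the whole pipeline fail.
set -o pipefail
if false | cat > /dev/null; then
    with_pipefail=success
else
    with_pipefail=failure
fi
set +o pipefail

echo "without pipefail: $without_pipefail"
echo "with pipefail:    $with_pipefail"
```

If the script does not already set it, adding `set -o pipefail` near the top would be one way to make the encrypted branch as strict as the unencrypted one.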
@@ -68,7 +99,7 @@ done

 # Retention: delete backups older than 7 days
 info "Cleaning up old backups (older than 7 days)..."
-deleted_count=$(find "$BACKUP_DIR" -name "*.dump" -mtime +7 -delete -print | wc -l | tr -d ' ')
+deleted_count=$(find "$BACKUP_DIR" \( -name "*.dump" -o -name "*.dump.enc" \) -mtime +7 -delete -print | wc -l | tr -d ' ')
 if [ "$deleted_count" -gt 0 ]; then
     info " Removed $deleted_count old backup(s)"
 else
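
The parentheses in the new `find` expression matter: `-o` binds more loosely than the implicit "and", so without the grouping `-mtime +7 -delete` would apply only to `*.dump.enc` files. Note also that `-mtime +7` matches files whose age in whole days exceeds 7, that is, at least 8 days old. A sketch that exercises the same expression in a scratch directory, assuming GNU `touch` for the relative date. The file names are made up:

```bash
#!/usr/bin/env bash
# Demo in a scratch directory; 'touch -d' with a relative date is GNU-specific.
backup_dir=$(mktemp -d)
touch -d '9 days ago' "$backup_dir/app_old.dump" "$backup_dir/app_old.dump.enc"
touch "$backup_dir/app_new.dump" "$backup_dir/app_new.dump.enc"
touch -d '9 days ago' "$backup_dir/notes.txt"   # old, but not a backup: must survive

# Same expression as the retention step in the hunk above.
deleted_count=$(find "$backup_dir" \( -name "*.dump" -o -name "*.dump.enc" \) \
    -mtime +7 -delete -print | wc -l | tr -d ' ')
echo "deleted: $deleted_count"
ls "$backup_dir"
```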

infrastructure/bootstrap-server.sh

Lines changed: 24 additions & 1 deletion
@@ -314,7 +314,7 @@ common:
       store: inmemory

 limits_config:
-  retention_period: 336h
+  retention_period: 2160h

 schema_config:
   configs:
@@ -329,6 +329,8 @@ schema_config:
 compactor:
   working_directory: /loki/compactor
   retention_enabled: true
+  retention_delete_delay: 2h
+  retention_delete_worker_count: 150
   delete_request_cancel_period: 10m
   delete_request_store: filesystem
 EOF
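
The retention values in the Loki hunks are expressed in hours, which makes the 14d→90d change from the commit message easy to misread. A tiny sketch of the conversion, using a hypothetical `days_to_hours` helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a retention window in days to an hour value.
days_to_hours() {
    echo "$(( $1 * 24 ))h"
}

old_retention=$(days_to_hours 14)
new_retention=$(days_to_hours 90)
echo "14d -> $old_retention, 90d -> $new_retention"
```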
@@ -1329,6 +1331,27 @@ EOF
     info "Promtail config updated with docker-audit scrape target"
 fi

+# --- Log Rotation ---
+
+LOGROTATE_CONF="/etc/logrotate.d/towlion"
+LOGROTATE_CONTENT='/var/log/towlion-*.log /var/log/docker-audit.log {
+    daily
+    rotate 90
+    compress
+    missingok
+    notifempty
+    postrotate
+        systemctl restart docker-audit.service 2>/dev/null || true
+    endscript
+}'
+
+if [[ -f "$LOGROTATE_CONF" ]] && echo "$LOGROTATE_CONTENT" | diff -q - "$LOGROTATE_CONF" >/dev/null 2>&1; then
+    info "Logrotate config already up to date"
+else
+    echo "$LOGROTATE_CONTENT" > "$LOGROTATE_CONF"
+    info "Logrotate config created at $LOGROTATE_CONF"
+fi
+
 # --- Start Services ---

 echo
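
The log rotation block keeps the bootstrap idempotent by comparing the desired content with the file on disk and writing only on a difference. A sketch of that compare-then-write pattern in isolation, using a hypothetical `write_if_changed` helper and a scratch path in place of `/etc/logrotate.d/towlion`:

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the compare-then-write pattern in the hunk.
# Prints "unchanged" or "written" so the caller can log the outcome.
write_if_changed() {
    local path="$1" content="$2"
    if [ -f "$path" ] && printf '%s\n' "$content" | diff -q - "$path" >/dev/null 2>&1; then
        echo "unchanged"
    else
        printf '%s\n' "$content" > "$path"
        echo "written"
    fi
}

conf="$(mktemp -d)/towlion"                    # scratch path for the demo
first=$(write_if_changed "$conf" 'daily')      # file missing, so it is created
second=$(write_if_changed "$conf" 'daily')     # same content, nothing to do
third=$(write_if_changed "$conf" 'weekly')     # content differs, so it is rewritten
echo "$first $second $third"
```

After the real file is written, `logrotate -d /etc/logrotate.d/towlion` (debug mode, which makes no changes) is one way to check that the config parses.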

infrastructure/restore-postgres.sh

Lines changed: 26 additions & 3 deletions
@@ -64,11 +64,34 @@ fi

 # Determine target database
 if [ -z "$TARGET_DB" ]; then
-    # Extract database name from filename pattern: <dbname>_YYYYMMDD_HHMMSS.dump
-    TARGET_DB=$(basename "$BACKUP_FILE" | sed 's/_[0-9]\{8\}_[0-9]\{6\}\.dump$//')
+    # Extract database name from filename pattern: <dbname>_YYYYMMDD_HHMMSS.dump[.enc]
+    TARGET_DB=$(basename "$BACKUP_FILE" | sed 's/_[0-9]\{8\}_[0-9]\{6\}\.dump\(\.enc\)\?$//')
     info "Extracted database name from filename: $TARGET_DB"
 fi

+# Handle encrypted backups
+RESTORE_FILE="$BACKUP_FILE"
+DECRYPTED_TEMP=""
+if [[ "$BACKUP_FILE" == *.dump.enc ]]; then
+    ENCRYPTION_KEY="${BACKUP_ENCRYPTION_KEY:-}"
+    if [ -z "$ENCRYPTION_KEY" ] || [ ! -f "$ENCRYPTION_KEY" ]; then
+        error "Encrypted backup detected but BACKUP_ENCRYPTION_KEY is not set or file not found"
+        exit 1
+    fi
+    DECRYPTED_TEMP=$(mktemp /tmp/restore_XXXXXXXXXX.dump)
+    info "Decrypting backup..."
+    if openssl enc -d -aes-256-cbc -pbkdf2 -pass "file:${ENCRYPTION_KEY}" -in "$BACKUP_FILE" -out "$DECRYPTED_TEMP"; then
+        info "Backup decrypted to temp file"
+        RESTORE_FILE="$DECRYPTED_TEMP"
+    else
+        rm -f "$DECRYPTED_TEMP"
+        error "Failed to decrypt backup"
+        exit 1
+    fi
+    # Clean up temp file on exit
+    trap 'rm -f "$DECRYPTED_TEMP"' EXIT
+fi
+
 # Confirmation
 if [ "$SKIP_CONFIRM" = false ]; then
     warn "This will DROP and recreate database '$TARGET_DB'"
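
The updated `sed` expression relies on `\?` in a basic regular expression, which is a GNU extension. That is fine on the Debian target, but a sketch of an equivalent using extended regular expressions (`sed -E`) may be useful if the script is ever run elsewhere. The `db_name_from_backup` helper and the `todo_app` file names are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: strip the _YYYYMMDD_HHMMSS.dump[.enc] suffix.
# Uses extended regex (-E) instead of the GNU-only \? of basic regex.
db_name_from_backup() {
    basename "$1" | sed -E 's/_[0-9]{8}_[0-9]{6}\.dump(\.enc)?$//'
}

plain=$(db_name_from_backup /data/backups/postgres/todo_app_20250401_020000.dump)
encrypted=$(db_name_from_backup /data/backups/postgres/todo_app_20250401_020000.dump.enc)
echo "$plain $encrypted"
```

Because the pattern is anchored at the end of the name, underscores inside the database name itself are left alone.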
@@ -102,7 +125,7 @@ fi

 # Restore backup
 info "Restoring backup to $TARGET_DB..."
-if cat "$BACKUP_FILE" | docker compose -f "$COMPOSE_FILE" exec -T postgres pg_restore -U postgres -d "$TARGET_DB" --no-owner --no-acl; then
+if cat "$RESTORE_FILE" | docker compose -f "$COMPOSE_FILE" exec -T postgres pg_restore -U postgres -d "$TARGET_DB" --no-owner --no-acl; then
     info " ✓ Backup restored successfully"
 else
     error "Failed to restore backup"
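
In the decryption step of this file, the `EXIT` trap is registered only after decryption succeeds, and the failure branch removes the temp file by hand. An alternative worth considering is to register the trap immediately after `mktemp`, so a single cleanup path covers every exit. A sketch of that ordering, run in a subshell with `false` standing in for a failed `openssl enc -d`. All names here are illustrative:

```bash
#!/usr/bin/env bash
marker=$(mktemp)   # records the temp file path so it can be inspected afterwards
status=0
(
    tmp=$(mktemp)
    trap 'rm -f "$tmp"' EXIT   # registered before any step that can fail
    echo "$tmp" > "$marker"
    false                      # stand-in for a failed decryption
    exit 1
) || status=$?

# The subshell has exited, so its EXIT trap has already run.
leftover=$(cat "$marker")
if [ -e "$leftover" ]; then cleaned=no; else cleaned=yes; fi
echo "exit status: $status, temp file removed: $cleaned"
rm -f "$marker"
```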
