Maintenance
We set up cron jobs on remote instances to back up the postgres database periodically.
To check existing cron jobs, run crontab -l.
To edit cron jobs, including adding new ones, run crontab -e.
The command to execute the postgres database backup, listed below, should be installed on the instance where the postgres container is running. On staging, a single instance runs all docker containers, so there is no ambiguity. On production, always check which node runs postgres; it is usually the vGPU instance.
0 0,12 * * * docker exec $(docker ps -q -f name=postgres) backup
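To confirm you are on the right instance before installing the cron entry, checks like the following may help (a sketch; the name=postgres filter assumes the container name contains "postgres", as in the cron entry above):
# Find out whether this instance runs the postgres container
docker ps --filter name=postgres --format '{{.ID}}  {{.Names}}'
# Confirm the backup entry is installed in this instance's crontab
crontab -l | grep backup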
The backup files are stored in the docker volume named "backup". On production, this docker volume is kept on the data instance and shared with the other instances, so to access it you should go to the data instance. If the settings in production.sample are used, it is under the directory /var/lib/docker/volumes/rodan_pg_backup/_data.
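To verify that backups are actually being written, you can list the volume's contents on the relevant instance (a sketch, assuming the production.sample path above; the path may differ on staging):
# List backup files with sizes; filenames sort chronologically, so the newest are last
ls -lh /var/lib/docker/volumes/rodan_pg_backup/_data | tail
# Total space used by backups
du -sh /var/lib/docker/volumes/rodan_pg_backup/_data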
However, since we have many users and projects, a single backup file is around 1.3GB in size. This means we need to clean up old backup files to free up space on both staging and production. It is good practice to check and do this at least twice a year.
To do this, we follow a simple rule: keep less frequent backups for older dates. Currently we use the script below, which keeps all backups from the last 7 days, the first backup of each week for the last 3 months, and two backups per month (first and middle) for anything older. The script was generated by ChatGPT, so review it and modify with caution if needed.
While no cron job for this exists at the time of writing, it is possible to back up and restore the postgres database in Rodan using scripts. For the following, assume the postgres container has ID 612b4ae59567.
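The container ID will differ on your instance; it can be looked up with, for example (assuming the container name contains "postgres"):
# Print just the ID of the running postgres container
docker ps -qf name=postgres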
Create a Backup
This stores a backup with the name backup_YYYY_MM_DDTHH_MM_SS.sql.gz.
docker exec -it 612b4ae59567 backup
See a List of Backups
docker exec -it 612b4ae59567 backups
Restore a Backup
For a backup with the name backup_YYYY_MM_DDTHH_MM_SS.sql.gz,
docker exec -it 612b4ae59567 restore backup_YYYY_MM_DDTHH_MM_SS.sql.gz
On staging, these backups will persist in a separate volume so they are not tied to a specific postgres container.
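If you prefer not to hard-code the container ID, command substitution can be used instead (a sketch; same name=postgres assumption as above):
# Back up, list, and restore without looking up the ID manually
docker exec -it $(docker ps -qf name=postgres) backup
docker exec -it $(docker ps -qf name=postgres) backups
docker exec -it $(docker ps -qf name=postgres) restore backup_YYYY_MM_DDTHH_MM_SS.sql.gz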
Put the retention script below in /var/lib/docker/volumes/rodan_pg_backup/_data, execute it, and follow the steps in the output.
#!/bin/bash
# Time-based backup retention script
# - Keep all backups from the last 7 days
# - Keep the first backup per week for the last 3 months
# - Keep two backups per month (first + middle) for older backups
# - Delete everything else

deletion_plan="backup_deletion_plan_$(date +%Y%m%d_%H%M%S).txt"
echo "Backup Deletion Plan - Created $(date)" > "$deletion_plan"
echo "----------------------------------------" >> "$deletion_plan"

declare -A keep_files

# Get current timestamp
now=$(date +%s)
seven_days=$((7 * 24 * 3600))
three_months=$((90 * 24 * 3600))

# Gather all backups (assumes filenames contain a YYYY_MM_DD date,
# e.g. backup_YYYY_MM_DDTHH_MM_SS.sql.gz)
backups=( $(ls backup_*.sql.gz 2>/dev/null | sort) )
if [ ${#backups[@]} -eq 0 ]; then
    echo "No backups found."
    exit 0
fi
echo "Analyzing ${#backups[@]} backups..." >> "$deletion_plan"

# Helper: extract date from filename
get_date_from_filename() {
    echo "$1" | grep -oE '[0-9]{4}_[0-9]{2}_[0-9]{2}'
}

# Convert YYYY_MM_DD → Unix timestamp
date_to_timestamp() {
    date -d "${1//_/\/}" +%s
}

# --- Pass 1: Keep all backups from the last 7 days ---
echo "Keeping all backups from the last 7 days:" >> "$deletion_plan"
for file in "${backups[@]}"; do
    file_date=$(get_date_from_filename "$file")
    ts=$(date_to_timestamp "$file_date")
    age=$((now - ts))
    if (( age <= seven_days )); then
        keep_files["$file"]=1
        echo "KEEP (recent): $file" >> "$deletion_plan"
    fi
done

# --- Pass 2: Keep the FIRST backup per week for the last 3 months ---
echo "" >> "$deletion_plan"
echo "Keeping the first backup per week for the last 3 months:" >> "$deletion_plan"
declare -A weeks_kept
for file in "${backups[@]}"; do
    file_date=$(get_date_from_filename "$file")
    ts=$(date_to_timestamp "$file_date")
    age=$((now - ts))
    if (( age > seven_days && age <= three_months )); then
        week_key=$(date -d "${file_date//_/\/}" +%Y-%U)  # Year-Week number
        # Only keep the first encountered backup for that week
        if [[ -z ${weeks_kept["$week_key"]} ]]; then
            keep_files["$file"]=1
            weeks_kept["$week_key"]=1
            echo "KEEP (weekly first): $file" >> "$deletion_plan"
        fi
    fi
done

# --- Pass 3: Keep two backups per month (first + middle) for older backups ---
echo "" >> "$deletion_plan"
echo "Keeping two backups per month for older data (first + middle):" >> "$deletion_plan"
# Identify unique months with backups older than 3 months
months=( $(for file in "${backups[@]}"; do
    file_date=$(get_date_from_filename "$file")
    ts=$(date_to_timestamp "$file_date")
    age=$((now - ts))
    if (( age > three_months )); then
        echo "$file_date" | cut -d'_' -f1-2  # YYYY_MM
    fi
done | sort -u) )
for month in "${months[@]}"; do
    # Get all backups for this month
    month_files=( $(ls backup_${month}_*.sql.gz 2>/dev/null | sort) )
    if [ ${#month_files[@]} -eq 0 ]; then
        continue
    fi
    # Keep the first backup of the month
    first_backup="${month_files[0]}"
    keep_files["$first_backup"]=1
    echo "KEEP (monthly first): $first_backup" >> "$deletion_plan"
    # If there are multiple backups, keep one from the middle
    if [ ${#month_files[@]} -gt 1 ]; then
        middle_index=$(( ${#month_files[@]} / 2 ))
        middle_backup="${month_files[$middle_index]}"
        if [ "$middle_backup" != "$first_backup" ]; then
            keep_files["$middle_backup"]=1
            echo "KEEP (monthly middle): $middle_backup" >> "$deletion_plan"
        fi
    fi
done

# --- Deletion list ---
echo "" >> "$deletion_plan"
echo "The following backups will be deleted:" >> "$deletion_plan"
delete_count=0
for file in "${backups[@]}"; do
    if [[ -z ${keep_files["$file"]} ]]; then
        echo "DELETE: $file" >> "$deletion_plan"
        delete_count=$((delete_count + 1))
    fi
done

# --- Summary ---
echo "" >> "$deletion_plan"
echo "----------------------------------------" >> "$deletion_plan"
echo "Summary: Will keep ${#keep_files[@]} files and delete $delete_count files" >> "$deletion_plan"
echo "To execute this plan, run: ./execute_deletion_plan.sh $deletion_plan" >> "$deletion_plan"

# Create deletion executor
cat > execute_deletion_plan.sh << 'EOF'
#!/bin/bash
if [ $# -ne 1 ]; then
    echo "Usage: $0 <deletion_plan_file>"
    exit 1
fi
plan_file="$1"
if [ ! -f "$plan_file" ]; then
    echo "Error: Deletion plan file not found: $plan_file"
    exit 1
fi
echo "Executing deletion plan from: $plan_file"
echo "----------------------------------------"
while IFS= read -r line; do
    if [[ $line == DELETE:* ]]; then
        file="${line#DELETE: }"
        if [ -f "$file" ]; then
            echo "Deleting: $file"
            rm "$file"
        else
            echo "Warning: File not found: $file"
        fi
    fi
done < "$plan_file"
echo "----------------------------------------"
echo "Deletion plan execution completed."
EOF
chmod +x execute_deletion_plan.sh

echo ""
echo "✅ Deletion plan created: $deletion_plan"
echo "Review the plan carefully, then execute with:"
echo "./execute_deletion_plan.sh $deletion_plan"
To renew the SSL certificate on Rodan instances, run docker ps to get the nginx container id. Then run
docker exec -it [nginx_container_id] bash
to enter the container. Once in the container, run certbot renew to renew the certificate.
Finally, run service nginx restart within the container to apply the changes. If these steps went smoothly, the certificate should be renewed.
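Put together, the sequence looks like this (a sketch; the --dry-run step is an optional extra to test renewal without changing anything):
docker ps                                  # note the nginx container id
docker exec -it [nginx_container_id] bash  # enter the container
certbot renew --dry-run                    # optional: test renewal first
certbot renew                              # renew the certificate
service nginx restart                      # reload nginx with the new certificate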
- Route all error messages to Sentry.io, and automatically triage them to workers in the lab.
- Create proper users/groups for the production containers (see the sketch after this list). In production (and on Linux machines), docker needs to run as a privileged user. The container is not to be regarded as a layer of security: a root user inside the container can have root-level effects outside of the container on Linux. That is why a dummy user www-data is created in the rodan and celery containers.
  - The rodan container should be run by the django or rodan user.
  - The nginx container should be run by the nginx user.
  - The postgres container should be run by the postgres user.
- Fix issues outlined by https://github.com/docker/docker-bench-security, and integrate them with a travis-ci check.
- Deploy with docker swarm.
- Create a celery-GPU queue for GPU-intensive workloads.
- If you need root privileges inside of the docker container, you can specify a user with the -u flag before entering the container with exec or run: docker compose -f docker-compose.yml exec -u root rodan bash
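For the users/groups item above, you can audit which user each container currently runs as (a sketch; an empty value means the container runs as root):
# Show the configured user for every running container
docker ps -q | xargs docker inspect --format '{{.Name}}: {{.Config.User}}'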