Macrostrat's database backup service enables on-demand and periodic backups of PostgreSQL databases to local directories and remote S3 buckets. The service is designed to be run in a standalone Docker container, and is typically configured with environment variables.
It is based on pg_dump and Rclone and was initially created as a built-in backup service for the Sparrow data system.
Local backup to a directory and/or backup to a remote S3 bucket are supported, depending on which environment variables are set.
By default, the image runs the backup-service command for periodic backups. A
backup-db command that runs a one-off backup is also provided;
this can be run using
docker run pg-backup-service backup-db
with the appropriate environment variables.
Backups are named using an optional prefix, the database name, a 10-character file hash, and a timestamp, as follows:
$DB_BACKUP_PREFIX/$dbname-5e753082e5-2021-11-01T20:51:55.pg-dump
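The naming pattern above can be sketched as a small shell function. This is only an illustration of how the parts fit together, not the service's own code: `backup_name` is a hypothetical helper, and the hash and timestamp are passed in as arguments because how the service computes them is not described here.

```shell
# Compose a backup name from its parts (illustrative sketch only).
# Arguments: <prefix> <dbname> <hash> <timestamp>; the prefix may be empty.
backup_name() {
  local prefix="$1" dbname="$2" hash="$3" timestamp="$4"
  local name="${dbname}-${hash}-${timestamp}.pg-dump"
  if [ -n "$prefix" ]; then
    # With a prefix, the backup lands in a namespace/folder of that name.
    printf '%s/%s\n' "$prefix" "$name"
  else
    printf '%s\n' "$name"
  fi
}

backup_name "strata-v1" "strata-main" "5e753082e5" "2021-11-01T20:51:55"
# prints strata-v1/strata-main-5e753082e5-2021-11-01T20:51:55.pg-dump
```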
This project is useful for low-intensity applications; for larger-scale systems, an incremental/streaming backup system such as Barman or pgBackRest is recommended.
The application is configured with environment variables, allowing easy integration into Docker-centric workflows.
The name of the database to back up (DB_NAME takes precedence)
With DB_NAME, a comma-separated string can be provided to back up multiple databases.
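As a rough illustration of the comma-separated form, the sketch below splits such a value into one database name per line. `split_db_names` is a hypothetical helper; the service's actual parsing may differ, for example in how it treats whitespace around the commas.

```shell
# Print each database named in a comma-separated list on its own line.
split_db_names() {
  printf '%s\n' "$1" | tr ',' '\n'
}

# Loop over the names as a backup job might.
for db in $(split_db_names "strata-main,strata-dev"); do
  echo "would back up: $db"
done
```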
Other common PostgreSQL connection variables are also supported, such as:
- PGHOST (default: localhost)
- PGPORT (default: 5432)
- PGUSER (default: postgres)
- PGPASSWORD (no default)
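The defaults listed above behave like ordinary shell parameter defaults. The sketch below shows one way to express them; `apply_pg_defaults` is a hypothetical function name, not part of the image. PGPASSWORD is left untouched because it has no default.

```shell
# Fill in connection defaults only where the variable is unset or empty.
apply_pg_defaults() {
  export PGHOST="${PGHOST:-localhost}"
  export PGPORT="${PGPORT:-5432}"
  export PGUSER="${PGUSER:-postgres}"
}

apply_pg_defaults
echo "connecting to $PGHOST:$PGPORT as $PGUSER"
```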
This service is primarily designed to support backup to an S3-compatible storage bucket. S3 buckets are provided
by most storage providers including many university IT systems. Macrostrat's backup services are typically used with
s3.drive.wisc.edu.
- S3_ENDPOINT: the S3 endpoint (Required for cloud backup)
- S3_ACCESS_KEY: the S3 access key (Required for cloud backup)
- S3_SECRET_KEY: the S3 secret key (Required for cloud backup)
- S3_BACKUP_BUCKET: the S3 bucket (Required for cloud backup)
DB_BACKUP_DIR: the directory to back up to (Required for local backup).
To keep backups outside the Docker container, mount a host directory at this path:
docker run \
--env DB_BACKUP_DIR=/db-backups \
--volume /local-backups:/db-backups \
pg-backup-service
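Since the active backup targets depend on which variables are set, a quick check like the one below can help confirm a configuration before starting the container. It is a sketch under the assumption that a target counts as enabled only when all of its required variables are non-empty; `backup_targets` is a hypothetical helper, and the service's own checks may be stricter or looser.

```shell
# Report which backup targets the current environment would enable.
backup_targets() {
  # Local backup needs only the target directory.
  if [ -n "${DB_BACKUP_DIR:-}" ]; then
    echo "local"
  fi
  # Cloud backup needs all four S3 variables.
  if [ -n "${S3_ENDPOINT:-}" ] && [ -n "${S3_ACCESS_KEY:-}" ] \
    && [ -n "${S3_SECRET_KEY:-}" ] && [ -n "${S3_BACKUP_BUCKET:-}" ]; then
    echo "s3"
  fi
}

backup_targets
```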
Backups are scheduled using the go-cron library; the schedule
is set by the SCHEDULE environment variable. Schedules can be
given in crontab-style format or, more conveniently, as a shorthand such as @daily, @hourly, @weekly, or @every 100s.
See the go-cron documentation for more information.
By default, no schedule is applied, and a single backup is performed on startup.
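The three cases described above (no schedule, a shorthand, or a crontab-style string) can be told apart with a simple pattern match. The sketch below only classifies a SCHEDULE value; it does not validate it, and `schedule_kind` is a hypothetical helper. The crontab example string is illustrative, and its exact field layout depends on go-cron, so check that library's documentation.

```shell
# Classify a SCHEDULE value without validating it.
schedule_kind() {
  case "${1:-}" in
    "") echo "once" ;;       # no schedule: a single backup on startup
    @*) echo "shorthand" ;;  # e.g. @daily, @every 100s
    *)  echo "crontab" ;;    # anything else is treated as crontab-style
  esac
}

schedule_kind "@weekly"
# prints shorthand
```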
- DB_BACKUP_PREFIX: A prefix for database backups. If provided, all backups will be put within a specific namespace or folder. This is useful for sharing a bucket between many different database backup jobs.
- DB_BACKUP_MAX_N: the maximum number of backups to keep (default: 10)
- PGDUMP_OPTIONS: additional options to pass to pg_dump for all backup jobs.
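To make the effect of DB_BACKUP_MAX_N concrete, here is a minimal sketch of retention on a local directory: keep the newest N dump files and delete the rest. It is an assumption-laden illustration, not the service's implementation. `prune_backups` is a hypothetical helper, it orders files by modification time, and it counts all dumps in the directory together rather than per database. It also assumes file names without newlines, which holds for the naming scheme used here.

```shell
# Keep only the newest <max_n> .pg-dump files in <dir> (default: 10).
prune_backups() {
  local dir="$1" max_n="${2:-10}"
  # List dumps newest first, skip the first max_n, and delete the remainder.
  ls -1t "$dir"/*.pg-dump 2>/dev/null \
    | tail -n "+$((max_n + 1))" \
    | while IFS= read -r old_dump; do
        rm -- "$old_dump"
      done
}

# Example: prune_backups /db-backups/strata-v1 10
```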
If more customization of the backup process beyond $PGDUMP_OPTIONS is desired, any dump-$dbname or dump-database
commands added to the container's PATH will override the normal pg_dump -Fc command.
These have a signature dump-database <dbname> <out-dir> and must output only the filename of the dump file created.
See bin/defs.bash for more details.
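A custom dump command following the signature above might look like the sketch below. Several details are assumptions made for illustration: the output file name, the `--exclude-table-data` pattern, and printing the full path rather than a bare file name (bin/defs.bash is the authority on what the service expects). Anything pg_dump itself prints is redirected to stderr so that stdout carries only the file name.

```shell
#!/bin/sh
# Hypothetical dump-database override: dump-database <dbname> <out-dir>
dump_database() {
  local dbname="$1" out_dir="$2"
  local out_file="$out_dir/$dbname.pg-dump"
  # Custom-format dump that skips row data for tables matching an
  # illustrative pattern; pg_dump's own output is sent to stderr.
  pg_dump -Fc --exclude-table-data='*_cache' -f "$out_file" "$dbname" >&2
  # Print only the name of the file that was created.
  echo "$out_file"
}

# When saving this as an executable named dump-database, end the file with:
# dump_database "$@"
```

The script then needs to be placed in a directory on the container's PATH, for example by bind-mounting it or by building a derived image.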
The backup service creates custom-format PostgreSQL dump files. These can be restored with a command like
pg_restore -d $DB_NAME "$backup_file_name"
A built-in restore command may be provided in a future version of this image.
This container can easily be included in Docker Compose
container stacks.
services:
  db_server:
    image: postgis:13-3.1
    ...
  db_backup:
    image: ghcr.io/macrostrat/pg-backup-service:main
    environment:
      # Back up every weekend
      - SCHEDULE=@weekly
      - PGHOST=db_server
      - PGPASSWORD=<your-password>
      - DB_BACKUP_PREFIX=strata-v1
      # Can set multiple databases for backup!!
      - DB_NAME=strata-main,strata-dev
      # S3 configuration parameters
      - S3_ENDPOINT=s3.drive.wisc.edu
      - S3_ACCESS_KEY
      - S3_SECRET_KEY
      - S3_BACKUP_BUCKET=database-backups
Basic backup functionality is fully tested. Tests can be run locally using make test.
All pull requests and commits to the main branch
are automatically tested, and updates to the Docker image are automatically pushed to the GitHub container registry.
Any contributions should add appropriate tests and documentation.
- A built-in command to restore from a backup (possibly with an interactive prompt).
- Possibly shift to Python from shell scripts.