Database Backups

Krawl includes an automatic backup job that periodically creates a database backup using native tools for each deployment mode.

Configuration

Via config.yaml

backups:
  path: "backups"          # Directory where backups are saved
  cron: "*/30 * * * *"     # Cron schedule (default: every 30 minutes)
  enabled: true            # Enable or disable the backup job

Via Environment Variables

Variable	Description	Default
`KRAWL_BACKUPS_PATH`	Directory where backup files are saved	`backups`
`KRAWL_BACKUPS_CRON`	Cron expression controlling backup frequency	`*/30 * * * *`
`KRAWL_BACKUPS_ENABLED`	Enable or disable the backup job	`true`
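To make the variables and their defaults concrete, here is a minimal sketch of how they could be read in Python. The `BackupSettings` class and `load_backup_settings` function are hypothetical names used for illustration, and the set of strings accepted as "true" is an assumption; Krawl's own parsing may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupSettings:
    """Hypothetical container for the three backup settings."""
    path: str
    cron: str
    enabled: bool


def load_backup_settings(env=None) -> BackupSettings:
    """Read the KRAWL_BACKUPS_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    raw_enabled = env.get("KRAWL_BACKUPS_ENABLED", "true")
    return BackupSettings(
        path=env.get("KRAWL_BACKUPS_PATH", "backups"),
        cron=env.get("KRAWL_BACKUPS_CRON", "*/30 * * * *"),
        # Assumption: these spellings count as "enabled"; anything else disables the job.
        enabled=raw_enabled.strip().lower() in {"1", "true", "yes", "on"},
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to try out, for example `load_backup_settings({"KRAWL_BACKUPS_CRON": "0 * * * *"})` for an hourly schedule.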

How It Works

The backup method depends on the deployment mode:

Standalone Mode (SQLite)

Uses the Connection.backup() method from Python's sqlite3 module to create an atomic, consistent copy of the database file.

Output: {backups_path}/krawl_backup.db (a full SQLite database file)
Writes to a temporary file first, then atomically renames it, so a partial backup never replaces a good one
Safe to run while Krawl is serving requests (SQLite WAL mode allows concurrent reads)
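The approach described above (online backup into a temporary file, followed by an atomic rename) can be sketched as follows. This is an illustration under stated assumptions, not Krawl's actual implementation: the function name `backup_sqlite` and its parameters are hypothetical.

```python
import os
import sqlite3
import tempfile


def backup_sqlite(source_path: str, backups_path: str,
                  name: str = "krawl_backup.db") -> str:
    """Copy a live SQLite database to backups_path/name without exposing partial files."""
    os.makedirs(backups_path, exist_ok=True)
    final_path = os.path.join(backups_path, name)

    # Create the temporary file in the same directory so the rename stays
    # on one filesystem and is therefore atomic.
    fd, tmp_path = tempfile.mkstemp(dir=backups_path, suffix=".tmp")
    os.close(fd)
    try:
        src = sqlite3.connect(source_path)
        dst = sqlite3.connect(tmp_path)
        try:
            # Connection.backup() copies a consistent snapshot even while
            # other connections keep using the source database.
            src.backup(dst)
        finally:
            dst.close()
            src.close()
        # os.replace() overwrites the previous backup in a single step.
        os.replace(tmp_path, final_path)
    except BaseException:
        # A failed backup leaves the previous good file untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

A call such as `backup_sqlite("data/krawl.db", "backups")` would produce `backups/krawl_backup.db`, matching the paths used elsewhere on this page.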

Restoring:

# Stop Krawl first
cp backups/krawl_backup.db data/krawl.db
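Before copying a backup over the live database, it may be worth confirming that the file opens and passes SQLite's built-in integrity check. The sketch below is an optional extra step, not part of Krawl; the function name `sqlite_backup_is_healthy` is hypothetical.

```python
import sqlite3
from pathlib import Path


def sqlite_backup_is_healthy(path: str) -> bool:
    """Return True if the file opens read-only and PRAGMA integrity_check reports 'ok'."""
    # Build a file: URI so the backup is opened read-only and never modified.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = None
    try:
        conn = sqlite3.connect(uri, uri=True)
        row = conn.execute("PRAGMA integrity_check").fetchone()
        return row is not None and row[0] == "ok"
    except sqlite3.DatabaseError:
        # Covers a missing file as well as a file that is not a SQLite database.
        return False
    finally:
        if conn is not None:
            conn.close()
```

If `sqlite_backup_is_healthy("backups/krawl_backup.db")` returns False, restoring from that file is unlikely to yield a usable database.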

Scalable Mode (PostgreSQL)

Uses pg_dump to create a standard SQL dump of the PostgreSQL database.

Output: {backups_path}/db_dump.sql
Requires pg_dump to be available in the container (included in the Krawl Docker image)
Uses --no-owner --no-privileges for portable dumps
5-minute timeout to prevent hung backups
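The points above can be illustrated with a short Python sketch that builds the pg_dump command and runs it under a 5-minute limit. The function names are hypothetical, and the way connection details are passed (`-h`, `-U`, `-d`, with the password supplied through `PGPASSWORD` or a `.pgpass` file) is an assumption; Krawl may configure the connection differently.

```python
import shutil
import subprocess

PG_DUMP_TIMEOUT_SECONDS = 300  # the 5-minute limit described above


def build_pg_dump_command(host: str, user: str, dbname: str, output_file: str) -> list:
    """Return the pg_dump argument list for a portable plain-SQL dump."""
    return [
        "pg_dump",
        "--no-owner",       # omit ownership commands so the dump restores under any role
        "--no-privileges",  # omit GRANT/REVOKE statements
        "-h", host,
        "-U", user,
        "-d", dbname,
        "-f", output_file,
    ]


def run_pg_dump(host: str, user: str, dbname: str, output_file: str) -> bool:
    """Run pg_dump and report success; errors are printed rather than raised."""
    if shutil.which("pg_dump") is None:
        print("pg_dump not found; install postgresql-client")
        return False
    try:
        subprocess.run(
            build_pg_dump_command(host, user, dbname, output_file),
            check=True,
            timeout=PG_DUMP_TIMEOUT_SECONDS,
        )
    except subprocess.TimeoutExpired:
        print("pg_dump exceeded the timeout and was stopped")
        return False
    except subprocess.CalledProcessError as exc:
        print(f"pg_dump failed with exit code {exc.returncode}")
        return False
    return True
```

For example, `run_pg_dump("localhost", "krawl", "krawl", "backups/db_dump.sql")` mirrors the host, user, and database names used in the restore command on this page.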

Restoring:

psql -h localhost -U krawl -d krawl < backups/db_dump.sql

Note: If pg_dump is not available, an error is logged. Install postgresql-client to enable PostgreSQL backups.

Schedule

The backup job runs on the configured cron schedule (default: every 30 minutes).
Each backup overwrites the previous file.
The job also runs once immediately on startup.

Data Retention

Separately from backups, Krawl runs a data retention job daily at 3:00 AM that cleans up old records from the live database. This is controlled by KRAWL_DATABASE_RETENTION_DAYS (default: 30 days).

The retention job preserves:

All credential capture attempts
All suspicious access logs and honeypot triggers
IPs with suspicious activity history

It removes:

Non-suspicious access logs older than the retention period
Stale IP entries with no suspicious history
Orphaned attack detection records

Verifying Backups

Check that the backup file exists and is recent:

ls -la backups/
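The same check can be automated, for instance from a monitoring script. The sketch below treats a backup as fresh if it exists, is non-empty, and was modified within a chosen window. The function names and the 60-minute default (twice the default 30-minute schedule) are assumptions chosen for illustration.

```python
import os
import time


def backup_age_seconds(path: str, now=None) -> float:
    """Seconds elapsed since the backup file was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def is_backup_fresh(path: str, max_age_seconds: float = 2 * 30 * 60, now=None) -> bool:
    """True if the file exists, is non-empty, and is newer than max_age_seconds."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    return backup_age_seconds(path, now) <= max_age_seconds
```

With a custom `KRAWL_BACKUPS_CRON` value, `max_age_seconds` would need to be adjusted to match the configured interval.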

Check the Krawl logs for backup task output:

# Docker
docker logs krawl-server | grep "dump-krawl-data"

# Kubernetes
kubectl logs -l app.kubernetes.io/name=krawl | grep "dump-krawl-data"
