A Prefect worker for running flows on Slurm HPC clusters
Execute your Prefect flows on high-performance computing clusters using the Slurm workload manager. This worker seamlessly integrates with Slurm's REST API to submit, monitor, and manage flow runs as Slurm jobs.
- **Automatic API Version Detection** - Supports Slurm REST API versions 0.0.40-0.0.42 with automatic detection
- **Secure Token Management** - JWT-based authentication with file locking and proper permissions
- **Zombie Job Recovery** - Automatically detects and handles orphaned flow runs after worker restarts
- **Resource Management** - Full Slurm job specification support for CPU, memory, and time limits
- **CLI Tools** - Built-in utilities for token management and worker administration
- **Comprehensive Testing** - Both unit and integration tests
```bash
pip install prefect-slurm-
```
- **Create a work pool** using the Slurm worker type:

  ```bash
  prefect work-pool create slurm-pool --type slurm
  ```

- **Configure authentication** - Set up your Slurm credentials:

  ```bash
  export PREFECT_SLURM_USER_NAME=your_username
  export PREFECT_SLURM_API_URL=http://your-slurm-server:6820
  ```

- **Set up authentication token**:

  ```bash
  # Generate and store token using built-in CLI
  scontrol token username=$USER lifespan=3600 | prefect-slurm token

  # Or set token directly via environment variable
  export PREFECT_SLURM_USER_TOKEN=your_jwt_token
  ```

- **Start the worker**:

  ```bash
  prefect worker start --pool slurm-pool --type slurm
  ```
| Variable | Description | Default |
|---|---|---|
| `PREFECT_SLURM_USER_NAME` | Slurm username | Required |
| `PREFECT_SLURM_API_URL` | Slurm REST API URL | Required |
| `PREFECT_SLURM_USER_TOKEN` | JWT authentication token | Optional |
| `PREFECT_SLURM_TOKEN_FILE` | Path to token file | `~/.prefect_slurm.jwt` |
| `PREFECT_SLURM_LOCK_TIMEOUT` | File lock timeout (seconds) | `60` |
| `PREFECT_SLURM_ENV_FILE` | Override environment file path | Optional |
| `PREFECT_SLURM_MAX_ATTEMPTS` | Max retries for requests to the Slurm REST API | `3` |
| `PREFECT_SLURM_RETRY_MIN_DELAY_SECONDS` | Min number of seconds between retry requests | `10` |
| `PREFECT_SLURM_RETRY_MIN_DELAY_JITTER_SECONDS` | Min jitter (seconds) to randomize delays | `0` |
| `PREFECT_SLURM_RETRY_MAX_DELAY_JITTER_SECONDS` | Max jitter (seconds) to randomize delays | `20` |
The worker supports loading configuration from environment files using a hierarchical discovery system. Files are loaded in priority order (later files override earlier ones):
- System-wide: `/etc/prefect-slurm/.env`
- XDG Config: `~/.config/prefect-slurm/.env` (or `$XDG_CONFIG_HOME/prefect-slurm/.env`)
- User Home: `~/.prefect_slurm.env`
- Current Directory (app-specific): `./.prefect_slurm.env`
- Current Directory: `./.env`
- Environment Variable Override: `$PREFECT_SLURM_ENV_FILE`
Example environment file (`.prefect_slurm.env`):

```bash
# Slurm connection settings
PREFECT_SLURM_USER_NAME=your_username
PREFECT_SLURM_API_URL=http://your-slurm-server:6820

# Optional token (alternative to token file)
PREFECT_SLURM_USER_TOKEN=your_jwt_token_here

# Optional custom token file location
PREFECT_SLURM_TOKEN_FILE=~/my_custom_token.jwt

# Optional custom lock timeout
PREFECT_SLURM_LOCK_TIMEOUT=120
```

You can override the automatic discovery by setting `PREFECT_SLURM_ENV_FILE` to point to a specific file:

```bash
export PREFECT_SLURM_ENV_FILE=/path/to/my/custom.env
prefect worker start --pool slurm-pool --type slurm
```

Note: CLI commands (`prefect-slurm token`) also support environment files, though only `PREFECT_SLURM_TOKEN_FILE` and `PREFECT_SLURM_LOCK_TIMEOUT` are relevant for CLI operations.
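For intuition, the hierarchical loading behaves roughly like the sketch below. This is an illustration of the override order only, not the worker's actual implementation, and it assumes the `python-dotenv` package is available:

```python
# env_discovery_sketch.py - illustrative only; the worker's real loader may differ
import os
from pathlib import Path

from dotenv import load_dotenv  # assumes the python-dotenv package

# Candidate files from lowest to highest priority;
# later files override values set by earlier ones.
candidates = [
    Path("/etc/prefect-slurm/.env"),
    Path(os.environ.get("XDG_CONFIG_HOME", "~/.config")).expanduser() / "prefect-slurm" / ".env",
    Path("~/.prefect_slurm.env").expanduser(),
    Path(".prefect_slurm.env"),
    Path(".env"),
]

# An explicit PREFECT_SLURM_ENV_FILE always wins.
explicit = os.environ.get("PREFECT_SLURM_ENV_FILE")
if explicit:
    candidates.append(Path(explicit).expanduser())

for path in candidates:
    if path.is_file():
        # override=True lets later files win over earlier ones; the real worker
        # may treat variables already present in the environment differently.
        load_dotenv(path, override=True)
```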
Configure your Slurm work pool with job specifications:
```yaml
job_configuration:
  partition: "compute"
  cpu: 4
  memory: 8
  time_limit: 2
  working_dir: "/path/to/working/directory"
  source_files:  # Optional - omit for default Python environment
    - "~/.bashrc"
    - "~/envs/conda/bin/activate"
```

The worker supports two environment configuration modes:
**Custom Environment** (when `source_files` are specified):

```yaml
job_configuration:
  source_files:
    - "~/.bashrc"
    - "/opt/conda/bin/activate"
    - "/opt/modules/init.sh"
```

The worker will source these files before executing your flow. Use this for conda environments, module systems, or custom shell configurations.
**Default Python Environment** (when `source_files` is empty or omitted):

```yaml
job_configuration:
  partition: "compute"
  cpu: 4
  memory: 8
```

The worker automatically creates a temporary Python virtual environment with the matching Prefect version installed. The environment is created in `$TMPDIR/.venv_$SLURM_JOB_ID` and cleaned up after job completion.
The package includes a command-line utility for token management:
```bash
# Store token from scontrol output at default location
scontrol token username=$USER lifespan=3600 | prefect-slurm token

# Store token to custom location
echo "jwt_token_here" | prefect-slurm token ~/my_token.jwt

# Get help
prefect-slurm token --help
```

The default location for the token is `~/.prefect_slurm.jwt` (can be overridden by setting `PREFECT_SLURM_TOKEN_FILE`), and the default permissions are `600` (read/write for the owning user only).
You can test the examples in the `examples/` directory using the local Docker Compose Slurm cluster:

- **Start the local cluster:**

  ```bash
  cd slurm_environment/
  docker-compose up -d
  ```

- **Wait for services to be healthy** (check with `docker-compose ps`)

- **Deploy and run example flows** (from the `prefect_server` container):

  ```bash
  # Enter the Prefect server container
  docker-compose exec prefect_server bash

  # Navigate to examples and deploy the hello world example interactively
  cd /opt/data/examples
  prefect deploy

  # Run the deployment
  prefect deployment run slurm-hello-world/slurm-hello-world-deployment
  ```

- **Monitor execution:**
  - Prefect UI: http://localhost:4200
  - Check Slurm jobs (from the `slurm_node` container): `docker-compose exec slurm_node squeue`
  - View worker logs: `docker-compose logs slurm_submitter`
The Docker environment provides a complete Slurm cluster with the worker automatically configured and example flows ready to deploy.
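For reference, a minimal flow along the lines of the bundled hello world example might look like the sketch below. The actual code in `examples/` may differ; the flow name here is simply chosen to match the deployment name used above.

```python
# hello_world.py - illustrative sketch; the bundled example may differ
from prefect import flow, get_run_logger


@flow(name="slurm-hello-world")
def slurm_hello_world():
    logger = get_run_logger()
    # When run through the Slurm work pool, this body executes inside
    # a Slurm batch job on a compute node allocated by the cluster.
    logger.info("Hello from a Slurm compute node!")


if __name__ == "__main__":
    # Local smoke test; use `prefect deploy` to register it with the work pool.
    slurm_hello_world()
```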
The Slurm worker integrates with Prefect's execution model:
- **Worker Polling** - Continuously polls the Prefect API for scheduled flow runs
- **Job Submission** - Converts flow runs into Slurm job specifications
- **Execution** - Submits jobs via the Slurm REST API with proper resource allocation
- **Monitoring** - Tracks job status and reports back to Prefect
- **Cleanup** - Handles zombie jobs and ensures proper flow run state management
```mermaid
graph TB
    A[Prefect Server] -->|polls for flows| B[Slurm Worker]
    B -->|submits jobs| C[Slurm REST API]
    C -->|schedules| D[Slurm Cluster]
    D -->|executes| E[Flow Run]
    E -->|reports status| B
    B -->|updates state| A
```
- Python: 3.11+ (< 3.14)
- Prefect: 3.4.13+
- Slurm: Cluster with REST API enabled (versions 0.0.40-0.0.42 supported)
- Network: Access from worker node to both Prefect API and Slurm REST API
```bash
# Unit tests only
pytest -m unit

# Integration tests (requires Docker)
pytest -m integration

# CLI tests
pytest -m cli

# All tests
pytest
```

The project includes a Docker-based Slurm cluster for integration testing:

```bash
cd slurm_environment/
docker-compose up -d
```

Contributions are welcome! This project is developed by the EBI Metagenomics team.
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Run the full test suite
- Submit a pull request
Licensed under the Apache License 2.0. See LICENSE for details.
- Issues: Report bugs and request features via GitHub Issues
- Documentation: See tests/README.md for detailed testing information