Prefect-Slurm

A Prefect worker for running flows on Slurm HPC clusters

Execute your Prefect flows on high-performance computing clusters using the Slurm workload manager. This worker seamlessly integrates with Slurm's REST API to submit, monitor, and manage flow runs as Slurm jobs.

Features

✨ Automatic API Version Detection - Supports Slurm REST API versions 0.0.40-0.0.42 with automatic detection
πŸ”’ Secure Token Management - JWT-based authentication with file locking and proper permissions
πŸ”„ Zombie Job Recovery - Automatically detects and handles orphaned flow runs after worker restarts
πŸ“Š Resource Management - Full Slurm job specification support for CPU, memory, and time limits
πŸ› οΈ CLI Tools - Built-in utilities for token management and worker administration
πŸ§ͺ Comprehensive Testing - Both unit and integration tests

Quick Start

Installation

pip install prefect-slurm

Basic Setup

  1. Create a work pool using the Slurm worker type:

    prefect work-pool create slurm-pool --type slurm
  2. Configure authentication - Set up your Slurm credentials:

    export PREFECT_SLURM_USER_NAME=your_username
    export PREFECT_SLURM_API_URL=http://your-slurm-server:6820
  3. Set up authentication token:

    # Generate and store token using built-in CLI
    scontrol token username=$USER lifespan=3600 | prefect-slurm token
    
    # Or set token directly via environment variable
    export PREFECT_SLURM_USER_TOKEN=your_jwt_token
  4. Start the worker:

    prefect worker start --pool slurm-pool --type slurm
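A flow that runs on this worker is plain Prefect code; nothing in the flow itself is Slurm-specific. Below is a minimal illustrative flow (the names are placeholders, not the flow shipped in examples/):

from prefect import flow, task


@task
def say_hello(name: str) -> str:
    return f"Hello from Slurm, {name}!"


@flow(log_prints=True)
def slurm_hello_world(name: str = "world") -> None:
    print(say_hello(name))

Deploy it to the slurm-pool work pool with prefect deploy (see Running the Examples) or programmatically, as sketched in the Work Pool Configuration section below.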

Configuration

Environment Variables

Variable | Description | Default
PREFECT_SLURM_USER_NAME | Slurm username | Required
PREFECT_SLURM_API_URL | Slurm REST API URL | Required
PREFECT_SLURM_USER_TOKEN | JWT authentication token | Optional
PREFECT_SLURM_TOKEN_FILE | Path to token file | ~/.prefect_slurm.jwt
PREFECT_SLURM_LOCK_TIMEOUT | File lock timeout (seconds) | 60
PREFECT_SLURM_ENV_FILE | Override environment file path | Optional
PREFECT_SLURM_MAX_ATTEMPTS | Max retries for requests to the Slurm REST API | 3
PREFECT_SLURM_RETRY_MIN_DELAY_SECONDS | Minimum delay between retry requests (seconds) | 10
PREFECT_SLURM_RETRY_MIN_DELAY_JITTER_SECONDS | Minimum jitter added to retry delays (seconds) | 0
PREFECT_SLURM_RETRY_MAX_DELAY_JITTER_SECONDS | Maximum jitter added to retry delays (seconds) | 20

Environment Files

The worker supports loading configuration from environment files using a hierarchical discovery system. Files are loaded in priority order (later files override earlier ones):

  1. System-wide: /etc/prefect-slurm/.env
  2. XDG Config: ~/.config/prefect-slurm/.env (or $XDG_CONFIG_HOME/prefect-slurm/.env)
  3. User Home: ~/.prefect_slurm.env
  4. Current Directory (app-specific): ./.prefect_slurm.env
  5. Current Directory: ./.env
  6. Environment Variable Override: $PREFECT_SLURM_ENV_FILE

Example environment file (.prefect_slurm.env):

# Slurm connection settings
PREFECT_SLURM_USER_NAME=your_username
PREFECT_SLURM_API_URL=http://your-slurm-server:6820

# Optional token (alternative to token file)
PREFECT_SLURM_USER_TOKEN=your_jwt_token_here

# Optional custom token file location
PREFECT_SLURM_TOKEN_FILE=~/my_custom_token.jwt

# Optional custom lock timeout
PREFECT_SLURM_LOCK_TIMEOUT=120

You can override the automatic discovery by setting PREFECT_SLURM_ENV_FILE to point to a specific file:

export PREFECT_SLURM_ENV_FILE=/path/to/my/custom.env
prefect worker start --pool slurm-pool --type slurm

Note: CLI commands (prefect-slurm token) also support environment files, though only PREFECT_SLURM_TOKEN_FILE and PREFECT_SLURM_LOCK_TIMEOUT are relevant for CLI operations.
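The discovery behaves roughly like the sketch below. This is illustrative only (not the package's actual implementation) and assumes python-dotenv; later files override values loaded from earlier ones, and an explicit PREFECT_SLURM_ENV_FILE wins.

import os
from pathlib import Path

from dotenv import load_dotenv  # python-dotenv

# Candidate files in priority order; later entries override earlier ones.
xdg_config = Path(os.environ.get("XDG_CONFIG_HOME", Path.home() / ".config"))
candidates = [
    Path("/etc/prefect-slurm/.env"),
    xdg_config / "prefect-slurm" / ".env",
    Path.home() / ".prefect_slurm.env",
    Path.cwd() / ".prefect_slurm.env",
    Path.cwd() / ".env",
]

# An explicit override is loaded last, so its values take precedence.
explicit = os.environ.get("PREFECT_SLURM_ENV_FILE")
if explicit:
    candidates.append(Path(explicit))

for env_file in candidates:
    if env_file.is_file():
        # override=True: values from this file replace any loaded earlier.
        load_dotenv(env_file, override=True)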

Work Pool Configuration

Configure your Slurm work pool with job specifications:

job_configuration:
  partition: "compute"
  cpu: 4
  memory: 8
  time_limit: 2
  working_dir: "/path/to/working/directory"
  source_files:  # Optional - omit for default Python environment
    - "~/.bashrc"
    - "~/envs/conda/bin/activate"

Environment Setup

The worker supports two environment configuration modes:

Custom Environment (when source_files are specified):

job_configuration:
  source_files:
    - "~/.bashrc"
    - "/opt/conda/bin/activate"
    - "/opt/modules/init.sh"

The worker will source these files before executing your flow. Use this for conda environments, module systems, or custom shell configurations.

Default Python Environment (when source_files is empty or omitted):

job_configuration:
  partition: "compute"
  cpu: 4
  memory: 8

The worker automatically creates a temporary Python virtual environment with the matching Prefect version installed. The environment is created in $TMPDIR/.venv_$SLURM_JOB_ID and cleaned up after job completion.
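To confirm which mode a flow actually ran under, a small diagnostic flow can report the Slurm job ID and the interpreter in use (illustrative only):

import os
import sys

from prefect import flow


@flow(log_prints=True)
def inspect_environment() -> None:
    # SLURM_JOB_ID is set by Slurm inside the allocated job.
    print(f"Slurm job ID: {os.environ.get('SLURM_JOB_ID', 'not set')}")
    # In the default mode this points into $TMPDIR/.venv_$SLURM_JOB_ID;
    # with source_files it is whatever environment those files activate.
    print(f"Python interpreter: {sys.executable}")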

CLI Tools

The package includes a command-line utility for token management:

# Store token from scontrol output at default location
scontrol token username=$USER lifespan=3600 | prefect-slurm token

# Store token to custom location
echo "jwt_token_here" | prefect-slurm token ~/my_token.jwt

# Get help
prefect-slurm token --help

The default token location is ~/.prefect_slurm.jwt (override it with PREFECT_SLURM_TOKEN_FILE), and the token file is written with permissions 600 (read/write for the owner only).
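The same result can be achieved without the CLI. The sketch below mirrors the behaviour described above (it is not the package's own code): it extracts the JWT from scontrol's SLURM_JWT=... output and writes it with owner-only permissions.

import stat
import subprocess
from pathlib import Path

# scontrol prints a line of the form "SLURM_JWT=<token>".
output = subprocess.run(
    ["scontrol", "token", "lifespan=3600"],
    capture_output=True, text=True, check=True,
).stdout.strip()
token = output.split("=", 1)[1]

token_file = Path.home() / ".prefect_slurm.jwt"
token_file.write_text(token)
# 600: read/write for the owner only, matching the CLI default.
token_file.chmod(stat.S_IRUSR | stat.S_IWUSR)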

Running the Examples

You can test the examples in the examples/ directory using the local Docker Compose Slurm cluster:

  1. Start the local cluster:

    cd slurm_environment/
    docker-compose up -d
  2. Wait for services to be healthy (check with docker-compose ps)

  3. Deploy and run example flows (from the prefect_server container):

    # Enter the Prefect server container
    docker-compose exec prefect_server bash
    
    # Navigate to examples and deploy the hello world example interactively
    cd /opt/data/examples
    prefect deploy
    
    # Run the deployment
    prefect deployment run slurm-hello-world/slurm-hello-world-deployment
  4. Monitor execution:

    • Prefect UI: http://localhost:4200
    • Check Slurm jobs (from slurm_node container): docker-compose exec slurm_node squeue
    • View worker logs: docker-compose logs slurm_submitter

The Docker environment provides a complete Slurm cluster with the worker automatically configured and example flows ready to deploy.

Architecture

The Slurm worker integrates with Prefect's execution model:

  1. Worker Polling - Continuously polls Prefect API for scheduled flow runs
  2. Job Submission - Converts flow runs to Slurm job specifications
  3. Execution - Submits jobs via the Slurm REST API with proper resource allocation (sketched below, after the diagram)
  4. Monitoring - Tracks job status and reports back to Prefect
  5. Cleanup - Handles zombie jobs and ensures proper flow run state management

graph TB
    A[Prefect Server] -->|polls for flows| B[Slurm Worker]
    B -->|submits jobs| C[Slurm REST API]
    C -->|schedules| D[Slurm Cluster]
    D -->|executes| E[Flow Run]
    E -->|reports status| B
    B -->|updates state| A
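For orientation, the submission and monitoring steps correspond roughly to the slurmrestd calls sketched below. This is a simplified illustration assuming API version 0.0.40; the actual worker adds version detection, retries, and token refresh.

import os

import requests

API = os.environ["PREFECT_SLURM_API_URL"]
HEADERS = {
    "X-SLURM-USER-NAME": os.environ["PREFECT_SLURM_USER_NAME"],
    "X-SLURM-USER-TOKEN": os.environ["PREFECT_SLURM_USER_TOKEN"],
}

# Submit a batch script as a Slurm job (fields abbreviated; the worker
# injects the real flow-run command and resource settings here).
payload = {
    "script": "#!/bin/bash\necho 'flow run command goes here'",
    "job": {
        "name": "prefect-flow-run",
        "partition": "compute",
        "current_working_directory": "/tmp",
        "environment": ["PATH=/usr/bin:/bin"],
    },
}
submitted = requests.post(f"{API}/slurm/v0.0.40/job/submit", json=payload, headers=HEADERS)
job_id = submitted.json()["job_id"]

# Check the job's state so it can be reported back to Prefect.
status = requests.get(f"{API}/slurm/v0.0.40/job/{job_id}", headers=HEADERS)
print(status.json()["jobs"][0]["job_state"])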

Requirements

  • Python: 3.11+ (< 3.14)
  • Prefect: 3.4.13+
  • Slurm: Cluster with REST API enabled (versions 0.0.40-0.0.42 supported)
  • Network: Access from worker node to both Prefect API and Slurm REST API

Development

Running Tests

# Unit tests only
pytest -m unit

# Integration tests (requires Docker)
pytest -m integration

# CLI tests
pytest -m cli

# All tests
pytest

Test Environment

The project includes a Docker-based Slurm cluster for integration testing:

cd slurm_environment/
docker-compose up -d

Contributing

Contributions are welcome! This project is developed by the EBI Metagenomics team.

Development Workflow

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Run the full test suite
  5. Submit a pull request

License

Licensed under the Apache License 2.0. See LICENSE for details.
