# Slurm Exporter

Prometheus collector and exporter for metrics extracted from the Slurm resource scheduling system.
> **Note**
> Looking for a next-generation Slurm exporter with native OpenMetrics support (Slurm 25.11+)? Check out my new project: sckyzo/slurm_prometheus_exporter
>
> ✨ Features: Native OpenMetrics · Multiple endpoints · Basic Auth & TLS · Global labels · YAML config · Clean Architecture
## Table of Contents

- ✨ Features
- 📦 Installation
- ⚙️ Configuration (flags, collectors, Prometheus)
- 📊 Metrics Reference (all 14 collectors)
- 🛠️ Development (build, test, lint)
- 📈 Grafana Dashboards
- 📸 Screenshots
- 📜 License
## ✨ Features

- ✅ Exports a wide range of metrics from Slurm, including nodes, partitions, jobs, CPUs, and GPUs.
- ✅ All metric collectors are optional and can be enabled/disabled via flags.
- ✅ Supports TLS and Basic Authentication for secure connections.
- ✅ OpenMetrics format supported (exemplars, newer Prometheus features).
- ✅ Per-collector health metrics (`slurm_exporter_collector_success`, `slurm_exporter_collector_duration_seconds`).
- ✅ Liveness probe at `/healthz` for orchestrators (Kubernetes, systemd).
- ✅ GPU metrics per account and user (`slurm_account_gpus_running`, `slurm_user_gpus_running`).
- ✅ Per-reservation node state metrics (`slurm_reservation_nodes_*`).
- ✅ Ready-to-use Grafana dashboard.
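Once the exporter is running, Prometheus can scrape it with a standard scrape job. A minimal sketch (the target `slurm-node:8080`, the job name, and the interval are assumptions; match them to the address and port your exporter actually listens on):

```yaml
scrape_configs:
  - job_name: "slurm"
    scrape_interval: 30s              # Slurm state changes slowly; a longer interval limits load on slurmctld
    static_configs:
      - targets: ["slurm-node:8080"]  # host:port where slurm_exporter listens (assumption)
```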
## 📦 Installation

There are two recommended ways to install the Slurm Exporter.
### Option 1: Pre-built binary (recommended)

This is the easiest method for most users.
1. Download the latest release for your OS and architecture from the GitHub Releases page. 📥

2. Place the `slurm_exporter` binary in a suitable location on a node with Slurm CLI access, such as `/usr/local/bin/`.

3. Ensure the binary is executable:

   ```shell
   chmod +x /usr/local/bin/slurm_exporter
   ```

4. (Optional) To run the exporter as a service, adapt the example systemd unit file provided in this repository at `systemd/slurm_exporter.service`:
   - Copy it to `/etc/systemd/system/slurm_exporter.service` and customize it for your environment (especially the `ExecStart` path).
   - Reload the systemd daemon, then enable and start the service:

     ```shell
     sudo systemctl daemon-reload
     sudo systemctl enable slurm_exporter
     sudo systemctl start slurm_exporter
     ```
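Once the service is up, you can spot-check it with the `/healthz` probe and the per-collector health metrics mentioned in the Features section. A quick sketch (`localhost:8080` is an assumption; use the address and port your exporter actually listens on):

```shell
# Liveness probe: exits non-zero if the exporter is not responding
curl -fsS http://localhost:8080/healthz

# Per-collector health: each enabled collector should report success = 1
curl -fsS http://localhost:8080/metrics | grep '^slurm_exporter_collector_success'
```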
### Option 2: Build from source

If you want to build the exporter yourself, you can do so using the provided Makefile. 👩‍💻

1. Clone the repository:

   ```shell
   git clone https://github.com/sckyzo/slurm_exporter.git
   cd slurm_exporter
   ```

2. Build the binary:

   ```shell
   make build
   ```

3. The new binary will be available at `bin/slurm_exporter`. You can then copy it to a location like `/usr/local/bin/` and set up the systemd service as described in the section above.
## 📈 Grafana Dashboards

Ten ready-to-use Grafana dashboards are provided in the `dashboards_grafana/` directory. All dashboards use a `$datasource` template variable and are compatible with Grafana 12+.
| # | Dashboard | UID | Description |
|---|---|---|---|
| 01 | Cluster Overview | `slurm-overview` | Global cluster health: CPU/GPU utilization, node states, job totals, partition summary |
| 02 | Jobs & Queue | `slurm-jobs` | Job queue details by user, account, and partition: pending reasons, top users |
| 03 | Node Detail | `slurm-nodes` | Per-node CPU & memory table (filtered by partition), scalable to 100k+ nodes |
| 04 | Cluster Usage Statistics | `slurm-usage` | CPU/GPU utilization gauges, fairshare per account, top users by CPU |
| 05 | Scheduler | `slurm-scheduler` | slurmctld internals: cycle time, backfill, RPC statistics |
| 06 | Reservations & Licenses | `slurm-reservations` | Active reservations, node states per reservation, license usage |
| 07 | Accounting | `slurm-accounting` | User/account consumption, FairShare analysis, top consumers, priority diagnostics |
| 08 | Exporter Health | `slurm-health` | Collector OK/FAIL status, scrape duration history, Slurm binary versions |
| 09 | Exporter Performance | `slurm-exporter-perf` | Command durations, cache freshness, error rates, scrape health (new in v1.8.0) |
| 10 | All Metrics Reference | `slurm-all-metrics` | Exhaustive reference panel for every exported metric |
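The Exporter Health dashboard surfaces the per-collector health metrics listed in the Features section; the same metric can drive an alert directly. A sketch of a Prometheus alerting rule (the rule name, group name, and `for` duration are illustrative assumptions):

```yaml
groups:
  - name: slurm_exporter
    rules:
      - alert: SlurmCollectorFailing
        expr: slurm_exporter_collector_success == 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "A slurm_exporter collector has been failing for 10 minutes"
```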
**Option 1:** copy the JSON files to your Grafana provisioning directory:

```shell
cp dashboards_grafana/*.json /etc/grafana/provisioning/dashboards/
```

**Option 2:** import via the Grafana API:

```shell
for f in dashboards_grafana/*.json; do
  curl -s -X POST http://admin:password@grafana-host:3000/api/dashboards/db \
    -H "Content-Type: application/json" \
    -d "{\"dashboard\": $(cat "$f"), \"overwrite\": true, \"folderId\": 0}"
done
```

> **Scale note (Node Detail dashboard):** The per-node table is filtered by the `$partition` variable. On clusters with 100k+ nodes, always select a specific partition to avoid loading excessive data. The partition summary and problem-nodes panels remain scalable regardless of cluster size.
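Note for Option 1: Grafana only loads JSON dashboards from disk when a provisioning *provider* file points at the directory containing them. A minimal provider sketch (the file location, provider name, and folder are assumptions; adjust to your Grafana setup):

```yaml
# e.g. /etc/grafana/provisioning/dashboards/slurm.yaml (assumed path)
apiVersion: 1
providers:
  - name: slurm-dashboards
    folder: Slurm            # Grafana folder the dashboards appear under
    type: file
    options:
      path: /etc/grafana/provisioning/dashboards   # directory holding the copied JSON files
```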
## 📸 Screenshots

Screenshots were taken on a 20-node test cluster (alice/bob/carol/dave/eve/frank, multiple accounts and partitions). Click any thumbnail to open the full-size image. See `dashboards_grafana/README.md` for the full dashboard documentation.
## 📜 License

This project is licensed under the GNU General Public License, version 3 or later.
This project is a fork of cea-hpc/slurm_exporter, which itself is a fork of vpenso/prometheus-slurm-exporter (now apparently unmaintained).
Feel free to contribute or open issues!
