Omnistat

General

Omnistat provides a set of utilities that help cluster administrators or individual application developers aggregate scale-out system metrics via low-overhead sampling, either across all hosts in a cluster or, alternatively, on the subset of hosts associated with a specific user job. Omnistat can collect key telemetry from AMD Instinct™ accelerators on a per-GPU basis. Relevant target metrics include:

- GPU utilization
- High-bandwidth memory (HBM) usage
- GPU power
- GPU temperature
- GPU clock frequency
- GPU memory clock frequency
- Inventory information
  - ROCm driver version
  - GPU type
  - GPU vBIOS version

Additional optional metrics:

- RAS information (error counts per GPU block)
- GPU power caps
- GPU throttling events
- XGMI traffic
- GPU hardware counters
- Host network traffic (rx and tx)

The data can be scraped for detailed visualization and analysis via a combination of Prometheus / VictoriaMetrics and Grafana. Users can also generate PDF reports summarizing resource utilization on a per-job basis with SLURM, entirely in user space.
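As a rough illustration of consuming the collected data programmatically, the following sketch issues an instant query against the standard Prometheus HTTP API endpoint `/api/v1/query`, which VictoriaMetrics also serves. It is not part of Omnistat itself. The server URL and the metric name `rocm_utilization_percentage` are assumptions for illustration only; check the Omnistat documentation for the metric names your deployment actually exports.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a Prometheus-compatible server (Prometheus or VictoriaMetrics)
# scraping Omnistat exporters is reachable at this URL.
PROMETHEUS_URL = "http://localhost:9090"

# Hypothetical metric name, used for illustration only.
METRIC = "rocm_utilization_percentage"


def build_query_url(base_url: str, promql: str) -> str:
    """Build a URL for the Prometheus HTTP API instant-query endpoint."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(payload: dict) -> dict:
    """Map each series' sorted label set to its float value (instant-vector result)."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    results = {}
    for series in payload["data"]["result"]:
        labels = tuple(sorted(series["metric"].items()))
        _timestamp, value = series["value"]  # value is [unix_time, "string"]
        results[labels] = float(value)
    return results


def query(base_url: str, promql: str) -> dict:
    """Run an instant query and return the parsed series."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return parse_vector(json.load(resp))


def print_host_averages(base_url: str = PROMETHEUS_URL) -> None:
    """Print the average of METRIC across GPUs, grouped by scrape target."""
    promql = f"avg by (instance) ({METRIC})"
    for labels, value in query(base_url, promql).items():
        print(dict(labels), f"{value:.1f}")
```

Calling `print_host_averages()` prints one line per scrape target (`instance` label). Parsing is kept separate from the network call so the response handling can be checked without a live server.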

For more information on available features and installation steps, please refer to the online documentation.

Omnistat is an AMD open source research project and is not supported as part of the ROCm software stack. We welcome contributions and feedback from the community.

Licensing information can be found in the LICENSE file.
