Realm is a distributed, event-based tasking runtime for building high-performance applications that span clusters of CPUs, GPUs, and other accelerators.
It began life as the low-level substrate underneath the Legion programming system but is now maintained as a standalone project for developers who want direct, fine-grained control of parallel and heterogeneous machines.
- Asynchronous tasks & events – Compose applications out of many light-weight tasks connected by events instead of blocking synchronization.
- Heterogeneous execution – Target CPUs, NVIDIA (CUDA) and AMD (HIP) GPUs, OpenMP threads, and specialized fabrics with a single API.
- Scalable networking – Integrate GASNet-EX, UCX, MPI or shared memory transports for efficient inter-node communication.
- Extensible modules – Enable/disable features (CUDA, HIP, LLVM JIT, NVTX, PAPI …) at build time with simple CMake flags.
- Portable performance – Realm applications routinely scale from laptops to the world's largest supercomputers.
The runtime follows a data-flow execution model: tasks are launched asynchronously and start when their pre-condition events trigger. This design hides network and device latency, maximizes overlap, and gives programmers explicit control over when work becomes runnable.
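To make the event-driven model concrete, here is a toy, single-threaded sketch written against the C++ standard library only. It is not the Realm API: `ToyEvent`, `spawn`, `merge`, and `run_demo` are hypothetical names invented for this illustration. It shows only the core idea that a launch returns immediately with a completion event, and the task body runs once its precondition event triggers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model only: none of these types or functions are part of Realm.
class ToyEvent {
public:
  // Run `fn` once this event has triggered (immediately if it already has).
  void on_trigger(std::function<void()> fn) {
    if (triggered_) { fn(); return; }
    waiters_.push_back(std::move(fn));
  }
  // Mark the event as triggered and run everything that was waiting on it.
  void trigger() {
    if (triggered_) return;
    triggered_ = true;
    std::vector<std::function<void()>> ready;
    ready.swap(waiters_);
    for (auto &fn : ready) fn();
  }
  bool has_triggered() const { return triggered_; }

private:
  bool triggered_ = false;
  std::vector<std::function<void()>> waiters_;
};

using EventPtr = std::shared_ptr<ToyEvent>;

// "Launch" a task: returns at once with a completion event; `body` runs
// only after `precondition` has triggered.
EventPtr spawn(std::function<void()> body, const EventPtr &precondition) {
  auto done = std::make_shared<ToyEvent>();
  precondition->on_trigger([body = std::move(body), done]() {
    body();
    done->trigger();
  });
  return done;
}

// Returns an event that triggers once every input event has triggered.
EventPtr merge(const std::vector<EventPtr> &inputs) {
  auto merged = std::make_shared<ToyEvent>();
  if (inputs.empty()) { merged->trigger(); return merged; }
  auto remaining = std::make_shared<std::size_t>(inputs.size());
  for (const auto &e : inputs)
    e->on_trigger([merged, remaining]() {
      if (--*remaining == 0) merged->trigger();
    });
  return merged;
}

// Builds a three-task graph up front, then releases it with a single trigger.
std::vector<std::string> run_demo() {
  std::vector<std::string> log;
  auto start = std::make_shared<ToyEvent>();
  EventPtr a = spawn([&log] { log.push_back("load_a"); }, start);
  EventPtr b = spawn([&log] { log.push_back("load_b"); }, start);
  EventPtr c = spawn([&log] { log.push_back("combine"); }, merge({a, b}));
  log.push_back("all launched");  // nothing has executed yet
  start->trigger();               // the whole graph now runs in dependency order
  assert(c->has_triggered());
  return log;
}
```

Calling `run_demo()` from a `main` function returns the entries `all launched`, `load_a`, `load_b`, `combine`, in that order: all three launches return before any task body executes, and `combine` runs only after both loads have completed. A real runtime would execute ready tasks concurrently on different processors; this sketch runs them inline to keep the dependency logic visible.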
For a deeper dive see the Realm white-paper published at PACT 2014:
https://cs.stanford.edu/~sjt/pubs/pact14.pdf
git clone https://github.com/StanfordLegion/realm.git
cd realm
# Create an out-of-tree build directory
mkdir build && cd build
# Configure – pick the options that match your system
# (REALM_ENABLE_OPENMP=ON adds OpenMP support; set REALM_ENABLE_CUDA=ON to target NVIDIA GPUs)
cmake .. \
  -DCMAKE_BUILD_TYPE=Release \
  -DREALM_ENABLE_OPENMP=ON \
  -DREALM_ENABLE_CUDA=OFF
# Compile everything
make -j$(nproc)
# (optional) run the unit tests
ctest --output-on-failure
The full list of CMake toggles is documented inside CMakeLists.txt. Common switches include:
Option | Default | Purpose |
---|---|---|
REALM_ENABLE_CUDA | ON | Build CUDA backend |
REALM_ENABLE_HIP | ON | Build HIP/ROCm backend |
REALM_ENABLE_GASNETEX | ON on Linux | GASNet-EX network |
REALM_ENABLE_UCX | ON on Linux | UCX network |
REALM_ENABLE_MPI | OFF | MPI network |
REALM_LOG_LEVEL | WARNING | Compile-time log level |
TIP: use cmake -LAH or ccmake to explore every option.
make install # honour DESTDIR / CMAKE_INSTALL_PREFIX as usual
Headers, libraries, and CMake package files will be placed under include/realm, lib/, and share/realm/ so that external projects can consume Realm via
find_package(Realm REQUIRED)
The easiest way to get started is to build the tutorials as part of your normal CMake build tree:
# Configure (enable tutorials) if you did not already
cmake -B build -DREALM_BUILD_TUTORIALS=ON
cmake --build build --target realm_hello_world
# Run it (path will be inside the build directory)
./build/tutorials/hello_world/realm_hello_world -ll:cpu 4
If you have installed Realm (e.g. via make install) you can also build an individual tutorial stand-alone:
cd tutorials/hello_world # Inside this repo or the copy installed under share/realm/tutorials
cmake -B build -DCMAKE_PREFIX_PATH=/path/to/realm/install
cmake --build build
./build/realm_hello_world -ll:cpu 4
Note: The tutorial directories provide only CMake build files; traditional Makefiles are no longer shipped.
Tutorials currently available
- Hello World – minimal Realm program
- Events & Barriers – synchronization primitives
- Reservations – locks
- Reductions
- CUDA/HIP interoperability – calling GPU kernels from Realm tasks
- Profiling & Tracing – using -lg:prof and Legion Prof
- Machine Model Exploration – querying the machine model
- Index Space Operations – set algebra helpers for index spaces
- Region Instances – creating and using region instances
- Copy / Fill – DMA-style data movement between instances
- Subgraph Launches – launching groups of tasks together
- Deferred Allocation – lazy allocation of physical memory
- Completion Queues – querying event completion programmatically
Realm and its modules share a common set of -ll:<flag> options to tune processor counts and memory sizes at runtime:
-ll:cpu <N> # number of CPU cores per rank
-ll:gpu <N> # number of GPUs per rank
-ll:util <N> # number of util processors (communication helpers)
-ll:csize <MB> # DRAM memory per rank
-ll:fsize <MB> # framebuffer memory per GPU
-ll:zsize <MB> # zero-copy (pinned) memory per GPU
-logfile <path> # redirect logging (supports % for rank)
-level <cat>=<n> # change logging level per category
Run any Realm executable with -hl:help (high-level) or -ll:help (low-level) to see everything that is available.
- Current public documentation can be found here
- API reference (Doxygen): generate with make docs or cmake --build . --target docs.
- Tutorials: see the tutorials/ directory listed above.
- Examples & Benchmarks: under examples/ and benchmarks/.
Please file an issue or pull request if something is missing or outdated.
We welcome contributions of all kinds – bug reports, documentation fixes, new features, and performance improvements.
- Fork the repository and create a feature branch.
- Follow the existing code style (clang-format is enforced in CI).
- Make sure ctest passes on your machine, and with REALM_ENABLE_SANITIZER if possible.
- Open a pull request against master (or the feature branch you were asked to use).
See CONTRIBUTING.md for the full guidelines.
Realm is licensed under the Apache License 2.0 – see LICENSE.txt for details.
Commercial and academic use is free; attribution in papers and derivative works is appreciated.
Realm is developed and maintained by the Stanford Legion team with significant contributions from NVIDIA, Los Alamos, Livermore, Sandia, and many members of the broader HPC community.