dace-fortran lowers Fortran HPC kernels into optimisable
DaCe SDFGs and hands back a
Fortran-callable shared library that preserves the caller's original
interface, so the SDFG can be dropped into the source program in place of the
kernel it replaces. Flang (LLVM Flang / flang-new) owns the front-end —
parsing, name binding, type inference, intrinsic lowering. This package
consumes Flang's already-elaborated HLFIR (an MLIR dialect), runs a
pipeline of MLIR passes to normalise it into one narrow IR shape, walks that IR
into a dace.SDFG, and regenerates a bind(c) Fortran wrapper around the
optimised result. It targets real HPC codes: ICON (ocean + atmosphere dycore,
graupel, velocity advection), Quantum ESPRESSO (exx), CLOUDSC, NPB, FV3, and
LULESH, plus a large construct-level test corpus.

    kernel.f90
       │  (0) preprocess (source-text rewrites) + flang -fc1 -emit-hlfir
       ▼
    kernel.hlfir                MLIR (HLFIR dialect): flang did the parsing,
       │  (1) C++/MLIR bridge   name binding, type inference, intrinsic lowering
       ▼
    normalised single-TU IR     (struct-flattened, inlined, shape-propagated, …)
       │  (2) walk → DaCe
       ▼
    dace.SDFG  ──(you optimise it with any DaCe transformation)──▶
       │  (3) emit binding
       ▼
    <entry>_bindings.f90 + lib<entry>.so  ◀── the caller links here

- Flang-authoritative front-end. No Fortran re-parsing in Python; the bridge consumes HLFIR that flang has already elaborated.
- MLIR pass pipeline. Struct flattening (AoS→SoA), whole-kernel inlining, pointer-assignment rewriting, static devirtualisation + loud rejection of surviving polymorphism, shape propagation, reduction lifting, and a set of passes that delete non-numerical noise (error helpers, runtime I/O, character runtime) before the bridge sees the IR.
- AoS / nested derived types. Path-flattened to per-member arrays; supports array-of-struct with array and allocatable members, and ICON's array-of-pointer-records (Graupel) pattern.
- Fortran binding generation. Emits a <entry>_bindings.f90 module that preserves the caller's interface, aliases zero-copy where layouts agree, and generates copy-in/copy-out do-loops where the flatten plan requires it. An optional flat-C-ABI shim (bind(c)) exposes a stable C entry point.
- External-call policy. A kernel's CALL to a separately-compiled bind(c) procedure (e.g. ICON's MPI halo exchange sync_patch_array) can be kept external and lowered to a DaCe library node, or dropped; the policy is declared once and applied to both the inliner and the bridge.
- Build-system integration. A standalone preprocess CLI, a CMake module, and Autotools macros run the source-text rewrites in place so an existing build emits flang-consumable Fortran with no other changes.

The source-text preprocess passes run before flang on the raw source. All are
sed-style regex transforms with shared comment/string awareness, not a Fortran
parser; each is narrow and idempotent. They are importable standalone from
dace_fortran.preprocess:
| Pass | Default | What it does |
|---|---|---|
| merge_used_modules | on | Inlines USE-d module sources so flang sees one self-contained TU. Regex text-splice by default; an fparser AST engine is available via merge_engine="fparser". |
| strip_openmp_directives | on | Drops !$OMP / !$ACC / !$ sentinels and #ifdef _OPENMP / _OPENACC blocks. |
| normalize_kind_parameters | on | Substitutes precision aliases (wp, sp, dp, qp) with literal kind integers when the alias isn't locally bound. |
| rewrite_integer_powers | on | Expands integer-valued REAL-literal powers (x**2.0 → x*x). |
| replace_external_with_modules | opt-in | Resolves EXTERNAL :: name to USE mod, ONLY: name against search_dirs. |
| rewrite_string_enum_to_integer | opt-in | Converts CHARACTER enum-style dummies into INTEGER, returning a map for binding generation. |
| preprocess_fortran (IF-intvar) | opt-in | Rewrites IF (intvar) → IF (intvar /= 0) for INTEGER scalars (flang-21 accepts only LOGICAL). |

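To make the flavour of these regex transforms concrete, here is a minimal sketch of an integer-power rewrite in the spirit of rewrite_integer_powers. It is not the package's implementation: the pattern, the max_power cut-off, the naive comment split, and the choice to parenthesise the product (so that a/x**2.0 keeps its meaning) are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: <identifier>**<N>.0 where N is a positive integer.
# The lookarounds skip derived-type components (p%x), kind-suffixed or
# fractional exponents (2.0_wp, 2.5), and chained powers (x**2.0**y).
_INT_POWER = re.compile(
    r"(?<![%\w])([A-Za-z_]\w*)\s*\*\*\s*(\d+)\.0*(?![\w.])(?!\s*\*\*)"
)


def rewrite_integer_powers(line: str, max_power: int = 4) -> str:
    """Expand x**2.0 into (x*x) on the code part of one Fortran line."""
    # Naive comment split: everything after the first '!' is treated as a
    # comment. A real pass would also have to respect string literals.
    code, bang, comment = line.partition("!")

    def expand(match: re.Match) -> str:
        base, power = match.group(1), int(match.group(2))
        if not 2 <= power <= max_power:
            return match.group(0)  # leave unusual exponents untouched
        # Parenthesise so operator precedence around the power is preserved.
        return "(" + "*".join([base] * power) + ")"

    return _INT_POWER.sub(expand, code) + bang + comment
```

Running the function twice gives the same result as running it once, which is the idempotence property the passes above rely on.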
dace_fortran.fparser_inliner is an alternative single-TU inliner built on an
fparser AST: it parses the whole project, resolves every USE/ONLY:/=>
rename, prunes to what the entry reaches, consolidates surviving USE clauses,
runs a desugaring pipeline (deconstruct ASSOCIATE / GOTO / statement-functions,
remove interfaces), restores cross-module USE clauses so the output is legal
single-file Fortran, and re-emits one .f90. Requires fparser > 0.2.
The exact order of the MLIR pass pipeline is DEFAULT_PIPELINE in
dace_fortran/builder/__init__.py (MULTI_FILE_PIPELINE is the multi-file
variant). The pipeline runs on a dedicated worker thread with a 2 GB stack and
with MLIRContext multithreading disabled. Current order:
| # | Pass | Purpose |
|---|---|---|
| 1 | hlfir-prune-unreachable | Erase dispatch-table bindings the entry never dynamically invokes. |
| 2 | symbol-dce (early) | Drop private functions the entry never reaches. |
| 3 | lower-fir-select-case | fir.select_case → cf.cond_br before inlining (the inliner segfaults on select-case callees). |
| 4 | lift-cf-to-scf (first) | Structurise callees (fold early RETURN / CFG into scf.if) so inlining can't corrupt a structured region. |
| 5 | hlfir-strip-error-helpers | Delete CALL errore / finish / abor1 etc.; their STOP-terminated shape stays multi-block and crashes the inliner. |
| 6 | hlfir-strip-runtime-io | Delete diagnostic _FortranAio* calls (WRITE/PRINT/…); file-bound chains are preserved as dace.libraries.fortran_io nodes. |
| 7 | hlfir-strip-character-runtime | Delete _FortranACharacter* calls (compare/Trim/Adjust); the bridge models no character data. |
| 8 | hlfir-inline-all | Splice every callee body into the entry; refuses multi-block callees as a safety net. |
| 9 | hlfir-unwrap-eval-in-mem | hlfir.eval_in_mem → fir.alloca + body + plain reads. |
| 10 | hlfir-fold-element-aliases | Erase element-scoped alias declares left by inlined elementals. |
| 11–12 | hlfir-expand-vector-subscript-{gather,scatter} | Noncontiguous gather temps / scatter destinations → explicit do loops. |
| 13 | symbol-dce (late) | Drop private callees once inlined. |
| 14 | fir-polymorphic-op | Statically devirtualise resolvable fir.dispatch / fir.select_type. |
| 15 | hlfir-reject-polymorphism | Loud-fail on surviving virtual dispatch (CLASS-as-monomorphic-box only). |
| 16 | hlfir-rewrite-sequence-association | Collapse sequence-association adapters into section designates. |
| 17 | hlfir-fold-copy-in-out | Fold flang's copy-in/copy-out temporaries. |
| 18 | hlfir-lift-alloc-array-of-records | Lift type(t), allocatable :: f(:) into top-level companions. |
| 19 | hlfir-lift-aos-pointer-records | Materialise concat companions for ICON's AoS-of-pointer-records (Graupel). |
| 20 | hlfir-split-aor-dummies | Split allocatable-array-of-records dummies into per-member descriptors. |
| 21 | hlfir-marshal-external-structs | Expand registered-external AoS calls into per-member arguments. |
| 22 | hlfir-flatten-structs | AoS → SoA; emits the hlfir.flatten_plan attribute. |
| 23 | hlfir-mark-bounds-remap-views | Tag F2003 bounds-remapping pointer assigns so a DaCe View is emitted. |
| 24 | hlfir-rewrite-pointer-assigns | Collapse plain ptr => target rebinds under strict-no-alias. |
| 25 | hlfir-propagate-shapes | Assumed-shape dummies acquire real extent symbols. |
| 26 | hlfir-version-shape-scalars | SSA-version a straight-line reassigned scalar used as an array extent. |
| 27 | hlfir-lift-reduction-operands | Lift inline reductions (max(x, MAXVAL(slice))) into a preceding scalar temp. |
| 28 | hlfir-default-intent | Intent-less dummies default to intent_inout. |
| 29 | lift-cf-to-scf (late) | Raw-CFG loops (DO WHILE, DO…EXIT) → scf.while + scf.if. |
| 30 | hlfir-preserve-mutable-globals | Clear init bodies of caller-mutable BSS globals so sccp can't fold their loads. |
| 31 | hlfir-fold-assumed-rank-queries | Fold fir.box_rank / fir.is_assumed_size when the box's rank/shape is statically known. |
| 32 | sccp,canonicalize,cse | Fold + simplify + dedupe. |

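Running deeply recursive native passes on a worker thread with an enlarged stack can be sketched in plain Python as below. This assumes the worker is a Python thread configured through threading.stack_size; the document does not say how the package actually creates it, and run_on_big_stack is a hypothetical helper name.

```python
import threading


def run_on_big_stack(fn, *args, stack_bytes=2 * 1024**3):
    """Run fn(*args) on a fresh thread with an enlarged stack; return its result."""
    outcome = {}

    def target():
        try:
            outcome["value"] = fn(*args)
        except BaseException as exc:  # re-raised in the calling thread below
            outcome["error"] = exc

    # stack_size() applies to threads created afterwards and returns the old value.
    previous = threading.stack_size(stack_bytes)
    try:
        worker = threading.Thread(target=target)
        worker.start()
    finally:
        threading.stack_size(previous)  # restore the setting for later threads
    worker.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

A caller would wrap the whole pipeline invocation, for example run_on_big_stack(run_pipeline, module) with a hypothetical run_pipeline, so that only this one thread pays for the large stack reservation.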
The C++/MLIR bridge under dace_fortran/bridge/ is a nanobind Python
extension (hlfir_bridge). bridge.cpp owns an MLIRContext and ModuleOp
and delegates to: trace_utils.cpp (declaration tracing), extract_vars.cpp
(variable/descriptor extraction), extract_ast.cpp plus bridge/ast/
(expressions, assigns, elementals, control_flow, dispatch) for the IR
walk. The MLIR passes above live under dace_fortran/passes/ and link into the
hlfir_bridge_passes static library. On the Python side,
dace_fortran/hlfir_to_sdfg.py (SDFGBuilder) and dace_fortran/builder/
construct the SDFG; dace_fortran/intrinsics/ lowers Fortran intrinsics
(elementwise, reductions, BLAS/LAPACK).
dace_fortran/bindings/ runs after the SDFG is built, consuming three inputs:
a FrozenSignature (the SDFG's argument list snapshotted at build time and
drift-checked at codegen), an OriginalInterface (the caller-facing Fortran
surface), and a FlattenPlan (the AoS→SoA record from hlfir-flatten-structs).
It emits <entry>_bindings.f90, aliases zero-copy where layouts agree,
generates copy-in/copy-out loops where the recipe demands, optionally emits a
bind(c) flat-C-ABI shim, then compiles and links a .so.
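The idea of snapshotting the SDFG's argument list at build time and drift-checking it at codegen can be illustrated with a small stand-alone sketch. The names FrozenArg, snapshot and check_drift are hypothetical and only mimic the role the text assigns to FrozenSignature; the real class and its fields are not shown in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenArg:
    """One argument as seen at build time (hypothetical record layout)."""
    name: str
    dtype: str
    shape: tuple


def snapshot(args):
    """Freeze an iterable of (name, dtype, shape) triples into a tuple."""
    return tuple(FrozenArg(n, d, tuple(s)) for n, d, s in args)


def check_drift(frozen, current):
    """Raise ValueError if the argument list at codegen differs from the snapshot."""
    now = snapshot(current)
    if now == frozen:
        return
    before = {a.name: a for a in frozen}
    after = {a.name: a for a in now}
    changed = sorted(
        n for n in before.keys() | after.keys() if before.get(n) != after.get(n)
    )
    detail = ", ".join(changed) if changed else "argument order"
    raise ValueError(f"signature drift: {detail}")
```

The point of the check is that a transformation applied between build and codegen may legitimately reshape the SDFG internally, but the generated wrapper is only valid if the outward argument list is still the one that was frozen.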
Three mechanisms do most of the work of turning idiomatic Fortran into a flat, monomorphic SDFG. They are documented here so the implementation files below are the only further reading needed.
DaCe SDFGs are monomorphic: there is no runtime type dispatch. Two layers cooperate to guarantee every call site is statically resolved before SDFG construction.
- Source-level monomorphisation: dace_fortran/inliner/ast_desugaring/monomorphize_rewrite.py (with the analyzer monomorphize.py). A polymorphic CLASS(base) slot (a local variable, or a CLASS(base) component of a container type, e.g. ICON's t_ocean_solve%act) is expanded into an INTEGER :: <var>__tag discriminator plus one concrete companion per arm, TYPE(arm) [, ALLOCATABLE] :: <var>__<arm>. ALLOCATE(concrete :: v) becomes v__tag = <k> (+ allocate(v__<arm>)), and a virtual dispatch CALL v%binding(args) becomes a static emit-all-always ladder: IF (v__tag == k) THEN; CALL <arm-proc>(v__<arm>, args); ELSE IF .... Each arm calls a concrete subroutine on a concrete TYPE, so the program lowers with only direct fir.call operations and the <var>__tag reads become free symbols the SDFG sees (e.g. this_act__tag). Four primitives compose over the per-translation-unit MonomorphizationSpec: local-dispatch and component-dispatch ladders, shared-interposer cloning (specialise an inherited solve/construct per arm so its internal dispatch resolves statically), and RETYPE (an axis pinned to one concrete type at its construction site is collapsed by rewriting CLASS(base) → TYPE(concrete) on its declarations, no tag needed). With stack_slots, arms become plain stack objects rather than allocatables, since the bridge cannot lower an allocatable derived-type scalar.
- Bridge-level guard: dace_fortran/passes/RejectPolymorphism.cpp (hlfir-reject-polymorphism). After flang's own fir-polymorphic-op pass statically devirtualises the resolvable cases, this pass walks for any surviving fir.dispatch, fir.select_type, or the lowered-but-still-virtual fir.box_tdesc (a type-info read) and loud-fails with a source-located error. Non-polymorphic CLASS(t) boxes (member access without virtual dispatch) are supported and peeled like fir.box<T>; only genuine runtime type discrimination is rejected.

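The tag-plus-companion naming and the static ladder described in the first bullet can be sketched as a small text generator. This is an illustration only: dispatch_ladder is a hypothetical helper, and the real rewrite works on an fparser AST rather than on strings.

```python
def dispatch_ladder(var, arms, args):
    """Render the static IF ladder that replaces CALL var%binding(args).

    arms: list of (tag, arm_type_name, concrete_procedure) tuples.
    args: the remaining actual arguments, as source strings.
    """
    if not arms:
        raise ValueError("a ladder needs at least one arm")
    lines = []
    for position, (tag, arm, proc) in enumerate(arms):
        keyword = "IF" if position == 0 else "ELSE IF"
        lines.append(f"{keyword} ({var}__tag == {tag}) THEN")
        # The concrete companion <var>__<arm> is threaded as the first argument.
        call_args = ", ".join([f"{var}__{arm}", *args])
        lines.append(f"  CALL {proc}({call_args})")
    lines.append("END IF")
    return "\n".join(lines)
```

For a solver slot s with arms t_gmres and t_cg, the call dispatch_ladder("s", [(1, "t_gmres", "gmres_apply"), (2, "t_cg", "cg_apply")], ["x"]) yields a two-arm ladder whose only runtime input is the integer s__tag.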
dace_fortran/passes/FlattenStructs.cpp (hlfir-flatten-structs) eliminates
Fortran derived types from the IR before SDFG construction, because DaCe handles
flat arrays well and structures awkwardly. It is the pre-SDFG counterpart of
DaCe-core's StructToContainerGroups: a recursive walk over record members
producing one flat per-member companion array, with SoA naming and outer-shape
concatenation.
Three shapes are handled: a scalar struct with flat members (t%u(M) →
t_u(M)); an array-of-struct with array members, where outer and inner extents
concatenate (type(t), dimension(K) :: A → A_u(K, M), and A(i)%u(j) →
A_u(i, j)); and nested records, which unfold recursively to the flat leaf
(o%inner%x(j) → o_inner_x(j)). Struct dummy arguments get the same
treatment — replaceStructArg inserts one block arg per member/leaf and
_soa-suffixes the function; inlined alias chains from hlfir-inline-all are
followed transparently. The pass records a hlfir.flatten_plan attribute that
the bindings emitter consumes.
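The naming and index-concatenation rule for the three shapes above fits in a few lines. The sketch below is an assumption-laden illustration (flat_name and flat_access are hypothetical helpers, and name-collision handling is ignored); it only reproduces the mappings quoted in the text.

```python
def flat_name(path):
    """['o', 'inner', 'x'] -> 'o_inner_x' (member path joined with underscores)."""
    return "_".join(path)


def flat_access(path, subscripts_per_level):
    """Rewrite an access such as A(i)%u(j) into its flattened form A_u(i, j).

    subscripts_per_level[k] holds the subscripts applied at path[k] and may be
    empty. Outer subscripts come first, so outer and inner extents concatenate.
    """
    subscripts = [s for level in subscripts_per_level for s in level]
    name = flat_name(path)
    return f"{name}({', '.join(subscripts)})" if subscripts else name
```

The same function covers the scalar-struct, array-of-struct, and nested cases because each is just a different distribution of subscripts along the member path.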
The bind(c) wrapper marshals the host's Array-of-Structs ⇄ the SDFG's
Struct-of-Arrays with copy-in/out gather loops —
_render_aos_copy_in / _render_aos_copy_out / _aos_loop_pieces in
dace_fortran/bindings/block_builders.py. The SoA buffer layout is
[element-dims…, member-dims…]: an N-D record array contributes N leading
element-index loops; the member's own dims follow. It handles N-dim record
arrays and both kinds of member — allocatable/pointer members (extent =
per-element max cap, guarded by allocated/associated, zero-filled where
unallocated) and fixed-shape value members (e.g.
t_cartesian_coordinates%x(3) — literal extents, always present, so the
cap-scan and presence guard are skipped). Copy-out scatters back only when the
argument is written.
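The gather and scatter loops for an allocatable member can be modelled in pure Python as follows. This is a sketch of the logic only: records are dictionaries, an unallocated member is None, and the generated wrapper is of course Fortran do-loops rather than Python. The function names are hypothetical.

```python
def copy_in(records, member):
    """Gather records[i][member] into a dense buffer indexed [element][member].

    The member extent is the per-element maximum; unallocated members (None)
    and the tail of shorter members are zero-filled.
    """
    cap = max((len(r[member]) for r in records if r[member] is not None), default=0)
    soa = [[0.0] * cap for _ in records]
    for i, rec in enumerate(records):
        if rec[member] is not None:  # mirrors the allocated/associated guard
            soa[i][: len(rec[member])] = rec[member]
    return soa


def copy_out(records, member, soa):
    """Scatter the buffer back into every record that actually holds storage."""
    for i, rec in enumerate(records):
        if rec[member] is not None:
            rec[member][:] = soa[i][: len(rec[member])]
```

A fixed-shape value member would skip both the cap scan and the None guard, since its extent is a literal and the storage is always present.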
This is the bridge's model for ALLOCATABLE arrays under arbitrary
ALLOCATE / DEALLOCATE / conditional-allocate patterns (consolidated here
from the former ALLOC_BUFFER_SSA_DESIGN.md).
Semantics modelled. An ALLOCATABLE at routine/BLOCK scope has one name
bound to at most one current buffer; allocation status persists across control
flow within the scope (an ALLOCATE in a taken IF branch leaves it allocated
after the IF); referencing an unallocated allocatable is prohibited. So the
bridge never has to prove allocation — it models "the current buffer at each
point", trusting the program conforms, and may safely over-allocate on a path
where Fortran would have left the name unallocated (a conforming program never
reads it there).
The abstraction: buffer reaching-definitions. Each ALLOCATE site is a
buffer definition and each DEALLOCATE a kill. Two sites belong to the
same DaCe transient iff their buffers can reach a common use/join as
alternatives (the two arms of an IF both live to the join); sites never
simultaneously reaching are distinct transients (sequential
re-allocation — one dies before the next is born). Formally: build a merge
relation s ~ t (both reaching at some join/use) and take its union-find
equivalence classes — each class is one DaCe transient. This single rule
reproduces every pattern: IF/ELSE-both-allocate → one buffer; sequential
alloc; dealloc; alloc → two buffers; conditional + later realloc → two
classes; realloc-chain inside one branch → the right split.
Per-class shape (the PHI). A class whose sites share an extent gets a
concrete shape; a class whose sites differ (a real conditional) gets a
branch-dependent extent symbol <buf>_d<i> — each site assigns it on its own
path and DaCe binds it from whichever path ran. Classes are named in
first-definition order: class 0 keeps the base name, later classes get
<name>_alloc1, <name>_alloc2, …; the bridge's existing alias map routes
reads/writes to the current class buffer as it walks the IR, and both IF
arms set the same merged-class buffer so post-join reads need no special
handling. The grouping (a structured recursive walk over scf/fir.if
regions, no general iterative dataflow) lives in the bridge's extract_vars
allocation-site analysis. Out of scope: MOVE_ALLOC / allocatable-assignment
auto-realloc, and buffer-reuse storage aliasing (a DaCe-core concern).
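The grouping rule (merge relation plus union-find, classes named in first-definition order) can be sketched as below. The inputs are assumed to be precomputed: sites is the list of ALLOCATE sites in definition order and merges lists the pairs that reach a common join or use. How the bridge derives those pairs from the IR is not shown here, and the function names are hypothetical.

```python
def allocation_classes(sites, merges):
    """Group ALLOCATE sites into equivalence classes, one per DaCe transient."""
    parent = {s: s for s in sites}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for s, t in merges:
        root_s, root_t = find(s), find(t)
        if root_s != root_t:
            parent[root_t] = root_s

    classes = {}
    for s in sites:  # dicts preserve insertion order: first definition wins
        classes.setdefault(find(s), []).append(s)
    return list(classes.values())


def class_names(base, classes):
    """Class 0 keeps the base name; later classes get <base>_alloc1, _alloc2, ..."""
    return [base if k == 0 else f"{base}_alloc{k}" for k in range(len(classes))]
```

With sites ["then", "else"] and one merge pair the result is a single class (IF/ELSE both allocate), while two sequential sites with no merge pair give two classes named x and x_alloc1.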
A related hazard — shape-scalar versioning —
dace_fortran/passes/VersionShapeScalars.cpp (hlfir-version-shape-scalars).
A local integer scalar used as an array extent (ALLOCATE(x(m))) may be
reassigned (m = m + 3) before another array is sized from it. Both extents
otherwise resolve to the bare name m, making m a mutable SDFG symbol — so a
whole-array op over x's shape after the reassignment iterates m's new
value over a buffer allocated to its old one (OOB / heap corruption). For a
scalar that feeds an fir.allocmem extent and is reassigned after that
allocation in straight-line code, the pass SSA-versions it (m, m_2, …) so
each array binds the version live at its allocation. Reassignment inside a loop
or branch is refused with a clear error rather than silently emitting a mutable
shape. Non-hazards (accumulate-then-allocate-once; loop bounds and subscripts
that mint a fir.shape but never an fir.allocmem extent) are left untouched.
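The straight-line versioning rule can be modelled on a toy statement list. The sketch below assumes a simplified input, tuples ("assign", scalar) and ("alloc", array, scalar) in program order, and a hypothetical function name; the real pass operates on HLFIR and additionally refuses reassignments inside loops or branches.

```python
def version_shape_scalars(stmts):
    """Return {array: extent symbol} with SSA-style versions m, m_2, ...

    A scalar gets a new version only when it is reassigned after its current
    value has already sized an allocation.
    """
    version = {}           # scalar -> current version number (1 = bare name)
    sized_an_array = set() # scalars whose current version fed an allocation
    binding = {}
    for stmt in stmts:
        if stmt[0] == "assign":
            scalar = stmt[1]
            if scalar in sized_an_array:
                version[scalar] = version.get(scalar, 1) + 1
                sized_an_array.discard(scalar)
        elif stmt[0] == "alloc":
            _, array, scalar = stmt
            n = version.get(scalar, 1)
            binding[array] = scalar if n == 1 else f"{scalar}_{n}"
            sized_an_array.add(scalar)
        else:
            raise ValueError(f"unknown statement kind: {stmt[0]!r}")
    return binding
```

In the hazard case (assign, allocate x, reassign, allocate y) the two arrays bind to m and m_2; in the accumulate-then-allocate-once case no versioning happens and the array binds to the bare m.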
Some CALLs should not be pulled into the translation unit — their internals
are unlowerable (MPI, polymorphic dispatch, string scans) or they target a
separately-compiled bind(c) library. A single declaration drives both halves
of the toolchain: the inliner stubs the named procedure's body (keeping its
interface) so its internals never enter the TU, and the bridge then either
emits the surviving CALL as an ExternalCall library node
(dace_fortran/external.py) bound to a C-ABI symbol, or drops it. Three
behaviours: inline (default), don't-inline + emit, don't-inline + ignore
(structural invariant ignore ⊆ don't-inline).
Declare it once with apply_external_functions(EXTERNAL, IGNORE), where
EXTERNAL is a list of ExternalFunction(name, c_function=…, library=…) and
IGNORE is the drop list (finish, message, timers, …). The emitted call's
argument plan is derived from the HLFIR call site (array → inout pointer,
scalar / free-symbol → by-value), so you supply only what HLFIR can't know: the
extern "C" symbol and the .so that exports it (linked into the SDFG library
via rpath, resolving at load time). The contract: an emitted target must be
bind(c, name="…") — Fortran name mangling is compiler-specific and a .mod
is not C-consumable, so a stable C symbol (native or a thin forwarding shim) is
the only portable handle. When the C ABI carries facts HLFIR cannot infer —
whole derived-type (AoS) struct args, an MPI_Comm handle, per-leaf dynamic
extents, cross-library module-global forwarding, or intent narrowing — register
an authored ExternalSignature via keep_external(name, c_name=…, args=…, libraries=…) (the same registry apply_external_functions uses under the
hood). See tests/external_call/ and tests/external_aos_test.py.
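The three behaviours and the ignore ⊆ don't-inline invariant can be expressed as a small classifier. This is a stand-alone sketch with hypothetical names (CallPolicy, classify_calls), not the package's apply_external_functions; rejecting a name that is both emitted and ignored is an assumption of the sketch.

```python
from enum import Enum


class CallPolicy(Enum):
    INLINE = "inline"
    EMIT = "don't-inline + emit"
    IGNORE = "don't-inline + ignore"


def classify_calls(callees, external=(), ignore=()):
    """Map each callee name to one of the three policies."""
    external, ignore = set(external), set(ignore)
    both = external & ignore
    if both:
        raise ValueError(f"both emitted and ignored: {sorted(both)}")
    # Every emitted or ignored name is stubbed by the inliner, so the
    # don't-inline set contains the ignore set by construction.
    dont_inline = external | ignore
    policy = {}
    for name in callees:
        if name not in dont_inline:
            policy[name] = CallPolicy.INLINE
        elif name in ignore:
            policy[name] = CallPolicy.IGNORE
        else:
            policy[name] = CallPolicy.EMIT
    return policy
```

Deriving both the inliner's stub list and the bridge's emit/drop decision from one mapping like this is what keeps the two halves of the toolchain from disagreeing about a call.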
- LLVM / Flang 21: flang-new-21 (validated against LLVM 21.1.8; the default is set by LLVM_VERSION = "21" in dace_fortran/CMakeLists.txt and dace_fortran/build_bridge.py, overridable via the LLVM_VERSION env var). On Debian/Ubuntu install llvm-21-dev libmlir-21-dev mlir-21-tools libflang-21-dev flang-21 clang-21 from apt.llvm.org. libflang-21-dev provides both the FIR/HLFIR static libs the bridge links and the flang headers it includes.
- Python 3.10–3.14 (CI runs 3.12).
- DaCe: pinned to dace @ git+https://github.com/spcl/dace.git@FaCe (the FaCe branch carries DaCe-core pieces the frontend needs; see pyproject.toml). Plus fparser > 0.2, networkx, numpy.
- nanobind: the bridge is a nanobind extension (pip install nanobind).
- CMake ≥ 3.18, a C++17 compiler (clang-21 is auto-selected when present), and gfortran (the binding tests and numerical references compile with gfortran; Ubuntu's flang-new-21 ships without libflang_rt, so flang is emit-HLFIR-only).

The bridge locates LLVM/MLIR by deriving the install prefix from flang-new-21
and using find_package(LLVM) only (it deliberately avoids the often-broken
Debian find_package(MLIR) cmake config, locating MLIR headers/libs through
LLVM's prefix).

    pip install -e ".[testing]"   # editable install + test deps

The C++ bridge is compiled on first use (first reference of
SDFGBuilder / build_sdfg), not at import: dace_fortran/build_bridge.py
runs cmake + build and symlinks the resulting hlfir_bridge*.so into the
package. To build it explicitly, or to force a clean rebuild:

    # Auto-detect LLVM, configure, and build:
    python -m dace_fortran.build_bridge           # build if stale
    python -m dace_fortran.build_bridge --clean   # wipe build dir and rebuild

    # Or drive cmake directly:
    cd dace_fortran/build
    cmake .. -DLLVM_VERSION=21 -DCMAKE_BUILD_TYPE=Release
    make -j8

Override LLVM discovery with the LLVM_VERSION / LLVM_DIR env vars if
auto-detection misses.

    from pathlib import Path

    import dace_fortran
    from dace_fortran.bindings import build_fortran_library

    src = Path("kernel.f90").read_text()

    # Build the SDFG. ``entry`` selects the target procedure; it accepts a
    # plain Fortran name (``kernel``), a ``module::proc`` qualifier
    # (``mo_x::kernel``), or a mangled Flang symbol (``_QPkernel`` /
    # ``_QMmo_xPkernel``). Omit it only when the source has exactly one
    # procedure.
    sdfg = dace_fortran.build_sdfg(src, entry="kernel", name="kernel")

    # ... optimise the SDFG here with any DaCe transformation ...

    # Emit + drift-verify + link a Fortran-callable .so. The caller-facing
    # interface and AoS→SoA flatten plan are auto-derived from the SDFG.
    lib = build_fortran_library(sdfg, out_dir="build", name="kernel")

Other entry points (all return a built, validated dace.SDFG):

    # A multi-file project (driver + the modules it USEs, in any order):
    sdfg = dace_fortran.build_sdfg_from_files([driver, mod], entry="mo_x::kernel")

    # A large / dependency-tangled project: emit .hlfir from your own build,
    # then consume compile_commands.json directly (tier 3):
    sdfg = dace_fortran.build_sdfg_from_project(
        "build/compile_commands.json", entry="_QMmymodPmysub")

    # A kernel that CALLs a separately-compiled bind(c) function: keep it
    # external and bind it to a C-ABI symbol / library.
    dace_fortran.register_external("foo", dace_fortran.ExternalSignature(
        c_name="foo",
        args=[dace_fortran.Arg("array", "float64")],   # intent defaults to inout
        libraries=["/path/libfoo.so"]))

For end-to-end real-codebase recipes (ICON from source, Quantum ESPRESSO
exx), see docs/ICON_INTEGRATION.md, docs/CODEBASE_HELPERS.md, the
external-call policy above (§4), and the worked examples under
tests/external_call/, tests/icon/full/, and tests/qe/.
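The three spellings accepted for entry (plain name, module::proc, mangled Flang symbol) relate to each other as in the sketch below. It only covers the two symbol shapes quoted in the usage comments, _QP<proc> and _QM<module>P<proc>; nested modules, submodules and internal procedures are out of scope, and mangle_entry is a hypothetical helper rather than how build_sdfg resolves the name internally.

```python
def mangle_entry(entry):
    """Normalise an entry spelling to a Flang-style symbol (simple cases only)."""
    if entry.startswith("_Q"):
        return entry  # already a mangled symbol
    if "::" in entry:
        module, proc = entry.split("::", 1)
        # Fortran names are case-insensitive; the mangled form is lower-case.
        return f"_QM{module.lower()}P{proc.lower()}"
    return f"_QP{entry.lower()}"
```

This is mainly useful when matching an entry against symbols found in an emitted .hlfir file, for example in the compile_commands.json workflow where only mangled names are visible.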
Three integration paths run the source-text preprocess passes in place so your existing compiler builds the result:

    # Standalone CLI: atomic in-place rewrite, no build-file changes.
    python -m dace_fortran.preprocess_cli \
        --all-defaults --rewrite-external --rewrite-string-enum \
        --search-dir src/utils --inplace \
        --in src/kernel.f90 --in src/helper.f90

    # CMake (cmake/DaceFortran.cmake):
    include(DaceFortran)
    dace_fortran_preprocess(
        TARGET mylib SOURCES src/kernel.f90 src/helper.f90
        SEARCH_DIRS src/utils
        PASSES all_defaults rewrite_external rewrite_string_enum)
    add_library(mylib ${mylib_PREPROCESSED_SOURCES})

Autotools is supported via autotools/dace_fortran.m4 + an included
dace_fortran.mk.

    # Main sweep: excludes multi-rank MPI and slow ICON-build tests.
    python3 -m pytest -n 4 -m "not mpi and not long" tests/

    # Multi-rank MPI tests (run under mpirun; --oversubscribe for <4 cores):
    mpirun --oversubscribe -n 4 python3 -m pytest -m mpi -p no:cacheprovider tests/

    # Dump built SDFGs for inspection:
    __DACE_HLFIR_GEN_TEST_SDFGS=1 python3 -m pytest tests/

tests/conftest.py sets the test-environment defaults automatically (each via
setdefault, so an explicit override still wins): HWLOC_COMPONENTS=-gl
(disable hwloc's GL/X11 probe so MPI_Init can't hang on a desktop X display),
UCX_VFS_ENABLE=n plus OMPI_MCA_pml=ob1 / OMPI_MCA_btl=self,vader (steer
Open MPI onto in-node transports so UCX/PMIx finalize can't abort xdist
workers), and raises the stack soft-limit to its hard limit for deeply-inlined
kernels. The pytest markers are mpi, long, sequential, and xdist_group
(see pyproject.toml).
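The setdefault behaviour and the stack-limit raise described above can be reproduced outside pytest with a few lines of standard-library Python. This is a sketch of the mechanism, not a copy of tests/conftest.py; the function names are hypothetical and the resource module is Unix-only.

```python
import os

TEST_ENV_DEFAULTS = {
    "HWLOC_COMPONENTS": "-gl",
    "UCX_VFS_ENABLE": "n",
    "OMPI_MCA_pml": "ob1",
    "OMPI_MCA_btl": "self,vader",
}


def apply_env_defaults(env=None):
    """Set each default only if the variable is not already defined."""
    env = os.environ if env is None else env
    for key, value in TEST_ENV_DEFAULTS.items():
        env.setdefault(key, value)  # an explicit override still wins
    return env


def raise_stack_soft_limit():
    """Raise the stack soft limit to the hard limit; return the resulting soft limit."""
    import resource  # imported lazily because it does not exist on Windows

    _, hard = resource.getrlimit(resource.RLIMIT_STACK)
    try:
        resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))
    except (ValueError, OSError):
        pass  # some platforms refuse the change; keep the current limit
    return resource.getrlimit(resource.RLIMIT_STACK)[0]
```

Passing a plain dictionary instead of os.environ makes the override rule easy to check in isolation.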
Set TMPDIR to control where scratch .f90/.hlfir and .dacecache build
artifacts land. Executable-Fortran tests compile and run with gfortran/f2py
against a seeded numerical reference.
Validated corpora include: a broad construct-level suite (types, control flow,
allocatable/pointer, slicing/intrinsics, reductions, BLAS/LAPACK, derived
types, MPI send/recv); ICON ocean + atmosphere dycore, graupel, velocity
advection, and full ICON-from-source integration (tests/icon/); Quantum
ESPRESSO exx (tests/qe/); CLOUDSC (tests/cloudsc/); NPB LU
(tests/npb/); FV3 and LULESH; build-system integration
(tests/buildsys_integration/); and binding-specific tests
(tests/bindings/).

    dace_fortran/
      build.py              public build_sdfg* entry points
      hlfir_to_sdfg.py      SDFGBuilder + DEFAULT_PIPELINE re-export
      build_bridge.py       auto-build + import the C++ bridge
      preprocess.py         source-text preprocess passes
      preprocess_cli.py     CLI for the preprocess passes
      fparser_inliner.py    fparser-AST single-TU inliner
      flang_codebase.py     real-codebase flang driver helpers (ICON/IFS/…)
      external.py / external_functions.py   external-call policy + registry
      emit_hlfir.py         tier-3 .hlfir emission helper
      CMakeLists.txt        bridge build (LLVM 21, nanobind)
      bridge/               C++/MLIR bridge (nanobind ext: hlfir_bridge)
        bridge.cpp          nanobind boundary
        extract_vars.cpp / extract_ast.cpp / trace_utils.cpp
        ast/                expressions, assigns, elementals, control_flow, dispatch
      passes/               the MLIR passes (one .cpp per pass + Passes.cpp)
      builder/              SDFG construction (access, descriptors, emit_*)
      bindings/             Fortran bind(c) binding generator + C-ABI shim
      intrinsics/           Fortran intrinsic lowering (elementwise/reduction/linalg)
      inliner/              fparser-based module inliner / ast_desugaring
      data/                 distributed-data helpers
    cmake/                  DaceFortran.cmake (CMake integration)
    autotools/              dace_fortran.m4 + dace_fortran.mk
    scripts/                ICON build/configure helpers
    docs/                   CODEBASE_HELPERS, ICON_INTEGRATION, …
    tests/                  test corpora (see Testing)

- GPU target bindings. The binding generator currently marshals host (CPU) arrays. Accepting GPU device-pointer inputs — so an SDFG compiled for a GPU can be called with device memory directly, without a host round-trip — is future work. (GPU codegen itself is a DaCe capability; what is missing here is the device-pointer marshalling at the Fortran/C-ABI boundary.)
- Dimensional reductions in AST emit. The bridge pipeline now safely lifts SUM(arr, DIM=k)-style reductions, but the AST emit path for the dimensional case still has a gap.
- CHARACTER string content. Only the enum-as-integer pattern (via rewrite_string_enum_to_integer) is supported; arbitrary character data is not modelled.
- Polymorphism beyond monomorphic CLASS. SELECT TYPE / CLASS(*) with genuine runtime type discrimination is rejected; static devirtualisation handles only the resolvable case.
- Caller-mutable global classification. PreserveMutableGlobals currently treats every fir.zero_bits (BSS-default) global as caller-mutable; a genuinely zero-initialised mutable global is indistinguishable from an uninitialised input in the IR (see WORK_PACKAGES.md).

These items, and the per-construct support matrix, are tracked in
WORK_PACKAGES.md and the docs/ planning notes.
- Re-parsing Fortran in Python — Flang is authoritative.
- Cross-kernel fusion across translation-unit boundaries (inline-all handles intra-TU fusion; cross-TU is the binding emitter's concern).
- Unstructured GOTO (it does not lift to scf).

Concrete before→after for the key source-text transforms (the conceptual tables
are above; every snippet below is exercised by a test in tests/inliner/).
Type-bound call → free call (deconstruct_procedure_calls). The type loses
its CONTAINS block (becomes pure data); each obj%bind(a) becomes a free call
with obj threaded as the first argument:

    type Square ; real::side ; contains ; procedure::area ; end type   ! before
    a = s%area(1.0)
    ! after: TYPE :: Square ; REAL :: side ; END TYPE   (no CONTAINS)
    a = area(s, 1.0)

Local constant propagation (exploit_locally_constant_variables). Constant
and pointer values are folded into later uses, but a pointer passed as an
actual argument is left intact (a POINTER dummy needs a pointer actual, not
its target expression):

    ptr => data ; x = ptr + 1.   →   x = data + 1.   ! folded in an expression
    ptr => data ; call f(ptr)    →   call f(ptr)     ! NOT f(data): pointer kept

Stubbing & the external-call / IO-binding policy. The callee shell (and its call sites) always survive; only the body is rewritten. Pick the variant by what the procedure is:

    make_noop / do_not_emit : SUBROUTINE log(x) ; END   ! body emptied; bridge keeps/DROPs the call
    make_return_false       : LOGICAL FUNCTION isRestart() ; isRestart = .FALSE. ; END
    ExternalFunction("sync_patch_array", c_function=…, library=…)   ! body emptied; bridge EMITS a bind(c) call

do_not_emit is for pure side-effects (IO, timers, finish); ExternalFunction
is for calls the runtime must still make (MPI halo), bound to a bind(c) symbol,
typically via a thin hand-written shim. One generic name stubs the whole specific
family (sync_patch_array ⇒ sync_patch_array_3d_dp, …).

AoS → SoA flatten (hlfir-flatten-structs). A derived-type access becomes a
plain per-member array indexed by the record index (7 variants: scalar dummy,
array-of-records, multi-dim, nested connectivity, pointer/allocatable box-section,
local never-allocated, double-buffer):

    type(t) :: p(:) ; ... = p(i)%w   →   ... = p_w(i)

Monomorphization (monomorphize(program, spec)) removes Fortran virtual
dispatch (fir.dispatch, which has no SDFG node) so a CLASS-heavy kernel like the
ICON-O solver becomes lowerable. Two strategies:

    ! retype: pin a CLASS to one concrete type at its construction site
    CLASS(t_transfer), POINTER :: trans   →   TYPE(t_trivial_transfer), POINTER :: trans
    ! ladder: a runtime factory over a closed arm-set → static type-tag if-ladder
    call s%apply(x)   →   IF (s__tag==1) CALL gmres_apply(s__t_gmres, x)
                          ELSE IF (s__tag==2) CALL cg_apply(s__t_cg, x) ...

BSD 3-Clause. Copyright ETH Zurich and the DaCe authors (see AUTHORS /
LICENSE).