
spcl/dace-fortran


dace-fortran — a Fortran (HLFIR) frontend for DaCe

dace-fortran lowers Fortran HPC kernels into optimisable DaCe SDFGs and hands back a Fortran-callable shared library that preserves the caller's original interface, so the SDFG can be dropped into the source program in place of the kernel it replaces. Flang (LLVM Flang / flang-new) owns the front-end — parsing, name binding, type inference, intrinsic lowering. This package consumes Flang's already-elaborated HLFIR (an MLIR dialect), runs a pipeline of MLIR passes to normalise it into one narrow IR shape, walks that IR into a dace.SDFG, and regenerates a bind(c) Fortran wrapper around the optimised result. It targets real HPC codes: ICON (ocean + atmosphere dycore, graupel, velocity advection), Quantum ESPRESSO (exx), CLOUDSC, NPB, FV3, and LULESH, plus a large construct-level test corpus.

 kernel.f90
     │  (0) preprocess (source-text rewrites) + flang -fc1 -emit-hlfir
     ▼
 kernel.hlfir            MLIR (HLFIR dialect) — flang did the parsing,
     │  (1) C++/MLIR bridge   name binding, type inference, intrinsic lowering
     ▼
 normalised single-TU IR (struct-flattened, inlined, shape-propagated, …)
     │  (2) walk → DaCe
     ▼
 dace.SDFG  ──(you optimise it with any DaCe transformation)──▶
     │  (3) emit binding
     ▼
 <entry>_bindings.f90  +  lib<entry>.so   ◀── the caller links here

Key features

  • Flang-authoritative front-end. No Fortran re-parsing in Python; the bridge consumes HLFIR that flang has already elaborated.
  • MLIR pass pipeline. Struct flattening (AoS→SoA), whole-kernel inlining, pointer-assignment rewriting, static devirtualisation + loud rejection of surviving polymorphism, shape propagation, reduction lifting, and a set of passes that delete non-numerical noise (error helpers, runtime I/O, character runtime) before the bridge sees the IR.
  • AoS / nested derived types. Path-flattened to per-member arrays; supports array-of-struct with array and allocatable members, and ICON's array-of-pointer-records (Graupel) pattern.
  • Fortran binding generation. Emits a <entry>_bindings.f90 module that preserves the caller's interface, aliases zero-copy where layouts agree, and generates copy-in/copy-out do-loops where the flatten plan requires it. An optional flat-C-ABI shim (bind(c)) exposes a stable C entry point.
  • External-call policy. A kernel's CALL to a separately-compiled bind(c) procedure (e.g. ICON's MPI halo exchange sync_patch_array) can be kept external and lowered to a DaCe library node, or dropped — declared once and applied to both the inliner and the bridge.
  • Build-system integration. A standalone preprocess CLI, a CMake module, and Autotools macros run the source-text rewrites in place so an existing build emits flang-consumable Fortran with no other changes.

Architecture / pipeline

Source-text preprocess passes (before flang)

Run before flang on the raw source. All are sed-style regex transforms with shared comment/string awareness, not a Fortran parser; each is narrow and idempotent. Importable standalone from dace_fortran.preprocess:

| Pass | Default | What it does |
|---|---|---|
| merge_used_modules | on | Inlines USE-d module sources so flang sees one self-contained TU. Regex text-splice by default; an fparser AST engine is available via merge_engine="fparser". |
| strip_openmp_directives | on | Drops !$OMP / !$ACC / !$ sentinels and #ifdef _OPENMP / _OPENACC blocks. |
| normalize_kind_parameters | on | Substitutes precision aliases (wp, sp, dp, qp) with literal kind integers when the alias isn't locally bound. |
| rewrite_integer_powers | on | Expands integer-valued REAL-literal powers (x**2.0 → x*x). |
| replace_external_with_modules | opt-in | Resolves EXTERNAL :: name to USE mod, ONLY: name against search_dirs. |
| rewrite_string_enum_to_integer | opt-in | Converts CHARACTER enum-style dummies into INTEGER, returning a map for binding generation. |
| preprocess_fortran (IF-intvar) | opt-in | Rewrites IF (intvar) → IF (intvar /= 0) for INTEGER scalars (flang-21 accepts only LOGICAL). |
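To make "regex transform with comment/string awareness" concrete, here is a minimal sketch of one narrow, idempotent rewrite in that style. It is a hypothetical illustration, not the code in dace_fortran.preprocess: rewrite_integer_powers_line and _split_code are invented names, and the sketch only handles a bare identifier raised to the literal 2.0 (the real pass covers integer-valued REAL powers more generally).

```python
import re

# Hypothetical sketch of an x**2.0 -> (x*x) rewrite; not dace_fortran's own code.
# Matches a bare identifier (not a %-component) raised to the literal 2.0, with no
# kind suffix (2.0_dp and 2.0d0 are left alone) and no following ** (right-assoc).
_POW2 = re.compile(r"(?<![%\w])([A-Za-z]\w*)\s*\*\*\s*2\.0\b(?!\s*\*\*)")


def _split_code(line):
    """Split a line into (text, is_code) chunks; string literals and a trailing
    ! comment come back as non-code so the rewrite never touches them."""
    chunks, buf, quote = [], "", None
    for i, ch in enumerate(line):
        if quote:                      # inside a '...' or "..." literal
            buf += ch
            if ch == quote:
                chunks.append((buf, False))
                buf, quote = "", None
        elif ch in "'\"":              # opening quote: flush the code seen so far
            chunks.append((buf, True))
            buf, quote = ch, ch
        elif ch == "!":                # comment runs to end of line
            chunks.append((buf, True))
            chunks.append((line[i:], False))
            return chunks
        else:
            buf += ch
    chunks.append((buf, quote is None))  # an unterminated literal stays non-code
    return chunks


def rewrite_integer_powers_line(line):
    # Parenthesise the product so a/x**2.0 becomes a/(x*x), not (a/x)*x.
    return "".join(_POW2.sub(r"(\1*\1)", text) if is_code else text
                   for text, is_code in _split_code(line))
```

Because the output no longer contains `**2.0`, a second application is a no-op, which is the idempotence property the passes rely on.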

dace_fortran.fparser_inliner is an alternative single-TU inliner built on an fparser AST: it parses the whole project, resolves every USE/ONLY:/=> rename, prunes to what the entry reaches, consolidates surviving USE clauses, runs a desugaring pipeline (deconstruct ASSOCIATE / GOTO / statement-functions, remove interfaces), restores cross-module USE clauses so the output is legal single-file Fortran, and re-emits one .f90. Requires fparser > 0.2.

HLFIR pass pipeline (inside the bridge)

The exact order is DEFAULT_PIPELINE in dace_fortran/builder/__init__.py (MULTI_FILE_PIPELINE is the multi-file variant). The pipeline runs on a dedicated 2 GB-stack worker thread with MLIRContext multithreading disabled. Current order:

| # | Pass | Purpose |
|---|---|---|
| 1 | hlfir-prune-unreachable | Erase dispatch-table bindings the entry never dynamically invokes. |
| 2 | symbol-dce (early) | Drop private functions the entry never reaches. |
| 3 | lower-fir-select-case | fir.select_case → cf.cond_br before inlining (the inliner segfaults on select-case callees). |
| 4 | lift-cf-to-scf (first) | Structurise callees (fold early RETURN / CFG into scf.if) so inlining can't corrupt a structured region. |
| 5 | hlfir-strip-error-helpers | Delete CALL errore / finish / abor1 etc.; their STOP-terminated shape stays multi-block and crashes the inliner. |
| 6 | hlfir-strip-runtime-io | Delete diagnostic _FortranAio* calls (WRITE/PRINT/…); file-bound chains are preserved as dace.libraries.fortran_io nodes. |
| 7 | hlfir-strip-character-runtime | Delete _FortranACharacter* calls (compare/Trim/Adjust); the bridge models no character data. |
| 8 | hlfir-inline-all | Splice every callee body into the entry; refuses multi-block callees as a safety net. |
| 9 | hlfir-unwrap-eval-in-mem | hlfir.eval_in_mem → fir.alloca + body + plain reads. |
| 10 | hlfir-fold-element-aliases | Erase element-scoped alias declares left by inlined elementals. |
| 11–12 | hlfir-expand-vector-subscript-{gather,scatter} | Noncontiguous gather temps / scatter destinations → explicit do loops. |
| 13 | symbol-dce (late) | Drop private callees once inlined. |
| 14 | fir-polymorphic-op | Statically devirtualise resolvable fir.dispatch / fir.select_type. |
| 15 | hlfir-reject-polymorphism | Loud-fail on surviving virtual dispatch (CLASS-as-monomorphic-box only). |
| 16 | hlfir-rewrite-sequence-association | Collapse sequence-association adapters into section designates. |
| 17 | hlfir-fold-copy-in-out | Fold flang's copy-in/copy-out temporaries. |
| 18 | hlfir-lift-alloc-array-of-records | Lift type(t), allocatable :: f(:) into top-level companions. |
| 19 | hlfir-lift-aos-pointer-records | Materialise concat companions for ICON's AoS-of-pointer-records (Graupel). |
| 20 | hlfir-split-aor-dummies | Split allocatable-array-of-records dummies into per-member descriptors. |
| 21 | hlfir-marshal-external-structs | Expand registered-external AoS calls into per-member arguments. |
| 22 | hlfir-flatten-structs | AoS → SoA; emits the hlfir.flatten_plan attribute. |
| 23 | hlfir-mark-bounds-remap-views | Tag F2003 bounds-remapping pointer assigns so a DaCe View is emitted. |
| 24 | hlfir-rewrite-pointer-assigns | Collapse plain ptr => target rebinds under strict-no-alias. |
| 25 | hlfir-propagate-shapes | Assumed-shape dummies acquire real extent symbols. |
| 26 | hlfir-version-shape-scalars | SSA-version a straight-line reassigned scalar used as an array extent. |
| 27 | hlfir-lift-reduction-operands | Lift inline reductions (max(x, MAXVAL(slice))) into a preceding scalar temp. |
| 28 | hlfir-default-intent | Intent-less dummies default to intent_inout. |
| 29 | lift-cf-to-scf (late) | Raw-CFG loops (DO WHILE, DO…EXIT) → scf.while + scf.if. |
| 30 | hlfir-preserve-mutable-globals | Clear init bodies of caller-mutable BSS globals so sccp can't fold their loads. |
| 31 | hlfir-fold-assumed-rank-queries | Fold fir.box_rank / fir.is_assumed_size when the box's rank/shape is statically known. |
| 32 | sccp,canonicalize,cse | Fold + simplify + dedupe. |
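The pipeline runs on a dedicated large-stack worker thread because deeply inlined kernels can exhaust a default thread stack. The pattern itself can be sketched in plain Python with only the standard threading module; run_on_big_stack is a hypothetical helper, not the bridge's code, and it does not model the MLIRContext multithreading setting.

```python
import threading


def run_on_big_stack(fn, *args, stack_bytes=2 << 30):
    """Run fn(*args) on a fresh thread whose stack is stack_bytes large and
    return its result; an exception raised by fn is re-raised in the caller."""
    result, error = [], []

    def worker():
        try:
            result.append(fn(*args))
        except BaseException as exc:   # hand the exception back to the caller
            error.append(exc)

    # stack_size() only affects threads created afterwards and returns the old value.
    previous = threading.stack_size(stack_bytes)
    try:
        t = threading.Thread(target=worker)
        t.start()
    finally:
        threading.stack_size(previous)  # restore so later threads are unaffected
    t.join()
    if error:
        raise error[0]
    return result[0]
```

A 2 GiB stack is only reserved address space on a 64-bit Linux host; platforms that reject the size raise ValueError from threading.stack_size.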

Bridge (HLFIR → SDFG)

The C++/MLIR bridge under dace_fortran/bridge/ is a nanobind Python extension (hlfir_bridge). bridge.cpp owns an MLIRContext and ModuleOp and delegates to: trace_utils.cpp (declaration tracing), extract_vars.cpp (variable/descriptor extraction), extract_ast.cpp plus bridge/ast/ (expressions, assigns, elementals, control_flow, dispatch) for the IR walk. The MLIR passes above live under dace_fortran/passes/ and link into the hlfir_bridge_passes static library. On the Python side, dace_fortran/hlfir_to_sdfg.py (SDFGBuilder) and dace_fortran/builder/ construct the SDFG; dace_fortran/intrinsics/ lowers Fortran intrinsics (elementwise, reductions, BLAS/LAPACK).

Binding generation (SDFG → Fortran-callable .so)

dace_fortran/bindings/ runs after the SDFG is built, consuming three inputs: a FrozenSignature (the SDFG's argument list snapshotted at build time and drift-checked at codegen), an OriginalInterface (the caller-facing Fortran surface), and a FlattenPlan (the AoS→SoA record from hlfir-flatten-structs). It emits <entry>_bindings.f90, aliases zero-copy where layouts agree, generates copy-in/copy-out loops where the recipe demands, optionally emits a bind(c) flat-C-ABI shim, then compiles and links a .so.
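The FrozenSignature idea (snapshot the SDFG's argument list at build time, refuse to emit bindings if it has changed by codegen) can be sketched as below. FrozenArg, freeze_signature, and check_drift are hypothetical names for illustration; the package's own FrozenSignature class may record different fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenArg:
    name: str
    dtype: str
    shape: tuple


def freeze_signature(args):
    """Snapshot an argument list [(name, dtype, shape), ...] as an immutable tuple."""
    return tuple(FrozenArg(n, d, tuple(s)) for n, d, s in args)


def check_drift(frozen, current):
    """Raise if the argument list changed between build time and codegen
    (for example, an optimisation pass added, removed, or reshaped an argument)."""
    now = freeze_signature(current)
    if now != frozen:
        old = {a.name: a for a in frozen}
        new = {a.name: a for a in now}
        changed = sorted(n for n in old.keys() | new.keys() if old.get(n) != new.get(n))
        raise RuntimeError(
            f"signature drifted since build: {changed or 'argument order changed'}")
```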

Key design decisions

Four mechanisms do most of the work of turning idiomatic Fortran into a flat, monomorphic SDFG. They are documented here so that the implementation files they name are the only further reading needed.

1. Devirtualisation / monomorphisation (removing CLASS dispatch)

DaCe SDFGs are monomorphic: there is no runtime type dispatch. Two layers cooperate to guarantee every call site is statically resolved before SDFG construction.

  • Source-level monomorphisation: dace_fortran/inliner/ast_desugaring/monomorphize_rewrite.py (with the analyzer monomorphize.py). A polymorphic CLASS(base) slot — a local variable, or a CLASS(base) component of a container type (e.g. ICON's t_ocean_solve%act) — is expanded into an INTEGER :: <var>__tag discriminator plus one concrete companion per arm, TYPE(arm) [, ALLOCATABLE] :: <var>__<arm>. ALLOCATE(concrete :: v) becomes v__tag = <k> (+ allocate(v__<arm>)), and a virtual dispatch CALL v%binding(args) becomes a static emit-all-always ladder: IF (v__tag == k) THEN; CALL <arm-proc>(v__<arm>, args); ELSE IF .... Each arm calls a concrete subroutine on a concrete TYPE, so the program lowers with only direct fir.calls, and the <var>__tag reads become free symbols the SDFG sees (e.g. this_act__tag). Four primitives compose over the per-translation-unit MonomorphizationSpec: local-dispatch and component-dispatch ladders, shared-interposer cloning (specialise an inherited solve/construct per arm so its internal dispatch resolves statically), and RETYPE (an axis pinned to one concrete type at its construction site is collapsed by rewriting CLASS(base) → TYPE(concrete) on its declarations, no tag needed). With stack_slots, arms become plain stack objects rather than allocatables, since the bridge cannot lower an allocatable derived-type scalar.

  • Bridge-level guard: dace_fortran/passes/RejectPolymorphism.cpp (hlfir-reject-polymorphism). After flang's own fir-polymorphic-op pass statically devirtualises the resolvable cases, this pass walks for any surviving fir.dispatch, fir.select_type, or the lowered-but-still-virtual fir.box_tdesc (a type-info read) and loud-fails with a source-located error. Non-polymorphic CLASS(t) boxes (member access without virtual dispatch) are supported and peeled like fir.box<T>; only genuine runtime type discrimination is rejected.
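The ladder shape produced by source-level monomorphisation can be sketched as a small text emitter. emit_dispatch_ladder and tag_assignment are hypothetical helpers, not monomorphize_rewrite.py's API, and numbering the tags 1..n in arm order is an assumption made for the example.

```python
def tag_assignment(var, arm, arms):
    """ALLOCATE(arm :: var) -> var__tag = k, with k the arm's 1-based position (assumed)."""
    k = [a for a, _ in arms].index(arm) + 1
    return f"{var}__tag = {k}"


def emit_dispatch_ladder(var, args, arms):
    """Turn CALL var%binding(args) into a static IF ladder over the tag.
    arms: [(arm_type_name, concrete_procedure), ...] in tag order."""
    lines = []
    for k, (arm, proc) in enumerate(arms, start=1):
        kw = "IF" if k == 1 else "ELSE IF"
        actuals = ", ".join([f"{var}__{arm}", *args])   # concrete companion goes first
        lines.append(f"{kw} ({var}__tag == {k}) THEN")
        lines.append(f"  CALL {proc}({actuals})")
    lines.append("END IF")
    return "\n".join(lines)


arms = [("t_gmres", "gmres_apply"), ("t_cg", "cg_apply")]
print(tag_assignment("s", "t_cg", arms))
print(emit_dispatch_ladder("s", ["x"], arms))
```

Every arm is emitted unconditionally, so each call inside the ladder is a direct call on a concrete TYPE and nothing virtual reaches the bridge.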

2. Struct flattening (AoS → SoA)

dace_fortran/passes/FlattenStructs.cpp (hlfir-flatten-structs) eliminates Fortran derived types from the IR before SDFG construction — DaCe handles flat arrays well and structures awkwardly. It mirrors DaCe-core's post-SDFG StructToContainerGroups: a recursive walk over record members producing one flat per-member companion array, with SoA naming and outer-shape concatenation. Three shapes are handled: a scalar struct with flat members (t%u(M) → t_u(M)); an array-of-struct with array members, where outer and inner extents concatenate (type(t), dimension(K) :: A → A_u(K, M), and A(i)%u(j) → A_u(i, j)); and nested records, which unfold recursively to the flat leaf (o%inner%x(j) → o_inner_x(j)). Struct dummy arguments get the same treatment: replaceStructArg inserts one block arg per member/leaf and _soa-suffixes the function; inlined alias chains from hlfir-inline-all are followed transparently. The pass records a hlfir.flatten_plan attribute that the bindings emitter consumes.
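The naming and index-concatenation rule above can be sketched on source-level designators. flatten_designator is a hypothetical helper that works on text, whereas the real pass rewrites HLFIR; it deliberately rejects nested parentheses in subscripts and a % inside a subscript.

```python
import re

# One designator part: a name with an optional, non-nested subscript list.
_SEG = re.compile(r"(\w+)(?:\(([^()]*)\))?")


def flatten_designator(designator):
    """A(i)%u(j) -> A_u(i, j): join the path with '_' and concatenate the
    subscripts outer-first (element indices, then member indices)."""
    names, subs = [], []
    for part in designator.split("%"):
        m = _SEG.fullmatch(part.strip())
        if m is None:
            raise ValueError(f"unsupported designator part: {part!r}")
        names.append(m.group(1))
        if m.group(2):
            subs.extend(s.strip() for s in m.group(2).split(","))
    flat = "_".join(names)
    return f"{flat}({', '.join(subs)})" if subs else flat
```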

The bind(c) wrapper marshals the host's Array-of-Structs ⇄ the SDFG's Struct-of-Arrays with copy-in/out gather loops — _render_aos_copy_in / _render_aos_copy_out / _aos_loop_pieces in dace_fortran/bindings/block_builders.py. The SoA buffer layout is [element-dims…, member-dims…]: an N-D record array contributes N leading element-index loops; the member's own dims follow. It handles N-dim record arrays and both kinds of member — allocatable/pointer members (extent = per-element max cap, guarded by allocated/associated, zero-filled where unallocated) and fixed-shape value members (e.g. t_cartesian_coordinates%x(3) — literal extents, always present, so the cap-scan and presence guard are skipped). Copy-out scatters back only when the argument is written.
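The copy-in/copy-out logic for an allocatable member can be mirrored in NumPy to show the buffer layout: one leading element index, then the member's own index up to the per-element max cap, zero-filled where unallocated, with the scatter touching only allocated members. Rec, aos_copy_in, and aos_copy_out are hypothetical names; the real _render_aos_copy_in / _render_aos_copy_out emit Fortran do-loops rather than running Python.

```python
from dataclasses import dataclass
from typing import List, Optional

import numpy as np


@dataclass
class Rec:                       # hypothetical record with one allocatable member u(:)
    u: Optional[np.ndarray]      # None models "not allocated"


def aos_copy_in(recs: List[Rec]) -> np.ndarray:
    """Gather recs(i)%u(:) into a SoA buffer of shape [n_records, cap]."""
    cap = max((r.u.size for r in recs if r.u is not None), default=0)  # per-element max
    soa = np.zeros((len(recs), cap))                                   # zero-filled
    for i, r in enumerate(recs):
        if r.u is not None:                    # plays the role of the allocated() guard
            soa[i, :r.u.size] = r.u
    return soa


def aos_copy_out(recs: List[Rec], soa: np.ndarray) -> None:
    """Scatter the SoA buffer back into the allocated members only."""
    for i, r in enumerate(recs):
        if r.u is not None:
            r.u[:] = soa[i, :r.u.size]
```

For a fixed-shape value member the cap is the literal extent and the guard disappears, which is the skipped cap-scan case described above.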

3. Allocation-buffer SSA (the unifying ALLOCATABLE model)

This is the bridge's model for ALLOCATABLE arrays under arbitrary ALLOCATE / DEALLOCATE / conditional-allocate patterns (consolidated here from the former ALLOC_BUFFER_SSA_DESIGN.md).

Semantics modelled. An ALLOCATABLE at routine/BLOCK scope has one name bound to at most one current buffer; allocation status persists across control flow within the scope (an ALLOCATE in a taken IF branch leaves it allocated after the IF); referencing an unallocated allocatable is prohibited. So the bridge never has to prove allocation — it models "the current buffer at each point", trusting the program conforms, and may safely over-allocate on a path where Fortran would have left the name unallocated (a conforming program never reads it there).

The abstraction: buffer reaching-definitions. Each ALLOCATE site is a buffer definition and each DEALLOCATE a kill. Two sites belong to the same DaCe transient iff their buffers can reach a common use/join as alternatives (the two arms of an IF both live to the join); sites never simultaneously reaching are distinct transients (sequential re-allocation: one dies before the next is born). Formally: build a merge relation s ~ t (both reaching at some join/use) and take its union-find equivalence classes; each class is one DaCe transient. This single rule reproduces every pattern: IF/ELSE-both-allocate → one buffer; sequential ALLOCATE, DEALLOCATE, ALLOCATE → two buffers; conditional + later realloc → two classes; realloc-chain inside one branch → the right split.
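The union-find step can be sketched directly. buffer_classes is a hypothetical helper: it takes the ALLOCATE sites in first-definition order and the merge pairs (sites reaching a common join/use) as given, and leaves the reaching analysis that produces those pairs out of scope.

```python
def buffer_classes(sites, merges):
    """Union-find over ALLOCATE sites; each returned class is one DaCe transient.
    sites: in first-definition order; merges: pairs (s, t) with s ~ t."""
    parent = {s: s for s in sites}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]   # path halving
            s = parent[s]
        return s

    for a, b in merges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    classes = {}
    for s in sites:                         # dicts keep first-definition order
        classes.setdefault(find(s), []).append(s)
    return list(classes.values())


# IF/ELSE both allocate (a1, a2 meet at the join), then a later re-allocation a3:
print(buffer_classes(["a1", "a2", "a3"], [("a1", "a2")]))
```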

Per-class shape (the PHI). A class whose sites share an extent gets a concrete shape; a class whose sites differ (a real conditional) gets a branch-dependent extent symbol <buf>_d<i> — each site assigns it on its own path and DaCe binds it from whichever path ran. Classes are named in first-definition order: class 0 keeps the base name, later classes get <name>_alloc1, <name>_alloc2, …; the bridge's existing alias map routes reads/writes to the current class buffer as it walks the IR, and both IF arms set the same merged-class buffer so post-join reads need no special handling. The grouping (a structured recursive walk over scf/fir.if regions, no general iterative dataflow) lives in the bridge's extract_vars allocation-site analysis. Out of scope: MOVE_ALLOC / allocatable-assignment auto-realloc, and buffer-reuse storage aliasing (a DaCe-core concern).
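Naming and per-class shapes can be sketched given the classes and each site's extents. class_shapes is a hypothetical helper, and reading the i in <buf>_d<i> as the dimension index is an assumption of this example; the per-path assignment of that symbol is not modelled.

```python
def class_shapes(name, classes, extents):
    """classes: lists of sites in first-definition order.
    extents[site]: tuple of per-dimension extent expressions (strings).
    Returns {buffer_name: shape}; a dimension whose sites disagree gets a
    branch-dependent symbol <buf>_d<i> (i assumed to be the dimension index)."""
    result = {}
    for k, cls in enumerate(classes):
        buf = name if k == 0 else f"{name}_alloc{k}"   # class 0 keeps the base name
        dims = zip(*(extents[s] for s in cls))          # same rank for every site
        result[buf] = tuple(d[0] if len(set(d)) == 1 else f"{buf}_d{i}"
                            for i, d in enumerate(dims))
    return result
```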

A related hazard: shape-scalar versioning (dace_fortran/passes/VersionShapeScalars.cpp, hlfir-version-shape-scalars). A local integer scalar used as an array extent (ALLOCATE(x(m))) may be reassigned (m = m + 3) before another array is sized from it. Both extents otherwise resolve to the bare name m, making m a mutable SDFG symbol, so a whole-array op over x's shape after the reassignment iterates m's new value over a buffer allocated to its old one (OOB / heap corruption). For a scalar that feeds an fir.allocmem extent and is reassigned after that allocation in straight-line code, the pass SSA-versions it (m, m_2, …) so each array binds the version live at its allocation. Reassignment inside a loop or branch is refused with a clear error rather than silently emitting a mutable shape. Non-hazards (accumulate-then-allocate-once; loop bounds and subscripts that mint a fir.shape but never an fir.allocmem extent) are left untouched.
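The versioning rule for straight-line code can be sketched over a toy statement list. version_shape_scalars and its ("set", var) / ("alloc", array, extent_var) tuples are hypothetical; the sketch ignores right-hand-side reads and does not implement the refusal for loops and branches.

```python
def version_shape_scalars(stmts):
    """stmts: straight-line list of ("set", var) and ("alloc", array, extent_var).
    Returns the list with hazardous reassignments SSA-versioned (m, m_2, m_3, ...)."""
    # Pass 1: a scalar is hazardous if it is reassigned after sizing an allocation.
    sized, hazardous = set(), set()
    for st in stmts:
        if st[0] == "alloc":
            sized.add(st[2])
        elif st[1] in sized:
            hazardous.add(st[1])
    # Pass 2: mint a new version at each such reassignment; allocs read the live one.
    live, count, sized_since = {}, {}, set()
    out = []
    for st in stmts:
        if st[0] == "set":
            v = st[1]
            if v in hazardous and v in sized_since:
                count[v] = count.get(v, 1) + 1
                live[v] = f"{v}_{count[v]}"
                sized_since.discard(v)
            out.append(("set", live.get(v, v)))
        else:
            _, array, v = st
            sized_since.add(v)
            out.append(("alloc", array, live.get(v, v)))
    return out
```

Accumulate-then-allocate-once never sets a scalar after its allocation, so pass 1 marks nothing and the statements come back unchanged.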

4. External-call policy (calls left un-inlined)

Some CALLs should not be pulled into the translation unit — their internals are unlowerable (MPI, polymorphic dispatch, string scans) or they target a separately-compiled bind(c) library. A single declaration drives both halves of the toolchain: the inliner stubs the named procedure's body (keeping its interface) so its internals never enter the TU, and the bridge then either emits the surviving CALL as an ExternalCall library node (dace_fortran/external.py) bound to a C-ABI symbol, or drops it. Three behaviours: inline (default), don't-inline + emit, don't-inline + ignore (structural invariant ignore ⊆ don't-inline).

Declare it once with apply_external_functions(EXTERNAL, IGNORE), where EXTERNAL is a list of ExternalFunction(name, c_function=…, library=…) and IGNORE is the drop list (finish, message, timers, …). The emitted call's argument plan is derived from the HLFIR call site (array → inout pointer, scalar / free-symbol → by-value), so you supply only what HLFIR can't know: the extern "C" symbol and the .so that exports it (linked into the SDFG library via rpath, resolving at load time). The contract: an emitted target must be bind(c, name="…") — Fortran name mangling is compiler-specific and a .mod is not C-consumable, so a stable C symbol (native or a thin forwarding shim) is the only portable handle. When the C ABI carries facts HLFIR cannot infer — whole derived-type (AoS) struct args, an MPI_Comm handle, per-leaf dynamic extents, cross-library module-global forwarding, or intent narrowing — register an authored ExternalSignature via keep_external(name, c_name=…, args=…, libraries=…) (the same registry apply_external_functions uses under the hood). See tests/external_call/ and tests/external_aos_test.py.
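The default argument plan (array → inout pointer, scalar / free symbol → by value) amounts to a simple mapping onto a C prototype. c_prototype is a hypothetical helper that only illustrates that mapping; it is not how dace_fortran/external.py builds the ExternalCall node, and it does not cover the cases that need an authored ExternalSignature.

```python
def c_prototype(c_name, args):
    """args: [(name, kind, c_type)] with kind "array" | "scalar" | "symbol".
    Arrays become pointers (read/write, i.e. inout); scalars and free symbols
    are passed by value."""
    params = []
    for name, kind, c_type in args:
        if kind == "array":
            params.append(f"{c_type}* {name}")
        elif kind in ("scalar", "symbol"):
            params.append(f"{c_type} {name}")
        else:
            raise ValueError(f"unsupported argument kind {kind!r} for {name}")
    return f"void {c_name}({', '.join(params) or 'void'});"


print(c_prototype("foo", [("field", "array", "double"), ("n", "symbol", "int")]))
```

The Fortran side of such a symbol has to be declared bind(c, name="foo") for the prototype to be a portable contract.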

Prerequisites

  • LLVM / Flang 21 (flang-new-21), validated against LLVM 21.1.8. The default is set by LLVM_VERSION = "21" in dace_fortran/CMakeLists.txt and dace_fortran/build_bridge.py and can be overridden via the LLVM_VERSION env var. On Debian/Ubuntu install llvm-21-dev libmlir-21-dev mlir-21-tools libflang-21-dev flang-21 clang-21 from apt.llvm.org. libflang-21-dev provides both the FIR/HLFIR static libs the bridge links and the flang headers it includes.
  • Python 3.10–3.14 (CI runs 3.12).
  • DaCe — pinned to dace @ git+https://github.com/spcl/dace.git@FaCe (the FaCe branch carries DaCe-core pieces the frontend needs; see pyproject.toml). Plus fparser > 0.2, networkx, numpy.
  • nanobind — the bridge is a nanobind extension (pip install nanobind).
  • CMake ≥ 3.18, a C++17 compiler (clang-21 is auto-selected when present), and gfortran (the binding tests and numerical references compile with gfortran; Ubuntu's flang-new-21 ships without libflang_rt, so flang is emit-HLFIR-only).

The bridge locates LLVM/MLIR by deriving the install prefix from flang-new-21 and using find_package(LLVM) only (it deliberately avoids the often-broken Debian find_package(MLIR) cmake config, locating MLIR headers/libs through LLVM's prefix).
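Deriving the prefix from the flang driver comes down to following the driver's symlinks and stepping out of its bin/ directory. The sketch below assumes the apt.llvm.org layout (a versioned driver in /usr/bin that links into /usr/lib/llvm-21/bin, with LLVMConfig.cmake under lib/cmake/llvm); llvm_prefix_from_flang and llvm_cmake_dir are hypothetical helpers, not build_bridge.py's functions.

```python
import os
import shutil
from pathlib import Path


def llvm_prefix_from_flang(flang=None):
    """Derive the LLVM install prefix from the flang-new-<ver> driver, following
    symlinks (typically /usr/bin/flang-new-21 -> /usr/lib/llvm-21/bin/...)."""
    if flang is None:
        version = os.environ.get("LLVM_VERSION", "21")
        flang = shutil.which(f"flang-new-{version}")
        if flang is None:
            raise FileNotFoundError(f"flang-new-{version} not found on PATH")
    return Path(flang).resolve().parent.parent     # <prefix>/bin/<driver> -> <prefix>


def llvm_cmake_dir(prefix):
    """Directory holding LLVMConfig.cmake, i.e. a candidate LLVM_DIR for find_package(LLVM)."""
    return Path(prefix) / "lib" / "cmake" / "llvm"
```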

Install & build

pip install -e ".[testing]"     # editable install + test deps

The C++ bridge is compiled on first use (first reference of SDFGBuilder / build_sdfg), not at import — dace_fortran/build_bridge.py runs cmake + build and symlinks the resulting hlfir_bridge*.so into the package. To build it explicitly, or to force a clean rebuild:

# Auto-detect LLVM, configure, and build:
python -m dace_fortran.build_bridge            # build if stale
python -m dace_fortran.build_bridge --clean    # wipe build dir and rebuild

# Or drive cmake directly:
cd dace_fortran/build
cmake .. -DLLVM_VERSION=21 -DCMAKE_BUILD_TYPE=Release
make -j8

Override LLVM discovery with the LLVM_VERSION / LLVM_DIR env vars if auto-detection misses.

Quick start

from pathlib import Path

import dace_fortran
from dace_fortran.bindings import build_fortran_library

src = Path("kernel.f90").read_text()

# Build the SDFG.  ``entry`` selects the target procedure; it accepts a
# plain Fortran name (``kernel``), a ``module::proc`` qualifier
# (``mo_x::kernel``), or a mangled Flang symbol (``_QPkernel`` /
# ``_QMmo_xPkernel``).  Omit it only when the source has exactly one
# procedure.
sdfg = dace_fortran.build_sdfg(src, entry="kernel", name="kernel")

# ... optimise the SDFG here with any DaCe transformation ...

# Emit + drift-verify + link a Fortran-callable .so.  The caller-facing
# interface and AoS→SoA flatten plan are auto-derived from the SDFG.
lib = build_fortran_library(sdfg, out_dir="build", name="kernel")

Other entry points (all return a built, validated dace.SDFG):

# A multi-file project (driver + the modules it USEs, in any order;
# driver and mod are the paths of those source files):
sdfg = dace_fortran.build_sdfg_from_files([driver, mod], entry="mo_x::kernel")

# A large / dependency-tangled project: emit .hlfir from your own build,
# then consume compile_commands.json directly (tier 3):
sdfg = dace_fortran.build_sdfg_from_project(
    "build/compile_commands.json", entry="_QMmymodPmysub")

# A kernel that CALLs a separately-compiled bind(c) function — keep it
# external and bind it to a C-ABI symbol / library:
dace_fortran.register_external("foo", dace_fortran.ExternalSignature(
    c_name="foo",
    args=[dace_fortran.Arg("array", "float64")],   # intent defaults to inout
    libraries=["/path/libfoo.so"]))

For end-to-end real-codebase recipes (ICON from source, Quantum ESPRESSO exx), see docs/ICON_INTEGRATION.md, docs/CODEBASE_HELPERS.md, the external-call policy above (§4), and the worked examples under tests/external_call/, tests/icon/full/, and tests/qe/.

Build-system integration

Three integration paths run the source-text preprocess passes in place so your existing compiler builds the result:

# Standalone CLI — atomic in-place rewrite, no build-file changes:
python -m dace_fortran.preprocess_cli \
    --all-defaults --rewrite-external --rewrite-string-enum \
    --search-dir src/utils --inplace \
    --in src/kernel.f90 --in src/helper.f90

# CMake (cmake/DaceFortran.cmake; add that directory to CMAKE_MODULE_PATH so include() finds it):
include(DaceFortran)
dace_fortran_preprocess(
    TARGET mylib SOURCES src/kernel.f90 src/helper.f90
    SEARCH_DIRS src/utils
    PASSES all_defaults rewrite_external rewrite_string_enum)
add_library(mylib ${mylib_PREPROCESSED_SOURCES})

Autotools is supported via autotools/dace_fortran.m4 + an included dace_fortran.mk.

Testing

# Main sweep — excludes multi-rank MPI and slow ICON-build tests:
python3 -m pytest -n 4 -m "not mpi and not long" tests/

# Multi-rank MPI tests (run under mpirun; --oversubscribe for <4 cores):
mpirun --oversubscribe -n 4 python3 -m pytest -m mpi -p no:cacheprovider tests/

# Dump built SDFGs for inspection:
__DACE_HLFIR_GEN_TEST_SDFGS=1 python3 -m pytest tests/

tests/conftest.py sets the test-environment defaults automatically (each via setdefault, so an explicit override still wins): HWLOC_COMPONENTS=-gl (disable hwloc's GL/X11 probe so MPI_Init can't hang on a desktop X display), UCX_VFS_ENABLE=n plus OMPI_MCA_pml=ob1 / OMPI_MCA_btl=self,vader (steer Open MPI onto in-node transports so UCX/PMIx finalize can't abort xdist workers), and raises the stack soft-limit to its hard limit for deeply-inlined kernels. The pytest markers are mpi, long, sequential, and xdist_group (see pyproject.toml).
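The same defaults can be reproduced outside pytest, for example when running a kernel by hand. The sketch below uses the variable values listed above; apply_test_env_defaults and TEST_ENV_DEFAULTS are hypothetical names and this is not the contents of tests/conftest.py. The resource module is POSIX-only.

```python
import os
import resource  # POSIX only

# Values from the list above; setdefault keeps anything the user exported explicitly.
TEST_ENV_DEFAULTS = {
    "HWLOC_COMPONENTS": "-gl",      # skip hwloc's GL/X11 probe during MPI_Init
    "UCX_VFS_ENABLE": "n",
    "OMPI_MCA_pml": "ob1",          # steer Open MPI onto in-node transports
    "OMPI_MCA_btl": "self,vader",
}


def apply_test_env_defaults():
    for key, value in TEST_ENV_DEFAULTS.items():
        os.environ.setdefault(key, value)
    # Raise the stack soft limit to the hard limit for deeply-inlined kernels.
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))
        except (ValueError, OSError):
            pass  # some platforms refuse; keep the current limit
```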

Set TMPDIR to control where scratch .f90/.hlfir and .dacecache build artifacts land. Executable-Fortran tests compile and run with gfortran/f2py against a seeded numerical reference.

Validated corpora include: a broad construct-level suite (types, control flow, allocatable/pointer, slicing/intrinsics, reductions, BLAS/LAPACK, derived types, MPI send/recv); ICON ocean + atmosphere dycore, graupel, velocity advection, and full ICON-from-source integration (tests/icon/); Quantum ESPRESSO exx (tests/qe/); CLOUDSC (tests/cloudsc/); NPB LU (tests/npb/); FV3 and LULESH; build-system integration (tests/buildsys_integration/); and binding-specific tests (tests/bindings/).

Repository layout

dace_fortran/
  build.py                 public build_sdfg* entry points
  hlfir_to_sdfg.py         SDFGBuilder + DEFAULT_PIPELINE re-export
  build_bridge.py          auto-build + import the C++ bridge
  preprocess.py            source-text preprocess passes
  preprocess_cli.py        CLI for the preprocess passes
  fparser_inliner.py       fparser-AST single-TU inliner
  flang_codebase.py        real-codebase flang driver helpers (ICON/IFS/…)
  external.py / external_functions.py  external-call policy + registry
  emit_hlfir.py            tier-3 .hlfir emission helper
  CMakeLists.txt           bridge build (LLVM 21, nanobind)
  bridge/                  C++/MLIR bridge (nanobind ext: hlfir_bridge)
    bridge.cpp             nanobind boundary
    extract_vars.cpp / extract_ast.cpp / trace_utils.cpp
    ast/                   expressions, assigns, elementals, control_flow, dispatch
  passes/                  the MLIR passes (one .cpp per pass + Passes.cpp)
  builder/                 SDFG construction (access, descriptors, emit_*)
  bindings/                Fortran bind(c) binding generator + C-ABI shim
  intrinsics/              Fortran intrinsic lowering (elementwise/reduction/linalg)
  inliner/                 fparser-based module inliner / ast_desugaring
  data/                    distributed-data helpers
cmake/                     DaceFortran.cmake (CMake integration)
autotools/                 dace_fortran.m4 + dace_fortran.mk
scripts/                   ICON build/configure helpers
docs/                      CODEBASE_HELPERS, ICON_INTEGRATION, …
tests/                     test corpora (see Testing)

Future work / roadmap

  • GPU target bindings. The binding generator currently marshals host (CPU) arrays. Accepting GPU device-pointer inputs — so an SDFG compiled for a GPU can be called with device memory directly, without a host round-trip — is future work. (GPU codegen itself is a DaCe capability; what is missing here is the device-pointer marshalling at the Fortran/C-ABI boundary.)
  • Dimensional reductions in AST emit. The bridge pipeline now safely lifts SUM(arr, DIM=k)-style reductions, but the AST emit path for the dimensional case still has a gap.
  • CHARACTER string content. Only the enum-as-integer pattern (via rewrite_string_enum_to_integer) is supported; arbitrary character data is not modelled.
  • Polymorphism beyond monomorphic CLASS. SELECT TYPE / CLASS(*) with genuine runtime type discrimination is rejected; static devirtualisation handles only the resolvable case.
  • Caller-mutable global classification. PreserveMutableGlobals currently treats every fir.zero_bits (BSS-default) global as caller-mutable; a genuinely zero-initialised mutable global is indistinguishable from an uninitialised input in the IR (see WORK_PACKAGES.md).

These items, and the per-construct support matrix, are tracked in WORK_PACKAGES.md and the docs/ planning notes.

Non-goals

  • Re-parsing Fortran in Python — Flang is authoritative.
  • Cross-kernel fusion across translation-unit boundaries (inline-all handles intra-TU fusion; cross-TU is the binding emitter's concern).
  • Unstructured GOTO (it does not lift to scf).

Transformation examples (Fortran→Fortran)

Concrete before→after for the key source-text transforms (the conceptual tables are above; every snippet below is exercised by a test in tests/inliner/).

Type-bound call → free call (deconstruct_procedure_calls). The type loses its CONTAINS block (becomes pure data); each obj%bind(a) becomes a free call with obj threaded as the first argument:

type Square ; real::side ; contains ; procedure::area ; end type   ! before
a = s%area(1.0)
! after:  TYPE :: Square ; REAL :: side ; END TYPE   (no CONTAINS)
a = area(s, 1.0)

Local constant propagation (exploit_locally_constant_variables). Constant and pointer values are folded into later uses — but a pointer passed as an actual argument is left intact (a POINTER dummy needs a pointer actual, not its target expression):

ptr => data ; x = ptr + 1.    →    x = data + 1.        ! folded in an expression
ptr => data ; call f(ptr)      →    call f(ptr)          ! NOT f(data) — pointer kept

Stubbing & the external-call / IO-binding policy. The callee shell (and its call sites) always survive; only the body is rewritten. Pick the variant by what the procedure is:

make_noop / do_not_emit  : SUBROUTINE log(x) ; END           ! body emptied; bridge keeps/DROPs the call
make_return_false        : LOGICAL FUNCTION isRestart() ; isRestart = .FALSE. ; END
ExternalFunction("sync_patch_array", c_function=…, library=…) ! body emptied; bridge EMITS a bind(c) call

do_not_emit is for pure side-effects (IO, timers, finish); ExternalFunction is for calls the runtime must still make (MPI halo), bound to a bind(c) symbol, typically via a thin hand-written shim. One generic name stubs the whole specific family (sync_patch_array → sync_patch_array_3d_dp, …).

AoS → SoA flatten (hlfir-flatten-structs). A derived-type access becomes a plain per-member array indexed by the record index (7 variants: scalar dummy, array-of-records, multi-dim, nested connectivity, pointer/allocatable box-section, local never-allocated, double-buffer):

type(t) :: p(:) ; ... = p(i)%w     →     ... = p_w(i)

Monomorphization (monomorphize(program, spec)) — removes Fortran virtual dispatch (fir.dispatch, which has no SDFG node) so a CLASS-heavy kernel like the ICON-O solver becomes lowerable. Two strategies:

! retype: pin a CLASS to one concrete type at its construction site
CLASS(t_transfer), POINTER :: trans   →   TYPE(t_trivial_transfer), POINTER :: trans

! ladder: a runtime factory over a closed arm-set → static type-tag if-ladder
call s%apply(x)   →   IF (s__tag==1) CALL gmres_apply(s__t_gmres, x)
                      ELSE IF (s__tag==2) CALL cg_apply(s__t_cg, x) ...

License

BSD 3-Clause. Copyright ETH Zurich and the DaCe authors (see AUTHORS / LICENSE).
