
Add run_mlir.jl: pre_xla MLIR → standalone XLA execute script #2581

Open
gbaraldi wants to merge 3 commits into EnzymeAD:main from gbaraldi:gba/run-mlir-script

Conversation

gbaraldi (Collaborator) commented Mar 1, 2026

Summary

  • Adds scripts/run_mlir.jl, a two-phase code-generation tool for testing the XLA compilation/execution pipeline on *_pre_xla_compile.mlir dumps — without needing the full model (e.g. Oceananigans/GB-25) setup
  • The generator parses MLIR signatures and emits a self-contained Julia script with hardcoded shapes, shardings, and compile options
  • Generated scripts support --cpu flag to run on virtual CPU devices via XLA_FLAGS="--xla_force_host_platform_device_count=N"
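
The --cpu path relies on XLA's host-platform virtual device support. A minimal Python sketch of the environment the generated script would set up (the helper name and device count are illustrative, not part of the tool):

```python
import os

def virtual_cpu_env(n: int) -> dict:
    """Build an environment for running on n virtual CPU devices.

    Illustrative sketch of what a --cpu flag sets before launching XLA:
    hide real GPUs and ask the host platform for n devices.
    """
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ""  # hide real GPUs
    env["XLA_FLAGS"] = f"--xla_force_host_platform_device_count={n}"
    return env

env = virtual_cpu_env(4)
print(env["XLA_FLAGS"])  # → --xla_force_host_platform_device_count=4
```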

How it works

Phase 1 — Generator (no Reactant dependency):

julia run_mlir.jl first_time_step.mlir loop.mlir output.jl
  • Regex-parses func.func @main(...) signatures (tensor types, shapes, sdy.sharding annotations)
  • Extracts mesh spec, mhlo.num_partitions/num_replicas from module attributes
  • Detects grid constants (args before first tf.aliasing_output)
  • Emits a standalone script with all shapes/shardings baked in
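
To illustrate the signature parsing step, here is a Python sketch of the kind of regex extraction described above. The one-line signature and the pattern are simplified assumptions (real dumps span many lines and carry sdy.sharding annotations, which this sketch ignores):

```python
import re

# Hypothetical MLIR signature, compressed to one line for the example.
sig = ('func.func @main(%arg0: tensor<4x8xf32>, '
       '%arg1: tensor<f64> {tf.aliasing_output = 0 : i32})')

# Each tensor type is "tensor<" + optional "DxDx..." dims + a dtype + ">".
args = []
for dims, dtype in re.findall(r'tensor<((?:\d+x)*)(\w+)>', sig):
    shape = tuple(int(d) for d in dims.split('x') if d)
    args.append((shape, dtype))

print(args)  # → [((4, 8), 'f32'), ((), 'f64')]
```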

Phase 2 — Generated script:

julia --project=Reactant.jl output.jl        # on GPUs
julia --project=Reactant.jl output.jl --cpu   # on N virtual CPU devices
  • Creates mock ConcreteRArray/ConcreteRNumber with correct NamedSharding
  • Loads MLIR, XLA-compiles with shardy partitioner
  • Executes first_time_step → marshals outputs → executes loop
  • Syncs all results (synced_buffer) for accurate timing
  • Handles both PJRT and IFRT runtimes

Test plan

  • Generator parses GB-25 MLIR (38/39 inputs, 26 outputs, mesh ["x"=2,"y"=2])
  • Example MLIR (single-device stablehlo.add) end-to-end on GPU
  • Example MLIR end-to-end with --cpu
  • Full GB-25 MLIR compile + execute with --cpu (4 virtual devices)
  • Full GB-25 MLIR on 4 real GPUs

🤖 Generated with Claude Code

gbaraldi and others added 2 commits March 1, 2026 21:41
… script

A code-generation tool for testing the XLA compilation/execution pipeline
on pre_xla_compile MLIR dumps without needing the full model setup.

Phase 1 (generator — no Reactant dependency):
  julia run_mlir.jl first.mlir loop.mlir output.jl

  - Parses func.func @main signatures (types, shapes, sdy shardings)
  - Extracts mesh spec, num_partitions/replicas from module attributes
  - Detects grid constants vs aliased state variables
  - Emits a self-contained Julia script with hardcoded shapes/shardings

Phase 2 (generated script):
  julia --project=Reactant.jl output.jl [--cpu]

  - Creates mock ConcreteRArrays with correct NamedSharding
  - Loads MLIR modules and XLA-compiles with shardy partitioner
  - Executes first_time_step, marshals outputs → loop inputs, executes loop
  - Syncs all results for accurate timing
  - --cpu flag: sets CUDA_VISIBLE_DEVICES="" and
    XLA_FLAGS="--xla_force_host_platform_device_count=N" to run on
    virtual CPU devices

Includes example MLIR files (1-device, stablehlo.add) for testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Creates inputs with known values (grid=ones, dt=3, state=zeros),
executes both MLIR modules, and asserts the outputs match expected
values (first: [1,1,1,1], loop: [2,2,2,2]).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
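
One plausible reading of that smoke test's arithmetic, assuming the example stablehlo.add module adds grid into state each step (dt is part of the inputs but does not enter this simplified sketch):

```python
# Hypothetical recreation of the expected values, not the actual MLIR:
# zeros + ones -> [1,1,1,1] after first_time_step, +ones again -> [2,2,2,2].
grid = [1.0] * 4
state = [0.0] * 4

first = [s + g for s, g in zip(state, grid)]  # first_time_step output
loop = [f + g for f, g in zip(first, grid)]   # loop output, fed first's result

print(first, loop)  # → [1.0, 1.0, 1.0, 1.0] [2.0, 2.0, 2.0, 2.0]
```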
gbaraldi commented Mar 1, 2026

@avik-pal Maybe this is useful for reproducing GB-25 bugs. It might also be useful for execution, since it can obviate a lot of the loop (well, the logic).

codecov bot commented Mar 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 32.30%. Comparing base (b39a1fc) to head (ffcbd2e).
⚠️ Report is 792 commits behind head on main.

❗ The number of uploaded reports differs between BASE (b39a1fc) and HEAD (ffcbd2e): HEAD has 93 fewer uploads than BASE (99 vs 6).
@@             Coverage Diff             @@
##             main    #2581       +/-   ##
===========================================
- Coverage   68.16%   32.30%   -35.87%     
===========================================
  Files         109      174       +65     
  Lines       11779    28567    +16788     
===========================================
+ Hits         8029     9228     +1199     
- Misses       3750    19339    +15589     

@wsmoses requested review from avik-pal and wsmoses March 2, 2026 16:22
@@ -0,0 +1,594 @@
#!/usr/bin/env julia
"""
run_mlir.jl — Generator: parse pre_xla MLIR files → emit standalone execute script.
Collaborator:

Why do we need this to not have a Reactant dependency? Parsing seems like it would be much simpler with that one dep.

Use sdyTensorShardingAttr*, sdyMeshAttr*, IR.FunctionType, IR.julia_type,
and IR.size to introspect MLIR modules programmatically instead of regex.

Zero regex patterns remain in the generator. The only string-level check
is a dict_get helper for dictionary attribute lookup (TODO: upstream
haskey/get for dict Attributes into Reactant's MLIR IR bindings).

Also uses randn instead of zeros for mock array data to exercise
non-trivial computation paths, and syncs results for accurate timing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
run_mlir_v2.jl — Generator (Reactant-dependent): MLIR IR introspection → emit execute script.

Usage:
julia --project=Reactant.jl run_mlir_v2.jl [first.mlir] [loop.mlir] [output.jl]
Member:

Can you clean up these comments

# Main: analyze MLIR via IR APIs, emit script
# ──────────────────────────────────────────────────────────────

function main()
Member:

Can this just be added as a Reactant-internal utility function in the serialization subpackage that takes a list of MLIR files (in this case, first and loop) and an output path?

Collaborator:

Even nicer would be a CLI app on 1.12+ that calls the serialization function.

Collaborator (author):

The question is how you choose to pipe outputs into inputs, etc.
