Call comms / compute overlap passes when compile=False #304
base: main
Conversation
autoparallel/api.py
Outdated
with V.set_fake_mode(fake_mode):
    cuda_context = get_cuda_device_context(fx_g)
    with cuda_context:
        _recursive_post_grad_passes(fx_g, is_inference=False)
Some of the post_grad passes are bad for perf unless lowered, e.g. view_to_reshape, which materializes all views.
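For context, a minimal illustration (not from the PR) of the pitfall that comment points at: view() always aliases its input, while reshape() is allowed to materialize a contiguous copy when an aliasing view is not possible.

import torch

x = torch.arange(6)

# On a contiguous tensor, both calls alias the original storage.
v = x.view(2, 3)
r = x.reshape(2, 3)

# On a non-contiguous tensor, view() refuses to alias, while reshape()
# silently materializes a contiguous copy.
t = x.view(2, 3).t()                     # transpose -> non-contiguous
try:
    t.view(6)                            # raises RuntimeError
except RuntimeError:
    pass
flat = t.reshape(6)                      # allocates new storage
print(flat.data_ptr() == t.data_ptr())   # False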
I've changed it to only call into the comms / compute reordering pass, to keep graph changes to a minimum
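As a rough sketch (not the PR's actual code), the narrower version could look something like the following; reorder_comms_for_overlap is a hypothetical stand-in for whichever reordering pass is actually wired in, and get_cuda_device_context is assumed to be the same helper used in the hunk above.

from torch._inductor.virtualized import V

def apply_overlap_passes(fx_g, fake_mode):
    # Sketch only: same structure as the diff above, but running a single
    # (hypothetical) comms / compute reordering pass instead of the full
    # _recursive_post_grad_passes pipeline, so unrelated graph rewrites
    # such as view_to_reshape are left out.
    with V.set_fake_mode(fake_mode):
        cuda_context = get_cuda_device_context(fx_g)  # same helper as in the diff
        with cuda_context:
            reorder_comms_for_overlap(fx_g)  # hypothetical pass name
    return fx_g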
wconstab left a comment
Seems OK to me. I will say that it's not super clear to me what the best formulation is; it's a little arbitrary which compiler passes to put 'inside' vs 'outside'.
From a use-case perspective, it seems nice to always have the distributed passes run, even if codegen isn't important. OTOH, other things like cudagraphs might also be preferred, even without codegen. For debugging, the unmodified original GraphModule might be nice to get out? (Though you can see it in its various states of transformation using tlparse.)
Previously, when we called AutoParallel with compile=False, none of the comms / compute overlap passes were applied to the model. This effectively meant that we needed compile=True to have a performant autoparallelized model.
I've for now decided to call into all the post_grad passes, but it is also possible to call into only the comms / compute overlap passes, to keep graph modifications to a minimum.
I'm now calling into the comms / compute reordering pass even when compile=False.
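For reference, a rough usage sketch of what this means in practice; model, input_fn, mesh, and the method names below are approximations of the AutoParallel API rather than exact signatures, and passing compile to the constructor is an assumption based on this PR.

from autoparallel.api import AutoParallel

# Illustrative only: argument and method names approximate the API.
with AutoParallel(model, input_fn, mesh, compile=False) as autop:
    sharding_placement = autop.optimize_placement()
    parallel_mod = autop.apply_placement(sharding_placement)

# Before this change, compile=False skipped the comms / compute overlap
# passes entirely; with it, the comms / compute reordering pass runs even
# when the model is not compiled.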