Adds inference script (incl. force calculation) to Unified External Aero Recipe by peterdsharpe · Pull Request #1706 · NVIDIA/physicsnemo

peterdsharpe · 2026-06-08T16:12:09Z

PhysicsNeMo Pull Request

Description

This PR modifies the Unified External Aero Recipe (examples/cfd/external_aerodynamics/unified_external_aero_recipe) to:

- Add an inference script (src/infer.py and conf/infer.yaml) to the Unified External Aero Recipe.
- As part of this, add integrated force/moment coefficients to the recipe.

In order to pull this off cleanly (i.e., without massive duplication between the pre-existing train.py script and infer.py), some tooling was refactored within the recipe. In general this means:

- train.py got a lot shorter.
- utils.py and datasets.py got a lot longer, mostly with stuff that was pulled out of train.py.
- infer.py can now re-use most of that shared tooling :)
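To illustrate the kind of helper that is worth sharing between a trainer and an inference script, here is a minimal sketch of a JSONL logger factory. The name `make_jsonl_logger_sketch`, its signature, and the record fields are assumptions made for illustration; the recipe's real `make_jsonl_logger` may look different.

```python
import json
import tempfile
from pathlib import Path


def make_jsonl_logger_sketch(path):
    """Return a function that appends one JSON object per line to `path`.

    Hypothetical helper: not the recipe's actual implementation.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)

    def log(record: dict) -> None:
        # Append mode means the same factory works for any caller
        # that wants one record per line.
        with path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(record, sort_keys=True) + "\n")

    return log


# Usage: write two records, then read them back.
with tempfile.TemporaryDirectory() as tmp:
    metrics_path = Path(tmp) / "run_0" / "metrics.jsonl"
    log = make_jsonl_logger_sketch(metrics_path)
    log({"sample": "a", "mae": 0.12})
    log({"sample": "b", "mae": 0.08})
    records = [json.loads(line) for line in metrics_path.read_text().splitlines()]
```

Keeping a helper like this in one module means a trainer and an inference loop cannot drift apart on file format.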

So, on net, for code reviewers, I would recommend:

- Check out the modified README.md for an overview.
- Review that the refactor looks sensible, namely train.py -> utils.py and datasets.py.
- Review the new infer.py, along with the new YAMLs, and let me know if it's understandable.
- Review the new forces.py, which is used for force calculation.

All other changes are pretty peripheral and/or derived from this.

Some more detailed notes on how the inference script works:

infer.py is a companion to train.py that loads a trained checkpoint, runs it over a split, and writes predictions back to disk:

- Model/dataset-agnostic. It keys off the same input_type / output_type / forward_kwargs / targets contract as training, so it works for every model in the recipe (GeoTransolver, Transolver, FLARE, GLOBE, ...) with no per-model code. It reuses build_dataloaders, the collate, normalize_output_to_tensordict, and MetricCalculator directly (which now live in datasets.py).
- Output. One native .pdmsh DomainMesh per sample under ${output_dir}/${run_id}/predictions/, with pred_<field> and true_<field> on the interior, plus a metrics.jsonl.
- Units. Metrics are reported in training space (so they line up with validation numbers); written fields are re-dimensionalized to physical units by inverting normalization then non-dimensionalization. An optional rescale_geometry restores physical-scale coordinates.
- conf/infer.yaml shares conf/base.yaml with the trainer and swaps the training schedule for checkpoint/output knobs (run_id, checkpoint_dir/checkpoint_path, infer_split, redimensionalize, ...).
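The two-step unit conversion described under "Units" can be sketched for a pressure field as follows. This assumes z-score normalization and a standard pressure-coefficient non-dimensionalization, Cp = (p - p_inf) / (0.5 * rho_inf * U_inf^2). `FieldStats`, both function names, and the numeric values are hypothetical; the recipe's actual transforms may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldStats:
    """Per-field normalization statistics (hypothetical container)."""

    mean: float
    std: float


def denormalize(values, stats):
    """Step 1: invert z-score normalization (training space -> non-dimensional)."""
    return [v * stats.std + stats.mean for v in values]


def redimensionalize_pressure(cp_values, rho_inf, u_inf, p_inf=0.0):
    """Step 2: invert Cp = (p - p_inf) / q_inf, with q_inf = 0.5 * rho_inf * u_inf**2."""
    q_inf = 0.5 * rho_inf * u_inf**2
    return [cp * q_inf + p_inf for cp in cp_values]


# Illustrative numbers only.
normalized = [-1.0, 0.0, 2.0]
stats = FieldStats(mean=-0.2, std=0.4)
cp = denormalize(normalized, stats)
pressure = redimensionalize_pressure(cp, rho_inf=1.2, u_inf=30.0)
```

The order matters: normalization was applied last during preprocessing, so it is inverted first. Metrics computed on `normalized` stay comparable with validation numbers, while `pressure` is what would be written to disk.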

Checklist

- I am familiar with the Contributing Guidelines.
- New or existing tests cover these changes.
- The documentation is up to date with these changes.
- The CHANGELOG.md is up to date with these changes.
- An issue is linked to this pull request.
- If I am implementing a new model or modifying any existing model, I have followed the Models Implementation Coding Standards.

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

- Introduced `infer.py`, a model-agnostic inference script that loads trained checkpoints, runs inference over specified dataset splits, and outputs predictions in the native `.pdmsh` format.
- Added `infer.yaml` configuration file to define inference parameters, including dataset selection, checkpoint locations, and output settings.
- Ensured compatibility with existing training configurations by sharing base settings, allowing for consistent precision and data loading across training and inference phases.

…ynamics

- Updated the README to clarify that users can both train and run inference with models.
- Added detailed instructions for running inference on trained checkpoints, including command examples and output specifications.
- Introduced force and moment coefficient integration in the inference configuration, enabling users to obtain physical coefficients from surface cases.
- Improved the `infer.yaml` configuration to support force coefficient calculations and added relevant parameters for reference areas and moment centers.
- Refactored the `datasets.py` and `infer.py` scripts to streamline data loading and inference processes, ensuring consistency across training and inference phases.

copy-pr-bot · 2026-06-08T16:12:13Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-06-08T16:21:43Z

Greptile Summary

This PR adds an inference script (src/infer.py + conf/infer.yaml) and integrated aerodynamic force/moment coefficient support (src/forces.py) to the Unified External Aero Recipe, and refactors the previous train.py monolith to share dataloader assembly, device movement, autocast, and JSONL logging with the new companion script.

- forces.py: new module that integrates per-cell Cp/Cf surface fields into CD/CL/CS/CMR/CMP/CMY via Mesh.integrate; physics derivation and sign conventions are well-documented and backed by closed-body analytical tests.
- infer.py: model/dataset-agnostic inference loop (reuses build_dataloaders, collate, MetricCalculator, and ForceContext); outputs one .pdmsh per sample with pred_<field> / true_<field>, metrics in training space, and optional force summaries.
- Refactor (datasets.py, utils.py): build_dataloaders, recursive_to_device, get_autocast_context, make_jsonl_logger, and resolve_dict moved from train.py to shared modules; train.py delegates to these with no logic change.
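As a rough illustration of what integrating per-cell Cp/Cf into a force coefficient involves, the sketch below sums surface tractions over a triangulated body in plain Python. It assumes outward-pointing cell normals, a pressure contribution of -Cp * n * dA, and a shear contribution of Cf * dA; the function names and sign convention are assumptions, not the API or conventions of forces.py, and moment coefficients are omitted.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def area_vector(a, b, c):
    """Area-weighted normal of triangle (a, b, c); orientation follows vertex order."""
    return tuple(0.5 * x for x in cross(sub(b, a), sub(c, a)))


def force_coefficient(vertices, faces, cp, cf, direction, a_ref):
    """Sum (-Cp * n + Cf) * dA over cells, project on `direction`, divide by a_ref."""
    total = 0.0
    for face, cp_i, cf_i in zip(faces, cp, cf):
        s = area_vector(*(vertices[i] for i in face))  # n * dA
        area = math.sqrt(dot(s, s))                    # dA
        traction = tuple(-cp_i * s_k + cf_k * area for s_k, cf_k in zip(s, cf_i))
        total += dot(traction, direction)
    return total / a_ref


# Closed unit tetrahedron with outward-oriented faces.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
drag_dir = (1.0, 0.0, 0.0)
zero_cf = [(0.0, 0.0, 0.0)] * 4

# Uniform pressure on a closed body integrates to zero net force.
cd_pressure_only = force_coefficient(vertices, faces, [1.0] * 4, zero_cf, drag_dir, 1.0)

# Arithmetic check only: a uniform Cf vector gives Cf_x * total_area / a_ref.
cd_shear = force_coefficient(vertices, faces, [0.0] * 4, [(0.01, 0.0, 0.0)] * 4, drag_dir, 1.0)
```

The two checks at the end mirror the style of the closed-surface and uniform-shear tests mentioned in the review, though the uniform Cf vector here is not tangent to every face and is meant purely as a numerical sanity check.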

Important Files Changed

| Filename | Overview |
| --- | --- |
| examples/cfd/external_aerodynamics/unified_external_aero_recipe/src/forces.py | New module implementing integrated aerodynamic force/moment coefficients (CD/CL/CS/CMR/CMP/CMY) via surface traction integration; physics looks correct, well-documented with unit tests, minor missing guard for U_inf key. |
| examples/cfd/external_aerodynamics/unified_external_aero_recipe/src/infer.py | New inference companion to train.py; reuses build_dataloaders, collate, MetricCalculator, and ForceContext cleanly; distributed all-reduce for metrics is correct but the force accumulator all-reduce has a latent tensor-size-mismatch risk in edge cases. |
| examples/cfd/external_aerodynamics/unified_external_aero_recipe/src/datasets.py | build_dataloaders and related helpers moved from train.py to datasets.py to enable shared use by infer.py; logic is identical to the original, refactor looks clean. |
| examples/cfd/external_aerodynamics/unified_external_aero_recipe/src/train.py | Significantly simplified by delegating dataloader construction, recursive device movement, autocast, and JSONL logging to shared utilities; no logic changes, only removals of code now in utils.py/datasets.py. |
| examples/cfd/external_aerodynamics/unified_external_aero_recipe/src/utils.py | Added shared utilities extracted from train.py: get_autocast_context, recursive_to_device, make_jsonl_logger, resolve_dict, and the Precision type alias; implementations are identical to the originals. |
| examples/cfd/external_aerodynamics/unified_external_aero_recipe/tests/test_forces.py | Good coverage: closed-surface pressure identity, uniform shear drag, reference-area scaling, orthonormal frame, degenerate-up handling, field identification, ForceContext config variants, and accumulator means/MAE. |
| examples/cfd/external_aerodynamics/unified_external_aero_recipe/tests/test_infer.py | Tests cover field-type resolution, redimensionalization, _to_pointwise, sample-id derivation, checkpoint-path resolution, and the DomainMesh write/round-trip without needing a real model or dataset. |
| examples/cfd/external_aerodynamics/unified_external_aero_recipe/conf/infer.yaml | Well-documented YAML with sensible defaults; shares base.yaml, aliases infer_split to val_split cleanly, and documents the moment_center / CenterMesh caveat. |
| examples/cfd/external_aerodynamics/unified_external_aero_recipe/src/metrics.py | Added resolve_metrics helper shared by train.py and infer.py; return type annotation says list[MetricName] but OmegaConf.to_container returns Any, so invalid names aren't caught until MetricCalculator lookup. |
| examples/cfd/external_aerodynamics/unified_external_aero_recipe/tests/test_train_helpers.py | Minor updates to reflect functions moved from train.py to datasets.py/utils.py; test logic unchanged. |
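The note on metrics.py points at a common gap: a `Literal`-based annotation does nothing at runtime, so a misspelled name read from config only fails later. One possible way to fail early is sketched below. The members of `MetricName` shown here and the function `validate_metric_names` are hypothetical and not taken from the recipe.

```python
from typing import Literal, get_args

# Hypothetical set of metric names; the recipe's real MetricName may differ.
MetricName = Literal["mae", "mse", "rel_l2"]


def validate_metric_names(names):
    """Raise ValueError right away if any configured name is not a known metric."""
    allowed = set(get_args(MetricName))
    unknown = [name for name in names if name not in allowed]
    if unknown:
        raise ValueError(
            f"Unknown metric name(s) {unknown}; expected a subset of {sorted(allowed)}"
        )
    return list(names)


# Usage: a plain list, e.g. the result of converting a config node to a container.
checked = validate_metric_names(["mae", "mse"])
```

Deriving the allowed set from the `Literal` with `typing.get_args` keeps the type alias and the runtime check from drifting apart.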

Comments Outside Diff (2)

examples/cfd/external_aerodynamics/unified_external_aero_recipe/src/infer.py, lines 936-940

Distributed all-reduce tensor size mismatch when force accumulator is empty on some ranks

_allreduce_sums packs {key: running_sum} entries into a single tensor whose length equals len(totals) + 1. If force_ctx is active (the dataset YAML declares Cp/Cf fields) but a rank's sampler shard happens to contain only non-surface samples (e.g., the per-rank shape check interior.points.shape[0] != vehicle.n_cells returns None for every sample), that rank's force_acc.totals remains an empty dict (len == 0) while other ranks carry 18 keys. dist.all_reduce requires all participating tensors to have the same size, so this would produce an NCCL error or a hang. A simple fix is to pre-populate force_acc.totals with zeros for all 18 keys at construction time so the size is always consistent.
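The suggested fix can be illustrated without torch.distributed by packing sums against a fixed key order, so every rank produces a buffer of the same length even when it saw no surface samples. The key names, `pack_sums`, `unpack_sums`, and `simulated_all_reduce` below are all hypothetical stand-ins; only the idea of a fixed 18-key layout plus a count slot comes from the comment above.

```python
COEFFS = ("CD", "CL", "CS", "CMR", "CMP", "CMY")
KINDS = ("pred", "true", "abs_err")  # hypothetical grouping that yields 18 keys
FORCE_KEYS = tuple(f"{kind}_{coeff}" for kind in KINDS for coeff in COEFFS)


def pack_sums(totals, count, keys=FORCE_KEYS):
    """Pack running sums in a fixed key order; missing keys contribute 0.0."""
    return [float(totals.get(key, 0.0)) for key in keys] + [float(count)]


def unpack_sums(buffer, keys=FORCE_KEYS):
    """Inverse of pack_sums: returns (totals dict, sample count)."""
    return dict(zip(keys, buffer[:-1])), buffer[-1]


def simulated_all_reduce(buffers):
    """Element-wise sum across ranks; like a real all-reduce, sizes must match."""
    if len({len(buffer) for buffer in buffers}) != 1:
        raise ValueError("all ranks must contribute buffers of equal length")
    return [sum(column) for column in zip(*buffers)]


# Rank 0 saw two surface samples; rank 1 saw none, so its totals dict is empty.
rank0 = pack_sums({"pred_CD": 0.6, "true_CD": 0.5}, count=2)
rank1 = pack_sums({}, count=0)

totals, count = unpack_sums(simulated_all_reduce([rank0, rank1]))
mean_pred_cd = totals["pred_CD"] / count
```

Because the buffer layout depends only on `FORCE_KEYS`, an empty shard contributes zeros instead of changing the tensor size.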

examples/cfd/external_aerodynamics/unified_external_aero_recipe/src/infer.py, lines 900-912

Silent data loss in distributed inference — no runtime warning

attach_and_save is guarded by if is_rank0, so in multi-rank execution (e.g., torchrun --nproc_per_node=N) only rank 0's sampler shard ever gets written to disk. The other N−1 shards are processed (metrics and force coefficients are all-reduced correctly) but their predictions are silently dropped. The module docstring mentions this limitation, but there is no runtime logger.warning to alert users who accidentally run with torchrun expecting a full prediction set.
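A runtime warning of the kind described could look roughly like the sketch below. The function name, logger name, and message wording are assumptions for illustration; in a real script `world_size` and `rank` would come from the distributed runtime.

```python
import logging

logger = logging.getLogger("infer_sketch")  # hypothetical logger name


def warn_if_predictions_dropped(world_size: int, rank: int) -> bool:
    """Warn once, on rank 0, when only rank 0's shard will be written to disk.

    Returns True if a warning was emitted, which makes the behavior easy to test.
    """
    if world_size > 1 and rank == 0:
        logger.warning(
            "Running with %d ranks, but only rank 0 writes predictions; "
            "samples from the other %d shard(s) will not be saved. "
            "Use a single process to get the full prediction set.",
            world_size,
            world_size - 1,
        )
        return True
    return False


# Usage: single-process runs stay silent; multi-rank runs warn on rank 0 only.
single = warn_if_predictions_dropped(world_size=1, rank=0)
multi_rank0 = warn_if_predictions_dropped(world_size=4, rank=0)
multi_rank1 = warn_if_predictions_dropped(world_size=4, rank=1)
```

Emitting the message on rank 0 only avoids N copies of the same warning in the log.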

Reviews (1): Last reviewed commit: "interrogate fixes"

…dynamics

- Modified the file patterns in `drivaer_ml_surface.yaml`, `drivaer_ml_volume.yaml`, and `highlift_volume.yaml` to specify directory structures, ensuring correct data loading for the respective readers.
- Adjusted patterns to include specific prefixes for better organization and clarity in dataset management.

…stency

- Changed file patterns in `README.md`, `shift_suv_estate_surface.yaml`, and `shift_suv_fastback_surface.yaml` to include a `run_*` prefix, ensuring uniformity in data loading paths across the unified external aerodynamics recipe.
- Adjusted the pattern in `merge_global_data.py` to match the updated structure for better integration with the data pipeline.

peterdsharpe · 2026-06-09T12:12:00Z

/blossom-ci

peterdsharpe · 2026-06-09T12:12:09Z

/ok to test 0101ef0

peterdsharpe · 2026-06-10T17:06:00Z

/blossom-ci

peterdsharpe · 2026-06-10T17:06:04Z

/ok to test b290b3a

peterdsharpe · 2026-06-10T21:43:32Z

/blossom-ci

peterdsharpe added 10 commits June 4, 2026 14:03

Refactor import statements in collate.py for clarity

262df3f

- Moved the import of `Float` from `jaxtyping` to improve code organization.
- Removed unnecessary blank lines to enhance readability.

ruff changes

3f0b3b1

ruff changes

549bd3b

docs

991e8dc

adds tests

272c980

ruff

f611c48

cleanup

f131f93

interrogate fixes

7915b4d

peterdsharpe requested a review from coreyjadams as a code owner June 8, 2026 16:12

greptile-apps Bot reviewed Jun 8, 2026

Comment thread examples/cfd/external_aerodynamics/unified_external_aero_recipe/src/forces.py

peterdsharpe added 3 commits June 8, 2026 18:40

Merge branch 'main' into psharpe/add-unified-recipe-inference-script

0101ef0

peterdsharpe added 2 commits June 9, 2026 12:14

fixes an asymmetry in the files

a434fde

Merge branch 'psharpe/add-unified-recipe-inference-script' of https:/…

ce0324e

…/github.com/peterdsharpe/physicsnemo into psharpe/add-unified-recipe-inference-script

coreyjadams approved these changes Jun 9, 2026

peterdsharpe added 8 commits June 9, 2026 20:42

inference on one dataset only

e5aef68

docs for -cfd reference

3ede0c5

drop broken fp8 support

adc12c5

fixes a possible collectives bug

5a149d9

fp8 scrub

d19cdf6

add changelog

bf43247

Merge branch 'main' into psharpe/add-unified-recipe-inference-script

926beb3

validated changes

75e754d

peterdsharpe added 2 commits June 10, 2026 13:01

validated fixes

5451ab0

add utils tests

b290b3a

peterdsharpe wants to merge 25 commits into NVIDIA:main from peterdsharpe:psharpe/add-unified-recipe-inference-script
