All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Refactored diffusion preconditioners in `physicsnemo.diffusion.preconditioners`, relying on a new abstract base class `BaseAffinePreconditioner` for preconditioning schemes that use affine transformations. Existing preconditioners (`VPPrecond`, `VEPrecond`, `iDDPMPrecond`, `EDMPrecond`) are reimplemented based on this new interface.
- New `SO2Convolution` layer in `physicsnemo.experimental.nn.symmetry` using a grid-based layout for efficient GPU parallelization. The new implementation uses a single vectorized einsum operation instead of per-m-order loops; a sketch of the idea follows below.
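The following is an illustrative sketch of the vectorization idea only, not the actual `SO2Convolution` code; shapes and names are hypothetical.

```python
import torch

# Hypothetical shapes: batch, number of m-orders, input/output channels.
B, M, C_in, C_out = 8, 16, 32, 64
x = torch.randn(B, M, C_in)      # per-m-order features
w = torch.randn(M, C_in, C_out)  # one weight matrix per m-order

# Old pattern: one matmul per m-order inside a Python loop.
out_loop = torch.stack([x[:, m] @ w[m] for m in range(M)], dim=1)

# New pattern: a single vectorized einsum over all m-orders at once.
out_einsum = torch.einsum("bmi,mio->bmo", x, w)

assert torch.allclose(out_loop, out_einsum, atol=1e-5)
```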
- PhysicsNeMo v2.0 contains a significant reorganization of tools. Please see the v2.0-MIGRATION-GUIDE.md to understand what has changed and why.
- Added `mixture_of_experts` for the weather example in `physicsnemo.examples.weather`.
  ⚠️ Warning: it uses the experimental DiT model, which is subject to future API changes.
- Added some modifications to the DiT architecture in `physicsnemo.experimental.models.dit`.
- Added a learnable option to `PositionalEmbedding` in `physicsnemo.models.diffusion.layers`.
- Added lead-time aware training support to the StormCast example.
- Added a device-aware kNN method to `physicsnemo.utils.neighbors`. It works on CPU or GPU by dispatching to the appropriate optimized library, and it is `torch.compile` compatible; a hypothetical usage sketch follows below.
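A minimal usage sketch; the entry point name and signature below are assumptions for illustration, not the documented API.

```python
import torch
from physicsnemo.utils.neighbors import knn  # hypothetical entry point

# Device-aware: the backend is chosen based on the device of the input
# tensors, so the same call works on CPU or GPU.
points = torch.randn(10_000, 3, device="cuda")  # reference point cloud
queries = torch.randn(1_000, 3, device="cuda")  # query points
indices, distances = knn(points, queries, k=8)  # assumed return convention
```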
- Added additional testing of the DoMINO datapipe.
- Examples: added a new example for full-waveform inversion using diffusion models. Accessible in `examples/geophysics/diffusion_fwi`.
- Domain Parallelism: domain parallelism is now available for kNN, radius_search, and `torch.nn.functional.pad`.
- Unified recipe for crash modeling, supporting Transolver and MeshGraphNet, and three transient schemes.
- Added a check to `stochastic_sampler` that helps handle the `EDMPrecond` model, which has a specific `.forward()` signature.
- Examples: added a new example for reservoir simulation using X-MeshGraphNet. Accessible in `examples/reservoir_simulation`.
- Added abstract interfaces for constructing active learning workflows, contained under the `physicsnemo.active_learning` namespace. A preliminary example of how to compose and define an active learning workflow is provided in `examples/active_learning`. The `moons` example provides a minimal (pedagogical) composition that is meant to illustrate how to define the necessary parts of the workflow.
- Added a new example for temporal interpolation of weather forecasts using ModAFNO. Accessible in `examples/weather/temporal_interpolation`.
- Migrated Stokes MGN example to PyTorch Geometric.
- Migrated Lennard Jones example to PyTorch Geometric.
- Migrated `physicsnemo.utils.sdf.signed_distance_field` to a static-return, torch-only interface. It also now works on distributed meshes and input fields.
- Refactored `DiTBlock` to be more modular.
- Added NATTEN 2D neighborhood attention backend for `DiTBlock`.
- Migrated blood flow example to PyTorch Geometric.
- Refactored DoMINO model code and examples for performance optimizations and improved readability.
- Migrated HydroGraphNet example to PyTorch Geometric.
- Support for saving and loading nested `physicsnemo.Module`s. It is now possible to create nested modules with `m = Module(submodule, ...)`, and to save and load them with `Module.save` and `Module.from_checkpoint`; a sketch follows below.
  ⚠️ Warning: the modules have to be `physicsnemo.Module`s, not `torch.nn.Module`s.
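A minimal sketch of nested-module saving and loading; the subclasses and constructor details below are hypothetical, for illustration only.

```python
import physicsnemo

# Hypothetical physicsnemo.Module subclasses for illustration.
class Encoder(physicsnemo.Module):
    pass

class MyModel(physicsnemo.Module):
    def __init__(self, submodule: physicsnemo.Module):
        super().__init__()
        self.submodule = submodule

m = MyModel(Encoder())  # nested physicsnemo.Modules (not torch.nn.Modules)
m.save("model.mdlus")   # the checkpoint includes the nested submodule
m2 = physicsnemo.Module.from_checkpoint("model.mdlus")
```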
- Support passing custom tokenizer, detokenizer, and attention Modules in the experimental DiT architecture.
- Improved the Transolver training recipe's configuration for checkpointing and normalization.
- Bumped `multi-storage-client` version to 0.33.0 with the Rust client.
- Improved configuration for DLWP HEALPix (checkpoint directory) and GraphCast (W&B settings).
- Set `skip_scale` to a Python float in U-Net to ensure compilation works.
- Ensure stream dependencies are handled correctly in `physicsnemo.utils.neighbors`.
- Fixed incorrect handling of files across consecutive runs of `combine_stl_solids.py` in the X-MGN recipe.
- Fixed the `RuntimeError: Worker data receiving interrupted` error in the datacenter example.
- Diffusion Transformer (DiT) model. The DiT model can be accessed in `physicsnemo.experimental.models.dit.DiT`.
  ⚠️ Warning: experimental feature subject to future API changes.
- Improved documentation for diffusion models and diffusion utils.
- Safe API to override `__init__`'s arguments saved in the checkpoint file with `Module.from_checkpoint("chkpt.mdlus", override_args=set(...))`.
- PyTorch Geometric MeshGraphNet backend.
- Functionality in DoMINO to take an arbitrary number of `scalar` or `vector` global parameters and encode them using `class ParameterModel`.
- TopoDiff model and example.
- Added ability for DoMINO model to return volume neighbors.
- Added functionality in DoMINO recipe to introduce physics residual losses.
- Diffusion models, metrics, and utils: implementation of the Student-t distribution for EDM-based diffusion models (t-EDM). This feature is adapted from the paper Heavy-Tailed Diffusion Models, Pandey et al. It includes a new EDM preconditioner (`tEDMPrecondSuperRes`), a loss function (`tEDMResidualLoss`), and a new option in the CorrDiff `diffusion_step`.
  ⚠️ This is an experimental feature that can be accessed through the `physicsnemo.experimental` module; it might also be subject to API changes without notice.
- Bumped Ruff version from 0.0.290 to 0.12.5. Replaced Black with `ruff-format`.
- DoMINO improvements with UNet attention module and user configs.
- Hybrid MeshGraphNet for modeling structural deformation
- Enabled TransformerEngine backend in the `transolver` model.
- Inference code for the x-meshgraphnet example for external aerodynamics.
- Added a new example for external aerodynamics: training `transolver` on irregular mesh data for DrivAerML surface data.
- Added a new example for external aerodynamics for finetuning pretrained models.
- Diffusion utils: `physicsnemo.utils.generative` renamed to `physicsnemo.utils.diffusion`.
- Diffusion models: in the CorrDiff model wrappers (`EDMPrecondSuperResolution` and `UNet`), the arguments `profile_mode` and `amp_mode` cannot be overridden by `from_checkpoint`. They are now properties that can be dynamically changed after model instantiation with, for example, `model.amp_mode = True` and `model.profile_mode = False`.
- Updated the HEALPix data module to use the correct `DistributedSampler` target for the test data loader.
- The existing DGL-based vortex shedding example has been renamed to `vortex_shedding_mgn_dgl`. Added a new `vortex_shedding_mgn` example that uses PyTorch Geometric instead.
- HEALPixLayer can now use earth2grid HEALPix padding ops, if desired.
- Migrated Vortex Shedding Reduced Mesh example to PyTorch Geometric.
- CorrDiff example: fixed bugs when training the regression `UNet`.
- Diffusion models: fixed bugs related to gradient checkpointing on non-square images.
- Diffusion models: created a separate class `Attention` for clarity and modularity. Updated `UNetBlock` accordingly to use the `Attention` class instead of custom attention logic. This updates the model architecture for `SongUNet`-based diffusion models. Changes are not BC-breaking and are transparent to the user.
  ⚠️ BC-breaking: refactored the automatic mixed precision (AMP) API in layers and models defined in `physicsnemo/models/diffusion/` for improved usability. Note: it is now not only possible but required to explicitly set `model.amp_mode = True` in order to use the model in a `torch.autocast` clause. This applies to all `SongUNet`-based models; see the sketch after this group of items.
- Diffusion models: fixed and improved the API to enable fp16 forward pass in the `UNet` and `EDMPrecondSuperResolution` model wrappers; the fp16 forward pass can now be toggled on and off by setting `model.use_fp16 = True`.
- Diffusion models: improved API for Apex group norm. `SongUNet`-based models will automatically convert input tensors to `torch.channels_last` memory format when `model.use_apex_gn` is `True`. New warnings are raised when attempting to use Apex group norm on CPU.
- Diffusion utils: systematic compilation of patching operations in `stochastic_sampler` for improved performance.
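A minimal sketch of the new AMP opt-in, assuming a `SongUNet`-based model; the import path, constructor arguments, and forward signature below are assumptions for illustration.

```python
import torch
from physicsnemo.models.diffusion import SongUNet  # import path is an assumption

# Constructor arguments are illustrative only.
model = SongUNet(img_resolution=64, in_channels=3, out_channels=3).cuda()

# Required: explicitly opt in to AMP before entering an autocast clause.
model.amp_mode = True

x = torch.randn(2, 3, 64, 64, device="cuda")
noise_labels = torch.randn(2, device="cuda")  # assumed forward signature
with torch.autocast("cuda", dtype=torch.float16):
    y = model(x, noise_labels, class_labels=None)
```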
- CorrDiff example: added an option for Student-t EDM (t-EDM) in `train.py` and `generate.py`. When training a CorrDiff diffusion model, this feature can be enabled with the hydra overrides `++training.hp.distribution=student_t` and `++training.hp.nu_student_t=<nu_value>`. For generation, this feature can be enabled with similar overrides: `++generation.distribution=student_t` and `++generation.nu_student_t=<nu_value>`.
- CorrDiff example: the parameters `P_mean` and `P_std` (used to compute the noise level `sigma`) are now configurable. They can be set with the hydra overrides `++training.hp.P_mean=<P_mean_value>` and `++training.hp.P_std=<P_std_value>` for training (and similar ones with `training.hp` replaced by `generation` for generation).
- Diffusion utils: patch-based inference and lead time support with the deterministic sampler.
- The existing DGL-based XAeroNet example has been renamed to `xaeronet_dgl`. Added a new `xaeronet` example that uses PyTorch Geometric instead.
- Updated the deforming plate example to use the Hybrid MeshGraphNet model.
- ⚠️ BC-breaking: refactored the `transolver` model to improve readability and performance, and to extend it to more use cases.
- Diffusion models: improved lead time support for `SongUNetPosLtEmbd` and `EDMLoss`. Lead-time embeddings can now be used with or without positional embeddings.
- Diffusion models: consolidated `ApexGroupNorm` and `GroupNorm` in `models/diffusion/layers.py` with a factory `get_group_norm` that can be used to instantiate either one. `get_group_norm` is now the recommended way to instantiate a GroupNorm layer in `SongUNet`-based and other diffusion models; a usage sketch follows below.
- PhysicsNeMo models: improved the checkpoint loading API in `Module.from_checkpoint`, which now exposes a `strict` parameter to raise an error on missing/unexpected keys, similar to that used in `torch.nn.Module.load_state_dict`.
- Migrated Hybrid MGN and deforming plate example to PyTorch Geometric.
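A minimal sketch of the factory and the stricter checkpoint loading; keyword names other than `strict` are assumptions for illustration.

```python
import physicsnemo
from physicsnemo.models.diffusion.layers import get_group_norm

# One factory instantiates either the standard or the Apex GroupNorm;
# the `num_channels` and `use_apex_gn` keyword names are assumptions.
gn = get_group_norm(num_channels=128, use_apex_gn=False)

# `strict=True` raises on missing/unexpected keys, mirroring
# torch.nn.Module.load_state_dict.
model = physicsnemo.Module.from_checkpoint("chkpt.mdlus", strict=True)
```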
- Bug fixes in DoMINO model in sphere sampling and tensor reshaping
- Bug fixes in DoMINO utils random sampling and `test.py`
- Optimized DoMINO config params based on DrivAerML
- Fixed an inadvertent change to the deterministic sampler 2nd-order correction
- Bug fix in DoMINO model ball query layer
- Fixed bug in `models/unet/unet.py` where setting `num_conv_layers=1` gave errors
- Added ReGen score-based data assimilation example
- General purpose patching API for patch-based diffusion
- New positional embedding selection strategy for CorrDiff SongUNet models
- Added Multi-Storage Client to allow checkpointing to/from Object Storage
- Added a new aerodynamics example using DoMINO to compute design sensitivities (e.g., drag adjoint) with respect to underlying input geometry.
- Simplified CorrDiff config files, updated default values
- Refactored CorrDiff losses and samplers to use the patching API
- Support for non-square images and patches in patch-based diffusion
- ERA5 download example updated to use the current file format convention and to restrict global statistics computation to the training set
- Support for training custom StormCast models and various other improvements for StormCast
- Updated CorrDiff training code to support multiple patch iterations, to amortize regression cost and usage of `torch.compile`
- Refactored `physicsnemo/models/diffusion/layers.py` to optimize the data type casting workflow, avoiding unnecessary casting under autocast mode
- Refactored Conv2d to enable fusion of conv2d with bias addition
- Refactored GroupNorm, UNetBlock, SongUNet, SongUNetPosEmbd to support usage of Apex GroupNorm, fusion of activation with GroupNorm, and AMP workflow.
- Updated SongUNetPosEmbd to avoid unnecessary host-to-device memcpy of `pos_embd`
- Updated `from_checkpoint` to accommodate conversion between Apex-optimized and non-optimized checkpoints
- Refactored the CorrDiff NVTX annotation workflow to be configurable
- Refactored `ResidualLoss` to support patch-accumulating training for amortizing regression costs
- Explicit handling of Warp device for ball query and SDF
- Merged SongUNetPosLtEmb with SongUNetPosEmb, added support for batch > 1
- Added lead time embedding support for `positional_embedding_selector`. Enabled arbitrary positioning of probabilistic variables
- Enabled lead-time aware regression without CE loss
- Bumped minimum PyTorch version from 2.0.0 to 2.4.0, to minimize the support surface for `physicsnemo.distributed` functionality.
- Made `nvidia.dali` an optional dependency.
- Added version checks to ensure compatibility with older PyTorch for distributed utilities and ShardTensor
- Fixed `EntryPoint` error that occurred during physicsnemo checkpoint loading
- DoMINO model architecture, datapipe and training recipe
- Added matrix decomposition scheme to improve graph partitioning
- DrivAerML dataset support in FIGConvNet example.
- Retraining recipe for DoMINO from a pretrained model checkpoint
- Prototype support for domain parallelism using ShardTensor (new).
- Enable DeviceMesh initialization via DistributedManager.
- Added Datacenter CFD use case.
- Add leave-in profiling utilities to physicsnemo, to easily enable torch/python/nsight profiling in all aspects of the codebase.
- Refactored StormCast training example
- Enhancements and bug fixes to DoMINO model and training example
- Enhancement to parameterize DoMINO model with inlet velocity
- Moved non-dimensionalization out of the DoMINO datapipe and into the datapipe in the DoMINO example
- Updated utils in `physicsnemo.launch.logging` to avoid unnecessary `wandb` and `mlflow` imports
- Moved to experiment-based Hydra config in the Lagrangian-MGN example
- Made data caching optional in `MeshDatapipe`
- Removed the use of the older `importlib_metadata` library
- ProcessGroupConfig is tagged for future deprecation in favor of DeviceMesh.
- Update pytests to skip when the required dependencies are not present
- Bug in data processing script in the DoMINO training example
- Fixed NCCL_ASYNC_ERROR_HANDLING deprecation warning
- Remove the numpy dependency upper bound
- Moved pytz and nvtx to optional
- Update the base image for the Dockerfile
- Introduce Multi-Storage Client (MSC) as an optional dependency.
- Introduce `wrapt` as an optional dependency, needed when using ShardTensor's automatic domain parallelism
- Graph Transformer processor for GraphCast/GenCast.
- Utility to generate STL from Signed Distance Field.
- Metrics for the CAE and CFD domains, such as integrals, drag, and turbulence invariances and spectra.
- Added gradient clipping to StaticCapture utilities.
- Bistride Multiscale MeshGraphNet example.
- FIGConvUNet model and example.
- The Transolver model.
- The XAeroNet model.
- Incorporated the CorrDiff-GEFS-HRRR model into CorrDiff, with lead-time aware SongUNet and cross entropy loss.
- Option to offload checkpoints to further reduce memory usage
- Added StormCast model training and simple inference to examples
- Multi-scale geometry features for DoMINO model.
- Refactored CorrDiff training recipe for improved usability
- Fixed timezone calculation in datapipe cosine zenith utility.
- Refactored the `EDMPrecondSRV2` preconditioner and fixed the bug related to the metadata
- Extended the checkpointing utility to store metadata.
- Corrected missing export of the logging function used by the transolver model
- Code logging for CorrDiff via Wandb.
- Augmentation pipeline for CorrDiff.
- Regression output as additional conditioning for CorrDiff.
- Learnable positional embedding for CorrDiff.
- Support for patch-based CorrDiff training and generation (stochastic sampling only)
- Enable CorrDiff multi-gpu generation
- Diffusion model for fluid data super-resolution (CMU contribution).
- The Virtual Foundry GraphNet.
- A synthetic dataloader for global weather prediction models, demonstrated on GraphCast.
- Sorted Empirical CDF CRPS algorithm
- Support for history, cos zenith, and downscaling/upscaling in the ERA5 HDF5 dataloader.
- An example showing how to train a "tensor-parallel" version of GraphCast on a Shallow-Water-Equation example.
- 3D UNet
- AeroGraphNet example of training of MeshGraphNet on Ahmed body and DrivAerNet datasets.
- Warp SDF routine
- DLWP HEALPix model
- Pangu Weather model
- Fengwu model
- SwinRNN model
- Modulated AFNO model
- Raise `PhysicsNeMoUndefinedGroupError` when querying undefined process groups
- Fixed indexing error in `examples/cfd/swe_nonlinear_pino` for the `physicsnemo` loss function
- Safeguarding against uninitialized usage of `DistributedManager`
- Remove mlflow from deployment image
- Fixed bug in the partitioning logic for distributing graph structures intended for distributed message-passing.
- Fixed bugs for CorrDiff diffusion training of `EDMv1` and `EDMv2`
- Fixed bug when trying to save a DDP model trained through the unified recipe
- Update DALI to CUDA 12 compatible version.
- Update minimum python version to 3.10
- The citation file.
- Link to the CWA dataset.
- ClimateDatapipe: an improved datapipe for HDF5/NetCDF4 formatted climate data
- Performance optimizations to CorrDiff.
- Physics-Informed Nonlinear Shallow Water Equations example.
- Warp neighbor search routine with a minimal example.
- Strict option for loading PhysicsNeMo checkpoints.
- Regression only or diffusion only inference for CorrDiff.
- Support for organization level model files on NGC file system
- Physics-Informed Magnetohydrodynamics example.
- Updated Ahmed Body and Vortex Shedding examples to use Hydra config.
- Added more config options to FCN AFNO example.
- Moved positional embedding in CorrDiff from the dataloader to the network architecture
- Deprecated `physicsnemo.models.diffusion.preconditioning.EDMPrecondSR`. Use `EDMPrecondSRV2` instead.
- Pickle dependency for CorrDiff.
- Consistent handling of single GPU runs in DistributedManager
- Output location of objects downloaded with NGC file system
- Bug in scaling the conditional input in CorrDiff deterministic sampler
- Updated DGL build in Dockerfile
- Updated default base image
- Moved Onnx from optional to required dependencies
- Optional Makani dependency required for SFNO model.
- Distributed process group configuration mechanism.
- DistributedManager utility to instantiate process groups based on a process group config.
- Helper functions to facilitate distributed training with shared parameters.
- Brain anomaly detection example.
- Updated Frechet Inception Distance to use Wasserstein 2-norm with improved stability.
- Molecular Dynamics example.
- Improved usage of GraphPartition, added more flexible ways of defining a partitioned graph.
- Physics-Informed Stokes Flow example.
- Profiling markers, benchmarking and performance optimizations for CorrDiff inference.
- Unified weather model training example.
- MLFlow logging such that only proc 0 logs to MLFlow.
- FNO given separate methods for constructing lift and spectral encoder layers.
- The experimental SFNO
- Removed experimental SFNO dependencies
- Added CorrDiff dependencies (cftime, einops, pyspng, nvtx)
- Made tqdm a required dependency
- Added Stokes flow dataset
- An experimental version of SFNO to be used in unified training recipe for weather models
- Added distributed FFT utility.
- Added ruff as a linting tool.
- Ported utilities from PhysicsNeMo Launch to main package.
- EDM diffusion models and recipes for training and sampling.
- NGC model registry download integration into package/filesystem.
- Denoising diffusion tutorial.
- The AFNO input argument `img_size` renamed to `inp_shape`
- Integrated the network architecture layers from PhysicsNeMo-Sym.
- Updated the SFNO model, and the training and inference recipes.
- Fixed `physicsnemo.Module`'s `from_checkpoint` to work with custom model classes
- Updated the base container to PyTorch 23.10.
- Updated examples to use Pydantic v2.
- Added ability to compute `CRPS(..., dim: int = 0)`.
- Added EFI for arbitrary climatological CDF.
- Added Kernel CRPS implementation (`kcrps`)
- Added distributed utilities to create process groups and orthogonal process groups.
- Added distributed AFNO model implementation.
- Added distributed utilities for communication of buffers of varying size per rank.
- Added distributed utilities for message passing across multiple GPUs.
- Added instructions for docker build on ARM architecture.
- Added batching support and fixed the input time step for the DLWP wrapper.
- Updated the file system cache location to the physicsnemo folder
- Fixed physicsnemo uninstall in CI docker image
- Handle tarball extraction in a safer way.
- Updated the base container to latest PyTorch 23.07.
- Update DGL version.
- Updated required installs for the Python wheel
- Added optional dependency list for the Python wheel
- Added a workaround fix for the CUDA graphs error in multi-node runs
- Update `certifi` package version
- Added a CHANGELOG.md
- Added build support for internal DGL
- 4D Fourier Neural Operator model
- Ahmed body dataset
- Unified Climate Datapipe
- DGL install changed from pypi to source
- Updated SFNO to add support for super resolution, flexible checkpointing, etc.
- Fixed issue with torch-harmonics version locking
- Fixed the PhysicsNeMo editable install
- Fixed AMP bug in static capture
- Fixed security issues with subprocess and urllib in `filesystem.py`
- Updated the base container to latest PyTorch base container which is based on torch 2.0
- Container now supports CUDA 12, Python 3.10
- Initial public release.