Merged
40 commits
1950c41
minimal implementation for stormcast
negin513 May 20, 2026
5fd5fa4
test update
negin513 May 20, 2026
9abe917
improving checkpoint
negin513 May 21, 2026
eb894cf
Merge branch 'main' into fsdp2-stormcast
negin513 May 26, 2026
77a67c6
needed contiguity
negin513 May 28, 2026
d5e8d99
stormcast: add channels_last contiguity unit test
negin513 May 28, 2026
328efad
test: fix missing shard_dim arg in test_domain_parallel_sampling
negin513 May 28, 2026
6092f97
fix(stormcast): force parameter contiguity before FSDP2 fully_shard
negin513 May 28, 2026
ee7cfd5
test(stormcast): add channels_last contiguity test for FSDP2 migration
negin513 May 28, 2026
6dbb73b
fix(checkpoint): handle FSDP2 DTensor edge cases in save/load
negin513 May 28, 2026
0311812
test(checkpoint): add FSDP2 distributed checkpoint round-trip tests
negin513 May 28, 2026
ed9548d
Merge branch 'main' into fsdp2-stormcast
negin513 May 28, 2026
1dc3ed0
fix(checkpoint): use MRO walk for FSDP2 class name resolution
negin513 May 28, 2026
4775d44
Merge branch 'main' into fsdp2-stormcast
negin513 May 29, 2026
9eff446
Update examples/weather/stormcast/utils/parallel.py
negin513 Jun 2, 2026
f38bf7a
Merge branch 'main' into fsdp2-stormcast
negin513 Jun 2, 2026
13e22ff
fix(checkpoint): avoid DCP broadcast_from_rank0 hang for FSDP2
negin513 May 29, 2026
8bd7da1
tensor dtensor fix
negin513 Jun 3, 2026
adce49f
physicsnemo/utils/checkpoint.py
negin513 Jun 3, 2026
7d1a180
normalize param contiguity before distribute_module for FSDP2
negin513 Jun 3, 2026
b3fc26f
ruff
negin513 Jun 3, 2026
bdf39fd
update comment
negin513 Jun 3, 2026
263c29e
Merge branch 'main' into fsdp2-stormcast
negin513 Jun 3, 2026
e852942
better comments
negin513 Jun 3, 2026
ded4a62
no end to end test needed
negin513 Jun 4, 2026
8753018
name change for helper
negin513 Jun 4, 2026
ad91988
update checkpoint.py
negin513 Jun 5, 2026
b3518e8
update checkpoint dtensor routing
negin513 Jun 5, 2026
c2694f3
improve docstrings
negin513 Jun 5, 2026
e39124d
test(checkpoint): add FSDP2 counterparts for all FSDP1 checkpoint tests
negin513 Jun 5, 2026
c76de40
ruff fixes
negin513 Jun 5, 2026
ed1bca7
fix(checkpoint): correctly route FSDP2 optimizer state loading by mes…
negin513 Jun 9, 2026
946044e
Merge branch 'main' into fsdp2-stormcast
negin513 Jun 9, 2026
9d10ede
Merge branch 'main' into fsdp2-stormcast
negin513 Jun 9, 2026
7c3706c
Apply suggestion from @negin513
negin513 Jun 11, 2026
0890287
revert(_unique_model_names): restore r-string prefix and numeric-suff…
negin513 Jun 11, 2026
e08743a
revert(_cache_if_needed): restore r-string docstring style from main
negin513 Jun 11, 2026
ffda31a
docs(changelog): add FSDP2 checkpoint and StormCast migration entries
negin513 Jun 11, 2026
3c7fb85
Merge branch 'main' into fsdp2-stormcast
negin513 Jun 11, 2026
d877eb7
Merge branch 'main' into fsdp2-stormcast
negin513 Jun 11, 2026