
feat: ban retired UMA checkpoints by md5 #1986

Draft
misko wants to merge 4 commits into main from feat/banned-checkpoints

Conversation


@misko misko commented May 5, 2026

Refuse to load the retired uma-s-1 and uma-s-1p2 inference checkpoints. The check runs at the single chokepoint (MLIPPredictUnit.__init__) immediately before torch.load, so every end-user inference path -- get_predict_unit, FAIRChemCalculator.from_model_checkpoint, and direct construction -- is covered. New bans can be added by appending an md5 to _BANNED_CHECKPOINTS in predict.py.

Includes a parameterized integration test that loads the cached HF artifacts and asserts BannedCheckpointError, plus a small utility script for computing the md5 of a downloaded checkpoint.

@meta-cla meta-cla Bot added the cla signed label May 5, 2026
@misko misko force-pushed the feat/banned-checkpoints branch from 90b2a52 to e4f3256 on May 5, 2026 16:41
@misko misko added enhancement New feature or request minor Minor version release labels May 5, 2026
Both uma-s-1 and uma-s-1p2 were trained with a size-extensivity bug.
Recommended replacements: uma-s-1 -> uma-s-1p1, uma-s-1p2 -> uma-s-1p2p1.

The check runs at the single chokepoint (MLIPPredictUnit.__init__)
immediately before torch.load, so every end-user inference path --
get_predict_unit, FAIRChemCalculator.from_model_checkpoint, and direct
construction -- is covered. New bans can be added by appending an md5
to _BANNED_CHECKPOINTS in predict.py.

Includes a parameterized integration test that loads the cached HF
artifacts (using hardcoded HF coordinates so retired-and-removed-from-
registry models stay testable) and asserts BannedCheckpointError.

Downstream tests that previously loaded uma-s-1p2 are updated:
- test_single_atom_predict_1p2 marked xfail(BannedCheckpointError) until
  uma-s-1p2p1 is wired in.
- uma-s-1p2 dropped from the (1p1, 1p2) parametrize lists in
  test_predict.py and test_execution_backends.py; coverage continues
  via uma-s-1p1.
@misko misko force-pushed the feat/banned-checkpoints branch from e4f3256 to f0230be on May 5, 2026 20:37
Michael Dzamba added 3 commits May 5, 2026 15:24
configs/uma/speed/uma-speed.yaml pinned uma-s-1p2, which is now banned
due to a size-extensivity bug. The two tests in
test_uma_speed_benchmark.py load this config end-to-end via
launch_main and were failing with BannedCheckpointError.

Switch the active model_checkpoints entry to uma-s-1p1 and leave a
note to re-enable once uma-s-1p2p1 ships.
uma-s-1p2 is banned at load time; keep that intent end-to-end:

- src/fairchem/core/calculate/pretrained_models.json: remove the
  uma-s-1p2 entry. Precedent: uma-s-1 was removed when uma-s-1p1
  shipped. Users now get a clean KeyError listing alternatives
  instead of a 2.2 GB download followed by BannedCheckpointError.
  This also fixes silent test breakage from dynamic registry-driven
  selection (`available_models[0]`, parametrize-over-all,
  random.choice) in test_batcher.py, test_ase_calculator.py,
  test_predict.py, components/conftest.py, and perf/test_inference.py.

- tests/core/conftest.py: drop the orphan uma_s_1p2_checkpoint
  fixture (no consumers; lookup would now KeyError).

- configs/uma/benchmark/perf_check/benchmark.yaml: switch to
  uma-s-1p1 so manual `fairchem -c ...` runs don't KeyError.

- tests/core/components/benchmark/test_perf_check.py: refresh the
  three uma-s-1p2 literals (in skipped tests) to uma-s-1p1.
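The fail-fast registry behavior described above can be sketched as a toy, with an illustrative dict (names and URL are placeholders) standing in for pretrained_models.json:

```python
# Toy registry mirroring pretrained_models.json after the uma-s-1p2 entry is
# removed; the model name and URL below are illustrative placeholders.
_REGISTRY = {
    "uma-s-1p1": "https://example.org/checkpoints/uma-s-1p1.pt",
}


def lookup_checkpoint(name: str) -> str:
    """Fail at registry lookup with a clean KeyError listing alternatives,
    before any multi-GB download is attempted."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise KeyError(
            f"{name!r} is not an available model; choose from {sorted(_REGISTRY)}"
        ) from None
```

Raising from the lookup, rather than after download, is what turns the 2.2 GB download-then-ban path into an immediate, self-explanatory error.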
After removing uma-s-1p2 from pretrained_models.json, the fixture
get_predict_unit("uma-s-1p2") fails with KeyError at registry lookup
before reaching the ban check, so the xfail's narrow
raises=BannedCheckpointError no longer matched and the test reported
ERROR at setup. Broaden to (BannedCheckpointError, KeyError) so the
expected-failure intent holds in both states (model in registry but
banned, or model removed entirely).