Add MinTrialsWithLILOInputHashCheck transition criterion #4994
Open
ItsMrLin wants to merge 3 commits into facebook:main from
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4994 +/- ##
==========================================
- Coverage 96.84% 96.82% -0.02%
==========================================
Files 601 602 +1
Lines 64732 64901 +169
==========================================
+ Hits 62687 62843 +156
- Misses 2045 2058 +13
Summary:
Add hash-based data freshness tracking for LILO (Language-in-the-Loop) pairwise preference labels. When LILOPairwiseMetric produces labels, it now stamps a SHA-256 hash of the experiment's LILO inputs (metric data for input_metric_names + LLM messages) onto the trial's _properties. If any of these inputs change (new data arrives, data is updated, or the user modifies LLM messages), the hash changes, indicating that existing LILO labels are stale.
Changes:
- Add `LILO_INPUT_HASH` key to `Keys` enum in `constants.py`
- Create `ax/utils/common/hash_utils.py` with `compute_lilo_input_hash` (standalone hash function) and `get_current_lilo_hash` (convenience helper that looks up the pairwise `DerivedMetric` on an experiment, extracts `input_metric_names`, and computes the hash; returns `None` if no pairwise metric is registered)
- Stamp hash in `LILOPairwiseMetric._compute_derived_values` after producing labels
- Add tests for hash determinism, sensitivity to data/message changes, stamping, and `get_current_lilo_hash` helper
Differential Revision: D95284287
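To make the freshness idea concrete, here is a minimal sketch of a deterministic SHA-256 digest over the two kinds of input the summary names (metric data for the input metrics, plus the LLM messages). It is not the code in `hash_utils.py`: the function name `sketch_lilo_input_hash`, the row layout (`trial_index`, `metric_name`, `mean`), and the JSON canonicalization are all assumptions chosen for illustration.

```python
import hashlib
import json


def sketch_lilo_input_hash(metric_rows, input_metric_names, llm_messages):
    """Hypothetical stand-in for a LILO input hash; not the Ax implementation.

    metric_rows: list of dicts with assumed keys "trial_index",
        "metric_name", and "mean".
    input_metric_names: names of the metrics that feed the labeler.
    llm_messages: list of message strings, in the order they are sent.
    """
    # Keep only rows for metrics that actually feed the labeler, so that
    # changes to unrelated metrics do not invalidate existing labels.
    names = set(input_metric_names)
    relevant = [row for row in metric_rows if row["metric_name"] in names]
    # Sort rows so that row order does not affect the digest.
    relevant.sort(key=lambda row: (row["trial_index"], row["metric_name"]))
    # Canonical JSON (sorted keys) gives a stable byte string to hash.
    # Message order is preserved on purpose: reordering messages is a change.
    payload = json.dumps(
        {"data": relevant, "messages": list(llm_messages)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Under these assumptions, updating a relevant metric value or editing a message yields a different digest, while reordering rows or changing a metric outside `input_metric_names` does not.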
Summary:
When building the RankingDataset for PairwiseGP model fitting, exclude LILO trial data whose input hash doesn't match the current experiment state. This ensures PairwiseGP is only fitted on labels that are consistent with the current metric data and LLM messages.
Changes:
- Add `_get_fresh_pairwise_trial_indices` helper to `adapter_utils.py`: uses `get_current_lilo_hash` from `hash_utils` to compute the current hash and returns trial indices whose stamped hash matches, or `None` if not a LILO experiment (preserving BOPE compatibility)
- Filter pairwise data in `TorchAdapter._convert_experiment_data` before calling `prep_pairwise_data`, ensuring stale rows are excluded
- Add tests for hash-based filtering logic
Differential Revision: D95284286
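The filtering contract described above (matching indices, or `None` when the experiment is not a LILO experiment) can be sketched as follows. This is an illustration only: `sketch_fresh_trial_indices`, `sketch_filter_rows`, and the plain-dict inputs are hypothetical and do not reflect the actual signature of `_get_fresh_pairwise_trial_indices`.

```python
from typing import Optional


def sketch_fresh_trial_indices(
    stamped_hashes: dict[int, Optional[str]],
    current_hash: Optional[str],
) -> Optional[set[int]]:
    """Hypothetical sketch: pick trials whose stamped hash is current.

    stamped_hashes maps trial index -> hash stamped on that trial
    (None if the trial carries no stamp).
    """
    if current_hash is None:
        # Assumed meaning: no pairwise LILO metric is registered, so the
        # caller should skip filtering entirely (the BOPE case).
        return None
    return {
        index
        for index, stamped in stamped_hashes.items()
        if stamped == current_hash
    }


def sketch_filter_rows(rows, fresh_indices):
    """Drop rows from stale trials; pass everything through when None."""
    if fresh_indices is None:
        return list(rows)
    return [row for row in rows if row["trial_index"] in fresh_indices]
```

Returning `None` rather than an empty set keeps "do not filter" distinct from "every label is stale", which matters because the second case should leave the model with no pairwise data at all.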
ItsMrLin added a commit to ItsMrLin/Ax that referenced this pull request on Mar 9, 2026
Summary:
Add a hash-aware transition criterion for LILO GS loops. Unlike plain MinTrials which counts all completed trials from a node, MinTrialsWithLILOInputHashCheck only counts trials whose LILO input hash matches the current experiment state. This ensures the GS correctly transitions from LILO labeling → MBG only when enough *fresh* labels exist (labels produced under the current experiment data + LLM messages). Trials without a LILO input hash (non-LILO trials) are always counted, preserving backward compatibility.
Changes:
- Add `MinTrialsWithLILOInputHashCheck` class to `transition_criterion.py` that delegates hash computation to `get_current_lilo_hash` from `hash_utils` (replacing a private `_compute_current_hash` static method)
- Remove redundant pass-through `__init__`; the parent class handles all args
- Register in JSON encoder/decoder registries for serialization support
- Add tests verifying fresh/stale counting behavior
Reviewed By: saitcakmak
Differential Revision: D95284285
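The counting rule in this summary can be sketched in a few lines. This is not the `MinTrialsWithLILOInputHashCheck` implementation: the function names, the `"lilo_input_hash"` property key, and the representation of trials as plain property dicts are assumptions for illustration, and the sketch assumes the current hash is a string (the case where no pairwise metric is registered is not modeled).

```python
def sketch_count_fresh_trials(trial_properties, current_hash):
    """Hypothetical sketch of hash-aware trial counting.

    trial_properties: one dict per completed trial, standing in for the
        trial's _properties; the "lilo_input_hash" key is an assumed name.
    current_hash: hash of the experiment's current LILO inputs.
    """
    count = 0
    for props in trial_properties:
        stamped = props.get("lilo_input_hash")
        # Trials with no stamp (non-LILO trials) always count; stamped
        # trials count only if their hash matches the current state.
        if stamped is None or stamped == current_hash:
            count += 1
    return count


def sketch_criterion_met(trial_properties, current_hash, threshold):
    """Transition only once enough fresh (or unstamped) trials exist."""
    return sketch_count_fresh_trials(trial_properties, current_hash) >= threshold
```

With a threshold of 3, two fresh trials plus one stale trial would not trigger the transition, whereas a plain count of completed trials would.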
1c0ab8f to 2d0aa0e
2d0aa0e to 9facfc4
Summary:
Add a hash-aware transition criterion for LILO GS loops. Unlike plain
MinTrials which counts all completed trials from a node,
MinTrialsWithLILOInputHashCheck only counts trials whose LILO input hash
matches the current experiment state. This ensures the GS correctly
transitions from LILO labeling → MBG only when enough fresh labels exist
(labels produced under the current experiment data + LLM messages).
Trials without a LILO input hash (non-LILO trials) are always counted,
preserving backward compatibility.
Changes:
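- Add `MinTrialsWithLILOInputHashCheck` class to `transition_criterion.py` that delegates hash computation to `get_current_lilo_hash` from `hash_utils` (replacing a private `_compute_current_hash` static method)
- Remove redundant pass-through `__init__`; the parent class handles all args
- Register in JSON encoder/decoder registries for serialization support
- Add tests verifying fresh/stale counting behavior
Reviewed By: saitcakmak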
Differential Revision: D95284285