Hash-based filtering of stale LILO data in adapter #4993
Open
ItsMrLin wants to merge 2 commits into facebook:main
ItsMrLin added a commit to ItsMrLin/Ax that referenced this pull request Mar 6, 2026
Summary: When building the RankingDataset for PairwiseGP model fitting, exclude LILO trial data whose input hash doesn't match the current experiment state. This ensures PairwiseGP is only fitted on labels that are consistent with the current metric data and LLM messages.

Changes:
- Add `_get_fresh_pairwise_trial_indices` helper to `adapter_utils.py`: uses `get_current_lilo_hash` from `hash_utils` to compute the current hash and returns trial indices whose stamped hash matches, or `None` if not a LILO experiment (preserving BOPE compatibility)
- Filter pairwise data in `TorchAdapter._convert_experiment_data` before calling `prep_pairwise_data`, ensuring stale rows are excluded
- Add tests for hash-based filtering logic

Reviewed By: saitcakmak

Differential Revision: D95284286
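The filtering step described in this commit message can be illustrated with a minimal sketch. This is not the code from the PR: `sketch_fresh_trial_indices` and `sketch_filter_rows` are hypothetical stand-ins, the stamped hashes are modeled as a plain dict keyed by trial index, and the pairwise data is modeled as a list of dicts rather than Ax's experiment data structures.

```python
from typing import Dict, List, Optional, Set


def sketch_fresh_trial_indices(
    stamped_hashes: Dict[int, Optional[str]],
    current_hash: Optional[str],
) -> Optional[Set[int]]:
    """Hypothetical stand-in for the helper described above (not the Ax code).

    Returns the trial indices whose stamped hash equals the current hash,
    or None when there is no current hash (assumed here to mean "not a LILO
    experiment", so the caller should not filter at all).
    """
    if current_hash is None:
        return None
    return {idx for idx, h in stamped_hashes.items() if h == current_hash}


def sketch_filter_rows(
    rows: List[dict], fresh: Optional[Set[int]]
) -> List[dict]:
    """Drop rows from stale trials; pass everything through if fresh is None."""
    if fresh is None:
        return list(rows)
    return [row for row in rows if row["trial_index"] in fresh]


# Illustrative data: trial 0 was labeled under an older hash, trial 2 has no stamp.
stamped = {0: "old", 1: "new", 2: None}
rows = [{"trial_index": i, "label": i % 2} for i in range(3)]
kept = sketch_filter_rows(rows, sketch_fresh_trial_indices(stamped, "new"))
```

Returning `None` instead of an empty set keeps the "no filtering" case distinct from the "every label is stale" case, which is how the description above preserves BOPE behavior.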
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4993      +/-   ##
==========================================
- Coverage   96.84%   96.82%   -0.02%
==========================================
  Files         601      602       +1
  Lines       64732    64833     +101
==========================================
+ Hits        62687    62777      +90
- Misses       2045     2056      +11
Summary: Add hash-based data freshness tracking for LILO (Language-in-the-Loop) pairwise preference labels. When LILOPairwiseMetric produces labels, it now stamps a SHA-256 hash of the experiment's LILO inputs (metric data for input_metric_names + LLM messages) onto the trial's _properties. If any of these inputs change (new data arrives, data is updated, or the user modifies LLM messages), the hash changes, indicating that existing LILO labels are stale.

Changes:
- Add `LILO_INPUT_HASH` key to `Keys` enum in `constants.py`
- Create `ax/utils/common/hash_utils.py` with `compute_lilo_input_hash` (standalone hash function) and `get_current_lilo_hash` (convenience helper that looks up the pairwise `DerivedMetric` on an experiment, extracts `input_metric_names`, and computes the hash; returns `None` if no pairwise metric is registered)
- Stamp hash in `LILOPairwiseMetric._compute_derived_values` after producing labels
- Add tests for hash determinism, sensitivity to data/message changes, stamping, and `get_current_lilo_hash` helper

Reviewed By: saitcakmak

Differential Revision: D95284287
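As a rough illustration of the freshness hash described in this commit message, the sketch below computes a SHA-256 digest over metric data and LLM messages. It is an assumption-laden stand-in, not the PR's `compute_lilo_input_hash`: the function name, the tuple layout of the metric rows, and the JSON canonicalization are all hypothetical choices made for the example.

```python
import hashlib
import json
from typing import List, Sequence, Tuple

# Hypothetical row layout: (trial_index, arm_name, metric_name, mean).
MetricRow = Tuple[int, str, str, float]


def sketch_lilo_input_hash(
    metric_rows: Sequence[MetricRow], llm_messages: List[str]
) -> str:
    """Hypothetical stand-in for a LILO input hash (not the Ax implementation).

    Rows are sorted so the digest does not depend on row order; message order
    is preserved because reordering messages is assumed to be a real change.
    """
    canonical = json.dumps(
        {"data": sorted(metric_rows), "messages": list(llm_messages)},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


rows = [(0, "0_0", "m1", 1.0), (1, "1_0", "m1", 2.0)]
stamped = sketch_lilo_input_hash(rows, ["prefer low m1"])

# Any change to the data or the messages yields a different digest,
# which is what marks previously stamped labels as stale.
after_new_data = sketch_lilo_input_hash(rows + [(2, "2_0", "m1", 3.0)], ["prefer low m1"])
after_new_message = sketch_lilo_input_hash(rows, ["prefer high m1"])
```

Serializing to a canonical string before hashing is one simple way to get the determinism the tests in this commit check for; the actual serialization used in the PR is not shown in this conversation.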
e228964 to 6be636a
ItsMrLin added a commit to ItsMrLin/Ax that referenced this pull request Mar 9, 2026
Summary:
When building the RankingDataset for PairwiseGP model fitting, exclude LILO trial data whose input hash doesn't match the current experiment state. This ensures PairwiseGP is only fitted on labels that are consistent with the current metric data and LLM messages.

Changes:
- Add `_get_fresh_pairwise_trial_indices` helper to `adapter_utils.py`: uses `get_current_lilo_hash` from `hash_utils` to compute the current hash and returns trial indices whose stamped hash matches, or `None` if not a LILO experiment (preserving BOPE compatibility)
- Filter pairwise data in `TorchAdapter._convert_experiment_data` before calling `prep_pairwise_data`, ensuring stale rows are excluded
- Add tests for hash-based filtering logic

Reviewed By: saitcakmak

Differential Revision: D95284286