Hash-based filtering of stale LILO data in adapter by ItsMrLin · Pull Request #4993 · facebook/Ax

ItsMrLin · 2026-03-06T21:26:50Z

Summary:
When building the RankingDataset for PairwiseGP model fitting, exclude LILO
trial data whose input hash doesn't match the current experiment state. This
ensures PairwiseGP is only fitted on labels that are consistent with the
current metric data and LLM messages.

Changes:

- Add `_get_fresh_pairwise_trial_indices` helper to `adapter_utils.py`: uses `get_current_lilo_hash` from `hash_utils` to compute the current hash and returns trial indices whose stamped hash matches, or `None` if not a LILO experiment (preserving BOPE compatibility)
- Filter pairwise data in `TorchAdapter._convert_experiment_data` before calling `prep_pairwise_data`, ensuring stale rows are excluded
- Add tests for hash-based filtering logic
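
For readers without access to the diff, the filtering described above might look roughly like the sketch below. It is not the code in this PR: it uses plain dicts instead of Ax objects, the names `get_fresh_trial_indices` and `filter_pairwise_rows` and the string value of `LILO_INPUT_HASH` are hypothetical, and treating a trial with no stamped hash as stale is an assumption.

```python
from typing import Optional

# Hypothetical stand-in for the `LILO_INPUT_HASH` key; its real value is not shown here.
LILO_INPUT_HASH = "lilo_input_hash"


def get_fresh_trial_indices(
    trial_properties: dict[int, dict[str, str]],
    current_hash: Optional[str],
) -> Optional[set[int]]:
    """Return the indices of trials whose stamped hash equals `current_hash`.

    Returns None when `current_hash` is None (no LILO pairwise metric), which
    callers interpret as "do not filter", so non-LILO data passes through.
    """
    if current_hash is None:
        return None
    # Assumption: a trial without a stamped hash is treated as stale.
    return {
        index
        for index, props in trial_properties.items()
        if props.get(LILO_INPUT_HASH) == current_hash
    }


def filter_pairwise_rows(
    rows: list[dict], fresh: Optional[set[int]]
) -> list[dict]:
    """Keep only rows from fresh trials; keep every row when `fresh` is None."""
    if fresh is None:
        return list(rows)
    return [row for row in rows if row["trial_index"] in fresh]


# Usage: trial 0 matches the current hash, trial 1 is stale, trial 2 is unstamped.
properties = {
    0: {LILO_INPUT_HASH: "abc"},
    1: {LILO_INPUT_HASH: "old"},
    2: {},
}
rows = [{"trial_index": 0}, {"trial_index": 1}, {"trial_index": 2}]
fresh_rows = filter_pairwise_rows(rows, get_fresh_trial_indices(properties, "abc"))
```

Returning `None` rather than an empty set keeps "no filtering applies" distinct from "every label is stale", which is what lets the non-LILO path stay unchanged.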

Reviewed By: saitcakmak

Differential Revision: D95284286

meta-codesync · 2026-03-06T21:27:10Z

@ItsMrLin has exported this pull request. If you are a Meta employee, you can view the originating Diff in D95284286.

codecov-commenter · 2026-03-06T22:01:46Z

Codecov Report

❌ Patch coverage is 89.10891% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.82%. Comparing base (3e8e4e7) to head (6be636a).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ax/adapter/torch.py | 20.00% | 8 Missing ⚠️ |
| ax/utils/common/hash_utils.py | 92.30% | 2 Missing ⚠️ |
| ax/adapter/adapter_utils.py | 94.11% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4993      +/-   ##
==========================================
- Coverage   96.84%   96.82%   -0.02%     
==========================================
  Files         601      602       +1     
  Lines       64732    64833     +101     
==========================================
+ Hits        62687    62777      +90     
- Misses       2045     2056      +11

Summary:
Add hash-based data freshness tracking for LILO (Language-in-the-Loop) pairwise preference labels. When `LILOPairwiseMetric` produces labels, it now stamps a SHA-256 hash of the experiment's LILO inputs (metric data for `input_metric_names` + LLM messages) onto the trial's `_properties`. If any of these inputs change (new data arrives, data is updated, or the user modifies LLM messages), the hash changes, indicating that existing LILO labels are stale.

Changes:

- Add `LILO_INPUT_HASH` key to `Keys` enum in `constants.py`
- Create `ax/utils/common/hash_utils.py` with `compute_lilo_input_hash` (standalone hash function) and `get_current_lilo_hash` (convenience helper that looks up the pairwise `DerivedMetric` on an experiment, extracts `input_metric_names`, and computes the hash; returns `None` if no pairwise metric is registered)
- Stamp hash in `LILOPairwiseMetric._compute_derived_values` after producing labels
- Add tests for hash determinism, sensitivity to data/message changes, stamping, and `get_current_lilo_hash` helper

Reviewed By: saitcakmak

Differential Revision: D95284287
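
As a rough illustration of the freshness hash in the D95284287 summary, the sketch below computes a SHA-256 digest over metric rows and LLM messages using only the standard library. The function name `sketch_input_hash`, the row layout, and the JSON canonicalization are all assumptions made for illustration; the actual signature and serialization of `compute_lilo_input_hash` are not shown on this page.

```python
import hashlib
import json


def sketch_input_hash(metric_rows: list[dict], llm_messages: list[str]) -> str:
    """Return a hex SHA-256 digest of the given metric rows and messages.

    Rows are sorted by their canonical JSON form so that row order does not
    affect the digest; message order is preserved, since reordering messages
    is treated as a change to the inputs (an assumption of this sketch).
    """
    canonical_rows = sorted(
        metric_rows, key=lambda row: json.dumps(row, sort_keys=True)
    )
    payload = {"metric_rows": canonical_rows, "llm_messages": list(llm_messages)}
    # sort_keys and fixed separators give a stable byte representation.
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Usage: the digest is stable across calls and changes when any input changes.
example_rows = [
    {"trial_index": 0, "metric_name": "m1", "mean": 1.0},
    {"trial_index": 1, "metric_name": "m1", "mean": 2.0},
]
example_messages = ["prefer low latency"]
stamped = sketch_input_hash(example_rows, example_messages)
is_stale = stamped != sketch_input_hash(
    example_rows, example_messages + ["and low cost"]
)
```

Comparing a digest stamped at labeling time against one recomputed from the current inputs is what lets the adapter detect stale labels without storing the inputs themselves.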

meta-cla bot added the CLA Signed label Mar 6, 2026

meta-codesync bot added the fb-exported and meta-exported labels Mar 6, 2026

ItsMrLin added 2 commits March 8, 2026 21:12

ItsMrLin force-pushed the export-D95284286 branch from e228964 to 6be636a Compare March 9, 2026 04:12

ItsMrLin wants to merge 2 commits into facebook:main from ItsMrLin:export-D95284286