Add FreshLILOLabelCheck transition criterion (#4994) by ItsMrLin · Pull Request #4994 · facebook/Ax

ItsMrLin · 2026-03-06T21:27:39Z

Summary:

Add a hash-aware transition criterion for LILO GS loops.
FreshLILOLabelCheck counts only trials whose LILO input hash matches the
current experiment state, ensuring transitions are gated on fresh labels
(produced under current data + LLM messages).

The require_sufficient flag controls the transition direction:

- require_sufficient=True (LILO_LABELING -> MBG): is_met when fresh count
  >= threshold. "Enough fresh labels -- proceed to BO generation."
- require_sufficient=False (MBG -> LILO_LABELING): is_met when fresh count
  < threshold. "Labels are stale -- relabel before generating."

Non-LILO experiments (no pairwise DerivedMetric) short-circuit:
require_sufficient=True -> always met, require_sufficient=False -> never
met. This prevents false relabeling triggers on non-LILO experiments.
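
The gating rule described in this summary can be sketched in a few lines of plain Python. This is an illustration only, not the code added by this PR: the function name `sketch_fresh_label_check`, its arguments, and the use of `None` as the "no pairwise DerivedMetric" signal are assumptions made for the example. It also assumes that only trials whose stamped hash equals the current hash count as fresh.

```python
from typing import Optional, Sequence


def sketch_fresh_label_check(
    trial_hashes: Sequence[Optional[str]],
    current_hash: Optional[str],
    threshold: int,
    require_sufficient: bool,
) -> bool:
    """Hypothetical stand-in for the is_met decision described above."""
    # Assumption: a missing current hash means "not a LILO experiment".
    # Short-circuit: require_sufficient=True -> always met,
    # require_sufficient=False -> never met.
    if current_hash is None:
        return require_sufficient

    # Count only trials whose stamped hash matches the current state.
    fresh_count = sum(1 for h in trial_hashes if h == current_hash)

    if require_sufficient:
        # LILO_LABELING -> MBG: enough fresh labels, proceed.
        return fresh_count >= threshold
    # MBG -> LILO_LABELING: labels are stale, relabel first.
    return fresh_count < threshold


# Two of three trials carry the current hash "a".
enough = sketch_fresh_label_check(["a", "a", "b"], "a", 2, True)
stale = sketch_fresh_label_check(["a", "a", "b"], "a", 2, False)
```

With the same inputs, the two directions are complementary on LILO experiments (exactly one of them is met), while on non-LILO experiments the relabeling direction is never met.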

Reviewed By: saitcakmak

Differential Revision: D95284285

meta-codesync · 2026-03-06T21:28:02Z

@ItsMrLin has exported this pull request. If you are a Meta employee, you can view the originating Diff in D95284285.

codecov-commenter · 2026-03-06T22:01:41Z

Codecov Report

❌ Patch coverage is 93.89671% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.83%. Comparing base (1e48d0f) to head (4d64f87).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ax/adapter/torch.py | 20.00% | 8 Missing ⚠️ |
| ax/generation_strategy/transition_criterion.py | 90.00% | 3 Missing ⚠️ |
| ax/adapter/adapter_utils.py | 94.11% | 1 Missing ⚠️ |
| ax/utils/common/hash_utils.py | 96.15% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4994      +/-   ##
==========================================
- Coverage   96.84%   96.83%   -0.01%     
==========================================
  Files         604      605       +1     
  Lines       65022    65235     +213     
==========================================
+ Hits        62971    63172     +201     
- Misses       2051     2063      +12


Summary:

Add a hash-aware transition criterion for LILO GS loops. Unlike plain MinTrials, which counts all completed trials from a node, MinTrialsWithLILOInputHashCheck only counts trials whose LILO input hash matches the current experiment state. This ensures the GS correctly transitions from LILO labeling → MBG only when enough *fresh* labels exist (labels produced under the current experiment data + LLM messages). Trials without a LILO input hash (non-LILO trials) are always counted, preserving backward compatibility.

Changes:
- Add `MinTrialsWithLILOInputHashCheck` class to `transition_criterion.py` that delegates hash computation to `get_current_lilo_hash` from `hash_utils` (replacing a private `_compute_current_hash` static method)
- Remove redundant pass-through `__init__`; the parent class handles all args
- Register in JSON encoder/decoder registries for serialization support
- Add tests verifying fresh/stale counting behavior

Reviewed By: saitcakmak

Differential Revision: D95284285


Summary:

Add a hash-aware transition criterion for LILO GS loops. `FreshLILOLabelCheck` counts only trials whose LILO input hash matches the current experiment state, ensuring transitions are gated on *fresh* labels (produced under current data + LLM messages).

The `require_sufficient` flag controls the transition direction:
- `require_sufficient=True` (LILO → MBG): is_met when fresh count ≥ threshold. "Enough fresh labels -- proceed to BO generation."
- `require_sufficient=False` (MBG → LILO): is_met when fresh count < threshold. "Labels are stale -- relabel before generating."

Non-LILO experiments (no pairwise DerivedMetric) short-circuit: `require_sufficient=True` → always met, `require_sufficient=False` → never met. This prevents false relabeling triggers on non-LILO experiments.

Renamed from `MinTrialsWithLILOInputHashCheck`.

Reviewed By: saitcakmak

Differential Revision: D95284285



Summary:

Add hash-based data freshness tracking for LILO (Language-in-the-Loop) pairwise preference labels. When LILOPairwiseMetric produces labels, it now stamps a SHA-256 hash of the experiment's LILO inputs (metric data for input_metric_names + LLM messages) onto the trial's _properties. If any of these inputs change (new data arrives, data is updated, or the user modifies LLM messages), the hash changes, indicating that existing LILO labels are stale.

Changes:
- Add `LILO_INPUT_HASH` key to `Keys` enum in `constants.py`
- Create `ax/utils/common/hash_utils.py` with `compute_lilo_input_hash` (standalone hash function) and `get_current_lilo_hash` (convenience helper that looks up the pairwise `DerivedMetric` on an experiment, extracts `input_metric_names`, and computes the hash; returns `None` if no pairwise metric is registered)
- Stamp hash in `LILOPairwiseMetric._compute_derived_values` after producing labels
- Add tests for hash determinism, sensitivity to data/message changes, stamping, and `get_current_lilo_hash` helper

Differential Revision: D95284287
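
To make the freshness idea concrete, here is a minimal sketch of hashing LILO-style inputs with SHA-256 from the standard library. It does not reproduce `compute_lilo_input_hash`; the function name `sketch_lilo_input_hash`, the input shapes (a mapping of metric name to values plus a list of message strings), and the JSON canonicalization are all assumptions chosen for illustration.

```python
import hashlib
import json
from typing import Mapping, Sequence


def sketch_lilo_input_hash(
    metric_data: Mapping[str, Sequence[float]],
    llm_messages: Sequence[str],
) -> str:
    """Hypothetical hash over metric data and LLM messages."""
    # Serialize with sorted keys so that dict ordering does not
    # affect the digest (determinism for equal inputs).
    payload = json.dumps(
        {
            "metric_data": {k: list(v) for k, v in metric_data.items()},
            "llm_messages": list(llm_messages),
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


base = sketch_lilo_input_hash({"latency": [1.0, 2.0]}, ["prefer low latency"])
# New data arriving changes the hash, so labels stamped with `base` are stale.
after_new_data = sketch_lilo_input_hash(
    {"latency": [1.0, 2.0, 3.0]}, ["prefer low latency"]
)
# Editing the LLM messages changes the hash as well.
after_new_message = sketch_lilo_input_hash(
    {"latency": [1.0, 2.0]}, ["prefer high throughput"]
)
```

A label stamped with `base` would then be considered fresh only while recomputing the hash on the current inputs still yields `base`.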

Summary:

When building the RankingDataset for PairwiseGP model fitting, exclude LILO trial data whose input hash doesn't match the current experiment state. This ensures PairwiseGP is only fitted on labels that are consistent with the current metric data and LLM messages.

Changes:
- Add `_get_fresh_pairwise_trial_indices` helper to `adapter_utils.py`: uses `get_current_lilo_hash` from `hash_utils` to compute the current hash and returns trial indices whose stamped hash matches, or `None` if not a LILO experiment (preserving BOPE compatibility)
- Filter pairwise data in `TorchAdapter._convert_experiment_data` before calling `prep_pairwise_data`, ensuring stale rows are excluded
- Add tests for hash-based filtering logic

Differential Revision: D95284286
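
The filtering step described here can be illustrated with a small standalone sketch. It is not the `_get_fresh_pairwise_trial_indices` implementation: the function names, the `(trial_index, value)` row shape, and the dictionary of stamped hashes are hypothetical. The one behavior taken from the summary is that `None` means "not a LILO experiment", in which case no filtering is applied.

```python
from typing import List, Mapping, Optional, Sequence, Set, Tuple


def sketch_fresh_trial_indices(
    stamped_hashes: Mapping[int, Optional[str]],
    current_hash: Optional[str],
) -> Optional[Set[int]]:
    """Return indices of trials whose stamped hash matches, or None."""
    # Assumption: no current hash means no pairwise metric is registered,
    # so the caller should skip filtering entirely.
    if current_hash is None:
        return None
    return {idx for idx, h in stamped_hashes.items() if h == current_hash}


def sketch_filter_rows(
    rows: Sequence[Tuple[int, float]],
    fresh: Optional[Set[int]],
) -> List[Tuple[int, float]]:
    """Keep only rows from fresh trials; keep everything if fresh is None."""
    if fresh is None:
        return list(rows)
    return [row for row in rows if row[0] in fresh]


rows = [(0, 1.0), (1, 0.0), (2, 1.0)]
stamped = {0: "old", 1: "new", 2: "new"}
fresh = sketch_fresh_trial_indices(stamped, "new")
kept = sketch_filter_rows(rows, fresh)
```

Returning `None` rather than an empty set keeps the two cases distinct: an empty set would drop every row, whereas `None` leaves non-LILO data untouched.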



meta-codesync · 2026-03-13T20:37:35Z

This pull request has been merged in e2056d2.

meta-cla Bot added the CLA Signed label Mar 6, 2026

meta-codesync Bot added the fb-exported and meta-exported labels Mar 6, 2026

ItsMrLin force-pushed the export-D95284285 branch from 1c0ab8f to 2d0aa0e Compare March 9, 2026 04:13

ItsMrLin force-pushed the export-D95284285 branch 2 times, most recently from 9facfc4 to bfc2cc4 Compare March 11, 2026 19:52

meta-codesync Bot changed the title ~~Add MinTrialsWithLILOInputHashCheck transition criterion~~ Add FreshLILOLabelCheck transition criterion (#4994) Mar 13, 2026

ItsMrLin force-pushed the export-D95284285 branch from bfc2cc4 to da15e27 Compare March 13, 2026 02:48

ItsMrLin force-pushed the export-D95284285 branch from da15e27 to 2f58b4b Compare March 13, 2026 02:57

meta-codesync Bot changed the title ~~Add FreshLILOLabelCheck transition criterion (#4994)~~ Add FreshLILOLabelCheck transition criterion Mar 13, 2026

ItsMrLin force-pushed the export-D95284285 branch from 2f58b4b to 36eef43 Compare March 13, 2026 16:12

meta-codesync Bot changed the title ~~Add FreshLILOLabelCheck transition criterion~~ Add FreshLILOLabelCheck transition criterion (#4994) Mar 13, 2026

ItsMrLin force-pushed the export-D95284285 branch from 36eef43 to 2bb65c9 Compare March 13, 2026 17:09

ItsMrLin added 2 commits March 13, 2026 10:11

ItsMrLin force-pushed the export-D95284285 branch from 2bb65c9 to 91ef649 Compare March 13, 2026 17:13

ItsMrLin force-pushed the export-D95284285 branch from 91ef649 to ac531be Compare March 13, 2026 17:14

ItsMrLin force-pushed the export-D95284285 branch from ac531be to 4d64f87 Compare March 13, 2026 17:17

meta-codesync Bot closed this in e2056d2 Mar 13, 2026

facebook-github-tools Bot added the Merged label Mar 13, 2026

ItsMrLin wants to merge 3 commits into facebook:main from ItsMrLin:export-D95284285
