Generalize `lr_adapt_proxy` Into a Framework-Level Adaptation Module (PyCMA Demo Client, Exact Parity)

Summary

Build a reusable adaptation framework inside this repo, then prove it with one client integration (pycma) that reproduces current lr_adapt_proxy behavior exactly under deterministic conditions (workers=1). This remains a decision-complete implementation spec for external peer review.

Round-2 precision updates:

Removed redundant initial_value from AdaptationContext.
Converted multi-worker soft-gate parity from qualitative wording to measurable criteria.
Split test phases so deterministic parity tests land before runner rewiring.
Explicitly defined direction="maximize" behavior and hard-gate trace keys.
Clarified that was_clamped is internal-only in v1 and is not surfaced in persisted outputs.

Round-3 strict polish updates:

Pinned soft-gate baseline to a specific run ID and manifest commit.
Pinned soft-gate metric provenance to specific artifact file and row filters.
Added explicit types for fitness, current_value, and AdaptationAction fields in plan prose.
Explicitly documented that pipeline-level parity is intentionally workers>=1 (not workers=1).

Locked decisions for v1:

Use repo-relative references only (no absolute local filesystem links).
Use a tiered parity contract: deterministic exact parity for workers=1; measurable consistency checks for multi-worker runs.
No rollback/feature-flag path in v1.

Scope

In scope: internal architecture refactor from hardcoded proxy logic to a generic policy interface with one production client.
In scope: preserve current CLI/config behavior and output schemas.
In scope: preserve current method names (vanilla_cma, lr_adapt_proxy, pop4x).
Out of scope: adding a second optimizer client, changing benchmark matrix, changing inferential methodology, retuning hyperparameters, or revising scientific claims.
Out of scope: fallback toggles or dual-path runtime switches for the legacy implementation.

Current Baseline (to preserve)

The current hook in experiments/methods.py is method-specific: it runs after es.tell, and only for lr_adapt_proxy.
Current rule directly mutates es.sigma in experiments/lr_adapt_proxy.py.
Current diagnostics contract includes proxy_signal, proxy_noise, proxy_snr, proxy_ema_snr, proxy_sigma_factor, proxy_sigma in experiments/lr_adapt_proxy.py.
Current run-row contract includes proxy_sigma_factor_last and proxy_ema_snr_last in experiments/methods.py.
Soft-gate baseline run is pinned to artifacts/runs/high-rigor/20260305T060114Z-cac939ce/results.
Baseline manifest for that run pins source commit 675b7bd in manifest.json.
Baseline config is experiments/config/high_rigor.yaml.

Important API / Interface / Type Changes

Add adaptation core package at experiments/adaptation.
Add typed context and action models in experiments/adaptation/types.py.
Add policy protocol in experiments/adaptation/protocols.py.
Add LRProxyPolicy in experiments/adaptation/policies/lr_proxy.py.
Add pycma client adapter in experiments/adaptation/clients/pycma_sigma.py.
Refactor runner wiring in experiments/methods.py to use the policy interface instead of method-specific mutation branches.
Keep experiments/lr_adapt_proxy.py as a backward-compatible shim that delegates to the new policy core while preserving existing function signature and diagnostics keys.

Parity Definition (Tiered)

Step-level parity (workers=1, hard gate): generation-by-generation exact equality using Python float == on the following keys:
- proxy_signal
- proxy_noise
- proxy_snr
- proxy_ema_snr
- proxy_sigma_factor
- proxy_sigma
Run-level parity (workers=1, hard gate): exact equality for key run outputs:
- final_best
- proxy_sigma_factor_last
- proxy_ema_snr_last
Pipeline-level parity (workers>=1, soft gate): measurable consistency checks; byte-identical output is not required. This gate intentionally covers multi-worker runs rather than only the single-worker case.
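The two hard gates reduce to exact comparisons over recorded traces. A minimal sketch, assuming each step-level trace is a list of per-generation dicts keyed by the six proxy_* names and each run output is a dict; first_trace_mismatch and run_outputs_match are hypothetical helper names, not existing repo functions. Because the rule is plain float ==, a NaN anywhere in a trace always registers as a mismatch.

```python
from __future__ import annotations

from typing import Mapping, Sequence

# Step-level hard-gate keys, compared generation by generation.
TRACE_KEYS = (
    "proxy_signal",
    "proxy_noise",
    "proxy_snr",
    "proxy_ema_snr",
    "proxy_sigma_factor",
    "proxy_sigma",
)
# Run-level hard-gate outputs.
RUN_KEYS = ("final_best", "proxy_sigma_factor_last", "proxy_ema_snr_last")


def first_trace_mismatch(
    baseline: Sequence[Mapping[str, float]],
    candidate: Sequence[Mapping[str, float]],
) -> tuple[int, str] | None:
    """Return (generation, key) of the first exact-equality failure, or None.

    Uses plain float ==, so NaN never compares equal to itself.
    """
    if len(baseline) != len(candidate):
        return (min(len(baseline), len(candidate)), "<length>")
    for gen, (old, new) in enumerate(zip(baseline, candidate)):
        for key in TRACE_KEYS:
            if float(old[key]) != float(new[key]):
                return (gen, key)
    return None


def run_outputs_match(
    baseline: Mapping[str, float], candidate: Mapping[str, float]
) -> bool:
    """Exact equality on the three pinned run-level outputs."""
    return all(float(baseline[k]) == float(candidate[k]) for k in RUN_KEYS)
```

Returning the first failing (generation, key) pair rather than a bare boolean makes a failed golden-trace test point straight at where the extracted logic started to drift.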

CI/review gates:

Hard gate: deterministic workers=1 step-level and run-level parity must pass.
Soft gate: multi-worker runs must satisfy all of the following against the pinned baseline:
- schema is identical for runs_long.csv, cell_stats.csv, method_aggregate.csv, and findings.json,
- the sign of median_of_cell_median_delta for lr_adapt_proxy matches the baseline, with the value read from results/method_aggregate.csv, row method=lr_adapt_proxy,
- abs(cells_q_lt_0_05_current - cells_q_lt_0_05_baseline) <= 2, where values are read from results/method_aggregate.csv row method=lr_adapt_proxy.
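The soft gate can be checked mechanically. This sketch assumes that runs_long.csv, cell_stats.csv, method_aggregate.csv, and findings.json all sit in one results directory per run, that method_aggregate.csv has columns named method, median_of_cell_median_delta, and cells_q_lt_0_05, and that "schema" means the CSV header rows plus the nested key structure of findings.json. soft_gate_failures and its helpers are hypothetical names; a median of exactly zero is treated as its own sign.

```python
from __future__ import annotations

import csv
import json
from pathlib import Path

CSV_FILES = ("runs_long.csv", "cell_stats.csv", "method_aggregate.csv")


def csv_header(path: Path) -> list[str]:
    with path.open(newline="") as fh:
        return next(csv.reader(fh), [])


def json_keys(obj):
    """Nested key structure of a JSON value; leaf values are ignored."""
    if isinstance(obj, dict):
        return {k: json_keys(v) for k, v in obj.items()}
    if isinstance(obj, list) and obj:
        return [json_keys(obj[0])]
    return None


def aggregate_row(results_dir: Path, method: str = "lr_adapt_proxy") -> dict[str, str]:
    with (results_dir / "method_aggregate.csv").open(newline="") as fh:
        rows = [r for r in csv.DictReader(fh) if r["method"] == method]
    if len(rows) != 1:
        raise ValueError(f"expected one {method} row, found {len(rows)}")
    return rows[0]


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def soft_gate_failures(baseline_dir: Path, current_dir: Path, tolerance: int = 2) -> list[str]:
    failures = []
    # 1. Schema invariance: CSV headers and findings.json key structure.
    for name in CSV_FILES:
        if csv_header(baseline_dir / name) != csv_header(current_dir / name):
            failures.append(f"schema drift in {name}")
    base_json = json.loads((baseline_dir / "findings.json").read_text())
    cur_json = json.loads((current_dir / "findings.json").read_text())
    if json_keys(base_json) != json_keys(cur_json):
        failures.append("schema drift in findings.json")
    # 2. and 3. Metrics from the method=lr_adapt_proxy row.
    base, cur = aggregate_row(baseline_dir), aggregate_row(current_dir)
    delta = "median_of_cell_median_delta"
    if sign(float(base[delta])) != sign(float(cur[delta])):
        failures.append(f"sign mismatch in {delta}")
    count = "cells_q_lt_0_05"
    if abs(int(float(cur[count])) - int(float(base[count]))) > tolerance:
        failures.append(f"{count} differs by more than {tolerance}")
    return failures
```

Collecting every failure instead of stopping at the first one gives reviewers the full soft-gate picture from a single multi-worker run.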

Detailed Interface Spec

LRProxyParams dataclass in policies/lr_proxy.py with existing parameters only: ema_alpha, snr_up_threshold, snr_down_threshold, sigma_up_factor, sigma_down_factor, sigma_min_ratio, sigma_max_ratio.
AdaptationContext dataclass in types.py with fields:
- fitness: np.ndarray
- generation_index: int (zero-based, job-scoped)
- current_value: float
- direction: Literal["minimize", "maximize"]
AdaptationAction dataclass in types.py with fields:
- next_value: float
- factor: float
- was_clamped: bool
AdaptationStep dataclass in types.py with fields: action (AdaptationAction) and diagnostics (mapping from diagnostic key to float).
AdaptationPolicy protocol in protocols.py with step(context: AdaptationContext) -> AdaptationStep.
LRProxyPolicy class state fields: ema_snr, best_so_far, initial_sigma.
direction="maximize" behavior in v1: raise NotImplementedError (explicit fail-fast to prevent silent misuse).
initial_value is intentionally absent from AdaptationContext; policy-owned initial_sigma is the single source of truth.
Diagnostics names returned by LRProxyPolicy.step remain exactly: proxy_signal, proxy_noise, proxy_snr, proxy_ema_snr, proxy_sigma_factor, proxy_sigma.
Runner output column names remain unchanged, including proxy_sigma_factor_last and proxy_ema_snr_last.
was_clamped is internal-only data in v1: it is carried on AdaptationAction and is not persisted to the run-level CSV or manifest schemas.
Strict purity boundary: policy code is pure decision logic and never mutates optimizer internals.
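The interface spec maps naturally onto frozen dataclasses plus a typing.Protocol. This is a sketch under assumptions the plan does not fix: freezing the dataclasses, making the protocol runtime_checkable, typing diagnostics as Mapping[str, float], and typing every LRProxyParams field as float. LRProxyParams is shown alongside the types.py models only for completeness; per the plan it would live in policies/lr_proxy.py.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal, Mapping, Protocol, runtime_checkable

import numpy as np


@dataclass(frozen=True)
class LRProxyParams:
    """Would live in policies/lr_proxy.py; existing parameters only (float assumed)."""
    ema_alpha: float
    snr_up_threshold: float
    snr_down_threshold: float
    sigma_up_factor: float
    sigma_down_factor: float
    sigma_min_ratio: float
    sigma_max_ratio: float


@dataclass(frozen=True)
class AdaptationContext:
    fitness: np.ndarray  # this generation's fitness values (assumed: the array given to es.tell)
    generation_index: int  # zero-based, job-scoped
    current_value: float  # value the optimizer holds now (es.sigma for the pycma client)
    direction: Literal["minimize", "maximize"]  # "maximize" fails fast in v1


@dataclass(frozen=True)
class AdaptationAction:
    next_value: float  # value the client adapter writes back to the optimizer
    factor: float  # multiplier chosen by the policy (assumed meaning)
    was_clamped: bool  # internal-only in v1; never persisted


@dataclass(frozen=True)
class AdaptationStep:
    action: AdaptationAction
    diagnostics: Mapping[str, float]  # the six proxy_* keys for LRProxyPolicy


@runtime_checkable
class AdaptationPolicy(Protocol):
    """Pure decision logic: receives a context, never an optimizer object."""

    def step(self, context: AdaptationContext) -> AdaptationStep: ...
```

Note that no initial_value field appears on the context, matching the decision that the policy owns initial_sigma.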

Data Flow After Refactor

Runner builds method descriptor for each method_name.
For lr_adapt_proxy, runner instantiates LRProxyPolicy(params, initial_sigma) once per job.
Each generation: runner computes fitness via existing objective flow.
Runner builds AdaptationContext and calls policy.step(context) after es.tell.
Client adapter applies returned action to optimizer state (es.sigma <- action.next_value).
Runner records last-step diagnostics into existing output fields.
Non-adaptive methods (vanilla_cma, pop4x) bypass policy path and keep current behavior.
Boundary rule: policy never receives an optimizer object; only the adapter mutates optimizer state.
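The per-job data flow above can be illustrated without pycma. In this sketch FakeES is a toy optimizer exposing only ask, tell, and a mutable sigma (it is not CMA-ES), HalveEveryOtherGeneration is a hypothetical stand-in policy rather than the LRProxyPolicy rule, and SimpleNamespace objects stand in for the types.py dataclasses so the block runs on its own. What it shows is the boundary: only apply_sigma_action touches es, and the policy sees nothing but the context.

```python
from __future__ import annotations

from types import SimpleNamespace

import numpy as np


class FakeES:
    """Toy stand-in for a pycma optimizer: only ask/tell and a mutable sigma."""

    def __init__(self, x0, sigma0, seed=0):
        self.mean = np.asarray(x0, dtype=float)
        self.sigma = float(sigma0)
        self._rng = np.random.default_rng(seed)

    def ask(self, n=6):
        return [self.mean + self.sigma * self._rng.standard_normal(self.mean.size) for _ in range(n)]

    def tell(self, xs, fitness):
        self.mean = xs[int(np.argmin(fitness))]  # toy update, not CMA-ES


class HalveEveryOtherGeneration:
    """Hypothetical stand-in policy: decides from the context alone."""

    def step(self, ctx):
        if ctx.direction != "minimize":
            raise NotImplementedError("direction='maximize' is not supported in v1")
        factor = 0.5 if ctx.generation_index % 2 else 1.0
        action = SimpleNamespace(next_value=ctx.current_value * factor, factor=factor, was_clamped=False)
        diagnostics = {"proxy_sigma_factor": factor, "proxy_sigma": action.next_value}
        return SimpleNamespace(action=action, diagnostics=diagnostics)


def apply_sigma_action(es, action):
    """Client adapter: the only code that mutates optimizer state."""
    es.sigma = action.next_value


def run_job(es, objective, policy, generations, direction="minimize"):
    """One job; policy is None for non-adaptive methods (vanilla_cma, pop4x)."""
    last_diagnostics = {}
    for gen in range(generations):
        xs = es.ask()
        fitness = np.array([objective(x) for x in xs])
        es.tell(xs, fitness)
        if policy is None:
            continue  # bypass path: optimizer behavior is untouched
        ctx = SimpleNamespace(
            fitness=fitness, generation_index=gen, current_value=float(es.sigma), direction=direction
        )
        step = policy.step(ctx)  # the policy receives the context, never `es`
        apply_sigma_action(es, step.action)
        last_diagnostics = dict(step.diagnostics)  # runner keeps last-step diagnostics
    return last_diagnostics
```

Passing policy=None reproduces the bypass path: sigma is never rewritten and no diagnostics are recorded.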

Implementation Phases

Phase 1: Add adaptation core types/protocols/policy/client skeletons.
Phase 2: Add compatibility shim behavior in experiments/lr_adapt_proxy.py that forwards to LRProxyPolicy.
Phase 3a: Add and pass deterministic parity tests (unit + golden trace) against shim path before runner rewiring.
Phase 3b: Refactor experiments/methods.py to generic policy wiring and remove hardcoded use_lr mutation branch.
Phase 4: Run post-refactor integration/regression checks (eval-only pipeline checks, verifier checks, schema checks, multi-worker soft-gate checks).
Phase 5: Update docs in docs/analysis/lr_adapt_proxy_technical_spec.md, docs/analysis/lr_adapt_proxy_mechanism.md, and README.md to reflect finalized architecture and unchanged empirical claims.

Test Cases and Scenarios

Doc-gate check: no absolute local filesystem paths in this plan.
Plan lint check: pipeline-level parity line contains workers>=1 (and not workers=1).
Baseline pin check: plan contains run ID 20260305T060114Z-cac939ce, commit 675b7bd, and config experiments/config/high_rigor.yaml.
Metric provenance check: soft-gate section names results/method_aggregate.csv with row filter method=lr_adapt_proxy.
Type clarity check: plan explicitly types fitness, current_value, and AdaptationAction fields.
Unit: robust_spread parity for representative arrays, including near-constant values.
Unit: deterministic sequence parity for SNR/EMA/factor/clamp logic across many generations.
Unit: first-generation edge case parity (best_so_far is None path).
Unit: no-improvement path parity (signal=0) and clamp-bound saturation behavior.
Unit: policy purity guard (no optimizer object passed to policy).
Unit: direction="maximize" raises NotImplementedError.
Integration: golden trace parity test (workers=1, fixed seed) asserting exact equality for all six trace keys.
Integration: single-job run with fixed seed and workers=1 asserting exact equality for final_best, proxy_sigma_factor_last, proxy_ema_snr_last.
Integration: eval-only pipeline on small config to confirm no schema drift in runs_long.csv, cell_stats.csv, method_aggregate.csv, findings.json.
Integration: verifier pass using existing script scripts/verify_rerun_artifacts.py.
Contract: was_clamped is available on each step's action (AdaptationStep.action.was_clamped) but is absent from the diagnostics keys and from persisted run schemas.
Regression: ensure vanilla_cma and pop4x rows are unchanged by adaptation refactor path.
Contract: ensure pairwise artifact naming and manifest links remain unchanged.
Multi-worker soft gate checks:
- schema invariance,
- median_of_cell_median_delta sign consistency,
- cells_q_lt_0_05 count within ±2 of baseline.
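Several of the doc-gate checks are plain text predicates over PLAN.md. A hedged sketch: the absolute-path regex is a heuristic covering common Unix home/mount roots and Windows drive letters, reading the lint rule literally (the pipeline-level parity line must contain workers>=1 and no workers=1 substring) is an interpretation, and plan_lint_failures is a hypothetical helper.

```python
import re

# Heuristic: whitespace or "(" followed by a common absolute root or a drive letter.
ABS_PATH = re.compile(r"(?:^|[\s(])(?:/(?:Users|home|mnt|tmp)/|[A-Za-z]:\\)")
BASELINE_PINS = ("20260305T060114Z-cac939ce", "675b7bd", "experiments/config/high_rigor.yaml")


def plan_lint_failures(text: str) -> list[str]:
    failures = []
    if ABS_PATH.search(text):
        failures.append("absolute local path")
    for pin in BASELINE_PINS:
        if pin not in text:
            failures.append(f"missing baseline pin {pin}")
    # Literal reading: note that "workers>=1" does not contain the substring "workers=1".
    parity = [ln for ln in text.splitlines() if ln.startswith("Pipeline-level parity")]
    if len(parity) != 1 or "workers>=1" not in parity[0] or "workers=1" in parity[0]:
        failures.append("pipeline-level parity line must scope to workers>=1")
    if "results/method_aggregate.csv" not in text or "method=lr_adapt_proxy" not in text:
        failures.append("soft-gate metric provenance missing")
    return failures
```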

Acceptance Criteria

PLAN.md has no unresolved ambiguities identified in round-3 feedback.
PLAN.md explicitly resolves all round-2 findings:
- initial_value redundancy removed,
- measurable soft gate added,
- phase-order contradiction resolved,
- maximize behavior defined,
- explicit hard-gate trace keys and was_clamped treatment documented.
Baseline and metric provenance are explicit enough that implementers cannot choose different references.
Type-level expectations for context/action fields are explicit in the plan.
Parity section is enforceable and testable (contains numeric/measurable conditions).
Existing configs run unchanged, including experiments/config/high_rigor.yaml and experiments/config/eval_only_lr_vs_vanilla.yaml.
Output schemas and key names remain identical to baseline.
lr_adapt_proxy numeric behavior passes deterministic exact parity tests at workers=1.
Existing wrapper scripts still execute without interface changes.
No new scope creep is introduced (still one client, no rollback path).

Risks and Mitigations

Risk: accidental behavior drift while extracting logic. Mitigation: Phase 3a parity tests are locked and passing before Phase 3b runner refactor; merge is gated on hard parity checks.
Risk: over-generalization adds unused abstraction noise. Mitigation: one concrete protocol, one policy, one client in v1 only.
Risk: hidden downstream schema dependency. Mitigation: run existing verifier and compare key artifact columns/keys as explicit checks in Phase 4.
Risk: reviewers assume implied rollback support. Mitigation: explicit no-fallback stance in v1; failures are fixed in-path rather than via feature flags.
Risk: keeping proxy_*-prefixed diagnostic keys in a generic module couples it to one policy's naming, and reviewers may misread the module as already fully policy-agnostic. Mitigation: treat key naming as intentional v1 compatibility debt; document decoupling as a future generalization step in the architecture note.

Review Deliverables

This plan as review artifact.
A short architecture note in repo docs describing the policy/context/action model, why pycma is the first client, and why proxy_* naming remains in v1.
A parity matrix table template for reviewers listing each invariant and its test.

Assumptions and Defaults

This revision edits only PLAN.md (no code/config changes).
Baseline pin uses tracked high-rigor run 20260305T060114Z-cac939ce, manifest commit 675b7bd, and config experiments/config/high_rigor.yaml.
cells_q_lt_0_05 tolerance default is fixed at ±2 cells for multi-worker soft gate.
Soft-gate thresholds are calibrated for the current 36-cell matrix; if matrix size changes, revisit tolerance policy (for example, percentage-based bounds).
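For reference, ±2 of 36 cells is about 5.6% of the matrix. If the matrix size changes, one hypothetical percentage-based rule keeps that ratio, rounding down with a floor of one cell; this is an illustration only, not a decided v1 policy.

```python
BASELINE_CELLS = 36  # current benchmark matrix size
BASELINE_TOLERANCE = 2  # fixed v1 soft-gate tolerance, in cells


def scaled_tolerance(n_cells: int) -> int:
    """Hypothetical bound preserving the v1 ratio of 2 cells per 36 (integer floor, minimum 1)."""
    return max(1, n_cells * BASELINE_TOLERANCE // BASELINE_CELLS)
```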
Default target is exact pycma parity for deterministic workers=1 runs.
Multi-worker parity target is measurable consistency plus schema invariance, not byte-identical outputs.
Default API style is pure policy API; mutation remains in client adapter only.
Default first client is pycma sigma control only.
Default compatibility target is strict for config/CLI/artifact schemas.
No fallback/feature-flag path is introduced in v1.
v1 intentionally does not implement maximize semantics; fail-fast behavior is preferred over silent behavior.
