Add knob ablation test for visual regression rendering #508

Draft
agentydragon wants to merge 1 commit into devel from claude/test-rendering-knobs-Hf26W

@agentydragon
Owner

Summary

Adds a knob ablation test that disables each rendering knob one at a time and compares the resulting screenshots against a baseline with all knobs enabled. This identifies which knobs actually affect visual rendering and validates the hermetic rendering setup.

Key Changes

  • New test file (tests/knob_ablation.mjs):
    • Tests 40+ rendering knobs across Chrome flags, CSS overrides, environment variables, media features, and viewport settings
    • Runs on two representative scenarios: DefinitionDetail (text-heavy) and DistributionChartRecall (chart rendering)
    • Uses pixelmatch with a stricter threshold (0.1) to detect subtle rendering differences
    • Generates pixel-by-pixel diff images and JSON/Markdown reports
    • Includes timeout handling and process cleanup for flaky Chrome flag combinations
  • Test runner script (tests/run_knob_ablation.sh):
    • Orchestrates local and remote (RBE) test execution
    • Collects artifacts from Bazel test outputs
    • Supports local, remote, or both modes
  • Bazel integration (BUILD.bazel):
    • Adds a knob_ablation test target with Bazel's eternal timeout
    • Configures dependencies (Puppeteer, pngjs, pixelmatch, fonts, harness)
    • Sets up environment variables for hermetic rendering (FONTCONFIG_FILE, FREETYPE_PROPERTIES)
  • Sample results (knob_ablation_results/):
    • Local test execution results showing which knobs affect rendering
    • Key findings: --disable-lcd-text and --font-render-hinting have measurable impact; most other flags are neutral
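
To make the percentage figures in the results easier to interpret, here is a minimal sketch of how a "percent of pixels differing" number can be computed from two RGBA screenshot buffers. It is a simplified stand-in, not the code in tests/knob_ablation.mjs: the actual test uses pixelmatch, which compares perceptual color distance and can detect anti-aliasing, whereas this sketch uses a plain per-channel delta. The name `diffRatio` and the `tolerance` parameter are hypothetical.

```javascript
// Hypothetical helper (not from the PR): fraction of pixels whose RGBA values
// differ between two equally sized screenshots. Simplified stand-in for
// pixelmatch, using a raw per-channel delta instead of perceptual distance.
function diffRatio(baseline, candidate, width, height, tolerance = 0) {
  const expected = width * height * 4; // 4 bytes per pixel (RGBA)
  if (baseline.length !== expected || candidate.length !== expected) {
    // Screenshots with different pixel dimensions cannot be compared
    // pixel by pixel and need separate handling.
    throw new Error('buffer size does not match width * height * 4');
  }
  let differing = 0;
  for (let i = 0; i < expected; i += 4) {
    let maxDelta = 0;
    for (let c = 0; c < 4; c++) {
      const delta = Math.abs(baseline[i + c] - candidate[i + c]);
      if (delta > maxDelta) maxDelta = delta;
    }
    // A pixel counts as different when any channel exceeds the tolerance.
    if (maxDelta > tolerance) differing++;
  }
  return differing / (width * height);
}

// Usage: a 2x2 white image with the red channel of one pixel changed.
const white = new Uint8Array(16).fill(255);
const changed = Uint8Array.from(white);
changed[4] = 0;
const ratio = diffRatio(white, changed, 2, 2);
console.log((ratio * 100).toFixed(1) + '% of pixels differ');
```

The size check matters for knobs that change the screenshot's pixel dimensions, since such a run cannot be diffed pixel by pixel against the baseline.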

Implementation Details

  • Knobs are categorized by type (chrome_flag, css, env, media, viewport) for organized testing
  • Infrastructure flags (--no-sandbox, --single-process, etc.) are excluded from ablation
  • Each knob test runs in isolation with its own browser process to prevent state leakage
  • Handles Chrome crashes/hangs gracefully with per-knob timeouts and SIGKILL cleanup
  • Generates both human-readable (Markdown table) and machine-readable (JSON) reports
  • Supports an environment variable override for tagging results with their execution context
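
The ablation scheme described above can be sketched as follows: one baseline run with every knob enabled, plus one run per non-infrastructure knob with only that knob removed. This is an illustrative sketch under assumptions, not the code in tests/knob_ablation.mjs; the `KNOBS` entries, the `infrastructure` field, the `hermetic-inter-font` identifier, and `buildAblationPlan` are all hypothetical names.

```javascript
// Illustrative knob list (assumed shape, not taken from the PR's test file).
// Each knob has an id and a type; infrastructure knobs are never ablated.
const KNOBS = [
  { id: '--disable-lcd-text', type: 'chrome_flag' },
  { id: '--font-render-hinting=none', type: 'chrome_flag' },
  { id: '--no-sandbox', type: 'chrome_flag', infrastructure: true },
  { id: 'hermetic-inter-font', type: 'css' },
  { id: 'deviceScaleFactor=1', type: 'viewport' },
];

// Hypothetical helper: build the list of runs for a leave-one-out ablation.
function buildAblationPlan(knobs) {
  const baseline = {
    name: 'baseline',
    removed: null,
    enabled: knobs.map((k) => k.id),
  };
  const runs = knobs
    .filter((k) => !k.infrastructure) // infrastructure flags stay on in every run
    .map((k) => ({
      name: `without ${k.id}`,
      removed: k.id,
      enabled: knobs.filter((other) => other !== k).map((other) => other.id),
    }));
  return [baseline, ...runs];
}

const plan = buildAblationPlan(KNOBS);
for (const run of plan) {
  console.log(run.name, '->', run.enabled.length, 'knobs enabled');
}
```

Each entry in the plan would then map to its own browser launch, which matches the isolation requirement: no run shares state with another.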

https://claude.ai/code/session_01FksebaqNvhq1t48eQ2AHGF

Exhaustive test of all 33 hermetic rendering knobs (Chrome flags, CSS
overrides, env vars, media features, viewport settings). Each knob is
individually removed and the resulting screenshot compared against a
baseline.

Key finding: only 4 of 33 knobs actually affect rendering output:
- --disable-lcd-text (0.6% diff — largest single-flag impact)
- --font-render-hinting=none (0.1-0.2% on charts)
- Hermetic Inter font CSS (layout changes without it)
- deviceScaleFactor=1 (different pixel dimensions)

The remaining 20+ flags produce identical output when removed, both
locally (gVisor) and on RBE (BuildBuddy). Full results in report.md.

https://claude.ai/code/session_01FksebaqNvhq1t48eQ2AHGF
@agentydragon agentydragon force-pushed the devel branch 2 times, most recently from cb3df7d to 2a0568e Compare February 20, 2026 03:42