Add knob ablation test for visual regression rendering #508
Draft
agentydragon wants to merge 1 commit into devel from
Exhaustive test of all 33 hermetic rendering knobs (Chrome flags, CSS overrides, env vars, media features, viewport settings). Each knob is individually removed and the resulting screenshot compared against a baseline.

Key finding: only 4 of 33 knobs actually affect rendering output:
- --disable-lcd-text (0.6% diff, the largest single-flag impact)
- --font-render-hinting=none (0.1-0.2% on charts)
- Hermetic Inter font CSS (layout changes without it)
- deviceScaleFactor=1 (different pixel dimensions)

The remaining 20+ flags produce identical output when removed, both locally (gVisor) and on RBE (BuildBuddy). Full results in report.md.

https://claude.ai/code/session_01FksebaqNvhq1t48eQ2AHGF
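The text above does not say how the diff percentages are computed. One plausible definition, sketched here as an assumption, is the share of pixels whose RGBA values differ between the baseline and the ablated screenshot. The function name `diffPercent` and the raw-buffer input format are hypothetical and not taken from the PR.

```javascript
// Hypothetical helper (not from the PR): percentage of pixels that differ
// between two raw RGBA buffers of the same width and height.
function diffPercent(baseline, candidate, width, height) {
  const expectedLength = width * height * 4; // 4 bytes per pixel: R, G, B, A
  if (baseline.length !== expectedLength || candidate.length !== expectedLength) {
    // Screenshots with different pixel dimensions (for example after changing
    // deviceScaleFactor) cannot be compared pixel by pixel.
    return null;
  }
  let differing = 0;
  for (let i = 0; i < expectedLength; i += 4) {
    if (
      baseline[i] !== candidate[i] ||
      baseline[i + 1] !== candidate[i + 1] ||
      baseline[i + 2] !== candidate[i + 2] ||
      baseline[i + 3] !== candidate[i + 3]
    ) {
      differing++;
    }
  }
  return (100 * differing) / (width * height);
}
```

Under this definition, a knob counts as neutral when the function returns 0 for every test page, and a `null` result flags a dimension change rather than a pixel-level difference.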
cb3df7d to 2a0568e
Summary
Adds a comprehensive knob ablation test that systematically disables each rendering knob individually and compares the resulting screenshots against a baseline (all knobs enabled). This helps identify which knobs actually affect visual rendering and validates the hermetic rendering setup.
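The one-knob-at-a-time procedure described above can be sketched as a leave-one-out expansion of the knob list. This is a minimal illustration, not the code in tests/knob_ablation.mjs: the function name `ablationVariants` and the three-entry example list are assumptions, and the real test covers 33 knobs.

```javascript
// Hypothetical sketch: for each knob, produce a run configuration in which
// only that knob is removed and all others stay active.
function ablationVariants(knobs) {
  return knobs.map((removed) => ({
    removed,
    active: knobs.filter((knob) => knob !== removed),
  }));
}

// Illustrative subset of knobs named in this PR.
const exampleKnobs = [
  "--disable-lcd-text",
  "--font-render-hinting=none",
  "deviceScaleFactor=1",
];
const variants = ablationVariants(exampleKnobs);
```

Each entry in `variants` would be rendered once and compared against the baseline run, which uses the full `exampleKnobs` list.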
Key Changes
- New test file (tests/knob_ablation.mjs): DefinitionDetail (text-heavy) and DistributionChartRecall (chart rendering)
- Test runner script (tests/run_knob_ablation.sh): local, remote, or both modes
- Bazel integration (BUILD.bazel): knob_ablation test target with eternal timeout
- Sample results (knob_ablation_results/): --disable-lcd-text and --font-render-hinting have measurable impact; most other flags are neutral

Implementation Details
- Flags such as --no-sandbox, --single-process, etc. are excluded from ablation

https://claude.ai/code/session_01FksebaqNvhq1t48eQ2AHGF
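One way to keep certain flags out of the ablation is to split the flag list before generating test runs. The sketch below assumes the excluded flags stay enabled in every run. The names `ESSENTIAL_FLAGS` and `splitKnobs` are hypothetical, and the set lists only the two flags named above, not the full excluded set.

```javascript
// Hypothetical sketch: flags listed here are never ablated. Only the two
// flags named in the PR text are included; the real list is longer ("etc.").
const ESSENTIAL_FLAGS = new Set(["--no-sandbox", "--single-process"]);

// Split a flag list into flags that are always kept and flags eligible
// for one-at-a-time removal.
function splitKnobs(allFlags) {
  const essential = [];
  const ablatable = [];
  for (const flag of allFlags) {
    (ESSENTIAL_FLAGS.has(flag) ? essential : ablatable).push(flag);
  }
  return { essential, ablatable };
}
```

Only the `ablatable` list would feed the ablation loop, while `essential` is appended unchanged to every launch configuration.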