Changelog

2026-05-05 — Y-pixel coverage fix (typed_gapfill flavor)

Branch: bbox-y-coverage-fix. Adds a fifth AOI attribution flavor (typed_gapfill) as a pragmatic post-processing modifier on the typed cascade.

Why

A 2026-05-05 audit during AllSERP descriptive work surfaced a 22.7 % silent contamination of approached & clicked records (391 / 1,723) under typed. The legacy data_loader.click_to_position does Y-band-only assignment with no X check, rolling right-rail dd_right ad clicks (67), page chrome clicks, and inter-result-gap clicks into adjacent organics. The hypothesis that bboxes had a Y-pixel calibration drift was tested and refuted (clicks bias downward, fixations bias upward — opposite directions). The fix is a midpoint-split gap-fill on organic bboxes plus X+Y bbox-aware click attribution and an is_main_axis_click() trial-level filter.

Pragmatic, not principled — DOM-anchored bbox extraction is the principled alternative, deferred as future work. Both typed and typed_gapfill flavors stay queryable side-by-side per the cascade rule (CLAUDE.md).
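The midpoint-split gap-fill described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in scripts/extract_organic_bboxes.py: the names `midpoint_split_sketch` and `has_y_overlap`, and the `(y_top, y_bottom)` tuple layout, are hypothetical. Each gap between vertically adjacent organic boxes is divided at its midpoint, so no Y pixel between two organics is left unassigned.

```python
from typing import List, Tuple

Band = Tuple[float, float]  # (y_top, y_bottom) in page pixels; hypothetical layout


def midpoint_split_sketch(bands: List[Band]) -> List[Band]:
    """Close vertical gaps between adjacent bands by splitting each gap at its midpoint.

    Touching or overlapping neighbours are left unchanged; only true gaps are filled.
    """
    filled = [list(b) for b in sorted(bands)]
    for upper, lower in zip(filled, filled[1:]):
        gap = lower[0] - upper[1]
        if gap > 0:
            mid = upper[1] + gap / 2.0
            upper[1] = mid  # extend the upper band down to the midpoint
            lower[0] = mid  # extend the lower band up to the midpoint
    return [(top, bottom) for top, bottom in filled]


def has_y_overlap(bands: List[Band]) -> bool:
    """True if any two vertically adjacent bands share more than an edge."""
    ordered = sorted(bands)
    return any(lower[0] < upper[1] for upper, lower in zip(ordered, ordered[1:]))


# Invented example: three organic bands with 20 px and 10 px gaps.
example = [(100.0, 180.0), (200.0, 290.0), (300.0, 360.0)]
filled_example = midpoint_split_sketch(example)
```

Only the space between organics is redistributed in this sketch; the area above the first band and below the last is untouched, and ads are not passed in at all, mirroring the "organic-only; ads pass through unchanged" note in the entry.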

What landed

  • Producer: scripts/extract_organic_bboxes.py adds --flavor organic_gapfill (midpoint-split semantics, organic-only; ads pass through unchanged). New helpers apply_midpoint_split and assert_no_y_overlap. scripts/apply_gapfill_to_existing.py provides a no-screenshot path for environments where the AdSERP screenshot volume isn't mounted.
  • Typed map + CSV export: build_typed_aoi_map.py --source organic_gapfill; export_aois_by_trial_id.py --attribution typed_gapfill. New outputs at data/aoi-typed-gapfill/ and scripts/output/adserp_aois_by_trial_id_typed_gapfill.csv.
  • data_loader.py helpers (notebooks-v2): load_typed_gapfill_aois, typed_gapfill_aoi_bands, typed_gapfill_aoi_tops, typed_gapfill_aoi_etypes, attribute_click_to_typed_gapfill, is_main_axis_click. X+Y bbox-aware attribution prefers strict containment over tolerance, smallest-area on overlap.
  • Cursor-approach features: compute_cursor_approach_features.py --attribution typed_gapfill writes AdSERP/data/cursor-approach-features-typed-gapfill.json (18,218 records vs legacy 19,774; 231 hard-error trials filtered).
  • AllSERP descriptives: scripts/allserp_descriptives.py --flavor typed_gapfill writes scripts/output/allserp_descriptives_gapfill/.
  • Audit scripts (cite-ready for AllSERP resource paper): scripts/audit_unattributed_clicks.py, audit_dd_right.py, audit_cascade_contamination.py, audit_calibration_bias.py. Each carries regime tag and headline number in the docstring.
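The X+Y attribution rule named in the data_loader.py bullet (strict containment preferred over tolerance, smallest area on overlap) can be sketched like this. It is a hedged illustration only: `attribute_click_sketch` and the `x, y, width, height` dict layout are assumptions for the example and do not reproduce `attribute_click_to_typed_gapfill`.

```python
from typing import Dict, List, Optional

Box = Dict[str, float]  # assumed keys: x, y, width, height


def _contains(box: Box, cx: float, cy: float, pad: float = 0.0) -> bool:
    """Point-in-rectangle test, optionally growing the rectangle by `pad` pixels."""
    return (box["x"] - pad <= cx <= box["x"] + box["width"] + pad
            and box["y"] - pad <= cy <= box["y"] + box["height"] + pad)


def attribute_click_sketch(cx: float, cy: float, boxes: List[Box],
                           tolerance_px: float = 0.0) -> Optional[int]:
    """Return the index of the box that owns the click, or None.

    Pass 1 uses strict containment; pass 2 retries with the tolerance pad.
    Within a pass, the smallest-area box wins when several boxes match.
    """
    for pad in (0.0, tolerance_px):
        hits = [i for i, b in enumerate(boxes) if _contains(b, cx, cy, pad)]
        if hits:
            return min(hits, key=lambda i: boxes[i]["width"] * boxes[i]["height"])
    return None


# Invented layout: one organic card and one right-rail ad sharing the same Y band.
demo_boxes = [
    {"x": 150, "y": 100, "width": 600, "height": 80},   # organic
    {"x": 800, "y": 100, "width": 300, "height": 250},  # right-rail ad
]
right_rail_click = attribute_click_sketch(900, 150, demo_boxes)
margin_click = attribute_click_sketch(400, 185, demo_boxes, tolerance_px=10)
```

In the invented layout, a Y-band-only rule would hand the click at (900, 150) to the organic card; with the X check it goes to the right-rail box instead, which is the contamination path the audit describes.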

Headline shifts (typed → typed_gapfill)

| Metric | legacy | gapfill | Δ |
|---|---|---|---|
| AllSERP descriptives total clicks attributed | 2,479 | 2,634 | +155 |
| organic fixated % | 52.7 | 55.6 | +2.9 pp |
| paa fixated % | 32.8 | 40.6 | +7.8 pp |
| was_clicked=True records | 2,594 | 2,375 | −219 |
| approached & clicked | 1,723 | 1,562 | −161 |
| organic clicked records | 2,021 | 1,886 | −135 |
| native_ad clicked records | 186 | 137 | −49 |
| paa clicked records | 27 | 31 | +4 |

Cascade landed

  • NB21 LOSO click prediction: M3 AUC 0.871 → 0.856 (Δ = −0.015); position coefficient strengthens; per-etype AUC ordering preserved (dd_top 0.913, organic 0.852, native_ad 0.833). K-bbox-y-1..12 rows in docs/notebook-key-claims.md.
  • NB22 four-class taxonomy: full per-etype breakdown under typed_gapfill via compute_regression_labels.py --attribution typed_gapfill. Class proportions invariant within ±0.3 pp (clicked 13.0 % preserved; deferred 13.0 → 13.3; eval-rejected 2.9 % preserved; not-approached 71.0 → 70.8 %). Honest population shed 219 contaminated was_clicked=True records and gained 4 paa records (genuine recovery).
  • NB30 etype × viewport: LOPO AUC 0.687 → 0.701; per-etype max_overlap_frac interaction Δ widens (dd_top −0.108 → −0.163; native_ad −0.236 → −0.288). The "ads need higher viewport overlap to convert to clicks" dissociation strengthens under the cleaner population.
  • AR replay rebuild: build_replay_trial.py --flavor typed_gapfill shipped with screenshot fallback. 4 of 6 confirmed-issue trials rebuilt from local cache; remaining 2 + full 147-trial set auto-complete on volume mount.

NB28 also landed (2026-05-05 PM)

  • NB28 calibration: M4 + vt_bands LOSO AUC = 0.8423 (typed_gapfill) vs 0.842 (legacy absolute). Three-decimal replication — the viewport-band × cursor-retreat discriminator is bbox-attribution-invariant. viewport_time_calibration.viewport_ms_for_trial extended with optional bands= parameter; scripts/nb28_typed_gapfill.py shipped. K-bbox-y-NB28-* rows in docs/notebook-key-claims.md.

What's still pending

  • DOM-anchored bbox extraction (the principled alternative) is named as future work and has not been started. It is refuted as a wholesale replacement, because re-rendering 2022 SERP HTML in 2026 produces 13–45 px layout drift (docs/plan-demo-fix.md), but it is kept as a future direction for individual element-level geometry rather than full layout.

Pointers


2026-05-04 — Typed AOI cascade (HTML+vision joint typing)

Branch: feat/aoi-pipeline-v3-typed. Extends the prior cascade (feat/aoi-pipeline-v2, organic + organic_hybrid) with a fourth attribution flavor (typed) in which every SERP card is labelled by joint HTML + vision typing. The taxonomy:

organic | dd_top | native_ad | dd_right | top_places | knowledge_panel | paa | image_pack | related_searches | other_widget | unknown_widget | chrome

Pipeline:

  1. Phase 1: scripts/extract_html_widget_types.py parses AdSERP/data/serps/<tid>.html for all 2,776 trials, identifies card-level DOM units in #rso (descending into "Main results" wrappers when present) and #botstuff (Related Searches), and types each card by heading text → structural markers → data-attrid → class → fallback. Outputs data/aoi-html-types/<tid>.json. Type distribution: organic 22,530 (81.4 %), related_searches 1,811 (6.5 %), image_pack 1,600 (5.8 %), knowledge_panel 826 (3.0 %), paa 769 (2.8 %), top_places 86 (0.3 %), other_widget 51 (0.2 %).
  2. Phase 2: scripts/build_typed_aoi_map.py joins HTML types to existing CV bbox coordinates from organic-boundary-data + ad bboxes from ad-boundary-data. Walks bboxes in y-order, matches each to ad-overlap (≥30 %) → ad type, otherwise to HTML #rso card in DOM order. Bottom-of-page CV-detected cells with deep position (≥10) and small height (<200 px) are swept to chrome (off-axis, position = -1). Outputs data/aoi-typed/<tid>.json with [{position, type, x, y, width, height, html_handle, ...}, ...]. Match quality: 90 % of trials have |Δ| ≤ 2 between HTML and bbox card counts; 1.8 % residual unknown_widget after chrome sweep (down from 7.1 % pre-sweep).
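The Phase 2 walk can be sketched as below. This is an assumption-laden illustration, not scripts/build_typed_aoi_map.py: `overlap_fraction` and `type_bboxes_sketch` are hypothetical names, "ad-overlap" is assumed to mean the fraction of the bbox's own area covered by an ad rectangle, the chrome sweep is assumed to apply only to cells left without an HTML card, and off-axis handling of right-rail ads is omitted.

```python
from typing import Dict, List


def overlap_fraction(box: Dict, ad: Dict) -> float:
    """Fraction of `box` area covered by `ad` (assumed definition of ad-overlap)."""
    ix = min(box["x"] + box["width"], ad["x"] + ad["width"]) - max(box["x"], ad["x"])
    iy = min(box["y"] + box["height"], ad["y"] + ad["height"]) - max(box["y"], ad["y"])
    area = box["width"] * box["height"]
    if ix <= 0 or iy <= 0 or area <= 0:
        return 0.0
    return (ix * iy) / area


def type_bboxes_sketch(bboxes: List[Dict], ads: List[Dict], html_types: List[str],
                       ad_threshold: float = 0.30, chrome_min_position: int = 10,
                       chrome_max_height: float = 200) -> List[Dict]:
    """Walk bboxes in y-order; type each as an ad, the next HTML card, or chrome."""
    html_iter = iter(html_types)  # HTML card types in DOM order
    typed, position = [], 0
    for box in sorted(bboxes, key=lambda b: b["y"]):
        ad_hit = next((ad for ad in ads if overlap_fraction(box, ad) >= ad_threshold), None)
        if ad_hit is not None:
            etype = ad_hit["type"]
        else:
            etype = next(html_iter, "unknown_widget")  # no HTML card left to match
        if (etype == "unknown_widget" and position >= chrome_min_position
                and box["height"] < chrome_max_height):
            typed.append({**box, "type": "chrome", "position": -1})  # swept off-axis
            continue
        typed.append({**box, "type": etype, "position": position})
        position += 1
    return typed


# Invented trial: one top ad overlapping the first bbox, two HTML cards.
demo_boxes = [
    {"x": 150, "y": 400, "width": 600, "height": 120},
    {"x": 150, "y": 100, "width": 600, "height": 90},
    {"x": 150, "y": 220, "width": 600, "height": 150},
]
demo_ads = [{"x": 150, "y": 95, "width": 600, "height": 100, "type": "dd_top"}]
demo_typed = type_bboxes_sketch(demo_boxes, demo_ads, ["organic", "paa"])
```

Consuming HTML types from an iterator keeps DOM order intact: an ad match does not advance the iterator, so the next non-ad bbox still receives the next unmatched HTML card.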

Per-corpus typed AOI distribution (entries with position ≥ 0): organic 22,530 (53.0 %), native_ad 9,217 (21.8 %), related_searches 1,811 (4.3 %), image_pack 1,600 (3.8 %), dd_top 1,582 (3.7 %), knowledge_panel 826 (1.9 %), paa 769 (1.8 %), unknown_widget 756 (1.8 %), top_places 86 (0.2 %), other_widget 51 (0.1 %). Off-axis (position = -1): chrome 2,255, dd_right 861, plus the #botstuff related_searches and #rhs knowledge_panel counts above.

Findings replicate under typed

All 2026-05-03 stress-test findings reproduce nearly identically under typed. Hybrid values are preserved at scripts/output/<name>_HYBRID_BACKUP/ for direct comparison.

| Finding | organic_hybrid | typed | verdict |
|---|---|---|---|
| Within-item paired LF/HF Δ (return − first), median | +6.31 | +6.44 | ✓ replicates |
| Same, Wilcoxon two-sided p | 5.7×10⁻²³ | 2.5×10⁻²³ | |
| Participant-level mean-of-means Δ | +10.73 | +10.90 | |
| Pre-scroll cross-position Spearman ρ (P0–P6) | −0.857 | −0.857 | ✓ identical |
| Pooled steep-vs-plateau MW p | 2.6×10⁻²⁵ | 2.3×10⁻²⁵ | |
| Within-trial Spearman ρ (≥3 segs), median | −0.400 | −0.400 | ✓ identical |
| Within-trial Spearman, % negative | 62.0 % | 61.8 % | |
| Cap-10 audit Spearman ρ | −0.689 | −0.733 | ✓ stronger |
| RIPA2 paired Δ, median | +8.05×10⁻⁶ | +8.19×10⁻⁶ | ✓ both null |
| RIPA2 paired p (two-sided) | 0.17 | 0.16 | |
| Argmax LF/HF → click hit rate | 0.320 | 0.319 | ✓ identical |
| Argmax, chance baseline (1/N) | 0.535 | 0.534 | |
| First-scroll-vs-gaze: median above-fold coverage | 0.500 | 0.500 | ✓ identical |
| First-scroll-vs-gaze: % reaching last-visible | 10.3 % | 10.3 % | |
| Knee × mean click position (per-ppt Spearman) | +0.460 | +0.471 | |
| Knee × P0-fraction of clicks | −0.408 | −0.408 | ✓ identical |
| Knee × click entropy | +0.447 | +0.465 | |
| Knee × regression rate (NS) | −0.021 | −0.021 | ✓ identical |
| satisficer trial median knee | P2 | P2 | |
| optimizer trial median knee | P1 | P1 | |
| satopt × knee MW two-sided p | 0.022 | 0.022 | ✓ identical |

The 5,148 widgets that were previously pooled with organics or filtered out under hybrid are now correctly typed, but the cognitive findings do not shift. This is itself a strong robustness story: the within-item paired return finding, the pre-emptive-scroll behaviour, the rank-value-prior reframe, and the satopt × knee dissociation are all properties of the trial-level cognitive operations, not of widget-vs-organic mis-attribution.

Producer scripts updated

  • scripts/compute_cursor_approach_features.py --attribution typed → emits AdSERP/data/cursor-approach-features-typed.json (19,774 records, 2,774 trials, 9 etypes).
  • scripts/compute_regression_labels.py --attribution typed → emits scripts/output/approach_threshold_sensitivity/regression_labels_cache_typed.json (12,600 regressed / 7,174 not_regressed = 63.7 % regression rate; matches hybrid's 63.x %).
  • notebooks-v2/data_loader.py extended with typed_aoi_bands, typed_aoi_tops, typed_aoi_etypes, attribute_click_to_typed mirroring organic_aoi_* and _hybrid_aoi_tops conventions.

Stress-test scripts swapped to typed (hybrid backups in *_HYBRID_BACKUP)

lfhf_first_vs_return_paired.py (multi-attribution: typed added), lfhf_pre_vs_post_scroll.py, ripa2_first_vs_return_paired.py, lfhf_argmax_predicts_click.py, first_scroll_vs_gaze.py, knee_vs_click_distribution.py, knee_by_satopt.py (hard-swapped), knee_by_rank_variant.py (multi-attribution: typed column added), lfhf_rank_gradient_typed.py (forked from _hybrid.py).

Not yet ported

  • pupil-lfhf sibling repo's compute_butterworth_lfhf.py and compute_ripa2.py are NOT yet typed-aware. The stress tests bypass these by computing LF/HF / RIPA2 from raw pupil on the fly using typed_aoi_tops for window assignment, so the headline pupil paper findings do not depend on pre-computed *-by-position-typed.json JSONs. Notebook re-execution under typed (NB14, NB18, NB22, NB28, etc.) requires the pre-computed JSONs and is deferred to a future cascade pass.
  • compute_lab_gaze_gated_features.py not yet ported to typed; pupil paper does not depend on it.

Empirical changes to flag in paper prose

  • §3.3 (ettac-paper/sections/adserp.tex): updated draft is docs/drafts/ettac-adserp-2026-05-04-v2.4.md. Numbers shift by < 0.05 in correlation strength relative to v2.3 hybrid version; no qualitative changes. The paper does not use the word "typed" or contrast attribution flavors in prose (per "avoid alternate-rank framing" instruction). The Stimuli paragraph mentions display-order ranks across organic, ad, and widget surfaces inline.
  • Internal OSEC memo (docs/drafts/rank-value-prior-osec-2026-05-03.md): rank-value-prior axis × verification-appetite axis remain robust under typed.

2026-05-03 — Pre-vs-post-scroll dissociation, knee distribution, and rank-value prior reframe

Stress tests on AdSERP under organic_hybrid attribution, motivated by the §3.3 rewrite, surface several findings that update OSEC framing and sharpen the cognitive interpretation of the LF/HF rank gradient.

Pre-emptive scroll (operational marker for Survey-active → Survey-external transition)

The median user issues their first significant scroll after fixating only ~50 % of the visible above-fold candidate set; only 10.3 % of trials reach the last visible position before scrolling. The modal deepest pre-scroll fixated position (the per-trial knee) is P1 (34.7 % of trials), with P0 14.3 %, P2 24.3 %, P3 14.4 %, P4 8.9 %, P5+ 3 %. Per-position fraction fixated before first scroll: P0 98.3 %, P1 81.1 %, P2 47.7 %, P3 29.3 %, P4 19.6 %. Active criterion compilation is not an exhaustive pass through the viewport. Source: scripts/output/first_scroll_vs_gaze/.
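The per-trial knee (deepest position fixated before the first scroll) and its distribution can be computed along these lines. This is a sketch under assumptions: the `(onset_ms, position)` fixation layout, the function names, and the convention that position -1 means off-AOI are illustrative, and trials without any scroll are out of scope here.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Fixation = Tuple[float, int]  # (onset_ms, position); position -1 means off-AOI (assumed)


def knee_position(fixations: List[Fixation], first_scroll_ms: float) -> Optional[int]:
    """Deepest on-AOI position fixated strictly before the first scroll, else None."""
    pre_scroll = [pos for onset, pos in fixations if pos >= 0 and onset < first_scroll_ms]
    return max(pre_scroll) if pre_scroll else None


def knee_distribution(knees: List[Optional[int]]) -> Dict[int, float]:
    """Percentage of trials per knee position, ignoring trials with no knee."""
    valid = [k for k in knees if k is not None]
    counts = Counter(valid)
    return {k: 100.0 * c / len(valid) for k, c in sorted(counts.items())}


# Invented trials: (fixations, first_scroll_ms).
demo_trials = [
    ([(100.0, 0), (300.0, 1), (900.0, 3)], 500.0),  # P3 is fixated only after the scroll
    ([(50.0, 0), (200.0, 1)], 400.0),
    ([(50.0, 0), (60.0, -1)], 100.0),
    ([(500.0, 2)], 100.0),                          # nothing fixated before the scroll
]
demo_knees = [knee_position(fix, scroll) for fix, scroll in demo_trials]
demo_dist = knee_distribution(demo_knees)
```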

Pre-vs-post-scroll LF/HF dissociation (Survey-active vs Survey-external sub-modes)

Pre-scroll first-visit LF/HF is sharply rank-correlated (Spearman ρ = −0.857, p = 1.4 × 10⁻², N = 7 positions P0–P6); post-scroll first-visit LF/HF is essentially flat (ρ = −0.482, p = 0.13, N = 11 positions P0–P10). At P0 specifically, pre-scroll first-visit median LF/HF is 27.99 (N = 1,465); the rare post-scroll first-visit at P0 is 14.52 (N = 92), Δ = −13.47, MW p = 3.6 × 10⁻⁶. The same instrument reads two different cognitive modes: Survey-active (criterion compilation in working memory, pre-scroll, sharp gradient) and Survey-external (the SERP as external memory for confirmation under a now-stable criterion, post-scroll, flat). Source: scripts/output/lfhf_pre_vs_post_scroll/.
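For readers who want to reproduce the cross-position statistic, Spearman ρ is the Pearson correlation of the rank-transformed values. The sketch below is a dependency-free illustration; the function names are hypothetical and the per-position medians are invented, not taken from scripts/output/lfhf_pre_vs_post_scroll/.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented per-position medians for P0..P6 with one local inversion (P2 vs P3).
positions = [0, 1, 2, 3, 4, 5, 6]
invented_lfhf = [28.0, 24.0, 19.0, 20.0, 16.0, 15.0, 14.0]
demo_rho = spearman_rho(positions, invented_lfhf)
```

With only 7 or 11 positions, a single inversion moves ρ noticeably, which is worth keeping in mind when comparing the pre-scroll and post-scroll coefficients.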

Within-item paired return-vs-first LF/HF (Evaluate signature)

Across 2,646 paired (trial, position) records under organic_hybrid, return-visit LF/HF is significantly higher than first-visit LF/HF on the same item by the same user (median Δ = +6.31, mean Δ = +12.55, Wilcoxon two-sided p = 5.7 × 10⁻²³; 60 % Δ > 0). Participant-level 80 % Δ > 0, p = 3.1 × 10⁻⁴. Per-rank, the elevation is significant P1–P5. Drift-control rules out within-trial baseline shift: forward-only within-trial Δ between the latest and earliest visited positions is −1.97 (p = 2.6 × 10⁻⁴), opposite direction to the paired return Δ. Metric-specificity control rules out generalised pupil amplitude: RIPA2 paired Δ on the same records is at the noise floor (median +8 × 10⁻⁶, two-sided p = 0.17). The return-elevation lives in the autonomic spectral ratio, not in per-fixation amplitude. Source: scripts/output/lfhf_first_vs_return_paired/, scripts/output/ripa2_first_vs_return_paired/, scripts/output/lfhf_within_trial_drift_control/.
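The pairing logic behind the within-item comparison can be sketched as follows. Note the swap: the entry reports a Wilcoxon signed-rank test, whereas this dependency-free sketch uses an exact two-sided sign test as a stand-in, so its p-values are not comparable to the reported ones. The dict layout keyed by `(trial_id, position)` and all function names are assumptions.

```python
from math import comb
from statistics import median
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (trial_id, position); hypothetical key layout


def paired_deltas(first: Dict[Key, float], ret: Dict[Key, float]) -> List[float]:
    """Delta = return-visit value minus first-visit value, for keys present in both."""
    return [ret[k] - first[k] for k in sorted(first.keys() & ret.keys())]


def sign_test_two_sided(deltas: List[float]) -> float:
    """Exact two-sided sign test on non-zero deltas (a stand-in for Wilcoxon)."""
    nonzero = [d for d in deltas if d != 0]
    n = len(nonzero)
    k = sum(d > 0 for d in nonzero)
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented values; only keys with both a first visit and a return visit are paired.
first_visit = {("t1", 0): 20.0, ("t1", 1): 18.0, ("t2", 0): 25.0, ("t2", 2): 10.0, ("t3", 1): 12.0}
return_visit = {("t1", 0): 27.0, ("t1", 1): 21.0, ("t2", 0): 24.0, ("t2", 2): 16.0, ("t4", 0): 9.0}

demo_deltas = paired_deltas(first_visit, return_visit)
demo_median = median(demo_deltas)
demo_frac_positive = sum(d > 0 for d in demo_deltas) / len(demo_deltas)
demo_p = sign_test_two_sided(demo_deltas)
```

Because each delta compares the same item for the same user, rank is held fixed by construction, which is the property the entry relies on when it calls the finding rank-controlled.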

LF/HF as click-prediction feature: null

First-pass LF/HF argmax over visited positions hit-rate vs click_pos under hybrid attribution: 0.32 against a 1/N consideration-set chance baseline of 0.54 (lift = −22 pp, under-performs chance, N = 2,446). LF/HF features alone in LOSO logistic: AUC = 0.636 ± 0.089. Trivial gaze-dwell baseline (n_fixations, total_dwell_ms): AUC = 0.718 ± 0.098. Combined: AUC = 0.715 ± 0.100 — no lift over dwell. LF/HF is a measurement instrument for cognitive state, not a click-prediction feature. Source: scripts/output/lfhf_predicts_return_stress/, scripts/output/lfhf_argmax_predicts_click/, scripts/output/lfhf_click_prediction_test/.
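The argmax-versus-chance comparison can be sketched as below, assuming the chance baseline is the mean of 1/N over trials, where N is the number of visited positions in that trial. The function name and input layout are hypothetical, and the values are invented.

```python
from typing import Dict, List, Tuple

Trial = Tuple[Dict[int, float], int]  # ({position: first-pass value}, clicked position)


def argmax_hit_rate(trials: List[Trial]) -> Tuple[float, float]:
    """Return (hit rate of the argmax position, mean 1/N chance baseline)."""
    hits, chance, n_trials = 0, 0.0, 0
    for values, click_pos in trials:
        if not values:
            continue  # no visited positions, nothing to predict
        predicted = max(values, key=values.get)
        hits += predicted == click_pos
        chance += 1.0 / len(values)
        n_trials += 1
    return hits / n_trials, chance / n_trials


# Invented trials with 2, 4, and 1 visited positions.
demo_trials = [
    ({0: 30.0, 1: 20.0}, 0),
    ({0: 10.0, 1: 25.0, 2: 12.0, 3: 9.0}, 2),
    ({1: 5.0}, 1),
]
demo_hit_rate, demo_chance = argmax_hit_rate(demo_trials)
```

A per-trial 1/N baseline is high whenever consideration sets are small, which is how a hit rate of 0.32 can sit below chance in the entry above.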

§Predicting return claim from current pupil paper draft does not reproduce

The current ettac-paper/sections/adserp.tex claim that per-participant Wilcoxon on per-(trial, position) median LF/HF gives p = 0.0055 with a participant-cluster bootstrap CI of [+0.94, +3.85] does not reproduce under any of three attribution flavors. Under absolute attribution (matching the paper's stated methodology), participant-Wilcoxon mean-Δ p = 0.57; cluster bootstrap CI [−1.72, +5.68] straddles zero. Median-of-medians-Δ variant gives p = 0.017 (still 3× weaker than the paper). Under organic_hybrid the signal weakens further; under organic (bbox) it goes negative. The wr/nr stratification is rank-confounded: returned items concentrate at top ranks, where LF/HF is higher; once rank is partialled out (per-rank Cohen's d), the effect is null or slightly negative in every cell. The §Predicting return paragraph should be removed and replaced with the within-item paired return finding (which is rank-controlled by construction). Source: scripts/output/lfhf_predicts_return_stress/report.md (12-angle stress test).

Rank-value prior as the cognitive primitive (replaces satopt-as-primitive)

Per-participant correlations (n = 45, all under organic_hybrid):

| Correlation | Spearman ρ | p |
|---|---|---|
| mean knee × mean click position | +0.460 | 1.5 × 10⁻³ |
| mean knee × P0-fraction of clicks | −0.408 | 5.5 × 10⁻³ |
| mean knee × P0-or-P1 fraction | −0.363 | 1.4 × 10⁻² |
| mean knee × P3-or-deeper fraction | +0.337 | 2.4 × 10⁻² |
| mean knee × click entropy | +0.447 | 2.1 × 10⁻³ |
| mean knee × regression rate | −0.021 | 0.89 (NS) |
| regression rate × mean click position | +0.045 | 0.77 (NS) |
| regression rate × P0-fraction | +0.101 | 0.51 (NS) |

The participant-level rank-value-prior axis (top-heavy ↔ flat) predicts both knee depth and click distribution shape with consistent signs. Regression rate predicts neither at participant level. The trial-level satopt × knee effect (median split p = 5.9 × 10⁻⁵, optimizer-trial median P1 vs satisficer-trial P2) is real but is dominated by trial-count imbalance — optimizer participants generate more trials with knee data.

Implication: the OSEC-relevant individual-difference space is at least two-dimensional — rank-value prior strength (controls knee depth, click distribution shape, Survey-active investment) and verification appetite (controls return rate, Survey-external duration). Satisficer/optimizer is a one-dimensional projection; the two axes are nearly orthogonal in this dataset.

Source: scripts/output/knee_vs_click_distribution/, scripts/output/knee_by_satopt/, scripts/output/knee_by_rank_variant/. Working memo: docs/drafts/rank-value-prior-osec-2026-05-03.md.

Top-of-fold ad effect on knee (modulator of Survey-active depth)

When the display-order top position (P0 under hybrid) is a top-of-page ad (dd_top, n = 1,306 trials, 59 % of cohort), median knee = P1; under organic-only attribution 25.8 % of the full cohort never fixates an organic before the first scroll. Native-ad P0 trials (n = 452): hybrid knee P3 — native ads enter active criterion compilation as if they were candidates. Organic-top trials (n = 459): knee P2. Top-of-page display ads consume one slot of the active-compilation budget without contributing to organic-result criterion compilation. Source: scripts/output/knee_by_rank_variant/.

Empirical changes to flag in paper prose

  • §3.3 (ettac-paper/sections/adserp.tex): drop the §Predicting return paragraph (lines 164–183 of current draft); replace with the within-item paired return finding. Plateau slope is non-significant under hybrid (was marginal under absolute), so the steep-vs-plateau separation now rests on the pooled MW (still very strong, p = 2.6 × 10⁻²⁵), not on the plateau slope itself. See docs/drafts/ettac-adserp-2026-05-03-v2.md.
  • Task-model paper / methods paper: the satopt → knee interpretation needs splitting into two axes; the four-class consideration-set taxonomy sits inside Survey-external + Evaluate, not across all of Survey; Survey-active is typically just ~2 positions per trial (median knee P1).

2026-05-02 — Render pipeline migrated to --attribution organic default

scripts/render_*.py (the canonical paper-figure producers) now accept --attribution {organic,absolute} and default to organic. Bbox-attributed inputs (cursor-approach-features-organic.json + regression_labels_cache_organic.json) flow through to class_distributions.png, coupling_traces.png, cursor_gaze_array.png, cursor_gaze_timeseries.png, deferred_vs_rejected_*.png, gaze_around_cursor.png, gaze_density_class.png, class_distributions_wild_mode.png.

The per_record_coupling.json and per_record_trajectory.json caches inside render_deferred_vs_rejected.py are now keyed by attribution (*_organic.json vs the unsuffixed legacy file) so the n=14,760 cache doesn't collide with the n=13,419 cache.
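The cache-keying convention described above (an attribution suffix, with the legacy file left unsuffixed) amounts to something like the following. `cache_path_sketch` is a hypothetical helper written for illustration, and treating `absolute` as the unsuffixed legacy case is an assumption drawn from the sentence above.

```python
from pathlib import Path


def cache_path_sketch(cache_dir: str, stem: str, attribution: str) -> Path:
    """Build a per-attribution cache path; 'absolute' keeps the legacy unsuffixed name."""
    suffix = "" if attribution == "absolute" else f"_{attribution}"
    return Path(cache_dir) / f"{stem}{suffix}.json"


organic_cache = cache_path_sketch("cache", "per_record_coupling", "organic")
legacy_cache = cache_path_sketch("cache", "per_record_coupling", "absolute")
```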

Empirical changes to flag in paper prose

coupling_traces.png previously showed three well-separated horizontal bands (eval-rejected ≈ 220 px / deferred ≈ 300 px / clicked ≈ 390 px). Under bbox attribution the three traces collapse to ~400 px with heavily overlapping IQR ribbons. The renderer's hardcoded legend captions ("EVAL-REJECTED tracks gaze closely") describe the legacy shape and no longer match the data. The motor-signature dissociation in deferred_vs_rejected_four_panel.png (cursor-gaze distance and dwell deltas, p < 10⁻⁹ and p < 10⁻¹⁹) survives the cascade.

r1_dissociation.png / r1_2x2_dissociation.png (pupil-paper-relevant). The R1 per-(trial, position) RIPA2 vs LF/HF dissociation collapses on the RIPA2 side under bbox attribution. Per-fixation effect on later-returned vs never-returned items: LF/HF d=+0.041, p=1.1e-03 (preserved, sign unchanged); RIPA2 d=+0.006, p=8.0e-01 (was p=0.0058 under absolute, per the JEMR-2025 implementation-bug fix). The "lingered first time" LF/HF claim survives. The "lingered but processed shallowly" joint LF/HF × RIPA2 signature does not: the RIPA2 component appears to have been a rank-pooling artifact, not a per-fixation arousal-amplitude difference. Pupil paper §3 should drop the RIPA2 leg of the joint dissociation claim unless absolute attribution is held as the primary.

plot_approach_retreat_hero.png is pinned to absolute attribution. The curated COMMIT exemplar (p015-b1-t5 pos=2) reattributes away from 'clicked' under bbox, so the "Commit (clicked)" caption stops matching. New exemplars need to be hand-picked from cursor-approach-features-organic.json before this hero figure migrates.

plots-v1/plot_ettac_*.png regenerated under bbox-organic. Headline position-load result holds (full-corpus ρ = -0.655, p < 10⁻⁴; steep-phase ρ = -1.000 over P0–P3, p = 3.2 × 10⁻²³). Plateau ρ flipped to +0.321 (p=0.482, n.s.) — directional, but no longer surprising at this attribution.

Aggregate refactor: notebooks-v2/update_key_claims.py is now a reader (notebooks are canonical) instead of a template-writer; emits docs/notebook-key-claims.md directly from each notebook's K-claims cell. Eliminates the two-copy sync problem behind the 2026-05-01 --force-clobber guard.

2026-05-01 — AOI consumer cascade (branch feat/aoi-pipeline-v2)

What shipped

Pipeline + consumer API for the bbox AOI enrichment, plus first-pass K-ID delta evidence under organic-rank attribution. Notebook migrations not yet shipped — Andy's deep dive on pupil paper this weekend will decide which findings move to organic-rank as primary.

Branch state

  • 60a2e7b9 widget filter + composite-cell split + is_ad x-overlap fix
  • da0a8aae band-y guard against featured-snippet false positives
  • This commit: consumer API in data_loader.py + producer migrations + comparison harness

Consumer API additions (notebooks-v2/data_loader.py)

Three new functions consume the bbox JSONs written by scripts/extract_organic_bboxes.py:

  • load_aois(trial_id, include_widgets=False, include_cells=False) — full structured AOI dict; widgets and composite cells are opt-in (default-off matches "second-column variable" convention from methodology §7).
  • organic_aoi_bands(trial_id) — pixel-accurate (y_top, y_bottom) bands per organic; drop-in replacement for result_bands(n, doc_h).
  • organic_aoi_tops(trial_id) — convenience for the y-tops, drop-in for result_band_tops(n, doc_h).

All three fall back to band estimation when a trial's bbox JSON is missing. The 'source' field in the load_aois return discriminates 'bbox' vs 'band_estimate'.

Producer migrations

Both compute_butterworth_lfhf.py and compute_ripa2.py gained --attribution {absolute,organic}. Default is absolute (legacy). Organic-attribution outputs land at:

  • AdSERP/data/butterworth-lfhf-by-position-organic.json
  • AdSERP/data/ripa2-by-position-organic.json

Headline AOI-side audit (full corpus, n=2,776)

pipeline_organic_count vs count_organic_ranks (HTML-derived, ad-overlap excluded; not ground truth — includes some widget-heading h3s):

exact (delta=0):     683/2,776 = 24.6%
|delta| ≤ 1:       1,801/2,776 = 64.9%
|delta| ≤ 2:       2,451/2,776 = 88.3%
median 0, mean -0.20
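The audit figures above are a tolerance breakdown of per-trial count differences. A minimal sketch, assuming delta is defined as pipeline count minus HTML-derived count (the sign convention is not stated in the entry) and using invented counts:

```python
from statistics import mean, median
from typing import Dict, List


def delta_audit(pipeline_counts: List[int], html_counts: List[int]) -> Dict[str, object]:
    """Share of trials with |delta| <= 0, 1, 2, plus the median and mean delta."""
    deltas = [p - h for p, h in zip(pipeline_counts, html_counts)]
    n = len(deltas)
    within = {k: sum(abs(d) <= k for d in deltas) / n for k in (0, 1, 2)}
    return {"within": within, "median": median(deltas), "mean": mean(deltas)}


# Invented per-trial organic counts for five trials.
demo_audit = delta_audit([10, 9, 8, 10, 7], [10, 10, 10, 9, 10])
```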

Widget filter caught 2,008 widgets across 1,628 trials (58.6%). Composite cells found in 166 trials (6.0%, 376 cells).

Consumer-side cascade evidence

Per-fixation re-attribution rate: 73.9% (scripts/output/aoi-consumer-cascade/per-rank-shifts.json).

rank 0:  band 51,255 → bbox 48,908   (-4.6%)
rank 1:  band 43,778 → bbox 28,130   (-35.7%)
rank 2:  band 36,306 → bbox 17,094   (-52.9%)   ← rank-2 peak under absolute is artifactual
rank 3:  band 24,094 → bbox 12,698   (-47.3%)
rank 8:  band  3,872 → bbox  5,449   (+40.7%)
rank 9:  band  2,245 → bbox  3,422   (+52.4%)
rank 10: band    862 → bbox  1,320   (+53.1%)

Top re-attribution flow:

band rank 0 → bbox rank -1: 38,512 fixations  (band-attributed to organic but actually outside any AOI)
band rank 1 → bbox rank -1: 21,791
band rank 2 → bbox rank  0: 16,475  (re-numbered down by ad/widget exclusion)

bbox rank -1 = fixation didn't land on any organic AOI. ~60K fixations were attributed by band estimation to organic ranks 0-1 that actually fall outside organic AOIs entirely — likely on ad cards, search box, knowledge panels, widgets.
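The re-attribution rate and flow table can be derived from per-fixation rank pairs as sketched below. The `(band_rank, bbox_rank)` pair layout and the function name are assumptions; the pairs are invented and are not drawn from per-rank-shifts.json.

```python
from collections import Counter
from typing import Iterable, List, Tuple

RankPair = Tuple[int, int]  # (band_rank, bbox_rank); bbox_rank -1 = outside any organic AOI


def reattribution_summary(pairs: Iterable[RankPair]) -> Tuple[float, List[Tuple[RankPair, int]]]:
    """Return (share of fixations whose rank changed, flows sorted by count)."""
    pairs = list(pairs)
    flows = Counter(pairs)
    moved = sum(count for (band, bbox), count in flows.items() if band != bbox)
    rate = moved / len(pairs) if pairs else 0.0
    changed = [(pair, count) for pair, count in flows.most_common() if pair[0] != pair[1]]
    return rate, changed


# Invented fixations: most band-rank-0 fixations fall outside every organic AOI.
demo_pairs = [(0, 0), (0, -1), (0, -1), (0, -1), (1, -1), (2, 0), (2, 0), (1, 1), (3, 2)]
demo_rate, demo_flows = reattribution_summary(demo_pairs)
```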

NB14 (Butterworth LF/HF × position) under organic attribution

Full table at scripts/output/aoi-consumer-cascade/nb14_nb18_comparison.md. K-IDs computed with the canonical published denominator (positions 0–10, N=11), matching the original Key Claims block — earlier draft of this entry used a wider position range and produced misleading K3 values.

| K | Claim | Old (absolute, ads pooled) | New (organic, bbox) | Verdict |
|---|---|---|---|---|
| K1 | trials | 2,416 | 2,174 (−242) | sample shrinks |
| K2 | segments | 6,112 | 4,450 (−1,662) | |
| K3 | ρ pos 0–10 (N=11) | −0.927, p=4e-5 | −0.655, p=0.029 | survives, weaker |
| K4 | ρ pos 1–10 (N=10) | −0.903, p=3e-4 | −0.539, p=0.108 | ⚠ ns |
| K6 | clicked > non-clicked p | 3.5e-6 | 2.5e-7 | ✓ stronger |
| K9 | steep vs plateau MW p | 1.6e-23 | 8.8e-9 | ✓ holds |
| K10 | steep ρ (pos 0–3) | −1.000 (perfect) | −0.800, p=0.20 | ⚠ ns |
| K11 | plateau ρ (pos 4–10) | −0.714, p=0.071 | +0.321, p=0.482 | ⚠ sign flip |

NB18a (RIPA2 × position) under organic attribution

| K | Old | New | Verdict |
|---|---|---|---|
| K6 RIPA2 × position ρ | −0.262, p=0.366 | −0.080, p=0.776 | ⚠ ns under both |

NB23 (Click share + fixation + dwell × rank) under organic attribution

Full table at scripts/output/aoi-consumer-cascade/nb23_comparison.md. Generated by scripts/compare_nb23_under_attributions.py on n=2,776 trials.

| K | Claim | Old (band, abs rank) | New (bbox, org rank) | Verdict |
|---|---|---|---|---|
| K1 | Click share × rank ρ | −0.952, p=2.3e-5 | −0.988, p=9.3e-8 | ✓ sharper monotone |
| K1 | N clicks attributed | 2,764 | 2,363 (−401) | clicks on ads/KP/widgets correctly excluded |
| K2 | Fixation count × rank ρ | −1.000, p=6.6e-64 | −0.988, p=9.3e-8 | sharper N reflects rank-0 share jump |
| K2 | Fixations attributed | 202,792 | 144,874 | ads/widgets/KP fixations correctly excluded |
| K3 | Total dwell × rank ρ | −1.000 | −0.988 | similar |
| K8 | Forward fixations % | 74.0% | 74.6% | stable |
| K9 | Regression fixations % | 26.0% | 25.4% | stable |

Ski jump returns under organic attribution. Per-rank click distribution shows the ad-displacement artifact disappear and a genuine terminal-click ski jump emerge at rank 8:

                    band/abs    bbox/org    Δ from prev (org)
rank 0:             18.85%      44.86%
rank 1:             19.10%      17.60%      −27.25
rank 2:             24.57%      10.83%       −6.77    ← ad-displacement peak gone
rank 3:             14.83%       7.24%       −3.60
rank 7:              1.88%       2.20%       −0.55
rank 8:              1.77%       2.71%      +0.51 ⬆  ← terminal-click ski jump
rank 9:              1.12%       1.48%       −1.23

Under absolute rank, the spurious "ski jump" was at rank 2 (24.57%, +5.46 pp above rank 1); that was the ad-displacement artifact (top-organic clicks attributed to rank 2 because ads occupied ranks 0–1). Under bbox attribution, it is at rank 8 with a +0.51 pp bump over rank 7 (52 clicks at rank 7 → 64 at rank 8), the canonical end-of-first-viewport terminal-click effect.
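Locating a ski jump amounts to computing per-rank click share and flagging ranks whose share rises above the preceding rank. The sketch below uses invented click ranks and hypothetical function names; it is not the code in scripts/compare_nb23_under_attributions.py.

```python
from collections import Counter
from typing import Dict, List


def click_share_by_rank(click_ranks: List[int]) -> Dict[int, float]:
    """Percentage of attributed clicks per rank; rank -1 (unattributed) is dropped."""
    counts = Counter(r for r in click_ranks if r >= 0)
    total = sum(counts.values())
    return {r: 100.0 * counts[r] / total for r in sorted(counts)}


def upticks(share: Dict[int, float]) -> List[int]:
    """Ranks whose share exceeds that of the previous rank present in the data."""
    ranks = sorted(share)
    return [r for prev, r in zip(ranks, ranks[1:]) if share[r] > share[prev]]


# Invented clicks: monotone decline over ranks 0..3, then a small rise at rank 4.
demo_clicks = [0] * 9 + [1] * 4 + [2] * 3 + [3] * 2 + [4] * 3 + [-1] * 2
demo_share = click_share_by_rank(demo_clicks)
demo_upticks = upticks(demo_share)
```

An uptick on its own only flags a candidate; whether it is an artifact (as at rank 2 under absolute rank) or a terminal-click effect depends on the attribution behind the ranks.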

NB04 (fixation coverage / per-position budget) under organic attribution

Full table at scripts/output/aoi-consumer-cascade/nb04_comparison.md. abs n=2,764 / org n=2,363.

| K | Claim | Absolute | Organic |
|---|---|---|---|
| K2 | First-viewport clickers | 504 (18.2%) | 382 (16.2%) |
| K4 | Mean share of results-above-click fixated | 98.0% | 96.9% |
| K6 | Mean share of max-scroll-depth results fixated | 74.0% | 70.7% |
| K7 | FV clickers: share of first-screen results fixated | 68.5% | 60.3% |
| K8 | Scrollers: share of first-screen results fixated | 93.9% | 90.8% |

Per-position fixation budget shifts dramatically for FV clickers:

| Position | K-ID | Absolute | Organic |
|---|---|---|---|
| 0 | K13 | 45.4% | 67.9% |
| 1 | K14 | 35.7% | 28.5% |
| 2 | | 22.9% | 17.2% |
| 3 | | 16.1% | 12.2% |

K13 jumps from 45.4% to 67.9% under bbox attribution — when first-viewport clickers click, 68% of their fixation time is on position 0 (the top organic), not 45% as previously reported. This is consistent with the rank-0 click share jump (NB23 K1: 18.8% → 44.9%): under organic attribution, the top organic is clearly the dominant attentional target.

The N=504 → N=382 shift in FV clickers reflects the 411 trials in NB22 where the click was on an ad/widget — those drop out of the FV-organic-clicker cohort under bbox.

NB22 (four-class taxonomy) under organic attribution

Full table at scripts/output/aoi-consumer-cascade/nb22_comparison.md. Generated on n=2,775 trials.

Class distribution shifts:

| Class | Absolute share | Organic share | Δ |
|---|---|---|---|
| clicked | 8.2% | 8.9% | +0.7 |
| deferred | 26.2% | 27.1% | +0.9 |
| evaluated_rejected | 15.5% | 20.1% | +4.6 |
| not_approached | 50.1% | 43.9% | −6.2 |

The evaluated_rejected class grows substantially under bbox attribution because ad-slot positions that were "not_approached" under absolute rank simply don't exist as positions under organic rank — so the visited-but-not-clicked fraction shifts up.

Per-trial averages:

| Metric | Absolute | Organic | Note |
|---|---|---|---|
| Mean visited positions / trial | 6.03 | 5.32 | bbox tighter: ad/widget visits not counted |
| Mean regressed positions / trial | 3.86 | 3.15 | |
| % of visited that are regressed | 64.0% | 59.3% | |

Per-trial label stability:

  • 99.4% of trials (2,757/2,775) have at least one shifted four-class label when switching from absolute to organic attribution.
  • 411 trials have clicked count differing — i.e., the click landed on a different bucket (organic vs ad/widget) under the two methods.
  • 2,047 trials have deferred count differing — gaze regression picked up different positions because position attribution shifted.

Implication for AR replay rebuild: nearly every curated example in approach-retreat/site/replay/data/curation.json may have stale labels. Re-running build_replay_trial.py on those trials will produce fresh AOI labels via M5; caption claims like "5 DEFERRED AOIs" need automated cross-check against the regenerated labels before re-publishing demos. The 411 click-shifts are particularly important because curation.json filters trials by class profile.

NB25 (corpus structure) shifts

| K | Old (h3) | New (bbox) |
|---|---|---|
| K11 modal organic count | 10 (26.3%) | 9 (33.1%) |
| K12 range | 1–15 | 1–17 |
| K13 ∈ {9,10,11} | 69.8% | 75.8% |
| K14 exactly 10 | 26.3% (731) | 30.5% (847) |

What this means — summary

The "monotonic load decline by rank" finding is partly an absolute-rank artifact driven by ad-screening discrimination cost contaminating early positions. Under organic-only attribution, the full-range gradient weakens (K3: ρ = −0.655, p = 0.029) and goes ns once position 0 is excluded or the steep and plateau bands are tested separately (K4, K10, K11); what survives strongly is (a) clicked > non-clicked (K6 strengthens) and (b) steep early band vs plateau late band dichotomy (K9 holds at p<10⁻⁸).

Andy's proposed reframe: organic rank as primary, ads as essential distractors. Headline becomes "cognitive engagement on organic search results is two-band — early evaluation-heavy band + late satisficer plateau, with clicked positions uniformly elevated regardless of band". K6 + K9 carry the new headline; K3/K4/K10/K11 retire to a robustness section that shows the absolute-rank curves and explains the ad-distractor contamination.

Click-attribution split (full corpus, n=2,775)

Bbox AOIs are extracted tight to visual content; clicks frequently land in the small visual gap between adjacent card rectangles (~10–15 px typical). Under strict containment those count as "off-AOI" even when they were almost certainly intended for an adjacent card.

Distribution of off-AOI click distance to nearest organic edge:

median 10 px,  P75 15 px,  P90 22 px
 ≤ 10 px:  55.1% rescued
 ≤ 20 px:  88.8% rescued
 ≤ 30 px:  92.5% rescued ← elbow; further loosening rescues only ~0.3pp more
 ≤ 50 px:  92.5% rescued
 ≤ 100 px: 92.8% rescued

attribute_click_to_organic(click_y, trial_id, tolerance_px=30) added to data_loader.py. Logic: strict containment in organic always wins; if click falls inside any ad rect, refuse to snap (it's an ad click); if inside any filtered widget, refuse; otherwise snap to nearest organic if within tolerance_px.
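The decision order described above (strict containment, then ad and widget refusal, then tolerance snap) can be sketched as follows. This is an illustration under assumptions, not the data_loader.py implementation: `snap_click_sketch` is a hypothetical name, AOIs are reduced to `(y_top, y_bottom)` intervals, and the geometry is passed in directly rather than looked up by `trial_id`.

```python
from typing import List, Optional, Tuple

Band = Tuple[float, float]  # (y_top, y_bottom) in page pixels; assumed layout


def _inside(y: float, band: Band) -> bool:
    return band[0] <= y <= band[1]


def snap_click_sketch(click_y: float, organics: List[Band], ads: List[Band],
                      widgets: List[Band], tolerance_px: float = 30.0) -> Optional[int]:
    """Return the index of the organic band that owns the click, or None."""
    # 1. Strict containment in an organic always wins.
    for i, band in enumerate(organics):
        if _inside(click_y, band):
            return i
    # 2. A click inside an ad or a filtered widget is never snapped to an organic.
    if any(_inside(click_y, b) for b in ads) or any(_inside(click_y, b) for b in widgets):
        return None
    # 3. Otherwise snap to the nearest organic edge, if it is within tolerance.
    best, best_dist = None, None
    for i, (top, bottom) in enumerate(organics):
        dist = min(abs(click_y - top), abs(click_y - bottom))
        if best_dist is None or dist < best_dist:
            best, best_dist = i, dist
    if best is not None and best_dist <= tolerance_px:
        return best
    return None


# Invented page: one ad above two organics, one widget below them.
demo_organics = [(100.0, 180.0), (195.0, 270.0)]
demo_ads = [(20.0, 90.0)]
demo_widgets = [(300.0, 380.0)]
gap_click = snap_click_sketch(185.0, demo_organics, demo_ads, demo_widgets)
ad_click = snap_click_sketch(85.0, demo_organics, demo_ads, demo_widgets)
```

In the invented page the ad click sits only 15 px above the first organic, well inside the 30 px tolerance, yet it is refused because the ad check runs before the snap. That ordering is what keeps the ad bucket identical between the strict and tolerant columns.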

Click attribution under each method:

| Bucket | Strict (tolerance=0) | Tolerant (30 px) |
|---|---|---|
| Organic | 1,785 (64.3%) | 2,181 (78.6%) |
| All ads | 557 (20.1%) | 557 (20.1%) |
| Widgets (filtered) | 5 (0.2%) | 5 (0.2%) |
| Off-AOI (KP / image carousel / footer / large gaps) | 428 (15.4%) | 32 (1.2%) |

The 30 px tolerance rescues 396 clicks that strict containment loses — these are the visual-margin clicks ("clicked the bottom edge of card 3") not the truly off-AOI clicks (which stay at ~32 = 1.2%).

Headline for paper framing: under this attribution, 78.6% of clicks are on organic results, 20.1% on ads, 1.2% on content the pipeline doesn't model (Knowledge Panel, image carousel, etc.). The "ads as essential distractors" frame holds; the methodology limitation around right-pane / KP coverage affects ~1% of clicks, not 15%.

NB24 retreat arc geometry under organic_hybrid attribution

scripts/compute_retreat_arcs.py extracts NB24's extract_retreat_arcs_v2 into a producer with --attribution {absolute, organic_hybrid}. The hybrid mode combines bbox organics + shipped ad rectangles into one ordered position list with etype tags — preserving the organic-vs-top-ad-vs-native-ad comparison that NB24 needs.

Output:

  • AdSERP/data/retreat-arcs.json — 1,490 raw arcs (legacy absolute)
  • AdSERP/data/retreat-arcs-organic.json — 5,201 raw arcs (organic_hybrid; 3.5× coverage gain)

The coverage gain reflects bbox AOIs being pixel-accurate (cursor enters/exits positions cleanly) vs band estimation (cursor trajectory frequently fell into "no-position" gaps).

| Metric | Absolute | Organic_hybrid | Verdict |
|---|---|---|---|
| Retreats (valid arcs, not clicked) | 907 | 1,651 | |
| Top Ad arc ratio (median) | 1.51 | 1.55 | ✓ unchanged |
| Top Ad lateral displacement (median px) | 63 | 62 | ✓ unchanged |
| Top Ad lateral/arc ratio (pooled) | 0.166 | 0.170 | ✓ replicates |
| Organic arc ratio | 1.22 | 1.11 | ✓ sharper (more linear) |
| Organic lateral displacement | 33 px | 11 px | ✓ sharper (cleaner) |
| Organic vs Top Ad arc ratio MW p | 1.1e-5 | 4.0e-17 | ✓ much stronger |

The "retreat as lateral displacement" claim survives and strengthens. Top ads still curve laterally (arc ratio 1.55, lateral 62 px), and that is stable across attribution methods. What changes is the contrast: organic retreats turn out to be much more linear (lateral displacement 33 → 11 px) than previously thought. The Mann-Whitney p-value tightens by roughly 11 orders of magnitude (1.1e-5 → 4.0e-17).

Implication for AR / methods paper: the brand claim that "top ads impose lateral retreat arcs" is more defensible under bbox attribution, not less. The 3.5× more retreat-arc data also enables sharper per-(direction × etype) splits for the forward/regressive analysis NB24 produces.

NB15 cursor-approach-features producer migrated

scripts/compute_cursor_approach_features.py extracted from NB15 cell 4 with --attribution {absolute,organic}. Output:

  • AdSERP/data/cursor-approach-features.json — legacy absolute (13,419 records, 2,339 trials)
  • AdSERP/data/cursor-approach-features-organic.json — bbox attribution (14,760 records, 2,701 trials)

Coverage increases under organic (more trials have valid AOIs because the producer's extract_serp_results-or-fallback no longer rejects trials where h3 enumeration returns null). Per-position record counts:

| Position | Absolute | Organic |
|---|---|---|
| 0 | 2,320 | 2,658 |
| 1 | 2,244 | 2,360 |
| 2 | 2,091 | 1,985 |
| 7 | 719 | 936 |
| 8 | 481 | 748 |
| 9 | 192 | 401 |

Approach share (min_dist < 100): 28.2% → 23.3% under organic. The drop reflects cleaner per-AOI distance calculations (no spurious "approaches" to gap regions previously included as positions).

This unblocks NB20 (approach by element), NB21 (click prediction LOSO), NB22 numerical recompute, NB24 (retreat arc geometry), NB28 (viewport bands) — all consume cursor-approach-features.json directly. To rerun those under organic, point them at cursor-approach-features-organic.json (one-line change in cell 1 of each).
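
The one-line change could look like the sketch below. The two file names come from this entry; the helper `approach_features_path` and the `DATA_DIR` constant are hypothetical and not part of data_loader.py.

```python
from pathlib import Path

DATA_DIR = Path("AdSERP/data")  # assumed to be relative to the repo root

# File names as listed in this entry, keyed by attribution mode.
_FEATURE_FILES = {
    "absolute": "cursor-approach-features.json",
    "organic": "cursor-approach-features-organic.json",
}


def approach_features_path(attribution: str = "absolute") -> Path:
    """Return the feature file a consumer notebook should load in cell 1."""
    if attribution not in _FEATURE_FILES:
        raise ValueError(f"unknown attribution: {attribution!r}")
    return DATA_DIR / _FEATURE_FILES[attribution]


features_path = approach_features_path("organic")
```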

What's measured vs. what's pending

Side-by-side K-ID reports complete for 6 notebooks:

| Notebook | Method | Status | Headline shift |
|---|---|---|---|
| NB14 Butterworth | Producer rerun + comparison | ✓ done | Monotone-decline (K3) survives but weaker; perfect-steep (K10) and plateau-direction (K11) lose significance; clicked>nonclicked (K6) and dichotomy (K9) strengthen |
| NB18a RIPA2 | Producer rerun + comparison | ✓ done | ρ stays ns under both |
| NB23 rank effects | Per-trial recompute | ✓ done | ρ tightens, ski jump returns at rank 8 |
| NB22 four-class | Per-trial recompute | ✓ done | 99.4% of trials shift; 411 click reattributions |
| NB04 fixation coverage | Per-trial recompute | ✓ done | K13 FV pos-0 budget 45% → 68% |
| NB25 SERP composition | Counts comparison | ✓ done | Modal organic count 10 → 9 |

Pending — heavier regeneration cost:

| Notebook | Why heavier |
|---|---|
| NB21 click prediction | Consumes cursor-approach-features.json; needs NB15 producer regenerated under organic AOIs |
| NB28 viewport bands | Same: depends on cursor-approach-features + regression_labels_cache |
| NB24 retreat arc geometry | Same upstream dependency |
| NB20 approach by element | Same |
| NB15 cursor approach itself | The producer; rerunning it cascades to NB20/21/22/24/28 |

These all share one upstream artifact (AdSERP/data/cursor-approach-features.json), so regenerating it once under organic attribution unblocks the whole tier. Estimated cost: 1–2 hr to migrate NB15's per-AOI loops + re-run.

Probably unaffected (per code triage, no per-position attribution code):

  • NB05 LHIPA — per-trial pupillometric index
  • NB07a regressions prevalence — count-based
  • NB09 difficulty — token-based difficulty measures
  • NB13 survey phase — saccade-amplitude phase classifier
  • NB17 scroll retreat — scroll-event based, AOI-independent

(These should still be spot-checked, but no expected K-ID shifts from AOI cascade.)

What this picture supports for the pupil paper weekend deep-dive

Six notebooks' worth of K-ID evidence, plus the master TL;DRs:

  1. The "monotonic load decline" framing weakens under organic but doesn't fully die. K3 (ρ over positions 0–10) survives at p=0.029 vs absolute's p=4e-5 — significant but the strength halves. K4 (positions 1–10), K10 (steep-phase perfect monotone), and K11 (plateau direction) all lose significance and K11 sign-flips. The cleaner story is dichotomy + decision-locking: K9 steep-vs-plateau still p<10⁻⁸, and K6 (clicked > non-clicked) strengthens to p=2.5e-7.
  2. Top-organic dominance is sharper than reported. Click share at rank 0: 18.8% → 44.9%. Position-0 fixation budget for FV clickers: 45% → 68%. Spurious "rank-2 peak" was ad-displacement.
  3. Ski jump returns at rank 8 under bbox — terminal-click effect at end-of-first-viewport, masked under absolute attribution.
  4. AR demos cannot ship as-is. 99.4% of trials have shifted four-class labels, 411 click reattributions. Curation captions are stale until rebuild.

Status / next moves

  • pupil paper deep-dive scheduled for the weekend (May 2–3, deadline May 15).
  • Notebook code migrations (NB14, NB18a, NB23, NB04, NB22 cell rewrites) deferred to after the deep-dive — paper framing decision drives which K-IDs are primary.
  • Approach-retreat replay-bundle rebuild also deferred until NB22 four-class taxonomy is regenerated under organic attribution AND curation captions are validated.
  • NB15 producer migration (cursor-approach-features under organic) is the unlock for NB20/21/24/28; estimated 1–2 hr next iteration.

Files

  • notebooks-v2/data_loader.py: load_aois, organic_aoi_bands, organic_aoi_tops
  • scripts/compute_butterworth_lfhf.py: --attribution flag
  • scripts/compute_ripa2.py: --attribution flag
  • scripts/compare_aoi_consumers.py: full-corpus per-fixation re-attribution audit
  • scripts/compare_nb14_nb18_under_attributions.py: side-by-side K-ID report (NB14 + NB18a)
  • scripts/output/aoi-consumer-cascade/: generated reports for the weekend deep-dive

2026-04-12 — Fixation-side coordinate-space audit (symmetric to 2026-04-09)

The bug

notebooks-v2/data_loader.py documented Gazepoint FPOGY as screen-space (viewport pixels, 0..scr_h) and provided helpers (assign_fixation_to_position, gaze_cursor_distance) that added scroll offset internally. Per the AdSERP README (https://github.com/kayhan-latifzadeh/AdSERP), FPOGY is actually page-space — "relative to the top-left corner of the screenshot in pixels." The JS adserp-importer.js in Scrutinizer had the correct interpretation all along. The Python loader was wrong.

Empirical verification on 20 scrolled trials: Pearson r(FPOGY, scrollY) ≈ 0.95; (FPOGY − scrollY) falls inside the viewport [0, scr_h] for 98%+ of fixations; max FPOGY exceeds scr_h on scrolled trials (e.g. 2,143 on a 1,024-tall screen). This is definitionally impossible under the screen-space interpretation.
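
The two checks described above (correlation with scroll, and whether FPOGY − scrollY lands inside the viewport) can be sketched with the standard library alone. The function names and the four sample values are illustrative assumptions, not corpus data or the audit's actual code.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


def share_inside_viewport(fpogy, scroll_y, scr_h):
    """Fraction of fixations whose (FPOGY - scrollY) falls in [0, scr_h]."""
    inside = sum(1 for fy, sy in zip(fpogy, scroll_y) if 0 <= fy - sy <= scr_h)
    return inside / len(fpogy)


# Made-up values for one scrolled trial on a 1,024 px tall screen.
fpogy = [300.0, 900.0, 1500.0, 2143.0]
scroll_y = [0.0, 500.0, 1000.0, 1400.0]
scr_h = 1024

r = pearson_r(fpogy, scroll_y)                        # high if FPOGY tracks scroll
share = share_inside_viewport(fpogy, scroll_y, scr_h)
exceeds_screen = max(fpogy) > scr_h                   # impossible for screen-space values
```

If FPOGY were screen-space, `r` would be near zero and `exceeds_screen` would never be true; page-space values give the opposite pattern on scrolled trials.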

The result: every fix.y + scroll_y in Python code was double-counting scroll. This is the symmetric bug to the 2026-04-09 cursor-side audit (which fixed the opposite sign error: adding scroll to values that were already page-space). Both bugs originated from the same miscommunication about coordinate conventions across the two language pipelines.

Discovery

Found while validating the 31 canonical gazeplot trials against the authors' own full-page screenshots (Zenodo record 15236546, downloaded 2026-04-12). Empirically falsifiable: fixations plotted in page-space coordinates on the raw screenshots landed on meaningful content; the same fixations under the buggy fix.y + scroll interpretation landed off-page (past doc_h) on 842 of 2,776 trials.
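
The off-page count is straightforward to reproduce in outline. The sketch below assumes each trial is a dict with `fix_y`, `scroll_y`, and `doc_h` keys (a hypothetical layout, not the loader's real schema) and counts trials where at least one fixation leaves the page under either interpretation.

```python
def trial_overflows(fix_ys, scroll_ys, doc_h, add_scroll):
    """True if any fixation lands outside [0, doc_h] under the chosen formula."""
    for fix_y, scroll_y in zip(fix_ys, scroll_ys):
        # add_scroll=True reproduces the buggy fix.y + scroll interpretation.
        page_y = fix_y + scroll_y if add_scroll else fix_y
        if not 0 <= page_y <= doc_h:
            return True
    return False


def count_overflowing_trials(trials, add_scroll):
    return sum(
        1
        for t in trials
        if trial_overflows(t["fix_y"], t["scroll_y"], t["doc_h"], add_scroll)
    )


# Two made-up trials: one unscrolled, one scrolled.
trials = [
    {"fix_y": [100, 500], "scroll_y": [0, 0], "doc_h": 3000},
    {"fix_y": [400, 2100, 2800], "scroll_y": [0, 1200, 1900], "doc_h": 3000},
]

correct_overflow = count_overflowing_trials(trials, add_scroll=False)
buggy_overflow = count_overflowing_trials(trials, add_scroll=True)
```

Only the scrolled trial overflows, and only under the buggy formula, which is the same shape as the 842-of-2,776 result.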

Fix

  • notebooks-v2/data_loader.py — module docstring rewritten (FPOGY is page-space, full audit history); load_fixations drops clamp_y; assign_fixation_to_position(page_y, tops, n_results) new 3-arg signature; gaze_cursor_distance scroll-free (both inputs page-space); screen_y_to_page_y / page_y_to_screen_y renamed to viewport_y_to_page_y / page_y_to_viewport_y; classify_fixations no longer clamps or adds scroll.
  • notebooks-v2/test_coordinate_invariants.py — rewritten. New invariants enforce the correct page-space contract and fail loudly on any regression. Corpus-wide check: 234,333 / 234,339 fixations (100.00%) land within page bounds; 2,889 / 2,889 clicks (100.00%); 842 / 2,776 trials would have overflowed doc_h under the old fix.y + scroll formula.
  • Notebook callers fixed: NB07c, NB12, NB15, NB18_learning_curve, NB19, NB22 — removed + scroll patterns and updated assign_fixation_to_position calls to the new signature. 17 substitutions across 5 notebooks plus the NB07c / NB12 structural rewrites.
  • Script callers fixed: compute_butterworth_lfhf.py, compute_ripa2.py, compute_encoding_vs_retrieval.py, generate_explainer_heatmaps.py.
  • Downstream propagation: pupil-lfhf/validation/adserp_loader.py forked loader patched in-place on main (not feat/ripa2-comparison, which Duchowski requested be kept isolated). compute_butterworth_lfhf.py and compute_ripa2.py in pupil-lfhf updated and re-run. validation/README.md and CLAUDE.md noted.
  • approach-retreat: README.md, docs/one-pager.md, docs/references/arapakis-leiva-2016.md, CLAUDE.md, and the 2026-04-10 historical memo (appended with a 2026-04-12 update section) — updated to post-fix values.
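
A minimal sketch of the page-space contract after the fix. The conversion helper names match the renamed functions listed above, but their argument lists and bodies here are assumptions; `assign_to_position_sketch` is a hypothetical stand-in that only mirrors the three-argument shape of the new assign_fixation_to_position signature, and it assumes `tops` is sorted ascending and that each band runs to the next top.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def viewport_y_to_page_y(viewport_y: float, scroll_y: float) -> float:
    """Viewport pixels -> page pixels: add the scroll offset exactly once."""
    return viewport_y + scroll_y


def page_y_to_viewport_y(page_y: float, scroll_y: float) -> float:
    """Page pixels -> viewport pixels: subtract the scroll offset."""
    return page_y - scroll_y


def assign_to_position_sketch(
    page_y: float, tops: Sequence[float], n_results: int
) -> Optional[int]:
    """Index of the last result whose top edge is at or above page_y.

    Takes page-space input, so no scroll term appears here.
    Returns None when page_y is above the first result.
    """
    idx = bisect_right(list(tops[:n_results]), page_y) - 1
    return idx if idx >= 0 else None


tops = [100.0, 250.0, 400.0]
fix_page_y = 260.0   # FPOGY is already page-space
scroll_y = 150.0

correct_pos = assign_to_position_sketch(fix_page_y, tops, 3)
double_counted_pos = assign_to_position_sketch(fix_page_y + scroll_y, tops, 3)
```

In this toy case the double-counted value (410) lands one band deeper than the true position, which is the direction of error the audit describes.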

NB14 — the headline moved dramatically

K Pre-fix (2026-04-10) Post-fix (2026-04-12)
K1 Trials with usable LF/HF 2,719 2,719 (unchanged)
K2 Position-segments 6,874 6,112
K3 Pos × median LF/HF ρ = −0.618, p = 0.0426 (borderline) ρ = −0.927, p < 0.0001
K4 Positions 1–10 only ρ = −0.491, p = 0.150 (ns) ρ = −0.903, p = 0.0003 (sig)
K5 Within-trial (≥3 pos) N=1,167, mean ρ = −0.105, 56.6% neg N=1,025, mean ρ = −0.152, median ρ = −0.400, 61.0% neg
K6 Clicked vs non-clicked LF/HF 22.24 (N=1,110) vs 19.01 (N=5,472); p = 1.30 × 10⁻⁴ 22.40 (N=1,463) vs 19.27 (N=4,636); p < 10⁻⁸
K9 Steep vs plateau (MW raw) p = 4.1 × 10⁻²² p = 3.2 × 10⁻²³
K10 Steep phase Spearman (pos 0–3) ρ = −1.000 (perfect monotone)
K11 Plateau phase (pos 4–10) ρ = −0.393, p = 0.383 (ns) ρ = −0.714, p = 0.071 (marginal)

The 2026-04-10 note in NB14_BODY that said "K3 unchanged by the 2026-04-09 audit because it uses fixation position, not click_pos" is now superseded — the fixation-side bug does touch K3. The claim was right for the 2026-04-09 scope, wrong for the 2026-04-12 scope.

NB21 — M3 LOSO AUC moved too

| K | Pre-fix (2026-04-10) | Post-fix (2026-04-12) |
|---|---|---|
| K1 Records / participants / click rate | 15,397 / 47 / 14.4% (2,214 clicks) | 13,419 / 47 / 16.6% (2,228 clicks) |
| K3 M3 LOSO AUC | 0.792 ± 0.062 | 0.859 ± 0.044 (+0.067) |
| K4 M4 (approach only) AUC | 0.792 ± 0.061 | 0.861 ± 0.043 |
| K7 M3 LOSO AP | 0.491 | 0.611 (+0.120) |
| K9 Per-participant LOSO M3 AUC | median 0.798, IQR [0.759, 0.831], min 0.589 | median 0.860, IQR [0.827, 0.901], min 0.745 |
| K12 Brier score | 0.1781 | 0.1526 |
| K15 Evaluated-rejected (classifier) | 344 (2.2%) | 974 (7.3%) |
| K21 position coefficient | −0.380 | −0.130 (same direction, weaker) |
| K27 direction_changes coefficient | −0.005 | +0.061 |

NB22 — four-class taxonomy strengthened

| K | Pre-fix | Post-fix |
|---|---|---|
| K2 Deferred N | 1,178 (7.7%) | 1,916 (14.3%) |
| K3 Evaluated-rejected N | 278 (1.8%) | 439 (3.3%) |
| K4 Not approached N | 11,727 (76.2%) | 8,836 (65.8%) |
| K5 Retreat distance (def vs rej) | 191.3 vs 96.4 px, p = 1.9 × 10⁻¹¹ | 234.5 vs 90.8 px, p = 1.76 × 10⁻³⁸ |
| K6 Gaze dwell (def vs rej) | 3,842 vs 2,018 ms, p = 3.7 × 10⁻²⁶ | 4,137 vs 1,612 ms, p = 9.76 × 10⁻⁷⁰ |
| K11 M3 LOSO AUC | 0.792 ± 0.062 | 0.859 ± 0.044 |

NB18 — RIPA2 vs LF/HF

| K | Pre-fix | Post-fix |
|---|---|---|
| K5 LF/HF × position | ρ = −0.618 | ρ = −0.927 |
| K6 RIPA2 × position | ρ = −0.827 | ρ = −0.909 |
| K15 Will-regress one-sided p | 0.0022 | 0.0106 (weaker but still sig) |
| K16 First-pass dwell p | 4.1 × 10⁻²⁴ | 8.1 × 10⁻³² (stronger) |

Headline interpretation

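No headline effect flipped sign or changed direction, and nearly every effect got stronger. The exceptions are minor: NB18 K15 (will-regress) weakened from p = 0.0022 to 0.0106 but remains significant, NB21 K21 (position coefficient) weakened from −0.380 to −0.130 in the same direction, and the near-zero NB21 K27 (direction_changes) coefficient moved from −0.005 to +0.061. The pre-fix scroll double-count was injecting noise in the position direction, masking the true signal. The framework-compilation story (steep decline early, plateau later) is preserved and sharpened. The deferred-vs-rejected motor signature dissociation (the methods paper's central empirical claim) is now on dramatically firmer statistical ground (retreat-distance p went from 10⁻¹¹ to 10⁻³⁸; gaze-dwell p from 10⁻²⁶ to 10⁻⁷⁰).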

K3 moving from ρ = −0.618, p = 0.0426 (borderline at α = 0.05) to ρ = −0.927, p < 0.0001 is the biggest single win. K4 (positions 1–10 only) flipping from non-significant (ρ = −0.491, p = 0.150) to highly significant (ρ = −0.903, p = 0.0003) is the second biggest — pre-fix the 1–10 subset could not be cited; post-fix it's a robust effect.

Propagation

All Key Claims blocks regenerated via notebooks-v2/update_key_claims.py (VERIFIED date bumped to 2026-04-12). docs/findings.md and docs/findings-approach-retreat.md refreshed. methods paper draft (docs/drafts/paper-output/paper.md) and model-analysis sidecar refreshed. Task-model paper, OSEC explainer, Duchowski correspondence drafts, publication roadmap, pupil paper brief, priming null result doc, Shi 2025 lit note — all substantive stale values replaced. Cross-repo: attentional-foraging / pupil-lfhf / approach-retreat are consistent. science-agent notebook-audit across all three repos returns zero substantive hits (remaining warnings are false positives on AttCur-dataset tables and historical audit memos).

Where the pre-fix state is preserved

Snapshot at docs/drafts/coord_fix_snapshot_20260412/:

  • key_claims_before.json — extracted K-ID tables as of 2026-04-12 08:51 (just before the re-runs)
  • butterworth-lfhf-by-position.json, ripa2-by-position.json, cursor-approach-features.json, cursor-approach-features-typed.json, encoding-vs-retrieval.json — pre-fix copies
  • notebook-key-claims.md — pre-fix aggregate
  • post_fix_stdout_{nb}.txt — dumped cell outputs per notebook
  • cell_output_diff.md — naive diff (superseded by this entry)
  • git_state.txt — HEAD sha + working tree at snapshot time

Historical audit memos that describe the 2026-04-09 state (attentional-foraging/CHANGELOG.md entry below, approach-retreat/docs/drafts/2026-04-10-coord-audit-update.md, docs/findings-approach-retreat.md "Refreshed" banner) have been left in place as the historical record of what was known at those dates — the 2026-04-12 entries build on them rather than replacing them.

Unreleased — 2026-04-10

pupil paper infrastructure

  • Key Claims expanded to 11 notebooks (~145 canonical rows). New: NB05 (LHIPA, K1–K15), NB12 (regression precision null, K1–K14), NB18 (RIPA2 vs LF/HF, K1–K17).
  • NB14 piecewise gradient analysis (K9–K15). Resolves K3's borderline p = 0.043:
    • Steep phase (pos 0–3): Mann–Whitney p = 4.1 × 10⁻²², medians 30.0 → 16.0
    • Plateau phase (pos 4–10): Spearman ns — flat, as predicted by framework compilation
    • Within-trial gradient strengthens with evaluation depth: 79.1% negative at ≥7 positions (K15)
  • findings.md v11: corrected 8 stale values (NB13, NB11, NB14), added Key Claims [NB__:K__] references throughout.
  • NB14:K5 inclusion criterion documented: ≥3 valid LF/HF segments at positions 0–10 (Spearman with N=2 is degenerate).
  • pupil-lfhf validation pipeline: self-contained AdSERP analysis (adserp_loader.py, validate_adserp.py) with coordinate-audited click_pos. All values match Key Claims exactly.

2026-04-09

Coordinate-space audit: scroll double-counting bug in click position

The bug. scripts/compute_butterworth_lfhf.py:147 and scripts/compute_ripa2.py:193 derived each trial's click_pos by calling assign_fixation_to_position(last_click[2], click_scroll, …). That function is designed for gaze — it adds scroll_y to convert screen-space FPOGY into page-space. But clicks[-1][2] comes from evtrack ypos, which is already page-space (verified empirically: p004-b2-t3 has cursor Y up to 1,902 px while the browser window is only 1,137 px tall). Adding scroll double-counted it, pushing clicks on scrolled trials to deeper bands than the user actually clicked.

The same pattern was cargo-culted into eleven other notebooks (NB01, NB03, NB05, NB06, NB07b, NB10, NB12, NB15, NB18-learning_curve, NB23, NB24) and one additional script (forward_regressive_tolerance_sweep.py). The root cause is that half the notebooks reimplement their own mini-loader in cell 2 instead of importing data_loader.py, each with its own implicit coordinate-space assumption.

Impact, corpus-wide (see notebooks-v2/test_coordinate_invariants.py Invariant 9):

Correct formula Buggy formula
Clicks landing in their reported band 2,764 / 2,764 1,174 / 2,764 (57.5 % mis-placed)
Mis-placed clicks on scrolled trials 0 1,590 / 2,266
No-scroll trials (sanity bar) 0 disagreements

The buggy formula also produced physically impossible click_pos values (up to 15, for 10-result SERPs) in 239 trials of the old butterworth-lfhf-by-position.json.

NB14 Key Claims — before / after the fix:

| Claim | Before | After | Notes |
|---|---|---|---|
| K1 (trials) | 2,719 | 2,719 | |
| K2 (position segments) | 6,874 | 6,874 | |
| K3 (position × median LF/HF) | ρ = −0.618, p = 0.0426 | ρ = −0.618, p = 0.0426 | Exact; uses fixation position, not click_pos |
| K4 (positions 1–10) | ρ = −0.491, p = 0.150 | ρ = −0.491, p = 0.150 | |
| K5 (within-trial) | N = 1,167, median ρ = −0.200 | N = 1,167, median ρ = −0.200 | |
| K6 (clicked vs non-clicked LF/HF) | 22.86 (N = 1,145) vs 18.97 (N = 5,437); p ≈ 0 | 22.24 (N = 1,110) vs 19.01 (N = 5,472); U = 3,257,823, p = 1.30 × 10⁻⁴ | Direction and significance preserved |
| K7 (LF/HF × LHIPA) | ρ = −0.122, p = 9.29 × 10⁻¹⁰, N = 2,492 | unchanged | |
| K8 (position medians) | pos 0: 29.98 → pos 1: 21.20 → … | unchanged | uses fixation position, not click_pos |

The pupil paper central claim (K3) is unaffected. The position-level correlation, within-trial decomposition, and LHIPA cross-index validation all use fixation position (gaze → page-space, which is the coordinate-correct direction). Only click_pos-dependent rows moved.

The fix.

  1. notebooks-v2/data_loader.py — documented coordinate-space conventions in the module docstring, tightened assign_fixation_to_position to name its parameter screen_fix_y and warn that cursor/click Ys must not be passed. Added canonical helpers: get_click_page_xy, click_to_position, cursor_to_position, screen_y_to_page_y, page_y_to_screen_y, gaze_cursor_distance, interpolate_cursor_at.
  2. notebooks-v2/test_coordinate_invariants.py — nine-section regression test locking in the conventions. Corpus-wide Invariant 9 produces the 1,590-trial headline number above.
  3. scripts/compute_butterworth_lfhf.py — replaced the buggy assign_fixation_to_position call with click_to_position(clicks, tops, n_results). Regenerated butterworth-lfhf-by-position.json.
  4. notebooks-v2/update_key_claims.py — NB14 K6 row updated; aggregate docs/notebook-key-claims.md refreshed.

NB15 cursor-approach fix — the feature-generating hero notebook. Two bug sites: (1) compute_approach_features double-counted scroll on mouse_page_y, corrupting min_dist, mean_dist, final_dist, dwell_in_proximity_ms, and was_clicked; (2) click_y_page = clicks[0][2] + click_scroll corrupted click-position assignment. Fix: import click_to_position and gaze_cursor_distance from data_loader, replace both sites. Regenerated cursor-approach-features.json via jupyter nbconvert --execute; regenerated cursor-approach-features-typed.json via scripts/add_etype_to_features.py. Pre-fix JSONs preserved with .prefix-bug.json suffix.

Feature-level diff (NB15):

| Metric | Before | After | Δ |
|---|---|---|---|
| Clicked records | 1,981 | 2,214 | +233 (+11.8 %); clicks correctly re-attributed to their real positions |
| Click rate | 12.87 % | 14.38 % | +1.5 pp |
| Median gaze-cursor distance | 256.5 px | 354.7 px | +98 px |
| "Almost clicked" (<58 px, non-clicked) | 7.98 % | 5.57 % | −30 % |
| Position 3 close-distance rate | 11.49 % | 3.23 % | −72 % |
| Position 5 close-distance rate | 7.36 % | 0.28 % | −96 % |
| Position 9 close-distance rate | 0.45 % | 0.00 % | −100 % |

NB15 §2b's orient-phase observation is preserved at position 0 (27.8 % → 29.0 %, essentially flat — consistent with cursor parked near first result during orient). The deep-position approach signal at positions 3–9 was almost entirely scroll-bug artifact.

NB21 Key Claims — before / after:

| Claim | Before | After | Notes |
|---|---|---|---|
| K1 click rate | 12.9 % (1,981) | 14.4 % (2,214) | 233 re-attributed clicks |
| K3 M3 LOSO AUC | 0.827 ± 0.047 | 0.792 ± 0.062 | −0.035; direction preserved |
| K4 M4 (approach only) AUC | 0.821 ± 0.048 | 0.792 ± 0.061 | M3 = M4 to three sig figs; position+dwell add no information beyond approach features |
| K5 M2 (pos+dwell) AUC | 0.746 ± 0.069 | 0.707 ± 0.081 | −0.039 |
| K6 M1 (pos only) AUC | 0.592 ± 0.083 | 0.670 ± 0.085 | +0.078; position now a stronger predictor with clicks correctly attributed |
| K12 Brier score | 0.1615 | 0.1781 | calibration slightly worse (consistent with dropped AUC) |
| K15 Evaluated-rejected (4-class) | 994 (6.5 %) | 344 (2.2 %) | largest shift; pre-fix "rejected" was mostly scroll noise at deep positions |
| K21 position coefficient | +0.21 (→ click) | −0.380 (→ skip) | SIGN FLIP; rank effect now in the correct direction |
| K27 direction_changes | +0.20 (→ click) | ≈0 (neutral) | feature was largely scroll artifact |

The −0.035 AUC drop is a real loss of predictive power — the pre-fix 0.827 was partly driven by scroll-leak features. K27 (direction_changes, pre-fix +0.20 → click) collapses to ≈0 post-fix, and the deep-position approach artifacts that populated the 4-class "Evaluated-rejected" set (994 → 344) were not informative in the first place.

Model-level results are preserved: M3 > M2 > M1 (0.792 > 0.707 > 0.670), M3 = M4 to three sig figs (approach features carry the full signal), and all 47 participants remain above chance (min 0.589). Feature-level coefficient signs are NOT all preserved: K21 (position) flipped +0.21 → −0.380 — the post-fix sign is the one the SERP rank-effect literature predicts. The pre-fix "11×" lift claim and the "14 % almost clicked" figure in docs/findings.md §10 are overstatements; the corrected taxonomy lives in NB21:K13–K16.

NB11.5 (chattiness) — replication updated:

| Claim | Before | After | Notes |
|---|---|---|---|
| K9 Low events/s tercile AUC | 0.826 ± 0.061 (n = 15) | 0.803 ± 0.052 (n = 15) | median events/s: 9.4 → 9.5 |
| K10 Mid tercile AUC | 0.817 ± 0.041 (n = 16) | 0.780 ± 0.065 (n = 16) | median events/s: 14.7 → 14.7 |
| K11 High tercile AUC | 0.838 ± 0.034 (n = 16) | 0.793 ± 0.064 (n = 16) | median events/s: 32.2 → 28.8 |
| K12 pooled replication of NB21 | 0.827 | 0.792 | tracks NB21:K3 exactly |
| K13–K16 chattiness × AUC Spearmans | +0.04 to +0.14, all ns | −0.11 to +0.00, all ns | direction shifted toward zero; no row crosses significance |

The "robust across chattiness terciles" framing holds at the significance level (K13–K16 are all still ns with p > 0.4) but the tercile AUCs themselves dropped 0.02–0.05 uniformly with the NB21 re-run. Paper §4.3 robustness claim needs both the new tercile values AND a narrower effect-size range if the prose described it as "flat."

Remaining notebooks and scripts patched:

NB01, 03 (×2 sites), 05, 06, 07b, 10, 12, 18-learning_curve, 24 — batch-patched via notebooks-v2/_apply_coord_fixes.py. None of these have Key Claims blocks yet, so re-execution has not been triggered; they will pick up the fix on next run. scripts/compute_ripa2.py and scripts/forward_regressive_tolerance_sweep.py also patched; their JSON outputs will be refreshed the next time they run.

NB23 (rank_effects) is a separate case: its local click_positions derivation (used for panel 1, click share by position) has been patched in place, but the notebook has not been re-executed. Panels 4–5 (butterworth LF/HF + LHIPA by click position) already consume the fixed butterworth-lfhf-by-position.json, so they reflect the post-fix click_pos from that feeder. NB23 does not yet have a Key Claims block even though it's the rank-effects hero chart cited in README and CHANGELOG v9 — promoting it to Tier A is tracked separately.

Still pending:

  • Regenerate ripa2 output. Done (2026-04-11): compute_ripa2.py -o AdSERP/data/ripa2-by-position.json (2,719 trials, ρ = −0.827 positional gradient confirmed). NB18 re-execution is deferred; it reads this JSON and will pick up the new values on its next run.
  • Re-execute NB23. Done (2026-04-09): NB23 uses click_to_position() from data_loader (coordinate-safe); all 9 code cells executed with correct output. K1: ρ = −0.973 on 2,764 trials.
  • Phase 3 structural migration. Done (2026-04-11): All dangerous coordinate patterns eliminated. NB00, NB04, NB19 were the last three with inline assign_fixation_to_position(click_y, scroll_y, ...) or click_page_y = cy + interpolate_scroll(...). Replaced with click_to_position(clicks, tops, n_res). Zero dangerous patterns remain across all 30 notebooks (verified via regex scan).
  • docs/findings.md. Already current (v11, 2026-04-10): §10 and §10b updated with post-fix values (14.4% click rate, N = 344, correct NB22 four-class Ns, [NB##:K##] refs throughout). docs/findings-approach-retreat.md is intentionally frozen with a SUPERSEDED banner; it is a journey doc, not canonical.
  • docs/drafts/ grep pass. Done (2026-04-11): model-analysis.html given SUPERSEDED banner with before/after table. model-analysis.md line 270 fixed (0.821→0.792). task-model-paper.md line 179 fixed (994→344). paper.md references to 0.821 are all Bruckner ACD (correct, different dataset). Remaining stale values in .html left under the SUPERSEDED banner rather than surgically edited.
  • Approach-retreat repo. Done (2026-04-11): README fixed (NB24 arc ratios, 17× typo, discrimination cost values). CLAUDE.md added documenting upstream dependency. See approach-retreat commits 63d861a and 257cd79.

Reference data: pre-fix JSONs preserved for reproducibility:

  • AdSERP/data/butterworth-lfhf-by-position.prefix-bug.json
  • AdSERP/data/cursor-approach-features.prefix-bug.json
  • AdSERP/data/cursor-approach-features-typed.prefix-bug.json

Regression lock. notebooks-v2/test_coordinate_invariants.py (nine sections, passes in a few seconds) now encodes the gaze-is-screen-space, cursor-is-page-space convention as an executable contract. Any future change to data_loader.py, any Tier B producer script, or any Tier A notebook's data path must keep this test green. The corpus-wide Invariant 9 is the headline: all 2,764 clicks must fall within their reported band under the correct formula, and the buggy formula must still misplace 1,590 scrolled trials (so we know the test hasn't silently lost its reference comparison).
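
The two-sided shape of Invariant 9 (the correct formula must pass, and the buggy formula must still fail by a known amount) can be sketched as below. This is not the test file's code: the function names, the `(click_page_y, scroll_y, reported_band)` tuple layout, and the toy data are assumptions for illustration.

```python
from bisect import bisect_right


def band_of(page_y, tops):
    """Index of the last band whose top edge is at or above page_y."""
    return bisect_right(tops, page_y) - 1


def correct_band(click_y, scroll_y, tops):
    # Click Y is already page-space, so scroll is ignored.
    return band_of(click_y, tops)


def buggy_band(click_y, scroll_y, tops):
    # Reference copy of the old bug: scroll is added a second time.
    return band_of(click_y + scroll_y, tops)


def check_click_invariant(clicks, tops, expected_buggy_misplaced):
    """clicks: iterable of (click_page_y, scroll_y, reported_band)."""
    correct_miss = sum(1 for y, s, band in clicks if correct_band(y, s, tops) != band)
    buggy_miss = sum(1 for y, s, band in clicks if buggy_band(y, s, tops) != band)
    # Positive control: the correct formula places every click in its band.
    assert correct_miss == 0, f"{correct_miss} clicks misplaced by the correct formula"
    # Negative control: the buggy formula must still disagree by the known count,
    # otherwise the test has lost its reference comparison.
    assert buggy_miss == expected_buggy_misplaced, f"buggy formula misplaced {buggy_miss}"
    return correct_miss, buggy_miss


tops = [100, 300, 500, 700]
clicks = [(150, 0, 0), (350, 0, 1), (550, 400, 2)]  # last click is on a scrolled trial
result = check_click_invariant(clicks, tops, expected_buggy_misplaced=1)
```

Keeping the buggy formula in the test as a negative control is the design choice that matters: a test that only checks the correct path could go green for the wrong reason.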

v9 — 2026-04-07

LHIPA reinterpretation: boundary step, not position gradient

Trial-level LHIPA by click position is flat across positions 0–8 (range: 0.0385–0.0392, delta = 0.0008), then steps down at positions 9–10 (0.0376–0.0380). The previously reported ρ = −0.87 is driven almost entirely by the boundary step, not a gradual decline. Excluding positions 9–10: ρ = −0.78 but delta is within noise.

Correction: Prior claims that "LHIPA decreases monotonically with foraging depth" (README §Behavioral signals, findings.md, lit-review-scroll-regressions.md) overstated the position effect. LHIPA tracks the boundary decision cost (the same phenomenon as the ski-jump click distribution uptick) not a per-position scanning cost. Butterworth LF/HF (NB 14) remains the valid per-position cognitive load measure, and it shows framework compilation (steep drop 0–3, plateau after).

Unified rank effects notebook (NB 23)

New notebook 23_rank_effects.ipynb consolidates all by-position effects:

  • Click share, fixation count, dwell time, Butterworth LF/HF, LHIPA — all on shared x-axis
  • Forward-pass vs regression dwell decomposition (stacked bar): regression share peaks at positions 2–3 (~30%), drops to ~10% at position 9
  • Normalized dissociation plot: time and cognitive load both decline, but load drops faster (framework compilation)
  • Publication-quality hero chart with IQR bands

New files: notebooks-v2/23_rank_effects.ipynb, assets/rank-effects-dissociation.png, assets/temporal-spectrum.png

Updated: README.md (temporal spectrum graphic, rank effects hero chart, LHIPA reframing), notebooks-v2/README.md (NB 23 entry)

Methodological patterns identified (science audit)

Three systemic issues affecting how results were reported throughout the project:

1. Position-aggregate correlations reported as if trial-level. The three headline rhos (LHIPA ρ = −0.903, Butterworth ρ = −0.618, forward dwell ratio ρ = +0.82) are all computed on N = 9–11 position-level aggregates (means or medians), not individual trials. Citing "N = 2,719 trials" alongside a correlation computed on 11 points creates a false impression of statistical power. Trial-level correlations are much weaker (e.g., LHIPA ρ = −0.088). Every position-aggregate statistic now states the actual N of the aggregation.

2. Survivor bias in per-position analyses. Not all trials reach every position (pos 0: 2,742; pos 9: 640). Position means at later positions come from self-selected thorough scanners who scrolled the full page. This inflates apparent dwell at later positions and may bias Butterworth LF/HF medians. Added to methodological-threats.md. This also connects to the F-pattern: Nielsen's aggregate heatmap conflates compiled criteria (real), survey-phase concentration at top (real), and survivor selection (artifact).

3. Mean vs median LHIPA sensitivity. The LHIPA "gradient" by click position appears in means (right-skewed distribution pulls the mean up at early positions) but disappears in medians (flat 0–8). The gradient in the mean is partly a confound: high-LHIPA (low-load) trials tend to be easy trials where the user clicked early. The median is the robust estimator and reveals the boundary-step pattern.
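
The first pattern (position-aggregate versus trial-level correlation) is easy to reproduce on toy numbers. The sketch below is a standard-library Spearman with average ranks for ties; the function names and the nine sample records are illustrative assumptions, not corpus values.

```python
from collections import defaultdict
from math import sqrt
from statistics import median


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def aggregate_rho(records):
    """records: (position, value) pairs -> (rho over position medians, N of medians)."""
    by_pos = defaultdict(list)
    for pos, value in records:
        by_pos[pos].append(value)
    positions = sorted(by_pos)
    medians = [median(by_pos[p]) for p in positions]
    return spearman_rho(positions, medians), len(positions)


# Toy data only: 9 "trials" spread over 3 positions.
records = [(0, 30), (0, 10), (0, 50), (1, 20), (1, 40), (1, 5), (2, 15), (2, 35), (2, 2)]
rho_agg, n_agg = aggregate_rho(records)
rho_trial = spearman_rho([p for p, _ in records], [v for _, v in records])
print(f"aggregate rho={rho_agg:.3f} (N={n_agg}); trial-level rho={rho_trial:.3f} (N={len(records)})")
```

Here the position medians are perfectly monotone on N = 3 points while the trial-level correlation on N = 9 is far weaker, which is why the N of the aggregation has to be reported next to the rho.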

Corrected notebooks: NB05 (LHIPA: figure title, summary, key measures table), NB06 (orientation/evaluation: "Working Memory Accumulation" → "Evaluation Effort by Position," removed WM ramp narrative, corrected LHIPA claims, "dwell" → "gaze dwell ratio").

v8 — 2026-04-04

Per-position cognitive load: working memory hypothesis reversed

Duchowski (2026, PACM CGIT) recommended Butterworth IIR over wavelet LHIPA for short-window cognitive load. Minimum windows: FFT 10s, DWT 7.5s, Butterworth 1s. Implemented per-position LF/HF ratio for all 2,719 trials.

The working memory hypothesis was wrong. LF/HF decreases with position (ρ = −0.618, p = 0.04). Cognitive load peaks at position 0, drops steeply through 0–3, plateaus through 4–10. This contradicts the §3a interpretation that forward-only dwell increase (ρ = +0.82) reflects growing working memory load.

Correction: The prior interpretation in §3a ("cognitive load increases with foraging depth because the candidate set in working memory grows") has been revised. The dissociation between increasing dwell time and decreasing cognitive effort indicates evaluation becomes routinized through framework compilation, not overloaded through working memory accumulation. The Shi et al. (2025) lit note connection claiming per-result LHIPA showed increasing load was also corrected — wavelet LHIPA at ~2s granularity was below Duchowski's stated 7.5s minimum, making that trend unreliable.

New files: scripts/compute_butterworth_lfhf.py, notebooks-v2/14_butterworth_cognitive_load.ipynb, docs/lit-notes/duchowski2026-realtime-pupil-lfhf.md, AdSERP/data/butterworth-lfhf-by-position.json

Updated: data_loader.py (added load_pupil_trial(), remove_blinks()), references.bib (added duchowski2026realtime), findings.md (§3b-iv, §3a correction), README.md (notebook 14, key insight)

Thumbnail screenshot fix

The PNG screenshot loop in build-gh-pages.js had no error handling and crashed partway through, producing only 2 of 10 thumbnails. Added try/catch.

v7 — 2026-04-03

(See git log for v7 changes — survey phase, ski-jump decomposition, forward/regression split, README rewrite, arxiv stub)

v6 — 2026-04-02

Semantic embeddings tested and null

Computed sentence-level cosine similarity (mxbai-embed-large) between each result's snippet embedding and the centroid of all prior result embeddings. The result is null within position, the same as for bag-of-words. The priming hypothesis is now tested at three granularities (bag-of-words, semantic embeddings, within-position controls) and null at all of them.
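The similarity measure described above can be sketched as follows. This assumes the snippet embeddings are already computed and supplied as plain lists of floats in SERP order; the function names are hypothetical, and the embedding model itself is not called here.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; None if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return None
    return dot / (na * nb)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def prior_centroid_similarity(embeddings):
    """For each result, similarity to the centroid of all results above it.

    Position 0 has no prior results, so its entry is None.
    """
    sims = [None]
    for i in range(1, len(embeddings)):
        sims.append(cosine(embeddings[i], centroid(embeddings[:i])))
    return sims
```

For example, `prior_centroid_similarity([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])` yields `None` for the first result, a similarity of 1 for the second (identical to its only predecessor), and 0 for the third (orthogonal to the prior centroid).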

§9: Where relaxing the serial evaluation assumption helps

New findings section analyzing when non-serial SERP models add value. Acknowledges forced-choice inflation of regression rates, notes that at-scale regression prevalence (click_rank < max_scroll_depth) is unmeasured. Identifies three areas where complexity helps: position bias estimation, stop/regress/paginate decision, re-finding task metrics.
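The at-scale prevalence measure named above (click_rank < max_scroll_depth) could be computed from click logs along these lines. This is a sketch only: the per-trial dict layout is hypothetical, and it assumes both fields are expressed as result ranks on the same scale.

```python
def regression_prevalence(trials):
    """Fraction of trials whose clicked rank lies above the deepest rank scrolled to.

    Trials missing either field (for example, no click) are excluded.
    Returns None when no trial is eligible.
    """
    eligible = [
        t for t in trials
        if t.get("click_rank") is not None and t.get("max_scroll_depth") is not None
    ]
    if not eligible:
        return None
    regressed = sum(1 for t in eligible if t["click_rank"] < t["max_scroll_depth"])
    return regressed / len(eligible)
```

A trial where the user scrolled to rank 7 and then clicked rank 2 counts as a regression; a click at the deepest rank reached does not.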

Orientation time: 194ms median

Page orientation (time from page load to first fixation on any result) is 194ms median across all groups — consistent with a well-memorized SERP layout. Previously reported as ~1-3s from a regression intercept (a different metric).

Lit review and references

11 new BibTeX entries. Literature review on scroll regressions identifies 5 novelty claims. Key finding from the review: nobody has published the at-scale prevalence of click_rank < max_scroll_depth despite every search engine having this data.

v5 — 2026-04-02

Bug fix: FPOGY out-of-bounds clamp

The Gazepoint GP3 HD reports gaze Y coordinates that exceed screen boundaries. 24.5% of fixations have FPOGY > screen_height (1024px); the 95th percentile is 1830px. These out-of-bounds samples were added to scroll offset to compute page-space Y, attributing fixations to SERP positions below the visible viewport.

Impact: Position 9 dwell ratios were inflated by 3–50× per trial (mean 2.9×, 89% of trials >1.0). The aggregate dwell ratio for position 9 was 1.25 and is now corrected to 0.79.

Fix: Clamp FPOGY to [0, screen_height] before computing page_y = fy + scroll_offset in compute_fixation_per_result(). Applied in serp_priming.ipynb (Cells 13, 16) and fixation_coverage.ipynb (Cell 3).

Note for AdSERP users: If you are working with the AdSERP fixation data and mapping gaze coordinates to page-space positions, always clamp or filter FPOGY to screen bounds first. The eye tracker does not constrain gaze reports to the application window.
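A minimal sketch of the clamp and the filter alternative, assuming FPOGY has already been converted to screen pixels and using the 1024px screen height quoted above. The helper names are hypothetical; the actual change lives in compute_fixation_per_result().

```python
SCREEN_HEIGHT_PX = 1024  # screen height quoted in this changelog entry


def clamp_fpogy(fy, screen_height=SCREEN_HEIGHT_PX):
    """Clamp a gaze Y sample (pixels) to [0, screen_height]."""
    return max(0.0, min(float(screen_height), fy))


def page_y(fy, scroll_offset, screen_height=SCREEN_HEIGHT_PX):
    """Page-space Y: clamp first, then add the scroll offset."""
    return clamp_fpogy(fy, screen_height) + scroll_offset


def in_bounds(fy, screen_height=SCREEN_HEIGHT_PX):
    """Filter alternative: keep only samples that fall on the screen."""
    return 0 <= fy <= screen_height
```

Clamping keeps every fixation but pins out-of-bounds samples to the viewport edge; filtering drops them instead. Which one is appropriate depends on whether the analysis can tolerate losing those fixations.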

Other v5 changes:

  • Forward-only shape test ρ strengthened from +0.73 to +0.82 (positions 0-8)
  • Dwell table in README and findings updated with corrected values

New: Scroll kinematics analysis (scroll_kinematics.ipynb)

Tests the viewport mechanics confound hypothesis: does ballistic backward scrolling explain the apparent "priming during regressions" pattern?

Results:

  • Backward scroll velocity > forward: median 915 vs 784 px/s, peak 1852 vs 1111 px/s
  • Velocity profile is ballistic: ρ = 0.867 between distance-from-target and velocity
  • 87.3% of regression targets are positions 0-4 (median: position 2)
  • Regression velocity mediates the dwell delta: ρ = -0.762 (p = 0.017) across positions

Positions 6-8 are ballistic transit zones (high velocity, short viewport time, suppressed fixations). The "priming during regressions" pattern is a viewport mechanics artifact.

Prose cleanup: unsupported priming claims

Corrected language in README.md, TODO.md, findings.md, and adserp-key-claims.md that framed the regression-trial overlap correlation (r = -0.033) as evidence that "priming operates in re-evaluation." The signal is triply confounded:

  1. Position-overlap covariation — within-position controls null (v3)
  2. Repetition/recognition — revisiting already-read content produces shorter dwell (v4)
  3. Ballistic scroll kinematics — high-velocity transit biases viewport time and fixation count (v5)

v4 — 2026-04-01

Bug fix: viewport time computation

The prior compute_viewport_time only counted time between scroll events, so pre-scroll periods (page load → first scroll) and post-scroll periods were dropped. With visible time undercounted, position 0 dwell ratios exceeded 1.0 (up to 73×). Fixed by covering the full trial window. Position 0 dwell ratio corrected from 1.35 → 0.28.
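The full-window idea can be sketched as below. This is not the project's compute_viewport_time: the signature, the (timestamp, offset) event format, the initial scroll offset of 0 at page load, and the 1024px viewport height are all assumptions made for illustration.

```python
def viewport_time(band, scroll_events, trial_start, trial_end,
                  viewport_height=1024, initial_offset=0):
    """Total time a result's page-space Y band overlaps the viewport.

    band: (top, bottom) in page-space pixels.
    scroll_events: iterable of (timestamp, scroll_offset) pairs.
    The first segment runs from trial_start to the first scroll event and the
    last segment runs from the last scroll event to trial_end, so pre-scroll
    and post-scroll periods are counted.
    """
    top, bottom = band
    events = sorted((t, off) for t, off in scroll_events
                    if trial_start <= t <= trial_end)
    times = [trial_start] + [t for t, _ in events] + [trial_end]
    offsets = [initial_offset] + [off for _, off in events]
    total = 0.0
    for i, off in enumerate(offsets):
        visible = top < off + viewport_height and bottom > off
        if visible:
            total += times[i + 1] - times[i]
    return total


def gaze_dwell_ratio(fixation_duration, visible_duration):
    """Fixation duration divided by visible duration; None if never visible."""
    if visible_duration <= 0:
        return None
    return fixation_duration / visible_duration
```

With a top result at band (0, 200), scroll events at 5s (offset 600) and 8s (offset 0), and a trial running 0 to 10s, the result is visible for 7s. Counting only the interval between the two scroll events would give 0s, which is the kind of undercount that pushes a dwell ratio above 1.0.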

New: forward-only shape test

Isolating forward-scanning periods, gaze dwell ratio increases with position (ρ = +0.73), opposite the priming prediction. The aggregate priming correlation was entirely driven by regression artifacts.

New: p(fixate | visible) analysis

Forward-only p(fixate) is ~99.8% at every position. Users fixate virtually everything during first-pass scanning, so there is no skip decision for overlap to predict.

Metric rename

"Eval rate" / "attention density" → "gaze dwell ratio" (fixation duration / visible duration).


v3 — 2026-04-01

Within-position controls

Testing high-overlap vs low-overlap at the same rank: null across all metrics (TFT, TFC, mean fixation duration, viewport time). The aggregate priming correlation (r = -0.054) was driven by the position-overlap confound.
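One way to run a same-rank comparison like this is sketched below. The median split on overlap within each position is an assumption, as are the function name and the (position, overlap, metric) row layout; the notebooks may define the high and low groups differently.

```python
from collections import defaultdict
from statistics import mean, median


def within_position_contrast(rows):
    """Mean metric for high-overlap minus low-overlap results, per position.

    rows: iterable of (position, overlap, metric) tuples.
    Within each position, results are split at that position's median overlap.
    Positions where either group is empty are omitted.
    """
    by_pos = defaultdict(list)
    for pos, overlap, metric in rows:
        by_pos[pos].append((overlap, metric))
    contrasts = {}
    for pos, items in sorted(by_pos.items()):
        cut = median(o for o, _ in items)
        high = [m for o, m in items if o > cut]
        low = [m for o, m in items if o <= cut]
        if high and low:
            contrasts[pos] = mean(high) - mean(low)
    return contrasts
```

Because the split is made separately at each rank, a metric that varies with position but not with overlap produces contrasts near zero, which is the null pattern described above.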


v2 — 2026-04-01

Regression-stratified analysis

Aggregate effect concentrated in regression trials (r = -0.033), null in first-pass (r = -0.002). Initially reframed as "priming facilitates re-evaluation" — later shown to be confounded (v3-v5).


v1 — 2026-04-01

Initial analysis

  • Lexical overlap builds rapidly down the SERP (62% by position 9)
  • Aggregate priming correlation: partial r = -0.054 (p = 2.4×10⁻⁹)
  • 69% scroll regression prevalence, mean 2.8 per trial
  • Mouse-gaze convergence depends on click intent
  • Viewport state predicts clicks better than distance (AUC 0.704 vs 0.548)
  • Per-participant variance large (acquisition onset SD = 2.5s)