Branch: bbox-y-coverage-fix. Adds a fifth AOI attribution flavor (typed_gapfill) as a pragmatic post-processing modifier on the typed cascade.
A 2026-05-05 audit during AllSERP descriptive work surfaced a 22.7 % silent contamination of approached & clicked records (391 / 1,723) under typed. The legacy data_loader.click_to_position does Y-band-only assignment with no X check, rolling right-rail dd_right ad clicks (67), page chrome clicks, and inter-result-gap clicks into adjacent organics. The hypothesis that bboxes had a Y-pixel calibration drift was tested and refuted (clicks bias downward, fixations bias upward — opposite directions). The fix is a midpoint-split gap-fill on organic bboxes plus X+Y bbox-aware click attribution and an is_main_axis_click() trial-level filter.
Pragmatic, not principled — DOM-anchored bbox extraction is the principled alternative, deferred as future work. Both typed and typed_gapfill flavors stay queryable side-by-side per the cascade rule (CLAUDE.md).
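As a rough illustration of the midpoint-split idea, the sketch below extends vertically adjacent boxes so that each inter-result gap is divided at its midpoint. It is not the repo's `apply_midpoint_split`: the function names, the dict layout (`y` = top edge, `height`), and the choice to leave overlapping boxes untouched are all assumptions made for the example.

```python
from typing import Dict, List


def midpoint_split_sketch(bboxes: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Return copies of the boxes with each vertical gap split at its midpoint.

    Hypothetical helper: assumes each box is a dict with 'y' (top) and 'height'.
    """
    ordered = sorted((dict(b) for b in bboxes), key=lambda b: b["y"])
    for upper, lower in zip(ordered, ordered[1:]):
        upper_bottom = upper["y"] + upper["height"]
        gap = lower["y"] - upper_bottom
        if gap <= 0:
            continue  # touching or overlapping boxes: nothing to fill
        midpoint = upper_bottom + gap / 2.0
        lower_bottom = lower["y"] + lower["height"]
        upper["height"] = midpoint - upper["y"]   # grow the upper box down
        lower["y"] = midpoint                      # grow the lower box up
        lower["height"] = lower_bottom - midpoint
    return ordered


def has_y_overlap(bboxes: List[Dict[str, float]]) -> bool:
    """True if any two y-adjacent boxes overlap vertically (assumed check)."""
    ordered = sorted(bboxes, key=lambda b: b["y"])
    return any(a["y"] + a["height"] > b["y"] for a, b in zip(ordered, ordered[1:]))
```

With two boxes at y 0–100 and y 120–200, the 20 px gap is split at y = 110, so a click at y = 105 would fall in the upper box and one at y = 115 in the lower box.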
- Producer: `scripts/extract_organic_bboxes.py` adds `--flavor organic_gapfill` (midpoint-split semantics, organic-only; ads pass through unchanged). New helpers `apply_midpoint_split` and `assert_no_y_overlap`. `scripts/apply_gapfill_to_existing.py` provides a no-screenshot path for environments where the AdSERP screenshot volume isn't mounted.
- Typed map + CSV export: `build_typed_aoi_map.py --source organic_gapfill`; `export_aois_by_trial_id.py --attribution typed_gapfill`. New outputs at `data/aoi-typed-gapfill/` and `scripts/output/adserp_aois_by_trial_id_typed_gapfill.csv`.
- `data_loader.py` helpers (notebooks-v2): `load_typed_gapfill_aois`, `typed_gapfill_aoi_bands`, `typed_gapfill_aoi_tops`, `typed_gapfill_aoi_etypes`, `attribute_click_to_typed_gapfill`, `is_main_axis_click`. X+Y bbox-aware attribution prefers strict containment over tolerance, smallest-area on overlap.
- Cursor-approach features: `compute_cursor_approach_features.py --attribution typed_gapfill` writes `AdSERP/data/cursor-approach-features-typed-gapfill.json` (18,218 records vs legacy 19,774; 231 hard-error trials filtered).
- AllSERP descriptives: `scripts/allserp_descriptives.py --flavor typed_gapfill` writes `scripts/output/allserp_descriptives_gapfill/`.
- Audit scripts (cite-ready for AllSERP resource paper): `scripts/audit_unattributed_clicks.py`, `audit_dd_right.py`, `audit_cascade_contamination.py`, `audit_calibration_bias.py`. Each carries regime tag and headline number in the docstring.
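The X+Y attribution rule described above (strict containment first, tolerance second, smallest area on overlap) can be sketched as follows. This is an illustration under assumptions, not the code in `data_loader.py`: the function names and the `x`/`y`/`width`/`height` dict layout are hypothetical.

```python
from typing import Dict, List, Optional


def _contains(box: Dict[str, float], x: float, y: float, pad: float = 0.0) -> bool:
    """Point-in-rectangle test, optionally padded by `pad` pixels on every side."""
    return (box["x"] - pad <= x <= box["x"] + box["width"] + pad
            and box["y"] - pad <= y <= box["y"] + box["height"] + pad)


def attribute_click_xy_sketch(x: float, y: float, aois: List[Dict[str, float]],
                              tolerance_px: float = 0.0) -> Optional[Dict[str, float]]:
    """Hypothetical X+Y attribution: strict hits beat tolerant hits; ties go to the smallest box."""
    strict = [b for b in aois if _contains(b, x, y)]
    candidates = strict or [b for b in aois if _contains(b, x, y, tolerance_px)]
    if not candidates:
        return None  # click is off every AOI
    return min(candidates, key=lambda b: b["width"] * b["height"])
```

Unlike a Y-band-only lookup, a right-rail click at the same height as an organic card is not rolled into that organic, because the X check fails for the organic column.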
| Metric | legacy | gapfill | Δ |
|---|---|---|---|
| AllSERP descriptives total clicks attributed | 2,479 | 2,634 | +155 |
| organic fixated % | 52.7 | 55.6 | +2.9 pp |
| paa fixated % | 32.8 | 40.6 | +7.8 pp |
| `was_clicked=True` records | 2,594 | 2,375 | −219 |
| approached & clicked | 1,723 | 1,562 | −161 |
| organic clicked records | 2,021 | 1,886 | −135 |
| native_ad clicked records | 186 | 137 | −49 |
| paa clicked records | 27 | 31 | +4 |
- NB21 LOSO click prediction: M3 AUC 0.871 → 0.856 (Δ = −0.015); position coefficient strengthens; per-etype AUC ordering preserved (dd_top 0.913, organic 0.852, native_ad 0.833). K-bbox-y-1..12 rows in `docs/notebook-key-claims.md`.
- NB22 four-class taxonomy: full per-etype breakdown under typed_gapfill via `compute_regression_labels.py --attribution typed_gapfill`. Class proportions invariant within ±0.3 pp (clicked 13.0 % preserved; deferred 13.0 → 13.3 %; eval-rejected 2.9 % preserved; not-approached 71.0 → 70.8 %). Honest population shed 219 contaminated `was_clicked=True` records and gained 4 paa records (genuine recovery).
- NB30 etype × viewport: LOPO AUC 0.687 → 0.701; per-etype `max_overlap_frac` interaction Δ widens (dd_top −0.108 → −0.163; native_ad −0.236 → −0.288). The "ads need higher viewport overlap to convert to clicks" dissociation strengthens under the cleaner population.
- AR replay rebuild: `build_replay_trial.py --flavor typed_gapfill` shipped with screenshot fallback. 4 of 6 confirmed-issue trials rebuilt from local cache; remaining 2 + full 147-trial set auto-complete on volume mount.
- NB28 calibration: M4 + vt_bands LOSO AUC = 0.8423 (typed_gapfill) vs 0.842 (legacy absolute). Three-decimal replication: the viewport-band × cursor-retreat discriminator is bbox-attribution-invariant. `viewport_time_calibration.viewport_ms_for_trial` extended with optional `bands=` parameter; `scripts/nb28_typed_gapfill.py` shipped. K-bbox-y-NB28-* rows in `docs/notebook-key-claims.md`.
- DOM-anchored bbox extraction (the principled alternative) is named as future work and not started. It is refuted as a wholesale replacement (re-rendering 2022 SERP HTML in 2026 produces 13–45 px layout drift, `docs/plan-demo-fix.md`) and kept as a future direction for individual element-level geometry rather than full layout.
- `docs/null-findings/2026-05-05-bbox-y-coverage.md`: full writeup with the four-audit synthesis.
- `docs/methodology/attribution-cascade-synthesis.md` §1.06: flavor definition.
- `docs/drafts/allserp-data-tables.md`: resource paper draft updated with audit citations.
Branch: feat/aoi-pipeline-v3-typed. Extends the prior cascade
(feat/aoi-pipeline-v2, organic + organic_hybrid) with a fourth attribution
flavor (typed) in which every SERP card is labelled by joint HTML +
vision typing. The taxonomy:
organic | dd_top | native_ad | dd_right | top_places | knowledge_panel | paa | image_pack | related_searches | other_widget | unknown_widget | chrome
Pipeline:
- Phase 1: `scripts/extract_html_widget_types.py` parses `AdSERP/data/serps/<tid>.html` for all 2,776 trials, identifies card-level DOM units in `#rso` (descending into "Main results" wrappers when present) and `#botstuff` (Related Searches), and types each card by heading text → structural markers → data-attrid → class → fallback. Outputs `data/aoi-html-types/<tid>.json`. Type distribution: organic 22,530 (81.4 %), related_searches 1,811 (6.5 %), image_pack 1,600 (5.8 %), knowledge_panel 826 (3.0 %), paa 769 (2.8 %), top_places 86 (0.3 %), other_widget 51 (0.2 %).
- Phase 2: `scripts/build_typed_aoi_map.py` joins HTML types to existing CV bbox coordinates from `organic-boundary-data` + ad bboxes from `ad-boundary-data`. Walks bboxes in y-order, matches each to ad-overlap (≥30 %) → ad type, otherwise to HTML `#rso` card in DOM order. Bottom-of-page CV-detected cells with deep position (≥10) and small height (<200 px) are swept to `chrome` (off-axis, position = -1). Outputs `data/aoi-typed/<tid>.json` with `[{position, type, x, y, width, height, html_handle, ...}, ...]`. Match quality: 90 % of trials have |Δ| ≤ 2 between HTML and bbox card counts; 1.8 % residual `unknown_widget` after chrome sweep (down from 7.1 % pre-sweep).
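The Phase 2 walk can be pictured with the sketch below. It is a simplified stand-in for `build_typed_aoi_map.py`, not a copy of it: the function names are hypothetical, overlap is assumed to mean the fraction of the CV bbox area covered by the ad rectangle, and the chrome sweep is assumed to apply only to cells left without an ad or HTML match.

```python
def overlap_fraction(box, ad):
    """Fraction of `box` area covered by `ad` (assumed definition of ad-overlap)."""
    ix = max(0.0, min(box["x"] + box["width"], ad["x"] + ad["width"]) - max(box["x"], ad["x"]))
    iy = max(0.0, min(box["y"] + box["height"], ad["y"] + ad["height"]) - max(box["y"], ad["y"]))
    area = box["width"] * box["height"]
    return (ix * iy) / area if area > 0 else 0.0


def type_bboxes_sketch(bboxes, ads, html_types, ad_threshold=0.30):
    """Walk bboxes in y-order: ad match first, then next HTML card type, then sweep."""
    remaining = iter(html_types)  # HTML card types in DOM order
    typed, position = [], 0
    for box in sorted(bboxes, key=lambda b: b["y"]):
        label = next((ad["type"] for ad in ads
                      if overlap_fraction(box, ad) >= ad_threshold), None)
        if label is None:
            label = next(remaining, None)
        if label is None:
            # No ad and no HTML card left: deep, short cells are swept to chrome.
            label = "chrome" if position >= 10 and box["height"] < 200 else "unknown_widget"
        if label == "chrome":
            typed.append({**box, "position": -1, "type": label})  # off-axis
        else:
            typed.append({**box, "position": position, "type": label})
            position += 1
    return typed
```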
Per-corpus typed AOI distribution (entries with position ≥ 0): organic
22,530 (53.0 %), native_ad 9,217 (21.8 %), related_searches 1,811 (4.3 %),
image_pack 1,600 (3.8 %), dd_top 1,582 (3.7 %), knowledge_panel 826
(1.9 %), paa 769 (1.8 %), unknown_widget 756 (1.8 %), top_places 86
(0.2 %), other_widget 51 (0.1 %). Off-axis (position = -1): chrome 2,255,
dd_right 861, plus the #botstuff related_searches and #rhs knowledge_panel
counts above.
All 2026-05-03 stress-test findings reproduce nearly identically under
typed. Hybrid values are preserved at scripts/output/<name>_HYBRID_BACKUP/
for direct comparison.
| Finding | organic_hybrid | typed | verdict |
|---|---|---|---|
| Within-item paired LF/HF Δ (return − first), median | +6.31 | +6.44 | ✓ replicates |
| Same — Wilcoxon two-sided p | 5.7×10⁻²³ | 2.5×10⁻²³ | ✓ |
| Participant-level mean-of-means Δ | +10.73 | +10.90 | ✓ |
| Pre-scroll cross-position Spearman ρ (P0–P6) | −0.857 | −0.857 | ✓ identical |
| Pooled steep-vs-plateau MW p | 2.6×10⁻²⁵ | 2.3×10⁻²⁵ | ✓ |
| Within-trial Spearman ρ (≥3 segs), median | −0.400 | −0.400 | ✓ identical |
| Within-trial Spearman, % negative | 62.0 % | 61.8 % | ✓ |
| Cap-10 audit Spearman ρ | −0.689 | −0.733 | ✓ stronger |
| RIPA2 paired Δ, median | +8.05×10⁻⁶ | +8.19×10⁻⁶ | ✓ both null |
| RIPA2 paired p (two-sided) | 0.17 | 0.16 | ✓ |
| Argmax LF/HF → click hit rate | 0.320 | 0.319 | ✓ identical |
| Argmax — chance baseline (1/N) | 0.535 | 0.534 | ✓ |
| First-scroll-vs-gaze: median above-fold coverage | 0.500 | 0.500 | ✓ identical |
| First-scroll-vs-gaze: % reaching last-visible | 10.3 % | 10.3 % | ✓ |
| Knee × mean click position (per-ppt Spearman) | +0.460 | +0.471 | ✓ |
| Knee × P0-fraction of clicks | −0.408 | −0.408 | ✓ identical |
| Knee × click entropy | +0.447 | +0.465 | ✓ |
| Knee × regression rate (NS) | −0.021 | −0.021 | ✓ identical |
| satisficer trial median knee | P2 | P2 | ✓ |
| optimizer trial median knee | P1 | P1 | ✓ |
| satopt × knee MW two-sided p | 0.022 | 0.022 | ✓ identical |
The 5,148 widgets that were previously pooled with organics or filtered out under hybrid are now correctly typed, but the cognitive findings do not shift. This is itself a strong robustness story: the within-item paired return finding, the pre-emptive-scroll behaviour, the rank-value-prior reframe, and the satopt × knee dissociation are all properties of the trial-level cognitive operations, not of widget-vs-organic mis-attribution.
- `scripts/compute_cursor_approach_features.py --attribution typed` → emits `AdSERP/data/cursor-approach-features-typed.json` (19,774 records, 2,774 trials, 9 etypes).
- `scripts/compute_regression_labels.py --attribution typed` → emits `scripts/output/approach_threshold_sensitivity/regression_labels_cache_typed.json` (12,600 regressed / 7,174 not_regressed = 63.7 % regression rate; matches hybrid's 63.x %).
- `notebooks-v2/data_loader.py` extended with `typed_aoi_bands`, `typed_aoi_tops`, `typed_aoi_etypes`, `attribute_click_to_typed`, mirroring the `organic_aoi_*` and `_hybrid_aoi_tops` conventions.
lfhf_first_vs_return_paired.py (multi-attribution: typed added),
lfhf_pre_vs_post_scroll.py, ripa2_first_vs_return_paired.py,
lfhf_argmax_predicts_click.py, first_scroll_vs_gaze.py,
knee_vs_click_distribution.py, knee_by_satopt.py (hard-swapped),
knee_by_rank_variant.py (multi-attribution: typed column added),
lfhf_rank_gradient_typed.py (forked from _hybrid.py).
- pupil-lfhf sibling repo's `compute_butterworth_lfhf.py` and `compute_ripa2.py` are NOT yet typed-aware. The stress tests bypass these by computing LF/HF / RIPA2 from raw pupil on the fly using `typed_aoi_tops` for window assignment, so the headline pupil paper findings do not depend on pre-computed `*-by-position-typed.json` JSONs. Notebook re-execution under typed (NB14, NB18, NB22, NB28, etc.) requires the pre-computed JSONs and is deferred to a future cascade pass.
- `compute_lab_gaze_gated_features.py` not yet ported to typed; pupil paper does not depend on it.
- §3.3 (`ettac-paper/sections/adserp.tex`): updated draft is `docs/drafts/ettac-adserp-2026-05-04-v2.4.md`. Numbers shift by < 0.05 in correlation strength relative to v2.3 hybrid version; no qualitative changes. The paper does not use the word "typed" or contrast attribution flavors in prose (per "avoid alternate-rank framing" instruction). The Stimuli paragraph mentions display-order ranks across organic, ad, and widget surfaces inline.
- Internal OSEC memo (`docs/drafts/rank-value-prior-osec-2026-05-03.md`): rank-value-prior axis × verification-appetite axis remain robust under typed.
Stress tests on AdSERP under organic_hybrid attribution, motivated by the §3.3 rewrite, surface several findings that update OSEC framing and sharpen the cognitive interpretation of the LF/HF rank gradient.
The median user issues their first significant scroll after fixating only
~50 % of the visible above-fold candidate set; only 10.3 % of trials reach
the last visible position before scrolling. The modal deepest pre-scroll
fixated position (the per-trial knee) is P1 (34.7 % of trials), with
P0 14.3 %, P2 24.3 %, P3 14.4 %, P4 8.9 %, P5+ 3 %. Per-position fraction
fixated before first scroll: P0 98.3 %, P1 81.1 %, P2 47.7 %, P3 29.3 %,
P4 19.6 %. Active criterion compilation is not an exhaustive pass
through the viewport. Source: scripts/output/first_scroll_vs_gaze/.
Pre-scroll first-visit LF/HF is sharply rank-correlated (Spearman
ρ = −0.857, p = 1.4 × 10⁻², N = 7 positions P0–P6); post-scroll
first-visit LF/HF is essentially flat (ρ = −0.482, p = 0.13, N = 11
positions P0–P10). At P0 specifically, pre-scroll first-visit median
LF/HF is 27.99 (N = 1,465); the rare post-scroll first-visit at P0 is
14.52 (N = 92), Δ = −13.47, MW p = 3.6 × 10⁻⁶. The same instrument
reads two different cognitive modes: Survey-active (criterion
compilation in working memory, pre-scroll, sharp gradient) and
Survey-external (the SERP as external memory for confirmation under
a now-stable criterion, post-scroll, flat). Source:
scripts/output/lfhf_pre_vs_post_scroll/.
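For readers who want to see how a cross-position Spearman ρ of this kind is formed, here is a dependency-free sketch. The per-position medians in the usage note are made-up illustrative values, not the values behind ρ = −0.857, and the helper names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rho computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Illustrative only: hypothetical median LF/HF per position P0..P6.
positions = [0, 1, 2, 3, 4, 5, 6]
hypothetical_medians = [28.0, 24.0, 21.0, 22.0, 18.0, 15.0, 16.0]
rho = spearman_rho(positions, hypothetical_medians)
```

With only seven positions, a single pair of swapped ranks moves ρ noticeably, which is why the N = 7 pre-scroll and N = 11 post-scroll correlations are reported with their p-values.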
Across 2,646 paired (trial, position) records under organic_hybrid,
return-visit LF/HF is significantly higher than first-visit LF/HF on
the same item by the same user (median Δ = +6.31, mean Δ = +12.55,
Wilcoxon two-sided p = 5.7 × 10⁻²³; 60 % Δ > 0). Participant-level
80 % Δ > 0, p = 3.1 × 10⁻⁴. Per-rank, the elevation is significant
P1–P5. Drift-control rules out within-trial baseline shift:
forward-only within-trial Δ between the latest and earliest visited
positions is −1.97 (p = 2.6 × 10⁻⁴), opposite direction to the paired
return Δ. Metric-specificity control rules out generalised pupil
amplitude: RIPA2 paired Δ on the same records is at the noise floor
(median +8 × 10⁻⁶, two-sided p = 0.17). The return-elevation lives in
the autonomic spectral ratio, not in per-fixation amplitude. Source:
scripts/output/lfhf_first_vs_return_paired/,
scripts/output/ripa2_first_vs_return_paired/,
scripts/output/lfhf_within_trial_drift_control/.
First-pass LF/HF argmax over visited positions hit-rate vs click_pos
under hybrid attribution: 0.32 against a 1/N consideration-set chance
baseline of 0.54 (lift = −22 pp, under-performs chance, N = 2,446).
LF/HF features alone in LOSO logistic: AUC = 0.636 ± 0.089. Trivial
gaze-dwell baseline (n_fixations, total_dwell_ms): AUC = 0.718 ±
0.098. Combined: AUC = 0.715 ± 0.100 — no lift over dwell. LF/HF is a
measurement instrument for cognitive state, not a click-prediction
feature. Source: scripts/output/lfhf_predicts_return_stress/,
scripts/output/lfhf_argmax_predicts_click/,
scripts/output/lfhf_click_prediction_test/.
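The argmax comparison above can be sketched as follows, assuming each trial is represented as a dict holding its first-pass LF/HF per visited position and its click position. The field names are hypothetical, and the chance baseline is taken to be the mean of 1/N over trials, where N is the number of visited positions in that trial.

```python
def argmax_hit_rate(trials):
    """Return (hit rate of the LF/HF argmax, mean 1/N chance baseline)."""
    hits, chance = 0, 0.0
    for trial in trials:
        lfhf = trial["lfhf_by_position"]   # {position: first-pass LF/HF}
        best = max(lfhf, key=lfhf.get)     # position with the highest LF/HF
        hits += int(best == trial["click_pos"])
        chance += 1.0 / len(lfhf)          # uniform guess over visited positions
    n = len(trials)
    return hits / n, chance / n
```

A hit rate below the baseline, as reported here (0.32 vs 0.54), means picking the highest-LF/HF position does worse than guessing uniformly among the visited positions.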
The current ettac-paper/sections/adserp.tex claim that per-participant
Wilcoxon on per-(trial, position) median LF/HF gives p = 0.0055 with a
participant-cluster bootstrap CI of [+0.94, +3.85] does not reproduce
under any of three attribution flavors. Under absolute attribution
(matching the paper's stated methodology), participant-Wilcoxon mean-Δ
p = 0.57; cluster bootstrap CI [−1.72, +5.68] straddles zero.
Median-of-medians-Δ variant gives p = 0.017 (still 3× weaker than the
paper). Under organic_hybrid the signal weakens further; under organic
(bbox) it goes negative. The wr/nr stratification is rank-confounded:
returned items concentrate at top ranks, where LF/HF is higher; once
rank is partialled out (per-rank Cohen's d), the effect is null or
slightly negative in every cell. The §Predicting return paragraph
should be removed and replaced with the within-item paired return
finding (which is rank-controlled by construction). Source:
scripts/output/lfhf_predicts_return_stress/report.md (12-angle stress
test).
Per-participant correlations (n = 45, all under organic_hybrid):
| Correlation | Spearman ρ | p |
|---|---|---|
| mean knee × mean click position | +0.460 | 1.5 × 10⁻³ |
| mean knee × P0-fraction of clicks | −0.408 | 5.5 × 10⁻³ |
| mean knee × P0-or-P1 fraction | −0.363 | 1.4 × 10⁻² |
| mean knee × P3-or-deeper fraction | +0.337 | 2.4 × 10⁻² |
| mean knee × click entropy | +0.447 | 2.1 × 10⁻³ |
| mean knee × regression rate | −0.021 | 0.89 (NS) |
| regression rate × mean click position | +0.045 | 0.77 (NS) |
| regression rate × P0-fraction | +0.101 | 0.51 (NS) |
The participant-level rank-value-prior axis (top-heavy ↔ flat) predicts both knee depth and click distribution shape with consistent signs. Regression rate predicts neither at participant level. The trial-level satopt × knee effect (median split p = 5.9 × 10⁻⁵, optimizer-trial median P1 vs satisficer-trial P2) is real but is dominated by trial-count imbalance — optimizer participants generate more trials with knee data.
Implication: the OSEC-relevant individual-difference space is at least two-dimensional — rank-value prior strength (controls knee depth, click distribution shape, Survey-active investment) and verification appetite (controls return rate, Survey-external duration). Satisficer/optimizer is a one-dimensional projection; the two axes are nearly orthogonal in this dataset.
Source: scripts/output/knee_vs_click_distribution/,
scripts/output/knee_by_satopt/,
scripts/output/knee_by_rank_variant/. Working memo:
docs/drafts/rank-value-prior-osec-2026-05-03.md.
When the display-order top position (P0 under hybrid) is a top-of-page
ad (dd_top, n = 1,306 trials, 59 % of cohort), median knee = P1; under
organic-only attribution 25.8 % of the full cohort never fixates an
organic before the first scroll. Native-ad P0 trials (n = 452): hybrid
knee P3 — native ads enter active criterion compilation as if they were
candidates. Organic-top trials (n = 459): knee P2. Top-of-page display
ads consume one slot of the active-compilation budget without
contributing to organic-result criterion compilation. Source:
scripts/output/knee_by_rank_variant/.
- §3.3 (`ettac-paper/sections/adserp.tex`): drop the §Predicting return paragraph (lines 164–183 of current draft); replace with within-item paired return finding. Plateau slope is non-significant under hybrid (was marginal under absolute), so the steep-vs-plateau separation now rests on the pooled MW (still very strong, p = 2.6 × 10⁻²⁵), not on the plateau slope itself. See `docs/drafts/ettac-adserp-2026-05-03-v2.md`.
- Task-model paper / methods paper: the satopt → knee interpretation needs splitting into two axes; the four-class consideration-set taxonomy sits inside Survey-external + Evaluate, not across all of Survey; Survey-active is typically just ~2 positions per trial (median knee P1).
scripts/render_*.py (the canonical paper-figure producers) now accept
--attribution {organic,absolute} and default to organic. Bbox-attributed
inputs (cursor-approach-features-organic.json +
regression_labels_cache_organic.json) flow through to
class_distributions.png, coupling_traces.png,
cursor_gaze_array.png, cursor_gaze_timeseries.png,
deferred_vs_rejected_*.png, gaze_around_cursor.png,
gaze_density_class.png, class_distributions_wild_mode.png.
The per_record_coupling.json and per_record_trajectory.json caches
inside render_deferred_vs_rejected.py are now keyed by attribution
(*_organic.json vs the unsuffixed legacy file) so the n=14,760 cache
doesn't collide with the n=13,419 cache.
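The attribution-keyed cache naming can be illustrated with a small helper. The function name is hypothetical, and treating `absolute` as the unsuffixed legacy case is an assumption based on the description above.

```python
from pathlib import Path


def cache_path(output_dir: str, stem: str, attribution: str) -> Path:
    """Legacy (absolute) keeps the unsuffixed file; other flavors get a suffix."""
    suffix = "" if attribution == "absolute" else f"_{attribution}"
    return Path(output_dir) / f"{stem}{suffix}.json"
```

For example, `cache_path("out", "per_record_coupling", "organic")` resolves to `out/per_record_coupling_organic.json`, while the `absolute` flavor resolves to `out/per_record_coupling.json`.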
coupling_traces.png previously showed three well-separated horizontal
bands (eval-rejected ≈ 220 px / deferred ≈ 300 px / clicked ≈ 390 px).
Under bbox attribution the three traces collapse to ~400 px with heavily
overlapping IQR ribbons. The renderer's hardcoded legend captions
("EVAL-REJECTED tracks gaze closely") describe the legacy shape and no
longer match the data. The motor-signature dissociation in
deferred_vs_rejected_four_panel.png (cursor-gaze distance and dwell
deltas, p < 10⁻⁹ and p < 10⁻¹⁹) survives the cascade.
r1_dissociation.png / r1_2x2_dissociation.png (pupil-paper-relevant). The
R1 per-(trial, position) RIPA2 vs LF/HF dissociation collapses on the
RIPA2 side under bbox attribution. Per-fixation effect on later-returned
vs never-returned items: LF/HF d=+0.041, p=1.1e-03 (preserved, sign
unchanged); RIPA2 d=+0.006, p=8.0e-01 (was p=0.0058 under absolute,
per the JEMR-2025 implementation-bug fix). The "lingered first time"
LF/HF claim survives. The "lingered but processed shallowly" joint
LF/HF × RIPA2 signature does not: the RIPA2 component appears to have
been a rank-pooling artifact, not a per-fixation arousal-amplitude
difference. Pupil paper §3 should drop the RIPA2 leg of the joint
dissociation claim unless absolute attribution is held as the primary.
plot_approach_retreat_hero.png is pinned to absolute attribution.
The curated COMMIT exemplar (p015-b1-t5 pos=2) reattributes away from
'clicked' under bbox so the "Commit (clicked)" caption stops matching.
New exemplars need to be hand-picked from cursor-approach-features-organic.json
before this hero figure migrates.
plots-v1/plot_ettac_*.png regenerated under bbox-organic. Headline
position-load result holds (full-corpus ρ = -0.655, p < 10⁻⁴; steep-phase
ρ = -1.000 over P0–P3, p = 3.2 × 10⁻²³). Plateau ρ flipped to +0.321
(p=0.482, n.s.) — directional, but no longer surprising at this
attribution.
Aggregate refactor: notebooks-v2/update_key_claims.py is now a reader
(notebooks are canonical) instead of a template-writer; emits
docs/notebook-key-claims.md directly from each notebook's K-claims
cell. Eliminates the two-copy sync problem behind the 2026-05-01
--force-clobber guard.
Pipeline + consumer API for the bbox AOI enrichment, plus first-pass K-ID delta evidence under organic-rank attribution. Notebook migrations not yet shipped — Andy's deep dive on pupil paper this weekend will decide which findings move to organic-rank as primary.
- `60a2e7b9` widget filter + composite-cell split + `is_ad` x-overlap fix
- `da0a8aae` band-y guard against featured-snippet false positives
- This commit: consumer API in `data_loader.py` + producer migrations + comparison harness
Three new functions consume the bbox JSONs written by scripts/extract_organic_bboxes.py:
- `load_aois(trial_id, include_widgets=False, include_cells=False)`: full structured AOI dict; widgets and composite cells are opt-in (default-off matches the "second-column variable" convention from methodology §7).
- `organic_aoi_bands(trial_id)`: pixel-accurate `(y_top, y_bottom)` bands per organic; drop-in replacement for `result_bands(n, doc_h)`.
- `organic_aoi_tops(trial_id)`: convenience for the y-tops, drop-in for `result_band_tops(n, doc_h)`.
All three fall back to band estimation when a trial's bbox JSON is missing. The 'source' field in the load_aois return discriminates 'bbox' vs 'band_estimate'.
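The fallback behaviour can be sketched as below. This is not the `data_loader.py` implementation: the JSON layout (a list of dicts with `y` and `height`), the directory argument, and the equal-height band estimate standing in for `result_bands(n, doc_h)` are all assumptions for illustration.

```python
import json
from pathlib import Path


def equal_bands(n, doc_h):
    """Stand-in band estimate: n equal-height bands over the document height."""
    step = doc_h / n
    return [(i * step, (i + 1) * step) for i in range(n)]


def load_bands_sketch(trial_id, bbox_dir, n_results, doc_h):
    """Use the bbox JSON when present, otherwise fall back to band estimation."""
    path = Path(bbox_dir) / f"{trial_id}.json"
    if path.exists():
        boxes = json.loads(path.read_text())
        bands = [(b["y"], b["y"] + b["height"]) for b in boxes]
        return {"source": "bbox", "bands": bands}
    return {"source": "band_estimate", "bands": equal_bands(n_results, doc_h)}
```

Downstream code can branch on the `source` value, for example to report how many trials in an analysis relied on estimated rather than pixel-accurate bands.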
Both compute_butterworth_lfhf.py and compute_ripa2.py gained --attribution {absolute,organic}. Default is absolute (legacy). Organic-attribution outputs land at:
- `AdSERP/data/butterworth-lfhf-by-position-organic.json`
- `AdSERP/data/ripa2-by-position-organic.json`
pipeline_organic_count vs count_organic_ranks (HTML-derived, ad-overlap excluded; not ground truth — includes some widget-heading h3s):
exact (delta=0): 683/2,776 = 24.6%
|delta| ≤ 1: 1,801/2,776 = 64.9%
|delta| ≤ 2: 2,451/2,776 = 88.3%
median 0, mean -0.20
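An agreement summary of this shape could be computed as in the sketch below. The function name is hypothetical, and the sign convention (pipeline count minus HTML-derived count) is an assumption.

```python
from statistics import mean, median


def count_agreement(pipeline_counts, html_counts):
    """Summarise per-trial deltas between two organic-count estimates."""
    deltas = [p - h for p, h in zip(pipeline_counts, html_counts)]
    n = len(deltas)
    return {
        "exact": sum(d == 0 for d in deltas) / n,
        "within_1": sum(abs(d) <= 1 for d in deltas) / n,
        "within_2": sum(abs(d) <= 2 for d in deltas) / n,
        "median": median(deltas),
        "mean": mean(deltas),
    }
```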
Widget filter caught 2,008 widgets across 1,628 trials (58.6%). Composite cells found in 166 trials (6.0%, 376 cells).
Per-fixation re-attribution rate: 73.9% (scripts/output/aoi-consumer-cascade/per-rank-shifts.json).
rank 0: band 51,255 → bbox 48,908 (-4.6%)
rank 1: band 43,778 → bbox 28,130 (-35.7%)
rank 2: band 36,306 → bbox 17,094 (-52.9%) ← rank-2 peak under absolute is artifactual
rank 3: band 24,094 → bbox 12,698 (-47.3%)
rank 8: band 3,872 → bbox 5,449 (+40.7%)
rank 9: band 2,245 → bbox 3,422 (+52.4%)
rank 10: band 862 → bbox 1,320 (+53.1%)
Top re-attribution flow:
band rank 0 → bbox rank -1: 38,512 fixations (band-attributed to organic but actually outside any AOI)
band rank 1 → bbox rank -1: 21,791
band rank 2 → bbox rank 0: 16,475 (re-numbered down by ad/widget exclusion)
bbox rank -1 = fixation didn't land on any organic AOI. ~60K fixations were attributed by band estimation to organic ranks 0-1 that actually fall outside organic AOIs entirely — likely on ad cards, search box, knowledge panels, widgets.
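A re-attribution rate and flow table like the one above can be derived from per-fixation (band rank, bbox rank) pairs. The sketch below assumes that representation; the function name is hypothetical and -1 denotes "no organic AOI", as in the text.

```python
from collections import Counter


def reattribution_summary(pairs):
    """Return (share of fixations whose rank changed, changed flows by size)."""
    pairs = list(pairs)
    flows = Counter(pairs)  # (band_rank, bbox_rank) -> fixation count
    changed = sum(c for (band, bbox), c in flows.items() if band != bbox)
    rate = changed / len(pairs) if pairs else 0.0
    top = [(flow, c) for flow, c in flows.most_common() if flow[0] != flow[1]]
    return rate, top
```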
Full table at scripts/output/aoi-consumer-cascade/nb14_nb18_comparison.md. K-IDs computed with the canonical published denominator (positions 0–10, N=11), matching the original Key Claims block — earlier draft of this entry used a wider position range and produced misleading K3 values.
| K | Claim | Old (absolute, ads pooled) | New (organic, bbox) | Verdict |
|---|---|---|---|---|
| K1 | trials | 2,416 | 2,174 (−242) | sample shrinks |
| K2 | segments | 6,112 | 4,450 (−1,662) | |
| K3 | ρ pos 0–10 (N=11) | −0.927, p=4e-5 | −0.655, p=0.029 | ✓ survives, weaker |
| K4 | ρ pos 1–10 (N=10) | −0.903, p=3e-4 | −0.539, p=0.108 | ⚠ ns |
| K6 | clicked > non-clicked p | 3.5e-6 | 2.5e-7 | ✓ stronger |
| K9 | steep vs plateau MW p | 1.6e-23 | 8.8e-9 | ✓ holds |
| K10 | steep ρ (pos 0–3) | −1.000 (perfect) | −0.800, p=0.20 | ⚠ ns |
| K11 | plateau ρ (pos 4–10) | −0.714, p=0.071 | +0.321, p=0.482 | ⚠ sign flip |
| K | Old | New | Verdict |
|---|---|---|---|
| K6 RIPA2 × position ρ | −0.262, p=0.366 | −0.080, p=0.776 | ⚠ ns under both |
Full table at scripts/output/aoi-consumer-cascade/nb23_comparison.md. Generated by scripts/compare_nb23_under_attributions.py on n=2,776 trials.
| K | Claim | Old (band, abs rank) | New (bbox, org rank) | Verdict |
|---|---|---|---|---|
| K1 | Click share × rank ρ | −0.952, p=2.3e-5 | −0.988, p=9.3e-8 | ✓ sharper monotone |
| K1 | N clicks attributed | 2,764 | 2,363 (−401) | clicks on ads/KP/widgets correctly excluded |
| K2 | Fixation count × rank ρ | −1.000, p=6.6e-64 | −0.988, p=9.3e-8 | sharper N reflects rank-0 share jump |
| K2 | Fixations attributed | 202,792 | 144,874 | ads/widgets/KP fixations correctly excluded |
| K3 | Total dwell × rank ρ | −1.000 | −0.988 | similar |
| K8 | Forward fixations % | 74.0% | 74.6% | stable |
| K9 | Regression fixations % | 26.0% | 25.4% | stable |
Ski jump returns under organic attribution. Per-rank click distribution shows the ad-displacement artifact disappear and a genuine terminal-click ski jump emerge at rank 8:
band/abs bbox/org Δ from prev (org)
rank 0: 18.85% 44.86%
rank 1: 19.10% 17.60% −27.25
rank 2: 24.57% 10.83% −6.77 ← ad-displacement peak gone
rank 3: 14.83% 7.24% −3.60
rank 7: 1.88% 2.20% −0.55
rank 8: 1.77% 2.71% +0.51 ⬆ ← terminal-click ski jump
rank 9: 1.12% 1.48% −1.23
Under absolute rank, the spurious "ski jump" was at rank 2 (24.57%, +5.46 pp above rank 1): that was the ad-displacement artifact (top-organic clicks attributed to rank 2 because ads occupied ranks 0–1). Under bbox attribution, it is at rank 8 with a +0.51 pp bump (52 clicks at rank 7 → 64 clicks at rank 8), the canonical end-of-first-viewport terminal-click effect.
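The "Δ from prev" column is the change in click share, in percentage points, relative to the rank above. A minimal sketch, assuming a mapping from rank to click count (the function name is hypothetical):

```python
def click_share_deltas(clicks_by_rank):
    """Return [(rank, share %, delta vs previous rank in pp)] in rank order."""
    total = sum(clicks_by_rank.values())
    rows, prev = [], None
    for rank in sorted(clicks_by_rank):
        share = 100.0 * clicks_by_rank[rank] / total
        delta = None if prev is None else share - prev
        rows.append((rank, share, delta))
        prev = share
    return rows


def ski_jump_ranks(clicks_by_rank):
    """Ranks whose click share rises relative to the rank above them."""
    return [rank for rank, _, delta in click_share_deltas(clicks_by_rank)
            if delta is not None and delta > 0]
```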
Full table at scripts/output/aoi-consumer-cascade/nb04_comparison.md. abs n=2,764 / org n=2,363.
| K | Claim | Absolute | Organic |
|---|---|---|---|
| K2 | First-viewport clickers | 504 (18.2%) | 382 (16.2%) |
| K4 | Mean share of results-above-click fixated | 98.0% | 96.9% |
| K6 | Mean share of max-scroll-depth results fixated | 74.0% | 70.7% |
| K7 | FV clickers — share of first-screen results fixated | 68.5% | 60.3% |
| K8 | Scrollers — share of first-screen results fixated | 93.9% | 90.8% |
Per-position fixation budget shifts dramatically for FV clickers:
| Position | K-ID | Absolute | Organic |
|---|---|---|---|
| 0 | K13 | 45.4% | 67.9% |
| 1 | K14 | 35.7% | 28.5% |
| 2 | — | 22.9% | 17.2% |
| 3 | — | 16.1% | 12.2% |
K13 jumps from 45.4% to 67.9% under bbox attribution — when first-viewport clickers click, 68% of their fixation time is on position 0 (the top organic), not 45% as previously reported. This is consistent with the rank-0 click share jump (NB23 K1: 18.8% → 44.9%): under organic attribution, the top organic is clearly the dominant attentional target.
The N=504 → N=382 shift in FV clickers reflects the 411 trials in NB22 where the click was on an ad/widget — those drop out of the FV-organic-clicker cohort under bbox.
Full table at scripts/output/aoi-consumer-cascade/nb22_comparison.md. Generated on n=2,775 trials.
Class distribution shifts:
| Class | Absolute share | Organic share | Δ |
|---|---|---|---|
| clicked | 8.2% | 8.9% | +0.7 |
| deferred | 26.2% | 27.1% | +0.9 |
| evaluated_rejected | 15.5% | 20.1% | +4.6 |
| not_approached | 50.1% | 43.9% | −6.2 |
The evaluated_rejected class grows substantially under bbox attribution because ad-slot positions that were "not_approached" under absolute rank simply don't exist as positions under organic rank — so the visited-but-not-clicked fraction shifts up.
Per-trial averages:
| Metric | Absolute | Organic | Note |
|---|---|---|---|
| Mean visited positions / trial | 6.03 | 5.32 | bbox tighter; ad/widget visits not counted |
| Mean regressed positions / trial | 3.86 | 3.15 | |
| % of visited that are regressed | 64.0% | 59.3% | |
Per-trial label stability:
- 99.4% of trials (2,757/2,775) have at least one shifted four-class label when switching from absolute to organic attribution.
- 411 trials have `clicked` count differing, i.e., the click landed on a different bucket (organic vs ad/widget) under the two methods.
- 2,047 trials have `deferred` count differing: gaze regression picked up different positions because position attribution shifted.
Implication for AR replay rebuild: nearly every curated example in approach-retreat/site/replay/data/curation.json may have stale labels. Re-running build_replay_trial.py on those trials will produce fresh AOI labels via M5; caption claims like "5 DEFERRED AOIs" need automated cross-check against the regenerated labels before re-publishing demos. The 411 click-shifts are particularly important because curation.json filters trials by class profile.
| K | Old (h3) | New (bbox) |
|---|---|---|
| K11 modal organic count | 10 (26.3%) | 9 (33.1%) |
| K12 range | 1–15 | 1–17 |
| K13 ∈ {9,10,11} | 69.8% | 75.8% |
| K14 exactly 10 | 26.3% (731) | 30.5% (847) |
The "monotonic load decline by rank" finding is partly an absolute-rank artifact driven by ad-screening discrimination cost contaminating early positions. Under organic-only attribution, the gradient collapses to ns; what survives strongly is (a) clicked > non-clicked (K6 strengthens) and (b) steep early band vs plateau late band dichotomy (K9 holds at p<10⁻⁸).
Andy's proposed reframe: organic rank as primary, ads as essential distractors. Headline becomes "cognitive engagement on organic search results is two-band — early evaluation-heavy band + late satisficer plateau, with clicked positions uniformly elevated regardless of band". K6 + K9 carry the new headline; K3/K4/K10/K11 retire to a robustness section that shows the absolute-rank curves and explains the ad-distractor contamination.
Bbox AOIs are extracted tight to visual content; clicks frequently land in the small visual gap between adjacent card rectangles (~10–15 px typical). Under strict containment those count as "off-AOI" even when they were almost certainly intended for an adjacent card.
Distribution of off-AOI click distance to nearest organic edge:
median 10 px, P75 15 px, P90 22 px
≤ 10 px: 55.1% rescued
≤ 20 px: 88.8% rescued
≤ 30 px: 92.5% rescued ← elbow; further loosening rescues only ~0.3pp more
≤ 50 px: 92.5% rescued
≤ 100 px: 92.8% rescued
attribute_click_to_organic(click_y, trial_id, tolerance_px=30) added to data_loader.py. Logic: strict containment in organic always wins; if click falls inside any ad rect, refuse to snap (it's an ad click); if inside any filtered widget, refuse; otherwise snap to nearest organic if within tolerance_px.
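The decision order described above can be sketched as follows. This is an illustration, not the `data_loader.py` function: it takes the bands directly instead of a `trial_id`, the function name is hypothetical, and each band is assumed to be a `(y_top, y_bottom)` tuple.

```python
def attribute_click_y_sketch(click_y, organics, ads, widgets, tolerance_px=30):
    """Return the organic rank for a click, or None if it should not be snapped."""
    def inside(band):
        return band[0] <= click_y <= band[1]

    # 1. Strict containment in an organic always wins.
    for rank, band in enumerate(organics):
        if inside(band):
            return rank
    # 2. A click inside an ad rect or a filtered widget is never snapped.
    if any(inside(b) for b in ads) or any(inside(b) for b in widgets):
        return None
    # 3. Otherwise snap to the nearest organic edge, if it is close enough.
    best_rank, best_dist = None, None
    for rank, (top, bottom) in enumerate(organics):
        dist = min(abs(click_y - top), abs(click_y - bottom))
        if best_dist is None or dist < best_dist:
            best_rank, best_dist = rank, dist
    if best_dist is not None and best_dist <= tolerance_px:
        return best_rank
    return None
```

Checking ads and widgets before snapping is what keeps the ad bucket identical between the strict and tolerant modes: tolerance can only rescue clicks that were otherwise off-AOI.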
Click attribution under each method:
| Bucket | Strict (tolerance=0) | Tolerant (30 px) |
|---|---|---|
| Organic | 1,785 (64.3%) | 2,181 (78.6%) |
| All ads | 557 (20.1%) | 557 (20.1%) |
| Widgets (filtered) | 5 (0.2%) | 5 (0.2%) |
| Off-AOI (KP / image carousel / footer / large gaps) | 428 (15.4%) | 32 (1.2%) |
The 30 px tolerance rescues 396 clicks that strict containment loses — these are the visual-margin clicks ("clicked the bottom edge of card 3") not the truly off-AOI clicks (which stay at ~32 = 1.2%).
Headline for paper framing: under this attribution, 78.6% of clicks are on organic results, 20.1% on ads, 1.2% on content the pipeline doesn't model (Knowledge Panel, image carousel, etc.). The "ads as essential distractors" frame holds; the methodology limitation around right-pane / KP coverage affects ~1% of clicks, not 15%.
scripts/compute_retreat_arcs.py extracts NB24's extract_retreat_arcs_v2 into a producer with --attribution {absolute, organic_hybrid}. The hybrid mode combines bbox organics + shipped ad rectangles into one ordered position list with etype tags — preserving the organic-vs-top-ad-vs-native-ad comparison that NB24 needs.
Output:
- `AdSERP/data/retreat-arcs.json`: 1,490 raw arcs (legacy absolute)
- `AdSERP/data/retreat-arcs-organic.json`: 5,201 raw arcs (organic_hybrid; 3.5× coverage gain)
The coverage gain reflects bbox AOIs being pixel-accurate (cursor enters/exits positions cleanly) vs band estimation (cursor trajectory frequently fell into "no-position" gaps).
| Metric | Absolute | Organic_hybrid | Verdict |
|---|---|---|---|
| Retreats (valid arcs, not clicked) | 907 | 1,651 | |
| Top Ad arc ratio (median) | 1.51 | 1.55 | ✓ unchanged |
| Top Ad lateral displacement (median px) | 63 | 62 | ✓ unchanged |
| Top Ad lateral/arc ratio (pooled) | 0.166 | 0.170 | ✓ replicates |
| Organic arc ratio | 1.22 | 1.11 | ✓ sharper (more linear) |
| Organic lateral displacement | 33 px | 11 px | ✓ sharper (cleaner) |
| Organic vs Top Ad arc ratio MW p | 1.1e-5 | 4.0e-17 | ✓ much stronger |
The "retreat as lateral displacement" claim survives and strengthens. Top ads still curve laterally (arc ratio 1.55, lateral 62 px), and that is stable across attribution methods. What changes is the contrast: organic retreats are now revealed to be much more linear (lateral displacement 33 → 11 px) than previously thought. The Mann-Whitney p-value tightens by roughly 11 orders of magnitude (1.1e-5 → 4.0e-17).
Implication for AR / methods paper: the brand claim that "top ads impose lateral retreat arcs" is more defensible under bbox attribution, not less. The 3.5× more retreat-arc data also enables sharper per-(direction × etype) splits for the forward/regressive analysis NB24 produces.
scripts/compute_cursor_approach_features.py extracted from NB15 cell 4 with --attribution {absolute,organic}. Output:
- `AdSERP/data/cursor-approach-features.json`: legacy absolute (13,419 records, 2,339 trials)
- `AdSERP/data/cursor-approach-features-organic.json`: bbox attribution (14,760 records, 2,701 trials)
Coverage increases under organic (more trials have valid AOIs because the producer's extract_serp_results-or-fallback no longer rejects trials where h3 enumeration returns null). Per-position record counts:
| Position | Absolute | Organic |
|---|---|---|
| 0 | 2,320 | 2,658 |
| 1 | 2,244 | 2,360 |
| 2 | 2,091 | 1,985 |
| 7 | 719 | 936 |
| 8 | 481 | 748 |
| 9 | 192 | 401 |
Approach share (min_dist < 100): 28.2% → 23.3% under organic. The drop reflects cleaner per-AOI distance calculations (no spurious "approaches" to gap regions previously included as positions).
This unblocks NB20 (approach by element), NB21 (click prediction LOSO), NB22 numerical recompute, NB24 (retreat arc geometry), NB28 (viewport bands) — all consume cursor-approach-features.json directly. To rerun those under organic, point them at cursor-approach-features-organic.json (one-line change in cell 1 of each).
Side-by-side K-ID reports complete for 6 notebooks:
| Notebook | Method | Status | Headline shift |
|---|---|---|---|
| NB14 Butterworth | Producer rerun + comparison | ✓ done | Monotone-decline (K3) survives but weaker; perfect-steep (K10) and plateau-direction (K11) lose significance; clicked>nonclicked (K6) and dichotomy (K9) strengthen |
| NB18a RIPA2 | Producer rerun + comparison | ✓ done | ρ stays ns under both |
| NB23 rank effects | Per-trial recompute | ✓ done | ρ tightens, ski jump returns at rank 8 |
| NB22 four-class | Per-trial recompute | ✓ done | 99.4% of trials shift; 411 click reattributions |
| NB04 fixation coverage | Per-trial recompute | ✓ done | K13 FV pos-0 budget 45% → 68% |
| NB25 SERP composition | Counts comparison | ✓ done | Modal organic count 10 → 9 |
Pending — heavier regeneration cost:
| Notebook | Why heavier |
|---|---|
| NB21 click prediction | Consumes cursor-approach-features.json; needs NB15 producer regenerated under organic AOIs |
| NB28 viewport bands | Same — depends on cursor-approach-features + regression_labels_cache |
| NB24 retreat arc geometry | Same upstream dependency |
| NB20 approach by element | Same |
| NB15 cursor approach itself | The producer; rerunning it cascades to NB20/21/22/24/28 |
These all share one upstream artifact (AdSERP/data/cursor-approach-features.json), so regenerating it once under organic attribution unblocks the whole tier. Estimated cost: 1–2 hr to migrate NB15's per-AOI loops + re-run.
Probably unaffected (per code triage, no per-position attribution code):
- NB05 LHIPA — per-trial pupillometric index
- NB07a regressions prevalence — count-based
- NB09 difficulty — token-based difficulty measures
- NB13 survey phase — saccade-amplitude phase classifier
- NB17 scroll retreat — scroll-event based, AOI-independent
(These should still be spot-checked, but no expected K-ID shifts from AOI cascade.)
Six notebooks' worth of K-ID evidence + the master TL;DRs:
- The "monotonic load decline" framing weakens under organic but doesn't fully die. K3 (ρ over positions 0–10) survives at p=0.029 vs absolute's p=4e-5 — significant but the strength halves. K4 (positions 1–10), K10 (steep-phase perfect monotone), and K11 (plateau direction) all lose significance and K11 sign-flips. The cleaner story is dichotomy + decision-locking: K9 steep-vs-plateau still p<10⁻⁸, and K6 (clicked > non-clicked) strengthens to p=2.5e-7.
- Top-organic dominance is sharper than reported. Click share at rank 0: 18.8% → 44.9%. Position-0 fixation budget for FV clickers: 45% → 68%. Spurious "rank-2 peak" was ad-displacement.
- Ski jump returns at rank 8 under bbox — terminal-click effect at end-of-first-viewport, masked under absolute attribution.
- AR demos cannot ship as-is. 99.4% of trials have shifted four-class labels, 411 click reattributions. Curation captions are stale until rebuild.
- Pupil paper deep-dive scheduled for the weekend (May 2–3, deadline May 15).
- Notebook code migrations (NB14, NB18a, NB23, NB04, NB22 cell rewrites) deferred to after the deep-dive — paper framing decision drives which K-IDs are primary.
- Approach-retreat replay-bundle rebuild also deferred until NB22 four-class taxonomy is regenerated under organic attribution AND curation captions are validated.
- NB15 producer migration (cursor-approach-features under organic) is the unlock for NB20/21/24/28; estimated 1–2 hr next iteration.
- `notebooks-v2/data_loader.py`: `load_aois`, `organic_aoi_bands`, `organic_aoi_tops`
- `scripts/compute_butterworth_lfhf.py`: `--attribution` flag
- `scripts/compute_ripa2.py`: `--attribution` flag
- `scripts/compare_aoi_consumers.py`: full-corpus per-fixation re-attribution audit
- `scripts/compare_nb14_nb18_under_attributions.py`: side-by-side K-ID report (NB14 + NB18a)
- `scripts/output/aoi-consumer-cascade/`: generated reports for the weekend deep-dive
notebooks-v2/data_loader.py documented Gazepoint FPOGY as screen-space (viewport pixels, 0..scr_h) and provided helpers (assign_fixation_to_position, gaze_cursor_distance) that added scroll offset internally. Per the AdSERP README (https://github.com/kayhan-latifzadeh/AdSERP), FPOGY is actually page-space — "relative to the top-left corner of the screenshot in pixels." The JS adserp-importer.js in Scrutinizer had the correct interpretation all along. The Python loader was wrong.
Empirical verification on 20 scrolled trials: Pearson r(FPOGY, scrollY) ≈ 0.95; (FPOGY − scrollY) falls inside the viewport [0, scr_h] for 98%+ of fixations; max FPOGY exceeds scr_h on scrolled trials (e.g. 2,143 on a 1,024-tall screen). This is definitionally impossible under the screen-space interpretation.
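The two checks above (correlation with scroll, and viewport containment after subtracting scroll) can be sketched on toy data. This is an illustrative sketch only: the helper names `pearson_r` and `viewport_share` are hypothetical, and the five fixations are made up to mimic a scrolled trial, not taken from the corpus.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def viewport_share(fpogy: Sequence[float], scroll_y: Sequence[float], scr_h: float) -> float:
    """Share of fixations whose (FPOGY - scrollY) lies inside [0, scr_h]."""
    inside = sum(1 for fy, sy in zip(fpogy, scroll_y) if 0 <= fy - sy <= scr_h)
    return inside / len(fpogy)


# Toy trial (invented values): FPOGY grows with scroll, as page-space data would.
scr_h = 1024
scroll_y = [0, 0, 400, 800, 1200]
fpogy = [300, 650, 900, 1500, 2143]

r = pearson_r(fpogy, scroll_y)
share = viewport_share(fpogy, scroll_y, scr_h)
exceeds_screen = max(fpogy) > scr_h
```

If FPOGY were screen-space, `exceeds_screen` could never be true and `r` would have no reason to track scroll; under the page-space reading, subtracting scroll puts every toy fixation back inside the viewport.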
The result: every fix.y + scroll_y in Python code was double-counting scroll. This is the gaze-side counterpart of the bug fixed in the 2026-04-09 cursor-side audit (adding scroll to values that were already page-space). Both bugs originated from the same miscommunication about coordinate conventions across the two language pipelines.
Found while validating the 31 canonical gazeplot trials against the authors' own full-page screenshots (Zenodo record 15236546, downloaded 2026-04-12). Empirically falsifiable: fixations plotted in page-space coordinates on the raw screenshots landed on meaningful content; the same fixations under the buggy fix.y + scroll interpretation landed off-page (past doc_h) on 842 of 2,776 trials.
- `notebooks-v2/data_loader.py`: module docstring rewritten (FPOGY is page-space, full audit history); `load_fixations` drops `clamp_y`; `assign_fixation_to_position(page_y, tops, n_results)` new 3-arg signature; `gaze_cursor_distance` scroll-free (both inputs page-space); `screen_y_to_page_y` / `page_y_to_screen_y` renamed to `viewport_y_to_page_y` / `page_y_to_viewport_y`; `classify_fixations` no longer clamps or adds scroll.
- `notebooks-v2/test_coordinate_invariants.py`: rewritten. New invariants enforce the correct page-space contract and fail loudly on any regression. Corpus-wide check: 234,333 / 234,339 fixations (100.00%) land within page bounds; 2,889 / 2,889 clicks (100.00%); 842 / 2,776 trials would have overflowed `doc_h` under the old `fix.y + scroll` formula.
- Notebook callers fixed: NB07c, NB12, NB15, NB18_learning_curve, NB19, NB22. Removed `+ scroll` patterns and updated `assign_fixation_to_position` calls to the new signature. 17 substitutions across 5 notebooks plus the NB07c / NB12 structural rewrites.
- Script callers fixed: `compute_butterworth_lfhf.py`, `compute_ripa2.py`, `compute_encoding_vs_retrieval.py`, `generate_explainer_heatmaps.py`.
- Downstream propagation:
  - `pupil-lfhf/validation/adserp_loader.py` forked loader patched in-place on `main` (not `feat/ripa2-comparison`, which Duchowski requested be kept isolated). `compute_butterworth_lfhf.py` and `compute_ripa2.py` in pupil-lfhf updated and re-run. `validation/README.md` and `CLAUDE.md` noted.
  - approach-retreat: README.md, docs/one-pager.md, docs/references/arapakis-leiva-2016.md, CLAUDE.md, and the 2026-04-10 historical memo (appended with a 2026-04-12 update section), all updated to post-fix values.
| K | Pre-fix (2026-04-10) | Post-fix (2026-04-12) |
|---|---|---|
| K1 Trials with usable LF/HF | 2,719 | 2,719 (unchanged) |
| K2 Position-segments | 6,874 | 6,112 |
| K3 Pos × median LF/HF | ρ = −0.618, p = 0.0426 (borderline) | ρ = −0.927, p < 0.0001 |
| K4 Positions 1–10 only | ρ = −0.491, p = 0.150 (ns) | ρ = −0.903, p = 0.0003 (sig) |
| K5 Within-trial (≥3 pos) | N=1,167, mean ρ = −0.105, 56.6% neg | N=1,025, mean ρ = −0.152, median ρ = −0.400, 61.0% neg |
| K6 Clicked vs non-clicked LF/HF | 22.24 (N=1,110) vs 19.01 (N=5,472); p = 1.30 × 10⁻⁴ | 22.40 (N=1,463) vs 19.27 (N=4,636); p < 10⁻⁸ |
| K9 Steep vs plateau (MW raw) | p = 4.1 × 10⁻²² | p = 3.2 × 10⁻²³ |
| K10 Steep phase Spearman (pos 0–3) | — | ρ = −1.000 (perfect monotone) |
| K11 Plateau phase (pos 4–10) | ρ = −0.393, p = 0.383 (ns) | ρ = −0.714, p = 0.071 (marginal) |
The 2026-04-10 note in NB14_BODY that said "K3 unchanged by the 2026-04-09 audit because it uses fixation position, not click_pos" is now superseded — the fixation-side bug does touch K3. The claim was right for the 2026-04-09 scope, wrong for the 2026-04-12 scope.
| K | Pre-fix (2026-04-10) | Post-fix (2026-04-12) |
|---|---|---|
| K1 Records / participants / click rate | 15,397 / 47 / 14.4% (2,214 clicks) | 13,419 / 47 / 16.6% (2,228 clicks) |
| K3 M3 LOSO AUC | 0.792 ± 0.062 | 0.859 ± 0.044 (+0.067) |
| K4 M4 (approach only) AUC | 0.792 ± 0.061 | 0.861 ± 0.043 |
| K7 M3 LOSO AP | 0.491 | 0.611 (+0.120) |
| K9 Per-participant LOSO M3 AUC | median 0.798, IQR [0.759, 0.831], min 0.589 | median 0.860, IQR [0.827, 0.901], min 0.745 |
| K12 Brier score | 0.1781 | 0.1526 |
| K15 Evaluated-rejected (classifier) | 344 (2.2%) | 974 (7.3%) |
| K21 position coefficient | −0.380 | −0.130 (same direction, weaker) |
| K27 direction_changes coefficient | −0.005 | +0.061 |
| K | Pre-fix | Post-fix |
|---|---|---|
| K2 Deferred N | 1,178 (7.7%) | 1,916 (14.3%) |
| K3 Evaluated-rejected N | 278 (1.8%) | 439 (3.3%) |
| K4 Not approached N | 11,727 (76.2%) | 8,836 (65.8%) |
| K5 Retreat distance (def vs rej) | 191.3 vs 96.4 px, p = 1.9 × 10⁻¹¹ | 234.5 vs 90.8 px, p = 1.76 × 10⁻³⁸ |
| K6 Gaze dwell (def vs rej) | 3,842 vs 2,018 ms, p = 3.7 × 10⁻²⁶ | 4,137 vs 1,612 ms, p = 9.76 × 10⁻⁷⁰ |
| K11 M3 LOSO AUC | 0.792 ± 0.062 | 0.859 ± 0.044 |
| K | Pre-fix | Post-fix |
|---|---|---|
| K5 LF/HF × position | ρ = −0.618 | ρ = −0.927 |
| K6 RIPA2 × position | ρ = −0.827 | ρ = −0.909 |
| K15 Will-regress one-sided p | 0.0022 | 0.0106 (weaker but still sig) |
| K16 First-pass dwell p | 4.1 × 10⁻²⁴ | 8.1 × 10⁻³² (stronger) |
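No sign flips or direction changes among the headline effects, and most effects got stronger. The exceptions visible in the tables above are small: the NB21 position coefficient (K21) weakened from −0.380 to −0.130 in the same direction, direction_changes (K27) moved from −0.005 to +0.061, and the will-regress one-sided p weakened from 0.0022 to 0.0106 while staying significant. The pre-fix scroll double-count was injecting noise in the position direction, masking the true signal. The framework-compilation story (steep decline early, plateau later) is preserved and sharpened. The deferred-vs-rejected motor signature dissociation (the methods paper's central empirical claim) is now on dramatically firmer statistical ground (retreat-distance p went from 10⁻¹¹ to 10⁻³⁸; gaze-dwell p from 10⁻²⁶ to 10⁻⁷⁰).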
K3 moving from ρ = −0.618, p = 0.0426 (borderline at α = 0.05) to ρ = −0.927, p < 0.0001 is the biggest single win. K4 (positions 1–10 only) flipping from non-significant (ρ = −0.491, p = 0.150) to highly significant (ρ = −0.903, p = 0.0003) is the second biggest — pre-fix the 1–10 subset could not be cited; post-fix it's a robust effect.
All Key Claims blocks regenerated via notebooks-v2/update_key_claims.py (VERIFIED date bumped to 2026-04-12). docs/findings.md and docs/findings-approach-retreat.md refreshed. methods paper draft (docs/drafts/paper-output/paper.md) and model-analysis sidecar refreshed. Task-model paper, OSEC explainer, Duchowski correspondence drafts, publication roadmap, pupil paper brief, priming null result doc, Shi 2025 lit note — all substantive stale values replaced. Cross-repo: attentional-foraging / pupil-lfhf / approach-retreat are consistent. science-agent notebook-audit across all three repos returns zero substantive hits (remaining warnings are false positives on AttCur-dataset tables and historical audit memos).
Snapshot at docs/drafts/coord_fix_snapshot_20260412/:
- `key_claims_before.json`: extracted K-ID tables as of 2026-04-12 08:51 (just before the re-runs)
- `butterworth-lfhf-by-position.json`, `ripa2-by-position.json`, `cursor-approach-features.json`, `cursor-approach-features-typed.json`, `encoding-vs-retrieval.json`: pre-fix copies
- `notebook-key-claims.md`: pre-fix aggregate
- `post_fix_stdout_{nb}.txt`: dumped cell outputs per notebook
- `cell_output_diff.md`: naive diff (superseded by this entry)
- `git_state.txt`: HEAD sha + working tree at snapshot time
Historical audit memos that describe the 2026-04-09 state (attentional-foraging/CHANGELOG.md entry below, approach-retreat/docs/drafts/2026-04-10-coord-audit-update.md, docs/findings-approach-retreat.md "Refreshed" banner) have been left in place as the historical record of what was known at those dates — the 2026-04-12 entries build on them rather than replacing them.
- Key Claims expanded to 11 notebooks (~145 canonical rows). New: NB05 (LHIPA, K1–K15), NB12 (regression precision null, K1–K14), NB18 (RIPA2 vs LF/HF, K1–K17).
- NB14 piecewise gradient analysis (K9–K15). Resolves K3's borderline p = 0.043:
  - Steep phase (pos 0–3): Mann–Whitney p = 4.1 × 10⁻²², medians 30.0 → 16.0
  - Plateau phase (pos 4–10): Spearman ns (flat, as predicted by framework compilation)
  - Within-trial gradient strengthens with evaluation depth: 79.1% negative at ≥7 positions (K15)
- findings.md v11: corrected 8 stale values (NB13, NB11, NB14), added Key Claims `[NB__:K__]` references throughout.
- NB14:K5 inclusion criterion documented: ≥3 valid LF/HF segments at positions 0–10 (Spearman with N=2 is degenerate).
- pupil-lfhf validation pipeline: self-contained AdSERP analysis (`adserp_loader.py`, `validate_adserp.py`) with coordinate-audited click_pos. All values match Key Claims exactly.
The bug. scripts/compute_butterworth_lfhf.py:147 and scripts/compute_ripa2.py:193 derived each trial's click_pos by calling assign_fixation_to_position(last_click[2], click_scroll, …). That function is designed for gaze — it adds scroll_y to convert screen-space FPOGY into page-space. But clicks[-1][2] comes from evtrack ypos, which is already page-space (verified empirically: p004-b2-t3 has cursor Y up to 1,902 px while the browser window is only 1,137 px tall). Adding scroll double-counted it, pushing clicks on scrolled trials to deeper bands than the user actually clicked.
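A toy reproduction of the double-count, assuming a simplified band lookup. `band_index` and the evenly spaced `tops` are hypothetical stand-ins for the real position assignment, which also handles band bottoms and `n_results`.

```python
from bisect import bisect_right
from typing import Sequence


def band_index(page_y: float, tops: Sequence[float]) -> int:
    """Index of the last result whose top edge is at or above page_y (-1 if none)."""
    return bisect_right(tops, page_y) - 1


# Ten results, 150 px apart (invented layout).
tops = [150, 300, 450, 600, 750, 900, 1050, 1200, 1350, 1500]

click_page_y = 1000  # evtrack ypos is already page-space
scroll_y = 600       # scroll offset at click time

correct_pos = band_index(click_page_y, tops)           # use the page-space Y as-is
buggy_pos = band_index(click_page_y + scroll_y, tops)  # scroll added a second time
unscrolled_pos = band_index(click_page_y + 0, tops)    # with no scroll both formulas agree
```

Here the correct formula places the click at position 5, while the buggy one pushes it to position 9, four bands deeper than the user actually clicked. With `scroll_y = 0` the two formulas agree, which is why no-scroll trials could not reveal the bug.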
The same pattern was cargo-culted into eleven other notebooks (NB01, NB03, NB05, NB06, NB07b, NB10, NB12, NB15, NB18-learning_curve, NB23, NB24) and one additional script (forward_regressive_tolerance_sweep.py). The root cause is that half the notebooks reimplement their own mini-loader in cell 2 instead of importing data_loader.py, each with its own implicit coordinate-space assumption.
Impact, corpus-wide (see notebooks-v2/test_coordinate_invariants.py Invariant 9):
| | Correct formula | Buggy formula |
|---|---|---|
| Clicks landing in their reported band | 2,764 / 2,764 | 1,174 / 2,764 (57.5 % mis-placed) |
| Mis-placed clicks on scrolled trials | 0 | 1,590 / 2,266 |
| No-scroll trials (sanity bar) | — | 0 disagreements |
The buggy formula also produced physically impossible click_pos values (up to 15, for 10-result SERPs) in 239 trials of the old butterworth-lfhf-by-position.json.
NB14 Key Claims — before / after the fix:
| Claim | Before | After | Notes |
|---|---|---|---|
| K1 (trials) | 2,719 | 2,719 | — |
| K2 (position segments) | 6,874 | 6,874 | — |
| K3 (position × median LF/HF) | ρ = −0.618, p = 0.0426 | ρ = −0.618, p = 0.0426 | Exact — uses fixation position, not click_pos |
| K4 (positions 1–10) | ρ = −0.491, p = 0.150 | ρ = −0.491, p = 0.150 | — |
| K5 (within-trial) | N = 1,167, median ρ = −0.200 | N = 1,167, median ρ = −0.200 | — |
| K6 (clicked vs non-clicked LF/HF) | 22.86 (N = 1,145) vs 18.97 (N = 5,437); p ≈ 0 | 22.24 (N = 1,110) vs 19.01 (N = 5,472); U = 3,257,823, p = 1.30 × 10⁻⁴ | Direction and significance preserved |
| K7 (LF/HF × LHIPA) | ρ = −0.122, p = 9.29 × 10⁻¹⁰, N = 2,492 | unchanged | — |
| K8 (position medians) | pos 0: 29.98 → pos 1: 21.20 → … | unchanged (uses fixation position, not click_pos) | — |
The pupil paper central claim (K3) is unaffected. The position-level correlation, within-trial decomposition, and LHIPA cross-index validation all use fixation position (gaze → page-space, which is the coordinate-correct direction). Only click_pos-dependent rows moved.
The fix.
- `notebooks-v2/data_loader.py`: documented coordinate-space conventions in the module docstring, tightened `assign_fixation_to_position` to name its parameter `screen_fix_y` and warn that cursor/click Ys must not be passed. Added canonical helpers: `get_click_page_xy`, `click_to_position`, `cursor_to_position`, `screen_y_to_page_y`, `page_y_to_screen_y`, `gaze_cursor_distance`, `interpolate_cursor_at`.
- `notebooks-v2/test_coordinate_invariants.py`: nine-section regression test locking in the conventions. Corpus-wide Invariant 9 produces the 1,590-trial headline number above.
- `scripts/compute_butterworth_lfhf.py`: replaced the buggy `assign_fixation_to_position` call with `click_to_position(clicks, tops, n_results)`. Regenerated `butterworth-lfhf-by-position.json`.
- `notebooks-v2/update_key_claims.py`: NB14 K6 row updated; aggregate `docs/notebook-key-claims.md` refreshed.
NB15 cursor-approach fix — the feature-generating hero notebook. Two bug sites: (1) compute_approach_features double-counted scroll on mouse_page_y, corrupting min_dist, mean_dist, final_dist, dwell_in_proximity_ms, and was_clicked; (2) click_y_page = clicks[0][2] + click_scroll corrupted click-position assignment. Fix: import click_to_position and gaze_cursor_distance from data_loader, replace both sites. Regenerated cursor-approach-features.json via jupyter nbconvert --execute; regenerated cursor-approach-features-typed.json via scripts/add_etype_to_features.py. Pre-fix JSONs preserved with .prefix-bug.json suffix.
Feature-level diff (NB15):
| Metric | Before | After | Δ |
|---|---|---|---|
| Clicked records | 1,981 | 2,214 | +233 (+11.8 %) — clicks correctly re-attributed to their real positions |
| Click rate | 12.87 % | 14.38 % | +1.5 pp |
| Median gaze-cursor distance | 256.5 px | 354.7 px | +98 px |
| "Almost clicked" (<58 px, non-clicked) | 7.98 % | 5.57 % | −30 % |
| Position 3 close-distance rate | 11.49 % | 3.23 % | −72 % |
| Position 5 close-distance rate | 7.36 % | 0.28 % | −96 % |
| Position 9 close-distance rate | 0.45 % | 0.00 % | −100 % |
NB15 §2b's orient-phase observation is preserved at position 0 (27.8 % → 29.0 %, essentially flat — consistent with cursor parked near first result during orient). The deep-position approach signal at positions 3–9 was almost entirely scroll-bug artifact.
NB21 Key Claims — before / after:
| Claim | Before | After | Notes |
|---|---|---|---|
| K1 click rate | 12.9 % (1,981) | 14.4 % (2,214) | 233 re-attributed clicks |
| K3 M3 LOSO AUC | 0.827 ± 0.047 | 0.792 ± 0.062 | −0.035; direction preserved |
| K4 M4 (approach only) AUC | 0.821 ± 0.048 | 0.792 ± 0.061 | M3 = M4 to three sig figs — position+dwell add no information beyond approach features |
| K5 M2 (pos+dwell) AUC | 0.746 ± 0.069 | 0.707 ± 0.081 | −0.039 |
| K6 M1 (pos only) AUC | 0.592 ± 0.083 | 0.670 ± 0.085 | +0.078 — position now a stronger predictor with clicks correctly attributed |
| K12 Brier score | 0.1615 | 0.1781 | calibration slightly worse (consistent with dropped AUC) |
| K15 Evaluated-rejected (4-class) | 994 (6.5 %) | 344 (2.2 %) | largest shift — pre-fix "rejected" was mostly scroll noise at deep positions |
| K21 position coefficient | +0.21 (→ click) | −0.380 (→ skip) | SIGN FLIP: rank effect now in the correct direction |
| K27 direction_changes | +0.20 (→ click) | ≈0 (neutral) | feature was largely scroll artifact |
The −0.035 AUC drop is a real loss of predictive power — the pre-fix 0.827 was partly driven by scroll-leak features. K27 (direction_changes, pre-fix +0.20 → click) collapses to ≈0 post-fix, and the deep-position approach artifacts that populated the 4-class "Evaluated-rejected" set (994 → 344) were not informative in the first place.
Model-level results are preserved: M3 > M2 > M1 (0.792 > 0.707 > 0.670), M3 = M4 to three sig figs (approach features carry the full signal), and all 47 participants remain above chance (min 0.589). Feature-level coefficient signs are NOT all preserved: K21 (position) flipped +0.21 → −0.380 — the post-fix sign is the one the SERP rank-effect literature predicts. The pre-fix "11×" lift claim and the "14 % almost clicked" figure in docs/findings.md §10 are overstatements; the corrected taxonomy lives in NB21:K13–K16.
NB11.5 (chattiness) — replication updated:
| Claim | Before | After | Notes |
|---|---|---|---|
| K9 Low events/s tercile AUC | 0.826 ± 0.061 (n = 15) | 0.803 ± 0.052 (n = 15) | median events/s: 9.4 → 9.5 |
| K10 Mid tercile AUC | 0.817 ± 0.041 (n = 16) | 0.780 ± 0.065 (n = 16) | median events/s: 14.7 → 14.7 |
| K11 High tercile AUC | 0.838 ± 0.034 (n = 16) | 0.793 ± 0.064 (n = 16) | median events/s: 32.2 → 28.8 |
| K12 pooled replication of NB21 | 0.827 | 0.792 | tracks NB21:K3 exactly |
| K13–K16 chattiness × AUC Spearmans | +0.04 to +0.14, all ns | −0.11 to +0.00, all ns | direction shifted toward zero; no row crosses significance |
The "robust across chattiness terciles" framing holds at the significance level (K13–K16 are all still ns with p > 0.4) but the tercile AUCs themselves dropped 0.02–0.05 uniformly with the NB21 re-run. Paper §4.3 robustness claim needs both the new tercile values AND a narrower effect-size range if the prose described it as "flat."
Remaining notebooks and scripts patched:
NB01, 03 (×2 sites), 05, 06, 07b, 10, 12, 18-learning_curve, 24 — batch-patched via notebooks-v2/_apply_coord_fixes.py. None of these have Key Claims blocks yet, so re-execution has not been triggered; they will pick up the fix on next run. scripts/compute_ripa2.py and scripts/forward_regressive_tolerance_sweep.py also patched; their JSON outputs will be refreshed the next time they run.
NB23 (rank_effects) is a separate case: its local click_positions derivation (used for panel 1, click share by position) has been patched in place, but the notebook has not been re-executed. Panels 4–5 (butterworth LF/HF + LHIPA by click position) already consume the fixed butterworth-lfhf-by-position.json, so they reflect the post-fix click_pos from that feeder. NB23 does not yet have a Key Claims block even though it's the rank-effects hero chart cited in README and CHANGELOG v9 — promoting it to Tier A is tracked separately.
Still pending:
- Regenerate `ripa2` output. Done (2026-04-11): `compute_ripa2.py -o AdSERP/data/ripa2-by-position.json` (2,719 trials, ρ = −0.827 positional gradient confirmed). NB18 re-execution deferred; it reads this JSON and will pick up new values on next run.
- Re-execute NB23. Done (2026-04-09): NB23 uses `click_to_position()` from `data_loader` (coordinate-safe); all 9 code cells executed with correct output. K1 = ρ = −0.973 on 2,764 trials.
- Phase 3 structural migration. Done (2026-04-11): All dangerous coordinate patterns eliminated. NB00, NB04, NB19 were the last three with inline `assign_fixation_to_position(click_y, scroll_y, ...)` or `click_page_y = cy + interpolate_scroll(...)`. Replaced with `click_to_position(clicks, tops, n_res)`. Zero dangerous patterns remain across all 30 notebooks (verified via regex scan).
- `docs/findings.md`. Already current (v11, 2026-04-10): §10 and §10b updated with post-fix values (14.4% click rate, N = 344, correct NB22 four-class Ns, [NB##:K##] refs throughout).
- `docs/findings-approach-retreat.md` intentionally frozen with SUPERSEDED banner; it's a journey doc, not canonical.
- `docs/drafts/` grep pass. Done (2026-04-11): `model-analysis.html` given SUPERSEDED banner with before/after table. `model-analysis.md` line 270 fixed (0.821→0.792). `task-model-paper.md` line 179 fixed (994→344). `paper.md` references to 0.821 are all Bruckner ACD (correct, different dataset). Remaining stale values in `.html` left under the SUPERSEDED banner rather than surgically edited.
- Approach-retreat repo. Done (2026-04-11): README fixed (NB24 arc ratios, 17× typo, discrimination cost values). CLAUDE.md added documenting upstream dependency. See approach-retreat commits `63d861a` and `257cd79`.
Reference data: pre-fix JSONs preserved for reproducibility:
- `AdSERP/data/butterworth-lfhf-by-position.prefix-bug.json`
- `AdSERP/data/cursor-approach-features.prefix-bug.json`
- `AdSERP/data/cursor-approach-features-typed.prefix-bug.json`
Regression lock. notebooks-v2/test_coordinate_invariants.py (nine sections, passes in a few seconds) now encodes the gaze-is-screen-space, cursor-is-page-space convention as an executable contract. Any future change to data_loader.py, any Tier B producer script, or any Tier A notebook's data path must keep this test green. The corpus-wide Invariant 9 is the headline: all 2,764 clicks must fall within their reported band under the correct formula, and the buggy formula must still misplace 1,590 scrolled trials (so we know the test hasn't silently lost its reference comparison).
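The two-sided shape of Invariant 9 (the correct formula must pass, and the buggy reference must keep failing) can be sketched as below. This is a hedged illustration, not the contents of test_coordinate_invariants.py: `Trial`, `band_of`, and `audit_click_bands` are hypothetical names, and the three trials are invented.

```python
from typing import Dict, NamedTuple, Optional, Sequence, Tuple


class Trial(NamedTuple):
    click_page_y: float                    # click Y, already page-space
    scroll_y: float                        # scroll offset at click time
    bands: Sequence[Tuple[float, float]]   # (top, bottom) per result
    reported_pos: int                      # position stored in the output JSON


def band_of(page_y: float, bands: Sequence[Tuple[float, float]]) -> Optional[int]:
    for i, (top, bottom) in enumerate(bands):
        if top <= page_y < bottom:
            return i
    return None


def audit_click_bands(trials: Sequence[Trial]) -> Dict[str, int]:
    """Count agreements under the correct formula and misses under the buggy one."""
    report = {"correct_hits": 0, "buggy_misses_scrolled": 0, "buggy_misses_unscrolled": 0}
    for t in trials:
        if band_of(t.click_page_y, t.bands) == t.reported_pos:
            report["correct_hits"] += 1
        # Reference comparison: the old formula added scroll a second time.
        if band_of(t.click_page_y + t.scroll_y, t.bands) != t.reported_pos:
            key = "buggy_misses_scrolled" if t.scroll_y > 0 else "buggy_misses_unscrolled"
            report[key] += 1
    return report


bands = [(0, 200), (200, 400), (400, 600)]
trials = [
    Trial(150, 0, bands, 0),    # no scroll: both formulas agree
    Trial(250, 300, bands, 1),  # scrolled: buggy formula lands in band 2
    Trial(450, 100, bands, 2),  # scrolled, but the shift stays inside band 2
]
report = audit_click_bands(trials)
```

Asserting on all three counters is the design point: `correct_hits` guards the fix, `buggy_misses_unscrolled == 0` is the sanity bar, and a non-zero `buggy_misses_scrolled` shows the reference comparison is still exercising the old failure mode.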
Trial-level LHIPA by click position is flat across positions 0–8 (range: 0.0385–0.0392, delta = 0.0008), then steps down at positions 9–10 (0.0376–0.0380). The previously reported ρ = −0.87 is driven almost entirely by the boundary step, not a gradual decline. Excluding positions 9–10: ρ = −0.78 but delta is within noise.
Correction: Prior claims that "LHIPA decreases monotonically with foraging depth" (README §Behavioral signals, findings.md, lit-review-scroll-regressions.md) overstated the position effect. LHIPA tracks the boundary decision cost (the same phenomenon as the ski-jump click distribution uptick) not a per-position scanning cost. Butterworth LF/HF (NB 14) remains the valid per-position cognitive load measure, and it shows framework compilation (steep drop 0–3, plateau after).
New notebook 23_rank_effects.ipynb consolidates all by-position effects:
- Click share, fixation count, dwell time, Butterworth LF/HF, LHIPA — all on shared x-axis
- Forward-pass vs regression dwell decomposition (stacked bar): regression share peaks at positions 2–3 (~30%), drops to ~10% at position 9
- Normalized dissociation plot: time and cognitive load both decline, but load drops faster (framework compilation)
- Publication-quality hero chart with IQR bands
New files: notebooks-v2/23_rank_effects.ipynb, assets/rank-effects-dissociation.png, assets/temporal-spectrum.png
Updated: README.md (temporal spectrum graphic, rank effects hero chart, LHIPA reframing), notebooks-v2/README.md (NB 23 entry)
Three systemic issues affecting how results were reported throughout the project:
1. Position-aggregate correlations reported as if trial-level. The three headline rhos (LHIPA ρ = −0.903, Butterworth ρ = −0.618, forward dwell ratio ρ = +0.82) are all computed on N = 9–11 position-level aggregates (means or medians), not individual trials. Citing "N = 2,719 trials" alongside a correlation computed on 11 points creates a false impression of statistical power. Trial-level correlations are much weaker (e.g., LHIPA ρ = −0.088). Every position-aggregate statistic now states the actual N of the aggregation.
2. Survivor bias in per-position analyses. Not all trials reach every position (pos 0: 2,742; pos 9: 640). Position means at later positions come from self-selected thorough scanners who scrolled the full page. This inflates apparent dwell at later positions and may bias Butterworth LF/HF medians. Added to methodological-threats.md. This also connects to the F-pattern: Nielsen's aggregate heatmap conflates compiled criteria (real), survey-phase concentration at top (real), and survivor selection (artifact).
3. Mean vs median LHIPA sensitivity. The LHIPA "gradient" by click position appears in means (right-skewed distribution pulls the mean up at early positions) but disappears in medians (flat 0–8). The gradient in the mean is partly a confound: high-LHIPA (low-load) trials tend to be easy trials where the user clicked early. The median is the robust estimator and reveals the boundary-step pattern.
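Issue 1 can be reproduced with a small deterministic example. The sketch below uses Pearson r rather than Spearman ρ to stay dependency-free, and the data are synthetic: 11 positions, six trials per position, a weak slope of −0.1 per position plus symmetric offsets. None of these numbers come from the corpus.

```python
from math import sqrt
from statistics import mean
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


positions = list(range(11))
offsets = (-5, -3, -1, 1, 3, 5)  # symmetric per-trial noise, mean zero

# Trial-level table: one row per (position, trial).
trial_pos, trial_val = [], []
for p in positions:
    for d in offsets:
        trial_pos.append(p)
        trial_val.append(-0.1 * p + d)

# Correlation on the 66 individual trials.
r_trial = pearson_r(trial_pos, trial_val)

# Correlation on the 11 position means (the "headline" style of statistic).
pos_means = [mean(v for q, v in zip(trial_pos, trial_val) if q == p) for p in positions]
r_aggregate = pearson_r(positions, pos_means)
```

Averaging within position cancels the per-trial noise, so `r_aggregate` comes out at essentially −1 on N = 11 points while `r_trial` is only about −0.09 on N = 66. Both describe the same data, which is why the N of the aggregation has to be stated next to the coefficient.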
Corrected notebooks: NB05 (LHIPA: figure title, summary, key measures table), NB06 (orientation/evaluation: "Working Memory Accumulation" → "Evaluation Effort by Position," removed WM ramp narrative, corrected LHIPA claims, "dwell" → "gaze dwell ratio").
Duchowski (2026, PACM CGIT) recommended Butterworth IIR over wavelet LHIPA for short-window cognitive load. Minimum windows: FFT 10s, DWT 7.5s, Butterworth 1s. Implemented per-position LF/HF ratio for all 2,719 trials.
The working memory hypothesis was wrong. LF/HF decreases with position (ρ = −0.618, p = 0.04). Cognitive load peaks at position 0, drops steeply through 0–3, plateaus through 4–10. This contradicts the §3a interpretation that forward-only dwell increase (ρ = +0.82) reflects growing working memory load.
Correction: The prior interpretation in §3a ("cognitive load increases with foraging depth because the candidate set in working memory grows") has been revised. The dissociation between increasing dwell time and decreasing cognitive effort indicates evaluation becomes routinized through framework compilation, not overloaded through working memory accumulation. The Shi et al. (2025) lit note connection claiming per-result LHIPA showed increasing load was also corrected — wavelet LHIPA at ~2s granularity was below Duchowski's stated 7.5s minimum, making that trend unreliable.
New files: scripts/compute_butterworth_lfhf.py, notebooks-v2/14_butterworth_cognitive_load.ipynb, docs/lit-notes/duchowski2026-realtime-pupil-lfhf.md, AdSERP/data/butterworth-lfhf-by-position.json
Updated: data_loader.py (added load_pupil_trial(), remove_blinks()), references.bib (added duchowski2026realtime), findings.md (§3b-iv, §3a correction), README.md (notebook 14, key insight)
build-gh-pages.js PNG screenshot loop crashed without error handling, producing only 2 of 10 thumbnails. Added try/catch.
(See git log for v7 changes — survey phase, ski-jump decomposition, forward/regression split, README rewrite, arxiv stub)
Sentence-level cosine similarity (mxbai-embed-large) between each result's snippet embedding and the centroid of all prior result embeddings. Null within-position — same as bag-of-words. The priming hypothesis is now tested at three granularities (bag-of-words, semantic embeddings, within-position controls) and null at all of them.
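The similarity measure described above, sketched with toy 2-D vectors standing in for mxbai-embed-large embeddings. The function names are hypothetical and the embedding step itself is not shown.

```python
from math import sqrt
from typing import List, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0  # convention chosen for this sketch: zero vectors are dissimilar
    return dot / (nu * nv)


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def prior_centroid_similarity(embeddings: Sequence[Sequence[float]]) -> List[Optional[float]]:
    """For result i, cosine between its embedding and the centroid of results 0..i-1."""
    sims: List[Optional[float]] = [None]  # the first result has no prior context
    for i in range(1, len(embeddings)):
        sims.append(cosine(embeddings[i], centroid(embeddings[:i])))
    return sims


# Toy snippet embeddings in SERP order (invented values).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sims = prior_centroid_similarity(toy_embeddings)
```

In the toy case the second result is orthogonal to the first (similarity 0), and the third points in the same direction as the centroid of the first two (similarity 1).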
New findings section analyzing when non-serial SERP models add value. Acknowledges forced-choice inflation of regression rates, notes that at-scale regression prevalence (click_rank < max_scroll_depth) is unmeasured. Identifies three areas where complexity helps: position bias estimation, stop/regress/paginate decision, re-finding task metrics.
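As a rough illustration of the unmeasured quantity named above, the sketch below computes the share of trials in which click_rank < max_scroll_depth. The dictionary layout and the function name are assumptions for illustration. Only the two field names come from the text.

```python
def regression_click_prevalence(trials):
    """Share of trials whose clicked rank is shallower than the deepest rank reached.

    trials: iterable of dicts with 'click_rank' and 'max_scroll_depth' keys
    (hypothetical layout). Trials missing either value are skipped.
    Returns None when no trial is eligible.
    """
    eligible = [
        t for t in trials
        if t.get("click_rank") is not None and t.get("max_scroll_depth") is not None
    ]
    if not eligible:
        return None
    hits = sum(1 for t in eligible if t["click_rank"] < t["max_scroll_depth"])
    return hits / len(eligible)
```

Both fields need to use the same rank indexing for the comparison to be meaningful.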
Page orientation (time from page load to first fixation on any result) is 194ms median across all groups — consistent with a well-memorized SERP layout. Previously reported as ~1-3s from a regression intercept (a different metric).
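The metric defined above (page load to first fixation on any result) can be sketched as below. The input layout is an assumption: each fixation is a (start_ms, on_result) pair, where on_result says whether the fixation falls inside a result AOI. The function names are hypothetical.

```python
from statistics import median


def page_orientation_ms(page_load_ms, fixations):
    """Latency from page load to the first fixation that lands on any result.

    Returns None if no fixation on a result occurs after page load.
    """
    starts = [start for start, on_result in fixations
              if on_result and start >= page_load_ms]
    return min(starts) - page_load_ms if starts else None


def median_orientation_ms(trials):
    """Median latency over (page_load_ms, fixations) trials.

    Trials without any fixation on a result are skipped.
    """
    latencies = [page_orientation_ms(load, fix) for load, fix in trials]
    latencies = [x for x in latencies if x is not None]
    return median(latencies) if latencies else None
```

This measures a latency directly, unlike a regression intercept, which is why the two figures are not comparable.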
11 new BibTeX entries. Literature review on scroll regressions identifies 5 novelty claims. Key finding from the review: nobody has published the at-scale prevalence of click_rank < max_scroll_depth despite every search engine having this data.
The Gazepoint GP3 HD reports gaze Y coordinates that exceed screen boundaries. 24.5% of fixations have FPOGY > screen_height (1024px); the 95th percentile is 1830px. These out-of-bounds samples were added to scroll offset to compute page-space Y, attributing fixations to SERP positions below the visible viewport.
Impact: Position 9 dwell ratios were inflated by 3-50× per trial (mean 2.9×, 89% of trials >1.0). The aggregate dwell ratio for position 9 was 1.25 and is now corrected to 0.79.
Fix: Clamp FPOGY to [0, screen_height] before computing page_y = fy + scroll_offset in compute_fixation_per_result(). Applied in serp_priming.ipynb (Cells 13, 16) and fixation_coverage.ipynb (Cell 3).
Note for AdSERP users: If you are working with the AdSERP fixation data and mapping gaze coordinates to page-space positions, always clamp or filter FPOGY to screen bounds first. The eye tracker does not constrain gaze reports to the application window.
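A minimal sketch of the clamp described in the fix is below. It assumes FPOGY is already expressed in pixels, as the figures above imply, and uses the 1024px screen height stated there. The helper names are hypothetical and this is not the body of compute_fixation_per_result().

```python
SCREEN_HEIGHT_PX = 1024  # screen height given in the entry above


def clamp_fpogy(fy, screen_height=SCREEN_HEIGHT_PX):
    """Clamp a gaze Y sample (pixels, assumed) to [0, screen_height]."""
    return max(0.0, min(float(fy), float(screen_height)))


def page_y(fy, scroll_offset, screen_height=SCREEN_HEIGHT_PX):
    """Page-space Y: clamped screen Y plus the scroll offset at fixation time."""
    return clamp_fpogy(fy, screen_height) + scroll_offset
```

With this clamp, a sample at the reported 95th percentile (1830px) with a 500px scroll offset maps to page Y 1524 instead of 2330. Filtering out-of-bounds samples, the other option named in the note, would drop that fixation instead.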
Other v5 changes:
- Forward-only shape test ρ strengthened from +0.73 to +0.82 (positions 0-8)
- Dwell table in README and findings updated with corrected values
Tests the viewport mechanics confound hypothesis: does ballistic backward scrolling explain the apparent "priming during regressions" pattern?
Results:
- Backward scroll velocity > forward: median 915 vs 784 px/s, peak 1852 vs 1111 px/s
- Velocity profile is ballistic: ρ = 0.867 between distance-from-target and velocity
- 87.3% of regression targets are positions 0-4 (median: position 2)
- Regression velocity mediates the dwell delta: ρ = -0.762 (p = 0.017) across positions
Positions 6-8 are ballistic transit zones (high velocity, short viewport time, suppressed fixations). The "priming during regressions" pattern is a viewport mechanics artifact.
Corrected language in README.md, TODO.md, findings.md, and adserp-key-claims.md that framed the regression-trial overlap correlation (r = -0.033) as evidence that "priming operates in re-evaluation." The signal is triply confounded:
- Position-overlap covariation — within-position controls null (v3)
- Repetition/recognition — revisiting already-read content produces shorter dwell (v4)
- Ballistic scroll kinematics — high-velocity transit biases viewport time and fixation count (v5)
The prior compute_viewport_time only counted time between scroll events. Pre-scroll periods (page load → first scroll) and post-scroll periods were dropped, which undercounted visible duration and pushed position 0 dwell ratios above 1.0 (up to 73×). Fixed by covering the full trial window. Position 0 dwell ratio corrected from 1.35 to 0.28.
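The idea of covering the full trial window can be sketched as follows. This is an illustration under assumptions, not the actual compute_viewport_time: the page is assumed to sit at scroll offset 0 from trial start until the first scroll event, scroll events are assumed to fall inside the trial window, and the signature is hypothetical.

```python
def viewport_time(band_top, band_bottom, scroll_events, trial_start, trial_end,
                  screen_height=1024):
    """Time during which the page-space band [band_top, band_bottom) is on screen.

    scroll_events: list of (timestamp, scroll_offset) sorted by timestamp.
    The offset is treated as piecewise constant: 0 from trial_start to the
    first event, then each event's offset until the next event, and the last
    offset until trial_end.
    """
    # Segment i runs from times[i] to times[i + 1] at offsets[i].
    times = [trial_start] + [t for t, _ in scroll_events] + [trial_end]
    offsets = [0] + [offset for _, offset in scroll_events]
    total = 0.0
    for i, offset in enumerate(offsets):
        start, end = times[i], times[i + 1]
        # The band is visible if it overlaps [offset, offset + screen_height).
        visible = band_top < offset + screen_height and band_bottom > offset
        if visible and end > start:
            total += end - start
    return total
```

In the toy case of a top band (0 to 150px) with scrolls at t=2 and t=5 in an 8-second trial, the only time the band is visible falls before the first scroll and after the last one, which is exactly the time a between-scrolls-only count would miss.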
When forward-scanning periods are isolated, gaze dwell ratio increases with position (ρ = +0.73), the opposite of the priming prediction. The aggregate priming correlation was entirely driven by regression artifacts.
Forward-only p(fixate) is ~99.8% at every position. Users fixate virtually everything during first-pass scanning, so there is no skip decision for overlap to predict.
"Eval rate" / "attention density" → "gaze dwell ratio" (fixation duration / visible duration).
Testing high-overlap vs low-overlap at the same rank: null across all metrics (TFT, TFC, mean fixation duration, viewport time). The aggregate priming correlation (r = -0.054) was driven by the position-overlap confound.
Aggregate effect concentrated in regression trials (r = -0.033), null in first-pass (r = -0.002). Initially reframed as "priming facilitates re-evaluation" — later shown to be confounded (v3-v5).
- Lexical overlap builds rapidly down the SERP (62% by position 9)
- Aggregate priming correlation: partial r = -0.054 (p = 2.4×10⁻⁹)
- 69% scroll regression prevalence, mean 2.8 per trial
- Mouse-gaze convergence depends on click intent
- Viewport state predicts clicks better than distance (AUC 0.704 vs 0.548)
- Per-participant variance large (acquisition onset SD = 2.5s)