Motivation
The Stage 6 longitudinal PR (#317) exposed three wiring bugs that all passed unit tests but failed in the full pipeline (15-30 min CI round trip each):
- Anat groupby filtering out mask rows from the resolve DataFrame
- Metrics orchestration querying a reg column that doesn't exist in the BIDS table (extra entities live in extra_entities)
- Session filter propagating to template discovery, hiding the multi-session view needed for longitudinal templates
These are all plumbing bugs between layers (CLI -> orchestration -> BIDS resolve -> DataFrame queries), not logic bugs. Unit tests mock too much to catch them; full_pipeline tests take 20+ minutes and only exercise one dataset.
Proposal
Build a lightweight "wiring" test tier that runs orchestration logic against real BIDS metadata from OpenNeuro without downloading any imaging data or running containers.
How it works
- Pre-compute a bids2table index for OpenNeuro datasets (just file paths + sidecar metadata, no volumes). Cache as Parquet -- probably a few GB total for the full catalog.
- For each dataset, run through the orchestration wiring with stubbed workflows:
  - load_table / Filters.apply() / iter_sessions_with_template
  - resolve_* functions (do the Bids.expect() / Bids.find() queries succeed?)
  - export_* naming (dry-run Bids.save that validates the output path without copying)
  - discover_template_inputs / discover_derivative_runs groupby logic
- Workflow functions return fake NamedTuples with placeholder paths
- Collect failures as a compatibility matrix: which datasets break which resolve/export paths, and why.
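The steps above could be sketched roughly as follows. This is a minimal illustration, not the project's code: the toy index rows, the column names (sub, suffix), FakeOutputs, stub_workflow, and both check_* functions are hypothetical stand-ins for the real bids2table index and the real resolve_* / export_* calls.

```python
from typing import Callable, Dict, List, NamedTuple

Row = Dict[str, str]


class FakeOutputs(NamedTuple):
    """Hypothetical stand-in for a workflow result: placeholder paths only."""
    image: str
    mask: str


def stub_workflow() -> FakeOutputs:
    # No container runs and no NIfTI I/O; the paths are never opened.
    return FakeOutputs("placeholder_T1w.nii.gz", "placeholder_mask.nii.gz")


def check_resolve_mask(index: List[Row]) -> None:
    # Hypothetical resolve check: every subject must have exactly one mask row.
    for sub in sorted({row["sub"] for row in index}):
        masks = [r for r in index if r["sub"] == sub and r["suffix"] == "mask"]
        if len(masks) != 1:
            raise LookupError(f"sub-{sub}: expected 1 mask, found {len(masks)}")


def check_export_naming(index: List[Row]) -> None:
    # Hypothetical dry-run export check: validate output names without copying.
    for path in stub_workflow():
        if not path.endswith(".nii.gz"):
            raise ValueError(f"unexpected output name: {path}")


def run_wiring_checks(
    datasets: Dict[str, List[Row]],
    checks: Dict[str, Callable[[List[Row]], None]],
) -> Dict[str, Dict[str, str]]:
    """Run every check on every dataset; record 'ok' or the failure reason."""
    matrix: Dict[str, Dict[str, str]] = {}
    for ds_id, index in datasets.items():
        matrix[ds_id] = {}
        for name, check in checks.items():
            try:
                check(index)
                matrix[ds_id][name] = "ok"
            except Exception as exc:  # collect failures, do not abort the run
                matrix[ds_id][name] = f"{type(exc).__name__}: {exc}"
    return matrix


# Made-up metadata rows; a real run would load the cached index instead.
datasets = {
    "toy-complete": [
        {"sub": "01", "suffix": "T1w"},
        {"sub": "01", "suffix": "mask"},
    ],
    "toy-missing-mask": [
        {"sub": "01", "suffix": "T1w"},
    ],
}
checks = {"resolve_mask": check_resolve_mask, "export_naming": check_export_naming}
matrix = run_wiring_checks(datasets, checks)

for ds_id, results in matrix.items():
    print(ds_id, results)
```

Catching the exception per check, rather than letting the first failure stop the run, is what turns the output into a dataset-by-check matrix instead of a single pass/fail.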
What it catches
- Entity combinations nobody anticipated (multi-echo + multi-run + session gaps)
- Single-session subjects mixed into multi-session datasets
- Missing sidecars, unusual suffix/desc combos, non-standard naming
- groupby logic failing on unexpected null patterns
- Bids.expect() queries that work on ds000001/ds000114 but fail on the long tail
- Filter propagation bugs (session/task filters reaching stages that need the full view)
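As one illustration of the null-pattern item above: pandas groupby drops rows whose group key is null unless dropna=False is passed. The toy table below is made up, and the column names (sub, ses, suffix) are assumptions rather than the actual index schema; it is not a reproduction of the #317 bugs, only a sketch of how rows can silently vanish from a grouped view.

```python
import pandas as pd

# Made-up index: two rows carry a session label, two do not.
index = pd.DataFrame(
    {
        "sub": ["01", "01", "01", "02"],
        "ses": ["pre", "post", None, None],
        "suffix": ["T1w", "T1w", "mask", "T1w"],
    }
)

# Default behaviour (dropna=True): rows with a null "ses" key are excluded.
default_groups = index.groupby("ses").size()

# Keeping null keys as their own group preserves every row.
kept_groups = index.groupby("ses", dropna=False).size()

rows_default = int(default_groups.sum())
rows_kept = int(kept_groups.sum())

print(f"rows seen with default groupby: {rows_default} of {len(index)}")
print(f"rows seen with dropna=False:    {rows_kept} of {len(index)}")
```

A wiring check can compare the grouped row count against the input row count for each dataset and flag any mismatch, which needs only metadata.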
Scope
- Runs in seconds per dataset (no containers, no NIfTI I/O)
- Could cover thousands of datasets in a single CI job
- Cross-sectional and longitudinal wiring paths
- Separate from the existing test pyramid (unit/integration/full_pipeline)
Open questions
- Where to host the pre-computed index (artifact in the repo? separate cache?)
- How to handle datasets that are intentionally unsupported (e.g., DWI-only, PET)
- Whether to gate PRs on this or run it as a nightly/weekly audit