Metadata-level wiring tests against OpenNeuro BIDS index #318

@nx10

Motivation

The Stage 6 longitudinal PR (#317) exposed three wiring bugs. Each one passed unit tests but failed in the full pipeline, at a cost of a 15-30 min CI round trip per bug:

  1. Anat groupby filtering out mask rows from the resolve DataFrame
  2. Metrics orchestration querying a reg column that doesn't exist in the BIDS table (extra entities live in extra_entities)
  3. Session filter propagating to template discovery, hiding the multi-session view needed for longitudinal templates

These are all plumbing bugs between layers (CLI -> orchestration -> BIDS resolve -> DataFrame queries), not logic bugs. Unit tests mock too much to catch them; full_pipeline tests take 20+ minutes and only exercise one dataset.
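
To make bug 2 concrete, here is a minimal sketch of the failure mode using a plain dict as a stand-in for one row of the BIDS table. The row layout, the `get_entity` helper, and the `"longitudinal"` value are illustrative assumptions, not the project's actual schema or API; the only detail taken from above is that non-standard entities such as `reg` live under `extra_entities` rather than in their own column.

```python
# Hypothetical stand-in for one row of the BIDS table: standard entities are
# top-level keys, anything else is tucked into "extra_entities".
row = {
    "sub": "01",
    "ses": "1",
    "suffix": "T1w",
    "extra_entities": {"reg": "longitudinal"},  # illustrative value
}


def get_entity(row, name):
    """Look up an entity, falling back to extra_entities (hypothetical helper)."""
    if name != "extra_entities" and name in row:
        return row[name]
    return (row.get("extra_entities") or {}).get(name)


# Querying the column directly is the bug: there is no top-level "reg" key.
direct_lookup_works = "reg" in row

# The fallback lookup finds it.
reg_value = get_entity(row, "reg")
```

A unit test that mocks the table with a hand-built `reg` column would pass, which is why this class of bug only shows up against real metadata.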

Proposal

Build a lightweight "wiring" test tier that runs orchestration logic against real BIDS metadata from OpenNeuro without downloading any imaging data or running containers.

How it works

  1. Pre-compute a bids2table index for OpenNeuro datasets (just file paths + sidecar metadata, no volumes). Cache as Parquet -- probably a few GB total for the full catalog.
  2. For each dataset, run through the orchestration wiring with stubbed workflows:
    • load_table / Filters.apply() / iter_sessions_with_template
    • resolve_* functions (do the Bids.expect() / Bids.find() queries succeed?)
    • export_* naming (dry-run Bids.save that validates the output path without copying)
    • discover_template_inputs / discover_derivative_runs groupby logic
    • Workflow functions return fake NamedTuples with placeholder paths
  3. Collect failures as a compatibility matrix: which datasets break which resolve/export paths, and why.
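
The steps above could be sketched roughly as follows. This is a stdlib-only illustration under stated assumptions: the in-memory `index` stands in for the cached Parquet index, the dataset IDs `ds-a` / `ds-b` are made up, and `stub_workflow`, `check_has_t1w`, `check_export_name`, and `run_wiring_checks` are hypothetical names, not existing functions in the codebase.

```python
from typing import NamedTuple


class FakeOutputs(NamedTuple):
    """Placeholder result a stubbed workflow returns instead of real files."""
    out_file: str


def stub_workflow(t1w_path):
    # No container, no NIfTI I/O: derive a placeholder output path only.
    return FakeOutputs(out_file=t1w_path.replace("_T1w", "_desc-stub_T1w"))


def check_has_t1w(rows):
    """Stand-in for a resolve step: fail if the query finds nothing."""
    if not any(r["suffix"] == "T1w" for r in rows):
        raise LookupError("no T1w rows found")


def check_export_name(rows):
    """Stand-in for a dry-run export: validate the output name, copy nothing."""
    for r in rows:
        if r["suffix"] == "T1w":
            out = stub_workflow(r["path"]).out_file
            if not out.endswith(".nii.gz"):
                raise ValueError(f"unexpected output name: {out}")


def run_wiring_checks(index, checks):
    """Run every check on every dataset and collect a compatibility matrix."""
    matrix = {}
    for dataset, rows in index.items():
        matrix[dataset] = {}
        for name, check in checks.items():
            try:
                check(rows)
                matrix[dataset][name] = "ok"
            except Exception as exc:  # record the failure, keep going
                matrix[dataset][name] = f"{type(exc).__name__}: {exc}"
    return matrix


# Made-up metadata-only index: file paths and entities, no volumes.
index = {
    "ds-a": [{"sub": "01", "ses": "1", "suffix": "T1w",
              "path": "sub-01/ses-1/anat/sub-01_ses-1_T1w.nii.gz"}],
    "ds-b": [{"sub": "01", "ses": None, "suffix": "dwi",
              "path": "sub-01/dwi/sub-01_dwi.nii.gz"}],
}
checks = {"resolve_t1w": check_has_t1w, "export_name": check_export_name}
matrix = run_wiring_checks(index, checks)
```

Catching exceptions per check, rather than stopping at the first failure, is what turns the run into a matrix: one dataset breaking one resolve path does not hide results for the rest.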

What it catches

  • Entity combinations nobody anticipated (multi-echo + multi-run + session gaps)
  • Single-session subjects mixed into multi-session datasets
  • Missing sidecars, unusual suffix/desc combos, non-standard naming
  • groupby logic failing on unexpected null patterns
  • Bids.expect() queries that work on ds000001/ds000114 but fail on the long tail
  • Filter propagation bugs (session/task filters reaching stages that need the full view)
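
As an illustration of the last item, the sketch below shows how a session filter applied too early hides the multi-session view. The rows and the helpers `apply_session_filter` and `discover_longitudinal_subjects` are hypothetical and only mimic the shape of the problem; they are not the project's `Filters.apply()` or `discover_template_inputs`.

```python
from collections import defaultdict

# Made-up metadata rows: subject 01 has two sessions, subject 02 has one.
rows = [
    {"sub": "01", "ses": "1", "suffix": "T1w"},
    {"sub": "01", "ses": "2", "suffix": "T1w"},
    {"sub": "02", "ses": "1", "suffix": "T1w"},
]


def apply_session_filter(rows, ses):
    """Keep only rows from the requested session."""
    return [r for r in rows if r["ses"] == ses]


def discover_longitudinal_subjects(rows):
    """Return subjects that have more than one session in the given rows."""
    sessions = defaultdict(set)
    for r in rows:
        sessions[r["sub"]].add(r["ses"])
    return sorted(sub for sub, seen in sessions.items() if len(seen) > 1)


# Buggy wiring: the session filter reaches template discovery, so every
# subject looks single-session and no longitudinal template is found.
filtered = apply_session_filter(rows, "1")
buggy = discover_longitudinal_subjects(filtered)

# Intended wiring: discovery sees the full table; the filter applies later.
correct = discover_longitudinal_subjects(rows)
```

Both functions are individually correct, so unit tests on either pass. Only the composition is wrong, which is the kind of failure a wiring tier is meant to surface.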

Scope

  • Runs in seconds per dataset (no containers, no NIfTI I/O)
  • Could cover thousands of datasets in a single CI job
  • Cross-sectional and longitudinal wiring paths
  • Separate from the existing test pyramid (unit/integration/full_pipeline)

Open questions

  • Where to host the pre-computed index (artifact in the repo? separate cache?)
  • How to handle datasets that are intentionally unsupported (e.g., DWI-only, PET)
  • Whether to gate PRs on this or run it as a nightly/weekly audit
