Run Selector accession-list file as input to ww-sra-cellranger #362

Description

@tefirman

Pipeline:
Enhancement to the existing ww-sra-cellranger pipeline.

Link to references:

Context:
@hvenkat94 asked that ww-sra-cellranger accept a project ID (e.g. SRP/PRJNA) and auto-expand it into the per-sample SRA run accessions.

Instead of auto-expanding, this proposes letting the user supply a plain-text accession list (one run accession per line) as an alternative to the inline sra_id_list array. This is exactly what the "Accession List" button in SRA Run Selector produces, so it drops in with no reformatting.

Proposed input (mirroring the ww-rnaseq precedent):

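# Either input may be supplied; sra_id_file takes precedence when both are set.
File? sra_id_file
Array[String]? sra_id_list

# Resolve to one list of run accessions.
# If neither input is set, select_first has no non-null value and fails at runtime.
Array[String] sra_ids = if defined(sra_id_file)
  then read_lines(select_first([sra_id_file]))
  else select_first([sra_id_list])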

Then scatter over sra_ids instead of sra_id_list.
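
For illustration, here is a minimal, self-contained sketch of how the resolved array could feed the scatter. The workflow name, the echo_accession task, and the ubuntu:22.04 image are placeholders invented for this example; they stand in for the real download and CellRanger calls and are not taken from ww-sra-cellranger.

```wdl
version 1.0

# Hypothetical workflow name; only the input resolution mirrors the proposal.
workflow sra_id_resolution_sketch {
  input {
    File? sra_id_file
    Array[String]? sra_id_list
  }

  # Prefer the accession-list file when present, else fall back to the inline list.
  Array[String] sra_ids = if defined(sra_id_file)
    then read_lines(select_first([sra_id_file]))
    else select_first([sra_id_list])

  # Scatter over the resolved list rather than over sra_id_list directly.
  scatter (sra_id in sra_ids) {
    call echo_accession { input: sra_id = sra_id }
  }

  output {
    Array[String] resolved_ids = echo_accession.out
  }
}

# Placeholder task standing in for the per-accession download and count steps.
task echo_accession {
  input {
    String sra_id
  }

  command <<<
    echo "~{sra_id}"
  >>>

  output {
    String out = read_string(stdout())
  }

  runtime {
    docker: "ubuntu:22.04"
  }
}
```

Because the file branch wins whenever sra_id_file is defined, an inputs JSON that sets both values would silently ignore the inline list. Whether that precedence or a hard error is preferable is an open design choice for the PR.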

Why this approach (and not project-ID auto-expansion)?

  1. A CLI path exists but adds a dependency. esearch -db sra -query <SRP> | efetch -format runinfo (NCBI EDirect) works, but EDirect isn't in the getwilds/sra-tools image and runinfo often returns partial/empty results with no error code, silently dropping samples.
  2. The metadata doesn't always encode "single cell." Multi-assay studies mix scRNA, multiome RNA+ATAC, bulk RNA, and WXS. In SRP306983 (~70 runs), library_strategy is RNA-Seq for both single-cell and bulk; the only single-cell signal is a submitter-specific library_name convention (_scRNA, _multiome_RNA vs PT#/TS#). No heuristic picks the right subset across studies.
  3. Curation is a human judgment call. Run Selector is where that selection happens, and it exports the result as a one-column text file. Keeping the human in that loop beats any heuristic in ww-sra.

Additional context:

  • This complements skip_on_chemistry_failure from #357 (Enabling "graceful fail" for CellRanger run_count tasks): since no curation guarantees a clean single-cell-only set, graceful fail remains the safety net for a mis-picked accession, which lands on skipped_sample_list instead of failing the scatter.
  • PR scope: ww-sra-cellranger.wdl (add sra_id_file, resolve sra_ids, update scatter + summarize_chemistry_status), parameter_meta, inputs.json, README.md. testrun.wdl stays zero-config on an inline list.
  • Explicit non-goal: we are not adding a project-ID-to-run-list task to ww-sra.

Metadata

Labels: enhancement (New feature or request)
