Context: @hvenkat94 asked that ww-sra-cellranger accept a project ID (e.g. SRP/PRJNA) and auto-expand it into the per-sample SRA run accessions.
Instead of auto-expanding, this proposes letting the user supply a plain-text accession list (one run accession per line) as an alternative to the inline sra_id_list array. This is exactly what the "Accession List" button in SRA Run Selector produces, so it drops in with no reformatting.
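Proposed input (mirroring the ww-rnaseq samples_tsv precedent): a new sra_id_file input alongside sra_id_list, with the two resolved into a single sra_ids array.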
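A minimal sketch of how such an input could be declared and resolved in WDL. Only sra_id_file, sra_id_list, and sra_ids are names taken from this issue; the workflow name, the empty-list default, the precedence rule (file wins over inline list), and the placeholder scatter body are assumptions for illustration, not the actual ww-sra-cellranger code.

```wdl
version 1.0

# Hypothetical sketch, not the real ww-sra-cellranger workflow.
workflow sra_id_resolution_sketch {
  input {
    # Inline list; the empty default is an assumption so that either input can be used alone.
    Array[String] sra_id_list = []
    # Optional one-accession-per-line text file, e.g. a Run Selector "Accession List" export.
    File? sra_id_file
  }

  # Assumed precedence: use the file when supplied, otherwise fall back to the inline list.
  Array[String] sra_ids = if defined(sra_id_file)
    then read_lines(select_first([sra_id_file]))
    else sra_id_list

  scatter (sra_id in sra_ids) {
    # Placeholder for the existing per-sample calls.
    String current_id = sra_id
  }

  output {
    Array[String] resolved_ids = sra_ids
  }
}
```

This sketch does not filter blank lines or validate accession prefixes, so a real implementation would likely want a small cleanup step before the scatter.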
Why this approach (and not project-ID auto-expansion)?
A CLI path exists but adds a dependency. esearch -db sra -query <SRP> | efetch -format runinfo (NCBI EDirect) works, but EDirect isn't in the getwilds/sra-tools image, and runinfo often returns partial/empty results with no error code, silently dropping samples.
The metadata doesn't always encode "single cell." Multi-assay studies mix scRNA, multiome RNA+ATAC, bulk RNA, and WXS. In SRP306983 (~70 runs), library_strategy is RNA-Seq for both single-cell and bulk; the only single-cell signal is a submitter-specific library_name convention (_scRNA, _multiome_RNA vs PT#/TS#). No heuristic picks the right subset across studies.
Curation is a human judgment call. Run Selector is where that selection happens, and it exports the result as a one-column text file. Keeping the human in that loop beats any heuristic in ww-sra.
Additional context:
This complements skip_on_chemistry_failure from #357 (Enabling "graceful fail" for CellRanger run_count tasks): since no curation guarantees a clean single-cell-only set, graceful fail remains the safety net for a mis-picked accession (it lands on skipped_sample_list instead of failing the scatter).
Pipeline: Enhancement to the existing ww-sra-cellranger pipeline.
Link to references: ww-rnaseq samples_tsv
Scatter over sra_ids instead of sra_id_list.
ww-sra-cellranger.wdl (add sra_id_file, resolve sra_ids, update scatter + summarize_chemistry_status), parameter_meta, inputs.json, README.md.
testrun.wdl stays zero-config on an inline list.
ww-sra.