sapporo-wes/pa-cwl

pa-cwl: Pretty Agentic CWL

Production-ready Common Workflow Language workflows designed for AI-agent execution via the GA4GH Workflow Execution Service (WES) API.

What is this?

pa-cwl provides a curated collection of CWL v1.2 workflows for scientific data analysis. Each workflow ships with:

  • agent.yaml — Machine-readable instructions for AI agents: what the workflow does, what inputs it needs, how to run it
  • CWL v1.2 workflows — Portable, standards-compliant workflow definitions
  • WES-ready execution — Tested with sapporo-wes and validated via yevis-cli
  • Workflow Run RO-Crate — Provenance records for every validated execution

Workflows

All 16 pipelines are implemented and tested. Their functional specifications are derived from nf-core pipelines and rewritten as idiomatic CWL v1.2 (not transpiled).

Data Retrieval

| Workflow | Description | Key Tools |
| --- | --- | --- |
| fetchngs | Fetch FASTQ from public repositories (SRA/ENA/DDBJ) | ENA API, fasterq-dump |

Transcriptomics

| Workflow | Description | Key Tools |
| --- | --- | --- |
| rnaseq | RNA-seq quantification (4 pathways) | STAR, HISAT2, Salmon, RSEM, kallisto |
| scrnaseq | Single-cell RNA-seq (10x, Drop-seq, Smart-seq2) | STARsolo, Alevin-Fry, Kallisto/BUStools |
| rnafusion | Gene fusion detection | STAR-Fusion, Arriba, FusionCatcher, FusionInspector |

Epigenomics

| Workflow | Description | Key Tools |
| --- | --- | --- |
| chipseq | ChIP-seq peak calling | BWA-MEM2, MACS2, deepTools |
| atacseq | ATAC-seq chromatin accessibility | BWA-MEM2, MACS2 (--nomodel) |
| methylseq | Bisulfite-seq methylation (RRBS) | Bismark, bwa-meth |
| cutandrun | CUT&RUN/CUT&TAG peak calling | Bowtie2, MACS2, SEACR, spike-in normalization |

Variant Calling

| Workflow | Description | Key Tools |
| --- | --- | --- |
| sarek | Germline + somatic variant calling | BWA-MEM2, GATK4 (HC, Mutect2, BQSR), VEP |
| raredisease | Rare disease variant annotation | sarek + VEP, DeepVariant, Manta, GENMOD |
| viralrecon | Viral variant calling and consensus | BWA-MEM2, iVar, bcftools, Pangolin, Nextclade |

Metagenomics

| Workflow | Description | Key Tools |
| --- | --- | --- |
| ampliseq | 16S/ITS amplicon sequencing (PE+SE) | Cutadapt, DADA2, QIIME2 |
| mag | Metagenome-assembled genomes | SPAdes, MetaBAT2, MaxBin2, DAS Tool, BUSCO, GTDB-Tk |
| taxprofiler | Taxonomic profiling | Kraken2, Bracken, MetaPhlAn, Centrifuge, Krona |

Long-Read & 3D Genomics

| Workflow | Description | Key Tools |
| --- | --- | --- |
| nanoseq | Nanopore long-read sequencing | minimap2, NanoPlot, medaka, Sniffles2, StringTie2 |
| hic | Hi-C chromatin conformation | Bowtie2, pairtools, cooler, HiCExplorer, cooltools |

The tools/ directory holds 124 CWL tools, shared across pipelines. See the pipeline roadmap for detailed feature tables and test matrices.

For AI Agents

Start with AGENTS.md — the top-level guide for AI agents. It provides the workflow catalog, WES API essentials, and provenance protocol.

Each workflow's agent.yaml contains the detailed execution plan, input schema with resolution strategies, and resource requirements.
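
As a rough illustration of the WES calls an agent would make, the sketch below wraps the three standard GA4GH WES operations (service info, run submission, run status) in small shell functions. It is not taken from AGENTS.md. The server address, the workflow URL, the empty parameter object, and the run ID are all placeholders, and WES_URL is assumed to include any path prefix your server uses (the WES specification's default is /ga4gh/wes/v1). Some servers may also require extra form fields, such as an engine name. By default the script only prints the curl commands; set DRY_RUN=0 to send them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumptions (not from this README): the server address and all values below.
WES_URL="${WES_URL:-http://localhost:1122}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the curl commands instead of sending them

# Run a command, or just print it when DRY_RUN=1.
wes_call() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# GET /service-info: which workflow types and engines the server supports.
wes_service_info() {
  wes_call curl -fsS "$WES_URL/service-info"
}

# POST /runs: submit a run as multipart form fields.
# $1 = workflow URL, $2 = workflow parameters as a JSON string.
wes_submit() {
  wes_call curl -fsS -X POST "$WES_URL/runs" \
    -F "workflow_type=CWL" \
    -F "workflow_type_version=v1.2" \
    -F "workflow_url=$1" \
    --form-string "workflow_params=$2"
}

# GET /runs/{run_id}/status: poll the state of a submitted run.
wes_status() {
  wes_call curl -fsS "$WES_URL/runs/$1/status"
}

# Placeholder values for illustration only.
wes_service_info
wes_submit "https://example.org/workflows/rnaseq/main.cwl" '{}'
wes_status "hypothetical-run-id"
```

In a real session the run ID comes from the JSON response to the submission call, and the input values come from the workflow's agent.yaml rather than an empty object.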

For Humans

# Run locally with cwltool
cwltool workflows/rnaseq/main.cwl workflows/rnaseq/examples/star-salmon.yaml

# Run via sapporo-wes
# See docs/running-with-wes.md
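
Before a long local run, it can help to validate the workflow document and send outputs to a dedicated directory. The sketch below uses cwltool's --validate and --outdir options with the rnaseq paths shown above. The output directory name is an arbitrary choice, and the script assumes it is started from the repository root.

```shell
#!/usr/bin/env bash
set -euo pipefail

WORKFLOW="workflows/rnaseq/main.cwl"
JOB="workflows/rnaseq/examples/star-salmon.yaml"
OUTDIR="${OUTDIR:-results/rnaseq}"   # hypothetical output location

run_local() {
  if ! command -v cwltool >/dev/null 2>&1; then
    # Nothing to do without the runner on PATH.
    echo "cwltool not found on PATH"
  elif [ ! -f "$WORKFLOW" ]; then
    # Paths are relative, so the working directory matters.
    echo "$WORKFLOW not found; run this from the repository root"
  else
    # Check the CWL document without executing any step.
    cwltool --validate "$WORKFLOW"
    # Execute the workflow and collect outputs under OUTDIR.
    mkdir -p "$OUTDIR"
    cwltool --outdir "$OUTDIR" "$WORKFLOW" "$JOB"
  fi
}

run_local
```

Validation only checks the document itself, so a workflow that validates can still fail at run time on missing inputs or tools.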

Roadmap

  • Phase 1 — 16 core pipelines (fetchngs through hic) — Complete
  • Phase 2 — v1.1 enhancements — Complete (44 features across 12 pipelines)
  • Phase 2.5 — v2.0 sarek somatic calling (Mutect2) + VEP annotation — Complete
  • Phase 3 — Agent guide (AGENTS.md) — Complete

License

Apache-2.0
