# PEPPRO Test Suite

This directory contains the PEPPRO test suite, organized into two tiers:

- **Unit tests**: fast, with no genome data or external bioinformatics tools required; run automatically on pushes and pull requests via GitHub Actions
- **Integration tests**: full pipeline runs; require a machine (in CI, a self-hosted runner) with genome indices and all tools installed

---

## Directory Structure

```
tests/
├── data/                        # Small test FASTQ files (~3 MB total)
│   ├── test_R1.fastq.gz         # SE reads (12,500 reads)
│   ├── test_R2.fastq.gz         # PE reverse reads (rev-comp of R1)
│   └── test_R1_umi.fastq.gz     # R1 with 8-nt UMI prefix for UMI tests
├── pep_configs/                 # PEP project configs for each scenario
│   ├── se_basic.yaml / .csv
│   ├── pe_basic.yaml / .csv
│   └── ...
├── looper_configs/              # Looper run configs for each scenario
│   ├── .looper_se_basic.yaml
│   └── ...
├── scripts/
│   └── generate_test_data.sh    # Regenerate test FASTQ data from source
├── test_unit.py                 # Unit tests (no tools/genome needed)
├── test_integration.py          # Integration tests (full pipeline runs)
└── README.md                    # This file
```
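
Because each scenario is split across `pep_configs/` and `looper_configs/`, a new or renamed scenario can easily end up with only part of its files. A minimal consistency check is sketched below; `check_scenario_configs` is a hypothetical helper, not part of the repository, and it assumes the naming convention shown in the tree (`<scenario>.yaml` and `<scenario>.csv` in `pep_configs/`, `.looper_<scenario>.yaml` in `looper_configs/`).

```bash
# Hypothetical helper (not shipped with PEPPRO): report scenarios whose PEP
# config lacks a sample table or a looper config. ASSUMES the naming
# convention <scenario>.yaml / <scenario>.csv / .looper_<scenario>.yaml.
check_scenario_configs() {
    local tests_dir="${1:-tests}" missing=0 pep name
    for pep in "$tests_dir"/pep_configs/*.yaml; do
        [[ -e "$pep" ]] || continue   # glob matched nothing
        name="$(basename "$pep" .yaml)"
        if [[ ! -f "$tests_dir/pep_configs/$name.csv" ]]; then
            echo "$name: missing pep_configs/$name.csv"
            missing=$((missing + 1))
        fi
        if [[ ! -f "$tests_dir/looper_configs/.looper_$name.yaml" ]]; then
            echo "$name: missing looper_configs/.looper_$name.yaml"
            missing=$((missing + 1))
        fi
    done
    # Exit status is 0 only when every scenario is complete
    [[ "$missing" -eq 0 ]]
}
```

Run it from the repository root as `check_scenario_configs tests`; it prints one line per missing file and returns non-zero if anything is missing.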

---

## Unit Tests

Unit tests cover:

- **Constants**: `RUNON_SOURCE`, `ADAPTER_REMOVERS`, `TRIMMERS`, and `DEDUPLICATORS` values and defaults
- **PEP loading**: each test config loads correctly with the expected sample attributes
- **Schema validation**: eido validation passes for valid configs; regression tests confirm that invalid inputs (e.g., an integer `umi_len` set in a YAML `imply` block, or invalid `protocol`/`adapter`/`trimmer`/`dedup` enum values) are rejected
- **Argument parsing**: all CLI flags parse correctly, defaults are correct, and invalid choices raise `SystemExit`
- **Recovery paths**: the expected output file naming conventions that recovery relies on are documented and verified

### Running unit tests

```bash
# Via pytest directly
pytest tests/test_unit.py -v

# Via Makefile
make test-unit
```

No environment variables or external tools are needed.

---

## Integration Tests

Integration tests run the full PEPPRO pipeline for each scenario and verify:

1. The pipeline exits with status `0`
2. Key output files exist (BAM, bigWig, stats.yaml)
3. `stats.yaml` contains the expected result keys
4. The `TestRecovery` class additionally tests checkpoint skipping and the `unmap_R1.fq` recovery regression
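
Checks 2 and 3 can also be reproduced by hand against a kept output directory, which helps narrow down a failing scenario. The sketch below is hypothetical: `check_run_outputs` is not part of the test suite, it assumes bigWig files use the `.bw` extension and that `stats.yaml` stores each result as `key: value` (possibly indented under pipeline or sample headings), and the key names are whatever you pass in. The keys the tests actually expect are defined in `tests/test_integration.py`.

```bash
# Hypothetical helper: check a pipeline output directory for a BAM, a bigWig
# (ASSUMED *.bw) and a stats.yaml containing each key given as an argument.
check_run_outputs() {
    local outdir="$1"; shift
    local status=0 pattern key stats
    # At least one file per pattern anywhere under the output directory
    for pattern in '*.bam' '*.bw'; do
        if [[ -z "$(find "$outdir" -type f -name "$pattern" -print -quit)" ]]; then
            echo "no $pattern file under $outdir"
            status=1
        fi
    done
    stats="$(find "$outdir" -type f -name 'stats.yaml' -print -quit)"
    if [[ -z "$stats" ]]; then
        echo "no stats.yaml under $outdir"
        return 1
    fi
    # Keys are matched as regexes at the start of a (possibly indented) line
    for key in "$@"; do
        if ! grep -Eq "^[[:space:]]*${key}:" "$stats"; then
            echo "stats.yaml lacks key: $key"
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_run_outputs <output_dir> <key1> <key2>` prints one line per missing item and returns non-zero if any check fails.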

### Prerequisites

The integration tests require a machine with all PEPPRO dependencies installed and genome assets configured via refgenie:

| Tool | Version tested |
|------|----------------|
| bowtie2 | ≥2.4 |
| samtools | ≥1.13 |
| bedtools | ≥2.30 |
| cutadapt | ≥4.0 |
| fastp | ≥0.23 |
| seqtk | ≥1.3 |
| fastx_toolkit | any |
| seqkit | ≥2.0 |
| fqdedup | any |
| fastq_pair | any |
| wigToBigWig | UCSC |
| bedGraphToBigWig | UCSC |
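
A quick way to confirm the tools above are on `PATH` before starting a long run is sketched below. It only checks that each executable exists, not its version. `check_tools` is a hypothetical helper, and the executable names are assumptions; for example, `fastx_trimmer` stands in for fastx_toolkit because that is the tool the `se_fastx` scenario exercises.

```bash
# Executable names ASSUMED for each row of the prerequisites table.
PEPPRO_TEST_TOOLS=(bowtie2 samtools bedtools cutadapt fastp seqtk fastx_trimmer
                   seqkit fqdedup fastq_pair wigToBigWig bedGraphToBigWig)

# Hypothetical helper: report tools that are not on PATH (versions unchecked).
check_tools() {
    local -a tools=("$@") missing=()
    local tool
    # With no arguments, fall back to the full list above
    (( ${#tools[@]} )) || tools=("${PEPPRO_TEST_TOOLS[@]}")
    for tool in "${tools[@]}"; do
        command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
    done
    if (( ${#missing[@]} )); then
        echo "missing: ${missing[*]}"
        return 1
    fi
    echo "all ${#tools[@]} tools found on PATH"
}
```

Call `check_tools` with no arguments to check the whole table, or pass specific names (e.g. `check_tools fastp fqdedup`) before running a single scenario.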

**Genome assets** (managed by refgenie; `$REFGENIE` must point to the refgenie genome configuration file):

- `hg38/bowtie2_index`
- `human_rDNA/bowtie2_index`
- `hg38/fasta` (for chromosome sizes)
- `hg38/blacklist` (optional, for coverage tests)
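
These assets can be checked ahead of time with `refgenie seek`, which prints the local path of an asset. The sketch below is a hypothetical helper that assumes `refgenie seek` exits non-zero when an asset is not available from the configured genome folder; it treats `hg38/blacklist` as optional, matching the list above.

```bash
# Hypothetical helper: verify the refgenie assets used by the integration tests.
# ASSUMES `refgenie seek` reads $REFGENIE and fails for unavailable assets.
check_genome_assets() {
    if [[ -z "${REFGENIE:-}" || ! -f "$REFGENIE" ]]; then
        echo "REFGENIE is unset or does not point to a genome config file"
        return 1
    fi
    local status=0 asset path
    for asset in hg38/bowtie2_index human_rDNA/bowtie2_index hg38/fasta; do
        if path="$(refgenie seek "$asset" 2>/dev/null)"; then
            echo "found: $asset -> $path"
        else
            echo "missing: $asset"
            status=1
        fi
    done
    # Optional asset: report it, but do not fail on it
    if ! refgenie seek hg38/blacklist >/dev/null 2>&1; then
        echo "optional hg38/blacklist not found (needed only for coverage tests)"
    fi
    return "$status"
}
```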

### Running integration tests

Tests run with `-p local` (divvy local compute package) so the pipeline
executes inline on the current node rather than being submitted to a job
scheduler. Run integration tests from a compute node or interactive session
if your cluster policy prohibits CPU-intensive work on login nodes.

```bash
# Enable integration tests
export RUN_INTEGRATION_TESTS=true

# Run a specific scenario
pytest tests/test_integration.py -v -k se_basic

# Via Makefile targets
make test-se           # All SE scenarios
make test-pe           # All PE scenarios
make test-recovery     # Recovery regression tests
make test-integration  # All integration tests
make test-all          # Unit + integration

# Run a single named scenario
make test-scenario SCENARIO=se_fastp

# Keep output directories for debugging (default: cleaned up after each class)
KEEP_TEST_OUTPUTS=true RUN_INTEGRATION_TESTS=true pytest tests/test_integration.py -v -k se_basic
```

---

## Test Scenarios

| Scenario | Read type | Protocol | Adapter | Trimmer | Dedup | Notes |
|----------|-----------|----------|---------|---------|-------|-------|
| `se_basic` | SE | PRO-seq | cutadapt | seqtk | — | Baseline SE run |
| `pe_basic` | PE | PRO-seq | cutadapt | seqtk | — | Baseline PE run |
| `se_groseq` | SE | GRO-seq | cutadapt | seqtk | — | GRO-seq protocol |
| `se_umi` | SE | PRO-seq | cutadapt | seqtk | seqkit | 8-nt UMI deduplication |
| `pe_umi` | PE | PRO-seq | cutadapt | seqtk | seqkit | PE with UMI dedup |
| `se_fastp` | SE | PRO-seq | fastp | seqtk | — | fastp adapter trimming |
| `se_fastx` | SE | PRO-seq | cutadapt | fastx | — | fastx_trimmer |
| `se_fqdedup` | SE | PRO-seq | cutadapt | seqtk | fqdedup | fqdedup UMI dedup |
| `se_scale` | SE | PRO-seq | cutadapt | seqtk | — | `--scale` flag |
| `se_no_complexity` | SE | PRO-seq | cutadapt | seqtk | — | `--no-complexity` flag |
| `se_nofifo` | SE | PRO-seq | cutadapt | seqtk | — | `--no-fifo` flag |
| `se_coverage` | SE | PRO-seq | cutadapt | seqtk | — | `--coverage` flag |
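
To run every scenario in the table one at a time and get a summary of failures at the end, a loop over `make test-scenario` is enough. This is a sketch, not an existing Makefile target: the scenario list is copied from the table and has to be kept in sync by hand, and `RUN_INTEGRATION_TESTS=true` is set explicitly in case the Makefile target does not set it itself.

```bash
# Scenario names copied from the table above (keep in sync manually).
SCENARIOS=(se_basic pe_basic se_groseq se_umi pe_umi se_fastp se_fastx
           se_fqdedup se_scale se_no_complexity se_nofifo se_coverage)

# Hypothetical helper: run each scenario separately so one failure does not
# stop the rest, then report which ones failed.
run_all_scenarios() {
    local -a scenarios=("$@") failed=()
    local s
    (( ${#scenarios[@]} )) || scenarios=("${SCENARIOS[@]}")
    for s in "${scenarios[@]}"; do
        echo "=== $s"
        RUN_INTEGRATION_TESTS=true make test-scenario SCENARIO="$s" || failed+=("$s")
    done
    if (( ${#failed[@]} )); then
        echo "failed scenarios: ${failed[*]}"
        return 1
    fi
    echo "all ${#scenarios[@]} scenarios passed"
}
```

With no arguments it runs all twelve scenarios; `run_all_scenarios se_umi pe_umi` runs only the UMI scenarios.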

---

## Test Data

The files in `tests/data/` are derived from `examples/data/test_r1.fq.gz`, the pipeline's existing example read file. At about 1 MB each, they are small enough to commit to the repository.

To regenerate the test data files (requires `seqtk`):

```bash
make test-data
# or
bash tests/scripts/generate_test_data.sh
```
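
After regenerating, a few cheap invariants can be checked without `seqtk`. The sketch below assumes the relationships described in the directory listing hold record by record: all three files contain the same number of reads, `test_R2.fastq.gz` is the reverse complement of `test_R1.fastq.gz`, and `test_R1_umi.fastq.gz` is R1 with an 8-nt prefix. It only compares the first record of each file, and `check_test_data` is a hypothetical helper.

```bash
count_reads() {   # reads in a gzipped FASTQ = lines / 4
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

first_seq() {     # sequence line of the first record
    gzip -dc "$1" | awk 'NR == 2 { print; exit }'
}

revcomp() {       # complement each base, then reverse the string
    printf '%s\n' "$1" | tr 'ACGTNacgtn' 'TGCANtgcan' |
        awk '{ r = ""; for (i = length($0); i > 0; i--) r = r substr($0, i, 1); print r }'
}

# Hypothetical helper: sanity-check the three test FASTQ files.
check_test_data() {
    local dir="${1:-tests/data}" umi_len="${2:-8}" status=0
    local r1="$dir/test_R1.fastq.gz" r2="$dir/test_R2.fastq.gz" umi="$dir/test_R1_umi.fastq.gz"
    local n1 n2 nu s1 s2 su
    n1="$(count_reads "$r1")"; n2="$(count_reads "$r2")"; nu="$(count_reads "$umi")"
    echo "reads: R1=$n1 R2=$n2 R1_umi=$nu"
    [[ "$n1" == "$n2" && "$n1" == "$nu" ]] || { echo "read counts differ"; status=1; }
    s1="$(first_seq "$r1")"; s2="$(first_seq "$r2")"; su="$(first_seq "$umi")"
    [[ "$s2" == "$(revcomp "$s1")" ]] || { echo "R2 is not the reverse complement of R1"; status=1; }
    # Dropping the first umi_len bases of the UMI read should give the R1 read
    [[ "${su:$umi_len}" == "$s1" ]] || { echo "R1_umi is not R1 with a ${umi_len}-nt prefix"; status=1; }
    return "$status"
}
```

Running `check_test_data tests/data` prints the read count of each file (R1 should report 12,500) and returns non-zero if any invariant fails.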

---

## GitHub Actions

Unit tests run automatically on every push and pull request targeting `master` or `dev`, across Python 3.9, 3.11, and 3.12.

Integration tests are triggered manually via **workflow_dispatch** on a self-hosted runner:

1. Go to **Actions** → **Tests** → **Run workflow**
2. Set "Run integration tests" to `true`
3. Click **Run workflow**

See `.github/workflows/tests.yml` for the full configuration.
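
The same manual trigger can be issued from a terminal with the GitHub CLI (`gh workflow run`). The input name below is an assumption: `run_integration_tests` is a placeholder for whatever `workflow_dispatch` input backs the "Run integration tests" option, so check `.github/workflows/tests.yml` for the real identifier before using it.

```bash
# Hypothetical wrapper around the GitHub CLI. ASSUMPTION: the workflow_dispatch
# input is named run_integration_tests; confirm the name in tests.yml.
trigger_integration_tests() {
    local ref="${1:-}"   # branch to run on; empty means the repository default
    if [[ -n "$ref" ]]; then
        gh workflow run tests.yml --ref "$ref" -f run_integration_tests=true
    else
        gh workflow run tests.yml -f run_integration_tests=true
    fi
}
```

For example, `trigger_integration_tests dev` queues a run on `dev`; `gh run list --workflow tests.yml` then shows its status.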