diff --git a/.agents/skills/container-vulns/SKILL.md b/.agents/skills/container-vulns/SKILL.md new file mode 100644 index 000000000..8664b9807 --- /dev/null +++ b/.agents/skills/container-vulns/SKILL.md @@ -0,0 +1,69 @@ +# Container Vulnerability Management + +Guidance for scanning, triaging, and mitigating container image vulnerabilities +in the viral-ngs Docker image hierarchy. + +## Scanning + +Container images are scanned for vulnerabilities using [Trivy](https://aquasecurity.github.io/trivy/): + +- **On every PR/push**: `docker.yml` scans each image flavor after build (SARIF -> GitHub Security tab, JSON -> artifact) +- **Weekly schedule**: `container-scan.yml` scans the latest published images +- Scans filter to **CRITICAL/HIGH** severity, **ignore-unfixed**, and apply a Rego policy (`.trivy-ignore-policy.rego`) +- Per-CVE exceptions go in `.trivyignore` with mandatory justification comments + +## Rego Policy (`.trivy-ignore-policy.rego`) + +The Rego policy filters CVEs that are architecturally inapplicable to ephemeral batch containers: + +- **AV:P** (Physical access required) -- containers are cloud-hosted +- **AV:A** (Adjacent network required) -- no attacker on same network segment +- **AV:L + UI:R** (Local + user interaction) -- no interactive sessions +- **AV:L + PR:H** (Local + high privileges) -- containers run non-root +- **AV:L + S:U** (Local + scope unchanged) -- attacker already has code execution and impact stays within the ephemeral container + +Changes to this policy should be reviewed carefully. The comments in the file explain the rationale and risk for each rule. + +## Common Vulnerability Sources + +**Python transitive deps**: Pin minimum versions in `docker/requirements/*.txt`. Prefer conda packages over pip. Check conda-forge availability before assuming a version exists -- conda-forge often lags PyPI by days/weeks. 
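The class-level rules in the Rego policy section above can be expressed as a predicate over CVSS v3 vector strings. A minimal Python sketch of the same logic (this mirrors, but is not, the actual `.trivy-ignore-policy.rego`; the function name and parsing approach are illustrative only):

```python
def architecturally_inapplicable(vector: str) -> bool:
    """Sketch of the class-level ignore rules, keyed off a CVSS v3 vector string."""
    # "CVSS:3.1/AV:L/AC:L/PR:H/UI:N/S:U/..." -> {"AV": "L", "AC": "L", "PR": "H", ...}
    fields = dict(
        part.split(":", 1)
        for part in vector.split("/")
        if ":" in part and not part.startswith("CVSS")
    )
    if fields.get("AV") in ("P", "A"):
        # Physical or adjacent-network access: not reachable in cloud-hosted containers
        return True
    if fields.get("AV") == "L":
        # Local access variants: no interactive sessions, non-root, ephemeral scope
        return fields.get("UI") == "R" or fields.get("PR") == "H" or fields.get("S") == "U"
    return False
```

This is only a readability aid for triage discussions; the enforced logic lives in the Rego file.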
+ +**Java fat JARs** (picard, gatk, snpeff, fgbio): Bioinformatics Java tools are distributed as uber JARs with all dependencies bundled inside. Trivy detects vulnerable libraries (log4j, commons-compress, etc.) baked into these JARs. Version bumps can cause ARM64 conda solver conflicts because Java tools pull in openjdk -> harfbuzz -> icu version chains that clash with other packages (r-base, boost-cpp, pyicu). Always check: +1. Whether the tool is actually flagged by Trivy (don't bump versions unnecessarily) +2. Whether the CVE applies (e.g., log4j 1.x is NOT vulnerable to Log4Shell) +3. Whether the desired version resolves on ARM64 before pushing + +**Go binaries**: Some conda packages bundle compiled Go binaries (e.g., mafft's `dash_client`, google-cloud-sdk's `gcloud-crc32c`). If the binary is unused, delete it in the Dockerfile. Delete from **both** the installed location and `/opt/conda/pkgs/*/` (conda package cache) -- Trivy scans the full filesystem. + +**Vendored copies**: Packages like google-cloud-sdk and setuptools bundle their own copies of Python libraries that may be older than what's in the conda environment. Trivy flags these vendored copies separately. Options: delete the vendored directory (if not needed at runtime), or accept the risk in `.trivyignore` with justification. + +## ARM64 Solver Conflicts + +The conda solver on ARM64 (linux-aarch64) is more constrained than amd64 because fewer package builds exist. Common conflict patterns: + +- **icu version conflicts**: Many packages (openjdk, r-base, boost-cpp, pyicu) pin specific icu version ranges. Bumping one package can make the entire environment unsolvable. +- **libdeflate/htslib conflicts**: lofreq 2.1.5 pins old htslib/libdeflate versions that conflict with newer pillow/libtiff. +- **openjdk version escalation**: snpeff 5.2+ requires openjdk>=11, 5.3+ requires openjdk>=21. Higher openjdk versions pull in harfbuzz->icu chains that conflict with everything. 
+ +When a solver conflict occurs: revert the change, check what version the solver was picking before, and pin to that exact version if it already addresses the CVE. + +## Mitigation Decision Process + +When triaging a CVE: + +1. **Check the CVSS vector** -- does the Rego policy already filter it? +2. **Identify the source package** -- use Trivy JSON output (`PkgName`, `PkgPath`, `InstalledVersion`) +3. **Check if a fix version exists on conda-forge/bioconda** -- not just on PyPI +4. **Test on ARM64** -- solver conflicts are the most common failure mode +5. **If the fix version conflicts**: consider whether the CVE is exploitable in your deployment model. Document the risk assessment in `.trivyignore` or `vulnerability-mitigation-status.md`. +6. **If the vulnerable code is unused**: delete the binary/file inline in the Dockerfile (same RUN layer as install to avoid bloating images) + +## Key Files + +| File | Purpose | +|------|---------| +| `.trivy-ignore-policy.rego` | Rego policy for class-level CVE filtering | +| `.trivyignore` | Per-CVE exceptions with justifications | +| `.github/workflows/docker.yml` | Build-time scanning (SARIF + JSON) | +| `.github/workflows/container-scan.yml` | Weekly scheduled scanning | +| `vulnerability-mitigation-status.md` | Local-only tracking doc (not committed) | diff --git a/.agents/skills/dsub-batch-jobs/SKILL.md b/.agents/skills/dsub-batch-jobs/SKILL.md new file mode 100644 index 000000000..211bd45bf --- /dev/null +++ b/.agents/skills/dsub-batch-jobs/SKILL.md @@ -0,0 +1,126 @@ +# Running Batch Jobs on GCP via dsub + +Use dsub to run one-off compute jobs on Google Cloud when your analysis requires +more compute/memory than the local environment, or needs specific Docker images +that are impractical to run locally. 
## When to Use

- Analysis tools need >8GB RAM (e.g., VADR, BLAST, assembly)
- Need to run many independent jobs in parallel (batch processing)
- Need a specific Docker image with pre-installed tools
- Data lives in GCS and is most efficiently processed in-cloud

## Prerequisites

- **dsub** installed (ask the user where their dsub installation or venv is located)
- **gcloud CLI** authenticated with a GCP project that has the provider's API enabled (Cloud Life Sciences for `google-cls-v2`, Batch for `google-batch`)
- **GCS bucket** accessible by the project's default service account

## Generic Invocation

```bash
dsub --provider google-cls-v2 \
  --project PROJECT_ID \
  --regions REGION \
  --image IMAGE \
  --machine-type MACHINE_TYPE \
  --script SCRIPT.sh \
  --tasks TASKS.tsv \
  --logging gs://BUCKET/logs/JOB_NAME/
```

### Key Parameters

| Parameter | Description |
|-----------|-------------|
| `--provider google-cls-v2` | Cloud Life Sciences v2 backend (use `google-batch` for the newer Batch API) |
| `--project` | GCP project with the provider's API enabled |
| `--regions` | Compute region (e.g., `us-central1`) |
| `--image` | Docker image to run (e.g., `staphb/vadr:1.6.4`) |
| `--machine-type` | VM type (e.g., `n1-highmem-4` for 26GB RAM) |
| `--script` | Local shell script to execute inside the container |
| `--tasks` | TSV file defining one row per job (batch mode) |
| `--logging` | GCS path for stdout/stderr logs |

## Task TSV Format

The tasks TSV defines inputs, outputs, and environment variables for each job.
Header row uses column prefixes to declare types:

```
--env VAR1	--env VAR2	--input FASTA	--output RESULT	--output LOG
value1	value2	gs://bucket/input.fasta	gs://bucket/output.txt	gs://bucket/log.txt
```

- `--env NAME` -- environment variable passed to the script
- `--input NAME` -- GCS file downloaded to a local path; the env var is set to the local path
- `--output NAME` -- local path; after the script finishes, the file is uploaded to GCS

Each non-header row is one job. All jobs run independently in parallel.
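Because the header cells themselves contain spaces (`--env VAR1`), generating the TSV programmatically avoids delimiter mistakes. A minimal sketch (the column names are invented for illustration):

```python
import csv
import io

def write_tasks_tsv(rows, out):
    """Write a dsub --tasks TSV: the header declares column types, one row per job."""
    w = csv.writer(out, delimiter="\t", lineterminator="\n")
    w.writerow(["--env SAMPLE", "--input FASTA", "--output RESULT"])
    for sample, fasta_uri, result_uri in rows:
        w.writerow([sample, fasta_uri, result_uri])

buf = io.StringIO()
write_tasks_tsv(
    [("sample1", "gs://bucket/sample1.fasta", "gs://bucket/sample1.out")],
    buf,
)
```

Using `csv.writer` with a tab delimiter keeps GCS URIs and spaced header cells intact without manual quoting.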
## GCP Project and Bucket Scoping

The service account running dsub jobs must have read/write access to all GCS paths
referenced in the tasks TSV. The simplest approach:

1. Use a GCP project whose default service account already has access to your data
2. Use a bucket within that same project for staging intermediate/output files
3. For ephemeral results, use a temp bucket with a lifecycle policy (e.g., 30-day auto-delete)

### Broad Viral Genomics Defaults

Most developers on the viral-ngs codebase use:
- **GCP project**: `gcid-viral-seq`
- **Staging bucket**: `gs://viral-temp-30d` (30-day auto-delete lifecycle)

These are not universal -- always confirm with the user before using them.

## Monitoring Jobs

```bash
# Check job status with dstat (dsub's status companion tool)
dstat --provider google-cls-v2 --project PROJECT_ID --jobs JOB_ID

# Include completed/failed jobs, not just running ones
dstat --provider google-cls-v2 --project PROJECT_ID --jobs JOB_ID --status '*'

# View logs
gcloud storage cat gs://BUCKET/logs/JOB_NAME/JOB_ID.log
```

## Tips

- **Batch over single jobs**: Always prefer `--tasks` with a TSV over individual
  `dsub` invocations. One TSV row per job is cleaner and easier to track.
- **Machine sizing**: Check your tool's memory requirements. VADR needs ~16GB;
  use `n1-highmem-4` (26GB). Most tools work fine with `n1-standard-4` (15GB).
- **Script portability**: Write the `--script` to be self-contained. It receives
  inputs/outputs as environment variables. Don't assume any local state.
- **Logging**: Always set `--logging` to a GCS path so you can debug failures.
- **Idempotency**: If re-running, dsub will create new jobs. Check for existing
  outputs before re-submitting to avoid redundant computation.
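The idempotency tip can be enforced with a small pre-submission filter that drops task rows whose outputs already exist. A sketch with the existence check injected as a callable (in practice it would wrap `gcloud storage ls` or a GCS client call; the `--output RESULT` column name is hypothetical):

```python
def pending_tasks(task_rows, output_key, exists):
    """Return only task rows whose output object does not exist yet."""
    return [row for row in task_rows if not exists(row[output_key])]

# Example with a stubbed existence check in place of a real GCS lookup:
already_done = {"gs://bucket/a.out"}
tasks = [
    {"--output RESULT": "gs://bucket/a.out"},
    {"--output RESULT": "gs://bucket/b.out"},
]
todo = pending_tasks(tasks, "--output RESULT", lambda uri: uri in already_done)
```

Writing the filtered rows back out as a fresh tasks TSV makes resubmission a no-op for completed work.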
+ +## Example: VADR Batch Analysis + +From the GATK-to-FreeBayes regression testing (PR #1053), we ran VADR on 30 FASTAs +(15 assemblies x old/new) using dsub: + +```bash +source ~/venvs/dsub/bin/activate # or wherever dsub is installed + +dsub --provider google-cls-v2 \ + --project gcid-viral-seq \ + --regions us-central1 \ + --image staphb/vadr:1.6.4 \ + --machine-type n1-highmem-4 \ + --script run_vadr.sh \ + --tasks vadr_tasks.tsv \ + --logging gs://viral-temp-30d/vadr_regression/logs/ +``` + +The tasks TSV had columns for VADR options (`--env VADR_OPTS`), model URL +(`--env MODEL_URL`), input FASTA (`--input FASTA`), and outputs +(`--output NUM_ALERTS`, `--output ALERTS_TSV`, `--output VADR_TGZ`). + +All 30 jobs completed in ~15 minutes total (running in parallel on GCP). diff --git a/.agents/skills/regression-testing/SKILL.md b/.agents/skills/regression-testing/SKILL.md new file mode 100644 index 000000000..0a11161d2 --- /dev/null +++ b/.agents/skills/regression-testing/SKILL.md @@ -0,0 +1,167 @@ +# Assembly Regression Testing + +End-to-end regression testing for assembly pipeline changes against Terra submissions. + +## When to Use + +Use this playbook when a PR makes functional changes to the assembly or variant-calling +pipeline (e.g., swapping variant callers, changing alignment parameters, modifying +consensus logic). It compares assembly outputs from old vs new code across hundreds +of real samples to validate equivalence or improvement. + +## Prerequisites + +- **gcloud CLI** -- authenticated with access to Terra workspace GCS buckets +- **mafft** -- for pairwise sequence alignment +- **Python** with pandas and matplotlib (e.g., a dataviz venv) +- **dsub** -- for running VADR batch jobs on GCP (see the `dsub-batch-jobs` skill) + +## Workflow + +### Step 1: Set Up Terra Submissions (Manual) + +The user must manually launch Terra submissions with old and new code: +1. Run the pipeline on a representative dataset using the **main branch** Docker image +2. 
Run the same pipeline on the same dataset using the **feature branch** Docker image
3. Note the submission IDs and workspace bucket for both runs

### Step 2: Discover Paired Samples

Use `discover_pairs.py` to find all comparable old/new sample pairs by crawling
GCS Cromwell output directories.

```bash
python discover_pairs.py \
  --bucket WORKSPACE_BUCKET \
  --old-sub OLD_SUBMISSION_ID \
  --new-sub NEW_SUBMISSION_ID \
  --output pairs.json
```

This produces a JSON mapping sample_name -> {old_tsv, new_tsv} for all samples
present in both submissions.

### Step 3: Compare Assembly Outputs

Use `compare_sample_pair.py` to compare each sample pair. This script:
- Downloads assembly_stats TSVs from GCS
- Compares metrics (coverage, % reference covered, length, etc.)
- Downloads FASTA assemblies and aligns them with mafft
- Reports SNPs, indels (events and bp), ambiguity diffs, and terminal extensions

```bash
python compare_sample_pair.py \
  --old-tsv OLD_TSV_URI --new-tsv NEW_TSV_URI \
  --work-dir ./results/ \
  --output-json ./results/SAMPLE.json
```

For batch processing, iterate over all entries in `pairs.json` and invoke
`compare_sample_pair.py` for each sample pair (e.g., via a small wrapper
script using `concurrent.futures` or `xargs`/GNU `parallel`).

### Step 4: Generate Report

Use `generate_report.py` to aggregate all per-sample JSONs into plots and a markdown report.

```bash
python generate_report.py \
  --results-dir ./results/ \
  --report-dir ./report/ \
  --workspace-name WORKSPACE_NAME
```

Outputs:
- Summary TSV with per-assembly metrics
- 8 plots (scatter plots, histograms, identity distribution)
- Markdown report with summary tables and divergent assembly details

### Step 5: (Optional) VADR Annotation Quality

For assemblies with internal indel differences, run VADR to assess whether indels
cause frameshifts or other annotation problems. See the `dsub-batch-jobs` skill for
details on running batch jobs via dsub.
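Since VADR model parameters are keyed by taxid and assembly IDs embed it as a suffix (the `sample_id-taxid` format used by these pipelines), splitting the ID is a common first step. A minimal sketch, assuming the taxid is always the final dash-separated field:

```python
def split_assembly_id(assembly_id: str) -> tuple:
    """Split 'sample_id-taxid' into (sample_id, taxid).

    Sample IDs may themselves contain dashes, so split on the LAST dash.
    """
    sample_id, _, taxid = assembly_id.rpartition("-")
    return sample_id, taxid
```

The taxid can then be used as the lookup key into `vadr-by-taxid.tsv`.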
Use `run_vadr.sh` with dsub to run VADR on each FASTA:

```bash
dsub --provider google-cls-v2 \
  --project PROJECT_ID --regions us-central1 \
  --image staphb/vadr:1.6.4 \
  --machine-type n1-highmem-4 \
  --script run_vadr.sh \
  --tasks vadr_tasks.tsv \
  --logging gs://BUCKET/vadr_logs/
```

VADR model parameters come from the viral-references repo:
https://github.com/broadinstitute/viral-references/blob/main/annotation/vadr/vadr-by-taxid.tsv

Use the taxid from the assembly_id (format: `sample_id-taxid`) to look up the
correct `vadr_opts`, `min_seq_len`, `max_seq_len`, `vadr_model_tar_url`, and
`vadr_model_tar_subdir`.

### Step 6: Post Results

Post the report as a PR comment. Before posting:
- **Self-review the proposed comment for confidential information** (sample names,
  internal paths, credentials, etc.). Ask the user if in doubt about what is safe
  to share publicly.
- Include plots as image attachments if the PR is on GitHub
- Attribute the analysis appropriately

## Key Patterns

### Per-Segment Alignment

Multi-segment genomes (e.g., influenza with 8 segments) must be aligned
**per-segment**, not as a single concatenated sequence. Otherwise, terminal
effects at segment boundaries get misclassified as internal indels.

The `compare_sample_pair.py` script handles this automatically: it pairs
segments by FASTA header, aligns each pair independently, analyzes each
alignment separately (so terminal effects stay terminal), and aggregates
the statistics.

### Event Counting vs BP Counting

For indels, both counts matter:
- **BP count**: Total gap positions (e.g., "49 bp of indels")
- **Event count**: Contiguous gap runs (e.g., "13 indel events")

A single 26bp insertion is 1 event but 26 bp. Event counts better reflect
the number of variant-calling decisions that differ between old and new code.

### VADR Frameshift Cascade Detection

A single spurious 1bp indel in a coding region causes a cascade of VADR alerts:
1.
`FRAMESHIFT` -- the indel shifts the reading frame +2. `STOP_CODON` -- premature stop codon in the shifted frame +3. `UNEXPECTED_LENGTH` -- protein length doesn't match model +4. `PEPTIDE_TRANSLATION_PROBLEM` -- for each downstream mature peptide + +When comparing VADR alert counts, a large delta (e.g., 32 -> 1) usually means +one version has frameshift-causing indels that the other avoids. Check the +`.alt.list` files to confirm which genes are affected. + +## Interpreting Results + +### What to Look For + +1. **Identity distribution**: Most assemblies should be 100% identical. Any + below 99.9% warrant investigation. +2. **SNP count = 0 for all assemblies**: Pipeline changes that only affect + indel calling (e.g., swapping variant callers) should produce zero SNPs. +3. **Indel events**: The number and nature of indel differences. Are they in + coding regions? Do they cause frameshifts? +4. **Coverage correlation**: Low-coverage samples (<10x) are most likely to + show differences between variant callers. +5. **VADR alert deltas**: Fewer alerts = more biologically plausible assembly. + Large improvements (e.g., -31 alerts) strongly favor the new code. + +### Red Flags + +- Assemblies present in old but missing in new (or vice versa) +- SNPs introduced where none existed before +- VADR alerts increasing significantly for the new code +- Differences concentrated in specific organisms/taxids diff --git a/.agents/skills/regression-testing/compare_sample_pair.py b/.agents/skills/regression-testing/compare_sample_pair.py new file mode 100644 index 000000000..5b5000072 --- /dev/null +++ b/.agents/skills/regression-testing/compare_sample_pair.py @@ -0,0 +1,514 @@ +#!/usr/bin/env python3 +"""Compare assembly outputs between old and new code for a single sample pair. + +Takes two GCS URIs pointing at assembly_metadata TSV files (old and new), +downloads them, compares metrics, aligns FASTAs with mafft, and outputs a JSON result. 
+""" +import argparse +import csv +import io +import json +import logging +import os +import subprocess + +logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(message)s') +log = logging.getLogger(__name__) + +METRICS_COLS = [ + 'assembly_length', 'assembly_length_unambiguous', 'reads_aligned', + 'mean_coverage', 'percent_reference_covered', 'reference_length', + 'scaffolding_num_segments_recovered', 'reference_num_segments_required', +] +FLOAT_COLS = {'mean_coverage', 'percent_reference_covered'} +INT_COLS = {'assembly_length', 'assembly_length_unambiguous', 'reads_aligned', + 'reference_length', 'scaffolding_num_segments_recovered', + 'reference_num_segments_required'} + + +def gcloud_cat(gcs_uri): + """Read a GCS file's contents as a string.""" + result = subprocess.run( + ['gcloud', 'storage', 'cat', gcs_uri], + capture_output=True, text=True, timeout=120 + ) + if result.returncode != 0: + raise RuntimeError(f"gcloud storage cat failed for {gcs_uri}: {result.stderr.strip()}") + return result.stdout + + +def gcloud_cp(gcs_uri, local_path): + """Download a GCS file to local path.""" + result = subprocess.run( + ['gcloud', 'storage', 'cp', gcs_uri, local_path], + capture_output=True, text=True, timeout=120 + ) + if result.returncode != 0: + raise RuntimeError(f"gcloud storage cp failed for {gcs_uri}: {result.stderr.strip()}") + + +def parse_tsv(content): + """Parse assembly_metadata TSV content into a dict keyed by assembly_id. 
+ + Returns dict: assembly_id -> {col: value, ...} + """ + reader = csv.DictReader(io.StringIO(content), delimiter='\t') + rows = {} + for row in reader: + # The first column is 'entity:assembly_id' + assembly_id = row.get('entity:assembly_id', '').strip() + if not assembly_id: + continue + # Parse numeric columns + parsed = dict(row) + for col in FLOAT_COLS: + if col in parsed and parsed[col]: + try: + parsed[col] = float(parsed[col]) + except (ValueError, TypeError): + parsed[col] = None + for col in INT_COLS: + if col in parsed and parsed[col]: + try: + parsed[col] = int(parsed[col]) + except (ValueError, TypeError): + parsed[col] = None + rows[assembly_id] = parsed + return rows + + +def parse_fasta(content): + """Parse a FASTA string into list of (header, sequence) tuples.""" + sequences = [] + current_header = None + current_seq = [] + for line in content.strip().split('\n'): + line = line.strip() + if not line: + continue + if line.startswith('>'): + if current_header is not None: + sequences.append((current_header, ''.join(current_seq))) + current_header = line[1:].strip() + current_seq = [] + else: + current_seq.append(line) + if current_header is not None: + sequences.append((current_header, ''.join(current_seq))) + return sequences + + +def run_mafft_pair(old_seq, new_seq, work_dir, pair_id='0'): + """Run mafft on a single pair of sequences. 
Returns aligned (old_seq, new_seq) strings.""" + combined = os.path.join(work_dir, f'combined_{pair_id}.fasta') + with open(combined, 'w') as f: + f.write(f'>old_{pair_id}\n{old_seq}\n>new_{pair_id}\n{new_seq}\n') + + try: + result = subprocess.run( + ['mafft', '--auto', '--preservecase', '--quiet', '--thread', '1', combined], + capture_output=True, text=True, timeout=300 + ) + finally: + os.unlink(combined) + if result.returncode != 0: + raise RuntimeError(f"mafft failed: {result.stderr.strip()}") + + seqs = parse_fasta(result.stdout) + if len(seqs) != 2: + raise RuntimeError(f"Expected 2 sequences from mafft, got {len(seqs)}") + return seqs[0][1], seqs[1][1] + + +def align_and_analyze_fastas(old_fasta_path, new_fasta_path, work_dir): + """Align old vs new FASTAs, handling multi-segment genomes. + + For single-segment: runs one mafft alignment and analyzes it. + For multi-segment: pairs segments by header, aligns each pair separately, + analyzes each independently (so terminal effects stay terminal), and + aggregates the stats. + + Returns (aligned_fasta_path, stats_dict). 
+ """ + with open(old_fasta_path) as f: + old_fasta_text = f.read() + with open(new_fasta_path) as f: + new_fasta_text = f.read() + old_seqs = parse_fasta(old_fasta_text) + new_seqs = parse_fasta(new_fasta_text) + + if len(old_seqs) == 1 and len(new_seqs) == 1: + # Simple case: single segment + aln_old, aln_new = run_mafft_pair(old_seqs[0][1], new_seqs[0][1], work_dir) + aligned = os.path.join(work_dir, 'aligned.fasta') + with open(aligned, 'w') as f: + f.write(f'>old_{old_seqs[0][0]}\n{aln_old}\n>new_{new_seqs[0][0]}\n{aln_new}\n') + stats = analyze_alignment_seqs(aln_old, aln_new) + return aligned, stats + + # Multi-segment: pair by header name + old_dict = {h: s for h, s in old_seqs} + new_dict = {h: s for h, s in new_seqs} + common_headers = [h for h in old_dict if h in new_dict] + + if not common_headers: + if len(old_seqs) == len(new_seqs): + log.info(f" Multi-segment: headers don't match, pairing by position ({len(old_seqs)} segments)") + common_headers = None + else: + raise RuntimeError( + f"Cannot pair segments: {len(old_seqs)} old vs {len(new_seqs)} new, no matching headers") + + # Align each pair and analyze independently + segment_pairs = [] + if common_headers is not None: + for i, h in enumerate(common_headers): + aln_old, aln_new = run_mafft_pair(old_dict[h], new_dict[h], work_dir, pair_id=str(i)) + segment_pairs.append((h, aln_old, aln_new)) + else: + for i in range(len(old_seqs)): + aln_old, aln_new = run_mafft_pair(old_seqs[i][1], new_seqs[i][1], work_dir, pair_id=str(i)) + segment_pairs.append((old_seqs[i][0], aln_old, aln_new)) + + n_segments = len(segment_pairs) + log.info(f" Multi-segment alignment: {n_segments} segments aligned") + + # Analyze each segment independently, then aggregate + agg = { + 'alignment_length': 0, 'internal_length': 0, + 'matches': 0, 'snps': 0, + 'internal_insertions': 0, 'internal_deletions': 0, + 'internal_insertion_events': 0, 'internal_deletion_events': 0, + 'ambiguity_diffs': 0, + 'terminal_old_left': 0, 
'terminal_new_left': 0, + 'terminal_old_right': 0, 'terminal_new_right': 0, + 'terminal_extensions_old': 0, 'terminal_extensions_new': 0, + 'terminal_extension_events_old': 0, 'terminal_extension_events_new': 0, + } + per_segment = [] + for header, aln_old, aln_new in segment_pairs: + seg_stats = analyze_alignment_seqs(aln_old, aln_new) + per_segment.append({'segment': header, **seg_stats}) + for key in agg: + agg[key] += seg_stats.get(key, 0) + + # Compute aggregate identity + total_bases_compared = agg['matches'] + agg['snps'] + agg['ambiguity_diffs'] + agg['identity'] = agg['matches'] / total_bases_compared if total_bases_compared > 0 else 1.0 + agg['n_segments'] = n_segments + agg['per_segment'] = per_segment + + # Write per-segment alignment file (for review) + aligned = os.path.join(work_dir, 'aligned.fasta') + with open(aligned, 'w') as f: + for header, aln_old, aln_new in segment_pairs: + f.write(f'>old_{header}\n{aln_old}\n>new_{header}\n{aln_new}\n') + + return aligned, agg + + +def analyze_alignment_seqs(old_seq_str, new_seq_str): + """Analyze a pairwise alignment given two aligned sequence strings. + + Returns dict with alignment statistics. 
+ """ + old_seq = old_seq_str.upper() + new_seq = new_seq_str.upper() + aln_len = len(old_seq) + + if len(new_seq) != aln_len: + raise RuntimeError(f"Alignment sequences differ in length: {aln_len} vs {len(new_seq)}") + + ACGT = set('ACGT') + + # Find the internal region (where both sequences have bases) + left_bound = 0 + while left_bound < aln_len and (old_seq[left_bound] == '-' or new_seq[left_bound] == '-'): + left_bound += 1 + + right_bound = aln_len - 1 + while right_bound >= 0 and (old_seq[right_bound] == '-' or new_seq[right_bound] == '-'): + right_bound -= 1 + + # Terminal extension counts (bp and events) + terminal_old_left = 0 + terminal_new_left = 0 + terminal_old_right = 0 + terminal_new_right = 0 + # Track events: a contiguous run of gaps on one side = 1 event + terminal_old_left_events = 0 + terminal_new_left_events = 0 + terminal_old_right_events = 0 + terminal_new_right_events = 0 + + prev_old_gap = False + prev_new_gap = False + for i in range(left_bound): + if old_seq[i] != '-' and new_seq[i] == '-': + terminal_old_left += 1 + if not prev_old_gap: + terminal_old_left_events += 1 + prev_old_gap = True + prev_new_gap = False + elif new_seq[i] != '-' and old_seq[i] == '-': + terminal_new_left += 1 + if not prev_new_gap: + terminal_new_left_events += 1 + prev_new_gap = True + prev_old_gap = False + else: + prev_old_gap = False + prev_new_gap = False + + prev_old_gap = False + prev_new_gap = False + for i in range(right_bound + 1, aln_len): + if old_seq[i] != '-' and new_seq[i] == '-': + terminal_old_right += 1 + if not prev_old_gap: + terminal_old_right_events += 1 + prev_old_gap = True + prev_new_gap = False + elif new_seq[i] != '-' and old_seq[i] == '-': + terminal_new_right += 1 + if not prev_new_gap: + terminal_new_right_events += 1 + prev_new_gap = True + prev_old_gap = False + else: + prev_old_gap = False + prev_new_gap = False + + # Analyze internal region (bp counts and event counts) + matches = 0 + snps = 0 + internal_insertions = 0 # bp 
count + internal_deletions = 0 # bp count + internal_insertion_events = 0 + internal_deletion_events = 0 + ambiguity_diffs = 0 + in_insertion = False + in_deletion = False + + for i in range(left_bound, right_bound + 1): + o = old_seq[i] + n = new_seq[i] + + if o == '-' and n != '-': + internal_insertions += 1 + if not in_insertion: + internal_insertion_events += 1 + in_insertion = True + in_deletion = False + elif o != '-' and n == '-': + internal_deletions += 1 + if not in_deletion: + internal_deletion_events += 1 + in_deletion = True + in_insertion = False + else: + in_insertion = False + in_deletion = False + if o == n: + if o != '-': + matches += 1 + elif o in ACGT and n in ACGT: + snps += 1 + else: + ambiguity_diffs += 1 + + total_internal = right_bound - left_bound + 1 if right_bound >= left_bound else 0 + total_bases_compared = matches + snps + ambiguity_diffs + identity = matches / total_bases_compared if total_bases_compared > 0 else 1.0 + + return { + 'alignment_length': aln_len, + 'internal_length': total_internal, + 'matches': matches, + 'snps': snps, + 'internal_insertions': internal_insertions, + 'internal_deletions': internal_deletions, + 'internal_insertion_events': internal_insertion_events, + 'internal_deletion_events': internal_deletion_events, + 'ambiguity_diffs': ambiguity_diffs, + 'terminal_old_left': terminal_old_left, + 'terminal_new_left': terminal_new_left, + 'terminal_old_right': terminal_old_right, + 'terminal_new_right': terminal_new_right, + 'terminal_extensions_old': terminal_old_left + terminal_old_right, + 'terminal_extensions_new': terminal_new_left + terminal_new_right, + 'terminal_extension_events_old': terminal_old_left_events + terminal_old_right_events, + 'terminal_extension_events_new': terminal_new_left_events + terminal_new_right_events, + 'identity': identity, + } + + +def analyze_alignment(aligned_fasta_path): + """Analyze pairwise alignment from mafft output file. 
Thin wrapper around analyze_alignment_seqs.""" + with open(aligned_fasta_path) as f: + seqs = parse_fasta(f.read()) + if len(seqs) != 2: + raise RuntimeError(f"Expected 2 sequences in alignment, got {len(seqs)}") + return analyze_alignment_seqs(seqs[0][1], seqs[1][1]) + + +def compare_assembly(assembly_id, old_row, new_row, work_dir): + """Compare one assembly (old vs new). Downloads FASTAs, runs mafft, returns result dict.""" + result = { + 'assembly_id': assembly_id, + 'taxid': old_row.get('taxid', ''), + 'tax_name': old_row.get('tax_name', ''), + 'old_metrics': {}, + 'new_metrics': {}, + 'deltas': {}, + 'alignment': None, + 'error': None, + } + + # Extract metrics + for col in METRICS_COLS: + old_val = old_row.get(col) + new_val = new_row.get(col) + result['old_metrics'][col] = old_val + result['new_metrics'][col] = new_val + if old_val is not None and new_val is not None: + try: + result['deltas'][col] = float(new_val) - float(old_val) + except (ValueError, TypeError): + result['deltas'][col] = None + + # Get FASTA paths + old_fasta_uri = old_row.get('assembly_fasta', '').strip() + new_fasta_uri = new_row.get('assembly_fasta', '').strip() + + if not old_fasta_uri or not new_fasta_uri: + log.info(f" {assembly_id}: skipping alignment (missing FASTA URI)") + return result + + # Download FASTAs + aln_dir = os.path.join(work_dir, f'aln_{assembly_id}') + os.makedirs(aln_dir, exist_ok=True) + + old_fasta_local = os.path.join(aln_dir, 'old.fasta') + new_fasta_local = os.path.join(aln_dir, 'new.fasta') + + try: + log.debug(f" Downloading old FASTA: {old_fasta_uri}") + gcloud_cp(old_fasta_uri, old_fasta_local) + log.debug(f" Downloading new FASTA: {new_fasta_uri}") + gcloud_cp(new_fasta_uri, new_fasta_local) + + # Check if FASTAs are non-empty + if os.path.getsize(old_fasta_local) == 0 or os.path.getsize(new_fasta_local) == 0: + log.info(f" {assembly_id}: skipping alignment (empty FASTA)") + return result + + # Align and analyze (handles multi-segment genomes 
independently) + log.debug(f" Running alignment for {assembly_id}") + aligned_path, alignment_stats = align_and_analyze_fastas(old_fasta_local, new_fasta_local, aln_dir) + result['alignment'] = alignment_stats + + identity = alignment_stats['identity'] + snps = alignment_stats['snps'] + indels = alignment_stats['internal_insertions'] + alignment_stats['internal_deletions'] + log.info(f" {assembly_id}: identity={identity:.6f}, snps={snps}, indels={indels}, " + f"terminal_ext_old={alignment_stats['terminal_extensions_old']}, " + f"terminal_ext_new={alignment_stats['terminal_extensions_new']}") + + except Exception as e: + log.error(f" {assembly_id}: alignment failed: {e}") + result['error'] = str(e) + finally: + # Cleanup downloaded files + for f in [old_fasta_local, new_fasta_local]: + if os.path.exists(f): + os.unlink(f) + # Keep aligned file only if identity < 99.9% + aligned_file = os.path.join(aln_dir, 'aligned.fasta') + if os.path.exists(aligned_file): + if result['alignment'] and result['alignment']['identity'] >= 0.999: + os.unlink(aligned_file) + else: + log.info(f" Keeping alignment file for review: {aligned_file}") + # Remove empty dir + try: + os.rmdir(aln_dir) + except OSError: + pass # dir not empty (kept alignment file) + + return result + + +def main(): + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument('--old-tsv', required=True, help='GCS URI of old assembly_metadata TSV') + parser.add_argument('--new-tsv', required=True, help='GCS URI of new assembly_metadata TSV') + parser.add_argument('--work-dir', required=True, help='Working directory for temp files') + parser.add_argument('--output-json', required=True, help='Output JSON file path') + parser.add_argument('--verbose', '-v', action='store_true', help='Enable debug logging') + args = parser.parse_args() + + if args.verbose: + logging.getLogger().setLevel(logging.DEBUG) + + os.makedirs(args.work_dir, exist_ok=True) + + # Download and parse TSVs + log.info(f"Downloading 
old TSV: {args.old_tsv}") + old_content = gcloud_cat(args.old_tsv) + old_rows = parse_tsv(old_content) + + log.info(f"Downloading new TSV: {args.new_tsv}") + new_content = gcloud_cat(args.new_tsv) + new_rows = parse_tsv(new_content) + + # Extract sample_id from the first row of whichever TSV has data + sample_id = 'unknown' + if old_rows: + first_row = next(iter(old_rows.values())) + sample_id = first_row.get('sample_id', 'unknown') + elif new_rows: + first_row = next(iter(new_rows.values())) + sample_id = first_row.get('sample_id', 'unknown') + + log.info(f"Sample: {sample_id}") + log.info(f"Old assemblies: {len(old_rows)}, New assemblies: {len(new_rows)}") + + # Find intersecting assembly_ids + common_ids = sorted(set(old_rows.keys()) & set(new_rows.keys())) + old_only_ids = sorted(set(old_rows.keys()) - set(new_rows.keys())) + new_only_ids = sorted(set(new_rows.keys()) - set(old_rows.keys())) + + if old_only_ids: + log.info(f" Assemblies only in old: {old_only_ids}") + if new_only_ids: + log.info(f" Assemblies only in new: {new_only_ids}") + log.info(f" Assemblies in common: {len(common_ids)}") + + # Compare each intersecting assembly + comparisons = [] + for aid in common_ids: + comp = compare_assembly(aid, old_rows[aid], new_rows[aid], args.work_dir) + comparisons.append(comp) + + # Build output + output = { + 'sample_id': sample_id, + 'old_tsv_uri': args.old_tsv, + 'new_tsv_uri': args.new_tsv, + 'old_assembly_count': len(old_rows), + 'new_assembly_count': len(new_rows), + 'assembly_count_match': len(old_rows) == len(new_rows), + 'assemblies_only_in_old': old_only_ids, + 'assemblies_only_in_new': new_only_ids, + 'comparisons': comparisons, + } + + with open(args.output_json, 'w') as f: + json.dump(output, f, indent=2) + + log.info(f"Wrote results to {args.output_json}") + + +if __name__ == '__main__': + main() diff --git a/.agents/skills/regression-testing/discover_pairs.py b/.agents/skills/regression-testing/discover_pairs.py new file mode 100644 index 
000000000..5fa9c82c9 --- /dev/null +++ b/.agents/skills/regression-testing/discover_pairs.py @@ -0,0 +1,162 @@ +#!/usr/bin/env python3 +"""Discover comparable old/new sample pairs by crawling GCS Cromwell output directories. + +For each submission, finds assembly_metadata TSV files named +``assembly_metadata-<sample_name>.tsv``, extracts sample names from filenames, +and outputs the intersection as a JSON mapping. + +Usage: + python discover_pairs.py \ + --bucket fc-XXXXXXXX-... \ + --old-sub <old-submission-id> \ + --new-sub <new-submission-id> \ + -o pairs.json +""" +import argparse +import json +import logging +import re +import subprocess + +logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(message)s') +log = logging.getLogger(__name__) + + +def gcloud_ls(path): + """List GCS path, return list of URIs. Returns [] on error.""" + try: + result = subprocess.run( + ['gcloud', 'storage', 'ls', path], + capture_output=True, text=True, timeout=60 + ) + if result.returncode != 0: + log.warning(f"gcloud ls non-zero exit for {path}: {result.stderr.strip()}") + return [] + return [line.strip() for line in result.stdout.strip().split('\n') if line.strip()] + except Exception as e: + log.warning(f"gcloud ls failed for {path}: {e}") + return [] + + +def find_tsv_in_call_dir(call_dir_uri): + """Find assembly_metadata TSV in a call directory, handling attempt-N subdirs. + + Returns (sample_name, tsv_uri) or (None, None). 
+ """ + items = gcloud_ls(call_dir_uri) + + # Check for attempt-N subdirectories + def attempt_sort_key(path): + match = re.search(r'/attempt-(\d+)', path) + return int(match.group(1)) if match else 0 + attempt_dirs = sorted([i for i in items if '/attempt-' in i], + key=attempt_sort_key, reverse=True) + tsv_files = [i for i in items if i.endswith('.tsv')] + + # If there are attempt dirs, check the highest attempt first + if attempt_dirs: + for attempt_dir in attempt_dirs: + attempt_items = gcloud_ls(attempt_dir) + attempt_tsvs = [i for i in attempt_items if i.endswith('.tsv')] + if attempt_tsvs: + tsv_files = attempt_tsvs + break + + for tsv in tsv_files: + match = re.search(r'assembly_metadata-(.+)\.tsv$', tsv) + if match: + return match.group(1), tsv + + return None, None + + +def discover_submission_tsvs(bucket, submission_id): + """Find all assembly_metadata TSVs for a submission. + + Returns dict: sample_name -> tsv_gcs_uri + """ + base = f"gs://{bucket}/submissions/{submission_id}/assemble_denovo_metagenomic/" + log.info(f"Listing workflow directories in {base}") + + wf_dirs = gcloud_ls(base) + log.info(f"Found {len(wf_dirs)} workflow directories") + + results = {} + for i, wf_dir in enumerate(wf_dirs): + if i % 20 == 0: + log.info(f" Scanning workflow {i+1}/{len(wf_dirs)}...") + + for call_name in ['call-assembly_stats_non_empty', 'call-assembly_stats_empty']: + call_dir = f"{wf_dir}{call_name}/" + sample_name, tsv_uri = find_tsv_in_call_dir(call_dir) + if sample_name: + if sample_name in results: + log.warning(f"Duplicate sample {sample_name} -- keeping first occurrence") + else: + results[sample_name] = tsv_uri + break + + log.info(f"Found TSVs for {len(results)} samples in submission {submission_id[:8]}") + return results + + +def main(): + parser = argparse.ArgumentParser(description=__doc__, + formatter_class=argparse.RawDescriptionHelpFormatter) + parser.add_argument('--bucket', required=True, + help='Terra workspace GCS bucket ID (e.g., 
fc-XXXXXXXX-...)') + parser.add_argument('--old-sub', required=True, + help='Old submission ID (main branch)') + parser.add_argument('--new-sub', required=True, + help='New submission ID (feature branch)') + parser.add_argument('--output', '-o', required=True, + help='Output JSON file path') + args = parser.parse_args() + + log.info(f"Old submission: {args.old_sub[:8]}") + log.info(f"New submission: {args.new_sub[:8]}") + + old_tsvs = discover_submission_tsvs(args.bucket, args.old_sub) + new_tsvs = discover_submission_tsvs(args.bucket, args.new_sub) + + # Find intersection + common_samples = sorted(set(old_tsvs.keys()) & set(new_tsvs.keys())) + old_only = sorted(set(old_tsvs.keys()) - set(new_tsvs.keys())) + new_only = sorted(set(new_tsvs.keys()) - set(old_tsvs.keys())) + + log.info(f"Old-only samples: {len(old_only)}") + log.info(f"New-only samples: {len(new_only)}") + log.info(f"Intersecting samples: {len(common_samples)}") + + if old_only: + log.info(f" Old-only: {old_only[:5]}{'...' if len(old_only) > 5 else ''}") + if new_only: + log.info(f" New-only: {new_only[:5]}{'...' 
if len(new_only) > 5 else ''}") + + pairs = {} + for sample in common_samples: + pairs[sample] = { + 'old_tsv': old_tsvs[sample], + 'new_tsv': new_tsvs[sample], + } + + output = { + 'bucket': args.bucket, + 'old_submission': args.old_sub, + 'new_submission': args.new_sub, + 'old_sample_count': len(old_tsvs), + 'new_sample_count': len(new_tsvs), + 'paired_count': len(pairs), + 'old_only': old_only, + 'new_only': new_only, + 'pairs': pairs, + } + + with open(args.output, 'w') as f: + json.dump(output, f, indent=2) + + log.info(f"Wrote {len(pairs)} pairs to {args.output}") + + +if __name__ == '__main__': + main() diff --git a/.agents/skills/regression-testing/generate_report.py b/.agents/skills/regression-testing/generate_report.py new file mode 100644 index 000000000..d6dda318a --- /dev/null +++ b/.agents/skills/regression-testing/generate_report.py @@ -0,0 +1,445 @@ +#!/usr/bin/env python3 +"""Generate regression testing report with plots from per-sample JSON results. + +Aggregates all comparison results, produces summary TSV, plots, and markdown report. 
+""" +import argparse +import glob +import json +import logging +import os + +logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(message)s') +log = logging.getLogger(__name__) + +# Delay imports so script can show usage without these deps +def get_deps(): + import pandas as pd + import matplotlib + matplotlib.use('Agg') + import matplotlib.pyplot as plt + return pd, plt + + +def load_results(results_dir): + """Load all per-sample JSON results.""" + results = [] + for path in sorted(glob.glob(os.path.join(results_dir, '*.json'))): + try: + with open(path) as f: + results.append(json.load(f)) + except Exception as e: + log.warning(f"Failed to load {path}: {e}") + return results + + +def build_comparison_table(results, workspace_name): + """Build a flat table of per-assembly comparisons.""" + rows = [] + for r in results: + sample_id = r.get('sample_id', 'unknown') + for comp in r.get('comparisons', []): + row = { + 'workspace': workspace_name, + 'sample_id': sample_id, + 'assembly_id': comp.get('assembly_id', ''), + 'taxid': comp.get('taxid', ''), + 'tax_name': comp.get('tax_name', ''), + } + # Metrics + for col in ['percent_reference_covered', 'mean_coverage', + 'assembly_length_unambiguous', 'assembly_length', + 'reads_aligned', 'reference_length']: + old_val = comp.get('old_metrics', {}).get(col) + new_val = comp.get('new_metrics', {}).get(col) + delta = comp.get('deltas', {}).get(col) + row[f'old_{col}'] = old_val + row[f'new_{col}'] = new_val + row[f'delta_{col}'] = delta + + # Alignment stats + aln = comp.get('alignment') + if aln: + row['alignment_identity'] = aln.get('identity') + row['snp_count'] = aln.get('snps', 0) + row['internal_insertions'] = aln.get('internal_insertions', 0) + row['internal_deletions'] = aln.get('internal_deletions', 0) + row['indel_count_bp'] = aln.get('internal_insertions', 0) + aln.get('internal_deletions', 0) + row['indel_count_events'] = aln.get('internal_insertion_events', 0) + 
aln.get('internal_deletion_events', 0) + row['terminal_extensions_old'] = aln.get('terminal_extensions_old', 0) + row['terminal_extensions_new'] = aln.get('terminal_extensions_new', 0) + row['terminal_extension_events_old'] = aln.get('terminal_extension_events_old', 0) + row['terminal_extension_events_new'] = aln.get('terminal_extension_events_new', 0) + row['ambiguity_diffs'] = aln.get('ambiguity_diffs', 0) + else: + row['alignment_identity'] = None + row['snp_count'] = None + row['indel_count_bp'] = None + row['indel_count_events'] = None + row['terminal_extensions_old'] = None + row['terminal_extensions_new'] = None + row['terminal_extension_events_old'] = None + row['terminal_extension_events_new'] = None + row['ambiguity_diffs'] = None + + row['error'] = comp.get('error') + rows.append(row) + return rows + + +def build_sample_summary(results): + """Build sample-level summary table.""" + rows = [] + for r in results: + rows.append({ + 'sample_id': r.get('sample_id', 'unknown'), + 'old_assembly_count': r.get('old_assembly_count', 0), + 'new_assembly_count': r.get('new_assembly_count', 0), + 'assembly_count_match': r.get('assembly_count_match', False), + 'assemblies_only_in_old': len(r.get('assemblies_only_in_old', [])), + 'assemblies_only_in_new': len(r.get('assemblies_only_in_new', [])), + 'num_comparisons': len(r.get('comparisons', [])), + }) + return rows + + +def generate_plots(df, plot_dir): + """Generate all plots from the comparison dataframe.""" + _, plt = get_deps() + os.makedirs(plot_dir, exist_ok=True) + + # Filter to rows with actual assemblies + df_asm = df[df['old_percent_reference_covered'].notna() & + df['new_percent_reference_covered'].notna()].copy() + + if len(df_asm) == 0: + log.warning("No assembly comparisons with metrics — skipping plots") + return + + # 1. 
Percent reference covered scatter + fig, ax = plt.subplots(figsize=(8, 8)) + ax.scatter(df_asm['old_percent_reference_covered'] * 100, + df_asm['new_percent_reference_covered'] * 100, + alpha=0.5, s=20) + lims = [0, 105] + ax.plot(lims, lims, 'r--', alpha=0.5, label='y=x') + ax.set_xlabel('Old % Reference Covered') + ax.set_ylabel('New % Reference Covered') + ax.set_title('Percent Reference Covered: Old vs New') + ax.set_xlim(lims) + ax.set_ylim(lims) + ax.legend() + fig.tight_layout() + fig.savefig(os.path.join(plot_dir, 'pct_ref_covered_scatter.png'), dpi=150) + plt.close(fig) + + # 2. Mean coverage scatter (log scale) + df_cov = df_asm[df_asm['old_mean_coverage'].notna() & + df_asm['new_mean_coverage'].notna() & + (df_asm['old_mean_coverage'] > 0) & + (df_asm['new_mean_coverage'] > 0)].copy() + if len(df_cov) > 0: + fig, ax = plt.subplots(figsize=(8, 8)) + ax.scatter(df_cov['old_mean_coverage'], df_cov['new_mean_coverage'], + alpha=0.5, s=20) + min_v = min(df_cov['old_mean_coverage'].min(), df_cov['new_mean_coverage'].min()) * 0.8 + max_v = max(df_cov['old_mean_coverage'].max(), df_cov['new_mean_coverage'].max()) * 1.2 + ax.plot([min_v, max_v], [min_v, max_v], 'r--', alpha=0.5, label='y=x') + ax.set_xscale('log') + ax.set_yscale('log') + ax.set_xlabel('Old Mean Coverage') + ax.set_ylabel('New Mean Coverage') + ax.set_title('Mean Coverage: Old vs New') + ax.legend() + fig.tight_layout() + fig.savefig(os.path.join(plot_dir, 'mean_coverage_scatter.png'), dpi=150) + plt.close(fig) + + # 3. 
Assembly length scatter + df_len = df_asm[df_asm['old_assembly_length_unambiguous'].notna() & + df_asm['new_assembly_length_unambiguous'].notna()].copy() + if len(df_len) > 0: + fig, ax = plt.subplots(figsize=(8, 8)) + ax.scatter(df_len['old_assembly_length_unambiguous'], + df_len['new_assembly_length_unambiguous'], + alpha=0.5, s=20) + min_v = min(df_len['old_assembly_length_unambiguous'].min(), + df_len['new_assembly_length_unambiguous'].min()) * 0.95 + max_v = max(df_len['old_assembly_length_unambiguous'].max(), + df_len['new_assembly_length_unambiguous'].max()) * 1.05 + ax.plot([min_v, max_v], [min_v, max_v], 'r--', alpha=0.5, label='y=x') + ax.set_xlabel('Old Unambiguous Length') + ax.set_ylabel('New Unambiguous Length') + ax.set_title('Assembly Length (Unambiguous): Old vs New') + ax.legend() + fig.tight_layout() + fig.savefig(os.path.join(plot_dir, 'assembly_length_scatter.png'), dpi=150) + plt.close(fig) + + # 4. Delta pct_ref_covered histogram + deltas = df_asm['delta_percent_reference_covered'].dropna() * 100 + if len(deltas) > 0: + fig, ax = plt.subplots(figsize=(8, 5)) + ax.hist(deltas, bins=50, edgecolor='black', alpha=0.7) + ax.axvline(0, color='r', linestyle='--', alpha=0.5) + ax.set_xlabel('Delta % Reference Covered (New - Old)') + ax.set_ylabel('Count') + ax.set_title(f'Distribution of % Reference Covered Changes (n={len(deltas)})') + fig.tight_layout() + fig.savefig(os.path.join(plot_dir, 'delta_pct_ref_covered_hist.png'), dpi=150) + plt.close(fig) + + # 5. 
Alignment identity histogram + df_aln = df_asm[df_asm['alignment_identity'].notna()].copy() + if len(df_aln) > 0: + fig, ax = plt.subplots(figsize=(8, 5)) + identities = df_aln['alignment_identity'] * 100 + ax.hist(identities, bins=50, edgecolor='black', alpha=0.7) + ax.set_xlabel('Pairwise Identity (%)') + ax.set_ylabel('Count') + ax.set_title(f'Assembly Identity Distribution (n={len(identities)})') + ax.axvline(100, color='r', linestyle='--', alpha=0.3) + fig.tight_layout() + fig.savefig(os.path.join(plot_dir, 'alignment_identity_hist.png'), dpi=150) + plt.close(fig) + + # 6. SNP and indel count histograms + if len(df_aln) > 0: + fig, axes = plt.subplots(1, 2, figsize=(14, 5)) + + snps = df_aln['snp_count'].dropna() + if len(snps) > 0: + max_snp = int(snps.max()) + bins_snp = range(0, max(max_snp + 2, 3)) + axes[0].hist(snps, bins=bins_snp, edgecolor='black', alpha=0.7) + axes[0].set_xlabel('SNP Count') + axes[0].set_ylabel('Number of Assemblies') + axes[0].set_title(f'SNPs per Assembly (n={len(snps)})') + + indels = df_aln['indel_count_events'].dropna() + if len(indels) > 0: + max_indel = int(indels.max()) + bins_indel = range(0, max(max_indel + 2, 3)) + axes[1].hist(indels, bins=bins_indel, edgecolor='black', alpha=0.7) + axes[1].set_xlabel('Indel Count (internal)') + axes[1].set_ylabel('Number of Assemblies') + axes[1].set_title(f'Internal Indels per Assembly (n={len(indels)})') + + fig.tight_layout() + fig.savefig(os.path.join(plot_dir, 'snp_indel_counts.png'), dpi=150) + plt.close(fig) + + # 7. 
Terminal extensions histogram + if len(df_aln) > 0: + ext_old = df_aln['terminal_extensions_old'].dropna() + ext_new = df_aln['terminal_extensions_new'].dropna() + has_ext = (ext_old > 0) | (ext_new > 0) + if has_ext.any(): + fig, ax = plt.subplots(figsize=(8, 5)) + ax.hist(ext_new[has_ext], bins=30, alpha=0.6, label='New extends beyond Old', edgecolor='black') + ax.hist(ext_old[has_ext], bins=30, alpha=0.6, label='Old extends beyond New', edgecolor='black') + ax.set_xlabel('Terminal Extension (bp)') + ax.set_ylabel('Count') + ax.set_title('Terminal Extensions') + ax.legend() + fig.tight_layout() + fig.savefig(os.path.join(plot_dir, 'terminal_extensions_hist.png'), dpi=150) + plt.close(fig) + + # 8. Coverage vs identity + if len(df_aln) > 0: + df_ci = df_aln[df_aln['old_mean_coverage'].notna() & (df_aln['old_mean_coverage'] > 0)].copy() + if len(df_ci) > 0: + fig, ax = plt.subplots(figsize=(8, 5)) + ax.scatter(df_ci['old_mean_coverage'], df_ci['alignment_identity'] * 100, + alpha=0.5, s=20) + ax.set_xscale('log') + ax.set_xlabel('Mean Coverage') + ax.set_ylabel('Pairwise Identity (%)') + ax.set_title('Coverage vs Assembly Identity') + fig.tight_layout() + fig.savefig(os.path.join(plot_dir, 'coverage_vs_identity.png'), dpi=150) + plt.close(fig) + + log.info(f"Generated plots in {plot_dir}") + + +def generate_markdown_report(df, sample_df, workspace_name, report_dir, plot_dir): + """Generate markdown report.""" + pd, _ = get_deps() + + total_samples = len(sample_df) + if sample_df.empty or 'old_assembly_count' not in sample_df.columns: + samples_with_assemblies = 0 + samples_count_match = 0 + else: + samples_with_assemblies = len(sample_df[sample_df['old_assembly_count'] > 0]) + samples_count_match = len(sample_df[sample_df['assembly_count_match']]) + samples_count_mismatch = total_samples - samples_count_match + + total_assemblies = len(df) + if df.empty or 'alignment_identity' not in df.columns: + df_aln = pd.DataFrame() + else: + df_aln = 
df[df['alignment_identity'].notna()] + + identical = len(df_aln[df_aln['alignment_identity'] >= 1.0]) if len(df_aln) > 0 else 0 + near_identical = len(df_aln[(df_aln['alignment_identity'] >= 0.999) & (df_aln['alignment_identity'] < 1.0)]) if len(df_aln) > 0 else 0 + minor_diff = len(df_aln[(df_aln['alignment_identity'] >= 0.99) & (df_aln['alignment_identity'] < 0.999)]) if len(df_aln) > 0 else 0 + significant_diff = len(df_aln[df_aln['alignment_identity'] < 0.99]) if len(df_aln) > 0 else 0 + + if len(df_aln) > 0 and 'snp_count' in df_aln.columns: + with_snps = len(df_aln[df_aln['snp_count'] > 0]) + with_indels = len(df_aln[df_aln['indel_count_events'] > 0]) + with_ambig = len(df_aln[df_aln['ambiguity_diffs'] > 0]) + with_terminal = len(df_aln[(df_aln['terminal_extensions_old'] > 0) | (df_aln['terminal_extensions_new'] > 0)]) + total_snps = int(df_aln['snp_count'].sum()) + total_indel_bp = int(df_aln['indel_count_bp'].sum()) + total_indel_events = int(df_aln['indel_count_events'].sum()) + total_ambig = int(df_aln['ambiguity_diffs'].sum()) + total_terminal_bp_old = int(df_aln['terminal_extensions_old'].sum()) + total_terminal_bp_new = int(df_aln['terminal_extensions_new'].sum()) + total_terminal_events_old = int(df_aln['terminal_extension_events_old'].sum()) + total_terminal_events_new = int(df_aln['terminal_extension_events_new'].sum()) + else: + with_snps = with_indels = with_ambig = with_terminal = 0 + total_snps = total_indel_bp = total_indel_events = total_ambig = 0 + total_terminal_bp_old = total_terminal_bp_new = 0 + total_terminal_events_old = total_terminal_events_new = 0 + + report_path = os.path.join(report_dir, f'report_{workspace_name}.md') + with open(report_path, 'w') as f: + f.write(f"# Regression Report: {workspace_name}\n\n") + f.write(f"## Summary\n\n") + f.write(f"| Metric | Value |\n") + f.write(f"|--------|-------|\n") + f.write(f"| Total samples compared | {total_samples} |\n") + f.write(f"| Samples with assemblies (old) | 
{samples_with_assemblies} |\n") + f.write(f"| Samples with matching assembly count | {samples_count_match} |\n") + f.write(f"| Samples with mismatched assembly count | {samples_count_mismatch} |\n") + f.write(f"| Total assembly comparisons | {total_assemblies} |\n") + f.write(f"| Assemblies aligned | {len(df_aln)} |\n\n") + + f.write(f"## Assembly Identity\n\n") + f.write(f"| Category | Count | % |\n") + f.write(f"|----------|-------|---|\n") + if len(df_aln) > 0: + f.write(f"| Identical (100%) | {identical} | {100*identical/len(df_aln):.1f}% |\n") + f.write(f"| Near-identical (99.9-100%) | {near_identical} | {100*near_identical/len(df_aln):.1f}% |\n") + f.write(f"| Minor differences (99-99.9%) | {minor_diff} | {100*minor_diff/len(df_aln):.1f}% |\n") + f.write(f"| Significant differences (<99%) | {significant_diff} | {100*significant_diff/len(df_aln):.1f}% |\n") + f.write(f"\n") + + f.write(f"## Variant Counts\n\n") + f.write(f"| Metric | Assemblies affected | Events | Bases |\n") + f.write(f"|--------|--------------------:|-------:|------:|\n") + f.write(f"| SNPs (A/C/G/T ↔ A/C/G/T) | {with_snps} | {total_snps} | {total_snps} |\n") + f.write(f"| Internal indels | {with_indels} | {total_indel_events} | {total_indel_bp} |\n") + f.write(f"| Ambiguity diffs (N ↔ A/C/G/T) | {with_ambig} | {total_ambig} | {total_ambig} |\n") + f.write(f"| Terminal extensions (old only) | {with_terminal} | {total_terminal_events_old} | {total_terminal_bp_old} |\n") + f.write(f"| Terminal extensions (new only) | {with_terminal} | {total_terminal_events_new} | {total_terminal_bp_new} |\n\n") + + # Metrics summary + if len(df_aln) > 0: + f.write(f"## Metrics Summary\n\n") + f.write(f"| Metric | Median delta | Mean delta | Min | Max |\n") + f.write(f"|--------|-------------|------------|-----|-----|\n") + for col, label in [ + ('delta_percent_reference_covered', '% Ref Covered'), + ('delta_mean_coverage', 'Mean Coverage'), + ('delta_assembly_length_unambiguous', 'Unambig Length'), + ]: + 
vals = df[col].dropna() + if len(vals) > 0: + f.write(f"| {label} | {vals.median():.4f} | {vals.mean():.4f} | {vals.min():.4f} | {vals.max():.4f} |\n") + f.write(f"\n") + + # Divergent assemblies + if len(df_aln) > 0 and 'alignment_identity' in df_aln.columns: + divergent = df_aln[df_aln['alignment_identity'] < 0.999].sort_values('alignment_identity') + else: + divergent = pd.DataFrame() + if len(divergent) > 0: + f.write(f"## Divergent Assemblies (identity < 99.9%)\n\n") + f.write(f"| Assembly ID | Tax Name | Identity | SNPs | Indel events (bp) | Ambig Diffs | Term Ext Old events (bp) | Term Ext New events (bp) |\n") + f.write(f"|-------------|----------|----------|------|-------------------|-------------|--------------------------|---------------------------|\n") + for _, row in divergent.iterrows(): + indel_ev = int(row.get('indel_count_events', 0)) + indel_bp = int(row.get('indel_count_bp', 0)) + term_old_ev = int(row.get('terminal_extension_events_old', 0)) + term_old_bp = int(row.get('terminal_extensions_old', 0)) + term_new_ev = int(row.get('terminal_extension_events_new', 0)) + term_new_bp = int(row.get('terminal_extensions_new', 0)) + f.write(f"| {row['assembly_id']} | {row['tax_name']} | " + f"{row['alignment_identity']*100:.3f}% | {row['snp_count']:.0f} | " + f"{indel_ev} ({indel_bp}) | {row['ambiguity_diffs']:.0f} | " + f"{term_old_ev} ({term_old_bp}) | " + f"{term_new_ev} ({term_new_bp}) |\n") + f.write(f"\n") + + # Assembly count mismatches + if 'assembly_count_match' not in sample_df.columns: + mismatches = pd.DataFrame() + else: + mismatches = sample_df[~sample_df['assembly_count_match']] + if len(mismatches) > 0: + f.write(f"## Assembly Count Mismatches\n\n") + f.write(f"| Sample | Old Count | New Count | Only Old | Only New |\n") + f.write(f"|--------|-----------|-----------|----------|----------|\n") + for _, row in mismatches.iterrows(): + f.write(f"| {row['sample_id']} | {row['old_assembly_count']} | " + f"{row['new_assembly_count']} | 
{row['assemblies_only_in_old']} | " + f"{row['assemblies_only_in_new']} |\n") + f.write(f"\n") + + # Plot references + f.write(f"## Plots\n\n") + plot_files = sorted(os.listdir(plot_dir)) if os.path.isdir(plot_dir) else [] + for pf in plot_files: + if pf.endswith('.png'): + rel_plot_dir = os.path.basename(plot_dir) + f.write(f"![{pf}]({rel_plot_dir}/{pf})\n\n") + + log.info(f"Report written to {report_path}") + return report_path + + +def main(): + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument('--results-dir', required=True, help='Directory with per-sample JSON results') + parser.add_argument('--report-dir', required=True, help='Output directory for report') + parser.add_argument('--workspace-name', required=True, help='Workspace name for report title') + args = parser.parse_args() + + pd, plt = get_deps() + + results = load_results(args.results_dir) + log.info(f"Loaded {len(results)} sample results") + + # Build comparison table + comp_rows = build_comparison_table(results, args.workspace_name) + df = pd.DataFrame(comp_rows) + log.info(f"Total assembly comparisons: {len(df)}") + + # Build sample summary + sample_rows = build_sample_summary(results) + sample_df = pd.DataFrame(sample_rows) + + # Save summary TSV + os.makedirs(args.report_dir, exist_ok=True) + tsv_path = os.path.join(args.report_dir, f'summary_{args.workspace_name}.tsv') + if len(df) > 0: + df.to_csv(tsv_path, sep='\t', index=False) + log.info(f"Summary TSV written to {tsv_path}") + + # Generate plots + plot_dir = os.path.join(args.report_dir, f'plots_{args.workspace_name}') + if len(df) > 0: + generate_plots(df, plot_dir) + + # Generate markdown report + generate_markdown_report(df, sample_df, args.workspace_name, args.report_dir, plot_dir) + + +if __name__ == '__main__': + main() diff --git a/.agents/skills/regression-testing/run_vadr.sh b/.agents/skills/regression-testing/run_vadr.sh new file mode 100644 index 000000000..abb24ef88 --- /dev/null +++ 
b/.agents/skills/regression-testing/run_vadr.sh @@ -0,0 +1,70 @@ +#!/bin/bash +# Run VADR on a single FASTA file. +# Inputs (env vars set by dsub): +# FASTA - input FASTA file (localized by dsub) +# VADR_OPTS - vadr options string +# MIN_LEN - minimum sequence length (optional) +# MAX_LEN - maximum sequence length (optional) +# MODEL_URL - URL to vadr model tarball +# MODEL_SUB - subdirectory within model tarball (optional) +# Outputs (env vars set by dsub): +# NUM_ALERTS - file to write num_alerts integer +# ALERTS_TSV - file to write alerts TSV +# VADR_TGZ - file to write full vadr output tarball + +set -euo pipefail + +# Default optional env vars to empty +MODEL_URL="${MODEL_URL:-}" +MODEL_SUB="${MODEL_SUB:-}" +MIN_LEN="${MIN_LEN:-}" +MAX_LEN="${MAX_LEN:-}" + +BASENAME=$(basename "${FASTA}" .fasta) + +# Download and unpack VADR models +if [ -n "${MODEL_URL}" ]; then + mkdir -p vadr-untar + curl -fsSL "${MODEL_URL}" | tar -C vadr-untar -xzf - + MODEL_DIR=$(find vadr-untar -mindepth 1 -maxdepth 1 -type d | head -1) + ln -s "${MODEL_DIR}" vadr-models +else + ln -s /opt/vadr/vadr-models vadr-models +fi + +if [ -n "${MODEL_SUB}" ]; then + VADR_MODEL_DIR="vadr-models/${MODEL_SUB}" +else + VADR_MODEL_DIR="vadr-models" +fi + +# Build trim args +TRIM_ARGS="" +if [ -n "${MIN_LEN}" ]; then + TRIM_ARGS="${TRIM_ARGS} --minlen ${MIN_LEN}" +fi +if [ -n "${MAX_LEN}" ]; then + TRIM_ARGS="${TRIM_ARGS} --maxlen ${MAX_LEN}" +fi + +# Remove terminal ambiguous nucleotides +/opt/vadr/vadr/miniscripts/fasta-trim-terminal-ambigs.pl \ + "${FASTA}" ${TRIM_ARGS} > "${BASENAME}.trimmed.fasta" + +# Run VADR +v-annotate.pl \ + ${VADR_OPTS} \ + --split --cpu $(nproc) \ + --mdir "${VADR_MODEL_DIR}" \ + "${BASENAME}.trimmed.fasta" \ + "${BASENAME}" + +# Package outputs +tar -C "${BASENAME}" -czf "${VADR_TGZ}" . 
+ +# Extract alerts +cut -f 5 "${BASENAME}/${BASENAME}.vadr.alt.list" | tail -n +2 > alerts.tsv +cp alerts.tsv "${ALERTS_TSV}" +wc -l < alerts.tsv > "${NUM_ALERTS}" + +echo "VADR complete. Alerts: $(cat "${NUM_ALERTS}")" diff --git a/.claude/rules/container-vulns.md b/.claude/rules/container-vulns.md new file mode 100644 index 000000000..aa7af5192 --- /dev/null +++ b/.claude/rules/container-vulns.md @@ -0,0 +1,12 @@ +--- +paths: + - "docker/**" + - ".trivyignore" + - ".trivy-ignore-policy.rego" + - "vulnerability-mitigation-status.md" + - ".github/workflows/container-scan.yml" + - ".github/workflows/docker.yml" +--- + +For container vulnerability management guidance, see +.agents/skills/container-vulns/SKILL.md diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index d899a92d5..85a464535 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -29,3 +29,4 @@ docker run --rm \ | [docker/](../docker/) | Dockerfiles and requirements | | [src/viral_ngs/](../src/viral_ngs/) | Source code | | [tests/](../tests/) | Test files | +| [.agents/skills/](../.agents/skills/) | Reusable agent playbooks and scripts | diff --git a/AGENTS.md b/AGENTS.md index 88da6ab20..9d8dae00c 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -504,72 +504,23 @@ micromamba search -c bioconda --subdir linux-aarch64 --- -## Container Vulnerability Management - -### Scanning - -Container images are scanned for vulnerabilities using [Trivy](https://aquasecurity.github.io/trivy/): - -- **On every PR/push**: `docker.yml` scans each image flavor after build (SARIF → GitHub Security tab, JSON → artifact) -- **Weekly schedule**: `container-scan.yml` scans the latest published images -- Scans filter to **CRITICAL/HIGH** severity, **ignore-unfixed**, and apply a Rego policy (`.trivy-ignore-policy.rego`) -- Per-CVE exceptions go in `.trivyignore` with mandatory justification comments - -### Rego Policy (`.trivy-ignore-policy.rego`) - -The Rego policy filters 
CVEs that are architecturally inapplicable to ephemeral batch containers: - -- **AV:P** (Physical access required) — containers are cloud-hosted -- **AV:A** (Adjacent network required) — no attacker on same network segment -- **AV:L + UI:R** (Local + user interaction) — no interactive sessions -- **AV:L + PR:H** (Local + high privileges) — containers run non-root -- **AV:L + S:U** (Local + scope unchanged) — attacker already has code execution and impact stays within the ephemeral container - -Changes to this policy should be reviewed carefully. The comments in the file explain the rationale and risk for each rule. - -### Common Vulnerability Sources - -**Python transitive deps**: Pin minimum versions in `docker/requirements/*.txt`. Prefer conda packages over pip. Check conda-forge availability before assuming a version exists — conda-forge often lags PyPI by days/weeks. +## Reusable Agent Skills -**Java fat JARs** (picard, gatk, snpeff, fgbio): Bioinformatics Java tools are distributed as uber JARs with all dependencies bundled inside. Trivy detects vulnerable libraries (log4j, commons-compress, etc.) baked into these JARs. Version bumps can cause ARM64 conda solver conflicts because Java tools pull in openjdk → harfbuzz → icu version chains that clash with other packages (r-base, boost-cpp, pyicu). Always check: -1. Whether the tool is actually flagged by Trivy (don't bump versions unnecessarily) -2. Whether the CVE applies (e.g., log4j 1.x is NOT vulnerable to Log4Shell) -3. Whether the desired version resolves on ARM64 before pushing +Established workflows and playbooks live in `.agents/skills/`. Each skill +directory contains a `SKILL.md` with the playbook and companion scripts. +Check there before building analysis pipelines from scratch. -**Go binaries**: Some conda packages bundle compiled Go binaries (e.g., mafft's `dash_client`, google-cloud-sdk's `gcloud-crc32c`). If the binary is unused, delete it in the Dockerfile. 
Delete from **both** the installed location and `/opt/conda/pkgs/*/` (conda package cache) — Trivy scans the full filesystem. +Available skills: +- **regression-testing** -- End-to-end assembly regression testing against Terra submissions +- **dsub-batch-jobs** -- Running one-off compute jobs on GCP via dsub +- **container-vulns** -- Container vulnerability scanning, triage, and mitigation -**Vendored copies**: Packages like google-cloud-sdk and setuptools bundle their own copies of Python libraries that may be older than what's in the conda environment. Trivy flags these vendored copies separately. Options: delete the vendored directory (if not needed at runtime), or accept the risk in `.trivyignore` with justification. - -### ARM64 Solver Conflicts - -The conda solver on ARM64 (linux-aarch64) is more constrained than amd64 because fewer package builds exist. Common conflict patterns: - -- **icu version conflicts**: Many packages (openjdk, r-base, boost-cpp, pyicu) pin specific icu version ranges. Bumping one package can make the entire environment unsolvable. -- **libdeflate/htslib conflicts**: lofreq 2.1.5 pins old htslib/libdeflate versions that conflict with newer pillow/libtiff. -- **openjdk version escalation**: snpeff 5.2+ requires openjdk>=11, 5.3+ requires openjdk>=21. Higher openjdk versions pull in harfbuzz→icu chains that conflict with everything. - -When a solver conflict occurs: revert the change, check what version the solver was picking before, and pin to that exact version if it already addresses the CVE. - -### Mitigation Decision Process - -When triaging a CVE: - -1. **Check the CVSS vector** — does the Rego policy already filter it? -2. **Identify the source package** — use Trivy JSON output (`PkgName`, `PkgPath`, `InstalledVersion`) -3. **Check if a fix version exists on conda-forge/bioconda** — not just on PyPI -4. **Test on ARM64** — solver conflicts are the most common failure mode -5. 
**If the fix version conflicts**: consider whether the CVE is exploitable in your deployment model. Document the risk assessment in `.trivyignore` or `vulnerability-mitigation-status.md`. -6. **If the vulnerable code is unused**: delete the binary/file inline in the Dockerfile (same RUN layer as install to avoid bloating images) +--- -### Key Files +## Container Vulnerability Management -| File | Purpose | -|------|---------| -| `.trivy-ignore-policy.rego` | Rego policy for class-level CVE filtering | -| `.trivyignore` | Per-CVE exceptions with justifications | -| `.github/workflows/docker.yml` | Build-time scanning (SARIF + JSON) | -| `.github/workflows/container-scan.yml` | Weekly scheduled scanning | -| `vulnerability-mitigation-status.md` | Local-only tracking doc (not committed) | +See `.agents/skills/container-vulns/SKILL.md` for detailed guidance on vulnerability +scanning, triaging CVEs, Rego policy, and ARM64 solver conflicts. --- diff --git a/CLAUDE.md b/CLAUDE.md index 5e0d072cf..71dc1d2e7 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -29,3 +29,4 @@ docker run --rm \ | [docker/](docker/) | Dockerfiles and requirements | | [src/viral_ngs/](src/viral_ngs/) | Source code | | [tests/](tests/) | Test files | +| [.agents/skills/](.agents/skills/) | Reusable agent playbooks and scripts |