Unify, merge, inspect, query, compare, and export structural variants across callers and samples.
Note
What's New in v0.4.0 — OctopuSV now provides a more complete SVCF operation layer for validation, inspection, querying, filtering, subsetting, normalization, and standard-format conversion.
New SVCF inspection and validation commands
- octopusv header: show the header and metadata contract of an SVCF/VCF file.
- octopusv validate-svcf: validate an SVCF file against the OctopuSV SVCF contract, including required fields, evidence structure, and provenance consistency.
- octopusv inspect: inspect one or more SVCF records by ID and render parsed endpoints, span, SOURCES, SOURCE_IDS, and per-caller or per-sample evidence blocks.
- octopusv inspect --id-file ... --jsonl: export inspected records as JSONL for pipelines and structured downstream workflows.
New SVCF filtering and querying commands
- octopusv query: query SVCF records by genomic targets while preserving SVCF structure.
- octopusv filter: filter SVCF records by SV-level attributes while preserving SVCF structure.
- octopusv subset: subset SVCF sample/caller evidence columns while preserving SVCF structure.
New SVCF normalization command
- octopusv normalize-contigs: normalize standard SVCF contig names without changing coordinates.
Important fixes and improvements
- Fixed sample-mode merge column ordering so evidence blocks remain aligned with the #CHROM sample order.
- Improved merge provenance handling for SOURCES and SOURCE_IDS.
- Improved generated SVCF headers and default INFO definitions.
- Improved octopusv svcf2vcf so converted VCF records now preserve SOURCES and SOURCE_IDS in the INFO field.
- Improved CLI help organization and user-facing messages, including clearer handling of conflicting merge strategy options.
Tip
Genome-wide SV visualization from v0.3.5 — octopusv plot-circos draws a genome-wide SV Circos overview from an SVCF file, with an inner link layer for DEL/DUP/INV/TRA events and an outer breakpoint-density histogram.
octopusv plot-circos -i input.svcf -o circos.png
Important
Always use the latest version for best results.
conda install bioconda::octopusv
Previous releases
- v0.3.3: Improved octopusv correct support for multi-sample VCFs, including joint-called outputs from GRIDSS, DELLY, and related callers.
- v0.3.2: Added octopusv clean, which sanitizes broken VCFs so strict tools such as Truvari and bcftools can parse them.
- v0.3.1: Added native GRIDSS support. octopusv correct resolves paired BND records into standard SV types directly, without external preprocessing.
OctopuSV addresses four key challenges in structural variant (SV) analysis:
- Smart BND standardization — Converts paired BND records into standard SV types (DEL/INV/DUP/INS/TRA), while preserving potential complex rearrangements as BNDs. Works out of the box with BND-heavy callers such as GRIDSS and SvABA.
- Multi-caller integration — Merges SVs from different tools such as Manta, Delly, GRIDSS, Sniffles, PBSV, SVIM, CuteSV, and others with flexible support-based or Boolean strategies.
- Multi-sample integration — Compares and analyzes SVs across samples or cohorts with structure-preserving sample-level merging.
- SVCF-centered operations — Validates, inspects, queries, filters, subsets, normalizes, visualizes, and exports merged SV records while preserving caller/sample provenance.
Whether you are analyzing single samples, cohorts, or tumor/normal pairs, OctopuSV standardizes your workflow from raw SV calls to consistent SVCF and standard downstream-compatible outputs.
OctopuSV converts SV caller VCF outputs into a unified intermediate format (SVCF), enabling consistent merging, comparison, inspection, and conversion across callers and samples. Results can be exported back to standard VCF, BED, or BEDPE formats.
flowchart TD
A["Raw VCFs from multiple SV callers<br/>(Manta · Delly · GRIDSS · Sniffles · PBSV · ...)"] -->|octopusv correct| B["Unified SVCF format"]
B -->|octopusv merge| C["Merged SVCF<br/>multi-caller / multi-sample"]
C -->|validate / inspect| D["Checked and inspectable<br/>SVCF records"]
C -->|query / filter / subset| E["Selected SVCF records<br/>structure preserved"]
C -->|stat / plot / plot-circos| F["Statistics and visualizations"]
C -->|svcf2vcf / svcf2bed / svcf2bedpe| G["Standard output formats"]
B -->|octopusv somatic| H["Somatic SVCF<br/>tumor-specific SVs"]
B -->|octopusv clean| I["Truvari-ready VCF.gz<br/>sanitized + indexed"]
style A fill:#f5f5f5,stroke:#999
style B fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
style C fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
style D fill:#fff3e0,stroke:#f57c00
style E fill:#fff3e0,stroke:#f57c00
style F fill:#ede7f6,stroke:#673ab7
style G fill:#e0f7fa,stroke:#00838f
style H fill:#fce4ec,stroke:#c2185b
style I fill:#fff8e1,stroke:#f9a825
Why SVCF? Different SV callers implement VCF inconsistently: varying field names, BND notations, coordinate conventions, and sample/caller evidence fields. SVCF reduces these compatibility issues by providing a unified intermediate format for practical SV operations.
# Step 1: Standardize caller outputs
octopusv correct manta_output.vcf manta.svcf
octopusv correct gridss_output.vcf gridss.svcf
octopusv correct sniffles_output.vcf sniffles.svcf
# Step 2: Merge and analyze with a consistent format
octopusv merge -i manta.svcf gridss.svcf sniffles.svcf -o merged.svcf --min-support 2
octopusv validate-svcf -i merged.svcf
octopusv inspect -i merged.svcf --id Sniffles2.INS.1DS0
# Step 3: Convert back to standard formats
octopusv svcf2vcf -i merged.svcf -o final_results.vcf
octopusv svcf2bedpe -i merged.svcf -o final_results.bedpe
📋 SVCF Format Details: See the SVCF specification document for technical details.
Long-read callers: Sniffles, Severus, SVDSS, DeBreak, SVIM, CuteSV, PBSV, nanomonsv
Short-read callers: Manta, Delly, GRIDSS, Lumpy, SvABA, Octopus, CLEVER
CNV callers: Dragen CNV, with automatic conversion of CNV records to DEL/DUP when appropriate
Support for additional callers continues to expand.
conda install bioconda::octopusv
Or with mamba for faster dependency resolution:
mamba install bioconda::octopusv
Bioconda installation includes the required Python dependencies and command-line tools used by OctopuSV workflows.
pip install octopusv
Note
The octopusv clean subcommand requires bcftools, bgzip, and tabix as external tools. If you installed OctopuSV via pip, install them separately:
conda install -c bioconda bcftools htslib
If you installed OctopuSV via Bioconda, these tools are already included.
docker pull quay.io/biocontainers/octopusv:<tag>
See octopusv/tags for available container tags.
git clone https://github.com/ylab-hi/OctopuSV.git
cd OctopuSV
mamba env create -f environment.yaml
mamba activate octopusv
poetry install
octopusv correct converts raw SV caller output into standardized SVCF format. This includes resolving paired BND records into concrete SV types and detecting insertions from BND pairs with long inserted sequences.
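To give a rough idea of what resolving paired BND records involves, the sketch below reads the bracket notation defined in the VCF specification, checks whether two breakends point at each other within a position tolerance, and assigns a candidate SV type. The names `Bnd`, `parse_alt`, `are_mates`, and `classify` are hypothetical, and the rules are a common simplification. OctopuSV's actual logic, including how `--pos-tolerance` is applied, how insertions are detected, and when an event is kept as BND, may differ.

```python
import re
from typing import NamedTuple

# The four ALT forms from the VCF specification: t[p[  t]p]  ]p]t  [p[t
_BND_ALT = re.compile(
    r"^(?P<lead>[^\[\]]*)(?P<open>[\[\]])(?P<chrom>[^\[\]]+):(?P<pos>\d+)"
    r"(?P<close>[\[\]])(?P<trail>[^\[\]]*)$"
)


class Bnd(NamedTuple):
    chrom: str
    pos: int
    alt: str


def parse_alt(alt):
    """Return (mate_chrom, mate_pos, bracket, ref_first) for a BND ALT string."""
    m = _BND_ALT.match(alt)
    if m is None or m["open"] != m["close"] or bool(m["lead"]) == bool(m["trail"]):
        raise ValueError(f"not a breakend ALT: {alt!r}")
    return m["chrom"], int(m["pos"]), m["open"], bool(m["lead"])


def are_mates(a, b, tolerance=0):
    """True if each record's ALT points at the other record within `tolerance` bp."""
    a_chrom, a_pos, _, _ = parse_alt(a.alt)
    b_chrom, b_pos, _, _ = parse_alt(b.alt)
    return (
        a_chrom == b.chrom
        and b_chrom == a.chrom
        and abs(a_pos - b.pos) <= tolerance
        and abs(b_pos - a.pos) <= tolerance
    )


def classify(rec):
    """Simplified type assignment from a single breakend's orientation."""
    mate_chrom, mate_pos, bracket, ref_first = parse_alt(rec.alt)
    if mate_chrom != rec.chrom:
        return "TRA"
    if (bracket == "[") != ref_first:
        return "INV"  # t]p] or [p[t: the joined piece is reverse-complemented
    # t[p[ at the lower coordinate (or ]p]t at the upper one) skips sequence: DEL.
    # The opposite arrangement joins back to an earlier position: DUP.
    return "DEL" if (rec.pos < mate_pos) == ref_first else "DUP"


low = Bnd("chr1", 1000, "N[chr1:5000[")
high = Bnd("chr1", 5000, "]chr1:1003]N")
print(are_mates(low, high, tolerance=5), classify(low), classify(high))
```

A real implementation would also need to handle single breakends, inserted sequence between the brackets and the base, and pairs whose two records disagree on type.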
# Basic correction
octopusv correct input.vcf output.svcf
# With position tolerance control for BND pairing
octopusv correct -i input.vcf -o output.svcf --pos-tolerance 5
# Apply quality filters
octopusv correct -i input.vcf -o output.svcf --min-svlen 50 --max-svlen 100000 --filter-pass
octopusv merge combines standardized SVCF files using flexible support and set-operation strategies.
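Conceptually, each merge strategy is a predicate over the set of inputs that support a merged event. The sketch below models that idea only: the `events` mapping and the helper names are hypothetical, and it does not model how OctopuSV decides that two calls describe the same event.

```python
def union(support):
    """Keep events seen in at least one input."""
    return len(support) >= 1


def intersect(support, inputs):
    """Keep events seen in every input."""
    return support == set(inputs)


def min_support(support, n):
    """Keep events seen in at least n inputs."""
    return len(support) >= n


def specific(support, name):
    """Keep events seen in exactly one named input and nowhere else."""
    return support == {name}


def a_and_b_not_c_or_d(support):
    """Hand-written equivalent of "(A AND B) AND NOT (C OR D)"."""
    return ("A" in support and "B" in support) and not ("C" in support or "D" in support)


def select(events, predicate):
    """Return the sorted IDs of events whose support set satisfies the predicate."""
    return sorted(sv_id for sv_id, support in events.items() if predicate(support))


# Hypothetical merged events mapped to the inputs that support them.
inputs = ["A", "B", "C", "D"]
events = {
    "sv1": {"A", "B", "C", "D"},
    "sv2": {"A", "B"},
    "sv3": {"C"},
}

print(select(events, lambda s: intersect(s, inputs)))
print(select(events, lambda s: min_support(s, 2)))
print(select(events, a_and_b_not_c_or_d))
```

Seen this way, intersection and union are the two extremes of minimum support (all inputs and one input, respectively), and the Boolean expression covers cases that a simple count cannot express.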
# Intersection: SVs found by all input files
octopusv merge -i manta.svcf sniffles.svcf pbsv.svcf -o intersection.svcf --intersect
# Union: SVs found by any input file
octopusv merge -i caller1.svcf caller2.svcf caller3.svcf -o union.svcf --union
# Minimum support: SVs supported by at least N callers or samples
octopusv merge -i a.svcf b.svcf c.svcf d.svcf -o supported.svcf --min-support 3
# Specific input: SVs unique to one caller or sample
octopusv merge -i manta.svcf sniffles.svcf -o manta_specific.svcf --specific manta.svcf
# Complex Boolean logic: A and B but not C or D
octopusv merge -i A.svcf B.svcf C.svcf D.svcf \
--expression "(A AND B) AND NOT (C OR D)" -o filtered.svcf
# Multi-sample mode with custom names
octopusv merge -i sample1.svcf sample2.svcf sample3.svcf \
--mode sample --sample-names Patient1,Patient2,Patient3 \
--min-support 2 -o cohort.svcf
# Generate an UpSet plot
octopusv merge -i a.svcf b.svcf c.svcf -o merged.svcf --intersect \
--upsetr --upsetr-output venn_diagram.png
OctopuSV v0.4.0 adds inspection and validation commands for checking SVCF structure, provenance, and individual merged records.
# Show the header and metadata contract
octopusv header -i merged.svcf
# Validate SVCF structure and provenance consistency
octopusv validate-svcf -i merged.svcf
# Inspect one merged SV record by ID
octopusv inspect -i merged.svcf --id Sniffles2.INS.1DS0
# Inspect multiple records and export JSONL
octopusv inspect -i merged.svcf --id-file candidate_ids.txt --jsonl > records.jsonl
inspect reports parsed endpoints, span, SOURCES, SOURCE_IDS, and per-caller or per-sample evidence blocks. This is useful for checking merged records before conversion, visualization, benchmarking, or other downstream workflows.
OctopuSV v0.4.0 provides structure-preserving SVCF operations. These commands keep the SVCF header, INFO fields, FORMAT fields, sample/caller columns, and provenance fields consistent.
# Query records by genomic region
octopusv query -i merged.svcf --region chr1:1000000-2000000 -o region_hits.svcf
# See all target-query options, including feature-based query modes
octopusv query -h
# Filter records by SV type
octopusv filter -i merged.svcf --svtype DEL --svtype DUP -o del_dup.svcf
# Filter records by support
octopusv filter -i merged.svcf --min-support 2 -o support2.svcf
# Subset sample/caller evidence columns
octopusv subset -i merged.svcf --sample sampleA --sample sampleB -o subset.svcf
These operations are SVCF-aware: they preserve both breakpoints, CHR2/END, caller/sample provenance, and merged evidence columns.
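A region query has to consider both breakpoints of a record, since a translocation can start outside the region and end inside it. The sketch below parses a `chrom:start-end` string and tests whether either breakpoint falls inside. The matching rule is an assumption made for illustration: this document does not state whether `octopusv query` matches on breakpoints, on the full span, or on both, and the helper names are hypothetical.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def parse_region(text):
    """Parse 'chrom:start-end' (commas in numbers allowed) into a Region."""
    m = re.fullmatch(r"(.+):(\d[\d,]*)-(\d[\d,]*)", text)
    if m is None:
        raise ValueError(f"expected chrom:start-end, got {text!r}")
    start = int(m.group(2).replace(",", ""))
    end = int(m.group(3).replace(",", ""))
    if start > end:
        raise ValueError(f"region start exceeds end: {text!r}")
    return Region(m.group(1), start, end)


def breakpoint_in_region(chrom, pos, region):
    """True if a single breakpoint lies inside the closed interval."""
    return chrom == region.chrom and region.start <= pos <= region.end


def sv_hits_region(chrom, pos, chr2, end, region):
    """Assumed rule: a record matches if either breakpoint is inside the region."""
    return breakpoint_in_region(chrom, pos, region) or breakpoint_in_region(chr2, end, region)


region = parse_region("chr1:1000000-2000000")
print(sv_hits_region("chr5", 100, "chr1", 1_999_999, region))
```

Under this rule, a large event that spans the whole region with both breakpoints outside it would not match; a span-overlap test would be needed to catch that case.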
Use normalize-contigs when input files use different standard chromosome naming styles, such as chr1 versus 1.
# Normalize standard contig names without changing coordinates
octopusv normalize-contigs -i merged.svcf -o merged.normalized.svcf
This command normalizes standard contig names only. It does not lift over coordinates or alter breakpoint positions.
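The idea of renaming standard contigs while leaving everything else alone can be sketched as follows. The function name, the `style` parameter, and the exact set of contigs treated as standard are assumptions for this example, not a description of what `normalize-contigs` does internally.

```python
# Assumed set of "standard" contigs: autosomes, sex chromosomes, mitochondrion.
_STANDARD = {str(i) for i in range(1, 23)} | {"X", "Y", "M", "MT"}


def normalize_contig(name, style="chr"):
    """Rename a standard contig to 'chr'-prefixed or plain style.

    Non-standard contigs (alt, decoy, unplaced) are returned unchanged,
    and coordinates are never touched because only the name is handled.
    """
    if style not in {"chr", "plain"}:
        raise ValueError(f"unknown style: {style!r}")
    bare = name[3:] if name.startswith("chr") else name
    if bare not in _STANDARD:
        return name
    if bare in {"M", "MT"}:
        return "chrM" if style == "chr" else "MT"
    return f"chr{bare}" if style == "chr" else bare


for contig in ["1", "chrX", "MT", "chrUn_KI270742v1"]:
    print(contig, "->", normalize_contig(contig))
```

Renaming is only safe when both naming styles refer to the same assembly; mixing builds requires a liftover, which is outside the scope of a rename.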
Use any SV caller to analyze tumor and normal samples separately, then let OctopuSV find tumor-specific variants. This works even with callers not designed specifically for cancer analysis.
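The tumor-versus-normal comparison can be pictured as a subtraction: a tumor SV is kept only if no normal SV matches it. The sketch below assumes that a match requires the same type and chromosome, both breakpoints within a maximum distance, and an interval Jaccard index above a threshold. The default values mirror the `--max-distance 100` and `--min-jaccard 0.8` values used in this section's commands, but how OctopuSV combines the two criteria is an assumption here, and `SV`, `jaccard`, `same_event`, and `tumor_specific` are hypothetical names.

```python
from typing import NamedTuple


class SV(NamedTuple):
    svtype: str
    chrom: str
    start: int
    end: int


def jaccard(a, b):
    """Interval Jaccard index: overlap length divided by combined length."""
    inter = max(0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def same_event(t, n, max_distance=100, min_jaccard=0.8):
    """Assumed matching rule: both criteria must hold."""
    return (
        t.svtype == n.svtype
        and t.chrom == n.chrom
        and abs(t.start - n.start) <= max_distance
        and abs(t.end - n.end) <= max_distance
        and jaccard(t, n) >= min_jaccard
    )


def tumor_specific(tumor, normal, **thresholds):
    """Return tumor SVs that have no matching SV in the normal sample."""
    return [t for t in tumor if not any(same_event(t, n, **thresholds) for n in normal)]


tumor = [SV("DEL", "chr1", 1000, 2000), SV("DEL", "chr2", 5000, 6000)]
normal = [SV("DEL", "chr1", 1010, 2020)]
print(tumor_specific(tumor, normal))
```

Zero-length events such as insertions have no interval to overlap, so the sketch never matches them; they would need a separate comparison based on position and inserted length.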
# Basic somatic calling
octopusv somatic -t tumor.svcf -n normal.svcf -o somatic.svcf
# With custom matching parameters
octopusv somatic -t tumor.svcf -n normal.svcf -o somatic.svcf \
--max-distance 100 --min-jaccard 0.8
# Convert to standard VCF for downstream analysis
octopusv svcf2vcf -i somatic.svcf -o somatic.vcf
Example multi-caller somatic workflow:
# Standardize tumor calls from multiple callers
octopusv correct manta_tumor.vcf manta_tumor.svcf
octopusv correct delly_tumor.vcf delly_tumor.svcf
octopusv correct gridss_tumor.vcf gridss_tumor.svcf
# Keep SVs supported by at least 2 out of 3 callers
octopusv merge -i manta_tumor.svcf delly_tumor.svcf gridss_tumor.svcf \
-o high_confidence_somatic.svcf --min-support 2
Some callers produce VCFs that are technically valid but break strict parsers such as Truvari or bcftools due to missing header definitions, illegal characters in INFO fields, inconsistent chromosome naming, missing GT, or missing SVLEN.
octopusv clean fixes these issues without filtering variants, producing a sorted, bgzipped, tabix-indexed VCF ready for downstream benchmarking.
# Basic clean without chromosome harmonization
octopusv clean broken.vcf fixed.vcf.gz
# With reference FASTA for chromosome name harmonization
octopusv clean broken.vcf fixed.vcf.gz -g /path/to/reference.fa
# Typical workflow before Truvari benchmark
octopusv clean calls.vcf calls_clean.vcf.gz -g GRCh38.fa
truvari bench -b truth.vcf.gz -c calls_clean.vcf.gz -f GRCh38.fa -o bench_results/
What clean fixes:
- Removes the RNAMES field and sanitizes illegal characters in INFO
- Fills missing SVLEN based on SVTYPE and END
- Ensures GT is the first FORMAT field with a valid value
- Auto-generates missing INFO/FORMAT header definitions
- Harmonizes chromosome names against a reference FASTA when -g is provided
- Sorts, bgzips, and tabix-indexes the output
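As an illustration of the SVLEN step in the list above, a length can be derived from POS and END for span-based types. The sketch assumes the common VCF convention of a negative SVLEN for deletions and leaves types without a meaningful span unset; `infer_svlen` is a hypothetical helper, and the exact rules used by `octopusv clean` may differ.

```python
def infer_svlen(svtype, pos, end):
    """Derive SVLEN from POS and END, or return None when it cannot be inferred."""
    if end is None or end < pos:
        return None
    span = end - pos
    if svtype == "DEL":
        return -span  # assumed convention: deletions carry a negative length
    if svtype in {"DUP", "INV"}:
        return span
    # INS has END equal to POS and BND has no span, so the length
    # cannot be recovered from coordinates alone.
    return None


for svtype, pos, end in [("DEL", 1000, 1500), ("DUP", 1000, 1500), ("INS", 1000, 1000)]:
    print(svtype, infer_svlen(svtype, pos, end))
```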
octopusv benchmark truth.vcf calls.svcf \
-o benchmark_results \
--reference-distance 500 \
--size-similarity 0.7 \
--reciprocal-overlap 0.0 \
--size-min 50 --size-max 50000
# Basic stat collection
octopusv stat -i input.svcf -o stats.txt
# Add an HTML report
octopusv stat -i input.svcf -o stats.txt --report
# Plot figures from stats
octopusv plot stats.txt -o figure_prefix
The --report flag outputs an interactive HTML report covering SV type and size distributions, chromosome breakdowns, quality score summaries, genotype features, and depth features.
octopusv plot-circos draws a whole-genome SV landscape directly from an SVCF: an inner link layer for DEL/DUP/INV/TRA and an outer breakpoint-density histogram. It is useful for spotting chromosome-level breakpoint clustering and complex-rearrangement regions at a glance.
# Basic Circos overview
octopusv plot-circos -i input.svcf -o circos.png
# Plot only translocations
octopusv plot-circos -i input.svcf -o circos_tra.png --tra-only
# Use a custom reference .fai for chromosome sizes
octopusv plot-circos -i input.svcf -o circos.png --fai reference.fa.fai
INS is excluded from links by default. Events larger than --intra-max-span are written to an oversized-intra table next to the figure for manual inspection. See octopusv plot-circos -h for all options, including support thresholds, span filters, per-type toggles, and arc styling.
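The two ideas behind the figure, binning breakpoints into a density track and setting aside very large intrachromosomal events, can be sketched in a few lines. The bin size, the function names, and the exact comparison against the span limit are assumptions for illustration; `plot-circos` may bin and filter differently.

```python
from collections import Counter


def breakpoint_density(breakpoints, bin_size=1_000_000):
    """Count breakpoints per (chromosome, bin index), using 1-based positions."""
    counts = Counter()
    for chrom, pos in breakpoints:
        counts[(chrom, (pos - 1) // bin_size)] += 1
    return counts


def is_oversized_intra(chrom, pos, chr2, end, intra_max_span):
    """True for same-chromosome events whose span exceeds the limit."""
    return chrom == chr2 and (end - pos) > intra_max_span


breakpoints = [("chr1", 1), ("chr1", 1_000_000), ("chr1", 1_000_001), ("chr2", 5)]
density = breakpoint_density(breakpoints)
print(sorted(density.items()))
print(is_oversized_intra("chr1", 1_000, "chr1", 60_000_000, intra_max_span=50_000_000))
```

Each event contributes two breakpoints to the density track, so a region with many short events and a region with many translocation endpoints can look similar in the histogram; the link layer is what distinguishes them.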
# To BED
octopusv svcf2bed -i input.svcf -o output.bed
# To BEDPE
octopusv svcf2bedpe -i input.svcf -o output.bedpe
# To standard VCF
octopusv svcf2vcf -i input.svcf -o output.vcf
octopusv svcf2vcf generates VCF4.2-compatible output. In v0.4.0 and later, converted VCF records preserve SOURCES and SOURCE_IDS in the INFO field so caller/sample provenance remains visible after conversion.
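Because provenance is carried in INFO, downstream scripts can read it with ordinary VCF parsing. The sketch below splits an INFO string into key/value pairs following the general VCF convention (semicolon-separated fields, comma-separated values, bare keys as flags). The example record and the layout of the `SOURCES` value are invented for illustration; consult the SVCF specification for the real value format.

```python
def parse_info(info):
    """Parse a VCF INFO string into a dict of lists; flag keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value.split(",") if sep else True
    return fields


# Hypothetical INFO column; the SOURCES value layout is an assumption.
info = "SVTYPE=DEL;END=2000;SOURCES=manta.svcf,sniffles.svcf;IMPRECISE"
fields = parse_info(info)
print(fields.get("SOURCES", []))
```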
OctopuSV generates publication-ready visualizations.
If you use OctopuSV in your research, please cite:
Guo, Qingxiang, Yangyang Li, Ting-You Wang, Abhi Ramakrishnan, and Rendong Yang. "OctopuSV and TentacleSV: a one-stop toolkit for multi-sample, cross-platform structural variant comparison and analysis." Bioinformatics (2025): btaf599. doi: https://doi.org/10.1093/bioinformatics/btaf599
@article{guo2025octopusv,
title={OctopuSV and TentacleSV: a one-stop toolkit for multi-sample, cross-platform structural variant comparison and analysis},
author={Guo, Qingxiang and Li, Yangyang and Wang, Ting-You and Ramakrishnan, Abhi and Yang, Rendong},
journal={Bioinformatics},
pages={btaf599},
year={2025},
publisher={Oxford University Press}
}
If you find OctopuSV useful, a ⭐ on GitHub helps others discover the project.
See the companion pipeline: TentacleSV
We welcome issues, suggestions, and pull requests.
git clone https://github.com/ylab-hi/OctopuSV.git
cd OctopuSV
mamba env create -f environment.yaml
mamba activate octopusv
poetry install
pre-commit run -a
- GitHub Issues: https://github.com/ylab-hi/OctopuSV/issues
- Email: qingxiang.guo@northwestern.edu
- Email: yangyang.li@northwestern.edu






