Roadmap
Adam Novak edited this page Jan 5, 2026 · 78 revisions
This document sets out the high-level tasks which the vg development team hopes to accomplish in the next few versions of vg and beyond.
- Off-linear-reference-backbone pangenome-aware DV (and supporting vg ecosystem libraries and docs)
- Make short-read Giraffe work as well as DRAGEN with re-optimization
- Long read Giraffe supplementary alignments, for assembly mapping
- API stability guarantees for vg libraries (like htslib, 1-2 releases per year requiring a code change)
- Index changes only in major releases, dev vs. maintenance branch separation
- Completely new paradigm for pangenome variant calling
- Engineering tasks we actually do need to do
- Integrated indel realignment in vg
- Drop pinchesAndCacti and sonlib (Adam)
- Drop Cactus-library-based snarl finder
- User-facing, under-test Giraffe docs for HPRC graphs (Faith)
These are things we would like to do eventually.
- Actual tutorials and examples for using libvgio/libbdsg for DV use cases (see pysam's progressively increasing examples)
- Better developer documentation (model: pysam docs) for the vg library
- Links from libvgio Doxygen section to Protobuf-derived doc pages
- More utilities/use cases in vg libraries (libhandlegraph/libvgio)
- Query the graph (for nodes/edges)
- GRCh38 location -> retrieve aligned reads near there
- Get read attributes and know what they mean
- Figure out if we need a `libsnarls` actually (low priority)
- Giraffe optimizations/presets for Ultima single-end short reads
- Delete old snarl manager (except as a backward-compatible load?) and use snarl-only DI2 everywhere (Glenn did a little)
- No cruft in `vg index` #3144
- Releases for vg libraries (libhandlegraph, libbdsg, libvgio)
- Enhanced release acceptance testing so a release means production quality, like we would do for a paper. Proven to be able to run some number of genomes. (Tag paper-validated hashes as release?)
- API stability guarantees for vg libraries (like htslib, 1-2 releases per year requiring a code change)
- Libraries free of cryptic error messages when the user or inputs are wrong (Faith made some progress)
- Just front-end the autoindex system with a sensible CLI for single indexes in `vg index`. How do we deprecate things? 3 levels of command? Clear input vs. output distinction.
- Example tutorial under test for idiomatic Giraffe on long-node rGFA on long reads with auto-chopping (autoindex already does the work probably) and close #3126
  - Queries (`vg find`) in rGFA space
- Completely new paradigm for pangenome variant calling so we don't need population-specific reference hacks
- Variant calling against rGFA references, allowing new variants all over
- Think about PanGenie fitting in here
- Will need to involve non-independent read mapping
- Solve CNVs with EM
- Eliminate `vg::VG`
  - Steal all the things only it can do away from it
- Default everything to GAF instead of GAM
  - mpGAF
  - Also pgvf (Graph to graph)
  - Calls and snarls in one of these?
- Python bindings for `libhandlegraph` algorithms
  - Are they the right algorithms?
- Use of MCMC techniques in the genotyper with multipath alignments
- vg deconstruct and Beagle to impute genotypes into partially-mapped and called data, as a PanGenie alternative (Erik, Andrea)
- Alignment
  - Adoption of the multipath alignment paradigm as the default
  - Graph-to-graph mapping (Xian)
- Variant Calling
  - Implementation of an HHGA-like machine learning based variant caller
  - Integration of variant calling and assembly polishing processes
  - Prune the zoo of TraversalFinders, and expose the useful ones to Python
- Visualization
  - Browser-free tube map
  - Better tube map handling of edge cases
    - No haplotypes on a node
    - Starting on a rare haplotype
- Infrastructure
  - Destructively modernize and unify IO
    - Eliminate VPKG framing if possible in favor of magic numbers everywhere
      - Resolve ensuing questions about GAM format
        - Just use GAF?
      - Handle things like GFA that need to manually sniff
    - Just save from the object; no more `save_handle_graph`
    - Magic format registration for `libvgio` magic numbers for loading
    - Depend on `libvgio` in `libbdsg` to do the IO there and pick the right handle graph implementation
  - Replace Protobuf internal formats with faster ones
  - Revision of ID assignment logic to allow deterministic node breaking
  - Accept gzipped GFA if practical (can't mmap)
  - Improved HandleGraph API
    - Abstract away node boundaries
    - View all sequence as C++17 string_views instead of sequence-owning strings
    - O(1) reverse complement DNAStringView
  - CMake-ify the main vg build
  - Eliminate old systems and their associated submodules, or factor them out into their own projects
    - `vg vectorize` could be its own project
      - Update `vg vectorize` to modern, system Vowpal Wabbit
      - Or pull it out into its own submodule and remove Vowpal Wabbit dependency from vg
    - `vg genotype`
    - `vg srpe`
  - More cross-language support
    - Interoperate with Rust handle graph users/providers
    - Interoperate with Java handle graph users/providers
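
To make the "O(1) reverse complement DNAStringView" item under Improved HandleGraph API more concrete, here is a minimal Python sketch of the idea. It is only an illustration, not the planned vg design: the class name is borrowed from the roadmap item, but every method, field, and the complement table are assumptions made for this example. The view keeps a reference to the backing sequence plus an orientation flag, so taking the reverse complement flips the flag instead of copying bases. Complementing happens lazily when a base is read.

```python
# Hypothetical sketch of a non-owning DNA view with O(1) reverse complement.
# Nothing here is an existing vg, libhandlegraph, or libbdsg API.

# Assumption: uppercase ACGTN only; any other character is reported as "N".
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


class DNAStringView:
    """View over backing[start:end], optionally read as its reverse complement."""

    def __init__(self, backing, start=0, end=None, reverse=False):
        self._backing = backing  # shared with other views, never copied
        self._start = start
        self._end = len(backing) if end is None else end
        self._reverse = reverse

    def __len__(self):
        return self._end - self._start

    def __getitem__(self, i):
        # Integer indexing only; slicing is left out to keep the sketch short.
        n = len(self)
        if not -n <= i < n:
            raise IndexError("DNAStringView index out of range")
        if i < 0:
            i += n
        if self._reverse:
            # Read from the far end and complement on the fly.
            return _COMPLEMENT.get(self._backing[self._end - 1 - i], "N")
        return self._backing[self._start + i]

    def reverse_complement(self):
        # O(1): same backing string and bounds, opposite orientation.
        return DNAStringView(self._backing, self._start, self._end,
                             not self._reverse)

    def __str__(self):
        # Materializing a copy costs O(n); the view itself does not.
        return "".join(self[i] for i in range(len(self)))


seq = "ACGTTG"
rc = DNAStringView(seq).reverse_complement()
print(str(rc))     # CAACGT
print(rc[0])       # C
```

The trade-off in this sketch is that each base read on a reversed view pays for a table lookup, in exchange for never allocating a second sequence. Whether that is the right balance for vg is a question the roadmap leaves open.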