Description
First of all, thank you for your incredible tool. I'm relatively new to using vg, and while it's powerful, it's also quite complex. I'm not sure if what I'm encountering is a real bug or a problem specific to my workflow, but it's been extremely frustrating — I've tried everything I could think of without success.
1. What were you trying to do?
I'm trying to run a reference-based mapping pipeline using vg map and vg pack on a pangenome graph (GFA → VG) of circular single-stranded viruses of approximately 3-4 kb. The pipeline maps long Nanopore reads to the graph and then uses vg pack to compute per-node coverage and edit information to summarize support for each path (reference genome).
2. What did you want to happen?
I expected vg pack -e -d to generate a coverage.tsv file containing both coverage and edit columns, so I can use the downstream R script that requires this information to rank which reference path was best supported by each sample.
3. What actually happened?
vg pack fails when the -e option is provided.
The command:
vg pack -x graph.xg -g sample.sorted.gam -o sample.pack
vg pack -x graph.xg -i sample.pack -e -d > sample_coverage.tsv
Produces the following error:
Annotation Filter: 0
Incorrectly Mapped Filter: 0
Max Reads Filter: 0
break into sorted chunks [========================================================================================]100.0%
merge 6 files [========================================================================================]100.0%
[Packing]
[libprotobuf ERROR google/protobuf/wire_format_lite.cc:577] String field 'vg.Edit.sequence' contains invalid UTF-8 data when parsing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes.
[libprotobuf ERROR google/protobuf/wire_format_lite.cc:577] String field 'vg.Edit.sequence' contains invalid UTF-8 data when parsing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes.
[libprotobuf ERROR google/protobuf/wire_format_lite.cc:577] String field 'vg.Edit.sequence' contains invalid UTF-8 data when parsing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes.
terminate called after throwing an instance of 'j2pb_error'
what(): sequence: Fail to convert to json
━━━━━━━━━━━━━━━━━━━━
Crash report for vg v1.65.0 "Carfon"
Caught signal 6 raised at address 0x22dc7ec; tracing with backward-cpp
Stack trace (most recent call last):
#15 Object "/home/fmartino/miniconda3/envs/anellome/bin/vg", at 0x6635d4, in _start
#14 Object "/home/fmartino/miniconda3/envs/anellome/bin/vg", at 0x2298ab6, in __libc_start_main
#13 Object "/home/fmartino/miniconda3/envs/anellome/bin/vg", at 0x2297219, in __libc_start_call_main
#12 Object "/home/fmartino/miniconda3/envs/anellome/bin/vg", at 0xf41e0b, in vg::subcommand::Subcommand::operator()(int, char**) const
#11 Object "/home/fmartino/miniconda3/envs/anellome/bin/vg", at 0xef9ff2, in main_pack(int, char**)
#10 Object "/home/fmartino/miniconda3/envs/anellome/bin/vg", at 0x14129ac, in vg::Packer::as_table(std::ostream&, bool, std::vector<long long, std::allocator<long long> >)
#9 Object "/home/fmartino/miniconda3/envs/anellome/bin/vg", at 0x1e8c200, in pb2json[abi:cxx11](google::protobuf::Message const&)
#8 Object "/home/fmartino/miniconda3/envs/anellome/bin/vg", at 0x1e8c0c5, in _pb2json(google::protobuf::Message const&)
#7 Object "/home/fmartino/miniconda3/envs/anellome/bin/vg", at 0x5ffd79, in _field2json(google::protobuf::Message const&, google::protobuf::FieldDescriptor const*, unsigned long) [clone .cold]
#6 Object "/home/fmartino/miniconda3/envs/anellome/bin/vg", at 0x21d1038, in __cxa_throw
#5 Object "/home/fmartino/miniconda3/envs/anellome/bin/vg", at 0x21d0ed6, in std::terminate()
#4 Object "/home/fmartino/miniconda3/envs/anellome/bin/vg", at 0x21d0e6b, in __cxxabiv1::__terminate(void (*)())
#3 Object "/home/fmartino/miniconda3/envs/anellome/bin/vg", at 0x61f42b, in __gnu_cxx::__verbose_terminate_handler() [clone .cold]
#2 Object "/home/fmartino/miniconda3/envs/anellome/bin/vg", at 0x621b73, in abort
#1 Object "/home/fmartino/miniconda3/envs/anellome/bin/vg", at 0x22afc95, in raise
#0 Object "/home/fmartino/miniconda3/envs/anellome/bin/vg", at 0x22dc7ec, in __pthread_kill
Library locations:
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
━━━━━━━━━━━━━━━━━━━━
Context dump:
Thread 0: Starting 'pack' subcommand
Found 1 threads with context.
━━━━━━━━━━━━━━━━━━━━
Please include this entire error log in your bug report!
━━━━━━━━━━━━━━━━━━━━
What I have tried so far:
- I verified the GAM file with vg view -a sample.gam | grep -P '[^\x00-\x7F]' → no non-ASCII characters detected.
- I rebuilt the graph using vg prune to remove any problematic snarls.
- I tested with and without vg filter -q, -r, -P, etc. to ensure only good alignments were passed to vg pack.
- I confirmed that vg pack works fine without the -e flag (generates .pack and basic coverage table).
- I tried vg pack -e -d directly from the GAM file (skipping .pack) and got the same error.
- I tested using multiple GAMs from different samples: same issue only when -e is used.
- I updated protobuf and rebuilt vg in a clean environment, just in case it was a protobuf compatibility issue.
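The grep check above only looks for non-ASCII bytes anywhere in the JSON text. Since the error names the vg.Edit.sequence field specifically, a narrower check could inspect just the edit sequences. The following is a minimal sketch, assuming the JSON lines printed by vg view -a follow the Alignment → path → mapping → edit layout with a "sequence" key on each edit; the function name `suspicious_edits` and the expected alphabet are hypothetical choices for illustration, not part of vg.

```python
import json

# Assumption: edit sequences should contain only plain nucleotide codes.
EXPECTED_ALPHABET = set("ACGTNacgtn")


def suspicious_edits(json_lines, alphabet=EXPECTED_ALPHABET):
    """Yield (read_name, node_id, sequence) for every edit whose sequence
    contains a character outside the expected alphabet.

    json_lines: an iterable of strings, one JSON alignment per line
    (the assumed output format of `vg view -a`).
    """
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        aln = json.loads(line)
        for mapping in aln.get("path", {}).get("mapping", []):
            node_id = mapping.get("position", {}).get("node_id")
            for edit in mapping.get("edit", []):
                seq = edit.get("sequence", "")
                if any(ch not in alphabet for ch in seq):
                    yield aln.get("name", ""), node_id, seq
```

To use it on real data, the output of vg view -a sample.sorted.gam could be piped into a small script that passes `sys.stdin` to `suspicious_edits` and prints each hit. If this reports nothing, the GAM's edit sequences look clean and the invalid data is more likely introduced after the GAM is read, though this sketch cannot confirm where.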
So far, only dropping -e prevents the crash, but then I lose the edit column that my downstream scoring requires.
The .gam file itself is generated successfully and seems valid: vg view, vg gamsort, vg pack without -e, and mapq summaries all work fine.
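As a stopgap while the edit column is unavailable, per-node edit counts could be tallied directly from the GAM's JSON form. This is only a rough sketch under stated assumptions: it assumes the vg view -a JSON layout (path → mapping → position.node_id and edit with from_length, to_length, sequence), treats an edit as a match when from_length equals to_length and no sequence is given, and uses a hypothetical function name `count_non_match_edits`. It does not reproduce the table that vg pack -e -d would emit.

```python
import json
from collections import Counter


def count_non_match_edits(json_lines):
    """Return a Counter mapping node id -> number of non-match edits.

    json_lines: an iterable of strings, one JSON alignment per line
    (the assumed output format of `vg view -a`).
    """
    counts = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        aln = json.loads(line)
        for mapping in aln.get("path", {}).get("mapping", []):
            # 64-bit integers may be serialized as JSON strings, so coerce.
            node_id = int(mapping.get("position", {}).get("node_id", 0))
            for edit in mapping.get("edit", []):
                from_len = int(edit.get("from_length", 0))
                to_len = int(edit.get("to_length", 0))
                # Assumed convention: equal lengths and no sequence = match.
                is_match = from_len == to_len and not edit.get("sequence")
                if not is_match:
                    counts[node_id] += 1
    return counts
```

The counts are per node rather than per base offset, so they are coarser than a pack table; they may still be enough to compare how many mismatches and indels each reference path accumulates.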
4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:
N/A. No stacktrace file was generated.
5. What data and command can the vg dev team use to make the problem happen?
I’m mapping real Nanopore data against a circular viral GFA-based graph with pruned paths and xg/gcsa/gbwt built from it.
The relevant commands are:
vg convert -g ref.gfa > ref.raw.vg
vg prune ref.raw.vg > ref.vg
vg index -x ref.xg ref.vg
vg gbwt -x ref.xg -o ref.gbwt -P --pass-paths
vg index -g ref.gcsa ref.xg
vg map -f reads.fastq.gz -x ref.xg -g ref.gcsa -d ref -1 ref.gbwt -m long > sample.gam
vg gamsort sample.gam -p > sample.sorted.gam
# This works
vg pack -x ref.xg -g sample.sorted.gam -o sample.pack
vg pack -x ref.xg -i sample.pack -d > sample_coverage.tsv
# This fails
vg pack -x ref.xg -i sample.pack -e -d > sample_coverage.tsv
I tried rebuilding the graph with vg prune, confirmed that the GAM file is valid with vg view -a, and tested with and without vg filter. None of that fixed the crash with -e.
6. What does running vg version say?
vg version v1.65.0 "Carfon"
Compiled with g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 on Linux
Linked against libstd++ 20230528
Using HTSlib headers 101990, library 1.19.1-29-g3cfe8769
Built by [email protected]
Additional Notes:
- If I omit -e, the pipeline completes, but the final R scripts fail because the edits column is missing and I need it for my analysis.
- I'm happy to share the GFA, XG, and GAM files if needed.
Let me know if I can help reproduce this further or test a patch.
Thanks!