agc compression with large datasets #318
meckwright asked this question in Q&A · Unanswered
Hello,
I've been trying to create a PHG database from exome capture data, but I haven't been able to get past the agc-compress step because compression takes a very long time to complete (more than two weeks).
I thought I might be able to speed it up by downloading agc directly and adding cultivars in smaller batches, but this hasn't helped in any significant way.
The exome data I'm working from is about 4 TB in total, so I'm wondering if this is just a larger dataset than PHGv2 was intended to handle, or if there's some other problem I'm running into. Here's the command I used to run agc-compress through PHGv2:
```
./phg/bin/phg agc-compress --db-path vcf_dbs --fasta-list data/keys/assemblies_list.txt --reference-file output/updated_assemblies/CSv2-1.fa
```

...and here's what I've been using to run agc directly:
```
agc create -t 32 -v 2 -i agc_create.txt -o base.agc ../output/updated_assemblies/CSv2-1.fa
agc append -t 32 -v 2 -i shortlist.txt base.agc > update1.agc
```
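In case it helps, the batching I mentioned above looks roughly like this (just a sketch of my approach, not anything PHGv2 prescribes; the batch file names, the `agc_create_remaining.txt` list, and the chunk size of 5 are placeholders I picked):

```bash
#!/usr/bin/env bash
# Sketch of batched appends: split the remaining assembly list into
# small chunks, then append each chunk to the archive in turn.
set -euo pipefail

# Chunk size and file names are placeholders (GNU split).
split -l 5 --numeric-suffixes --additional-suffix=.txt agc_create_remaining.txt batch_

current=base.agc
for chunk in batch_*.txt; do
    next="${chunk%.txt}.agc"
    # agc append reads the existing archive and writes a new one to stdout
    agc append -t 32 -i "$chunk" "$current" > "$next"
    current="$next"
done
echo "final archive: $current"
```

Each pass still has to read and rewrite the whole archive, which is probably why batching hasn't sped things up for me.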