agc compression with large datasets #318
meckwright asked this question in Q&A · Unanswered
Hello,
I've been trying to create a PHG database from exome capture data, but I haven't been able to get past the agc-compress step because compression takes a very long time to complete (more than two weeks).
I thought I might be able to speed it up by downloading agc directly and adding cultivars in smaller batches, but this hasn't helped in any significant way.
The exome data I'm working from is about 4 TB in total, so I'm wondering if this is just a larger dataset than PHGv2 was intended to handle, or if there's some other problem I'm running into. Here's the command I used to run agc-compress through PHGv2:
```
./phg/bin/phg agc-compress --db-path vcf_dbs --fasta-list data/keys/assemblies_list.txt --reference-file output/updated_assemblies/CSv2-1.fa
```

...and here's what I've been using to run agc directly:
```
agc create -t 32 -v 2 -i agc_create.txt -o base.agc ../output/updated_assemblies/CSv2-1.fa
agc append -t 32 -v 2 -i shortlist.txt base.agc > update1.agc
```
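In case it helps, the batching I mentioned above looks roughly like this (just a sketch of my approach, not anything PHGv2 prescribes; the batch file names, the `agc_create_remaining.txt` list, and the chunk size of 5 are placeholders I picked):

```bash
#!/usr/bin/env bash
# Sketch of batched appends: split the remaining assembly list into
# small chunks, then append each chunk to the archive in turn.
set -euo pipefail

# Chunk size and file names are placeholders (GNU split).
split -l 5 --numeric-suffixes --additional-suffix=.txt agc_create_remaining.txt batch_

current=base.agc
for chunk in batch_*.txt; do
    next="${chunk%.txt}.agc"
    # agc append reads the existing archive and writes a new one to stdout
    agc append -t 32 -i "$chunk" "$current" > "$next"
    current="$next"
done
echo "final archive: $current"
```

Each pass still has to read and rewrite the whole archive, which is probably why batching hasn't sped things up for me.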