Hi,
I'm trying to simulate WGS reads for the human genome and would love to do this in parallel mode (it would otherwise take a lifetime ;) ). However, NEAT always splits the input by contig instead of by size.
I run NEAT v4.3.5 like this:
neat read-simulator -c neat_config.yml -o output
My config looks like this:
reference: fasta/Homo_sapiens_assembly38.fasta
include_vcf: data/merged_filtered_imputed.vcf
read_len: 150
ploidy: 2
coverage: 15
fragment_mean: 350
fragment_st_dev: 50
paired_ended: True
rng_seed: 123456
min_mutations: 0
mutation_rate: 0.0
produce_bam: False
produce_vcf: False
produce_fastq: True
threads: 48
parallel_mode: size
parallel_block_size: 500000
(I've also tried "mode: size" and "size: 500000" as the key names.)
The log then outputs this about parallel mode:
2025-12-29 09:52:35,063:INFO:neat.read_simulator.utils.options:Running read simulator in parallel mode.
2025-12-29 09:52:35,063:INFO:neat.read_simulator.utils.options:Multithreading - 48 threads (or CPU Max)
2025-12-29 09:52:35,063:INFO:neat.read_simulator.utils.options:Splitting input by contig.
So even though it says it is in parallel mode, it splits by contig.
When I try version 4.3.1 with the same config (with "parallel_mode" changed to "mode"), I am able to run in parallel and it goes very fast!
The log says:
2025-12-29 09:57:34,598:INFO:neat.read_simulator.utils.options:Running read simulator in parallel mode.
2025-12-29 09:57:34,598:INFO:neat.read_simulator.utils.options:Multithreading - 48 threads (or CPU Max)
2025-12-29 09:57:34,599:INFO:neat.read_simulator.parallel_runner:[parallel] Splitting reference...
The only issue with 4.3.1 is that it complains that the chromosome naming differs between my FASTA and my VCF. That is a bit weird, since I use "chr1" naming in both files and the same files work fine in version 4.3.5.
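In case it helps with reproducing the naming complaint, here is a minimal sketch of how the contig names in the two files could be compared outside of NEAT. It is plain Python and does not use any NEAT code; the function names (`fasta_headers`, `vcf_contigs`, `compare_contigs`) are made up for this example. It assumes the contig name is the first whitespace-separated token of each FASTA header, and it also lists headers that carry extra text after the name, since I don't know how NEAT matches the names internally.

```python
import gzip


def _open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_headers(path):
    """Return every FASTA header line, without the leading '>'."""
    headers = []
    with _open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                if header:
                    headers.append(header)
    return headers


def vcf_contigs(path):
    """Return the set of CHROM values used by the VCF records."""
    names = set()
    with _open_text(path) as handle:
        for line in handle:
            # Skip meta lines, the column header, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def compare_contigs(fasta_path, vcf_path):
    """Report contig names that appear in only one of the two files."""
    headers = fasta_headers(fasta_path)
    # Assumption: the contig name is the first token of the header.
    fasta_names = {header.split()[0] for header in headers}
    vcf_names = vcf_contigs(vcf_path)
    return {
        "only_in_vcf": sorted(vcf_names - fasta_names),
        "only_in_fasta": sorted(fasta_names - vcf_names),
        # Headers with a description after the name, e.g. ">chr1 some text".
        "fasta_headers_with_description": [
            header for header in headers if len(header.split()) > 1
        ],
    }
```

With my files this would be called as `compare_contigs("fasta/Homo_sapiens_assembly38.fasta", "data/merged_filtered_imputed.vcf")`. An empty `only_in_vcf` list would mean every contig used in the VCF also exists in the FASTA under this first-token assumption.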
So I am a bit puzzled here and would really love to use NEAT in parallel mode.
Thanks!