NAnopore DAta Variant CAller
Run:
git clone [email protected]:fmfi-compbio/nadavca.git
cd nadavca
pip install .
Depending on your configuration, the last step might require root privileges.
After installing, you should be able to run
nadavca --help
and also import nadavca package in python3:
python3
...
>>> import nadavca
If Nadavca runs, but crashes on unsuccessfully trying to import some package, please report this as a bug and we will try to fix it. In the meantime, just manually install that package and try again.
If Nadavca crashes for any other reason, please report that, too.
Your reads should be basecalled (and results save to fast5) using Guppy 6.0 or earlier.
Warning: does not support multiple contigs
nadavca.estimate_snps(reference_filename,
reads,
reference=None,
config='default/config.yaml',
kmer_model='default/kmer_model.hdf5',
bwa_executable='bwa',
independent=False,
group_name='Analyses/Basecall_1D_000')where:
reference_filenameis name of a.fastafile with the reference sequencereadsis either of:- a list of filenames of individual reads
- a name of a directory. In that case, all
.fast5files in the directory will be processed - a list of
nadavca.Readinstances
configis either of:- a name of a YAML file containing parameters for the SNP-calling algorithm (see default values)
- a
dictcontaining parameters for the SNP-calling algorithm
kmer_modelis either of:- a name of a HDF5 file containing expected values of signal for individual k-mers
- a
nadavca.KmerModelinstance
bwa_executableis command that runs BWA on your system- If
independentis set toTrue, each read will be treated separately (i.e. Nadavca assumes each read was from a different modification of the reference sequence). Otherwise, information from all reads is combined into single consensus score for each position in the sequence. group_nameis path to the group containing results of basecalling inside the.fast5files of reads
The return value of nadavca.estimate_snps() is a list of
nadavca.estimator.Chunks. Each Chunk contains estimated SNP probabilities
for a contiguous segment of the reference sequence. Bounds of this segment are
indicated by Chunk's .start and .end attributes.
Estimated probabilities themselves are in Chunk's .values attribute,
which is a 2D numpy array of dimension (end - start, 4). For each position
in reference sequence between start (inclusive) and end (exclusive),
it contains 4 numbers: the estimated probability of A, C, G and T, respectively,
on this position.
If independent was set to False, some positions in the reference sequence may
be covered by multiple reads. This information is stored in .cover attribute
of nadavca.estimator.Chunk.
If independent was set to True, Nadavca produces a Chunk for each read.
If you pass reads to nadavca.estimate_snps() as a list, while using independent=True,
the order of Chunks corresponds to the order of reads.
There is also a command-line interface for nadavca.estimate_snps():
nadavca snp reference reads
Run
nadavca --help
nadavca snp --help
for more details.
nadavca.align_signal(reference_filename,
reads,
config='default/config.yaml',
kmer_model='default/kmer_model.hdf5',
bwa_executable='bwa',
group_name='Analyses/Basecall_1D_000'):
where parameters have the same meaning as in nadavca.estimate_snps().
The return value of nadavca.align_signal() is a list of results for each read,
in the same order as they appeared on the input. Thus, it is advisable to pass reads to nadavca.align_signal() as a list.
If read was not aligned the result is None.
Otherwise each results is a tuple of ApproximateAlignment and DTW alignment.
DTW alignment is an array with three columns:
the first column contains positions in the reference sequence,
the second and third column contain start and end of correspoinding event in the signal, respectively.
Positions in signal are in ascending order.
Positions in reference are in ascending or descending
order, for forward strands and reverse strands, respectively.
nadavca.align_signal() also has a command-line interface:
nadavca align reference reads
Run
nadavca --help
nadavca align --help
for more details.