snp-dists

Convert a FASTA alignment to SNP distance matrix

Quick Start

% cat test/good.aln

>seq1
AGTCAGTC
>seq2
AGGCAGTC
>seq3
AGTGAGTA
>seq4
TGTTAGAC

% snp-dists test/good.aln > distances.tab

Read 4 sequences of length 8

% cat distances.tab

snp-dists 0.7   seq1    seq2    seq3    seq4
seq1            0       1       2       3
seq2            1       0       3       4
seq3            2       3       0       4
seq4            3       4       4       0
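
To make the meaning of the matrix concrete, here is a minimal Python sketch that recomputes it for the four sequences in `test/good.aln`. It is not the tool's C implementation. The function name `snp_distance` and the inline `seqs` dictionary are assumptions made for this illustration, and the sketch models only the default behaviour described in this README (letters are uppercased, and a position is counted only when both letters are A, G, T or C).

```python
VALID = set("AGTC")

def snp_distance(a: str, b: str) -> int:
    """Count positions where both letters are A/G/T/C and they differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    a, b = a.upper(), b.upper()
    return sum(1 for x, y in zip(a, b) if x != y and x in VALID and y in VALID)

# The four sequences from test/good.aln in the Quick Start.
seqs = {
    "seq1": "AGTCAGTC",
    "seq2": "AGGCAGTC",
    "seq3": "AGTGAGTA",
    "seq4": "TGTTAGAC",
}

names = list(seqs)
matrix = [[snp_distance(seqs[r], seqs[c]) for c in names] for r in names]

# Print a tab-separated square matrix with row and column labels.
print("\t".join([""] + names))
for name, row in zip(names, matrix):
    print("\t".join([name] + [str(d) for d in row]))
```

The resulting numbers agree with the `distances.tab` contents shown above.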

Installation

snp-dists is written in C to the C99 standard and only depends on zlib.

Bioconda

conda install -c bioconda snp-dists

Containers

Docker images are available on Docker Hub and quay.io. They are maintained by the StaPH-B workgroup. Dockerfiles can be found here.

# Docker
docker pull staphb/snp-dists:latest
docker run staphb/snp-dists:latest snp-dists -h

# Singularity
singularity build snp-dists.sif docker://staphb/snp-dists:latest
singularity exec snp-dists.sif snp-dists -h

Source

git clone https://github.com/tseemann/snp-dists.git
cd snp-dists
make

# run tests
make check
bats test/test.sh # if you have BATS installed

# install into $HOME/.local/bin
make install

Options

`snp-dists -h` (help)

USAGE
  snp-dists [opts] aligned.fasta[.gz] > matrix.tsv
OPTIONS
  -h       Show this help
  -v       Print version and exit
  -j CPUS  Threads to use [1]
  -q       Quiet mode; no progress messages
  -a       Count all differences not just [AGTC]
  -k       Keep case, don't uppercase all letters
  -m       Output MOLTEN instead of TSV
  -L       Output lower-triangle only (unique pairs)
  -c       Use comma instead of tab in output
  -b       Blank top left corner cell
  -t       Add column headers when using molten format
  -x INT   Stop counting distance beyond this [99999]
URL
  https://github.com/tseemann/snp-dists

`snp-dists -v` (version)

Prints the name and version separated by a space in standard Unix fashion.

snp-dists 0.9.0

`snp-dists -q` (quiet mode)

Don't print informational messages, only errors.

`snp-dists -c` (CSV instead of TSV)

snp-dists 0.7.0,seq1,seq2,seq3,seq4
seq1,0,1,2,3
seq2,1,0,3,4
seq3,2,3,0,4
seq4,3,4,4,0

`snp-dists -b` (omit the toolname/version)

        seq1    seq2    seq3    seq4
seq1    0       1       2       3
seq2    1       0       3       4
seq3    2       3       0       4
seq4    3       4       4       0

`snp-dists -L` (lower-triangle only)

	seq1	seq2	seq3	seq4
seq1	0
seq2	1	0
seq3	2	3	0
seq4	3	4	4	0
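
If a downstream script needs the full symmetric matrix back, the lower triangle can be mirrored. The following Python sketch assumes tab-separated output laid out exactly as in the example above. The helper name `read_lower_triangle` is hypothetical and not part of snp-dists.

```python
def read_lower_triangle(text: str) -> dict:
    """Parse lower-triangle TSV into a dict keyed by (row, column) pairs."""
    lines = [line.split("\t") for line in text.strip("\n").splitlines()]
    names = lines[0][1:]          # first header cell is the empty corner
    dist = {}
    for row in lines[1:]:
        row_name = row[0]
        # zip stops at the shorter list, so missing upper cells are skipped.
        for col_name, value in zip(names, row[1:]):
            d = int(value)
            dist[(row_name, col_name)] = d
            dist[(col_name, row_name)] = d   # mirror into the upper triangle
    return dist

# The -L example from this README, written with explicit tabs.
example = "\n".join([
    "\tseq1\tseq2\tseq3\tseq4",
    "seq1\t0",
    "seq2\t1\t0",
    "seq3\t2\t3\t0",
    "seq4\t3\t4\t4\t0",
])

dist = read_lower_triangle(example)
print(dist[("seq1", "seq4")], dist[("seq4", "seq1")])
```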

Advanced options

By default, all letters are (1) uppercased and (2) ignored if they are not A, G, T or C.

`snp-dists -a` (don't just count AGTC)

Normally one would not want to count ambiguous letters and gaps as a "difference" but if you desire, you can enable this option.

>seq1
NGTCAGTC
>seq2
AG-CAGTC
>seq3
AGTGNGTA
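
The effect of this option on the three sequences above can be sketched in Python. This is an illustration of the behaviour as described in this README, not the tool's code. The function `distance` and its `count_all` parameter are names invented for the example, and treating every mismatching character as a difference under `-a` is an assumption.

```python
VALID = set("AGTC")

def distance(a: str, b: str, count_all: bool = False) -> int:
    """Pairwise distance; count_all=True models the idea behind -a."""
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == y:
            continue
        # Default: only count the site when both letters are A/G/T/C.
        if count_all or (x in VALID and y in VALID):
            diffs += 1
    return diffs

seq1, seq2, seq3 = "NGTCAGTC", "AG-CAGTC", "AGTGNGTA"

pairs = {
    "seq1/seq2": (seq1, seq2),
    "seq1/seq3": (seq1, seq3),
    "seq2/seq3": (seq2, seq3),
}
for label, (a, b) in pairs.items():
    print(label, "default:", distance(a, b), "all:", distance(a, b, count_all=True))
```

Under these assumptions the N and gap positions contribute nothing by default, so seq1 and seq2 are at distance 0, but they are at distance 2 when all differences are counted.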

`snp-dists -k` (don't uppercase any letters)

You may wish to preserve case, for example so that lower-case characters are masked in the comparison.

>seq1
AgTCAgTC
>seq2
AggCAgTC
>seq3
AgTgAgTA
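
A short Python sketch of why keeping case masks lower-case letters: if letters are not uppercased, `g` is not one of A, G, T or C and so the position is ignored. The function `distance` and its `keep_case` parameter are hypothetical names for this illustration, and the logic is inferred from the description in this README rather than taken from the C source.

```python
VALID = set("AGTC")

def distance(a: str, b: str, keep_case: bool = False) -> int:
    """Count A/G/T/C mismatches; keep_case=True models the idea behind -k."""
    if not keep_case:
        a, b = a.upper(), b.upper()
    # Lower-case letters are outside VALID, so they are skipped when case is kept.
    return sum(1 for x, y in zip(a, b) if x != y and x in VALID and y in VALID)

seq1, seq2, seq3 = "AgTCAgTC", "AggCAgTC", "AgTgAgTA"

for label, a, b in [("seq1/seq2", seq1, seq2),
                    ("seq1/seq3", seq1, seq3),
                    ("seq2/seq3", seq2, seq3)]:
    print(label, "default:", distance(a, b), "keep case:", distance(a, b, keep_case=True))
```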

`snp-dists -m` ("molten" output format)

seq1    seq1    0
seq1    seq2    1
seq1    seq3    2
seq1    seq4    3
seq2    seq1    1
seq2    seq2    0
seq2    seq3    3
seq2    seq4    4
seq3    seq1    2
seq3    seq2    3
seq3    seq3    0
seq3    seq4    4
seq4    seq1    3
seq4    seq2    4
seq4    seq3    4
seq4    seq4    0
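
The molten layout is the square matrix reshaped into one row per ordered pair. For readers who already have a square TSV matrix, the following Python sketch shows the same reshaping. The generator name `melt` is an assumption for this example and is not part of snp-dists.

```python
import csv
import io

def melt(matrix_tsv: str):
    """Yield (row_name, column_name, distance) for every cell of a square TSV."""
    rows = list(csv.reader(io.StringIO(matrix_tsv), delimiter="\t"))
    header = rows[0][1:]              # skip the top-left corner cell
    for row in rows[1:]:
        row_name, values = row[0], row[1:]
        for col_name, value in zip(header, values):
            yield row_name, col_name, int(value)

# The Quick Start matrix, written with explicit tabs.
square = "\n".join([
    "snp-dists 0.7\tseq1\tseq2\tseq3\tseq4",
    "seq1\t0\t1\t2\t3",
    "seq2\t1\t0\t3\t4",
    "seq3\t2\t3\t0\t4",
    "seq4\t3\t4\t4\t0",
])

molten = list(melt(square))
for a, b, d in molten:
    print(f"{a}\t{b}\t{d}")
```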

`snp-dists -m -t` ("molten" output format with column headers; requires -m "molten" format enabled)

sequence_1  sequence_2  distance
seq1        seq1        0
seq1        seq2        1
seq1        seq3        2
<snip>

`snp-dists -x INT` (stop counting after `INT` SNPs)

Once the distance between two samples becomes very large, there is often little point in continuing to count. The -x option allows you to "short-circuit" the counting. This can reduce computation time significantly on large alignments if you only care about small distances.
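
The short-circuit idea can be sketched in Python as an early exit from the comparison loop. This is an illustration only. The function `capped_distance` and its `max_diff` parameter are hypothetical, and returning the cap itself once it is reached is an assumption about what gets reported, not a statement about the tool's exact output.

```python
VALID = set("AGTC")

def capped_distance(a: str, b: str, max_diff: int = 99999) -> int:
    """Count A/G/T/C mismatches but stop scanning once max_diff is reached."""
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x != y and x in VALID and y in VALID:
            diffs += 1
            if diffs >= max_diff:
                break   # remaining columns are never inspected
    return diffs

print(capped_distance("AGTCAGTC", "TGTTAGAC"))              # full count
print(capped_distance("AGTCAGTC", "TGTTAGAC", max_diff=2))  # stops early
```

The saving comes from the `break`: for a pair that diverges quickly, most of the alignment columns are skipped.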
