DavidsonGroup/nailpolish


💅 nailpolish


nailpolish is a tool for deduplicating reads by barcode and UMI in long-read single-cell data.

Install | Example | Usage | Docs

Install

nailpolish is distributed as a single binary with no dependencies (beyond libc). Up-to-date builds are available through the Releases section for macOS (Intel & Apple Silicon) and x64-based Linux systems.

Releases: macOS, Linux

nailpolish is in active development. If you are running into any issues, please check to ensure that you are using the most current version of the software!

For detailed usage instructions, see the documentation.

Example

Say I have a demultiplexed sample.fastq file of the following form, for instance, one generated using the Flexiplex demultiplexer:

# original file has BC/UMI duplicates
$ head -n 4 sample.fastq
@BC1_UMI1
...
@BC2_UMI2
...
@BC1_UMI1
...

# index the file
$ nailpolish index sample.fastq

# summarise duplicate rates in an accessible format
$ nailpolish summary sample.fastq
<creates HTML summary file>

# generate consensus file
$ nailpolish consensus sample.fastq | head
  @BC1_UMI1|type=consensus|
  ...
  @BC2_UMI2|type=single|
  ...

Consensus generation outputs all non-duplicated reads and all consensus-called reads, removing the original duplicated reads in the process.
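To make the idea of a "duplicate group" concrete, here is a minimal Python sketch that groups reads by their BC/UMI header and separates duplicated keys from single ones. It is only an illustration of the grouping step, not nailpolish's implementation: the `count_groups` helper is hypothetical, the sequences are made up, and it assumes plain four-line FASTQ records whose header is exactly the BC_UMI key.

```python
from collections import Counter
from io import StringIO


def count_groups(fastq_lines):
    """Count reads per header key (hypothetical helper, not part of nailpolish).

    Assumes four-line FASTQ records, so headers sit on lines 0, 4, 8, ...
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:
            counts[line.strip().lstrip("@")] += 1
    return counts


# Toy input mirroring the example above: BC1_UMI1 appears twice.
sample = StringIO(
    "@BC1_UMI1\nACGT\n+\nIIII\n"
    "@BC2_UMI2\nTTGA\n+\nIIII\n"
    "@BC1_UMI1\nACGA\n+\nIIII\n"
)

groups = count_groups(sample)
duplicated = {key: n for key, n in groups.items() if n > 1}
singles = [key for key, n in groups.items() if n == 1]

print(duplicated)  # {'BC1_UMI1': 2}
print(singles)     # ['BC2_UMI2']
```

In the consensus output shown above, a key like BC1_UMI1 would correspond to a `type=consensus` record and BC2_UMI2 to a `type=single` record.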

There are options to:

  • Set the filtering options to determine which duplicate groups should be called
  • Configure the output format and what information to report
  • Control the false positive prevention algorithm

See the documentation for more information.

Usage

Help

nailpolish v0.2.0, commit #9ddba6c
──────────────────────────────────
tools for finding, grouping, and consensus calling PCR duplicates

git:  https://github.com/DavidsonGroup/nailpolish
docs: https://davidsongroup.github.io/nailpolish/


Usage: nailpolish index [OPTIONS] <INPUT> [PRESET]
       nailpolish summary [OPTIONS] <INPUT>
       nailpolish consensus [OPTIONS] <INPUT>
       nailpolish extract [OPTIONS] <INPUT>
       nailpolish help [COMMAND]...

Options:
  -h, --help     Print help
  -V, --version  Print version

nailpolish index:
Create an index file from a demultiplexed .fastq
      --overwrite                      overwrite an existing index file, if it exists
      --clusters <CLUSTERS>            whether to use a file containing pre-clustered reads, with every line in one of two formats:
                                         1. READ_ID;BARCODE
                                         2. READ_ID;BARCODE;UMI
      --barcode-regex <BARCODE_REGEX>  barcode regex format type, for custom header styles. this will override the preset given.
                                       for example, for the `bc-umi` preset:
                                           ^([ATCG]{16})_([ATCG]{12})
      --skip-unmatched                 skip, instead of error, on reads which are not accounted for:
                                       - if a cluster file is passed, any reads which are not in any cluster
                                       - if a barcode regex or preset is used (default), any reads which do not match the regex
      --len <LEN>                      filter lengths to a value within the given float interval [a,b].
                                       a is the minimum, and b is the maximum (both inclusive).
                                       alternatively, a can be `-inf` and b can be `inf`.
                                       an unbounded interval (i.e. no length filter) is given by `0,inf`. [default: 0,15000]
      --qual <QUAL>                    filter average read quality to a value within the given float interval [a,b].
                                       see the docs for `--len` for documentation on how to use the interval. [default: 0,inf]
  -h, --help                           Print help (see more with '--help')
  <INPUT>                          the input .fastq file
  [PRESET]                         [default: bc-umi] [possible values: bc-umi, umi-tools, illumina]

nailpolish summary:
Generate a summary of duplicate statistics from an index file
  -o, --output <OUTPUT>  Output .html file. By default, will write to <file>.summary.html
  -h, --help             Print help
  <INPUT>            Input .fastq file

nailpolish consensus:
Generate a consensus-called 'cleaned up' file
  -o, --output <OUTPUT>         the output .fastq, or empty for stdout
  -t, --threads <THREADS>       the number of threads to use [default: 4]
      --duplicates-only         only show the duplicated reads, not the single ones
      --report-original-reads   for each duplicate group of reads, report the original reads along with the consensus
      --report-original-header  if the original read headers are valuable, this will create a orig_header field in the consensus called result with the entire original read header
      --extra-stats             add debugging information to the read header [intended for internal development] warning: since timings are reported, the output will not be identical across runs
      --no-clustering           disable the clustering algorithm this will prevent nailpolish from detecting and separating false duplicates
  -h, --help                    Print help
  <INPUT>                   the input .fastq

nailpolish extract:
Extract reads belonging to specific groups from a .fastq file, unmodified
  -o, --output <OUTPUT>          the output .fastq, or empty for stdout
      --id <ID>                  Filter by specific group IDs (comma-separated)
      --key <KEY>                Filter by regex pattern for the key
      --group-size <GROUP_SIZE>  Filter by the size of the duplicate group
      --format <FORMAT>          Output format type [default: fastq] [possible values: fastq, fasta]
  -h, --help                     Print help
  <INPUT>                    the input .fastq

nailpolish help:
Print this message or the help of the given subcommand(s)
  [COMMAND]...  Print help for the subcommand(s)
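Two of the index options above take small textual formats: `--barcode-regex` takes a pattern with a barcode group and a UMI group, and `--len` / `--qual` take an inclusive interval `a,b`. The Python sketch below shows one way to read them. It is a sketch under stated assumptions, not nailpolish's own code: the helper names are hypothetical, the pattern is assumed to be applied to the header without its leading `@`, and the interval is assumed to be comma-separated as in the defaults shown (`0,15000`, `0,inf`).

```python
import re

# Pattern copied from the --barcode-regex help text (bc-umi preset example).
BC_UMI = re.compile(r"^([ATCG]{16})_([ATCG]{12})")


def split_header(header):
    """Return (barcode, umi), or None when the header does not match.

    Assumption: the pattern is applied to the header without the leading '@'.
    """
    match = BC_UMI.match(header.lstrip("@"))
    return match.groups() if match else None


def parse_interval(text):
    """Parse 'a,b' into two floats; float() accepts 'inf' and '-inf'."""
    low, high = (float(part) for part in text.split(","))
    return low, high


def in_interval(value, interval):
    """Inclusive bounds check, matching the [a,b] wording in the help text."""
    low, high = interval
    return low <= value <= high


print(split_header("@ACGTACGTACGTACGT_TTGGCCAATTGG"))  # ('ACGTACGTACGTACGT', 'TTGGCCAATTGG')
print(split_header("@BC1_UMI1"))                       # None
print(in_interval(15000, parse_interval("0,15000")))   # True
print(in_interval(20000, parse_interval("0,inf")))     # True
```

A header that fails the pattern, like the placeholder `@BC1_UMI1`, is the kind of read that `--skip-unmatched` is described as skipping rather than raising an error on.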

Install from source

Prebuilt binaries

The recommended way to get nailpolish is to download an automated build from the Releases section, available for macOS (Intel and Apple Silicon) and x64 Linux systems.

Install from source

You will need a recent version of Rust and its Cargo package manager installed. No other setup is needed: all dependencies are fetched and built automatically. The command below installs the nailpolish binary into Cargo's bin directory (usually ~/.cargo/bin), which must be on your PATH.

$ cargo install --git https://github.com/DavidsonGroup/nailpolish.git

# or, from a local directory
$ cargo install --path .

Note to HPC users on older systems

You will need reasonably modern versions of gcc and cmake, and the CARGO_NET_GIT_FETCH_WITH_CLI environment variable set to true. For instance:

$ module load gcc/latest cmake/latest
$ CARGO_NET_GIT_FETCH_WITH_CLI="true" cargo install --git https://github.com/DavidsonGroup/nailpolish.git

Build from source

$ git clone https://github.com/DavidsonGroup/nailpolish.git
$ cd nailpolish
$ cargo build --release

The binary can then be found at target/release/nailpolish, relative to the repository root.
