Skip to content

Perform hard selective masking of ancient DNA deamination patterns, using the output misincorporation frequency estimates of MapDamage (https://ginolhac.github.io/mapDamage/)

License

Notifications You must be signed in to change notification settings

MaelLefeuvre/pmd-mask

Repository files navigation

PMD-mask

Perform hard selective masking of ancient DNA deamination patterns, using the output misincorporation frequency estimates of MapDamage.

DOI Ubuntu MacOS

Preface

pmd-mask is a simple preprocessing and quality filtration command-line utility designed to selectively mask potentially deaminated nucleotides within ancient DNA alignement files, changing putative deaminated bases to N and quality to !.

This method may be regarded as a conservative compromise between post-morterm damage rescaling methods such as MapDamage or PMDtools, and hard-clipping methods such as the trimBam module of bamUtil. Here, pmd-mask instead leverages nucleotide and position specific misincorporation rate estimes emitted from MapDamage to selectively trim read ends, up-until the local misincorporation rate reaches a designated, user-defined threshold (default: 1%). This approach can thus greatly mitigate the loss of information usually displayed when applying hard-clipping on ancient DNA samples, by specifically targeting potential C>T and G>A transitions on both the 5’ and 3’ end of the read, respectively.

An illustration of the pmd-mask workflow and functioning
Figure 1. Illustrative diagram of the pmd-mask workflow and functioning.

Installation

Source-compilation (recommended)

Dependencies

Cargo

This project is written in Rust, and thus requires cargo for source compilation. (Note that the current minimum supported version for source-compilation is Rust 1.64.0)

To install cargo:

curl --proto '=https' --tlsv1.2 https://sh.rustup.rs -sSf | sh

Compilation

  1. Clone this repository
git clone [email protected]:MaelLefeuvre/pmd-mask.git
  1. Run the test suite from the repository's root
cd pmd-mask && cargo test
  1. Compile and install
RUSTFLAGS="-Ctarget-cpu=native" cargo install --path .
  1. pmd-mask should be located within ~/.cargo/bin/ and included in your PATH
pmd-mask --help

Precompiled versions

If your workstation is operated under a Linux/GNU system and you do not wish to source-compile pmd-mask, precompiled binaries are currently available for x86_64-unknown-linux-gnu architectures. See the Release tab.

Usage

Data requirements:

The following inputs are required to use PMD-mask:

  1. An input bam file (SAM|BAM|CRAM formats are accepted). pmd-mask can either read from a file (using -b|--bam) or from the standard input, through shell piping.
  2. A MapDamage-v2 misinscorporation.txt file. This file provides strand-specific PMD frequency estimates, which are used to compute the threshold at which masking should be performed. Use -m|--misincorporation to specify this input. Of course, this file must have been obtained from your input bam file to provide with a sound estimate.
  3. A reference genome. This genome must of course be the same as the one used to align the aforementionned bam file. Use -f|--reference to specify the path to your reference
pmd-mask --reference data/GRCh37/Homo_sapiens.GRCh37.dna.primary_assemby.fa --misincorporation test-sample-MD-folder/misincorporation.txt --bam ./test-sample.srt.rmdup.bam 

Optional parameters:

  • The PMD-frequency threshold used to apply masking can be specified with the -t|--threshold parameter (Default: 0.01)
  • The name of the output can be specified using -o|--output. When unspecified, pmd-mask will flush results to the standard output.
  • The output format can be specified using -O|--output-fmt. (SAM|BAM|CRAM format accepted). When using BAM or CRAM, the compression level can be specified using --compress-level.
  • Use -@|--threads to allocate additional cores to the program. This can speed-up the (de)compression rate of your input and output files.
  • Add -v|--verbose flags to increase the verbosity. Multiple levels: -v: Will output general information (INFO) -vv: will output general information (DEBUG) -vvv: will output detailled debug information (TRACE). Not that warnings are still emitted, no matter the verbosity level. This behavior can be disabled using the -q|--quiet flag, which will inhibit all logging.

A more detailled example:

  1. Filter autosomes using samtools (notice the -h flags on this command, which is required to communicate the header information to pmd-mask)
  2. Apply PMD-masking with a threshold of 2%
  3. Pileup this sample, while targeting the Reich 1240K Compendium positions.
  4. Run grups-rs to compute an approximate estimate of the individual's average heterozygocity.
samtools view -h ./MT23/MT23.srt.rmdup.rescaled.bam {1..22} | pmd-mask -f ./hs37d5.fa -m ./MT23/misincorporation.txt -Ob --threshold 0.02 --quiet | samtools mpileup -RB -q25 -Q25 -f ./hs37d5.fa.gz -l ./v52.2_1240K_public.bed - | grups pwd-from-stin --samples 0 --self-comparison --sample-name MT23

Contributing

Feedback, issue reports and contributions are gladly accepted! If you encounter any issue while using this software, would like to request additional features, or simply wish to contribute, please submit a Github issue here.

For developpers and pull-requests, additional information and context regarding development tools surrounding pmd-mask can be found in the CONTRIBUTING.md file.

About

Perform hard selective masking of ancient DNA deamination patterns, using the output misincorporation frequency estimates of MapDamage (https://ginolhac.github.io/mapDamage/)

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

No packages published

Languages