PAQR3: Poly(A) site Quantification on RNA-Seq data

PAQR3 is a command-line tool for quantifying poly(A) sites from standard RNA-Seq data. It processes an annotation file (GTF), a PAS atlas (BED), and strand-specific coverage tracks (BigWig) to identify and quantify poly(A) sites associated with genomic segments.

Installation

Using conda (recommended)

conda env create -f install/environment.yml
conda activate paqr3

This will install PAQR3 and its dependencies.

Usage

PAQR3 exposes three sub-commands with highly customizable parameters.

`paqr3 segment` — segmentation only

Parses the GTF annotation, extends terminal exons, merges nearby PAS sites, and writes a segments TSV used as input for paqr3 quant.

paqr3 segment \
  --annotation/-a                   GTF annotation file (required) \
  --pas-atlas/-pa                   PAS atlas BED file (required) \
  --output-dir/-o                   Output directory (required) \
  --downstream-exon-extension/-de   Bases to extend terminal exons (default: 200) \
  --merge-distance/-md              Merge PAS within this distance in bp (default: 5) \
  --emit/-e                         Additional outputs (see below) \
  --gzip/-g / --no-gzip             Gzip-compress TSV outputs (default: true) \
  --verbosity/-v                    Log level: INFO or DEBUG (default: INFO)
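To make the effect of --merge-distance concrete, here is a minimal Python sketch of distance-based merging. It is illustrative only: the function name `merge_pas`, the `(start, end)` tuple layout, and the rule of comparing each site's start to the end of the previous cluster are assumptions, not PAQR3's actual implementation.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) on one chromosome and strand (assumed layout)


def merge_pas(sites: List[Interval], merge_distance: int = 5) -> List[Interval]:
    """Merge sites whose gap to the previous cluster is <= merge_distance."""
    clusters: List[Interval] = []
    for start, end in sorted(sites):
        if clusters and start - clusters[-1][1] <= merge_distance:
            # Close enough: grow the last cluster to cover this site.
            prev_start, prev_end = clusters[-1]
            clusters[-1] = (prev_start, max(prev_end, end))
        else:
            # Too far away: start a new cluster.
            clusters.append((start, end))
    return clusters


if __name__ == "__main__":
    # Synthetic coordinates, for illustration only.
    print(merge_pas([(100, 101), (104, 105), (120, 121)]))
```

With the default distance of 5, the first two synthetic sites fall into one cluster and the third stays separate.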

`paqr3 quant` — quantification from an existing segments TSV

paqr3 quant \
  --segments-tsv/-s                 Segments TSV from paqr3 segment (required) \
  --coverage-pos/-c-pos             Positive-strand coverage BigWig (required) \
  --coverage-neg/-c-neg             Negative-strand coverage BigWig (required) \
  --output-dir/-o                   Output directory (required) \
  --sample-id/-sid                  Sample name prefix (default: BigWig filename stem) \
  --max-pas-count/-mpc              Skip segments with more PAS than this (default: 10) \
  --f-stat-threshold/-fst           F-statistic threshold for PAS usage (default: 100) \
  --posterior-usage-weight/-puw     Weight for observed RPM in posterior blend (default: 0.1) \
  --bam/-b                          BAM file for expression rank statistics (optional) \
  --threads/-t                      Threads for BigWig reading and F-stat workers (default: 1) \
  --chr-sizes/-cs                   Chromosome-sizes file (required when any BigWig emit mode is set) \
  --emit/-e                         Additional outputs (see below) \
  --gzip/-g / --no-gzip             Gzip-compress TSV outputs (default: true) \
  --verbosity/-v                    Log level: INFO or DEBUG (default: INFO)
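When paqr3 quant is driven from a script, it can help to assemble the argument list programmatically. The sketch below uses only the long option names listed above. The helper `build_quant_command`, the file paths, and the assumption that --emit takes a single token as its value are hypothetical.

```python
import shlex
from typing import List, Optional


def build_quant_command(
    segments_tsv: str,
    coverage_pos: str,
    coverage_neg: str,
    output_dir: str,
    threads: int = 1,
    emit: Optional[str] = None,
    chr_sizes: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for `paqr3 quant` from the documented long options."""
    cmd = [
        "paqr3", "quant",
        "--segments-tsv", segments_tsv,
        "--coverage-pos", coverage_pos,
        "--coverage-neg", coverage_neg,
        "--output-dir", output_dir,
        "--threads", str(threads),
    ]
    if emit is not None:
        # Every documented emit token writes BigWig files, so --chr-sizes is needed.
        if chr_sizes is None:
            raise ValueError("--chr-sizes is required when a BigWig emit mode is set")
        cmd += ["--emit", emit, "--chr-sizes", chr_sizes]
    return cmd


if __name__ == "__main__":
    cmd = build_quant_command(
        "results/output_segments.tsv",  # hypothetical paths
        "sample1.plus.bw",
        "sample1.minus.bw",
        "results",
        threads=4,
    )
    print(shlex.join(cmd))
    # To execute: import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with unusual file names.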

`paqr3 full` — segmentation and quantification end-to-end

Accepts all arguments from both segment and quant (except --segments-tsv).

paqr3 full \
  --annotation/-a  --pas-atlas/-pa \
  --coverage-pos/-c-pos  --coverage-neg/-c-neg \
  --output-dir/-o  [all other quant/segment options]

`--emit` modes

The --emit flag controls which additional outputs are written. Multiple modes can be combined.

| Token | Files written |
| --- | --- |
| `mean_cov` | `{sample}_mean_cov.bw` (mean coverage per subsegment) |
| `observed` | `{sample}_observed_usage.bw` + `{sample}_observed_rpm.bw` |
| `posterior` | `{sample}_posterior_rpm.bw` + `{sample}_posterior_usage.bw` |
| `atlas` | `{sample}_atlas_rpm.bw` + `{sample}_atlas_usage.bw` |
| `all` | All four BigWig groups above |
| `debug` | All BigWigs + debug JSON + four debug BED files |

BigWig emission requires --chr-sizes.
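The token-to-file mapping can be expressed as a small lookup, which is handy for checking that a run produced the BigWig files you expect. This is a sketch based only on the table above. The helper name `expected_bigwigs` is hypothetical and the function is not part of PAQR3.

```python
from typing import Iterable, List

# BigWig name suffixes per emit token, copied from the table above.
BIGWIG_GROUPS = {
    "mean_cov": ["mean_cov"],
    "observed": ["observed_usage", "observed_rpm"],
    "posterior": ["posterior_rpm", "posterior_usage"],
    "atlas": ["atlas_rpm", "atlas_usage"],
}


def expected_bigwigs(tokens: Iterable[str], sample: str) -> List[str]:
    """List the BigWig file names implied by a set of emit tokens."""
    selected = set()
    for token in tokens:
        if token in ("all", "debug"):
            # Both tokens include every BigWig group; `debug` additionally
            # writes a JSON file and four BED files, which are not listed here.
            selected.update(BIGWIG_GROUPS)
        elif token in BIGWIG_GROUPS:
            selected.add(token)
        else:
            raise ValueError(f"unknown emit token: {token!r}")
    return [
        f"{sample}_{suffix}.bw"
        for group, suffixes in BIGWIG_GROUPS.items()
        if group in selected
        for suffix in suffixes
    ]


if __name__ == "__main__":
    for name in expected_bigwigs(["observed", "atlas"], "sample1"):
        print(name)
```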

Output files

`paqr3 segment`

Outputs are written directly to --output-dir:

| File | Description |
| --- | --- |
| `output_segments.tsv` | One row per sub-segment; input for `paqr3 quant`. Columns: `chrom`, `start`, `end`, `subsegment_id`, `strand`, `overlapping_pas_id` (rep CS string or `.`), `atlas_rpm`, `pas_start`, `pas_end` (merged PAS cluster boundaries). |
| `output_segments_debug_{genes,segments,subsegments,pas}.bed` | Four headerless BED files (chrom, start, end, id, strand). Written only when `--emit debug` is set. |

`paqr3 quant` / `paqr3 full`

Outputs are written to {output-dir}/{sample}_results/:

| File | Description |
| --- | --- |
| `{sample}_segment_results.tsv[.gz]` | One row per evaluated segment. Columns: `chr`, `start`, `end`, `strand`, `segment_id`, `rna_sum_drop_cov`, `f_stat`, `p_value`. |
| `{sample}_subsegment_results.tsv[.gz]` | One row per subsegment (including trailing). Columns: `chr`, `start`, `end`, `strand`, `subsegment_id`, `pas_id`, `mean_cov`, `atlas_rpm`, `observed_rpm`, `posterior_rpm`. |
| `{sample}_pas_results.tsv[.gz]` | One row per PAS. Columns: `chr`, `start`, `end`, `strand`, `pas_id`, `subsegment_id`, `atlas_usage`, `observed_usage`, `posterior_usage`, `gene_level_usage`. |
| `{sample}_mean_cov.bw` | Mean RNA-seq coverage per subsegment. Written when `--emit mean_cov`, `all`, or `debug`. |
| `{sample}_observed_usage.bw` | F-stat derived PAS usage fraction, at PAS cluster coordinates. |
| `{sample}_observed_rpm.bw` | Observed coverage-drop RPM per PAS. |
| `{sample}_posterior_rpm.bw` | Posterior blended RPM per PAS. |
| `{sample}_posterior_usage.bw` | Posterior relative usage per PAS (normalised within segment). |
| `{sample}_atlas_rpm.bw` | Atlas RPM per PAS. |
| `{sample}_atlas_usage.bw` | Atlas relative usage per PAS (normalised within segment). |
| `{sample}_debug.json` | Per-segment pattern evaluation dump. Written when `--emit debug`. |
| `{sample}_debug_{genes,segments,subsegments,pas}.bed` | Debug BED files. Written when `--emit debug`. |
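The result tables can be loaded with the Python standard library. The sketch below assumes each TSV begins with a header row carrying the column names listed above, which this README does not state explicitly. The helper `read_results` and the demo values are hypothetical.

```python
import csv
import gzip
from pathlib import Path
from typing import Dict, List


def read_results(path: str) -> List[Dict[str, str]]:
    """Read a results TSV, handling the optional .gz suffix transparently."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        # Assumes the first line is a header row (not confirmed by the README).
        return list(csv.DictReader(handle, delimiter="\t"))


if __name__ == "__main__":
    import tempfile

    # Synthetic two-column excerpt; real files carry all the documented columns.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo_pas_results.tsv.gz"
        with gzip.open(demo, "wt", newline="") as out:
            out.write("pas_id\tposterior_usage\nPAS_1\t0.75\nPAS_2\t0.25\n")
        for row in read_results(str(demo)):
            print(row["pas_id"], float(row["posterior_usage"]))
```

All values come back as strings, so numeric columns such as `posterior_usage` need an explicit `float()` conversion.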

{sample} defaults to the stem of the positive-strand BigWig filename (everything before the first .) and can be overridden with --sample-id. The .gz suffix appears when --gzip is active (default).
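The naming rule above can be reproduced in a few lines, for example to predict where results will land before a run. This is a sketch of the documented behaviour, not PAQR3's own code. The function names and file paths are hypothetical.

```python
from pathlib import PurePosixPath


def default_sample_id(coverage_pos: str) -> str:
    """Return the file name up to (not including) the first dot."""
    return PurePosixPath(coverage_pos).name.split(".", 1)[0]


def results_dir(output_dir: str, sample: str) -> str:
    """Build the documented {output-dir}/{sample}_results path."""
    return str(PurePosixPath(output_dir) / f"{sample}_results")


if __name__ == "__main__":
    sample = default_sample_id("/data/sample1.plus.bw")  # hypothetical path
    print(sample)
    print(results_dir("results", sample))
```

Because everything after the first dot is dropped, a file named `sample1.plus.bw` yields `sample1`. If several inputs share the same leading name component, setting --sample-id explicitly avoids collisions.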

Contributing

Contributions are welcome! Please see the contributing guidelines for more information.

License

GNU General Public License v2.0
