Clarification on how low-identity/short reads contribute to transcript quantification in Salmon (long-read data) #1009

Hi Salmon developers,

We are currently using Salmon to perform transcript quantification on long-read RNA-seq data (Oxford Nanopore). During our analysis, we observed that a substantial proportion of reads aligned to certain transcripts are relatively short and have low alignment identity and/or low coverage against the reference transcriptome/genome.

On manual inspection, many of these reads do not appear reliable enough to be confidently counted as genuine support for the transcript. Nevertheless, Salmon still reports relatively high TPM/CPM values for some of these transcripts.
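For reference, the standard TPM calculation is sketched below in Python. It is a simplified illustration that assumes per-transcript read counts and effective lengths have already been estimated. It is not Salmon's code, and the function name `tpm` is our own. It shows why even a modest number of extra reads assigned to a short transcript can noticeably raise that transcript's TPM.

```python
def tpm(read_counts: dict[str, float], eff_lengths: dict[str, float]) -> dict[str, float]:
    """Textbook TPM: length-normalize counts, then rescale so values sum to 1e6.

    Simplified sketch: assumes counts and effective lengths are already given.
    """
    # Reads per base of effective transcript length.
    rates = {t: read_counts[t] / eff_lengths[t] for t in read_counts}
    total = sum(rates.values())
    # Rescale so that all TPM values sum to one million.
    return {t: 1e6 * r / total for t, r in rates.items()}


# Hypothetical example: equal read counts, but transcript A is 4x shorter than B,
# so A receives 4x the TPM of B.
example = tpm({"A": 100, "B": 100}, {"A": 1000, "B": 4000})
print(example)
```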

We would therefore like to better understand how Salmon converts mapped reads into transcript abundance estimates in long-read datasets.

Specifically, we would like to know:

1. Does Salmon apply any minimum threshold on any of the following before a read is included in quantification?
   - read identity
   - alignment score
   - mapping quality
   - aligned fraction/coverage
2. If no explicit threshold is applied, how are reads with very low similarity, or reads that are only partially aligned, handled internally during abundance estimation?
3. In alignment-based mode, does Salmon rely entirely on the original aligner’s filtering decisions, or does it additionally weight or filter alignments based on alignment-quality metrics?
4. Could a large number of low-identity reads artificially inflate TPM/CPM estimates, especially in noisy long-read datasets such as ONT direct RNA sequencing?
5. Are there recommended preprocessing or alignment-filtering strategies for running Salmon on long-read data that would avoid potential over-quantification from noisy reads?
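As a concrete example of the kind of preprocessing filter we have in mind, the sketch below computes a BLAST-style identity and the aligned fraction of the read from a SAM record's CIGAR string and `NM` (edit distance) tag, and keeps only alignments that pass both thresholds. The function names and the threshold values (`MIN_IDENTITY`, `MIN_QUERY_COVERAGE`) are hypothetical and would need tuning for ONT data. We are not suggesting that Salmon uses these definitions internally.

```python
import re

# Hypothetical thresholds for illustration only; these are not Salmon defaults.
MIN_IDENTITY = 0.85
MIN_QUERY_COVERAGE = 0.5

# One CIGAR operation: a run length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_stats(cigar: str, nm: int) -> tuple[float, float]:
    """Return (identity, query_coverage) for a single SAM alignment.

    identity: (alignment columns - NM) / alignment columns, where columns are
      M/=/X + I + D and NM counts mismatches plus inserted and deleted bases.
      Intron skips (N) are ignored so spliced alignments are not penalized.
    query_coverage: aligned read bases (M/=/X/I) divided by the full read
      length, which also counts soft (S) and hard (H) clipped bases.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    columns = sum(n for n, op in ops if op in "M=XID")
    aligned_query = sum(n for n, op in ops if op in "M=XI")
    read_len = sum(n for n, op in ops if op in "M=XISH")
    identity = (columns - nm) / columns if columns else 0.0
    coverage = aligned_query / read_len if read_len else 0.0
    return identity, coverage


def keep_alignment(cigar: str, nm: int,
                   min_identity: float = MIN_IDENTITY,
                   min_coverage: float = MIN_QUERY_COVERAGE) -> bool:
    """Keep an alignment only if it passes both hypothetical thresholds."""
    identity, coverage = alignment_stats(cigar, nm)
    return identity >= min_identity and coverage >= min_coverage


# Hypothetical records: a long, high-identity alignment and a short, noisy one.
for cigar, nm in [("100S900M", 45), ("700S300M", 90)]:
    print(cigar, nm, keep_alignment(cigar, nm))
```

In a real pipeline, the CIGAR string and `NM` tag could be read from each BAM record (for example via pysam's `AlignedSegment.cigarstring` and `AlignedSegment.get_tag("NM")`), with the passing records written to a new BAM used as Salmon's alignment-mode input. This assumes the aligner writes the `NM` tag.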

For context, we are particularly concerned because many reads mapping to our target genes show relatively low identity to the genome/transcript on inspection, yet the final quantified abundance remains unexpectedly high.
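To make this concern concrete, here is a toy expectation-maximization (EM) sketch of how reads might be apportioned among transcripts. It is our own simplified model, not Salmon's implementation: it omits effective-length correction and Salmon's actual alignment-likelihood model, and the function name `em_counts` and the per-read weights are hypothetical stand-ins for alignment quality.

```python
def em_counts(reads: list[dict[str, float]], n_iter: int = 200) -> dict[str, float]:
    """Toy EM: estimate expected read counts per transcript.

    Each read is a dict {transcript: weight}, where the weight stands in for
    an alignment likelihood (higher = better alignment). No length correction.
    """
    transcripts = sorted({t for read in reads for t in read})
    alpha = {t: 1.0 / len(transcripts) for t in transcripts}  # start uniform
    counts: dict[str, float] = {}
    for _ in range(n_iter):
        counts = {t: 0.0 for t in transcripts}
        # E-step: split each read among its compatible transcripts in
        # proportion to (current abundance x alignment weight).
        for read in reads:
            denom = sum(alpha[t] * w for t, w in read.items())
            for t, w in read.items():
                counts[t] += alpha[t] * w / denom
        # M-step: new abundances are the normalized expected counts.
        total = sum(counts.values())
        alpha = {t: c / total for t, c in counts.items()}
    return counts


# Hypothetical data: 5 good reads unique to A, 5 very poor reads unique to B,
# and 2 reads compatible with both (aligning much better to A).
reads = (
    [{"A": 1.0}] * 5
    + [{"B": 0.001}] * 5
    + [{"A": 1.0, "B": 0.001}] * 2
)
print(em_counts(reads))
```

In this toy model, a read compatible with only one transcript contributes exactly one read to it no matter how small its weight is, because the weight cancels in the E-step. B therefore still ends up with about 5 reads, and the weights only decide how the 2 shared reads are split. If Salmon's model behaves analogously, down-weighting alone would not stop uniquely mapped low-identity reads from raising a transcript's abundance, which is why we are asking whether any explicit threshold is applied.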

We would greatly appreciate any clarification regarding the internal quantification logic or best practices for handling noisy long-read data with Salmon.

Thank you very much for your help.
