Description
Hi,
Thanks for the active development of IsoQuant!
This is more of a request for documentation than a bug report.
I'm testing the IsoQuant (version 3.7.1) feature for building models from trusted transcripts.
My goal is to understand:
- Whether it is better to provide a fasta of the transcripts or to pre-map them
- Whether I should add a synthetic polyA to my transcripts
My workflow is the following.
This is not a real use case but more of a control.
- I build models from ont reads using IsoQuant in ont mode
- I extract the fasta sequences of the transcripts from the gtf
- I add an artificial polyA to each transcript sequence (I also test with non-polyadenylated sequences, named simple below)
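For illustration, the polyA step can be sketched as below. This is a minimal sketch and not necessarily the exact script used: the function name add_polya, the default tail length of 30 and the file names in the usage line are assumptions.

```shell
# Append a synthetic polyA tail to every record of a FASTA file.
# Assumptions (hypothetical, for illustration only): the helper name add_polya
# and the default tail length of 30. Wrapped sequences are joined onto one line.
add_polya() {
    # $1: input FASTA, $2: tail length (defaults to 30)
    awk -v n="${2:-30}" '
        BEGIN { tail = sprintf("%" n "s", ""); gsub(/ /, "A", tail) }  # build n A characters
        /^>/  { if (seq != "") print seq tail; print; seq = ""; next } # flush previous record
              { seq = seq $0 }                                         # accumulate sequence lines
        END   { if (seq != "") print seq tail }                        # flush last record
    ' "$1"
}
```

Usage would look like `add_polya transcripts.fa 30 > transcripts.polyA.fa`, where both file names are placeholders.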
At this point, I have 11332 fasta sequences that correspond to transcripts.
At this point, my workflow branches into mapping on my own (a) and letting IsoQuant do the job (b).
1a. I map the sequences (simple and polyA) with minimap2 -ax splice -t 16 "$REF" "$READS" > "$SAM"
2a. I run IsoQuant on these mappings: isoquant.py --reference "$REF" --bam "$BAM[simple,polyA]" --data_type transcripts
or
1b. I run IsoQuant on the reads: isoquant.py --reference "$REF" --fastq "$FASTA[simple,polyA]" --data_type transcripts
In the slide below, you can see the results (where I'm also testing ont and pacbio presets).
In brief, if I provide a BAM, there is no difference between polyA and simple.
If I provide the transcript sequences, I do see a difference in the models.
In all cases, I get fewer transcripts than I put in.
My questions are:
- How and at what stage does IsoQuant handle polyA?
- Why don't I get the same number of transcripts as the input, especially when a BAM is the input?
Thanks again for opening and maintaining IsoQuant.
Fabio