Question on CIGAR strings in UTA #266

Lately I've been trying to understand how to interpret CIGAR strings in UTA and running into some confusion. This might just be due to some incorrect assumptions about CIGAR, but any advice is appreciated.

Here I have a query to UTA for an alignment that contains a 3bp deletion:
```
uta=> select cigar, tx_ac, alt_ac, ord, (tx_end_i - tx_start_i) as tx_ex_len, (alt_end_i - alt_start_i) as alt_ex_len  
from tx_exon_aln_v where tx_ac = 'NM_001256326.1' and cigar !~ '^[0-9]+=$' and alt_ac = 'NC_000017.10' order by ord;
   cigar   |     tx_ac      |    alt_ac    | ord | tx_ex_len | alt_ex_len 
-----------+----------------+--------------+-----+-----------+------------
 1453=3D2= | NM_001256326.1 | NC_000017.10 |  35 |      1458 |       1455
```

I've been assuming that this alignment means there is a deletion of 3 bases in the _transcript_ relative to the _genome_ (i.e. the transcript is the "query" and the genome is the "reference"). However, based on the tx_ex_len and alt_ex_len columns computed in that query, it seems I have this backwards: there are 1455 bases in the aligned region of the genome and 1455 + 3 = 1458 bases in the transcript's aligned region.
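To make the length arithmetic explicit, here is a minimal sketch that parses a CIGAR string and sums the span it covers on each side. It assumes UTA uses the SAM specification's rules for which operations consume the reference (`M`, `D`, `N`, `=`, `X`) and which consume the query (`M`, `I`, `S`, `=`, `X`); the helper names `parse_cigar` and `aligned_lengths` are hypothetical, not part of UTA.

```python
import re

# SAM spec: which CIGAR operations consume the reference and which consume the query.
# (Assumption: UTA follows these same definitions.)
REF_CONSUMING = set("MDN=X")
QUERY_CONSUMING = set("MIS=X")

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs; reject malformed input."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Rebuild the string to make sure every character was matched by the regex.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def aligned_lengths(cigar):
    """Return (reference_span, query_span) for a CIGAR under SAM semantics."""
    ops = parse_cigar(cigar)
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    query_len = sum(n for n, op in ops if op in QUERY_CONSUMING)
    return ref_len, query_len


ref_len, query_len = aligned_lengths("1453=3D2=")
print(ref_len, query_len)  # 1458 1455
```

Under these rules the reference side spans 1458 bases and the query side 1455, which lines up with tx_ex_len = 1458 and alt_ex_len = 1455 from the query above, i.e. with the transcript occupying the "reference" role.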

So in UTA's transcript-genome alignments, is the genome considered the "query" sequence and the transcript the "reference"? Meaning that, when reading CIGAR strings that are describing indels, should I be assuming that a deletion event means a deletion of bases from the genome that ARE present in the transcript (and vice versa for insertions)?
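If the transcript is indeed the reference side, reading the same alignment with the genome as the reference amounts to swapping every `I` for `D` and vice versa, while `=`, `X`, and `M` keep their lengths. A minimal sketch follows; `flip_cigar` is a hypothetical helper, not a UTA function, and it deliberately refuses `S`, `H`, `N`, and `P`, since those operations have no simple swapped equivalent.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Swapping which sequence is the reference turns deletions into insertions and vice versa.
FLIP = {"I": "D", "D": "I"}


def flip_cigar(cigar):
    """Rewrite a CIGAR so the former query becomes the reference (hypothetical helper)."""
    out = []
    for n, op in CIGAR_OP.findall(cigar):
        if op not in "=XMID":
            # Clips, skips, and padding don't have a clean inverse; refuse rather than guess.
            raise ValueError(f"no simple inverse for CIGAR op {op!r}")
        out.append(f"{n}{FLIP.get(op, op)}")
    return "".join(out)


print(flip_cigar("1453=3D2="))  # 1453=3I2=
```

Read this way, `1453=3I2=` says the transcript carries 3 extra bases that are absent from the genome, which is the "vice versa" reading described in the question.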
