I've been (re-)reviewing a PR to HTSJDK that enables TP:circular in @SQ, and that took me back to the unfinished business of how to index and random-access circular contigs.
If we extended the index into the greater-than-LEN space, we could easily index the reads that "hang off" the end of the contig. To query them, one would then also look for reads at query_pos+LEN whenever querying for query_pos. Since the assumption is that we only have linear alignments with POS≤LEN, we only need to look at query_pos+LEN and not, e.g., query_pos+2*LEN: any such read that reaches query_pos+2*LEN necessarily covers query_pos+LEN as well.
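To make the query side concrete, here is a minimal sketch in Python. It is not HTSJDK code and does not model a real BAI/CSI index: a linear scan over a list stands in for the index lookup, and the names `Read`, `overlaps`, `query_circular`, `CONTIG_LEN`, and `READS` are all hypothetical. It assumes 1-based inclusive coordinates, POS≤LEN for every read, and read ends stored unwrapped (so an end may exceed LEN).

```python
from typing import List, Tuple

# (name, POS, end): 1-based inclusive, unwrapped coordinates.
# Assumption: POS <= LEN always holds; end may exceed LEN for reads that
# "hang off" the end of the circular contig.
Read = Tuple[str, int, int]


def overlaps(read: Read, start: int, end: int) -> bool:
    """True if the read's [POS, end] span intersects [start, end]."""
    _, r_start, r_end = read
    return r_start <= end and r_end >= start


def query_circular(reads: List[Read], contig_len: int,
                   q_start: int, q_end: int) -> List[str]:
    """Names of reads overlapping [q_start, q_end] on a circular contig.

    Each read is checked against the query interval itself and against the
    same interval shifted by contig_len, which is where wrapped reads live
    in the extended coordinate space. A real implementation would issue two
    index lookups instead of scanning every read.
    """
    if not (1 <= q_start <= q_end <= contig_len):
        raise ValueError("query must lie within 1..contig_len")
    hits = []
    for read in reads:
        if (overlaps(read, q_start, q_end)
                or overlaps(read, q_start + contig_len, q_end + contig_len)):
            hits.append(read[0])
    return hits


CONTIG_LEN = 1000
READS: List[Read] = [
    ("r1", 100, 199),
    ("r2", 950, 1049),  # hangs off the end by 49 bases
    ("r3", 500, 599),
]

print(query_circular(READS, CONTIG_LEN, 10, 20))    # ['r2']
print(query_circular(READS, CONTIG_LEN, 960, 970))  # ['r2']
print(query_circular(READS, CONTIG_LEN, 60, 70))    # []
```

The query for 10..20 finds `r2` only through the shifted interval 1010..1020, while the query for 60..70 correctly finds nothing, since `r2` wraps only as far as position 49.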
Thoughts?