-
Notifications
You must be signed in to change notification settings - Fork 40
Description
I simulated some dup and inversion on hg38 chromosomal 10 , calling it simSV.fasta, and then construct pangenome with hg38Chr10.fasta, but found that the sv simualted cannot be added td the final gfa file, which only contains one node with full sequence in hg38Chr10.fasta.
finally, I found that problem is due to the function static void gg_merge_seg , when merging interval which probably contain variations;
The interval was chosen incorrectly when calculating the score between two intervals
static void gg_merge_seg(const ed_intv_t *intv, int32_t n_ss, mg_msseg_t *ss)
{
······ other code ······
for (i = s0->en + 1; i < s1->st; ++i)
mid += intv[i].sc;
//fprintf(stderr, "XX\t%d\t%d\t%d\t%d\t%d\t%d\n", j, s0->sc, mid, s1->sc, s0->en+1, s1->st);
if (-mid < s0->sc * 0.2 && -mid < s1->sc * 0.2) { // FIXME: mid is sometimes 0
s0->en = s1->en, s0->sc += s1->sc + mid;
s1->st = s1->en, s1->sc = 0;
} else j0 = j;
}
}
The starting value of the for loop was probably wrong during the calculation of the above code, according to the below in function gg_simple_cigar :
// find the initial positions
st = ss[j].st, en = ss[j].en; // this is a CLOSED interval
if (st == en) continue;
**is = &intv[st], ie = &intv[en - 1];**
It means that a interval can be defined by [st,en) which is a left open and right closed interval; And thus when calculating the score between two intervals ,the initial value should be s0->en
