Skip to content

Cannot add apparent simple SV on pangenome #124

@LzzCongee

Description

@LzzCongee

I simulated some dup and inversion on hg38 chromosomal 10 , calling it simSV.fasta, and then construct pangenome with hg38Chr10.fasta, but found that the sv simualted cannot be added td the final gfa file, which only contains one node with full sequence in hg38Chr10.fasta.
finally, I found that problem is due to the function static void gg_merge_seg , when merging interval which probably contain variations;
The interval was chosen incorrectly when calculating the score between two intervals

static void gg_merge_seg(const ed_intv_t *intv, int32_t n_ss, mg_msseg_t *ss)
{
                ······ other code ······
                for (i = s0->en + 1; i < s1->st; ++i)
                        mid += intv[i].sc;
                //fprintf(stderr, "XX\t%d\t%d\t%d\t%d\t%d\t%d\n", j, s0->sc, mid, s1->sc, s0->en+1, s1->st);
                if (-mid < s0->sc * 0.2 && -mid < s1->sc * 0.2) { // FIXME: mid is sometimes 0
                        s0->en = s1->en, s0->sc += s1->sc + mid;
                        s1->st = s1->en, s1->sc = 0;
                } else j0 = j;
        }
}

The starting value of the for loop was probably wrong during the calculation of the above code, according to the below in function gg_simple_cigar :

    // find the initial positions
    st = ss[j].st, en = ss[j].en; // this is a CLOSED interval
    if (st == en) continue;
    **is = &intv[st], ie = &intv[en - 1];**

It means that a interval can be defined by [st,en) which is a left open and right closed interval; And thus when calculating the score between two intervals ,the initial value should be s0->en

Image

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions