Question about GFA format, unexpected link information. #126

@AltriaXY

Hello,
I'm working with the HPRC pangenome graphs generated by minigraph-cactus and minigraph, and I came across something in the GFA file that I find confusing.

Here is the subgraph in question, in GFA format:

S	5446673	AAAATCTTCCTCTCT	SN:Z:CHM13#chr1	SO:i:159272821	SR:i:0
S	5446672	G
S	5446671	T	SN:Z:CHM13#chr1	SO:i:159272836	SR:i:0
S	5446670	GGGAAGACAGGCAGCCATGCTTCAGGGGCTGGGGGTGG	SN:Z:CHM13#chr1	SO:i:159272837	SR:i:0
L	5446673	+	5446671	+	0M
L	5446672	+	5446673	-	0M
L	5446670	-	5446672	+	0M
L	5446671	+	5446670	+	0M

The reference path of this subgraph is >5446673>5446671>5446670. The head of the reverse complement of segment 5446672 links to 5446673, and its tail links to 5446670.
This raises two questions:

  1. Why are the links written as
L	5446672	+	5446673	-	0M
L	5446670	-	5446672	+	0M

rather than

L	5446673	+	5446672	-	0M
L	5446672	-	5446670	+	0M

The second form is much easier to understand: it shows links from 5446673 to 5446672 and from 5446672 to 5446670. The first form seems to describe links from 5446670 to 5446672 and from 5446672 to 5446673, which run against the actual direction.

  2. Since segment 5446672 only participates in the pangenome graph as its reverse complement, why does the file use "S 5446672 G" with the reverse complement expressed in the links, rather than "S 5446672 C" with "L 5446673 + 5446672 + 0M" and "L 5446672 + 5446670 + 0M"?

Could you perhaps explain these points?
Do links have a direction? And is it possible for both the forward and the reverse-complement orientation of one segment to participate in a graph?
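
To make the comparison between the two forms concrete, here is a minimal Python sketch. It assumes the GFA1 convention, as I understand it, that a link and its reverse-complemented counterpart (swap the two segments and flip both orientations) describe the same adjacency. The helper names `parse_link`, `flip_link` and `same_adjacency` are my own, not part of minigraph or any library, and the overlap field is ignored because it is always 0M here.

```python
# Sketch only: compares GFA "L" lines under the assumption that
# "L a o1 b o2" and "L b flip(o2) a flip(o1)" encode the same adjacency.

FLIP = {"+": "-", "-": "+"}


def parse_link(line):
    """Parse a tab-delimited GFA L line into (from, from_orient, to, to_orient)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5 or fields[0] != "L":
        raise ValueError(f"not a GFA link line: {line!r}")
    _, src, src_orient, dst, dst_orient = fields[:5]
    return (src, src_orient, dst, dst_orient)


def flip_link(link):
    """Return the reverse-complemented form of a link (overlap not handled)."""
    src, src_orient, dst, dst_orient = link
    return (dst, FLIP[dst_orient], src, FLIP[src_orient])


def same_adjacency(a, b):
    """True if two links are identical or reverse complements of each other."""
    return a == b or a == flip_link(b)


def make_line(*fields):
    """Build a tab-delimited line so the example does not depend on literal tabs."""
    return "\t".join(fields)


if __name__ == "__main__":
    in_file = [
        make_line("L", "5446672", "+", "5446673", "-", "0M"),
        make_line("L", "5446670", "-", "5446672", "+", "0M"),
    ]
    expected = [
        make_line("L", "5446673", "+", "5446672", "-", "0M"),
        make_line("L", "5446672", "-", "5446670", "+", "0M"),
    ]
    for got, want in zip(in_file, expected):
        print(got, "|", want, "|", same_adjacency(parse_link(got), parse_link(want)))
```

Under that assumption, each link in the file pairs up with the form I expected, so the difference may be only in which of the two equivalent spellings was written out.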

Thank you
