Commit e91b99d

Enhance concepts.md with structural variation details
1 parent e68cbb1 commit e91b99d

1 file changed

+63
-3
lines changed

docs/concepts.md

Lines changed: 63 additions & 3 deletions
@@ -27,6 +27,66 @@ arbitrarily complex patterns of genetic inheritance.
2727
The Genetic Inheritance Graph Library (giglib) is a proof-of-concept implementation of the idea
2828
behind GIGs, and is inspired by the standard [tskit](https://tskit.dev) ARG library.
2929

30-
:::{todo}
31-
Fill out more details from the [README.md](https://github.com/hyanwong/GeneticInheritanceGraphLibrary/blob/main/README.md)
32-
:::
30+
## Details
31+
32+
In [`tskit`](https://tskit.dev), each edge is annotated with a left and a right coordinate that describe which piece of DNA is inherited.
33+
In giglib, this is extended to track the left and right coordinates in the edge *child* separately from the left and right coordinates in the edge *parent*.
34+
The left and right values in each case refer to the coordinate system of the child and parent respectively.
35+
36+
```{note}
37+
For terminological clarity, we switch to using the term interval-edge (`iedge`)
38+
to refer to what is normally called an `edge` in a *tskit* Tree Sequence.
39+
Separating child from parent coordinates brings a host of extra complexities,
40+
and it’s unclear if the efficiency of the tskit approach,
41+
with its edge indexing etc., will port in any meaningful way to this new structure.
42+
```
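As a rough illustration of the idea above, an iedge can be pictured as a small record holding separate child and parent coordinates. The sketch below is a hypothetical Python class written for this page, not the giglib API; the class name `IEdge`, its helper properties, and the assumption that child and parent spans are always equal are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IEdge:
    """Hypothetical interval-edge record (illustrative only, not the giglib API)."""

    parent: str
    child: str
    child_left: int
    child_right: int
    parent_left: int
    parent_right: int

    @property
    def child_span(self) -> int:
        # Length of the interval in the child's coordinate system
        return self.child_right - self.child_left

    @property
    def parent_span(self) -> int:
        # Length in the parent's coordinate system; abs() allows reversed intervals
        return abs(self.parent_right - self.parent_left)

    @property
    def is_inversion(self) -> bool:
        # Reversed parent coordinates are used here to mark an inversion
        return self.parent_left > self.parent_right

    def is_consistent(self) -> bool:
        # Assumption: an iedge maps a child interval onto a parent interval of equal length
        return self.child_left < self.child_right and self.child_span == self.parent_span


plain = IEdge("P", "C", child_left=0, child_right=10, parent_left=0, parent_right=10)
flipped = IEdge("P", "C", child_left=6, child_right=14, parent_left=14, parent_right=6)
```

Keeping the two coordinate pairs side by side makes it easy to check basic properties of an iedge, but it does not address the indexing question raised in the note.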
43+
44+
## Structural variation
45+
46+
The following examples show how different sorts of structural variation can be encoded. They correspond to the
47+
schematic below:
48+
49+
![GIG schematic](_static/schematic.png)
50+
51+
### Inversions
52+
53+
The simplest example is an inversion, which can be encoded as an iedge like:
54+
55+
```
56+
{parent: P, child: C, child_left: 6, child_right: 14, parent_left: 14, parent_right: 6}
57+
```
58+
59+
There is a subtle gotcha here, because intervals in a GIG, as in _tskit_, are treated as half-closed
60+
(i.e. do not include the position given by the right coordinate). When we invert an interval, it
61+
therefore does not include the *left* parent coordinate, but does include the *right* parent coordinate.
62+
Any transformed position is thus out by one. Or to put it another way, an inversion specified
63+
by `child_left=0`, `child_right=3`, `parent_left=3`, `parent_right=0` transforms the points
64+
0, 1, 2 to 2, 1, 0: although the *interval* 0, 3 is transformed to 0, 3, the *point* 0 is transformed
65+
to position 2, not position 3. See
66+
[here](https://github.com/hyanwong/giglib/issues/41#issuecomment-1858530867)
67+
for more discussion.
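The off-by-one can be made concrete with a small position-mapping function. This is a minimal sketch, assuming integer positions and equal-length child and parent intervals; the function name `child_to_parent` is hypothetical and is not taken from giglib.

```python
def child_to_parent(pos, child_left, child_right, parent_left, parent_right):
    """Map a child position to the parent position it is inherited from."""
    if not child_left <= pos < child_right:
        raise ValueError(f"position {pos} is outside the child interval")
    offset = pos - child_left
    if parent_left <= parent_right:
        # Ordinary iedge: positions shift by a constant
        return parent_left + offset
    # Inverted iedge: parent_left itself is excluded from the interval,
    # so the first child position maps to parent_left - 1
    return parent_left - offset - 1


inverted = [child_to_parent(p, 0, 3, 3, 0) for p in range(3)]
print(inverted)  # [2, 1, 0]
```

Without the extra `- 1`, child position 0 would map to parent position 3, which lies outside the inherited interval.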
68+
69+
### Duplications
70+
71+
A tandem duplication is represented by two iedges, one for each duplicated region:
72+
73+
```
74+
{parent: P, child: C, child_left: 10, child_right: 20, parent_left: 10, parent_right: 20}
75+
{parent: P, child: C, child_left: 20, child_right: 30, parent_left: 10, parent_right: 20}
76+
```
77+
78+
Or one of the iedges could represent a non-adjacent duplication (e.g. corresponding to a transposable element):
79+
```
80+
{parent: P, child: C, child_left: 25, child_right: 35, parent_left: 10, parent_right: 20}
81+
```
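One consequence of a duplication is that two different child positions trace back to the same parent position. The sketch below illustrates this for the tandem duplication above, using plain dictionaries with the same keys as the examples; the helper `parent_position` is hypothetical and only handles non-inverted iedges.

```python
tandem = [
    {"parent": "P", "child": "C", "child_left": 10, "child_right": 20,
     "parent_left": 10, "parent_right": 20},
    {"parent": "P", "child": "C", "child_left": 20, "child_right": 30,
     "parent_left": 10, "parent_right": 20},
]


def parent_position(iedges, child_pos):
    """Return the parent position for child_pos, or None if no iedge covers it."""
    for e in iedges:
        if e["child_left"] <= child_pos < e["child_right"]:
            # Assumes parent_left < parent_right (no inversion)
            return e["parent_left"] + (child_pos - e["child_left"])
    return None


first_copy = parent_position(tandem, 15)   # position in the first copy
second_copy = parent_position(tandem, 25)  # matching position in the second copy
```

Both `first_copy` and `second_copy` are parent position 15, whereas a child position not covered by either iedge gives `None`.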
82+
83+
### Deletions
84+
85+
A deletion simply occurs when a region of the parent is not transmitted to the child (and the child's coordinate system is shrunk accordingly):
86+
87+
```
88+
# Deletion of parental region from 5-15
89+
{parent: P, child: C, child_left: 0, child_right: 5, parent_left: 0, parent_right: 5}
90+
{parent: P, child: C, child_left: 5, child_right: 10, parent_left: 15, parent_right: 20}
91+
```
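A deleted region can be recovered from a set of iedges by looking for gaps between the transmitted parent intervals. The following is a minimal sketch under stated assumptions: the helper name `untransmitted_gaps` is hypothetical, all iedges are taken to share the same parent and child, and only gaps lying between transmitted regions are reported (the parent's total length is not known here).

```python
deletion = [
    {"parent": "P", "child": "C", "child_left": 0, "child_right": 5,
     "parent_left": 0, "parent_right": 5},
    {"parent": "P", "child": "C", "child_left": 5, "child_right": 10,
     "parent_left": 15, "parent_right": 20},
]


def untransmitted_gaps(iedges):
    """List parent intervals lying between transmitted regions but covered by no iedge."""
    # Normalise each parent interval so that left <= right, then sort by left coordinate
    covered = sorted(
        (min(e["parent_left"], e["parent_right"]), max(e["parent_left"], e["parent_right"]))
        for e in iedges
    )
    if not covered:
        return []
    gaps = []
    reach = covered[0][1]  # rightmost parent coordinate covered so far
    for left, right in covered[1:]:
        if left > reach:
            gaps.append((reach, left))
        reach = max(reach, right)
    return gaps


gaps = untransmitted_gaps(deletion)
print(gaps)  # [(5, 15)]
```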
92+
