|
1 | 1 | # RCCX module |
2 | 2 |
|
3 | | -Medically relevant genes in this region include: |
4 | | - - [CYP21A2](https://www.ncbi.nlm.nih.gov/books/NBK1171/) (21-Hydroxylase-Deficient Congenital Adrenal Hyperplasia) |
5 | | - - TNXB (Ehlers-Danlos syndrome) |
6 | | - - C4A/C4B (relevant in autoimmune diseases) |
7 | | - |
8 | | -## Fields in the `json` file |
9 | | - |
10 | | -- `total_cn`: total copy number of RCCX |
11 | | -- `two_copy_haplotypes`: haplotypes that are present in two copies based on depth. This happens when (in a small number of cases) two haplotypes are identical and we infer that there exist two of them instead of one by checking the read depth. |
12 | | -- `alleles_final`: different copies of RCCX are phased into alleles with read based phasing. |
13 | | -- `ending_hap`: the last copy of RCCX on each allele. Only these copies contain parts of TNXB (while the other copies contain TNXA) |
14 | | -- `annotated_alleles`: allele annotation for the CYP21A2 gene. This is a list of two items, each representing one allele in the sample. This is only based on common gene-pseudogene (CYP21A2-CYP21A1P) differences (P31L, IVS2-13A/C>G, G111Vfs, I173N, I237N, V238E, M240K, V282L, Q319X and R357W). Please refer to the VCFs for most thorough variant calling and annotation. Below are a few examples of annotated alleles: |
15 | | - - `WT`: one copy of CYP21A2 and one copy of CYP21A1P (pseudogene) on this allele. |
16 | | - - `pseudogene_duplication`: On this allele, there is an additional copy of the pseudogene. |
17 | | - - `pseudogene_deletion`: On this allele, the pseudogene is deleted. |
18 | | - - `gene_duplication`: On this allele, there is an additional copy of CYP21A2. |
19 | | - - `gene_deletion`: On this allele, CYP21A2 is deleted. |
20 | | - - `deletion_P31L,G111Vfs`: On this allele, there is a deletion of one RCCX copy, creating a fusion gene between CYP21A1P and CYP21A2. This fusion gene carries the variants P31L and G111Vfs (which come from the pseudogene part of the fusion). |
21 | | - - `duplication_WT_plus_Q319X`: On this allele, there is an additional copy of CYP21A2. Among the two copies of CYP21A2, one copy is WT and the other carries Q319X. |
22 | | - - `Q319X`: On this allele, there is no CNV, i.e. there is one copy of CYP21A2 and one copy of CYP21A1P. CYP21A2 carries the variant Q319X. (Other known variants in CYP21A2 are also reported in this way, e.g. `282L`.) |
| 3 | +The RCCX module refers to a complex, copy-number-variable region on chromosome 6 that overlaps several medically relevant genes, including:
| 4 | +- [CYP21A2](https://www.ncbi.nlm.nih.gov/books/NBK1171/) (21-Hydroxylase-Deficient Congenital Adrenal Hyperplasia) |
| 5 | +- TNXB (Ehlers-Danlos syndrome) |
| 6 | +- C4A/C4B (relevant in autoimmune diseases) |
| 7 | + |
| 8 | +Below is a simplified schematic of the region: |
| 9 | + |
| 10 | + |
| 11 | + |
| 12 | +## Region-specific fields in the `json` file
| 13 | + |
| 14 | +Fields shared across all genes are defined in the general [json file](json.md). The RCCX module includes several unique fields, listed below: |
| 15 | + |
| 16 | +- `ending_hap`: Indicates the last RCCX copy on each allele. These haplotypes carry sequence from the unique region downstream of RCCX. Only these final copies contain TNXB; all earlier copies on the same allele contain TNXA (the pseudogene). This field can be used to infer the order of RCCX haplotypes on an allele.
| 17 | +- `starting_hap`: Indicates the first RCCX copy on each allele. These haplotypes carry sequence from the unique region upstream of RCCX. Like `ending_hap`, this field can be used to infer the order of RCCX haplotypes on an allele.
| 18 | +- `deletion_hap`: Indicates a haplotype that has the characteristics of both a starting and an ending haplotype. Such a haplotype is the only RCCX copy on its allele, which indicates a deletion that leaves a single RCCX copy.
| 19 | +- `hap_variants`: Variant calls at common gene-pseudogene (CYP21A2-CYP21A1P) differentiating sites (P31L, IVS2-13A/C>G, G111Vfs, I173N, I237N, V238E, M240K, V282L, Q319X and R357W). This field is used for the allele annotation of CYP21A2. For comprehensive variant calls across the RCCX module, please refer to the VCF file.
| 20 | +- `annotated_alleles`: Provides per-allele annotations of CYP21A2 based on the `hap_variants` field. Possible values include, for example:
| 21 | + - `WT`: one copy each of CYP21A2 and CYP21A1P (pseudogene) on this allele. |
| 22 | + - `pseudogene_duplication`: additional copy of CYP21A1P on this allele. |
| 23 | + - `pseudogene_deletion`: CYP21A1P is deleted on this allele. |
| 24 | + - `gene_duplication`: additional copy of CYP21A2 on this allele. |
| 25 | + - `gene_deletion`: CYP21A2 is deleted on this allele. |
| 26 | + - `deletion_P31L,G111Vfs`: Deletion of one RCCX copy on this allele, creating a CYP21A1P–CYP21A2 fusion gene carrying P31L and G111Vfs variants from the pseudogene. |
| 27 | + - `duplication_WT_plus_Q319X`: Two copies of CYP21A2 on this allele: one WT, the other carrying Q319X. |
| 28 | +  - `Q319X`: No CNV on this allele; the single CYP21A2 copy carries Q319X (other variants, e.g. `282L`, are reported the same way).
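Because `starting_hap`, `ending_hap` and `deletion_hap` mark the first, last and sole RCCX copy of an allele, they can be combined to place an allele's haplotypes in order. The sketch below is a hypothetical helper, not part of the tool. It assumes the RCCX record from the `json` file has already been loaded into a Python dict, that each of these three fields is a list of haplotype names (or null), and that the haplotype names per allele are available as a list of lists under `alleles_final`. The exact JSON layout is an assumption; check it against your own output first.

```python
from typing import Dict, Iterable, List, Optional, Set


def order_allele(haps: List[str], starting: Set[str], ending: Set[str]) -> List[str]:
    """Arrange one allele's haplotypes from the first to the last RCCX copy."""
    if len(haps) <= 1:
        # A single copy is both first and last; nothing to reorder.
        return list(haps)
    first = [h for h in haps if h in starting and h not in ending]
    last = [h for h in haps if h in ending and h not in starting]
    # Remaining copies sit between the two ends. With two or more of them,
    # their relative order is not determined by these fields, so input order is kept.
    middle = [h for h in haps if h not in first and h not in last]
    return first + middle + last


def _as_set(values: Optional[Iterable[str]]) -> Set[str]:
    # Treat a missing or null field as an empty set.
    return set(values or [])


def describe_alleles(record: Dict) -> List[Dict]:
    """Order haplotypes per allele and flag single-copy deletion alleles.

    Assumed (hypothetical) layout: record["alleles_final"] is a list of lists
    of haplotype names; starting_hap, ending_hap and deletion_hap are lists of
    haplotype names or None.
    """
    starting = _as_set(record.get("starting_hap"))
    ending = _as_set(record.get("ending_hap"))
    deletion = _as_set(record.get("deletion_hap"))
    result = []
    for haps in record["alleles_final"]:
        result.append({
            "order": order_allele(haps, starting, ending),
            # Only one RCCX copy on the allele, and it is a deletion haplotype.
            "single_copy_deletion": len(haps) == 1 and haps[0] in deletion,
        })
    return result


if __name__ == "__main__":
    # Hypothetical record with made-up haplotype names, for illustration only.
    example = {
        "alleles_final": [["hapB", "hapA"], ["hapC"]],
        "starting_hap": ["hapA"],
        "ending_hap": ["hapB"],
        "deletion_hap": ["hapC"],
    }
    for allele in describe_alleles(example):
        print(allele)
```

When an allele has four or more copies, the middle copies cannot be ordered from these three fields alone, so the sketch leaves them in their input order rather than guessing.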
23 | 29 |
|
24 | 30 | ## Visualizing haplotypes |
25 | 31 |
|
26 | | -To visualize phased haplotypes, load the output bam file in IGV, group reads by the `HP` tag and color alignments by `YC` tag. Reads are realigned to CYP21A2. |
| 32 | +To visualize phased haplotypes, load the output BAM file in IGV, group reads by the `HP` tag and color alignments by the `YC` tag. Reads are realigned to the CYP21A2 reference.
27 | 33 |
|
28 | | -Green and purple represent two alleles, i.e. all haplotypes in green are on one one allele and all haplotypes in purple are on the other allele. Reads in gray are either unassigned or consistent with more than one possible haplotype. When two haplotypes are identical over a region, there can be more than one haplotype consistent with a read, and the read is randomly assigned to a haplotype and colored in gray. |
| 34 | +Green and purple represent the two alleles: all haplotypes in green are on one allele and all haplotypes in purple are on the other. Reads in gray are either unassigned or consistent with more than one haplotype. When two haplotypes are identical over a region, a read from that region can be consistent with both; such a read is randomly assigned to one of them and colored gray.
29 | 35 |
|
30 | 36 |  |
31 | 37 |
|
32 | | -- In this set of examples, the top panel shows a sample with no copy number change (both alleles are `WT`). There are four copies of RCCX, two on each allele. On each allele, one copy carries CYP21A2 and the other copy carries CYP21A1P (marked by a cluster of mismatches when aligned to CYP21A2). |
33 | | -- The middle panel shows a sample with a fusion deletion (purple allele `deletion_P31L,G111Vfs`). There is only one copy of RCCX on this allele. The deletion breakpoint is in CYP21A2, creating a fusion gene between CYP21A1P and CYP21A2. |
34 | | -- The bottom panel shows a sample with a CYP21A2 duplication that carries Q319X (purple allele `duplication_WT_plus_Q319X`). On this allele, there are two copies of CYP21A2, among which one copy is WT and the other (the one next to TNXB) carries Q319X. |
| 38 | +Examples: |
| 39 | +- **Top panel**: a sample with no copy number change (both alleles are `WT`). There are four copies of RCCX, two per allele. Each allele carries one copy of CYP21A2 and one copy of CYP21A1P; the pseudogene copy is marked by a cluster of mismatches when aligned to CYP21A2.
| 40 | +- **Middle panel**: a sample with a fusion deletion on the purple allele (`deletion_P31L,G111Vfs`). This allele has only one RCCX copy. The breakpoint lies within CYP21A2, creating a CYP21A1P/CYP21A2 fusion gene that carries variants inherited from the pseudogene.
| 41 | +- **Bottom panel**: a sample with a CYP21A2 duplication on the purple allele (`duplication_WT_plus_Q319X`). This allele contains two CYP21A2 copies: one is wild-type, and the other (the copy next to TNXB) carries Q319X.
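The annotation strings in these panels appear to follow a pattern: an optional CNV prefix (`deletion_` or `duplication_`), then the variants carried by each CYP21A2 copy, with `_plus_` separating copies, commas separating variants, and `WT` standing for a copy without listed variants. The hypothetical helper below decodes labels of that shape. The pattern is inferred only from the example values documented for `annotated_alleles`, so labels outside it may not parse as intended.

```python
from typing import List, Tuple

# Example labels that describe copy number only, with no per-copy variant list.
CNV_ONLY = {
    "pseudogene_duplication",
    "pseudogene_deletion",
    "gene_duplication",
    "gene_deletion",
}


def decode_annotation(label: str) -> Tuple[str, List[List[str]]]:
    """Split a CYP21A2 allele label into (event, variants per CYP21A2 copy).

    An empty inner list stands for a wild-type copy. CNV-only labels return an
    empty outer list because they carry no per-copy variants. The label grammar
    is an assumption inferred from documented examples.
    """
    if label in CNV_ONLY:
        return label, []
    event, body = "no_cnv", label
    for prefix in ("deletion_", "duplication_"):
        if label.startswith(prefix):
            event, body = prefix.rstrip("_"), label[len(prefix):]
            break
    per_copy = []
    for part in body.split("_plus_"):
        # "WT" means no variant on that copy; otherwise variants are comma-separated.
        per_copy.append([] if part == "WT" else part.split(","))
    return event, per_copy


if __name__ == "__main__":
    for label in ("WT", "deletion_P31L,G111Vfs", "duplication_WT_plus_Q319X"):
        print(label, "->", decode_annotation(label))
```

Under these assumptions, `deletion_P31L,G111Vfs` decodes to a deletion event whose single (fusion) CYP21A2 copy carries P31L and G111Vfs, and `duplication_WT_plus_Q319X` decodes to two copies, one wild-type and one carrying Q319X.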
35 | 42 |
|