TELR outputs non-reference TE insertion predictions in multiple formats.

- <sample>.telr.vcf: non-reference TE insertion predictions in VCF format (1-based).
- <sample>.telr.json: non-reference TE insertion predictions in JSON format (0-based).
- <sample>.telr.expanded.json: non-reference TE insertion predictions in JSON format (0-based) with expanded information.
- <sample>.telr.bed: non-reference TE insertion predictions in BED format (0-based).
- <sample>.telr.fasta: TE insertion sequences assembled by TELR.
- <sample>.telr.contig.fasta: Local contig sequences assembled by TELR that include new TE insertions.

Note: If two or more TE families are reported for a given insertion (separated by '|' in the TE prediction output file), this could indicate a complex nested insertion event.
TELR generates a standard VCF file <sample>.telr.vcf in v4.1 format that has detailed information for each non-reference TE insertion.
| Column | Description |
|---|---|
| chromosome | The chromosome name where the TE insertion occurred |
| position | Starting breakpoint position of the TE insertion. |
| ID | The ID of the TE insertion. |
| Ref | The sequence of the reference is always set to N. |
| Alt | The TE insertion sequence. |
| Quality | This is currently not indicated. |
| Filter | This is currently always set to be PASS |
| Info | Provides a list of information (see below) |
| FORMAT | Describes the layout of the sample column that follows |
| Sample information | Depending on how Sniffles was run, colon-separated values: genotype estimate, number of reads supporting the reference, number of reads supporting the variant. |

The Info field contains multiple entries, delimited by ;.
| INFO key | Description |
|---|---|
| SVTYPE= | The type of the variant, currently this is always set to INS |
| END= | The position of the second breakpoint of the TE insertion |
| FAMILY= | TE families of the insertion; multiple families are separated by '\|' |
| STRANDS= | Strand on which the TE insertion occurs |
| SUPPORT_TYPE= | Type of support from flank to reference alignment (single_side or both_sides) |
| RE= | Number of reads supporting the TE insertion |
| AF= | Allele frequency of the variant |
| TSD_LEN= | Length of the TSD sequence if available |
| TSD_SEQ= | TSD sequence if available |
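
As a rough illustration of how the columns and INFO keys above fit together, the following sketch parses one VCF data line using only the Python standard library. The helper names (`parse_info`, `parse_vcf_record`) and the example record are made up for illustration; they are not part of TELR, and the record is not real TELR output.

```python
def parse_info(info_field):
    """Split a VCF INFO string into a dict; entries without '=' map to True."""
    entries = {}
    for item in info_field.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        entries[key] = value if sep else True
    return entries


def parse_vcf_record(line):
    """Pull the columns and INFO keys described above out of one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    info = parse_info(fields[7])
    return {
        "chrom": fields[0],
        "pos": int(fields[1]),  # 1-based, as stated for the VCF output
        "id": fields[2],
        "te_sequence": fields[4],
        "end": int(info["END"]) if "END" in info else None,
        "families": info["FAMILY"].split("|") if "FAMILY" in info else [],
        "support_type": info.get("SUPPORT_TYPE"),
        "support_reads": int(info["RE"]) if "RE" in info else None,
    }


# Made-up record, for illustration only (not real TELR output).
example = "\t".join([
    "chr2L", "1000", "example_1", "N", "ACGTACGT", ".", "PASS",
    "SVTYPE=INS;END=1000;FAMILY=roo|jockey;STRANDS=+;"
    "SUPPORT_TYPE=both_sides;RE=12;AF=0.5;TSD_LEN=4;TSD_SEQ=ACGT",
])

record = parse_vcf_record(example)
# More than one family may indicate a complex nested insertion.
is_possibly_nested = len(record["families"]) > 1
```

For anything beyond quick inspection, a dedicated VCF library is likely a better choice than hand-rolled splitting.
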

The VCF and JSON files are essentially equivalent, but the JSON file <sample>.telr.json can be easier to parse programmatically. For each non-reference TE insertion, the JSON file contains these keys:
| Key | Description |
|---|---|
| type | The type of the TE insertion (non-reference only in current version) |
| ID | The unique ID of the TE insertion |
| chrom | The chromosome name where the TE insertion occurred |
| start | Starting breakpoint position of the TE insertion |
| end | The position of the second breakpoint of the TE insertion |
| family | TE families of the insertion; multiple families are separated by '\|' |
| strand | Strand on which the TE insertion occurs |
| support | Type of support from flank to reference alignment (single_side or both_sides) |
| tsd_length | Length of the TSD sequence if available |
| tsd_sequence | TSD sequence if available |
| te_sequence | The TE insertion sequence |
| genotype | Genotype of the variant |
| num_sv_reads | Number of reads supporting the SV allele |
| num_ref_reads | Number of reads supporting the reference allele |
| allele_frequency | Allele frequency of the variant |
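
The keys above can be used to select insertions programmatically. The sketch below assumes the JSON file's top level is an array with one object per insertion, which is not stated here and should be checked against an actual output file. The function names and the example records are hypothetical.

```python
import json


def load_insertions(path):
    """Load a TELR JSON file (assumed: a top-level array of insertion objects)."""
    with open(path) as handle:
        return json.load(handle)


def select_insertions(insertions, family=None, support=None, min_af=0.0):
    """Keep insertions matching a TE family, a support type, and a minimum allele frequency."""
    selected = []
    for ins in insertions:
        families = str(ins.get("family", "")).split("|")
        if family is not None and family not in families:
            continue
        if support is not None and ins.get("support") != support:
            continue
        af = ins.get("allele_frequency")
        # Only apply the frequency cutoff when one was requested.
        if min_af > 0 and (af is None or float(af) < min_af):
            continue
        selected.append(ins)
    return selected


# Made-up records, for illustration only (not real TELR output).
demo = json.loads("""
[
  {"ID": "example_1", "family": "roo", "support": "both_sides", "allele_frequency": 0.9},
  {"ID": "example_2", "family": "roo|jockey", "support": "single_side", "allele_frequency": 0.4},
  {"ID": "example_3", "family": "copia", "support": "both_sides", "allele_frequency": null}
]
""")

roo_both_sides = select_insertions(demo, family="roo", support="both_sides")
high_af = select_insertions(demo, min_af=0.5)
```

With a real file, `select_insertions(load_insertions("<sample>.telr.json"), ...)` would be the equivalent call, with `<sample>` replaced by the actual sample name.
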

Compared to the basic JSON file described above, the expanded JSON file <sample>.telr.expanded.json includes more QC metrics that can help with filtering TE sequences for subsequent analysis. For each non-reference TE insertion, the expanded JSON file contains these additional keys:
| Key | Description |
|---|---|
| gap_between_flank | The size of the gap between 3' and 5' flanking sequence alignment to the reference genome (the value is negative if two alignments overlap) |
| te_length | The length of the new TE insertion sequence |
| contig_id | Unique ID for the local contig assembly |
| contig_length | The length of the local contig assembly |
| contig_te_start | Starting position of new TE insertion in the contig assembly |
| contig_te_end | End position of new TE insertion in the contig assembly |
| 5p_flank_align_coord | Coordinate of 5' flanking sequence alignment to the reference genome |
| 5p_flank_mapping_quality | Mapping quality of 5' flanking sequence alignment |
| 5p_flank_num_residue_matches | Number of residue matches of 5' flanking sequence alignment |
| 5p_flank_alignment_block_length | Alignment block length of 5' flanking sequence alignment |
| 5p_flank_sequence_identity | Sequence identity of 5' flanking sequence alignment (calculated as 5p_flank_num_residue_matches/5p_flank_alignment_block_length) |
| 3p_flank_align_coord | Coordinate of 3' flanking sequence alignment to the reference genome |
| 3p_flank_mapping_quality | Mapping quality of 3' flanking sequence alignment |
| 3p_flank_num_residue_matches | Number of residue matches of 3' flanking sequence alignment |
| 3p_flank_alignment_block_length | Alignment block length of 3' flanking sequence alignment |
| 3p_flank_sequence_identity | Sequence identity of 3' flanking sequence alignment (calculated as 3p_flank_num_residue_matches/3p_flank_alignment_block_length) |
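
One way to use these QC metrics is to recompute the flank sequence identity (residue matches divided by alignment block length, as defined above) and keep only insertions whose flanks align well. This is a sketch only: the thresholds are arbitrary placeholders rather than TELR recommendations, the function names are hypothetical, and requiring both flanks will discard insertions with single_side support.

```python
def flank_identity(record, side):
    """Return matches / block length for the '5p' or '3p' flank, or None if unavailable."""
    matches = record.get(f"{side}_flank_num_residue_matches")
    block = record.get(f"{side}_flank_alignment_block_length")
    if not matches or not block:
        return None
    return float(matches) / float(block)


def passes_qc(record, min_identity=0.9, min_mapq=20, max_abs_gap=50):
    """Check both flanks and the gap between them against placeholder thresholds."""
    for side in ("5p", "3p"):
        identity = flank_identity(record, side)
        mapq = record.get(f"{side}_flank_mapping_quality")
        if identity is None or identity < min_identity:
            return False
        if mapq is None or int(mapq) < min_mapq:
            return False
    gap = record.get("gap_between_flank")
    # The gap is negative when the two flank alignments overlap, so compare its magnitude.
    return gap is not None and abs(int(gap)) <= max_abs_gap


# Made-up record, for illustration only (not real TELR output).
demo_record = {
    "gap_between_flank": -4,
    "5p_flank_mapping_quality": 60,
    "5p_flank_num_residue_matches": 95,
    "5p_flank_alignment_block_length": 100,
    "3p_flank_mapping_quality": 60,
    "3p_flank_num_residue_matches": 80,
    "3p_flank_alignment_block_length": 100,
}

strict = passes_qc(demo_record)                     # 3' identity 0.8 is below 0.9
relaxed = passes_qc(demo_record, min_identity=0.75)
```
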

The BED file includes minimal information about non-reference TE insertions but is easier to use in subsequent bioinformatics analysis. For each non-reference TE insertion, the BED file contains these fields:
| Column | Description |
|---|---|
| Chromosome | The chromosome name where the TE insertion occurred |
| Start | Starting breakpoint position of the TE insertion |
| End | The position of the second breakpoint of the TE insertion |
| Family | TE families of the insertion; multiple families are separated by '\|' |
| Score | Placeholder, always '.' |
| Strand | Strand on which the TE insertion occurs |
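
Because the BED file has only six columns, it lends itself to quick summaries. The sketch below tallies insertions per TE family, counting an insertion with several '|'-separated families once for each family. It assumes tab-separated columns in the order listed above; the function names and example lines are made up for illustration.

```python
from collections import Counter


def parse_bed_line(line):
    """Split one six-column BED line into named fields."""
    chrom, start, end, family, _score, strand = line.rstrip("\n").split("\t")[:6]
    return {
        "chrom": chrom,
        "start": int(start),  # 0-based, as stated for the BED output
        "end": int(end),
        "families": family.split("|"),
        "strand": strand,
    }


def count_families(lines):
    """Count insertions per TE family, skipping blank and header-style lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        counts.update(parse_bed_line(line)["families"])
    return counts


# Made-up lines, for illustration only (not real TELR output).
demo_lines = [
    "chr2L\t999\t1000\troo\t.\t+\n",
    "chr3R\t4999\t5000\troo|jockey\t.\t-\n",
    "\n",
]

family_counts = count_families(demo_lines)
```

With a real file, passing the open file handle (for example `count_families(open(path))`, where `path` points at the BED output) would give the same kind of tally.
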
For each non-reference TE insertion, TELR reports new TE insertion sequences in <sample>.telr.te.fasta.
For each non-reference TE insertion, TELR reports assembled contig sequences in <sample>.telr.contig.fasta.
For each TELR run, a log file called <sample>.log is generated that records all the major steps in the program and error messages.
<sample>.log: log file of TELR run.