
What is SpliceDecoder?

SpliceDecoder provides functional annotation for your differential splicing events (DSEs).
The functional annotation includes NMD probability, alterations in functional domains (such as DNA binding sites, motifs, regions, protein domains, and so on), CDS/UTR alterations, and an effect score.
The effect score can be used to prioritize your DSEs and to choose the most representative functional consequence of each one.
Currently, SpliceDecoder supports the hg38 and mm10 genomes.

Workflow overview

Generate All Possible Splicing Cases (Processing input): This step converts the output of event-based splicing tools into the input format used by SpliceDecoder.
Map Splicing Cases (Mapping DSEs and ORFs): This step searches the given transcriptome (.GTF) for Ref-TXs (reference transcripts whose exon structure perfectly matches the given DSE) and assigns the best three open reading frames (ORFs) to each.
Simulate Splicing Events (Simulation): Based on the Ref-TXs and their ORFs, this step simulates the alternative splicing event (e.g., if the Ref-TX has the exon inclusion (EI) form, this step creates a simulated transcript (Sim-TX) with the exon skipped (ES) form).
Functional Annotation (Annotation): Based on the UniProt database, SpliceDecoder assigns known functional domains and estimates functional changes between the Ref-TX and the Sim-TX.
DSEs with Effect Score (Scoring): SpliceDecoder assigns an effect score to each DSE based on multiple biological factors, enabling prioritization of your DSEs.
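To make the mapping and simulation steps more concrete for exon skipping, here is a minimal Python sketch. It assumes a hypothetical representation in which each transcript is a list of (start, end) exon tuples in transcript order, and it only handles the skipped-exon case. It is not SpliceDecoder's actual implementation, which also covers other event types and assigns ORFs; the function names are illustrative.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: one transcript = list of (start, end) exons in
# transcript order. This is NOT SpliceDecoder's internal data structure.
Exon = Tuple[int, int]


def match_se_event(exons: List[Exon], up: Exon, skipped: Exon, down: Exon) -> Optional[str]:
    """Return 'EI' if the exons contain up-skipped-down consecutively,
    'ES' if they contain up-down consecutively, otherwise None (no Ref-TX match)."""
    for i in range(len(exons) - 1):
        if exons[i] != up:
            continue
        if exons[i + 1] == skipped and i + 2 < len(exons) and exons[i + 2] == down:
            return "EI"
        if exons[i + 1] == down:
            return "ES"
    return None


def simulate_se(exons: List[Exon], up: Exon, skipped: Exon, down: Exon) -> Optional[Tuple[str, List[Exon]]]:
    """Return (simulated_form, simulated_exons) for a matched Ref-TX, else None."""
    form = match_se_event(exons, up, skipped, down)
    if form == "EI":
        # The Ref-TX includes the exon, so the Sim-TX skips it.
        return "ES", [e for e in exons if e != skipped]
    if form == "ES":
        # The Ref-TX skips the exon, so the Sim-TX inserts it after the upstream exon.
        i = exons.index(up)
        return "EI", exons[: i + 1] + [skipped] + exons[i + 1 :]
    return None


if __name__ == "__main__":
    transcripts = {
        "TX1": [(100, 200), (300, 350), (500, 600)],  # EI form
        "TX2": [(100, 200), (500, 600)],              # ES form
    }
    for tx_id, exons in transcripts.items():
        print(tx_id, simulate_se(exons, (100, 200), (300, 350), (500, 600)))
```

A transcript that matches neither form returns None, which corresponds to a DSE with no Ref-TX in the given GTF.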

Quick start (conda is required)

SpliceDecoder can be downloaded from https://github.com/hyeon9/SpliceDecoder/.
Install SpliceDecoder using the install script:
```
cd ./SpliceDecoder && bash install.sh
```
To perform a test run, you can use the provided toy_data.
You can create a toy configuration file interactively. You can find more details here.
```
cd code/
bash Make_config.sh toy
```
Once Your_toy.config has been created, you can run SpliceDecoder.
The steps are intended to be executed in order, so running them with the all option is recommended.
```
bash Main.sh all ${Your_toy.config}
```
If the test run with the toy data finishes successfully, you will see the following output files (except for Effect_score.tsv).

To run with your own data, create a configuration file using Make_config.sh or by modifying example.config.

```
cd code/
bash Make_config.sh
bash Main.sh all ${Your.config}
```

OR

```
vi example.config
mv example.config ${Your.config}
```

Then, if you are using SLURM, submit your job with the following command.

You can specify #SBATCH options such as -c 10 and --mem=40G.

```
sbatch Main.sh {Make_input | DS_mapping | ORF_mapping | Simulation | Scoring | all} ${Your.config}
```

If needed, you can run a specific step by selecting one of the following: Make_input, DS_mapping, ORF_mapping, Simulation, or Scoring.
```
bash Main.sh {Make_input | DS_mapping | ORF_mapping | Simulation | Scoring} ${Your.config}
```
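If a run stops partway, you may want to re-run only the remaining steps. The hypothetical helper below sketches that in Python by calling Main.sh once per step. It assumes that the five step names above are run in the order listed (matching the workflow overview) and that each step can be launched on its own as shown; `build_commands` and `resume` are illustrative names, not part of SpliceDecoder.

```python
import subprocess
from typing import List

# Step names as accepted by Main.sh in this README; the order is assumed to
# follow the workflow overview (input -> mapping -> ORFs -> simulation -> scoring).
STEPS = ["Make_input", "DS_mapping", "ORF_mapping", "Simulation", "Scoring"]


def build_commands(config: str, start: str = "Make_input", main_sh: str = "Main.sh") -> List[List[str]]:
    """Return one `bash Main.sh <step> <config>` command per step, from `start` to the end."""
    if start not in STEPS:
        raise ValueError(f"unknown step {start!r}; expected one of {STEPS}")
    return [["bash", main_sh, step, config] for step in STEPS[STEPS.index(start):]]


def resume(config: str, start: str = "Make_input", main_sh: str = "Main.sh") -> None:
    """Run the remaining steps in order, stopping at the first non-zero exit status."""
    for cmd in build_commands(config, start, main_sh):
        subprocess.run(cmd, check=True)
```

For example, `resume("project1.config", start="Simulation")` would run Simulation and then Scoring. Using `check=True` stops the loop as soon as one step fails, so later steps never run on incomplete intermediate files.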
If you want to annotate transcript-centric data, you can find more details here.

Guide for making config file

Make_config.sh will ask the following questions:

? Specify your config file name (e.g. HGjob)
> Specify a name for your config file

? Enter the path of SpliceDecoder (e.g. /User/usr/Tool/SpliceDecoder-main/)
> You just need to specify the install path of SpliceDecoder

? Enter your working directory (e.g. /User/usr/Tool/SpliceDecoder-main/project1)
> You just need to specify your new working directory

? Enter your rMATS output path (e.g. /User/usr/Tool/SpliceDecoder-main/toy_data)
> You just need to specify the rMATS output path

? Enter your target gene list (e.g. /User/usr/Tool/SpliceDecoder-main/target_genes.tsv)
> Provide a list of genes of interest, or enter 'all' if you don’t have one
> SpliceDecoder will only consider these genes

? Enter your GTF file that you used in rMATS with its full path (e.g. /User/usr/Tool/SpliceDecoder-main/toy_data/toy.gtf or /User/usr/Tool/SpliceDecoder-main/toy_data/*.gtf)
> Specify the full path to your GTF file

? Do you want to calculate the effect score? [yes/no]
> Type yes or no. If you type "yes", SpliceDecoder will ask for a TPM matrix or a BAM list path to calculate the effect score

? Enter your TPM matrix with full path (e.g. /User/usr/Tool/SpliceDecoder-main/toy_data/tpm.tsv or N)
> Specify the full path to your TPM matrix, or enter 'N' if you don’t have one

? Enter your bamlist which should contains bamfile with their full path in each line (e.g. /User/usr/Tool/SpliceDecoder-main/toy_data/bam_list.txt or N)
> If you don’t have a TPM matrix, specify the full path to your BAM list file, or enter 'N'

? Enter a species of your data (e.g. human or mouse)
> You just need to specify the species of your data

? Enter a type of GTF (e.g., SR (GENCODE GTF) or LR (Custom GTF) )
> You just need to specify the type of your GTF

? Specify a NMD definition method (e.g., default (55rule) or advanced) )
> Select either 'default' or 'advanced'

? Enter a FDR cut off for your rMATS (float [0-1], default 0.05)
> Specify the rMATS FDR cutoff

? Enter a |dPSI| cut off for your rMATS (float [0-1], default 0.1)
> Specify the rMATS |dPSI| cutoff

? Enter a number of cpu in spliceDecoder job (int [0-?])
> Specify the number of CPUs to use in your job
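The FDR and |dPSI| answers act as significance filters on the rMATS results. As a rough illustration of what such a filter does, the Python sketch below keeps rows of a tab-separated rMATS table (for example SE.MATS.JC.txt) using rMATS's `FDR` and `IncLevelDifference` columns. It assumes inclusive cutoffs and that non-numeric values (such as NA) are dropped; SpliceDecoder's own filtering code may handle boundaries and file choice differently.

```python
import csv
from typing import Dict, Iterable, List


def filter_rmats(lines: Iterable[str], fdr_cut: float = 0.05, dpsi_cut: float = 0.1) -> List[Dict[str, str]]:
    """Keep rows of a tab-separated rMATS table with FDR <= fdr_cut
    and |IncLevelDifference| >= dpsi_cut (defaults match the prompts above)."""
    kept = []
    for row in csv.DictReader(lines, delimiter="\t"):
        try:
            fdr = float(row["FDR"])
            dpsi = float(row["IncLevelDifference"])
        except ValueError:
            continue  # skip rows with non-numeric values such as "NA"
        # Assumption: inclusive cutoffs; SpliceDecoder's boundary handling may differ.
        if fdr <= fdr_cut and abs(dpsi) >= dpsi_cut:
            kept.append(row)
    return kept
```

Usage: `with open("SE.MATS.JC.txt") as fh: significant = filter_rmats(fh)`. The absolute value matters because a negative dPSI is just as meaningful as a positive one; it only indicates the direction of the change.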

You can reuse a pre-existing config file by copying it:
```
cp ${existing_config} project2.config
```
Then, update the following fields in the new config: input, Your_GTF, and Your_rMATS.
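If you create several project configs, you could script this edit. The sketch below assumes that the config stores each field as a shell-style `KEY=VALUE` line; that format is an assumption, so check example.config before relying on it. `update_config` is a hypothetical helper that rewrites only the named fields and fails loudly if one of them is missing.

```python
import re
from typing import Dict

# Assumption: config fields are stored as KEY=VALUE lines (check example.config).
FIELD = re.compile(r"^(\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*=\s*)(.*)$")


def update_config(text: str, updates: Dict[str, str]) -> str:
    """Replace the values of the given KEY=VALUE fields, keeping all other lines unchanged."""
    seen = set()
    out = []
    for line in text.splitlines():
        m = FIELD.match(line)
        if m and m.group(2) in updates:
            key = m.group(2)
            line = f"{m.group(1)}{key}{m.group(3)}{updates[key]}"
            seen.add(key)
        out.append(line)
    missing = set(updates) - seen
    if missing:
        raise KeyError(f"fields not found in config: {sorted(missing)}")
    return "\n".join(out) + "\n"
```

Usage: read project2.config, call `update_config(text, {"input": ..., "Your_GTF": ..., "Your_rMATS": ...})`, and write the result back. Raising on missing keys avoids silently keeping the old project's paths.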

Outputs

```
├── table/
│   ├── *_w_Pfam.txt: Assigned domain information of simulated transcripts (Sim-TXs)
│   └── *_wo_Pfam.txt: Assigned domain information of reference transcripts (Ref-TXs)
├── result/
│   ├── *Main_table.tsv: description
│   ├── *Domain_alt.tsv: description
│   ├── *NMD.tsv: description
│   └── Effect_score.tsv: description
├── figure/
│   ├── mapping_rate.pdf: Mapping rates for each splicing type
│   ├── mat_tx_numbers.pdf: Distribution of Ref-TXs for each splicing type
│   ├── splicing_categories_stacked_plot.pdf: description
│   ├── merged_stacked_plot.pdf: description
│   ├── Summary.html: HTML file that builds the summary pages (pdf_1_page_1.png, pdf_2_page_2.png, pdf_3_page_3.png, and pdf_4_page_4.png)
│   └── consequence: Output directory of the visualization script
├── AF2/: Contains AlphaFold2 input (amino acid FASTA)
├── temp/: Contains all intermediate files
├── post_input/: Contains files used in downstream analysis, e.g., visualization and 3D structure generation
├── mapping.stats: Mapping rates for each splicing type
└── SD.log: The log file
```
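To check which of these files a run actually produced, a small Python sketch can glob the working directory with the patterns from the tree above. The dictionary keys are hypothetical labels; an empty list means no matching file was found (for example, Effect_score.tsv is absent when the effect score was not requested).

```python
from pathlib import Path
from typing import Dict, List

# Glob patterns copied from the output tree above, relative to the working directory.
PATTERNS = {
    "main_table": "result/*Main_table.tsv",
    "domain_alt": "result/*Domain_alt.tsv",
    "nmd": "result/*NMD.tsv",
    "effect_score": "result/Effect_score.tsv",
    "pfam_sim": "table/*_w_Pfam.txt",
    "pfam_ref": "table/*_wo_Pfam.txt",
    "log": "SD.log",
}


def collect_outputs(workdir: str) -> Dict[str, List[Path]]:
    """Map each label to the sorted list of files matching its pattern."""
    root = Path(workdir)
    return {name: sorted(root.glob(pattern)) for name, pattern in PATTERNS.items()}


def missing_outputs(workdir: str) -> List[str]:
    """Labels for which no file was found."""
    return [name for name, files in collect_outputs(workdir).items() if not files]
```

Note that `*_w_Pfam.txt` does not match `*_wo_Pfam.txt`, so the Sim-TX and Ref-TX domain tables stay separate.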

Example of summary HTML

Details of Outputs

Example of the Effect_score.tsv

Key Metrics

LongID: DS event ID
Gene symbol: Gene symbol
Reference_transcript: Matched reference transcript (Ref-TX)
Simulated_event: Simulated event (ES = Exon skipping, EI = Exon inclusion, SI = Skipped intron, RI = Retained intron, Can A3/5SS = canonical 3/5' splice site, Alt A3/5SS = alternative 3/5' splice site)
Effect_Score: A score to prioritize your DS events [0,2]
Domain_change_rate: Average rate of domain changes in Sim-TX compared to Ref-TX [0,1]
Probability_of_NMD: NMD (-1), PTC removal (1), No NMD related event (0)
Functional_class: It contains the following functional classes: GoD (Gain of Domain), LoD (Loss of Domain), NMD, CDS_alts, and UTR_alts

Supplementary Metrics

Delta_PSI: PSI difference (group2 - group1) [-1,1]
Transcript_usage: Proportion of expression of reference transcript for each gene [0,1]
ORF: ORF used (this file only contains pORF1, which has the highest coding potential)
AUG (Ref-Sim): Start codon positions on the Ref-TX and Sim-TX (Ref-Sim)
Stop_codon (Ref-Sim): Stop codon positions on the Ref-TX and Sim-TX (Ref-Sim)
Nucleotide_difference: Coding sequence length difference (Ref-TX minus Sim-TX)
5'UTR_difference: 5' UTR length difference (Ref-TX minus Sim-TX)
3'UTR_difference: 3' UTR length difference (Ref-TX minus Sim-TX)
Domain_integrity: (Sim_domain_length / Ref_domain_length) * 100 [0,inf]
Length_of_simulated_tx_domain: Total domain length of the Sim-TX
Length_of_referece_tx_domain: Total domain length of the Ref-TX
rMATS_FDR(-log10): FDR reported by rMATS, on a -log10 scale
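As a sketch of how these columns can be used, the Python snippet below computes Domain_integrity from its definition and ranks rows of Effect_score.tsv by Effect_Score. It assumes the file is tab-separated with the column names listed above, and that Functional_class may hold several labels, so it matches by substring. Returning None for a Ref-TX without domains is an assumption; the tool may report this case differently.

```python
import csv
from typing import Dict, Iterable, List, Optional


def domain_integrity(sim_len: float, ref_len: float) -> Optional[float]:
    """(Sim_domain_length / Ref_domain_length) * 100, as defined above.
    Returns None when the Ref-TX has no annotated domain (assumed handling)."""
    if ref_len == 0:
        return None
    return sim_len / ref_len * 100


def top_events(lines: Iterable[str], n: int = 10,
               functional_class: Optional[str] = None) -> List[Dict[str, str]]:
    """Return the n rows with the highest Effect_Score, optionally keeping only
    rows whose Functional_class field contains the given label (e.g. "NMD")."""
    rows = list(csv.DictReader(lines, delimiter="\t"))
    if functional_class is not None:
        rows = [r for r in rows if functional_class in r["Functional_class"]]
    rows.sort(key=lambda r: float(r["Effect_Score"]), reverse=True)
    return rows[:n]
```

Usage: `with open("result/Effect_score.tsv") as fh: candidates = top_events(fh, n=20, functional_class="LoD")`. A Domain_integrity above 100 means the Sim-TX carries more domain sequence than the Ref-TX.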

Example of the Domain_alt.tsv

Key Metrics

DS-TX pair ID: Contains, in order, the Long_ID, Ref-TX ID, and simulated event type
ORF priority: Priority of the reading frame used in the simulation
Domain information: Name of the domain altered by the simulated alternative splicing event
Functional_change_ratio (∆L): Difference in the functional change ratio caused by the simulated alternative splicing event
Change direction: Indicates whether the altered domain is a gain (1) or a loss (-1) of domain
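To summarize Domain_alt.tsv by domain, you could count gains and losses using the Change direction values described above. This Python sketch assumes the file is tab-separated and that the headers are literally "Domain information" and "Change direction", as listed here; adjust the names if the actual headers differ.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable


def summarize_domain_changes(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Count gains (Change direction == 1) and losses (== -1) per altered domain."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {"gain": 0, "loss": 0})
    for row in csv.DictReader(lines, delimiter="\t"):
        direction = int(float(row["Change direction"]))
        if direction == 1:
            counts[row["Domain information"]]["gain"] += 1
        elif direction == -1:
            counts[row["Domain information"]]["loss"] += 1
    return dict(counts)
```

Domains that are mostly lost across many DS-TX pairs can point to a shared functional consequence of the splicing program.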

Example of the NMD.tsv

Key Metrics

LongID: Contains, in order, the Long_ID, Ref-TX ID, and simulated event type
AUG: Relative position of the AUG on the given transcript (Ref or Sim)
pORF: Priority of the reading frame used in the simulation
distance(last_exon_junction-stop): Distance between the last exon-exon junction and the stop codon (calculated as last_exon_junction - stop)
total_domain_length: Total domain length of the given transcript (Ref or Sim)
key(Ref/Sim): Type of transcript (Ref or Sim)
NMD_possibility: Indicates the possibility of NMD. In default mode, values are HIGH (55nt) or No. In advanced mode, values can be HIGH (55nt), INTERMEDIATE (Long-exon), INTERMEDIATE (Start-proximal), LOW (less 55nt), or No. Only events tagged as HIGH are considered NMD-associated events
contain_PTC: Indicates whether the given transcript contains PTC (Y) or not (N)
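The default (55rule) mode can be illustrated with the textbook 55-nt rule: a stop codon lying more than about 55 nt upstream of the last exon-exon junction is expected to trigger NMD. The Python sketch below applies that rule to the distance column defined above. The exon-length input, the coordinate convention for `stop_pos` (same transcript coordinates as the junction), and the strict `>` comparison are assumptions, and the advanced-mode categories are not modelled.

```python
from typing import List


def last_junction_position(exon_lengths: List[int]) -> int:
    """1-based transcript coordinate of the last nucleotide before the last exon-exon junction."""
    if len(exon_lengths) < 2:
        raise ValueError("a single-exon transcript has no exon-exon junction")
    return sum(exon_lengths[:-1])


def nmd_default(exon_lengths: List[int], stop_pos: int, threshold: int = 55) -> str:
    """Label a transcript with the 55-nt rule, using distance = last_exon_junction - stop."""
    distance = last_junction_position(exon_lengths) - stop_pos
    return "HIGH (55nt)" if distance > threshold else "No"
```

For example, with exons of 100, 200, and 300 nt the last junction is at position 300, so a stop codon at position 200 (distance 100) is labelled HIGH (55nt), while one at 260 (distance 40) or inside the last exon (negative distance) is labelled No.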

Visualize your alternative splicing simulation

Based on your Main_table file, you can pick a specific DS event and visualize it with the following commands:

```
conda activate spliceDecoder
python code/02-3_v3_Draw_consequence.py \
       --input ${working_directory} \
       --splicing_event RI \
       --gene MYLK2 \
       --sim_splicing_event RI \
       --transcript ENSMUST00000028970.7
python code/02-3_v3_Draw_consequence.py -h  # Shows more details
```

To remove some information from the figure, use the -ri option (separate multiple categories with spaces):

```
python code/02-3_v3_Draw_consequence.py \
       --input ${working_directory} \
       --splicing_event A3SS \
       --gene MYLK2 \
       --sim_splicing_event Can_A3SS \
       --transcript ENSMUST00000195957.4 \
       -ri proteome chain
```

All figures will be saved at ${input}/figure/consequence/
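To plot several events in one go, a hypothetical Python wrapper can build the same command for each event, using only the options shown above. It assumes you run it from the SpliceDecoder directory inside the activated spliceDecoder environment, so that `python` and the relative script path resolve; `draw_command` and `draw_all` are illustrative names.

```python
import subprocess
from typing import Iterable, List, Optional, Sequence, Tuple

SCRIPT = "code/02-3_v3_Draw_consequence.py"


def draw_command(workdir: str, splicing_event: str, gene: str, sim_event: str,
                 transcript: str, remove: Optional[Sequence[str]] = None) -> List[str]:
    """Build the argument list for one plot, using only the options shown above."""
    cmd = ["python", SCRIPT,
           "--input", workdir,
           "--splicing_event", splicing_event,
           "--gene", gene,
           "--sim_splicing_event", sim_event,
           "--transcript", transcript]
    if remove:
        cmd += ["-ri", *remove]  # each category becomes a separate argument
    return cmd


def draw_all(workdir: str, events: Iterable[Tuple[str, str, str, str]],
             remove: Optional[Sequence[str]] = None) -> None:
    """Plot each (splicing_event, gene, sim_event, transcript) tuple in turn."""
    for splicing_event, gene, sim_event, transcript in events:
        subprocess.run(draw_command(workdir, splicing_event, gene, sim_event, transcript, remove),
                       check=True)
```

Passing arguments as a list (rather than one shell string) avoids quoting problems with paths that contain spaces.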

Create a 3D protein structure based on simulated transcripts

You can use Make_aa_fa.py to extract amino acid sequences for your targets of interest.
This step requires Effect_score.tsv, so the toy data cannot be used with it.

You can find ${input} and ${Main} in your .config file.

```
conda activate spliceDecoder
python code/Make_aa_fa.py \
       -i ${input} \
       -r human \
       -t ENST00000438015.6 \
       -e ES \
       -d ${Main}
```

You can copy and paste the amino acid sequences into the AlphaFold Server (https://alphafoldserver.com) as input.
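Since the AF2/ directory holds amino acid FASTA, a small parser makes it easier to print each sequence on one line before pasting. This is a generic FASTA reader in Python, not part of SpliceDecoder; the file names inside AF2/ are not documented here, so point it at whichever FASTA file Make_aa_fa.py produced.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without '>') to its sequence, joining wrapped lines."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

Usage: open the FASTA file, then loop over `read_fasta(fh).items()` and print each header followed by its single-line sequence, which can be pasted directly into the server.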

Please cite this article if you use SpliceDecoder in your research

https://doi.org/10.1101/2025.10.01.679902
