A comprehensive toolkit to unravel RNA isoform structures obtained from minigene splicing assays and massively parallel sequencing, combining an intuitive GUI with a scalable Snakemake pipeline.
This tool is composed of two main components:
- Graphical User Interface (GUI): A user-friendly application that generates the reference files required by the pipeline, i.e. the artificial genome representing the (multi-)exonic minigene splicing construction, in FASTA and GTF formats.
- Snakemake Pipeline: A robust and scalable workflow designed for high-throughput environments.
These components can be used independently or together, depending on the user's needs.
- Easy-to-use GUI for configuration file creation
- Modular and reproducible Snakemake pipeline
- Cross-platform compatibility
- Support for containerized execution using Docker
The GUI application for Windows is available as a standalone executable.
📦 Download the latest release from the Releases section.
To install:
- Download the .exe file from the Releases section.
- Run the executable directly (no installation required).
This pipeline is adapted from the SOSTAR pipeline.
First, sequencing files are aligned with the minimap2 RNA-seq aligner in either short-read or long-read mode, depending on the option used.
Then, isoforms are assembled with StringTie in guided mode, using the construction.gtf created by the GUI. The assemblies are merged with StringTie's merge mode, and expression is computed with StringTie's expression estimation mode. Finally, isoforms are annotated with the SOSTAR nomenclature relative to the construction used.
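The three StringTie steps above can be sketched in POSIX shell. This is an illustration, not the pipeline's Snakemake rules: the run and stringtie_steps helpers and the DRY_RUN variable are hypothetical, the minimap2 alignment step is omitted, file names follow the output directory tree described below, and the actual workflow may pass other options (its merge step also involves constructions_all.gtf, which this sketch ignores).

```shell
# Hypothetical helper: print the command if DRY_RUN is set, otherwise run it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# stringtie_steps GUIDE_GTF CONSTRUCTION SAMPLE...
# Assumes alignments already exist as alignment/<sample>.minimap2.bam.
stringtie_steps() {
  local guide=$1 construction=$2 s
  shift 2
  local asm_dir="assembly/$construction"
  run mkdir -p "$asm_dir" expression || return

  # 1. Guided assembly: one GTF per sample, guided by the construction annotation.
  for s in "$@"; do
    run stringtie "alignment/$s.minimap2.bam" -G "$guide" \
      -o "$asm_dir/${s}_assembly.gtf" || return
  done

  # 2. Merge mode: combine the per-sample assemblies into one transcript set.
  #    The subshell rewrites the positional parameters from sample names to
  #    assembly paths, keeping the original sample list intact for step 3.
  (
    for s in "$@"; do
      set -- "$@" "$asm_dir/${s}_assembly.gtf"
      shift
    done
    run stringtie --merge -G "$guide" -o "$asm_dir/${construction}_merged.gtf" "$@"
  ) || return

  # 3. Expression mode (-e): estimate abundances of the merged transcripts only.
  for s in "$@"; do
    run stringtie -e -G "$asm_dir/${construction}_merged.gtf" \
      -o "expression/${s}_expression.gtf" "alignment/$s.minimap2.bam" || return
  done
}
```

For example, `DRY_RUN=1 stringtie_steps construction.gtf pCAS2_RAD51D_Ex2-9 MIN01 MIN02` prints the commands for two samples without running them.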
The Snakemake pipeline requires a Unix-like environment (Linux, macOS).
Requirements:
- Python >= 3.8
- Snakemake >= 7.32.4
- Apptainer/Singularity >= 3.5.3
The workflow automatically uses a Docker image that contains the other required tools.
To set up:
git clone https://github.com/LBGC-CFB/MAGIC.git
cd ./MAGIC
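Before running the pipeline, you may want to confirm the minimum versions listed under Requirements. The sketch below is a small shell helper (version_ge and require are hypothetical names) that relies on `sort -V` for version ordering, which GNU and BSD sort provide.

```shell
# version_ge A B: succeed if version string A is greater than or equal to B.
# Relies on `sort -V` (version sort), available in GNU and BSD sort.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# require NAME MIN_VERSION FOUND_VERSION: report whether FOUND_VERSION is recent enough.
require() {
  if version_ge "$3" "$2"; then
    echo "OK: $1 $3 (>= $2)"
  else
    echo "TOO OLD: $1 $3 (need >= $2)" >&2
    return 1
  fi
}

# Example calls (uncomment on the machine that will run the pipeline):
# require python 3.8 "$(python3 -c 'import platform; print(platform.python_version())')"
# require snakemake 7.32.4 "$(snakemake --version)"
```

Apptainer restarted its version numbering at 1.0 when it succeeded Singularity 3.x, so `apptainer --version` output cannot be compared directly with the 3.5.3 threshold.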
To configure this workflow, modify the ./config/config.yaml file according to your needs:
- constructiondir: path of the directory containing the construction subdirectories, each with its corresponding construction.fasta and construction.gtf.
- indir: path of the directory containing either the *.fastq files for long-read sequencing or the *_L001_R1_001.fastq.gz and *_L001_R2_001.fastq.gz files for short-read sequencing.
- outdir: path of the output directory.
- samples: names of the different samples, each associated with the name of the construction used.
- threads: number of threads to use.
- option: short-read or long-read mode, indicated by true or false.
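The keys listed above can be checked before a run. Below is a minimal sketch with hypothetical check_config and check_read_pairs helpers. It assumes each key appears at the start of a line in config.yaml as `key:`, and that short-read inputs follow the *_L001_R1_001.fastq.gz / *_L001_R2_001.fastq.gz naming given for indir. It does not validate the values themselves.

```shell
# Top-level keys documented for ./config/config.yaml.
REQUIRED_KEYS="constructiondir indir outdir samples threads option"

# check_config FILE: print each documented key missing from FILE; fail if any is missing.
# Assumes keys are written at the start of a line as "key:" (plain top-level YAML).
check_config() {
  local file=$1 key status=0
  for key in $REQUIRED_KEYS; do
    if ! grep -q "^${key}:" "$file"; then
      echo "missing key: $key" >&2
      status=1
    fi
  done
  return "$status"
}

# check_read_pairs DIR: for short-read input, fail if an R1 file has no R2 mate.
check_read_pairs() {
  local r1 r2 status=0
  for r1 in "$1"/*_L001_R1_001.fastq.gz; do
    [ -e "$r1" ] || continue  # no R1 files: the glob is left unexpanded
    r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
    if [ ! -e "$r2" ]; then
      echo "no R2 mate for: $r1" >&2
      status=1
    fi
  done
  return "$status"
}

# Example calls from the repository root (the indir path is hypothetical):
# check_config ./config/config.yaml
# check_read_pairs /path/to/fastq
```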
- Launch the GUI application.
- Define the necessary parameters.
- Click on Generate files. The files are generated once all the necessary parameters have been defined.
- Transfer the generated files to the machine where the pipeline will be executed.
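Once the files are on the target machine, checking that each construction subdirectory of constructiondir holds a FASTA and a GTF can catch an incomplete transfer. A minimal sketch, assuming one subdirectory per construction as described for constructiondir; the helper names and the rsync destination in the comment are hypothetical, and any *.fasta / *.gtf file name is accepted.

```shell
# Example transfer (hypothetical host and path):
# rsync -av constructions/ user@cluster:/path/to/constructiondir/

# has_match EXPANDED_GLOB...: succeed if the glob matched an existing file
# (an unmatched glob stays literal, so the first word does not exist).
has_match() {
  [ -e "$1" ]
}

# check_constructions DIR: every subdirectory of DIR (one per construction)
# must contain at least one .fasta and one .gtf file.
check_constructions() {
  local sub status=0
  for sub in "$1"/*/; do
    [ -d "$sub" ] || continue  # DIR has no subdirectories
    has_match "$sub"*.fasta || { echo "no .fasta in $sub" >&2; status=1; }
    has_match "$sub"*.gtf || { echo "no .gtf in $sub" >&2; status=1; }
  done
  return "$status"
}
```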
Once the workflow has been configured, it can be executed as follows:
cd ./workflow
snakemake --use-singularity --cores
You can provide your own configuration file with the --configfile option when launching the pipeline, as long as it follows the same structure and format.
If your data directories are not mounted in the container by default, bind them by setting APPTAINER_BINDPATH or SINGULARITY_BINDPATH as follows:
export APPTAINER_BINDPATH="/path/to/data/dir"
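When several directories live outside the default mounts, the variable accepts a comma-separated list of paths. A small sketch (join_bindpath is a hypothetical helper) that joins the directories used in config.yaml while skipping duplicates:

```shell
# join_bindpath DIR...: build a comma-separated bind list, skipping duplicates.
join_bindpath() {
  local out="" d
  for d in "$@"; do
    case ",$out," in
      *",$d,"*) ;;                 # already in the list
      *) out="${out:+$out,}$d" ;;  # append, adding a comma only after the first entry
    esac
  done
  printf '%s\n' "$out"
}

# Hypothetical directories standing for constructiondir, indir and outdir:
# export APPTAINER_BINDPATH="$(join_bindpath /data/constructions /data/fastq /data/results)"
```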
Output directory tree:
MAGIC/outdir
├── alignment
│ ├── {sample}.minimap2.bam
│ ├── {sample}.minimap2.bam.bai
│ └── ...
├── assembly
│ └── {construction}
│ ├── {sample}_assembly.gtf
│ ├── ...
│ └── {construction}_merged.gtf
│ └── ...
├── expression
│ ├── {sample}_expression.gtf
│ └── ...
├── constructions_all.gtf
└── MAGIC_SOSTAR_annotation_table_results.xlsx
- alignment folder: contains all aligned and sorted {sample} <.bam> files with their corresponding <.bai> index.
- assembly folder: contains one subfolder per construction, holding the <.gtf> assembly of each {sample} produced by StringTie and the corresponding merged <.gtf> file.
- expression folder: contains all assembled <.gtf> files with the expression metrics computed by StringTie.
- constructions_all.gtf: a global GTF file combining all the construction files used for the merge step.
- MAGIC_SOSTAR_annotation_table_results.xlsx: final output file containing the descriptive annotation and expression metrics of each transcript in the cohort (see the SOSTAR annotation section for more information).
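After a run, the tree above can be used to check that a given sample produced every expected file. A minimal sketch; expected_outputs and check_outputs are hypothetical helpers whose paths simply restate the naming pattern of the tree.

```shell
# expected_outputs OUTDIR CONSTRUCTION SAMPLE: list the files the tree above
# implies for one sample, plus the cohort-wide files.
expected_outputs() {
  local outdir=$1 construction=$2 sample=$3
  printf '%s\n' \
    "$outdir/alignment/$sample.minimap2.bam" \
    "$outdir/alignment/$sample.minimap2.bam.bai" \
    "$outdir/assembly/$construction/${sample}_assembly.gtf" \
    "$outdir/assembly/$construction/${construction}_merged.gtf" \
    "$outdir/expression/${sample}_expression.gtf" \
    "$outdir/constructions_all.gtf" \
    "$outdir/MAGIC_SOSTAR_annotation_table_results.xlsx"
}

# check_outputs OUTDIR CONSTRUCTION SAMPLE: print missing files; fail if any.
# The explicit subshell carries the status out of the pipeline.
check_outputs() {
  expected_outputs "$@" | (
    status=0
    while IFS= read -r f; do
      if [ ! -e "$f" ]; then
        echo "missing: $f" >&2
        status=1
      fi
    done
    exit "$status"
  )
}
```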
MAGIC uses the SOSTAR tool to provide annotation of all the different isoforms assembled by StringTie.
SOSTAR generates a spreadsheet file in <.xlsx> format:
transcript_id | construction | gene | annot_ref | annot_find | MIN01 | MIN02 | MIN03 | occurence | P_MIN01 | P_MIN02 | P_MIN03 |
---|---|---|---|---|---|---|---|---|---|---|---|
NM_000059_pCAS2_BRCA2_Ex3 | pCAS2_BRCA2_Ex3 | BRCA2 | 3 | 3 | 76093,1 | 1114,3 | 15320,6 | 3 | 98,76 | 0,6 | 10,53 |
M_pCAS2_BRCA2_Ex3.1.1 | pCAS2_BRCA2_Ex3 | BRCA2 | 3 | Δ3 | 954 | 184159,5 | 130188,3 | 3 | 1,24 | 99,4 | 89,47 |
- transcript_id: transcript identifier
- construction: construction name
- gene: gene associated with the transcript
- annot_ref: reference annotation of the construction
- annot_find: SOSTAR annotation of the transcript (see the SOSTAR nomenclature section for more details)
- sample (e.g. MIN01): transcript coverage
- occurence: number of occurrences of the transcript in the cohort
- P_sample (e.g. P_MIN01): proportion (%) of the transcript coverage among all isoform combinations
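The P_ columns can be reproduced from the coverage columns. Judging from the example table, where the P_ values of each sample sum to 100, each value is the transcript's coverage divided by the summed coverage of all isoforms in that sample, times 100. For MIN01, 76093.1 / (76093.1 + 954) gives 98.76. The sketch below (a hypothetical proportion_percent helper) performs that arithmetic. It expects '.' as the decimal separator, whereas the spreadsheet above displays decimal commas.

```shell
# proportion_percent COV [OTHER_COV...]: share of the first coverage value in the
# total of all values, as a percentage with two decimals.
# LC_ALL=C keeps '.' as the decimal separator in awk's output.
proportion_percent() {
  printf '%s\n' "$@" |
    LC_ALL=C awk 'NR == 1 { first = $1 } { total += $1 } END { printf "%.2f\n", 100 * first / total }'
}
```

For example, `proportion_percent 76093.1 954` prints 98.76 (P_MIN01 of the first row) and `proportion_percent 954 76093.1` prints 1.24 (P_MIN01 of the second row).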
Isoforms are described relative to reference transcripts (provided by the user) with an annotation that includes only the alternative splicing events. The following conventions are used to annotate these events:
symbol | definition |
---|---|
∆ | skipping of a reference exon |
▼ | inclusion of a reference intron |
p | shift of an acceptor site |
q | shift of a donor site |
(37) | number of skipped or retained nucleotides |
[p23, q59] | relative positions of new splice sites |
exo | exonization of an intronic sequence |
int | intronization of an exonic sequence |
- | continuous event |
, | discontinuous event |
Nomenclature example:
Black boxes: exon, black lines: intron, red boxes: exon (or part of exon) skipping, green boxes: novel exon (or part of exon).
A minimal test dataset is provided in the test/ directory.
- Select your output directory.
- Use the pCAS2-51D-Ex2_9_assembledseq.fasta file as the input file. It contains the complete sequence of a multi-exonic minigene splicing assay of the RAD51D gene spanning exons 2 to 9.
- Complete the other parameters with those given in the parameters_for_MAGIC_GUI.yaml file.
- Click Validate and Generate Files to create the .fasta and .gtf files.
The config.yaml file in the config directory is already filled in so that the pipeline launches correctly with the test files.
Place the files generated by the GUI in the tests/pCAS2_RAD51D_Ex2-9 folder.
Define your own directory for output results by modifying the outdir: parameter.
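The outdir: change can also be scripted. A minimal sketch (set_outdir is a hypothetical helper) that rewrites the line through a temporary file, because the in-place option of sed differs between GNU and BSD sed. It assumes outdir: is a top-level key written at the start of its line.

```shell
# set_outdir CONFIG NEW_OUTDIR: replace the top-level "outdir:" line of CONFIG.
set_outdir() {
  local config=$1 new=$2 tmp status
  tmp=$(mktemp) || return
  awk -v dir="$new" '/^outdir:/ { print "outdir: " dir; next } { print }' "$config" > "$tmp" &&
    cat "$tmp" > "$config"  # overwrite in place, keeping the file's permissions
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Example from the repository root (hypothetical results path):
# set_outdir config/config.yaml /path/to/my_results
```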
Launch the pipeline using the following command line:
snakemake --use-singularity --cores
Camille AUCOUTURIER @AUCAM
This project is licensed under the GPL-3.0 License; see LICENCE for more details.