Easy-contrast-DESeq2 is a module to analyse count data from bulk RNA-Sequencing experiments. It performs quality control, normalisation and differential analysis using the expression count files as input. This module uses the DESeq2 bioconductor R package. You will find in the Beginner's guide to using the DESeq2 package basic informations about DESeq2 and how produce the expression files it requires.
This module is part of the Eoulsan pipeline analysis.
The module is made of 4 RMarkdown files:
- 01_normDiffana.Rmd
- 02_child_collapse_replicates.Rmd (02 in the diagram)
- 03_child_differential_expression.Rmd (03 in the diagram)
- 04_child_pairwise_comparison.Rmd (04 in the diagram)
Notes, for the diagram:
(1) Load more packages in the Settings > Environment section. Use the package::function nomenclature as much as possible for function traceability.
(2) Write more custom functions in the Settings > Custom functions section.
(3) Add more parameters in the RMarkdown file header, and parse them in the Settings > Parameters section.
(4) Add more columns to the design file, but explain them in the Data preparation > Load metadata section.
The three child files are conditionally executed within the main document based on parameter values:
- The file
02_child_collapse_replicates.Rmdis executed when there are at least two rows (samples) in the design file with the same value in theRepTechGroupcolumn. It is executed only once. - The file
03_child_differential_expression.Rmdis executed when the input parameterdiffanaTestis set to TRUE. It is executed only once. - The file
04_child_pairwise_comparison.Rmdis executed when:- Simple mode: There are at least two distinct values in the
Referencecolumn of the design file. - Complex mode: There are comparisons in the comparison file, that match the ones automatically identified based on the provided formula (
deseqModel) and that lead to unique constrast vectors.
This file is executed as many times as there are pairwise comparisons.
- Simple mode: There are at least two distinct values in the
Either render the main 01_normDiffana.Rmd document within RStudio, or use the command line.
docker run \
-ti --rm \
-v /:/ \
-w $(readlink -f .) \
-u $(id -u):$(id -g) \
genomicpariscentre/easycontrasts:2.0 \
Rscript -e "rmarkdown::render(
input = '01_normDiffana.Rmd',
output_file = 'myProject.html',
params = list(projectName = 'myProject',
moreParameter = parameterValue))"
They are represented in orange (except complexMode which is internally determined) on the pipeline overview.
| Parameter | Default value | Definition |
|---|---|---|
| projectName | "Experiment1" | name of the project, display in the titles (document, figures) |
| designPath | "deseq2_Experiment1-deseq2Design.txt" | path to the design file |
| comparisonPath | "deseq2_Experiment1-comparisonFile.txt" | path to the comparison file* |
| diffanaTest | TRUE | Whether to perform the differential expression analysis or not |
| expHeader | TRUE | Whether the design table has a header or not |
| deseqModel | "~Condition" | DESeq2 model |
| sizeFactorType | "ratio" | Size factor type |
| fitType | "parametric" | Fit type |
| statisticTest | "Wald" | Statistic test |
| weightContrast | FALSE | Whether to weight the contrast vector by the number of samples or not* |
| prefix | "deseq2_" | Prefix to save the output TSV files |
| plotInteractive | FALSE | Whether to make interactive volcano plots or not |
| logoUrl | "logo-GenomiqueENS-90pxh.png" | Link to the logo displayed in the top left corner of the output document |
| authorName | "Eoulsan" | Name of the author, for the bottom left corner of the document |
| authorMail | "eoulsan@biologie.ens.fr" | Mail of the author, for the bottom left corner of the document |
| leaveOnError | TRUE | Whether to stop the rendering in case of error |
*: The comparisonPath and weightContrast are considered only when complexMode is TRUE.
CAUTION: All the input files should be tabulated files.
The design file should include at least the following columns: Name, Condition, RepTechGroup, Reference and expressionFiles.
- Name: the names of your samples
- Condition: the biological replicates. All biological replicates should have the same Condition value
- RepTechGroup: the technical replicates. All technical replicates should have the same RepTechGroup value to be pooled together
- Reference: conditions to use as a reference in the differential expression analysis should have a Reference value set to 0. Conditions to ignore should have a negative value. Other conditions to be compared to the reference must have a positive value.
Notes:
- More columns can be used for the contrast mode (see the model design with column type and day).
- None of the values must starts by a digit or a symbol.
The Reference column should be filled:
- Reference < 0: the samples are considered for the normalisation step only but not by the differential expression analysis
- Reference = 0: the condition associated with these samples is considered as a reference in the differential expression analysis
- Reference > 0: the condition associated with these samples is compared to each condition having a lower reference value
Example:
| Name | Condition | RepTechGroup | Reference | expressionFiles | Condition |
|---|---|---|---|---|---|
| sample1 | WT-day1a | WT-day1 | 0 | expression_WT-day1a.tsv | WT |
| sample2 | WT-day1b | WT-day1 | 0 | expression_WT-day1b.tsv | WT |
| sample3 | KO-day1a | KO-day1 | 1 | expression_KO-day1a.tsv | KO |
| sample4 | KO-day1b | KO-day1 | 1 | expression_KO-day1b.tsv | KO |
| sample5 | WT-day2a | WT-day2 | 0 | expression_WT-day2a.tsv | WT |
| sample6 | WT-day2b | WT-day2 | 0 | expression_WT-day2b.tsv | WT |
| sample7 | KO-day2a | KO-day2 | 2 | expression_KO-day2a.tsv | KO |
| sample8 | KO-day2b | KO-day2 | 2 | expression_KO-day2b.tsv | KO |
The deseqModel parameter could be "~ Condition".
The Reference column should NOT be filled.
Example:
| Name | Condition | RepTechGroup | expressionFiles | type | day |
|---|---|---|---|---|---|
| sample1 | WT-day1a | WT-day1 | expression_WT-day1a.tsv | WT | d1 |
| sample2 | WT-day1b | WT-day1 | expression_WT-day1b.tsv | WT | d1 |
| sample3 | KO-day1a | KO-day1 | expression_KO-day1a.tsv | KO | d1 |
| sample4 | KO-day1b | KO-day1 | expression_KO-day1b.tsv | KO | d1 |
| sample5 | WT-day2a | WT-day2 | expression_WT-day2a.tsv | WT | d2 |
| sample6 | WT-day2b | WT-day2 | expression_WT-day2b.tsv | WT | d2 |
| sample7 | KO-day2a | KO-day2 | expression_KO-day2a.tsv | KO | d2 |
| sample8 | KO-day2b | KO-day2 | expression_KO-day2b.tsv | KO | d2 |
The deseqModel parameter could be "~type+day+type:day".
The comparison file (beind the comparisonPath parameter) could be:
comparison1 dayd1_vs_dayd2
comparison2 typeKO_vs_typeWT
comparison3 typeWT%dayd1_vs_typeWT%dayd2
comparison4 typeKO%dayd1_vs_typeWT%dayd1
comparison5 typeKO%dayd2_vs_typeWT%dayd2
Notes:
- The comparison file doesn't contain a header.
- The comparison file contains two columns: name of the comparison and comparison formula.
- The comparison formula is case sensitive. Respect the column names and values from the design file.
Count files should include a first column with features description (identifiers or symbols) and a second column with counts. These count files can include a header or not (expHeader parameter). For each sample, the file path is retrieved through the expressionFile column of the design file (designPath parameter).
Example:
Id Count
ENST00000000233 569
ENST00000000412 119
ENST00000000442 25
ENST00000001008 42
ENST00000001146 0
ENST00000002125 19
...
Rendering the main 01_normDiffana.Rmd document output three types of files:
- the rendered HTML notebook ;
- various figures, saved in the
${prefix}_figuresfolder, and highlighted in blue on the pipeline overview ; - various TSV files, along with the HTML report, and highlighted in purple on the pipeline overview.
This module has been developed using R version 4.5.1 and the following main packages:
| Package | Version |
|---|---|
| Biobase | 2.68.0 |
| BiocGenerics | 0.54.0 |
| ComplexHeatmap | 2.24.0 |
| circlize | 0.4.16 |
| DESeq2 | 1.48.0 |
| dplyr | 1.1.4 |
| FactoMineR | 2.12 |
| generics | 0.1.3 |
| ggdendro | 0.2.0 |
| ggplot2 | 4.0.0 |
| ggrepel | 0.9.6 |
| GenomeInfoDb | 1.44.0 |
| GenomicRanges | 1.60.0 |
| gridtext | 0.1.5 |
| IRanges | 2.42.0 |
| kableExtra | 1.4.0 |
| knitr | 1.50 |
| Matrix | 1.7-3 |
| MatrixGenerics | 1.20.0 |
| matrixStats | 1.5.0 |
| plotly | 4.11.0 |
| RColorBrewer | 1.1-3 |
| rmdformats | 1.0.4 |
| rmarkdown | 2.29 |
| reshape2 | 1.4.4 |
| S4Vectors | 0.46.0 |
| stringr | 1.5.2 |
| SummarizedExperiment | 1.38.0 |
A Docker container containing all the packages of interest along with their dependencies is available at:
(https://hub.docker.com/r/genomicpariscentre/easycontrasts)[https://hub.docker.com/r/genomicpariscentre/easycontrasts] (Version 2.0)
