The `msigdbr` R package provides Molecular Signatures Database (MSigDB) gene sets typically used with the Gene Set Enrichment Analysis (GSEA) software:
9
11
10
-
* in an R-friendly tidy format (a data frame in a "long" format with one gene per row)
11
-
* for multiple frequently studied model organisms (human, mouse, rat, pig, zebrafish, fly, yeast, etc.)
12
-
* as both gene symbols and NCBI/Entrez Gene IDs (for better compatibility with pathway enrichment tools)
13
-
* that can be used in a script without requiring additional external files
12
+
* in an R-friendly tidy/long format with one gene per row
13
+
* for multiple frequently studied model organisms, such as mouse, rat, pig, zebrafish, fly, and yeast, in addition to the original human genes
14
+
* as both gene symbols and NCBI/Entrez Gene IDs for better compatibility with pathway enrichment tools
15
+
* that can be installed and loaded as a package without requiring additional external files
16
+
17
+
## Installation
18
+
19
+
The package can be installed from [CRAN](https://cran.r-project.org/package=msigdbr).
20
+
21
+
```{r}
22
+
install.packages("msigdbr")
23
+
```
24
+
25
+
## Usage
14
26
15
-
The package is available on [CRAN](https://cran.r-project.org/package=msigdbr).
27
+
The package data can be accessed using the `msigdbr()` function, which returns a data frame of gene sets and their member genes. For example, you can retrieve mouse genes from the C2 (curated) CGP (chemical and genetic perturbations) gene sets.
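A minimal sketch of the retrieval described above. It assumes the `species`, `category`, and `subcategory` arguments of `msigdbr()` as they were named in releases contemporary with this change; other releases may name them differently, so check `?msigdbr` for the installed version. The variable name `cgp_gene_sets` is only illustrative.

```r
# Sketch only: the argument names (species, category, subcategory) are assumed
# from msigdbr releases contemporary with this change; see ?msigdbr.
if (requireNamespace("msigdbr", quietly = TRUE)) {
  # Mouse genes from the C2 (curated) CGP (chemical and genetic perturbations) sets
  cgp_gene_sets <- msigdbr::msigdbr(
    species = "Mus musculus",
    category = "C2",
    subcategory = "CGP"
  )
  # The result is a long data frame: one row per gene per gene set
  head(cgp_gene_sets)
}
```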
vignettes/msigdbr-intro.Rmd
41 additions & 43 deletions
Original file line number
Diff line number
Diff line change
@@ -1,10 +1,10 @@
1
1
---
2
-
title: "Introduction to the msigdbr package"
2
+
title: "Introduction to msigdbr"
3
3
output:
4
4
rmarkdown::html_vignette:
5
5
keep_md: true
6
6
vignette: >
7
-
%\VignetteIndexEntry{Introduction to the msigdbr package}
7
+
%\VignetteIndexEntry{Introduction to msigdbr}
8
8
%\VignetteEngine{knitr::rmarkdown}
9
9
%\VignetteEncoding{UTF-8}
10
10
---
@@ -16,21 +16,23 @@ knitr::opts_chunk$set(
16
16
)
17
17
# increase the screen width
18
18
options(width = 90)
19
-
# reduce the minimum number of characters for the tibble column titles
20
-
options(pillar.min_title_chars = 8)
19
+
# reduce the minimum number of characters for the tibble column titles (default: 15)
20
+
options(pillar.min_title_chars = 10)
21
+
# increase the maximum number of rows printed (default: 20)
22
+
options(tibble.print_max = 25)
21
23
```
22
24
23
25
## Overview
24
26
25
-
Performing pathway analysis is a common task in genomics and there are many available software tools, many of which are R-based.
26
-
Depending on the tool, it may be necessary to import the pathways into R, translate genes to the appropriate species, convert between symbols and IDs, and format the object in the required way.
27
+
Pathway analysis is a common task in genomics research, and many R-based software tools are available for it.
28
+
Depending on the tool, it may be necessary to import the pathways, translate genes to the appropriate species, convert between symbols and IDs, and format the resulting object.
27
29
28
30
The `msigdbr` R package provides Molecular Signatures Database (MSigDB) gene sets typically used with the Gene Set Enrichment Analysis (GSEA) software:
29
31
30
-
* in an R-friendly tidy format (a data frame in a "long" format with one gene per row)
31
-
* for multiple frequently studied model organisms (human, mouse, rat, pig, zebrafish, fly, yeast, etc.)
32
-
* as both gene symbols and NCBI/Entrez Gene IDs (for better compatibility with pathway enrichment tools)
33
-
* that can be used in a script without requiring additional external files
32
+
* in an R-friendly tidy/long format with one gene per row
33
+
* for multiple frequently studied model organisms, such as mouse, rat, pig, zebrafish, fly, and yeast, in addition to the original human genes
34
+
* as both gene symbols and NCBI/Entrez Gene IDs for better compatibility with pathway enrichment tools
35
+
* that can be installed and loaded as a package without requiring additional external files
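To illustrate the long format named in the list above, here is a small base R sketch. The data frame is a toy stand-in with made-up gene set names and members, not real MSigDB content, and its column names (`gs_name`, `gene_symbol`) are assumptions chosen for the example. It also shows one way to turn the long format into a named list, which some enrichment tools expect.

```r
# Toy data frame mimicking the long format: one gene per row.
# Gene set names and members are illustrative, not real MSigDB content.
gene_sets_long <- data.frame(
  gs_name = c("SET_A", "SET_A", "SET_A", "SET_B", "SET_B"),
  gene_symbol = c("Trp53", "Myc", "Egfr", "Myc", "Kras"),
  stringsAsFactors = FALSE
)

# Some enrichment tools expect a named list of gene vectors instead,
# with one list element per gene set.
gene_sets_list <- split(gene_sets_long$gene_symbol, gene_sets_long$gs_name)

# Number of genes in each set
lengths(gene_sets_list)
```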
34
36
35
37
Please be aware that the homologs were computationally predicted for distinct genes.
36
38
The full pathways may not be well conserved across species.
@@ -51,40 +53,24 @@ Load package.
51
53
library(msigdbr)
52
54
```
53
55
54
-
Retrieve the gene sets data frame. In this example, for the hallmark collection.
@@ -138,8 +138,8 @@ You can check the installed version with `packageVersion("msigdbr")`.
138
138
139
139
Yes.
140
140
You can then import the GMT files (with `getGmt()` from the `GSEABase` package, for example).
141
-
The GMTs only include the human genes, even for gene sets generated from mouse data.
142
-
If you are not working with human data, you then have to convert the MSigDB genes to your organism or your genes to human.
141
+
The GMTs only include the human genes, even for gene sets generated from mouse experiments.
142
+
If you are not working with human data, you then have to convert the MSigDB genes to your organism or your genes to human.
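As a rough illustration of the GMT route, the sketch below parses a GMT file with base R only, as a dependency-free alternative to `getGmt()`. It relies on the GMT layout of one gene set per line with tab-separated fields: set name, description, then member genes. The file contents and the column name `human_gene_symbol` are made up for the example.

```r
# Write a tiny GMT file so the sketch is self-contained.
# Set names, descriptions, and genes are made up for illustration.
gmt_path <- tempfile(fileext = ".gmt")
writeLines(
  c(
    "SET_A\tdescription of set A\tTP53\tMYC\tEGFR",
    "SET_B\tdescription of set B\tMYC\tKRAS"
  ),
  gmt_path
)

# Each GMT line is: set name, description, then member genes (tab-separated).
gmt_fields <- strsplit(readLines(gmt_path), "\t", fixed = TRUE)
gmt_sets <- lapply(gmt_fields, function(x) x[-(1:2)])
names(gmt_sets) <- vapply(gmt_fields, function(x) x[1], character(1))

# Convert to a long data frame with one gene per row.
gmt_long <- data.frame(
  gs_name = rep(names(gmt_sets), lengths(gmt_sets)),
  human_gene_symbol = unlist(gmt_sets, use.names = FALSE),
  stringsAsFactors = FALSE
)
gmt_long
```

The result still contains only human gene symbols, so the conversion step described above remains necessary for other organisms.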
143
143
144
144
**Can I convert between human and mouse genes just by adjusting gene capitalization?**
145
145
@@ -156,14 +156,12 @@ You may still end up with dozens of homologs for some genes, so additional clean
156
156
There are a few other resources that provide some of the functionality and served as an inspiration for this package.
157
157
[Ge Lab Gene Set Files](http://ge-lab.org/#/data) has GMT files for many species.
158
158
[WEHI](http://bioinf.wehi.edu.au/software/MSigDB/) provides MSigDB gene sets in R format for human and mouse, but the genes are provided only as Entrez IDs and each collection is a separate file.
159
-
[MSigDF](https://github.com/stephenturner/msigdf) is based on the WEHI resource, so it provides the same data, but converted to a more tidyverse-friendly data frame.
160
-
When `msigdbr` was initially released, all of them were multiple releases behind the latest version of MSigDB, so they are possibly no longer maintained.
159
+
[MSigDF](https://github.com/stephenturner/msigdf) is based on the WEHI resource, but converts the data to a more tidyverse-friendly data frame.
160
+
When `msigdbr` was initially released, these were multiple releases behind the latest version of MSigDB, so they may not be actively maintained.
161
161
162
162
## Details
163
163
164
164
The Molecular Signatures Database (MSigDB) is a collection of gene sets originally created for use with the Gene Set Enrichment Analysis (GSEA) software.
165
165
166
166
Gene homologs are provided by the HUGO Gene Nomenclature Committee at the European Bioinformatics Institute, which integrates the orthology assertions predicted for human genes by eggNOG, Ensembl Compara, HGNC, HomoloGene, Inparanoid, NCBI Gene Orthology, OMA, OrthoDB, OrthoMCL, Panther, PhylomeDB, TreeFam, and ZFIN.
167
167
For each human equivalent within each species, only the ortholog supported by the largest number of databases is used.
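The selection rule above can be sketched in base R on a toy table. The gene names, the support counts, and the column names (`human_gene_symbol`, `ortholog_symbol`, `num_sources`) are all invented for illustration. This is not the package's actual implementation, only an example of keeping the best-supported candidate per human gene.

```r
# Toy ortholog table; genes and support counts are invented for illustration.
orthologs <- data.frame(
  human_gene_symbol = c("GENE1", "GENE1", "GENE2", "GENE2", "GENE2"),
  ortholog_symbol = c("Gene1a", "Gene1b", "Gene2a", "Gene2b", "Gene2c"),
  num_sources = c(10, 3, 2, 8, 5),
  stringsAsFactors = FALSE
)

# For each human gene, keep the candidate supported by the most databases.
by_gene <- split(orthologs, orthologs$human_gene_symbol)
best_orthologs <- do.call(
  rbind,
  lapply(by_gene, function(d) d[which.max(d$num_sources), ])
)
rownames(best_orthologs) <- NULL
best_orthologs
```

In this sketch `which.max()` keeps the first candidate when there is a tie. How ties are resolved in the package itself is not stated here.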