Database Creation
The Molecular Almanac displays data drawn from an underlying SQLite database pre-loaded with assertion data. This page describes the process by which this data is automatically inserted into a new SQLite database compatible with the Browser.
The database structure is defined in Vertabelo, from which the SQL commands to CREATE or DROP each table are generated. See the moalmanac-admin repository for how to obtain credentials to view the database and how to generate the CREATE/DROP scripts. Once downloaded, the db_create.sql and db_drop.sql scripts must be moved to the db_scripts/ directory at the root level of the almanac-browser repository.
The database contents are stored in a collection of feature TSV files, along with a separate feature_definitions.tsv file that defines each feature and its attributes. Each of these files is described below. All existing feature TSV files can be downloaded directly from the Molecular Almanac About Page; however, the feature_definitions.tsv file must be written separately.
The feature_definitions.tsv file has 5 columns: feature, readable_name, attribute, readable_attribute, and type.
| Column | Description |
|---|---|
| feature | The "programmatic" name of the feature; it should not contain spaces or special characters (other than underscores). |
| readable_name | A human-readable name for the feature; it may contain any characters. |
| attribute | The name of the attribute defined by this row, in "programmatic" format as defined above. |
| readable_attribute | The "human-readable" version of the attribute name. |
| type | The kind of data stored by this attribute; currently, the database can handle text, integer, and gene data. |
Note that there must be one row per attribute; when defining multiple attributes for a single feature, the feature and readable_name values must be repeated for each attribute. An example feature_definitions.tsv file is given below.
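As a rough illustration of the one-row-per-attribute layout, the Python sketch below reads a feature_definitions.tsv and groups the attribute rows by feature, keeping them in file order (the order matters for the feature TSV headers). It is not one of the Browser's scripts: the helper name `load_feature_definitions` is hypothetical, and the naming rule (letters, digits, and underscores only) is an assumption based on the description of "programmatic" names above.

```python
import csv
import io
import re

# Assumption: "programmatic" names use only letters, digits, and underscores.
PROGRAMMATIC_NAME = re.compile(r"[A-Za-z0-9_]+")
# Attribute types the database currently handles, per the table above.
ALLOWED_TYPES = {"text", "integer", "gene"}
COLUMNS = ["feature", "readable_name", "attribute", "readable_attribute", "type"]


def load_feature_definitions(handle):
    """Hypothetical helper: map each feature to its ordered (attribute, type) pairs."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    features = {}  # dicts preserve insertion order (Python 3.7+)
    for row in reader:
        for key in ("feature", "attribute"):
            value = row[key] or ""
            if not PROGRAMMATIC_NAME.fullmatch(value):
                raise ValueError(f"{key} {value!r} is not a programmatic name")
        if row["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type {row['type']!r}")
        # One row per attribute: rows sharing a feature are collected in file order.
        features.setdefault(row["feature"], []).append((row["attribute"], row["type"]))
    return features


sample = (
    "feature\treadable_name\tattribute\treadable_attribute\ttype\n"
    "knockdown\tKnockdown\ttechnique\tTechnique\ttext\n"
    "knockdown\tKnockdown\tgene\tGene\tgene\n"
)
print(load_feature_definitions(io.StringIO(sample)))
# {'knockdown': [('technique', 'text'), ('gene', 'gene')]}
```

For a real file, open it with `open("feature_definitions.tsv", newline="")` and pass the handle in place of the `io.StringIO` sample.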
For each feature, a separate TSV file must be created containing all assertions about that feature. The file name must be the programmatic feature name as defined in feature_definitions.tsv, followed by .tsv (e.g., knockdown.tsv). The first column headers must be the names of the attributes defined for that feature in feature_definitions.tsv, in the same order in which they are defined. The remaining columns are disease, oncotree_ontology, oncotree_code, therapy, therapy_type, sensitivity, resistance, favorable_prognosis, predictive_implication, description, source_type, citation, url, doi, pmid, nct, and last_updated.
| Column | Description |
|---|---|
| disease | Name of the disease this assertion references, as specified by the article/user. |
| oncotree_ontology | Standard OncoTree name for this disease. |
| oncotree_code | Standard OncoTree code for this disease (e.g., LUAD for Lung Adenocarcinoma). |
| therapy | Name of the therapy associated with this assertion. |
| therapy_type | Type/class of the therapy associated with this assertion. |
| sensitivity | 1 if this assertion implies sensitivity to the given therapy; 0 otherwise. |
| resistance | 1 if this assertion implies resistance to the given therapy; 0 otherwise. |
| favorable_prognosis | 1 if this assertion implies a less severe disease state (somatic assertions) or a lower likelihood of acquiring cancer (germline assertions); 0 otherwise. |
| predictive_implication | Category of likelihood that the assertion has clinical relevance, as defined on the Molecular Almanac About Page. |
| description | String explaining how and why the source makes the given assertion. |
| source_type | Classification of the source (e.g., Journal or Guideline). |
| citation | Formal citation string for this source (preferably in AMA format). |
| url | URL to the source of this assertion, if applicable. |
| doi | DOI of the source making this assertion (insert only the DOI itself, not a DOI URL). |
| pmid | PMID of the source making this assertion, if applicable. |
| nct | NCT code of the clinical trial, if applicable. |
| last_updated | Date this assertion was last updated by a Molecular Oncology Almanac admin (in MM/DD/YYYY format). |
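Because each feature file must start with that feature's attributes in definition order, followed by the shared assertion columns, its expected header can be derived mechanically. The sketch below assumes the attribute names have already been read from feature_definitions.tsv; the helper names are hypothetical, and the shared column names are copied from the list above (with citation listed once). The knockdown.tsv example in this page uses some different names (e.g., therapy_name and an extra context column), so adjust `ASSERTION_COLUMNS` to whichever revision of the schema your files follow. The 0/1 check on the flag columns follows the table above literally.

```python
import csv

# Shared assertion columns, in the order listed above (citation appears once).
ASSERTION_COLUMNS = [
    "disease", "oncotree_ontology", "oncotree_code", "therapy", "therapy_type",
    "sensitivity", "resistance", "favorable_prognosis", "predictive_implication",
    "description", "source_type", "citation", "url", "doi", "pmid", "nct",
    "last_updated",
]
# Columns documented as 1/0 flags.
FLAG_COLUMNS = ["sensitivity", "resistance", "favorable_prognosis"]


def expected_header(attributes):
    """Hypothetical helper: feature attributes first, then the shared columns."""
    return list(attributes) + ASSERTION_COLUMNS


def check_feature_tsv(handle, attributes):
    """Return a list of problems found in one feature TSV (empty if none)."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames != expected_header(attributes):
        return [f"header mismatch: {reader.fieldnames}"]
    problems = []
    # Data rows start on line 2 (line 1 is the header).
    for line_no, row in enumerate(reader, start=2):
        for flag in FLAG_COLUMNS:
            if row[flag] not in ("0", "1"):
                problems.append(f"line {line_no}: {flag}={row[flag]!r}, expected 0 or 1")
    return problems
```

For knockdown.tsv, for instance, the call would be `check_feature_tsv(handle, ["technique", "gene"])` with the file opened using `newline=""`.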
An example feature TSV file, knockdown.tsv, is given below.
The Molecular Almanac Browser includes several scripts to automate creation of the database. First, all feature TSV files must be stored in the same directory; the example below will use db_scripts/almanac_v0.4.0/. As described at the beginning of these instructions, the db_create.sql and db_drop.sql files from Vertabelo must be stored within the db_scripts/ directory.
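Before running the script, it may help to confirm that this layout is in place: db_create.sql and db_drop.sql in db_scripts/, and one `<feature>.tsv` per defined feature in the feature directory. The following is a hypothetical pre-flight check, not one of the Browser's scripts; `missing_inputs` is an illustrative name, and it assumes you already have the list of feature names (for example, from the feature column of feature_definitions.tsv).

```python
from pathlib import Path


def missing_inputs(repo_root, feature_dir, features):
    """Hypothetical check: return the required input files that do not exist."""
    repo_root = Path(repo_root)
    feature_dir = Path(feature_dir)
    # Vertabelo-generated scripts expected in db_scripts/ at the repository root.
    required = [
        repo_root / "db_scripts" / "db_create.sql",
        repo_root / "db_scripts" / "db_drop.sql",
    ]
    # One TSV per feature, named after the programmatic feature name.
    required += [feature_dir / f"{feature}.tsv" for feature in features]
    return [str(path) for path in required if not path.is_file()]


if __name__ == "__main__":
    root = Path(".")
    for path in missing_inputs(root, root / "db_scripts" / "almanac_v0.4.0",
                               ["knockdown", "silencing"]):
        print("missing:", path)
```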
Next, call the db_scripts/create_db.sh script. This script imports the features and assertions in a specified directory into a new SQLite3 database. It takes 6 required positional arguments:
```
./db_scripts/create_db.sh <feature definitions TSV> <feature TSV directory> <DB base name> <DB major version> <DB minor version> <DB patch version>
```
| Parameter | Description | Example |
|---|---|---|
| `<feature definitions TSV>` | TSV file defining all features and attributes used in the database. | db_scripts/almanac_v0.4.0/feature_definitions.tsv |
| `<feature TSV directory>` | Directory containing the TSV files corresponding to each feature defined in feature_definitions.tsv. | db_scripts/almanac_v0.4.0/ |
| `<DB base name>` | Base filename for the resulting database; the version number specified below is appended to this name (e.g., almanac.0.4.0.sqlite3). | almanac |
| `<DB major version>` | First (major) component of the DB version number. | 0 |
| `<DB minor version>` | Second (minor) component of the DB version number. | 4 |
| `<DB patch version>` | Third (patch) component of the DB version number. | 0 |
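Going by the almanac.0.4.0.sqlite3 example, the three version arguments are joined with dots, appended to `<DB base name>`, and given a .sqlite3 extension, and the file is written to db_versions/. The Python sketch below reproduces only that naming rule so you can predict where a run will write its database; `output_db_path` is a hypothetical name, and the exact behavior of create_db.sh (for example, where db_versions/ is resolved from) may differ.

```python
from pathlib import PurePosixPath


def output_db_path(base_name, major, minor, patch, out_dir="db_versions"):
    """Hypothetical helper mirroring the naming scheme almanac.0.4.0.sqlite3."""
    parts = []
    for part in (major, minor, patch):
        value = int(part)  # raises ValueError for non-numeric input
        if value < 0:
            raise ValueError(f"version component {part!r} must be non-negative")
        parts.append(str(value))
    return PurePosixPath(out_dir) / f"{base_name}.{'.'.join(parts)}.sqlite3"


print(output_db_path("almanac", 0, 4, 0))  # db_versions/almanac.0.4.0.sqlite3
```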
The resulting SQLite DB file is then written to the db_versions/ directory. Note that several "Table
| feature | readable_name | attribute | readable_attribute | type |
|---|---|---|---|---|
| rearrangement | Rearrangement | rearrangement_type | Rearrangement Type | text |
| rearrangement | Rearrangement | gene1 | Gene 1 | gene |
| rearrangement | Rearrangement | gene2 | Gene 2 | gene |
| rearrangement | Rearrangement | locus | Locus | text |
| somatic_variant | Somatic Variant | variant_type | Variant Type | text |
| somatic_variant | Somatic Variant | gene | Gene | gene |
| somatic_variant | Somatic Variant | protein_change | Protein Change | text |
| germline_variant | Germline Variant | variant_type | Variant Type | text |
| germline_variant | Germline Variant | protein_change | Protein Change | text |
| germline_variant | Germline Variant | gene | Gene | gene |
| copy_number | Copy Number | direction | Direction | text |
| copy_number | Copy Number | gene | Gene | gene |
| copy_number | Copy Number | locus | Locus | text |
| microsatellite_stability | Microsatellite Stability | direction | Direction | text |
| mutational_signature | Mutational Signature | signature_number | Signature Number | integer |
| mutational_burden | Mutational Burden | burden | Burden | integer |
| neoantigen_burden | Neoantigen Burden | burden | Burden | integer |
| knockdown | Knockdown | technique | Technique | text |
| knockdown | Knockdown | gene | Gene | gene |
| silencing | Silencing | technique | Technique | text |
| silencing | Silencing | gene | Gene | gene |
| aneuploidy | Aneuploidy | effect | Effect | text |
| technique | gene | disease | oncotree_term | oncotree_code | context | therapy_name | therapy_type | therapy_sensitivity | therapy_resistance | favorable_prognosis | predictive_implication | description | source_type | citation | url | doi | pmid | nct | last_updated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| shRNA | CDK12 | Ovarian Cancer | Ovarian Cancer, Other | OOVC |  | Olaparib | Targeted therapy | 1 |  |  | Preclinical | shRNA knockout in ovarian cancer suggests sensitivity to PARP1/2 inhibition, specifically olaparib. | Journal | Bajrami I, Frankum JR, Konde A, et al. Genome-wide profiling of genetic synthetic lethality identifies CDK12 as a novel determinant of PARP1/2 inhibitor sensitivity. Cancer Res. 2014;74(1):287-97. | https://doi.org/10.1158/0008-5472.CAN-13-2541 | 10.1158/0008-5472.CAN-13-2541 | 24240700 |  | 11/3/17 |
| shRNA | RAD17 | Breast Cancer | Invasive Breast Carcinoma | BRCA |  | Veliparib | Targeted therapy | 1 |  |  | Preclinical | RNAi knockdown of RAD17 in the HME-CC breast cancer cell line resulted in increased sensitivity to ABT-888, Veliparib (PARP inhibitor). If RAD50 is also knocked out, a further increase in sensitivity is observed. | Journal | Weigman VJ, Chao HH, Shabalin AA, et al. Basal-like Breast cancer DNA copy number losses identify genes involved in genomic instability, response to therapy, and patient survival. Breast Cancer Res Treat. 2012;133(3):865-80. | https://doi.org/10.1007/s10549-011-1846-y | 10.1007/s10549-011-1846-y | 22048815 |  | 11/3/17 |