svengato edited this page Nov 27, 2020 · 5 revisions

How to add a dataset

Follow these steps, not necessarily in this order:

  1. Make sure that your annotations file exists on the LIS data store.

  2. Make sure that your genetic marker files (in .gff3.gz and .gff3.gz.tbi format) and your raw GWAS and/or QTL data files (in .tsv.gz format) exist on DSCensor, with a canonical_type of mrk, gwas, or qtl respectively. ZZBrowse will use these to generate datasets from your raw data.
    [To do: add something about the file specifications and how to list files on DSCensor.]

  3. If your organism file does not already exist, create it in the organisms subdirectory.
    Line 1 - the organism display name
    Line 2 - its chromosome lengths, either numeric or in the form name:length
    Line 3 - forms of the organism name: Genus species,G.species,Gensp
    Line 4 - URL or local file path of the annotations file (from step 1)
    Line 5 - chromosome name formats: (1) format for display, (2) full format in annotations file, (3) regex format for validating chromosome names returned by the Genome Context Viewer
    Line 6 - base URL for Services API genomic linkage queries
    Line 7 - tags for constructing annotations table: strand column name, forward strand code, reverse strand code, start-of-gene column name, end-of-gene column name, URL format for returning gene links, gene id column name (to plug into URL format), gene name column name, chromosome column name, gene description column name

  4. In www/config/datasetProperties.csv, add a line for each of your new GWAS and/or QTL datasets.
    dataset = the dataset's display name.
    chrColumn = which column in the dataset contains the chromosome name. Note that this must begin with "chr" (case-insensitive).
    bpColumn = which column contains the SNP position (for GWAS data) or interval center position (for QTL data).
    traitCol = which column contains the trait or phenotype.
    yAxisColumn = which column contains the p-value (or other significance value or score).
    logP = whether to use -log10(yAxisColumn) in the charts (generally TRUE for p-values, FALSE for others).
    axisLim = whether to specify hard y-axis limits on the charts (always FALSE for our data).
    axisMin = hard bottom of y-axis (or 0 if axisLim = FALSE).
    axisMax = hard top of y-axis (or 1 if axisLim = FALSE).
    organism = the species to which the dataset refers.
    plotAll = whether all data are for the same trait (probably always FALSE for our data).
    supportInterval = whether to support interval data, as for QTL data. Set the remaining columns to something meaningful if supportInterval is TRUE:
    SIyAxisColumn = which column contains the significance value for interval data ("val" for those we generate on the fly).
    SIbpStart = which column contains the start position for interval data.
    SIbpEnd = which column contains the end position for interval data.
    SIaxisLimBool = whether to specify hard y-axis limits for interval data (always FALSE for our data).
    SIaxisMin = hard bottom of interval y-axis (or 0 if SIaxisLimBool = FALSE).
    SIaxisMax = hard top of interval y-axis (or 1 if SIaxisLimBool = FALSE).

  5. Tell ZZBrowse where to find your data:
    buildGWAS.R - add its lis.datastore.info
    buildQTL.R - add its lis.datastore.info
    server.R - add it to lis.datastore.gwas or lis.datastore.qtl
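
The seven-line organism file layout from step 3 can be checked before ZZBrowse ever reads it. The following is a minimal standalone sketch in Python, not part of ZZBrowse (which is written in R). It assumes that the entries on line 2 are comma-separated, as the name forms on line 3 are; the wiki does not state the delimiter. The function names and all example values are hypothetical.

```python
# Labels for the seven lines of an organism file, in the order given in step 3.
LINE_LABELS = [
    "organism display name",
    "chromosome lengths",
    "forms of the organism name",
    "annotations file URL or path",
    "chromosome name formats",
    "Services API base URL",
    "annotations table tags",
]


def parse_chromosome_lengths(line, sep=","):
    """Parse line 2: each entry is either a number or name:length.

    ASSUMPTION: entries are separated by commas. Unnamed entries are
    keyed by their 1-based position.
    """
    lengths = {}
    for index, entry in enumerate(line.split(sep), start=1):
        entry = entry.strip()
        if ":" in entry:
            name, _, value = entry.rpartition(":")
        else:
            name, value = str(index), entry
        lengths[name] = int(value)  # raises ValueError if not an integer
    return lengths


def check_organism_text(text):
    """Return a list of problems found in the text of an organism file."""
    lines = [ln.strip() for ln in text.splitlines()]
    while lines and not lines[-1]:  # ignore trailing blank lines
        lines.pop()
    problems = []
    if len(lines) != len(LINE_LABELS):
        problems.append(f"expected 7 lines, found {len(lines)}")
    for number, (label, line) in enumerate(zip(LINE_LABELS, lines), start=1):
        if not line:
            problems.append(f"line {number} ({label}) is empty")
    if len(lines) >= 2 and lines[1]:
        try:
            parse_chromosome_lengths(lines[1])
        except ValueError:
            problems.append("line 2 has a chromosome length that is not an integer")
    # Step 3 shows three name forms: Genus species,G.species,Gensp
    if len(lines) >= 3 and lines[2] and len(lines[2].split(",")) != 3:
        problems.append("line 3 should list three comma-separated name forms")
    return problems


# Hypothetical file contents; lines 4 to 7 are placeholders only.
EXAMPLE = "\n".join([
    "Example organism",
    "Chr01:1000,Chr02:2000",
    "Genus species,G.species,Gensp",
    "<annotations file URL or path>",
    "<chromosome name formats>",
    "<Services API base URL>",
    "<annotations table tags>",
])

print(check_organism_text(EXAMPLE))
```

The check only covers the overall shape of the file. It does not validate the contents of lines 4 to 7, whose exact syntax is not specified here.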

For GWAS data that live somewhere other than DSCensor: in buildGWAS.R, add any remote GWAS URLs, specify their column names, and do any special handling.
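
To make the datasetProperties.csv fields from step 4 concrete, here is a minimal standalone Python sketch (again, not part of ZZBrowse itself) that assembles one line of the file. It assumes that the columns appear in the order listed in step 4 and that the three interval column names can be left empty when supportInterval is FALSE; neither is confirmed by this page. The dataset, organism, and column names in the usage example are hypothetical.

```python
import csv
import io

# ASSUMPTION: column order matches the order in which step 4 lists the fields.
COLUMNS = [
    "dataset", "chrColumn", "bpColumn", "traitCol", "yAxisColumn", "logP",
    "axisLim", "axisMin", "axisMax", "organism", "plotAll", "supportInterval",
    "SIyAxisColumn", "SIbpStart", "SIbpEnd",
    "SIaxisLimBool", "SIaxisMin", "SIaxisMax",
]

# Fields that have no sensible default and must always be supplied.
REQUIRED = ["dataset", "chrColumn", "bpColumn", "traitCol", "yAxisColumn",
            "logP", "organism"]

# Values that step 4 describes as typical when no hard axis limits are used.
DEFAULTS = {
    "axisLim": "FALSE", "axisMin": "0", "axisMax": "1",
    "plotAll": "FALSE", "supportInterval": "FALSE",
    "SIaxisLimBool": "FALSE", "SIaxisMin": "0", "SIaxisMax": "1",
}


def dataset_row(**fields):
    """Return the values for one datasetProperties.csv line, in column order."""
    unknown = sorted(set(fields) - set(COLUMNS))
    if unknown:
        raise KeyError(f"unknown fields: {unknown}")
    missing = [name for name in REQUIRED if not fields.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    row = {**DEFAULTS, **fields}
    if row["supportInterval"] == "TRUE":
        # Interval (QTL) datasets need the interval columns filled in.
        needed = [c for c in ("SIyAxisColumn", "SIbpStart", "SIbpEnd")
                  if not row.get(c)]
        if needed:
            raise ValueError(f"supportInterval is TRUE but missing: {needed}")
    # ASSUMPTION: unused interval column names are left empty.
    return [row.get(name, "") for name in COLUMNS]


def to_csv_line(row):
    """Format one row as a single CSV line without a trailing newline."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(row)
    return buffer.getvalue().rstrip("\n")


# Hypothetical GWAS dataset whose raw file has these column names.
gwas = dataset_row(
    dataset="Example GWAS", chrColumn="chromosome", bpColumn="position",
    traitCol="trait", yAxisColumn="pvalue", logP="TRUE",
    organism="Example organism",
)
print(to_csv_line(gwas))
```

For a QTL dataset, pass supportInterval="TRUE" together with SIyAxisColumn, SIbpStart, and SIbpEnd; the sketch raises an error if any of the three is missing.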

Other notes

To do: buildQTL.R needs some generalization.

Combined GWAS-QTL datasets: these must be built manually for now, though I am working on a script (merge-gwas-and-qtl.R) to do it.

Also to do: investigate removing legumeInfo.organisms (possibly unused) from server.R.
