Upload Files

Cris Williams edited this page Jan 26, 2022 · 47 revisions

You must first create a dataset and describe the experiments, biosamples, and replicates used for that dataset. Once you have completed those steps you may upload data files.

In general, the steps to follow are:

  1. Organize your files into a specific directory hierarchy.
  2. Use one of our desktop or command-line upload tools.

IMPORTANT: Do NOT upload Human Subjects data using the procedures described below. Human subjects data are uploaded to FaceBase using a different procedure. Please contact FaceBase Help for more information.

Uploading from the browser

You can upload data directly from the browser. This works well when you have only a few files and each one is smaller than 1 GB. In general, we recommend using the desktop or command-line upload tools, but if you prefer the browser, follow these steps:

  1. Go to the Replicate record (to find Replicate records, go to the Dataset, select the Experiment and then you'll see a listing of Replicates for that Experiment). You will see sections for various data types such as: sequencing, processed, track, imaging, and/or mesh data files.
  2. For the data type section related to your data, click the Add Record button. A new browser tab opens with the data entry form.
  3. In the "Url" field, click the Select file button to select the data file.
  4. In the "File Type" field, click the field to open the "Select File Type" modal window. Find the appropriate extension under the "Name" column and select it.
  5. When you are finished, click Submit to upload the file.
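To decide whether the browser route is practical, it can help to check file sizes first. The following is a minimal sketch, not part of the FaceBase tooling: the function names are hypothetical, and treating 1 GB as 10^9 bytes is an assumption (the page does not say which definition applies).

```python
import os

# Assumption: "1 GB" is taken as 10**9 bytes; the page does not define it.
BROWSER_LIMIT_BYTES = 10**9


def exceeds_browser_limit(size_bytes, limit_bytes=BROWSER_LIMIT_BYTES):
    """Return True if a file of this size is not 'less than 1 GB'."""
    return size_bytes >= limit_bytes


def oversized_files(directory, limit_bytes=BROWSER_LIMIT_BYTES):
    """Walk a directory and list the files too large for a browser upload."""
    too_large = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if exceeds_browser_limit(os.path.getsize(path), limit_bytes):
                too_large.append(path)
    return too_large
```

If `oversized_files` returns anything, the desktop or command-line tools are likely the better choice.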

Uploading using our tools

We also offer desktop or command-line tools that are useful for uploading many and/or very large files to FaceBase.

Organize your files

The upload tools scan a directory of your choice and identify the files to upload. They process each file according to rules based on the subdirectory in which it is found, so it is very important to organize your files as follows:

<dataset>/<replicate>/
    seq/
        raw sequencing files (.fastq.gz)
    proc/<mapping-assembly>/
        processed data (.bam, .bam.bai, .count, .tsv, .fastqc{.tgz|.zip})
        track data (.bed, .bb, .bw)
    img/
        high-resolution imaging data (.nii.gz, .ome.tif[f], .aim, .tif[f], .jp[e]g)
    mesh/
        3D model mesh objects (.obj.gz)
    thumb/<derived-from>/
        low-resolution "thumbnail" images (.jp[e]g, .png)
    array/
        microarray data (.CEL.gz)

Where:

  • <dataset> is the Dataset's Record ID (RID), e.g., 1-BBC4
  • <replicate> is the Replicate's Record ID (RID), e.g., 1-BBCA
  • <mapping-assembly> is the reference genome mapping assembly, i.e., mm9, mm10, hg18, hg19
  • <derived-from> is a directory using the exact same name as a filename including file extension from your raw images under the img directory, e.g., .../my-confocal-image.ome.tiff/....
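Before uploading, you may want to sanity-check your paths against this layout. The sketch below is only an illustration: the suffix table is transcribed from the layout above, the function name `check_path` is hypothetical, and the comparison is case-sensitive by assumption. It does not reproduce the upload tools' actual rules, which may accept additional file types (for example, the single-cell files described later on this page).

```python
from pathlib import PurePosixPath

# Allowed filename suffixes per subdirectory, transcribed from the layout above.
ALLOWED_SUFFIXES = {
    "seq": (".fastq.gz",),
    "proc": (".bam", ".bam.bai", ".count", ".tsv", ".fastqc.tgz",
             ".fastqc.zip", ".bed", ".bb", ".bw"),
    "img": (".nii.gz", ".ome.tif", ".ome.tiff", ".aim", ".tif", ".tiff",
            ".jpg", ".jpeg"),
    "mesh": (".obj.gz",),
    "thumb": (".jpg", ".jpeg", ".png"),
    "array": (".CEL.gz",),
}
# These subdirectories need one extra level:
# proc/<mapping-assembly>/ and thumb/<derived-from>/.
NEEDS_EXTRA_LEVEL = {"proc", "thumb"}


def check_path(relative_path):
    """Return True if '<dataset>/<replicate>/<kind>/.../<file>' fits the layout."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) < 4:
        return False
    kind, filename = parts[2], parts[-1]
    if kind not in ALLOWED_SUFFIXES:
        return False
    expected_depth = 5 if kind in NEEDS_EXTRA_LEVEL else 4
    if len(parts) != expected_depth:
        return False
    # str.endswith accepts a tuple of candidate suffixes.
    return filename.endswith(ALLOWED_SUFFIXES[kind])
```

For example, `check_path("1-BBC4/1-BBCA/proc/mm10/sample.bam")` passes, while the same file placed directly under `proc/` does not because the mapping assembly level is missing. The file name `sample.bam` is made up for illustration.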

NOTE: We advise avoiding special characters in your filenames, such as ;, #, $, and spaces. These characters must be converted into "percent-encodings" per the Web standards. We believe we are able to support special characters, but they can be a source of confusion and errors. For more information, see this Wikipedia article on Percent-encoding.
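To see what percent-encoding does to a filename, here is a small sketch using Python's standard `urllib.parse.quote`. The helper names are hypothetical, and this only illustrates the encoding itself; it is not a description of how the FaceBase tools encode names internally.

```python
from urllib.parse import quote


def percent_encode(filename):
    """Percent-encode every character outside letters, digits, and '_.-~'."""
    # safe="" means that even "/" is encoded.
    return quote(filename, safe="")


def has_special_characters(filename):
    """Return True if the filename would change under percent-encoding."""
    return percent_encode(filename) != filename


print(percent_encode("my file;#1$.tif"))  # my%20file%3B%231%24.tif
```

A name such as `sample_R1.fastq.gz` is unchanged by encoding, which is why plain letters, digits, underscores, hyphens, and dots are the safest choice.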

Here is an example of the directory layout:

[Figure: File Organization]

Install the DERIVA clients

See the Deriva Clients document for installation instructions.

Uploading your files with our desktop tool (DERIVA-Upload)

Configure the DERIVA Upload Utility

  1. Open the DERIVA-Upload application.

  2. First time use: you will be asked to "Add server configuration now?" Click "Yes".

  3. In the Server Configuration dialog, enter Host www.facebase.org and Catalog ID 1. You may optionally enter FaceBase as the Description. Click "OK".

  4. From the Options dialog click "OK" again.

  5. From the main window, click "Login" to begin your session.

Upload files with DERIVA Upload Utility

Here are the instructions for uploading using our desktop tool, DERIVA-Upload:

  1. Open the DERIVA-Upload application
  2. Click Login (upper right hand side)
    • This uses your usual FaceBase username and password
  3. Click Browse (upper right hand side)
    • Find the directory
    • Select it
    • Click Open
  4. Confirm that your files are all accounted for in the "Pending" state.
  5. Click Upload (upper left hand side)
  6. Confirm that all of your files are now in the "Completed" state.

If there are any errors, they should be reported in the status panel beneath the file listing panel.

Uploading your files with our command-line interface

Here are the instructions for uploading using our command-line client:

  1. Establish an authentication token. See Authentication Tokens.
  2. Get familiar with the deriva-upload-cli options:
    $ deriva-upload-cli --help
    
  3. From the command-line of your host, you will run a command like this:
    $ deriva-upload-cli --token <your-auth-token> www.facebase.org path/to/<dataset-RID>
    
  4. Errors are reported on standard output or standard error. Please include them in any email to the Hub.
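If you run uploads from a script, you could wrap the command from step 3 so that its output is captured for a support email. This is a sketch only: the function names are hypothetical, and the argument order simply mirrors the command shown in step 3. Check `deriva-upload-cli --help` for the options your installed version accepts.

```python
import subprocess


def build_upload_command(token, dataset_dir, host="www.facebase.org"):
    """Build the argument list for the command shown in step 3."""
    return ["deriva-upload-cli", "--token", token, host, dataset_dir]


def run_upload(token, dataset_dir):
    """Run the upload and return (exit code, combined stdout and stderr)."""
    result = subprocess.run(
        build_upload_command(token, dataset_dir),
        capture_output=True,  # keep stdout/stderr so they can be reported
        text=True,
    )
    return result.returncode, result.stdout + result.stderr
```

Passing the arguments as a list avoids shell quoting problems if the directory path contains spaces.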

Uploading Single Cell RNA-seq (scRNA-seq) data

For Single Cell experiments, we encourage users to upload the following 3 types of data if available:

  1. Raw sequencing files
  2. Processed files generated from a standard Single Cell pipeline like Cell Ranger
  3. Standard Seurat files

For data within the same Replicate, processed and Seurat files derived from given sequencing files should be given similar filenames so that users can see their relationships.

If your experiments did not produce all 3 types of data, upload the ones that are available. However, please note that currently only the Seurat files are processed for visualization by the UCSC Cell Browser (for an example of how such a visualization appears, please see https://www.facebase.org/id/1-DTK2 and scroll down to the "Processed Data" section).

To upload single cell data files:

  1. Upload the raw sequencing files in fastq.gz format to the "Sequencing Data" section of the Replicate.
  2. Upload the expression matrices, barcodes and features files in .mtx.gz and .tsv.gz format (e.g., matrix.mtx.gz, barcodes.tsv.gz, features.tsv.gz) to the "Processed Data" section of the Replicate.
    • The corresponding File Type (e.g., .tsv.gz) and Mapping Assembly (e.g., mm10) must be manually selected if uploading from the browser.
    • If uploading with the DERIVA-Upload tool, the files must be placed under the <dataset>/<replicate>/proc/<mapping-assembly> folder.
  3. In order to be processed by the FaceBase Cell Browser pipeline and visualized by the UCSC Cell Browser, the Seurat files must be in .RData format and uploaded to the "Processed Data" section of the Replicate.
    • The File Type (e.g., Seurat object(v2)) and Mapping Assembly (e.g., mm10) must be manually selected if uploading from the browser.
    • If uploading with the DERIVA-Upload tool, the files must be placed under the <dataset>/<replicate>/proc/<mapping-assembly> folder.
    • Seurat files in other formats like .rds may be uploaded to the "Processed Data" section, but they will not be processed for visualization with the UCSC Cell Browser.
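The placement and format rules for processed single-cell files can be sketched as two small helpers. Both function names are hypothetical, and matching the `.RData` suffix case-sensitively is an assumption; the actual pipeline may apply further checks.

```python
from pathlib import PurePosixPath


def proc_destination(dataset_rid, replicate_rid, mapping_assembly, filename):
    """Return the <dataset>/<replicate>/proc/<mapping-assembly>/<file> path."""
    return str(PurePosixPath(dataset_rid, replicate_rid, "proc",
                             mapping_assembly, filename))


def eligible_for_cell_browser(filename):
    """Only Seurat files in .RData format are processed for visualization."""
    # Assumption: the suffix check is case-sensitive.
    return filename.endswith(".RData")
```

For instance, `proc_destination("1-BBC4", "1-BBCA", "mm10", "matrix.mtx.gz")` gives `1-BBC4/1-BBCA/proc/mm10/matrix.mtx.gz`, using the example RIDs from earlier on this page.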

Review the uploaded files

Return to your Dataset record on the FaceBase site. Drill down through the Experiments and Replicates to see the data files that you have uploaded, and make sure that their metadata are correct. For example, if you uploaded raw sequencing files, make sure that the "paired" and "read" attributes shown on the site are correct. If they are incorrect, click the 'edit' icon, correct the attribute, and 'Submit' the updated record.

Visualization options

Display thumbnails on Dataset page

If you uploaded thumbnails, and you want those thumbnails to appear on the main page of your dataset, find the thumbnail record, edit it, and set the "Show In Dataset" attribute to "True".

Display 3D surface models

If you uploaded 3D model mesh objects (.obj.gz) and you want to display a 3D model on the dataset page, follow the instructions to define surface models.

Display Genome Browser tracks

If you uploaded genome browser track data and you want to display your tracks on the dataset page:

  1. Go to your dataset and scroll down to the "Genome Browser" subsection.
  2. Click "Add record".
  3. In the form, select the Mapping Assembly (required). We highly recommend entering a chromosome name (chr1, etc.) and a start and end position; the browser will then open at that locus by default. Users can still move the browser away from the default position as they wish.
