
Importing Matrix Data

sebastian-raubach edited this page Oct 6, 2017 · 1 revision

The Matrix mapper tab of Germinate Daim is used to import what we call Matrix Data.

File format

Matrix Data is data in which each item you want to import is stored in its own cell of the data file. The example below shows such a file:

       marker1   marker2   marker3   marker4
6763543   G                   G/C       C
6763993   A         C         G         C
6764259   A         T         G         T
6763298   G         C         C         A

Each data item (e.g. G or G/C) is described by its row identifier, column identifier and value. The (required) row and column headers identify the individual cell values. In this example, the row identifiers are the IDs in the first cell of each data row, and the column identifiers are the marker names along the top. The values are located in the matrix itself. Empty cells are ignored during the import process.
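To make the row/column/value model concrete, here is a minimal Python sketch that turns a tab-separated matrix file into (row identifier, column identifier, value) triples and skips empty cells. It is an illustration, not Germinate Daim's own code; the function name parse_matrix is hypothetical, and the sketch assumes the first cell of the header line is empty, as in the example above.

```python
def parse_matrix(text, sep="\t"):
    """Turn matrix-formatted text into (row_id, column_id, value) triples.

    Assumes the first non-blank line holds the column identifiers, with an
    empty top-left cell, and every later line starts with a row identifier.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    column_ids = [c.strip() for c in lines[0].split(sep)[1:]]  # skip corner cell
    triples = []
    for line in lines[1:]:
        cells = line.split(sep)
        row_id = cells[0].strip()
        for column_id, value in zip(column_ids, cells[1:]):
            value = value.strip()
            if value:  # empty cells are ignored, as described above
                triples.append((row_id, column_id, value))
    return triples


# The first two data rows of the example file, tab-separated.
example = (
    "\tmarker1\tmarker2\tmarker3\tmarker4\n"
    "6763543\tG\t\tG/C\tC\n"
    "6763993\tA\tC\tG\tC\n"
)
for triple in parse_matrix(example):
    print(triple)
```

Note that the empty marker2 cell of row 6763543 produces no triple, so these two rows yield seven data items rather than eight.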

Data separator

In this example we chose "tab" as the data separator. Germinate Daim currently supports the following characters as data separators:

  • tab
  • ,
  • ;
  • |
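As an illustration of how the separator choice affects parsing, the sketch below splits one line of a matrix file using one of the four supported separators. The SUPPORTED_SEPARATORS mapping and the split_cells function are hypothetical names. The sketch does plain splitting and does not handle quoted values; whether Daim supports quoting is not described on this page.

```python
# Hypothetical mapping from the separator names listed above to characters.
SUPPORTED_SEPARATORS = {"tab": "\t", ",": ",", ";": ";", "|": "|"}


def split_cells(line, separator_name):
    """Split one line of a matrix file using a supported separator."""
    try:
        sep = SUPPORTED_SEPARATORS[separator_name]
    except KeyError:
        raise ValueError(f"unsupported separator: {separator_name!r}") from None
    return [cell.strip() for cell in line.split(sep)]


# The same data row written with each supported separator yields the same cells.
row = ["6763543", "G", "", "G/C", "C"]
for name, char in SUPPORTED_SEPARATORS.items():
    assert split_cells(char.join(row), name) == row
```

Because the cells are split on the separator character, that character must not appear inside any value; for genotype values such as G/C, each of the four separators is safe.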

Selecting the target table

After connecting to the database, the first step is to select the target database table, i.e., the table you want to import data into. To do so, select a table from the combo box in the top left corner.

The number in brackets shows the number of data items currently stored in this table.

Selecting an input file

Click on the folder icon button to select an input file. This is the file containing your actual data. Depending on your previous selections, the Input options dialog will open to ask you for the data separator and number format.

You can prevent this dialog from opening each time you select a file by un-ticking the checkbox labelled "Always ask after selecting a file".

Mapping your data

After the target table and the input file have been selected, you can start mapping your data to the database table.

The screenshot above shows the final mapping of our example input file and the "genotypes" table. We'll now explain in detail how this view works. The first three rows cannot be removed, because at least these three input mappings must always be defined. Each row maps one item of the input file (first combo box) to one of the columns of the database table (second combo box). For example, we map the "Row identifier" of the input file to the germinatebase_id column in the database. The other buttons in each row are explained below:

  • Clicking this button lets you enter a static value that will be imported into the mapped database column for each of your input items (rows). This is useful if you don't have an input column that you could map to the database column.
  • Depending on your requirements, you may need to import only part of the actual cell value, or you may need to split the cell value across two database columns. To achieve this, you can define a regular expression in the Regular Expressions dialog; it is then used to extract the part of the cell value that you need. In our example, we defined two Value rows to import the genotype into the allele1 and allele2 columns of the database. This is necessary to store heterozygous genotypes.
  • In certain cases you may want to reference a database item in a different table. If you don't know its ID, but only its value, the Key Lookup dialog helps you define the relation between the two tables. In our example dataset, we mapped the column identifier to marker_id. As can easily be seen, marker1 is not an ID but the marker name. This is why we define a lookup to the "markers" table and its marker_name column. Germinate Daim will then search for the marker name in this table-column combination and automatically use the matching ID when importing your data.
  • Clicking this button removes the current row.
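The example mapping combines a Row identifier, a Key Lookup on the column identifier, and two regular expressions on the value. The Python sketch below shows how such a mapping could turn one data item into one row for the genotypes table. It is an illustration under stated assumptions, not Daim's implementation: an in-memory sqlite3 table stands in for the "markers" table, the two regular expressions are one possible choice (Daim's dialog may use different ones), and the dataset_id static value and the helper names extract, lookup_marker_id and map_item are hypothetical.

```python
import re
import sqlite3

# Hypothetical patterns, one per Value row in the mapping. Group 1 of each
# pattern is the part of the cell value that gets imported.
ALLELE1 = re.compile(r"^([^/]+)")  # text before "/" (or the whole value)
ALLELE2 = re.compile(r"([^/]+)$")  # text after "/" (or the whole value)


def extract(pattern, value):
    """Return the part of value captured by group 1, or None if no match."""
    match = pattern.search(value)
    return match.group(1) if match else None


# In-memory stand-in for the "markers" table targeted by the Key Lookup.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE markers (id INTEGER PRIMARY KEY, marker_name TEXT)")
db.executemany(
    "INSERT INTO markers (marker_name) VALUES (?)",
    [("marker1",), ("marker2",), ("marker3",), ("marker4",)],
)


def lookup_marker_id(marker_name):
    """Key Lookup: find the id whose markers.marker_name equals marker_name."""
    row = db.execute(
        "SELECT id FROM markers WHERE marker_name = ?", (marker_name,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no marker named {marker_name!r}")
    return row[0]


def map_item(row_id, column_id, value, dataset_id):
    """Map one (row, column, value) triple to a row for the genotypes table."""
    return {
        "germinatebase_id": row_id,                # Row identifier
        "marker_id": lookup_marker_id(column_id),  # Column identifier + Key Lookup
        "allele1": extract(ALLELE1, value),        # Value + first regex
        "allele2": extract(ALLELE2, value),        # Value + second regex
        "dataset_id": dataset_id,                  # hypothetical static value
    }


print(map_item("6763543", "marker3", "G/C", dataset_id=1))
```

With these particular patterns, a homozygous value such as G yields G for both allele1 and allele2, while G/C is split into G and C. Adapt the patterns to the genotype encoding used in your own files.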

Starting the import process

To start the import process, click the import data button. A dialog will open informing you about the import progress. Should the import fail due to an error in the input data, the mapping or the database, an error dialog will show the error message and ask whether you want to continue the import anyway. After the import has finished successfully, the progress dialog will close.
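The import behaviour described above (reporting progress, and asking whether to continue after an error) can be sketched as the loop below. This is an illustration only: Python's built-in sqlite3 module stands in for the Germinate database, the genotypes columns come from the example mapping, and import_rows with its on_error and on_progress callbacks is hypothetical. In this sketch, rows inserted before the user stops the import are still committed; this page does not say how Daim handles that case.

```python
import sqlite3

INSERT_SQL = (
    "INSERT INTO genotypes (germinatebase_id, marker_id, allele1, allele2)"
    " VALUES (:germinatebase_id, :marker_id, :allele1, :allele2)"
)


def import_rows(db, rows, on_error, on_progress=print):
    """Insert mapped rows one by one and return how many succeeded.

    on_error(row, exc) returns True to continue after a failed row,
    or False to stop the import at that point.
    """
    imported = 0
    for i, row in enumerate(rows, start=1):
        try:
            db.execute(INSERT_SQL, row)
            imported += 1
        except sqlite3.Error as exc:
            if not on_error(row, exc):
                break  # the user chose to stop the import
        on_progress(f"processed {i}/{len(rows)}")
    db.commit()  # keep everything inserted so far
    return imported


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE genotypes (germinatebase_id TEXT, marker_id INTEGER,"
    " allele1 TEXT NOT NULL, allele2 TEXT)"
)
rows = [
    {"germinatebase_id": "6763543", "marker_id": 1, "allele1": "G", "allele2": "G"},
    # allele1 is missing, so the NOT NULL constraint makes this row fail
    {"germinatebase_id": "6763543", "marker_id": 3, "allele1": None, "allele2": "C"},
    {"germinatebase_id": "6763993", "marker_id": 1, "allele1": "A", "allele2": "A"},
]
count = import_rows(db, rows, on_error=lambda row, exc: True)  # always continue
print(count)  # 2
```

Passing a callback that asks the user, instead of the always-continue lambda used here, mirrors the "continue anyway?" question of the error dialog.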
