Skip to content

EBIBioSamples/biosamples-v4

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Codacy code quality Docker Repository on Quay

BioSamples

BioSamples https://www.ebi.ac.uk/biosamples/ is an ELIXIR Core Deposition Database that stores and supplies descriptions and metadata about biological samples used in research and development by academia and industry. Samples are either 'reference' samples (e.g. from 1000 Genomes, HipSci, FAANG) or experimental samples, and have been used in an assay that has generated publicly available data in the European Nucleotide Archive (ENA) or ArrayExpress.

BioSamples supports links between sample records and any sample-derived datasets, including sequence-based datasets such as those held in ENA or ArrayExpress, -omics-based datasets such as those in PRIDE or MetaboLights, and any other assay-based databases. It provides links to assays and specific samples and accepts direct submissions of sample information.

BioSamples also synchronizes data with the NCBI BioSample database and imports data from ENA.

This document provides information about local installation and development environment setup for the BioSamples database.

Softwares

  • Git 2.17.1

  • Java 17

  • JDK 17

  • Maven 3.6.3

  • Docker 18.6

Setup

  1. Run this in terminal to install the dependent software.

    sudo apt-get update
    sudo apt-get install temurin-17-jdk maven git docker
  2. Verify the software versions.

    docker -v
    # Docker version 18.06.1-ce
    
    java -version
    # openjdk version "17.0.12" 2024-07-16
  3. Install BioSamples on your computer.

    git clone https://github.com/EBIBioSamples/biosamples-v4.git
    cd biosamples-v4
    ./mvnw -T 2C package
  4. Start BioSamples on your machine

    docker-compose up

    If you get: ERROR: Couldn’t connect to Docker daemon - you might need to run docker-machine start default, try: sudo docker-compose up

  5. Access the web interface at http://localhost:8081/biosamples/. Initially, there is no data in the local instance.

  6. Create an ENA WEBIN account for API authentication and data upload

    An ENA WEBIN account is required to upload data via the BioSamples API.

    To register a test account, replace www with wwwdev in the URL.

Data import

NCBI

Download the XML dump (~400MB):

Run the pipeline to submit the data to BioSamples API via REST:

docker-compose up biosamples-pipelines-ncbi

You may need to mount the directory where the XML file is located. Use a docker-compose.override.yml file to handle volume mounting.

MongoDB notes

A useful MongoDB client tool: http://www.mongoclient.com

Developing

Docker can be run in a virtual machine (e.g., VirtualBox) if needed. You can mount shared folders for IDE use.

To build code changes: ./mvnw -T 2C package

To rebuild docker containers: docker-compose build

To rebuild a single container: docker-compose build biosamples-pipelines

To run a service with its dependencies: docker-compose up biosamples-webapp-api

To run a containerized executable: docker-compose run --service-ports biosamples-pipelines

To pass command-line arguments (note: replaces the default executable): docker-compose run --service-ports biosamples-pipelines java -jar pipelines-4.0.0-SNAPSHOT.jar --debug

Combined Maven build and container launch: ./mvnw -T 2C package && docker-compose build && docker-compose run --service-ports biosamples-pipelines

If docker-compose is slow, check for large volumes in the source directory. Use these to clean up:

Remove all Docker volumes: docker volume ls -q | xargs -r docker volume rm

Remove all Docker images: docker images -q | xargs -r docker rmi

Warning
The above removes everything Docker-related from your machine.

Client usage

Add the spring-boot starter module for BioSamples in your Maven project:

<dependencies>
    <dependency>
        <groupId>uk.ac.ebi.biosamples</groupId>
        <artifactId>biosamples-spring-boot-starter</artifactId>
        <version>5.3.7</version>
    </dependency>
</dependencies>
maven {
  url 'https://gitlab.ebi.ac.uk/api/v4/projects/2669/packages/maven'
}

Configure biosamples.client.uri in your application.properties to point to the correct BioSamples instance.

Issues and troubleshooting

Problems with spring-data-rest

Originally, Spring Data REST was used for exposing the API but had issues:

  • Content type negotiation problems due to overlaps with Thymeleaf routes.

  • Cannot serve XML even with converters.

  • List ordering caused optional attributes to mix—better handled via Map of attributes.

Known issues

Solr has a limitation on field size (term vector length). Values over 255 characters are not indexed.

License