BioSamples (https://www.ebi.ac.uk/biosamples/) is an ELIXIR Core Deposition Database that stores and supplies descriptions and metadata about biological samples used in research and development by academia and industry. Samples are either 'reference' samples (e.g. from 1000 Genomes, HipSci, FAANG) or experimental samples that have been used in an assay generating publicly available data in the European Nucleotide Archive (ENA) or ArrayExpress.
BioSamples supports links between sample records and any sample-derived datasets, including sequence-based datasets such as those held in ENA or ArrayExpress, -omics-based datasets such as those in PRIDE or MetaboLights, and any other assay-based databases. It provides links to assays and specific samples and accepts direct submissions of sample information.
BioSamples also synchronizes data with the NCBI BioSample database and imports data from ENA.
This document provides information about local installation and development environment setup for the BioSamples database.
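- Run this in a terminal to install the dependent software:

sudo apt-get update
sudo apt-get install temurin-17-jdk maven git docker.io docker-compose

Note that temurin-17-jdk is distributed through the Adoptium APT repository, which must be added to your package sources first.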
- Verify the software versions:

docker -v       # e.g. Docker version 18.06.1-ce
java -version   # e.g. openjdk version "17.0.12" 2024-07-16
- Install BioSamples on your computer:

git clone https://github.com/EBIBioSamples/biosamples-v4.git
cd biosamples-v4
./mvnw -T 2C package
- Start BioSamples on your machine:

docker-compose up

If you get:

ERROR: Couldn’t connect to Docker daemon - you might need to run docker-machine start default

try:

sudo docker-compose up
- Access the web interface at http://localhost:8081/biosamples/. Initially, the local instance contains no data.
- Create an ENA WEBIN account for API authentication and data upload.
An ENA WEBIN account is required to upload data via the BioSamples API.
Production WEBIN accounts can be created here: https://www.ebi.ac.uk/ena/submit/webin/auth/swagger-ui/index.html?configUrl=/ena/submit/webin/auth/v3/api-docs/swagger-config#/AdministrationAPI/createSubmissionAccount
To register a test account, replace www with wwwdev in the URL.
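To make the substitution concrete, the following Java sketch rewrites a production WEBIN URL into its test-server form. The class and method names (WebinUrls, toTestAccountUrl) are hypothetical and not part of BioSamples or ENA; the helper simply swaps the https://www. prefix for https://wwwdev. and leaves the path, query string and fragment untouched.

```java
// Hypothetical helper (not part of BioSamples or ENA): derives the test-server URL
// from a production WEBIN URL by swapping the "www" host prefix for "wwwdev".
final class WebinUrls {
    private static final String PROD_PREFIX = "https://www.";
    private static final String TEST_PREFIX = "https://wwwdev.";

    private WebinUrls() {}

    static String toTestAccountUrl(String productionUrl) {
        if (!productionUrl.startsWith(PROD_PREFIX)) {
            throw new IllegalArgumentException("Expected a URL starting with " + PROD_PREFIX);
        }
        // Only the host prefix changes; path, query string and fragment are kept verbatim.
        return TEST_PREFIX + productionUrl.substring(PROD_PREFIX.length());
    }

    public static void main(String[] args) {
        // Prints https://wwwdev.ebi.ac.uk/ena/submit/webin/auth/swagger-ui/index.html
        System.out.println(toTestAccountUrl(
                "https://www.ebi.ac.uk/ena/submit/webin/auth/swagger-ui/index.html"));
    }
}
```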
An example JSON payload to POST to http://localhost:8081/biosamples/beta/samples can be found here: https://github.com/EBIBioSamples/biosamples-v4/blob/master/models/core/src/test/resources/TEST1.json
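A minimal Java sketch of such a submission, using the JDK's java.net.http client, is shown below. It assumes that the WEBIN credentials have already been exchanged for an access token and that the local instance accepts that token in an Authorization: Bearer header; the class name LocalSampleSubmitter and this authentication detail are assumptions, so check the BioSamples API documentation before relying on them.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal sketch: POST a sample JSON document (e.g. TEST1.json) to a local BioSamples instance.
// Assumption: a WEBIN access token has been obtained beforehand and is sent as a Bearer token.
final class LocalSampleSubmitter {
    static final URI SAMPLES_ENDPOINT =
            URI.create("http://localhost:8081/biosamples/beta/samples");

    private LocalSampleSubmitter() {}

    // Builds the request without sending it, so it can be inspected or reused.
    static HttpRequest buildSubmitRequest(String sampleJson, String webinToken) {
        return HttpRequest.newBuilder(SAMPLES_ENDPOINT)
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + webinToken) // assumed auth header format
                .POST(HttpRequest.BodyPublishers.ofString(sampleJson))
                .build();
    }

    // Usage: java LocalSampleSubmitter.java path/to/TEST1.json <webin-token>
    public static void main(String[] args) throws IOException, InterruptedException {
        String json = Files.readString(Path.of(args[0]));
        HttpRequest request = buildSubmitRequest(json, args[1]);
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Run it with java LocalSampleSubmitter.java TEST1.json <token> (JDK 11+ single-file launch) while docker-compose up is running. Keeping request construction in a separate method lets it be checked without a live server.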
To import data from NCBI, download the NCBI BioSample XML dump (~400MB):
Run the pipeline to submit the data to the BioSamples API via REST:
docker-compose up biosamples-pipelines-ncbi
You may need to mount the directory where the XML file is located. Use a docker-compose.override.yml file to handle volume mounting.
A useful MongoDB client tool: http://www.mongoclient.com
Docker can be run in a virtual machine (e.g., VirtualBox) if needed. You can mount shared folders for IDE use.
To build code changes:
./mvnw -T 2C package
To rebuild docker containers:
docker-compose build
To rebuild a single container:
docker-compose build biosamples-pipelines
To run a service with its dependencies:
docker-compose up biosamples-webapp-api
To run a containerized executable:
docker-compose run --service-ports biosamples-pipelines
To pass command-line arguments (note: replaces the default executable):
docker-compose run --service-ports biosamples-pipelines java -jar pipelines-4.0.0-SNAPSHOT.jar --debug
For information on monitoring and debugging Java applications, see: http://www.jamasoftware.com/blog/monitoring-java-applications/
Combined Maven build and container launch:
./mvnw -T 2C package && docker-compose build && docker-compose run --service-ports biosamples-pipelines
If docker-compose is slow, check for large volumes in the source directory. The following commands can be used to clean up:
Remove all Docker volumes:
docker volume ls -q | xargs -r docker volume rm
Remove all Docker images:
docker images -q | xargs -r docker rmi
Warning: the commands above remove all Docker volumes and images on your machine, not only those created for BioSamples.
Add the BioSamples Spring Boot starter module to your Maven project:
<dependencies>
    <dependency>
        <groupId>uk.ac.ebi.biosamples</groupId>
        <artifactId>biosamples-spring-boot-starter</artifactId>
        <version>5.3.7</version>
    </dependency>
</dependencies>
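The starter is published to the EBI GitLab package registry. In a Gradle build, declare the repository as follows (Maven builds need an equivalent repository entry in pom.xml):

repositories {
    maven { url 'https://gitlab.ebi.ac.uk/api/v4/projects/2669/packages/maven' }
}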
Configure biosamples.client.uri in your application.properties to point to the correct BioSamples instance.
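As an illustration, the entry might read biosamples.client.uri=http://localhost:8081/biosamples when targeting the local docker-compose instance; that value is an assumption based on the web interface address given earlier, not a documented default. The Java sketch below (hypothetical class ClientUriCheck) parses such a properties snippet with java.util.Properties and rejects a missing or relative value. In a real application, Spring Boot reads the property itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Properties;

// Sketch only: sanity-checks a biosamples.client.uri entry. The example value below is an
// assumption (the local docker-compose instance), not a documented default of the starter.
final class ClientUriCheck {
    static final String EXAMPLE_PROPERTIES =
            "biosamples.client.uri=http://localhost:8081/biosamples\n";

    private ClientUriCheck() {}

    static URI readClientUri(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty("biosamples.client.uri");
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("biosamples.client.uri is not set");
        }
        URI uri = URI.create(value.trim());
        // A relative value such as "biosamples" cannot identify a BioSamples instance.
        if (!uri.isAbsolute()) {
            throw new IllegalStateException("biosamples.client.uri must be absolute: " + value);
        }
        return uri;
    }

    public static void main(String[] args) {
        System.out.println(readClientUri(EXAMPLE_PROPERTIES)); // http://localhost:8081/biosamples
    }
}
```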
Spring Data REST was originally used to expose the API, but it had several issues:
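- Content type negotiation problems, because its routes overlapped with the Thymeleaf routes.
- It could not serve XML, even with additional HTTP message converters configured.
- Representing attributes as an ordered list caused optional attributes to be mixed in with the others; a Map of attributes handles this better.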
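One way to read the last point is sketched below in Java: keying attributes by name in a sorted map yields the same order whichever optional attributes a sample carries, whereas a list preserves whatever order the submitter used. The Attribute record and AttributeOrdering class are illustrative only, not the BioSamples model classes, and the sketch keeps a single value per attribute name for simplicity.

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative sketch, not the BioSamples model: attributes keyed by name in a TreeMap
// always come out in name order, independent of submission order or optional fields.
final class AttributeOrdering {
    record Attribute(String name, String value) {}

    private AttributeOrdering() {}

    static SortedMap<String, String> toAttributeMap(List<Attribute> attributes) {
        SortedMap<String, String> byName = new TreeMap<>();
        for (Attribute a : attributes) {
            // Later duplicates overwrite earlier ones; real data may need multi-valued attributes.
            byName.put(a.name(), a.value());
        }
        return byName;
    }

    public static void main(String[] args) {
        List<Attribute> withOptional = List.of(
                new Attribute("organism", "Homo sapiens"),
                new Attribute("disease", "none"), // optional attribute
                new Attribute("sex", "female"));
        List<Attribute> withoutOptional = List.of(
                new Attribute("sex", "female"),
                new Attribute("organism", "Homo sapiens"));
        System.out.println(toAttributeMap(withOptional).keySet());    // [disease, organism, sex]
        System.out.println(toAttributeMap(withoutOptional).keySet()); // [organism, sex]
    }
}
```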