This service is configured with a directory of reference sequences. It then receives requests containing coordinates of sequences, and replies with a .fasta.
The best (easiest) way to run locally is with docker compose. If you have a relatively recent version of Docker installed, this should work "out of the box" with a short wrapper script by using the following commands:
> cd docker-compose
> cp .env.sample .env
> ./runLocal.shThis configuration will download and run the latest (tagged) image on docker hub and use the sample .fa and .fai files included in the repo.
Once up, you should be able to access the service locally on port 8080, e.g. GET http://localhost:8080/health
To point the service at your own files, change the FASTA_FILES_DIR environment variable in .env.
A number of queries, valid against the sample data, are provided and codified as curl commands wrapped in bash scripts. So view them and run one, execute (in a new terminal) e.g.
> ls -1 src/test/query
> bash src/test/query/query-genomic-bed.shTo make and test changes to the service code, first try to build and run the service as-is. To do so, perform the following steps:
-
Build the code (note Java 17+ is required):
> ./gradlew clean generate-jaxrs jar test -
If this succeeds, build a local docker image
> ./gradlew build-docker -
Uncomment the line
SEQUENCE_RETRIEVAL_IMAGE=sequence-retrievalin your.envfile -
Run the service as above, i.e.
cd docker-compose && ./runLocal.sh
For each reference, we provide a file with the sequences, as well as an index file describing where each sequence is in the file.
To index a fasta with the sequences called x.fa:
samtools faidx x.fa
scripts/index_to_sqlite3.sh x.fa.fai x.fa.fai.sqliteWe use SQLite because we may have so many sequences that the index is too large to keep in memory.
The main query format is .bed. See the BED Wikipedia page for an overview.
Name column of .bed files should describe the sequences as they make sense to the user - for example, feature description. You can ask for exon sequence of a gene by using all 12 columns.
This service requires Java 17+ and Docker for local development.
The main library used for sequence lookups is htsjdk, which does a lot of work on the references and .bed features. It’s just a plain Java dependency; a few more methods have been copied and adapted.