Sequence Retrieval Service

About This Service

This service is configured with a directory of reference sequences. It then receives requests containing sequence coordinates and replies with the matching sequences in FASTA format.

Running Locally

The easiest way to run the service locally is with Docker Compose. If you have a reasonably recent version of Docker installed, the following commands, which use a short wrapper script, should work out of the box:

> cd docker-compose
> cp .env.sample .env
> ./runLocal.sh

This configuration downloads and runs the latest (tagged) image from Docker Hub and uses the sample .fa and .fai files included in the repo.

Once up, you should be able to access the service locally on port 8080, e.g. GET http://localhost:8080/health
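
If you would rather check the service from a script than by hand, a minimal sketch along these lines may help. It only assumes what is stated above (a GET on /health at port 8080); treating any 2xx status as "healthy" is an assumption, as are the function name is_healthy and the timeout value.

```python
import urllib.request


def is_healthy(base_url="http://localhost:8080", timeout=5):
    """Return True if GET <base_url>/health answers with a 2xx status.

    Assumption: a 2xx response means the service is up; the response
    body is not inspected because its format is not documented here.
    """
    try:
        with urllib.request.urlopen(base_url + "/health", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError are subclasses of OSError).
        return False


if __name__ == "__main__":
    print("service up" if is_healthy() else "service not reachable")
```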

To point the service at your own files, change the FASTA_FILES_DIR environment variable in .env.

Example Queries

A number of queries, valid against the sample data, are provided as curl commands wrapped in bash scripts. To list them and run one, execute (in a new terminal), for example:

> ls -1 src/test/query
> bash src/test/query/query-genomic-bed.sh

Building Locally

To make and test changes to the service code, first try to build and run the service as-is. To do so, perform the following steps:

1. Add your GitHub credentials to your environment.
2. Build the code (note that Java 17+ is required):
```
> ./gradlew clean generate-jaxrs jar test
```
3. If this succeeds, build a local Docker image:
```
> ./gradlew build-docker
```
4. Uncomment the line SEQUENCE_RETRIEVAL_IMAGE=sequence-retrieval in your .env file.
5. Run the service as above, i.e. cd docker-compose && ./runLocal.sh

Architecture Overview

Reference Files

For each reference, we provide a file with the sequences, as well as an index file describing where each sequence is in the file.

To index a FASTA file of sequences named x.fa:

samtools faidx x.fa
scripts/index_to_sqlite3.sh x.fa.fai x.fa.fai.sqlite

We use SQLite because we may have so many sequences that the index is too large to keep in memory.
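
As a rough illustration of why an on-disk index is enough to locate a sequence without holding the whole index in memory, the sketch below loads .fai records into a SQLite table and computes the byte offset of a base in the FASTA file. The five columns (name, length, offset, bases per line, bytes per line) are the standard samtools faidx layout. The table name fai, its column names, and the two sample records are assumptions made for this example; they are not taken from scripts/index_to_sqlite3.sh.

```python
import sqlite3

# Two made-up .fai records: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH.
SAMPLE_FAI = "chr1\t250\t6\t60\t61\nchr2\t100\t267\t60\t61\n"


def load_index(conn, fai_text):
    """Create a hypothetical `fai` table and fill it from .fai text."""
    conn.execute(
        "CREATE TABLE fai (name TEXT PRIMARY KEY, length INTEGER, "
        "offset INTEGER, linebases INTEGER, linewidth INTEGER)"
    )
    rows = []
    for line in fai_text.splitlines():
        if line:
            f = line.split("\t")
            rows.append((f[0], int(f[1]), int(f[2]), int(f[3]), int(f[4])))
    conn.executemany("INSERT INTO fai VALUES (?, ?, ?, ?, ?)", rows)


def byte_offset(conn, name, pos):
    """Return the byte offset in the FASTA file of 0-based position `pos`."""
    row = conn.execute(
        "SELECT length, offset, linebases, linewidth FROM fai WHERE name = ?",
        (name,),
    ).fetchone()
    if row is None:
        raise KeyError(name)
    length, offset, linebases, linewidth = row
    if not 0 <= pos < length:
        raise IndexError(pos)
    # Skip whole lines (including their newline bytes), then step into the line.
    return offset + (pos // linebases) * linewidth + pos % linebases


conn = sqlite3.connect(":memory:")
load_index(conn, SAMPLE_FAI)
print(byte_offset(conn, "chr1", 0))   # 6
print(byte_offset(conn, "chr1", 60))  # 67
```

Only one row is read per lookup, so the cost does not depend on how many sequences the index holds.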

Query Format

The main query format is .bed. See the BED Wikipedia page for an overview.

The name column of a .bed file should describe each sequence in a way that makes sense to the user, for example a feature description. You can ask for the exon sequence of a gene by using all 12 columns.
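
To make the 12-column case concrete, here is a small sketch of how the exon intervals encoded in a BED12 line can be recovered. It follows the standard BED convention: coordinates are 0-based and half-open, and blockStarts are relative to chromStart. The sample line and the function name exon_intervals are made up for illustration, and the sketch does not show how the service itself parses requests.

```python
def exon_intervals(bed_line):
    """Return a (chrom, start, end) tuple for each block of a BED12 line."""
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected 12 tab-separated BED columns")
    chrom, chrom_start = fields[0], int(fields[1])
    block_count = int(fields[9])
    # blockSizes and blockStarts are comma-separated and may end with a comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if len(sizes) != block_count or len(starts) != block_count:
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    return [
        (chrom, chrom_start + s, chrom_start + s + n)
        for s, n in zip(starts, sizes)
    ]


# Hypothetical two-exon feature named geneA on chr1.
SAMPLE = "chr1\t1000\t2000\tgeneA\t0\t+\t1000\t2000\t0\t2\t100,200,\t0,800,"
for interval in exon_intervals(SAMPLE):
    print(interval)
# ('chr1', 1000, 1100)
# ('chr1', 1800, 2000)
```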

Dependencies

This service requires Java 17+ and Docker for local development.

The main library used for sequence lookups is htsjdk, which does much of the work on the references and .bed features. It is used as a plain Java dependency; a few additional methods have been copied and adapted.
