This service handles uploaded summary statistics files: it validates them, reports errors to the deposition app, and puts valid files in the queue for sumstats file harmonisation and HDF5 loading.
- A Flask app handles POST and GET requests via the endpoints below. Celery worker(s) perform the validation tasks in the background. They can run anywhere the app is installed and the RabbitMQ queue is reachable.
- Python 3.9
- RabbitMQ
- libmagic (e.g. brew install libmagic)
- MongoDB (install it and start the MongoDB service)
- Nextflow
- Clone the repository
git clone https://github.com/EBISPOT/gwas-sumstats-service.git
cd gwas-sumstats-service
- Set up environment
virtualenv --python=python3.9 .env
source .env/bin/activate
- Install
pip install .
pip install -r requirements.txt
- Make sure that the installation is complete.
- Start locally or with docker-compose up.
- To set up a RabbitMQ server, run the tests, and tear it all down:
rm -rf .tox
tox
- Spin up a RabbitMQ server on the port (BROKER_PORT) specified in the config, e.g. rabbitmq-server
- Start the Flask app with gunicorn at http://localhost:8000
  - from gwas-sumstats-service: gunicorn -b 0.0.0.0:8000 sumstats_service.app:app --log-level=debug
- Start a Celery worker for the database side
  - from gwas-sumstats-service: celery -A sumstats_service.app.celery worker --loglevel=debug --queues=postval
- Start a Celery worker for the validation side
  - from gwas-sumstats-service: celery -A sumstats_service.app.celery worker --loglevel=debug --queues=preval
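Before starting the workers, it can help to confirm that the broker and the Flask app are actually listening. The following is a minimal sketch, not part of the service: `port_is_open` is a hypothetical helper, 5672 is RabbitMQ's default AMQP port (substitute your own BROKER_PORT), and 8000 is the gunicorn bind port used above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Assumed ports: 5672 (RabbitMQ default) and 8000 (gunicorn, as above).
    for name, port in (("RabbitMQ", 5672), ("Flask/gunicorn", 8000)):
        state = "up" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

A worker that cannot reach the queue will not pick up validation tasks, so this check is a quick way to tell a connectivity problem apart from a validation failure.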
This section guides you through using Docker-compose to set up and run the gwas-sumstats-service with all necessary services, including Flask, RabbitMQ, Celery, and MongoDB.
- Ensure Docker and Docker-compose are installed on your system.
- Clone the repository:
git clone [repository-url]
- Replace the local Dockerfile and docker-compose file with Dockerfile and docker-compose.yaml, respectively.
- Build the Docker Containers
  Navigate to the cloned directory and build the Docker containers:
  docker-compose build
- Start the Docker Containers
  Spin up the Flask, RabbitMQ, Celery, and MongoDB containers:
  docker-compose up
- Use the CONTAINERISE environment variable to adapt the application's behavior if you require Singularity.
- To debug locally using Docker, update the Dockerfile and local executor configurations in the config file as follows.
...
NEXTFLOW_CONFIG = (
    # "executor.name = 'slurm'\n"
    # "process.executor = 'slurm'\n"
    "executor.name = 'local'\n"
...
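Instead of commenting lines in and out, the executor could be chosen at start-up. This is only a sketch under assumptions: `build_nextflow_config` and the `SUMSTATS_EXECUTOR` variable are hypothetical names that do not exist in the service, and the fragment emits just the two keys shown for slurm above (`executor.name` and `process.executor`), not the elided remainder of the real config.

```python
import os

# Executors mentioned in the config excerpt above.
ALLOWED_EXECUTORS = ("local", "slurm")


def build_nextflow_config(executor: str = "local") -> str:
    """Return a Nextflow config fragment that selects the given executor."""
    if executor not in ALLOWED_EXECUTORS:
        raise ValueError(f"unsupported executor: {executor!r}")
    return (
        f"executor.name = '{executor}'\n"
        f"process.executor = '{executor}'\n"
    )


if __name__ == "__main__":
    # SUMSTATS_EXECUTOR is a hypothetical variable name used for illustration.
    print(build_nextflow_config(os.environ.get("SUMSTATS_EXECUTOR", "local")))
```

Rejecting unknown executor names early keeps a typo from silently producing a config that Nextflow only fails on later.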
- First, deploy rabbitmq using helm
helm install --name rabbitmq --namespace rabbitmq --set rabbitmq.username=<user>,service.type=NodePort,service.nodePort=<port> stable/rabbitmq
- Create Kubernetes secrets for the SSH keys and Globus
kubectl --kubeconfig=<path to config> -n <namespace> create secret generic ssh-keys --from-file=id_rsa=<path/to/id_rsa> --from-file=id_rsa.pub=<path/to/id_rsa.pub> --from-file=known_hosts=<path/to/known_hosts>
kubectl --kubeconfig=<path to config> -n gwas create secret generic globus --from-file=refresh-tokens.json=<path/to/refresh-tokens.json>
- deploy the sumstats service
helm install --name gwas-sumstats k8chart/ --wait
- Start a Celery worker from Docker
docker run -it -d --name sumstats -v /path/to/data/:$INSTALL_PATH/sumstats_service/data -e CELERY_USER=<user> -e CELERY_PASSWORD=<pwd> -e QUEUE_HOST=<host ip> -e QUEUE_PORT=<port> gwas-sumstats-service:latest /bin/bash
docker exec sumstats celery -A sumstats_service.app.celery worker --loglevel=debug --queues=preval
This section provides instructions on how to test the gwas-sumstats-service using Postman. The Postman collection for this service includes requests for submitting summary statistics and retrieving their validation status. Please find the collection here.
- Ensure you have Postman installed.
- Import the Postman collection gwas-sumstats-service (ID: e03dcb59-01cb-411b-a8d0-b216e2860c9f) into your Postman application.
- Submit Summary Statistics
  - Use the POST {{protocol}}://{{host}}:{{port}}/v1/sum-stats request to submit summary statistics.
  - Update the id field in the request body with a unique identifier. Example body for a valid file submission:
    {
      "requestEntries": [
        {
          "id": "{{callbackId}}",
          "filePath": "test_sumstats_file.tsv",
          "md5": "9b5f307016408b70cde2c9342648aa9b",
          "assembly": "GRCh38",
          "readme": "optional text",
          "entryUUID": "ABC1234",
          "minrows": "2"
        }
      ]
    }
  - For an invalid file submission, modify the filePath and other relevant fields accordingly.
  - Note the returned callbackID from the response for the next step.
- Retrieve Validation Status
  - Use the GET {{protocol}}://{{host}}:{{port}}/v1/sum-stats/<callbackID> request to retrieve the status of your submission.
  - Replace <callbackID> with the ID obtained from the previous POST request.
  - The response will indicate the validation status of the submission.
- In case of an invalid submission, access the Docker container's shell as root to inspect the validation logs and output files:
  root@container-id:/sumstats_service# ls depo_ss_validated/<callbackID>/
  - Check the nextflow.log for detailed execution logs:
    root@container-id:/sumstats_service# cat depo_ss_validated/<callbackID>/logs/nextflow.log
- The collection includes two primary requests: POST sum-stats for submission and GET sum-stats for status retrieval.
- Variables such as {{protocol}}, {{host}}, and {{port}} are pre-defined in the collection for ease of use.
- Each request includes appropriate headers and request bodies as per the API specifications.
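Each request entry carries an md5 value for the file it points to, so it is worth computing that checksum locally before submitting. A minimal sketch using only the standard library; `md5_of_file` is a hypothetical helper name, not a function provided by the service.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large sumstats files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string can be pasted into the md5 field of the request body in Postman or in a curl payload.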
curl -i -H "Content-Type: application/json" -X POST -d '{"requestEntries":[{"id":"abc123","filePath":"https://raw.githubusercontent.com/EBISPOT/gwas-sumstats-service/master/tests/test_sumstats_file.tsv","md5":"a1195761f082f8cbc2f5a560743077cc","assembly":"GRCh38", "readme":"optional text", "entryUUID": "globusdir"},{"id":"bcd234","filePath":"https://raw.githubusercontent.com/EBISPOT/gwas-sumstats-service/master/tests/test_sumstats_file.tsv","md5":"a1195761f082f8cbc","assembly":"GRCh38", "entryUUID": "globusdir"}]}' http://localhost:8000/v1/sum-stats
HTTP/1.0 201 CREATED
Content-Type: application/json
Content-Length: 26
Server: Werkzeug/0.15.4 Python/3.6.5
Date: Wed, 17 Jul 2019 15:15:23 GMT
{"callbackID": "TiQS2yxV"}
curl http://localhost:8000/v1/sum-stats/TiQS2yxV
{
"callbackID": "TiQS2yxV",
"completed": false,
"statusList": [
{
"id": "abc123",
"status": "VALID",
"error": null
},
{
"id": "bcd234",
"status": "INVALID",
"error": "md5sum did not match the one provided"
}
]
}
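The same submit-then-check flow can be scripted. The sketch below assumes the service is reachable at http://localhost:8000, as in the curl examples, and relies only on the request and response shapes shown above; the function names are illustrative and not part of the service.

```python
import json
import urllib.request

# Assumed base URL, matching the curl examples above.
BASE_URL = "http://localhost:8000/v1/sum-stats"


def build_payload(entries):
    """Wrap request entries in the body shape the POST endpoint expects."""
    return {"requestEntries": list(entries)}


def submit(entries, base_url=BASE_URL):
    """POST the entries and return the callbackID from the response."""
    body = json.dumps(build_payload(entries)).encode("utf-8")
    request = urllib.request.Request(
        base_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["callbackID"]


def get_status(callback_id, base_url=BASE_URL):
    """GET the validation status for a previously returned callbackID."""
    with urllib.request.urlopen(f"{base_url}/{callback_id}") as response:
        return json.load(response)


def invalid_entries(status):
    """Map the id of each INVALID entry to its error message."""
    return {
        entry["id"]: entry["error"]
        for entry in status["statusList"]
        if entry["status"] == "INVALID"
    }
```

With a running service, `get_status(submit(entries))` returns a dictionary like the JSON above, and `invalid_entries` reduces it to the entries that need attention.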
Follow these steps to set up FormatLint:
Create a new virtual environment for the project to manage dependencies separately from your global Python setup:
python -m venv formatlint
Activate the virtual environment:
source formatlint/bin/activate
Install the required Python packages:
pip install -r requirements.dev.txt
Execute the formatting and linting script:
./format-lint