This repo contains source code related to the OurDNA Browser.
OurDNA Browser is a CPG-customised version of the gnomAD browser.
It depends on:
The terraform folder contains the infrastructure setup originally provided by the Garvan Institute of Medical Research.
The makefile is a work in progress, originally developed by the Garvan Institute of Medical Research.
A major challenge is that the gnomAD code contains many hardcoded strings. For example, the cluster name is always 'gnomad', and the same name is used to create GCP buckets, but GCP bucket names must be unique across all of Google Cloud. This repository tries to address these cases by using environment variables where possible.
- python3 (minimum 3.8)
- terraform
- docker
- make
- kustomize (e.g. snap install kustomize)
- gcloud: gke-gcloud-auth-plugin, kubectl
- service account on GCP with private key
- Google bucket where terraform state is going to be stored (tf-remote-state)
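
Before starting, it can help to confirm that the command-line tools listed above are on the PATH. The following is a minimal sketch, not part of the makefile: the function name `check_tools` is hypothetical, and it assumes each tool is installed under the binary name shown in the list.

```shell
# Report which of the named command-line tools are missing from PATH.
# Prints one line per missing tool and returns non-zero if any is missing.
# (check_tools is a hypothetical helper, not a target in this repository.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call with the tools from the prerequisites list:
# check_tools python3 terraform docker make kustomize gcloud kubectl gke-gcloud-auth-plugin
```

This only checks that the binaries exist; it does not verify versions (for example the python3 minimum of 3.8) or the GCP service account and bucket.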
- Create a .env file with all the env variables (look at example.env)
- Create terraform.tfvars in the terraform folder (look at terraform.tfvars.example)
- Load the environment variables:
source .env
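
Variables assigned in a sourced file are only visible to child processes such as make if they are exported. Whether example.env uses `export` lines is not stated here, so the sketch below assumes plain NAME=value lines and uses `set -a` to export everything the file defines. The helper names `load_env` and `require_env` are hypothetical; the variable names in the example are the ones used elsewhere in this document.

```shell
# Source an env file so that every variable it defines is exported.
# `set -a` marks all variables assigned afterwards for export; `set +a` turns it off.
load_env() {
    env_file="${1:-.env}"
    # Prefix a bare file name with ./ so that `.` does not search PATH for it.
    case "$env_file" in
        */*) ;;
        *) env_file="./$env_file" ;;
    esac
    [ -f "$env_file" ] || { echo "no such file: $env_file" >&2; return 1; }
    set -a
    . "$env_file"
    set +a
}

# Return non-zero if any named variable is unset, empty, or not exported.
require_env() {
    status=0
    for name in "$@"; do
        if [ -z "$(printenv "$name")" ]; then
            echo "not set or not exported: $name" >&2
            status=1
        fi
    done
    return "$status"
}

# Example:
# load_env .env && require_env GNOMAD_PROJECT_PATH OUTPUT_BUCKET
```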
- Initialise terraform, providing the tf-remote-state bucket created earlier on GCP:
make tf-init
- Configure / set initial variables:
make config
- Preview the configuration values:
make config-ls
- Authenticate GCP service account:
make gcloud-auth
- Create the cluster (type 'yes' when prompted); this step might take a long time:
make tf-apply
- Configure Kubernetes:
make kube-config
- Prepare ES cluster master nodes:
make eck-create
make eck-apply
- Wait a bit for nodes to start, then check if running:
make eck-check
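
Rather than waiting by hand, the check can be polled in a loop. This is a generic retry sketch with a hypothetical helper name, `wait_for`. It only works with `make eck-check` if that target exits non-zero while the nodes are not yet running, which is an assumption about the makefile.

```shell
# Re-run a command until it succeeds or the attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Sleep between attempts, but not after the final one.
        if [ "$i" -lt "$attempts" ]; then
            echo "attempt $i/$attempts failed, retrying in ${delay}s" >&2
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example (assumes eck-check fails while nodes are still starting):
# wait_for 30 10 make -s eck-check
```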
- Create the ES server:
make elastic-create
- Create Redis server
make redis-create
- Wait a bit for ES disks to be created
- Forward ES port so we can talk to it
make forward-es-http
- Store ES password for later use
export ELASTICSEARCH_PASSWORD=$(make -s es-secret-get)
make es-secret-create
- Create 'browser/build.env' in the gnomad-browser location ($GNOMAD_PROJECT_PATH) and provide the gnomAD (OurDNA Browser) API URL:
echo 'GNOMAD_API_URL="https://ourdna-dev.popgen.rocks/api"' > $GNOMAD_PROJECT_PATH/browser/build.env
- Build all components:
make docker
- Create new deployment:
make deploy-create
- Deploy:
make deploy-apply
- Preview all deployments:
make deployments-list
- Set up Ingress (TODO: get static IP address working). The Cloud Armor policy 'deny-problematic-requests' must be present before running the command below; see https://stackoverflow.com/questions/68944745/is-there-a-workaround-to-attach-a-cloud-armor-policy-to-a-load-balancer-created
make ingress-apply
- TODO fix this one - different for DEV and PRD
make ingress-describe
- Wait up to 5 minutes for the IP to be allocated:
make ingress-get
kubectl get ingress
NAME CLASS HOSTS ADDRESS PORTS AGE
gnomad-ingress-demo-ourdna-browser-dev-green <none> * 34.36.115.66 80 3h55m
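
To pick the allocated address out of output like the above in a script, the ADDRESS column can be extracted with awk. This is a sketch with a hypothetical function name, `ingress_addresses`. It assumes the header row shown above and an IPv4 address; a row whose address has not been allocated yet is skipped rather than misread.

```shell
# Print "NAME ADDRESS" for each ingress that already has an IPv4 address.
# Reads `kubectl get ingress` output on stdin and finds the ADDRESS column
# from the header row instead of relying on a fixed position.
ingress_addresses() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "ADDRESS") col = i; next }
        col && $col ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print $1, $col }
    '
}

# Example:
# kubectl get ingress | ingress_addresses
```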
- Set up your favourite Python environment.
- Install requirements:
pip install setuptools
pip install -r $GNOMAD_PROJECT_PATH/data-pipeline/requirements.txt
- Start dataproc cluster - this might take a while
make es-dataproc-start
- Add permissions to existing data-pipeline service account so it can access ES secrets
make es-secret-add
- Have hail tables ready in $OUTPUT_BUCKET
- Load dataset:
make DATASET=clinvar_grch38_variants es-load
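
When several datasets need loading, the same make call can be wrapped in a loop. The sketch below uses a hypothetical helper, `load_datasets`, with a DRY_RUN switch that is also an assumption of this example and not a makefile feature. Only clinvar_grch38_variants is named in this document; other dataset names would have to come from the data pipeline.

```shell
# Load each named dataset in turn, stopping at the first failure.
# Set DRY_RUN=1 to print the make commands instead of running them.
load_datasets() {
    for dataset in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "make DATASET=$dataset es-load"
        else
            make DATASET="$dataset" es-load || return 1
        fi
    done
}

# Example:
# DRY_RUN=1 load_datasets clinvar_grch38_variants
```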
- Review the loaded indexes:
make es-show-indices
- Show disk space on the ES cluster:
make es-show-space
- When loading is done, shut down the ES loading dataproc cluster to lower the cost (it also shuts itself down after an hour of inactivity):
make es-dataproc-stop
- Stop port forwarding:
ps -ef | grep port-forward
kill PID
- Delete all:
make ingress-delete
make deployments-local-clean
make deployments-cluster-delete
make es-secret-delete
- Finally destroy GCP cluster:
make tf-destroy
- Check for any VM disks that might still be present, especially those created by ES-create terraform