This repository was archived by the owner on Nov 15, 2019. It is now read-only.

Commit 9e791ae

Merge branch 'release/0.1.0'

2 parents 775e202 + b2f9de7

37 files changed, 25028 insertions(+), 0 deletions(-)

.flake8

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+[flake8]
+max-line-length=120
+ignore: E301, E302, E401, E261, E265, E226, F401, E501
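The `ignore:` line above uses a colon rather than `=`; both are valid INI key/value delimiters, so flake8 reads them identically. A quick check with Python's `configparser` (illustrative snippet, not part of the commit):

```python
import configparser

# flake8 reads its settings with an INI-style parser, which accepts
# both "key = value" and "key: value" delimiters.
config = configparser.ConfigParser()
config.read_string("""
[flake8]
max-line-length=120
ignore: E301, E302, E401, E261, E265, E226, F401, E501
""")

ignored = [code.strip() for code in config["flake8"]["ignore"].split(",")]
print(config["flake8"]["max-line-length"])  # 120
print(ignored[0], ignored[-1])              # E301 E501
```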

.gitignore

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+
+# Reminder:
+# - A leading slash means the pattern is anchored at the root.
+# - No leading slash means the pattern matches at any depth.
+
+# Python files
+*.pyc
+__pycache__/
+.mypy_cache/
+
+# IntelliJ IDEA / PyCharm project files
+/.idea
+/*.iml
+
+# MyPy
+.mypy_cache
+
+# PyCharm JIRA plugin
+atlassian-ide-plugin.xml
+
+# Virtualenv
+.venv
+
+# Development build artifacts
+*.egg-info
+
+# OS X metadata files
+.DS_Store
+
+# Temporary folder
+/tmp
+
+# Travis-ci (don't want to accidentally commit google key)
+client-secret.json
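The anchoring rule stated in the file's leading comments can be sketched with a toy matcher (hypothetical helper; real gitignore semantics cover many more cases, such as directory-only patterns and `**`):

```python
from fnmatch import fnmatch

def matches(pattern: str, path: str) -> bool:
    """Toy gitignore-style match: a leading slash anchors the pattern
    at the repository root; an unanchored pattern may match a path
    component at any depth."""
    if pattern.startswith("/"):
        return fnmatch(path, pattern[1:])
    return any(fnmatch(part, pattern) for part in path.split("/"))

print(matches("/tmp", "tmp"))           # True  (anchored, matches at root)
print(matches("/tmp", "src/tmp"))       # False (anchored, so no match deeper)
print(matches("*.pyc", "pkg/mod.pyc"))  # True  (unanchored, any depth)
```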

.travis.yml

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+language: python
+python:
+- '3.6'
+before_install:
+- openssl aes-256-cbc -K $encrypted_e70c9f59db9f_key -iv $encrypted_e70c9f59db9f_iv
+  -in client-secret.json.enc -out client-secret.json -d
+install:
+- make develop
+script:
+- make test
+deploy:
+  provider: pypi
+  on:
+    tags: true
+  user: jessebrennan
+  password:
+    secure: NTzkGJ6KUlyVxkyD5DjnnpwwT4mKCfaFzsSrLv9TWBlpk0YF0xBiOSLoK1yegLfWjPendPMfx+k54BOv9WZbZV95BFxWXdk0WpeZhfw2qoqzddPZtkWXXgU926kwM/DXb1X117iUzfG26oRoRfciccEiNgFq9ikEY0xDKJEyo3IquOqPpn6GYbTD6WcsDOoMbk24KXI1l/BGOsG93yfDCYg8iEIqGjY1SioUO5vAoggwY+rV/MAt0GpRM5zPh2XycbAjI1MBNwxIq5kc+Q0y2sOi5Cnj0EN+QpuLoUrpwOKEC7VJk0BaOzqDKvOrQYT6g6bFpT8u2Ry8ekggusbUQ7O3W2fnjoapWqPfbC3Q8+rqf8K1dsWeSv0j9zlTWNEtowaoPc5tenSiTntS9iHlP1Z+TlKvlo9bTif97PsZ0HNsjV2aReRlbUusSsQl6lU2XIs4TbOIesf5+/ju4LzacbLws8bvKpGdRJL1T5Qu6IVIk3Wk4Nv4EHMPJKovw0Yomrpa4ccmv2nQ5J3e7nU52DxkRPh6sZLQaKafuETYbcMN5EZI6RsmQ7cPMr3uaGzJHuRDEgIwTVcpC1tXAtTTLjEMMLs8TPU6rCTKdGi1MMe1+72sPjipNJWA0ZMMAZHkhKTmBV0FwfMOuDhR0ZBvW3OzbxZtIZdoMgoygFh3hSE=

MANIFEST.in

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+include README.md VERSION release.py
+include *.txt

Makefile

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+
+include common.mk
+MODULES=loader transformer scripts tests datasets/topmed/topmed_107_open_access
+
+all: test
+
+lint:
+	flake8 $(MODULES)
+
+mypy:
+	mypy --ignore-missing-imports $(MODULES)
+
+check_readme:
+	python setup.py check -s
+
+tests:=$(wildcard tests/test_*.py)
+
+# A pattern rule that runs a single test module, for example:
+# make tests/test_gen3_input_json.py
+
+$(tests): %.py : mypy lint check_readme
+	python -m unittest --verbose $*.py
+
+test: $(tests)
+
+develop:
+	pip install -e .
+	pip install -r requirements-dev.txt
+
+undevelop:
+	python setup.py develop --uninstall
+	pip uninstall -y -r requirements-dev.txt
+
+.PHONY: all lint mypy test
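The `tests:=$(wildcard tests/test_*.py)` assignment plus the static pattern rule means `make test` runs every discovered test module as its own unittest invocation. The same discovery could be sketched in Python (illustrative, assuming the repo layout above; `module_name` is a hypothetical helper):

```python
import glob
import subprocess

def module_name(path: str) -> str:
    """tests/test_x.py -> tests.test_x (dotted form for `python -m unittest`)."""
    return path[:-3].replace("/", ".")

def run_each_test_module(pattern: str = "tests/test_*.py") -> None:
    # Mimics `test: $(tests)`: one unittest run per matching module,
    # like the Makefile's static pattern rule, failing fast on error.
    for path in sorted(glob.glob(pattern)):
        subprocess.run(
            ["python", "-m", "unittest", "--verbose", module_name(path)],
            check=True,
        )

print(module_name("tests/test_gen3_input_json.py"))  # tests.test_gen3_input_json
```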

README.md

Lines changed: 79 additions & 0 deletions
@@ -1,2 +1,81 @@
 # cgp-dss-data-loader
 Simple data loader for CGP HCA Data Store
+
+## Common Setup
+1. **(optional)** We recommend using a Python 3
+   [virtual environment](https://docs.python.org/3/tutorial/venv.html).
+
+1. Run:
+
+   `pip3 install cgp-dss-data-loader`
+
+## Setup for Development
+1. Clone the repo:
+
+   `git clone https://github.com/DataBiosphere/cgp-dss-data-loader.git`
+
+1. Go to the root directory of the cloned project:
+
+   `cd cgp-dss-data-loader`
+
+1. Make sure you are on the branch `develop`.
+
+1. Run (ideally in a new [virtual environment](https://docs.python.org/3/tutorial/venv.html)):
+
+   `make develop`
+
+## Cloud Credentials Setup
+Because this program uses Amazon Web Services and Google Cloud Platform, you will need to set up credentials
+for both of these before you can run the program.
+
+### AWS credentials
+1. If you haven't already you will need to make an IAM user and create a new access key. Instructions are
+   [here](https://docs.aws.amazon.com/general/latest/gr/managing-aws-access-keys.html).
+
+1. Next you will need to store your credentials so that Boto can access them. Instructions are
+   [here](https://boto3.readthedocs.io/en/latest/guide/configuration.html).
+
+### GCP credentials
+1. Follow the steps [here](https://cloud.google.com/docs/authentication/getting-started) to set up your Google
+   Credentials.
+
+## Running Tests
+Run:
+
+`make test`
+
+## Getting Data from Gen3 and Loading it
+
+1. The first step is to extract the Gen3 data you want using the
+   [sheepdog exporter](https://github.com/david4096/sheepdog-exporter). The TopMed public data extracted
+   from sheepdog is available [on the release page](https://github.com/david4096/sheepdog-exporter/releases/tag/0.3.1)
+   under Assets. Assuming you use this data, you will now have a file called `topmed-public.json`
+
+1. Make sure you are running the virtual environment you set up in the **Setup** instructions.
+
+1. Now we need to transform the data. We can transform to the outdated gen3 format, or to the new standard format.
+
+   - For the standard format, follow instructions at
+     [newt-transformer](https://github.com/jessebrennan/newt-transformer#transforming-data-from-sheepdog-exporter).
+
+   - For the old Gen3 format, run this from the root of the project:
+
+     ```
+     python transformer/gen3_transformer.py /path/to/topmed_public.json --output-json transformed-topmed-public.json
+     ```
+
+1. Now that we have our new transformed output we can run it with the loader.
+
+   If you used the standard transformer use the command:
+
+   ```
+   dssload --no-dry-run --dss-endpoint MY_DSS_ENDPOINT --staging-bucket NAME_OF_MY_S3_BUCKET standard --json-input-file transformed-topmed-public.json
+   ```
+
+   Otherwise for the outdated gen3 format run:
+
+   ```
+   dssload --no-dry-run --dss-endpoint MY_DSS_ENDPOINT --staging-bucket NAME_OF_MY_S3_BUCKET gen3 --json-input-file transformed-topmed-public.json
+   ```
+
+1. You did it!

VERSION

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+0.1.0

client-secret.json.enc

2.33 KB
Binary file not shown.

common.mk

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+SHELL=/bin/bash
+
+ifeq ($(findstring Python 3.6, $(shell python --version 2>&1)),)
+$(error Please run make commands from a Python 3.6 virtualenv)
+endif
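The `findstring` guard in `common.mk` aborts any make invocation whose interpreter is not a 3.6.x Python. The check is a plain substring test, which can be mirrored in Python (illustrative; `is_python_36` is a hypothetical helper):

```python
def is_python_36(version_output: str) -> bool:
    """Mirror common.mk's $(findstring Python 3.6, ...) guard:
    a substring test that passes for any 3.6.x interpreter."""
    return "Python 3.6" in version_output

# `python --version` wrote to stderr on Python 2 and stdout on newer
# releases, which is why the Makefile redirects with 2>&1 first.
print(is_python_36("Python 3.6.5"))  # True
print(is_python_36("Python 3.7.0"))  # False
```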

datasets/topmed/topmed_107_open_access/__init__.py

Whitespace-only changes.
