@@ -8,52 +10,59 @@ ggd-cli: "The command line interface for gogetdata"

 The command-line interface to Go Get Data (GGD).

-Search, and install genomic and omic data packages. Build and check new ggd data packages.
+Search and install genomic data packages. Build and check new ggd data packages.

-ggd provides easy access to processed omic data. It removes the difficulties and complexities with finding and processing the data sets and annotations germane to your experiments and/or analyses. You can quickly and easily search and install data package using ggd. ggd also offers tools to easily create and contribute data packages to ggd. (From more information see the [ggd docs](https://gogetdata.github.io/index.html#).
+ggd provides easy access to processed genomic data. It removes the difficulties and complexities of finding and processing the data sets and annotations germane to your experiments and/or analyses. You can quickly and easily search and install data packages using ggd. ggd also offers tools to easily create and contribute data packages to ggd. (For more information see the [ggd docs](https://gogetdata.github.io/index.html#).)

 **The documentation for ggd is available** [here](https://gogetdata.github.io/index.html#) and contains detailed information about the ggd system, including installing ggd, using ggd, available data packages, etc. The information below provides a quick overview of using ggd, but we encourage you to visit the [ggd docs](https://gogetdata.github.io/index.html#) for detailed information and questions you may have.

-You can also vist the [ggd docs: quick-start](https://gogetdata.github.io/quick-start.html) page to start using ggd quickly.
+You can also visit the [ggd docs: quick-start](https://gogetdata.github.io/quick-start.html) page to start using ggd quickly.
+
+You can request a new data recipe be added to GGD by filling out the [GGD Recipe Request](https://forms.gle/3WEWgGGeh7ohAjcJA) Form.

 ## Setting up ggd

-Assuming that you have already installed an *ananconda*distrubtion on your system, you can run the following commands to set up ggd.
+Assuming that you have already installed an *anaconda* distribution on your system, you can run the following commands to set up ggd.

-(NOTE: If you have not installed an anaconda distribution on your system please install it. We suggest using [miniconda](https://conda.io/en/latest/miniconda.html))
+> **_NOTE:_** If you have not installed an anaconda distribution on your system, please install it. We suggest using [miniconda](https://conda.io/en/latest/miniconda.html).

-1) Adding the ggd-genomics conda channel:
-- ggd data packages are hosted on the conda cloud through the ggd-genomics channel. You will need to add this channel to your configured conda channels. You will also need to add the channels that have the software dependencies for building these data pacakges. Run the following commands:
+1) Adding the required conda channels, including ggd-specific channels:
+
+- ggd data packages are hosted on the Anaconda cloud through the ggd-genomics channel. You will need to add this channel to your configured conda channels. You will also need to add the channels that have the software dependencies for building these data packages. Run the following commands:

 ```
+$ conda config --add channels defaults
 $ conda config --add channels ggd-genomics
 $ conda config --add channels bioconda
 $ conda config --add channels conda-forge
 ```

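After adding the channels, it can be useful to confirm that all four are configured. The sketch below is one way to do that: it reads the output of `conda config --show channels` and prints any channel from the list above that is missing. The helper name `missing_channels` is hypothetical and is not part of conda or ggd.

```shell
# Sketch: print each required channel that is absent from a conda channel
# listing read on stdin. `missing_channels` is a hypothetical helper.
missing_channels() (
    listing="$(cat)"
    for channel in defaults ggd-genomics bioconda conda-forge; do
        # `conda config --show channels` prints one "  - <name>" line per channel
        if ! printf '%s\n' "$listing" | grep -Eq "^[[:space:]]*-[[:space:]]+${channel}[[:space:]]*$"; then
            printf '%s\n' "$channel"
        fi
    done
)

# Only query conda when it is actually installed.
if command -v conda >/dev/null 2>&1; then
    conda config --show channels | missing_channels
fi
```

No output means every channel was found; otherwise each missing channel is printed on its own line.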
-2) Installing the required software packages:
-- The ggd tool requires certain software packages to be installed on your system. To install these software packages run the following command:
+2a) Installing ggd:
+
+- The ggd cli can be installed by conda, and this is the recommended way to do it.
- The ggd command line tool can be installed using the following command:
+- ggd can also be installed through GitHub. Conda is required and it is still recommended that you install with conda. Below is an additional option you can use to install ggd.
ggd is now set up on your system and you should be able to run `ggd`. Test that ggd has been installed by running:
+Now that ggd is installed on your system you should be able to run `ggd`. Test that ggd has been installed by running:

 ```
 $ ggd -h
 ```


-## ggd tools
+## ggd commands

 ### ggd search

@@ -64,7 +73,7 @@ If you need the GRCh38 reference genome you can use ggd to search and install it
 the desired data package:

 ```
-$ ggd search -t reference genome
+$ ggd search reference genome
 ```

 You can further filter the results using additional options with `ggd search`. Run `ggd search -h` to see all options.
@@ -76,17 +85,17 @@ For more information about ggd's search tool see: [ggd docs: ggd search](https:/

 You can install any ggd data package using the `ggd install` tool.

-If you need the GRCh38 reference genome, and you have used `ggd search -t reference genome` to identify which reference-genome data package you want to install, you can use ggd to install that data package.
+If you need the GRCh38 reference genome, and you have used `ggd search reference genome` to identify which reference-genome data package you want to install, you can use ggd to install that data package.

 ```
-$ ggd install grch38-reference-genome
+$ ggd install grch38-reference-genome-ensembl-v1
 ```

 The output from this command will provide the locations where the files were installed, as well as an environment variable that you can use to quickly access the files.
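As a rough illustration of using that environment variable, the sketch below looks up the installation directory for the package installed above. It assumes the variable is named `ggd_<package name with "-" replaced by "_">_dir` and is exported into the environment; check the actual name in the output of `ggd install` or `ggd show-env`. The helper `ggd_dir_var` is hypothetical.

```shell
# Hypothetical helper: derive the directory variable name for a package,
# ASSUMING the pattern ggd_<package name with "-" replaced by "_">_dir.
ggd_dir_var() {
    printf 'ggd_%s_dir\n' "$(printf '%s' "$1" | sed 's/-/_/g')"
}

var_name="$(ggd_dir_var grch38-reference-genome-ensembl-v1)"

# printenv only sees exported variables; an unset variable yields an empty string.
data_dir="$(printenv "$var_name" || true)"

if [ -n "$data_dir" ]; then
    ls "$data_dir"
else
    echo "\$$var_name is not set; run 'ggd show-env' to check its status" >&2
fi
```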

-**NOTE: If you want to move the files PLEASE make a copy and move the copy. Moving the original files from the location ggd installed them will remove ggd's ability to manage those data files.**
+> **_NOTE:_** If you want to move the files, PLEASE make a copy and move the copy. Moving the original files from the location where ggd installed them will remove ggd's ability to manage those data files.
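A minimal sketch of the copy-then-move advice: copy the installed files to a new location and leave the originals where ggd put them. The helper name `copy_ggd_data` and both paths are placeholders chosen for illustration.

```shell
# Sketch: copy installed data files elsewhere, leaving the originals in place.
# `copy_ggd_data` is a hypothetical helper; both paths come from the caller.
copy_ggd_data() {
    src_dir="$1"
    dest_dir="$2"
    mkdir -p "$dest_dir" || return 1
    # "cp -R src/." copies the directory contents, including dotfiles
    cp -R "$src_dir"/. "$dest_dir"/
}
```

For example, `copy_ggd_data "$installed_dir" ~/my-analysis/reference`, where `installed_dir` holds the install location reported by `ggd install`.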

-For more information about ggd's install tool see: [ggd docs: ggd install](https://gogetdata.github.io/ggd-install.html)
+For more information about ggd's install tool see: [ggd docs: ggd install](https://gogetdata.github.io/install.html)


 ### ggd uninstall
@@ -95,40 +104,59 @@ You can uninstall any ggd data package that has previously been installed by ggd

 The ggd uninstall tool provides file and system-wide handling for ggd packages. Problems may occur if you do not use the `ggd uninstall` tool to uninstall and remove the un-needed data packages.

-If you no longer needed or wanted the GRCh38 reference genome installed from above you can use ggd to remove it from your system.
+If you no longer need or want the GRCh38 reference genome installed from above you can use ggd to remove it from your system.

 ```
-ggd uninstall grch38-reference-genome
+ggd uninstall grch38-reference-genome-ensembl-v1

 ```

 For more information about ggd's uninstall tool see: [ggd docs: ggd uninstall](https://gogetdata.github.io/uninstall.html)


-### Aditional ggd tools for file management
+### Additional ggd tools for file management

 ggd has additional tools available to find, access, and use the data installed by ggd.

 These tools include:

-`ggd list-files`
-- Shows files that have been installed locally from a ggd recipe.
+`ggd list`
+
+- Get a list of installed data files.
+
+`ggd get-files`
+
+- Get files that have been installed locally from a ggd recipe.

 `ggd pkg-info`
+
 - Show the information for a specific data package installed by ggd.

 `ggd show-env`
+
 - Shows the status of variables available in the conda environment. This is important as the installation of a new ggd package will create a new environment variable to access data installed with the package, but will not always activate that variable.
 - The environment variables store the location of the installation directory for the package. When activated, these variables can be used to simplify data access.

 You can get more information about each of these tools on the ggd docs pages.
GGD utilizes conda environments. To facilitate the use of different conda environments, some ggd commands use a `--prefix` argument. This `--prefix` argument
+can be used to install, list, and even access data files in a different conda environment than the one you are actively working in.
+
+The prefix capability of ggd allows users to install all data from ggd into a specific conda environment, and access that data all without having to be in
+that conda environment. This helps to reduce duplicate data installs on your system, as well as provide a means to access data in any environment you are using
+as long as ggd is installed in that environment.
+
+
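The `--prefix` workflow described above might look like the following sketch. The environment name `data-env` is hypothetical, and the exact options each command accepts should be confirmed with `ggd install -h`, `ggd list -h`, and `ggd get-files -h`. Commands are only printed by default so the sketch is safe to run anywhere; set `DRY_RUN=0` to execute them.

```shell
# Sketch of the --prefix workflow. "data-env" is a hypothetical environment
# name. With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

data_env="data-env"
pkg="grch38-reference-genome-ensembl-v1"

run ggd install "$pkg" --prefix "$data_env"     # install into data-env
run ggd list --prefix "$data_env"               # list what is installed there
run ggd get-files "$pkg" --prefix "$data_env"   # locate the files from the current environment
```

Keeping all data in one environment this way is what avoids duplicate installs across environments.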
 ## Contributing to ggd

-We intend ggd to become a widely used omics data management system. If this effort we encourage and invite everyone to contribute to the ggd recipe repository. ggd provides multiple tools to create and check data recipes that can be added to ggd. If you have data you would like to be hosted on ggd, whether your own or from somewhere else, please either use ggd to make the recipe or request it in the Issue section.
+We intend ggd to become a widely used genomics data management system. In this effort we encourage and invite everyone to contribute to the ggd recipe repository.
+ggd provides multiple tools to create and check data recipes that can be added to ggd. If you have data you would like to be hosted on ggd, whether your own or
+from somewhere else, please either use ggd to make the recipe or request a new data recipe be added.

 For more information about contributing data recipes/packages to ggd please see [ggd docs: contribute](https://gogetdata.github.io/contribute.html)

@@ -138,17 +166,19 @@ Two scripts are available to assist you in making and checking recipes

 Make a recipe from a bash script that is likely to pass the tests in ggd-recipes.

-Most of the arguments are required. For example, we don't want a recipe to litter
+Most of the arguments in `ggd make-recipe` are required. Any recipe created should be able to clean up
+after it has finished processing the data files. For example, we don't want a recipe to litter
 the user-space with extra files so if the recipe downloads a `.zip`, and processes
 the files inside of it, it should clean-up (`rm`) the .zip file upon completion.
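The clean-up rule can be sketched as a small recipe step that removes the downloaded archive once its contents have been extracted. This is an illustration only: `tar` with gzip stands in for the `.zip` handling mentioned above, and `process_archive` is a hypothetical name, not something ggd provides.

```shell
# Sketch of a recipe step that cleans up after itself. tar/gzip stand in for
# the .zip handling described above; `process_archive` is a hypothetical name.
process_archive() {
    archive="$1"
    out_dir="$2"
    mkdir -p "$out_dir" || return 1
    # Extract first; remove the archive only if extraction succeeded
    tar -xzf "$archive" -C "$out_dir" && rm -f "$archive"
}
```

Removing the archive only after a successful extraction means a failed run leaves the download in place for inspection.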


 You can run `ggd make-recipe -h` to get all the parameters needed to make a recipe. You can also see them
 at [ggd docs: ggd make-recipe](https://gogetdata.github.io/make-recipe.html).

-To make a recipe you need to start with a bash script that downloads and processes the desired omic data. For example, if you wanted to make a recipe for the GRCh37 reference genome hosted by 1000 Genomes your bash script would look something like this:
+To make a recipe you need to start with a bash script that downloads and processes the desired data. For example, if you wanted to
+make a recipe for the GRCh37 reference genome hosted by 1000 Genomes your bash script would look something like this:
@@ -166,30 +196,35 @@ With this bash script, you can now create a ggd recipe using `ggd make-recipe`.
 ggd make-recipe \
 -s Homo_sapiens \
 -g GRCh37 \
---author me \
---ggd_version 1 \
---data_version phase2_reference \
+--author <your name> \
+--package-version 1 \
+--data-version phase2_reference \
+--data-provider 1000G \
+-cb "NA" \
 --summary 'GRCh37 reference genome from 1000 genomes' \
--k ref -k reference \
-reference-genome \
+-k ref \
+-k reference \
+--name reference-genome \
 recipe.sh
 ```

-Running `ggd make-recipe` will create a new "directory" *recipe* with multiple processing files. For the GRCh37 reference genome recipe made above the directory/recipe will be called "grch37-reference-genome"
+Running `ggd make-recipe` will create a new "directory" *recipe* with multiple processing files. For the GRCh37 reference genome recipe made above
+the directory/recipe will be called "grch37-reference-genome-1000g-v1"
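The recipe name in this example appears to combine the genome build, the recipe name, the data provider, and the package version. The sketch below reproduces that pattern; it is inferred from this single example rather than taken from ggd, and `recipe_name` is a hypothetical helper.

```shell
# Hypothetical helper: compose a recipe name as
# <genome build>-<name>-<data provider>-v<package version>, lowercased.
# The pattern is inferred from the one example above, not from ggd itself.
recipe_name() {
    printf '%s-%s-%s-v%s\n' "$1" "$2" "$3" "$4" | tr '[:upper:]' '[:lower:]'
}

recipe_name GRCh37 reference-genome 1000G 1
# prints: grch37-reference-genome-1000g-v1
```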

 For more information about ggd's make-recipe tool see: [ggd docs: ggd make-recipe](https://gogetdata.github.io/make-recipe.html).

 ## ggd check-recipe

-Use `ggd check-recipe` after you have created a new recipe with `ggd make-recipe`. Running `ggd check-recipe` will run the same checks as our testing framework. It will build and install the recipe.
+Use `ggd check-recipe` after you have created a new recipe with `ggd make-recipe`. Running `ggd check-recipe` will run the same
+checks as our testing framework. It will build and install the recipe.

 It may miss dependencies if you have them installed on your system, but they are not specified in
 the recipe. This will cause the recipe to fail when tested in our testing framework.

 To check the grch37-reference-genome recipe created above run: