This repository was archived by the owner on Apr 5, 2021. It is now read-only.

Commit ed6c920

Cleaning up some documentation.

1 parent 017782b commit ed6c920

14 files changed (+62 / -293 lines)

API.md

Lines changed: 27 additions & 21 deletions
@@ -21,21 +21,19 @@ Each query is expressed as a URL, containing:

* The **API Version String**. Currently the only supported version string is: `v1`
* The **Endpoint** representing a particular dataset, e.g. `schools`. Endpoint
  names are usually plural.
-* The **Format** for the result data. The default output format is JSON ([JavaScript Object Notation](http://json.org/)); CSV is
-  also available.
* The **Query String** containing a set of named key-value pairs that
  represent the query, which include
  * **Field Parameters**, specifying a value (or set of values) to match
    against a particular field, and
-  * **Option Parameters**, which affect the filtering and output of the
-    entire query. Option Parameter names are prefixed with an underscore (`_`).

### Query Example

Here's an example query URL:

```
-https://api.data.gov/ed/collegescorecard/v1/schools.json?school.degrees_awarded.predominant=2,3&_fields=id,school.name,2013.student.size
+https://api.data.gov/ed/collegescorecard/v1/schools.json?school.degrees_awarded.predominant=2,3&fields=id,school.name,2013.student.size
```

In this query URL:
@@ -134,7 +132,7 @@ When failing to execute a query, Open Data Maker will attempt to return a JSON e

## Field Parameters

-Parameter names _without_ an underscore prefix are assumed to be field names in the dataset. Supplying a value to a field parameter acts as a query filter, and only returns records where the given field exactly matches the given value.
+Parameter names are assumed to be field names in the dataset. Supplying a value to a field parameter acts as a query filter, and only returns records where the given field exactly matches the given value.

For example: Use the parameter `school.region_id=6` to only fetch records with a `school.region_id` value of `6`.

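As an illustration (a sketch, not part of the original docs), a field-parameter filter is an ordinary query-string pair, so a query URL can be assembled with any URL library. The endpoint follows the College Scorecard example above; note that api.data.gov services typically also require an `api_key` parameter, which is not shown in these docs:

```python
from urllib.parse import urlencode

BASE = "https://api.data.gov/ed/collegescorecard/v1/schools.json"

def query_url(filters):
    """Build a query URL from a dict of field-parameter filters."""
    return BASE + "?" + urlencode(filters)

# Only fetch records whose school.region_id is exactly 6:
url = query_url({"school.region_id": 6})
# url == "https://api.data.gov/ed/collegescorecard/v1/schools.json?school.region_id=6"
```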
@@ -176,7 +174,6 @@ For example: `2013.student.size__range=100..500` matches on schools which had be

Open-ended ranges can be performed by omitting one side of the range. For example: `2013.student.size__range=1000..` matches on schools which had over 1000 students.

-You can even supply a list of ranges, separated by commas. For example, For example: `2013.student.size__range=..100,1000..2000,5000..` matches on schools which had under 100 students, between 1000 and 2000 students, or over 5000 students.
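The range syntax composes mechanically. A small illustrative helper (hypothetical, not part of any API client) that formats open-ended and comma-separated range lists:

```python
def range_value(*bounds):
    """Format (min, max) pairs as a __range value; None means an open end."""
    return ",".join(
        f"{'' if lo is None else lo}..{'' if hi is None else hi}"
        for lo, hi in bounds
    )

# Under 100, between 1000 and 2000, or over 5000 students:
param = "2013.student.size__range=" + range_value((None, 100), (1000, 2000), (5000, None))
# param == "2013.student.size__range=..100,1000..2000,5000.."
```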
#### Additional Notes on Ranges

@@ -186,39 +183,48 @@ You can even supply a list of ranges, separated by commas. For example, For exam

## Option Parameters

-You can perform extra refinement and organisation of search results using **option parameters**. These special parameters have names beginning with an underscore character (`_`).
+You can perform extra refinement and organisation of search results using **option parameters**. These special parameters are listed below.

-### Limiting Returned Fields with `_fields`
+### Limiting Returned Fields with `fields`

-By default, records returned in the query response include all their stored fields. However, you can limit the fields returned with the `_fields` option parameter. This parameter takes a comma-separated list of field names. For example: `_fields=id,school.name,school.state` will return result records that only contain those three fields.
+By default, records returned in the query response include all their stored fields. However, you can limit the fields returned with the `fields` option parameter. This parameter takes a comma-separated list of field names. For example: `fields=id,school.name,school.state` will return result records that only contain those three fields.

Requesting specific fields in the response will significantly improve performance and reduce JSON traffic, and is recommended.
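Combining a filter with a `fields` list can be sketched in Python like this (the helper name is ours; commas are deliberately left unencoded to match the documented URLs):

```python
from urllib.parse import urlencode

def with_fields(base_url, filters, fields):
    """Attach a comma-separated `fields` list to a filtered query URL."""
    params = dict(filters)
    params["fields"] = ",".join(fields)
    # safe="," keeps commas literal, as in the documented example URLs
    return base_url + "?" + urlencode(params, safe=",")

url = with_fields(
    "https://api.data.gov/ed/collegescorecard/v1/schools.json",
    {"school.degrees_awarded.predominant": "2,3"},
    ["id", "school.name", "school.state"],
)
# ...?school.degrees_awarded.predominant=2,3&fields=id,school.name,school.state
```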

-### Pagination with `_page` and `_per_page`
+### Pagination with `page` and `per_page`

-By default, results are returned in pages of 20 records at a time. To retrieve pages after the first, set the `_page` option parameter to the number of the page you wish to retrieve. Page numbers start at zero; so, to return records 21 through 40, use `_page=1`. Remember that the total number of records available for a given query is given in the `total` field of the top-level `metadata` object.
+By default, results are returned in pages of 20 records at a time. To retrieve pages after the first, set the `page` option parameter to the number of the page you wish to retrieve. Page numbers start at zero; so, to return records 21 through 40, use `page=1`. Remember that the total number of records available for a given query is given in the `total` field of the top-level `metadata` object.

-You can also change the number of records returned per page using the `_per_page` option parameter, up to a maximum of 100 records. Bear in mind, however, that large result pages will increase the amount of JSON returned and reduce the performance of the API.
+You can also change the number of records returned per page using the `per_page` option parameter, up to a maximum of 100 records. Bear in mind, however, that large result pages will increase the amount of JSON returned and reduce the performance of the API.

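Putting the pagination rules into code (zero-based pages, `metadata.total` for the overall count) might look like the following sketch; the helper names are illustrative, not part of the API:

```python
import math

def page_count(total, per_page=20):
    """How many pages a query spans, given metadata.total."""
    return math.ceil(total / per_page)

def page_params(page, per_page=20):
    """Query parameters for the zero-based page number `page`."""
    return {"page": page, "per_page": per_page}

# With metadata.total == 7035 at the default page size of 20:
# page_count(7035) == 352 (pages 0..351), and page_params(1) fetches records 21-40.
```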
-### Sorting with `_sort`
+### Sorting with `sort`

-To sort results by a given field, use the `_sort` option parameter. For example, `_sort=2015.student.size` will return records sorted by 2015 student size, in ascending order.
+To sort results by a given field, use the `sort` option parameter. For example, `sort=2015.student.size` will return records sorted by 2015 student size, in ascending order.

-By default, using the `_sort_` option returns records sorted into ascending order, but you can specify ascending or descending order by appending `:asc` or `:desc` to the field name. For example: `_sort=2015.student.size:desc`
+By default, using the `sort` option returns records sorted into ascending order, but you can specify ascending or descending order by appending `:asc` or `:desc` to the field name. For example: `sort=2015.student.size:desc`

-**Note:** Sorting is only availble on fields with the data type `integer`, `float`, `autocomplete` or `name`.
+**Note:** Sorting is only available on fields with the data type `integer`, `float`, `autocomplete` or `name`.

**Note:** Make sure the sort parameter is a field in the data set. For more information, please take a look at the [data dictionary](https://collegescorecard.ed.gov/assets/CollegeScorecardDataDictionary.xlsx).

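A tiny sketch of assembling the `sort` value with an optional direction suffix (the validation is our addition, not documented API behaviour):

```python
def sort_param(field, direction=None):
    """Build a sort value like "2015.student.size:desc"."""
    if direction is None:
        return field                      # API default is ascending
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    return f"{field}:{direction}"

# sort_param("2015.student.size", "desc") -> "2015.student.size:desc"
```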
-### Geographic Filtering with `_zip` and `_distance`
+### Geographic Filtering with `zip` and `distance`

When the dataset includes a `location` at the root level (`location.lat` and
-`location.lon`) then the documents will be indexed geographically. You can use the `_zip` and `_distance` options to narrow query results down to those within a geographic area. For example, `_zip=12345&_distance=10mi` will return only those results within 10 miles of the center of the given zip code.
+`location.lon`) then the documents will be indexed geographically. You can use the `zip` and `distance` options to narrow query results down to those within a geographic area. For example, `zip=12345&distance=10mi` will return only those results within 10 miles of the center of the given zip code.

-Additionally, you can request `location.lat` and `location.lon` in a search that includes a `_fields` filter and it will return the record(s) with respective lat and/or lon coordinates.
+Additionally, you can request `location.lat` and `location.lon` in a search that includes a `fields` filter and it will return the record(s) with respective lat and/or lon coordinates.

#### Additional Notes on Geographic Filtering

-* By default, any number passed in the `_distance` parameter is treated as a number of miles, but you can specify miles or kilometers by appending `mi` or `km` respectively.
+* By default, any number passed in the `distance` parameter is treated as a number of miles, but you can specify miles or kilometers by appending `mi` or `km` respectively.
* Distances are calculated from the center of the given zip code, not the boundary.
* Only U.S. zip codes are supported.
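The distance rules above (bare numbers are miles; an `mi` or `km` suffix picks the unit) can be sketched client-side as follows. The helper is illustrative only; the conversion factor is the standard 1 km ≈ 0.621371 mi:

```python
def distance_in_miles(value):
    """Normalise a `distance` parameter value to miles."""
    s = str(value).strip()
    if s.endswith("km"):
        return float(s[:-2]) * 0.621371   # kilometres -> miles
    if s.endswith("mi"):
        return float(s[:-2])
    return float(s)                        # bare number: already miles

# distance_in_miles("10mi") == 10.0; distance_in_miles("10") == 10.0
```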
+# New for Version 1.7
+
+With the inclusion of the Department of Education's Field of Study data, a number of new improvements have been incorporated into Open Data Maker.
+
+* The field of study data is included as an array of objects nested under a specified key. These objects may be queried just like any other data. However, there is an additional parameter to add to your API call to manage what is returned. By default, if specifying a search parameter, only objects of the array that match that parameter will be returned. You can pass `&all_programs_nested=true` to return all the items in the array instead of just those that match.
+* When specifying specific fields to be returned from the API, the default response is a dotted string of the path to each field. As of version 1.7, you can pass the parameter `keys_nested=true` to get back a true JSON object instead of the dotted string.
+* Lastly, wildcard fields are now possible with version 1.7. If you want just the latest available data, it is now possible to specify a query such as `fields=id,school,latest`, which will return the ID field, the School object, and the Latest object, with all the nested objects contained within each.
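The dotted-string vs. nested-object distinction can be seen with a small un-flattening sketch. This approximates what `keys_nested=true` returns; the helper itself is ours, not part of the API:

```python
def nest_keys(record):
    """Turn dotted paths ("school.name") into nested objects."""
    nested = {}
    for dotted, value in record.items():
        *parents, leaf = dotted.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate objects
        node[leaf] = value
    return nested

flat = {"id": 1, "school.name": "Example U", "school.state": "NY"}
# nest_keys(flat) == {"id": 1, "school": {"name": "Example U", "state": "NY"}}
```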

CONTRIBUTING.md

Lines changed: 5 additions & 41 deletions
@@ -1,7 +1,7 @@

## Contributing

We aspire to create a welcoming environment for collaboration on this project.
-To that end, we follow the [18F Code of Conduct](https://github.com/18F/code-of-conduct/blob/master/code-of-conduct.md) and ask that all contributors do the same.

### Public domain

@@ -15,11 +15,7 @@ with this waiver of copyright interest.

## Communication

-There are a few ways to communicate with other folks working on this project:
-
-* For general questions, discussion and announcements, please join [Google Group]
-* For noisy, informal chatter, you can join us on the [open-data-maker-pub Slack Channel](https://chat.18f.gov). Notifications from github are posted here.
-* For bug reports, please [file an issue](https://github.com/18F/open-data-maker/issues).
+For bug reports, please [file an issue](https://github.com/18F/open-data-maker/issues).

## About the Tech

@@ -46,7 +42,7 @@ This project follows the [git flow](http://nvie.com/posts/a-successful-git-branc

for review by our design and product folks, then to master.

This project is in alpha, so things are fast moving! We hope you consider it
-a fun time to get involved. In the near term, we have a very specific focus for this app, but we expect it will be generally useful for other projects as well. If you are thinking about deploying this app at your agency or organization, please let us know by introducing yourself in the [Google Group] and telling us a bit about your project or idea.
+a fun time to get involved. In the near term, we have a very specific focus for this app, but we expect it will be generally useful for other projects as well.

### Testing

@@ -98,7 +94,7 @@ chances of your issue being dealt with quickly:

### Submitting a Pull Request
Before you submit your pull request consider the following guidelines:

-* Search [GitHub](https://github.com/18F/open-data-maker/pulls) for an open or closed Pull Request that relates to your submission. You don't want to duplicate effort.
+* Search [GitHub](https://github.com/RTICWDT/open-data-maker/pulls) for an open or closed Pull Request that relates to your submission. You don't want to duplicate effort.
* Make your changes in a new git branch

```shell
@@ -137,37 +133,7 @@ That's it! Thank you for your contribution!

#### After your pull request is merged

-After your pull request is merged, you can safely delete your branch and pull the changes from the main (upstream) repository:
-
-* Check out the dev branch:
-
-```shell
-git checkout dev -f
-```
-
-* Delete the local branch:
-
-```shell
-git branch -D dev-my-fix
-```
-
-* Update with the latest upstream version:
-
-```shell
-git pull --ff upstream dev
-```
-
-Note: this assumes that you have already added the `upstream` remote repository, using this command:
-
-```shell
-git remote add upstream https://github.com/18F/open-data-maker.git
-```
-
-* For folks with write access to the repo: delete the remote branch on GitHub either through the GitHub web UI or your local shell as follows:
-
-```shell
-git push origin --dev-my-fix
-```
+After your pull request is merged, you can safely delete your branch and pull the changes from the main (upstream) repository.

### Reviewing Pull Requests
@@ -183,5 +149,3 @@ someone has looked at it. For larger commits, we like to have a +1 from someone

else on the core team and/or from other contributor(s). Please note if you
reviewed the code or tested locally -- a +1 by itself will typically be
interpreted as your thinking it's a good idea, but not having reviewed in detail.
-
-[Google Group]: https://groups.google.com/d/forum/open-data-maker

DICTIONARY.md

Lines changed: 19 additions & 0 deletions
@@ -1,3 +1,22 @@
+# Data
+
+Details about the data are specified by `DATA_PATH/data.yaml`, where `DATA_PATH` is an environment variable, which may be:
+
+* `s3://username:password@bucket_name/path`
+* `s3://bucket_name/path`
+* `s3://bucket_name`
+* a local path like: `./data`
+
+This file is loaded the first time it is needed and then stored in memory. The contents of `data.yaml` are stored as JSON in Elasticsearch in a single document of type `config` with id `1`.
+
+The version field of this document is checked at startup. If the new config has a new version, then we delete the whole index and re-index all of the files referred to in the `data.yaml` files section.
+
+If no `data.yml` or `data.yaml` file is found, then all CSV files in `DATA_PATH` will be loaded, and all fields in their headers will be used.
+
+For an example data file, visit https://collegescorecard.ed.gov/data/ and download the full data package. A `data.yaml` file will be included in the ZIP file download.

# Dictionary Format

The data dictionary format may be (optionally) specified in the `data.yaml` file. If unspecified, all columns are imported as strings.
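To make the pieces above concrete, here is an illustrative, non-authoritative shape for a minimal `data.yaml`; the field names here are guesses based on the sections described above, so consult the sample file in the downloaded data package for the real schema:

```yaml
version: example-1        # bumped to trigger delete + full re-index at startup
files:
  - name: schools.csv     # CSV files referenced by the "files" section
dictionary:               # optional; see "Dictionary Format" above
  id:
    source: UNITID        # column header in the CSV
    type: integer         # columns import as strings if unspecified
```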

INSTALL.md

Lines changed: 11 additions & 27 deletions
@@ -19,22 +19,14 @@ To run Open Data Maker, you will need to have the following software installed o

* [Elasticsearch] 1.7.3
* [Ruby] 2.2.2

-**NOTE: Open Data Maker does not currently work with Elasticsearch versions 2.x and above.**
-You can follow or assist our progress towards 2.x compatibility [at this GitHub issue](https://github.com/18F/open-data-maker/issues/248).
+**NOTE: Open Data Maker indexing is currently very slow on Elasticsearch 2.x; however, an index created on 1.x can be restored to 2.x.**

### Mac OS X

-On a Mac, we recommend installing Ruby 2.2.2 via [RVM], and Elasticsearch 1.7.3 via
-[Homebrew]. If you don't want to use the bootstrap script above, you can install
-elasticsearch 1.7 with brew using the following command:
-
-```
-brew install elasticsearch17
-```
+On a Mac, we recommend installing [RVM].

If you are contributing to development, you will also need [Git].
-If you don't already have these tools, the 18F [laptop] script will install
-them for you.

## Get the Source Code

@@ -48,14 +40,6 @@ cd open-data-maker

## Run the App

-### Make sure Elasticsearch is up and running
-If you just ran `script/bootstrap`, then Elasticsearch should already be
-running. But if you stopped it or restarted your computer, you'll need to
-start it back up. Assuming you installed Elasticsearch via our `bootstrap`
-script, you can restart it with this command:
-
-```brew services restart elasticsearch```

### Import the data

@@ -116,24 +100,24 @@ rake es:delete[_all]

The data directory can optionally include a file called `data.yaml` (see [the sample one](sample-data/data.yaml) for its schema) that references one or more `.csv` files and specifies data types,
field name mapping, and other support data.

-## Experimental web UI for indexing
-
-Optionally, you can enable indexing from webapp, but this option is still experimental:
-* `export INDEX_APP=enable`
-* in your browser, go to /index/reindex
+## Debugging
+
+Setting the `ES_DEBUG` environment variable turns on a verbose tracer in the Elasticsearch client.
+
+Optional performance profiling for `rake import`: `rake import[profile=true]`

-the old index (if present) will be deleted and re-created from source files at DATA_PATH.

## Want to help?

See [Contribution Guide](CONTRIBUTING.md)

-Read additional [implementation notes](NOTES.md)

[Elasticsearch]: https://www.elastic.co/products/elasticsearch
[Homebrew]: http://brew.sh/
[RVM]: https://github.com/wayneeseguin/rvm
[rbenv]: https://github.com/sstephenson/rbenv
[Ruby]: https://www.ruby-lang.org/en/
[Git]: https://git-scm.com/
-[laptop]: https://github.com/18F/laptop

NOTES.md

Lines changed: 0 additions & 23 deletions
This file was deleted.

README.md

Lines changed: 0 additions & 15 deletions
@@ -84,21 +84,6 @@ options:

```

-## Help Wanted
-
-1. Try out importing multiple data sets with different endpoints and data.yaml configuration
-2. Take a look at our [open issues](https://github.com/18F/open-data-maker/issues) and our [Contribution Guide](CONTRIBUTING.md)
-
-## More Info
-
-Here's how it might look in the future:
-
-![overview of data types, prompt to download data, create a custom data set, or look at API docs](/doc/data-overview.png)
-
-![Download all the data or make choices to create a csv with a subset](/doc/csv-download.png)

### Acknowledgements
Zipcode latitude and longitude provided by [GeoNames](http://www.geonames.org/) under a [Creative Commons Attribution 3.0 License](http://creativecommons.org/licenses/by/3.0/).

doc/csv-download.png

-69.2 KB
Binary file not shown.

doc/data-overview.png

-80.5 KB
Binary file not shown.

manifest-dev.yml

Lines changed: 0 additions & 12 deletions
This file was deleted.

manifest-ex.yml

Lines changed: 0 additions & 14 deletions
This file was deleted.

0 commit comments