This repository was archived by the owner on Apr 5, 2021. It is now read-only.
@@ -134,7 +132,7 @@ When failing to execute a query, Open Data Maker will attempt to return a JSON e

 ## Field Parameters

-Parameter names _without_ an underscore prefix are assumed to be field names in the dataset. Supplying a value to a field parameter acts as a query filter, and only returns records where the given field exactly matches the given value.
+Parameter names are assumed to be field names in the dataset. Supplying a value to a field parameter acts as a query filter, and only returns records where the given field exactly matches the given value.

 For example: Use the parameter `school.region_id=6` to only fetch records with a `school.region_id` value of `6`.

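As a minimal sketch of the exact-match filter described above, the following Python snippet builds a query URL from field parameters. The endpoint `https://api.example.com/v1/schools` and the helper name `build_query_url` are assumptions made for illustration; they are not part of Open Data Maker.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the URL of your own deployment.
BASE_URL = "https://api.example.com/v1/schools"


def build_query_url(filters):
    """Return a URL in which every field parameter acts as an exact-match filter."""
    return BASE_URL + "?" + urlencode(filters)


url = build_query_url({"school.region_id": 6})
print(url)
```

Each key in the dictionary is a field name and each value is the value that field must match.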
@@ -176,7 +174,6 @@ For example: `2013.student.size__range=100..500` matches on schools which had be

 Open-ended ranges can be performed by omitting one side of the range. For example: `2013.student.size__range=1000..` matches on schools which had over 1000 students.

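The range syntax can be sketched with a small Python helper that formats the value of a `__range` parameter. The function name `range_value` is hypothetical, and the snippet only formats strings; it does not call the API.

```python
def range_value(low=None, high=None):
    """Format a `__range` value; leave one bound out for an open-ended range."""
    if low is None and high is None:
        raise ValueError("at least one bound is required")
    left = "" if low is None else str(low)
    right = "" if high is None else str(high)
    return f"{left}..{right}"


# Closed range, as in the `100..500` example.
closed = "2013.student.size__range=" + range_value(100, 500)
# Open-ended range with only a lower bound.
open_ended = "2013.student.size__range=" + range_value(low=1000)
print(closed)
print(open_ended)
```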
-You can even supply a list of ranges, separated by commas. For example, For example: `2013.student.size__range=..100,1000..2000,5000..` matches on schools which had under 100 students, between 1000 and 2000 students, or over 5000 students.

 #### Additional Notes on Ranges

@@ -186,39 +183,48 @@ You can even supply a list of ranges, separated by commas. For example, For exam

 ## Option Parameters

-You can perform extra refinement and organisation of search results using **option parameters**. These special parameters have names beginning with an underscore character (`_`).
+You can perform extra refinement and organisation of search results using **option parameters**. These special parameters are listed below.

-### Limiting Returned Fields with `_fields`
+### Limiting Returned Fields with `fields`

-By default, records returned in the query response include all their stored fields. However, you can limit the fields returned with the `_fields` option parameter. This parameter takes a comma-separated list of field names. For example: `_fields=id,school.name,school.state` will return result records that only contain those three fields.
+By default, records returned in the query response include all their stored fields. However, you can limit the fields returned with the `fields` option parameter. This parameter takes a comma-separated list of field names. For example: `fields=id,school.name,school.state` will return result records that only contain those three fields.

 Requesting specific fields in the response will significantly improve performance and reduce JSON traffic, and is recommended.

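A short sketch of combining a field filter with a limited field list, written in Python. The endpoint URL is hypothetical, and the snippet uses the un-prefixed `fields` name introduced by this change (the earlier spelling was `_fields`). Passing `safe=","` to `urlencode` is a presentation choice that keeps the commas in the field list unescaped.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the URL of your own deployment.
BASE_URL = "https://api.example.com/v1/schools"

# Only these three fields should appear in each returned record.
wanted_fields = ["id", "school.name", "school.state"]

query = {
    "school.region_id": 6,              # field parameter (exact-match filter)
    "fields": ",".join(wanted_fields),  # option parameter (limits returned fields)
}
fields_url = BASE_URL + "?" + urlencode(query, safe=",")
print(fields_url)
```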
-### Pagination with `_page` and `_per_page`
+### Pagination with `page` and `per_page`

-By default, results are returned in pages of 20 records at a time. To retrieve pages after the first, set the `_page` option parameter to the number of the page you wish to retrieve. Page numbers start at zero; so, to return records 21 through 40, use `_page=1`. Remember that the total number of records available for a given query is given in the `total` field of the top-level `metadata` object.
+By default, results are returned in pages of 20 records at a time. To retrieve pages after the first, set the `page` option parameter to the number of the page you wish to retrieve. Page numbers start at zero; so, to return records 21 through 40, use `page=1`. Remember that the total number of records available for a given query is given in the `total` field of the top-level `metadata` object.

-You can also change the number of records returned per page using the `_per_page` option parameter, up to a maximum of 100 records. Bear in mind, however, that large result pages will increase the amount of JSON returned and reduce the performance of the API.
+You can also change the number of records returned per page using the `per_page` option parameter, up to a maximum of 100 records. Bear in mind, however, that large result pages will increase the amount of JSON returned and reduce the performance of the API.

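The pagination arithmetic above can be sketched in Python as follows. The helper names `page_params` and `record_range` are hypothetical, `total` stands for the value read from `metadata.total` in a response, and the snippet uses the un-prefixed `page` and `per_page` names introduced by this change.

```python
import math

MAX_PER_PAGE = 100  # documented upper limit for `per_page`


def page_params(total, per_page=20):
    """Yield the `page`/`per_page` parameters needed to walk every result page."""
    if not 1 <= per_page <= MAX_PER_PAGE:
        raise ValueError("per_page must be between 1 and 100")
    page_count = math.ceil(total / per_page)
    for page in range(page_count):  # page numbers start at zero
        yield {"page": page, "per_page": per_page}


def record_range(page, per_page=20):
    """Return the 1-based positions of the first and last record on a page."""
    return page * per_page + 1, (page + 1) * per_page


pages = list(page_params(total=45))
print(pages)
print(record_range(1))
```

With 45 matching records and the default page size, three requests are needed, and `page=1` covers records 21 through 40, matching the description above.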
-### Sorting with `_sort`
+### Sorting with `sort`

-To sort results by a given field, use the `_sort` option parameter. For example, `_sort=2015.student.size` will return records sorted by 2015 student size, in ascending order.
+To sort results by a given field, use the `sort` option parameter. For example, `sort=2015.student.size` will return records sorted by 2015 student size, in ascending order.

-By default, using the `_sort_` option returns records sorted into ascending order, but you can specify ascending or descending order by appending `:asc` or `:desc` to the field name. For example: `_sort=2015.student.size:desc`
+By default, using the `sort` option returns records sorted into ascending order, but you can specify ascending or descending order by appending `:asc` or `:desc` to the field name. For example: `sort=2015.student.size:desc`

-**Note:** Sorting is only availble on fields with the data type `integer`, `float`, `autocomplete` or `name`.
+**Note:** Sorting is only available on fields with the data type `integer`, `float`, `autocomplete` or `name`.

 **Note:** Make sure the sort parameter is a field in the data set. For more information, please take a look at [data dictionary](https://collegescorecard.ed.gov/assets/CollegeScorecardDataDictionary.xlsx)

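A minimal Python sketch of building a `sort` value under the rules above. The helper name `sort_param` is hypothetical, and the caller is assumed to already know the field's data type (for example from the data dictionary); the API itself does not take a type argument.

```python
# Data types on which sorting is available, as listed in the note above.
SORTABLE_TYPES = {"integer", "float", "autocomplete", "name"}


def sort_param(field, field_type, descending=False):
    """Return a `sort` parameter string, rejecting field types that cannot be sorted."""
    if field_type not in SORTABLE_TYPES:
        raise ValueError(f"cannot sort on a field of type {field_type!r}")
    direction = "desc" if descending else "asc"
    return f"sort={field}:{direction}"


largest_first = sort_param("2015.student.size", "integer", descending=True)
print(largest_first)
```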
-### Geographic Filtering with `_zip` and `_distance`
+### Geographic Filtering with `zip` and `distance`

 When the dataset includes a `location` at the root level (`location.lat` and
-`location.lon`) then the documents will be indexed geographically. You can use the `_zip` and `_distance` options to narrow query results down to those within a geographic area. For example, `_zip=12345&_distance=10mi` will return only those results within 10 miles of the center of the given zip code.
+`location.lon`) then the documents will be indexed geographically. You can use the `zip` and `distance` options to narrow query results down to those within a geographic area. For example, `zip=12345&distance=10mi` will return only those results within 10 miles of the center of the given zip code.

-Additionally, you can request `location.lat` and `location.lon` in a search that includes a `_fields` filter and it will return the record(s) with respective lat and/or lon coordinates.
+Additionally, you can request `location.lat` and `location.lon` in a search that includes a `fields` filter and it will return the record(s) with respective lat and/or lon coordinates.

 #### Additional Notes on Geographic Filtering

-* By default, any number passed in the `_distance` parameter is treated as a number of miles, but you can specify miles or kilometers by appending `mi` or `km` respectively.
+* By default, any number passed in the `distance` parameter is treated as a number of miles, but you can specify miles or kilometers by appending `mi` or `km` respectively.
 * Distances are calculated from the center of the given zip code, not the boundary.
 * Only U.S. zip codes are supported.
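The geographic options can be sketched in Python as below. The helper name `geo_params` is hypothetical, and the snippet uses the un-prefixed `zip` and `distance` names introduced by this change. The zip code is passed as a string so that leading zeros are preserved.

```python
from urllib.parse import urlencode

ALLOWED_UNITS = ("mi", "km")  # miles or kilometers, as described above


def geo_params(zip_code, distance, unit="mi"):
    """Return the query-string fragment for a zip-code radius search."""
    if unit not in ALLOWED_UNITS:
        raise ValueError("unit must be 'mi' or 'km'")
    return urlencode({"zip": zip_code, "distance": f"{distance}{unit}"})


ten_miles = geo_params("12345", 10)
five_km = geo_params("02134", 5, unit="km")
print(ten_miles)
print(five_km)
```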
+
+
+# New for Version 1.7
+
+With the inclusion of the Department of Education's Field of Study data, a number of improvements have been incorporated into Open Data Maker.
+
+* The field of study data is included as an array of objects nested under a specified key. These objects may be queried just like any other data. However, there is an additional parameter you can add to your API call to manage what is returned. By default, if you specify a search parameter, only the objects in the array that match that parameter are returned. You can pass `&all_programs_nested=true` to return all the items in the array instead of just those that match.
+* When specifying specific fields to be returned from the API, the default response is to have a dotted string of the path to the field returned. As of version 1.7, you can pass the parameter `keys_nested=true` to get back a true JSON object instead of the dotted string.
+* Lastly, wildcard fields are now possible with version 1.7. If you want only the latest available data, you can specify a query such as `fields=id,school,latest`, which will return the ID field, the School object and the Latest object and all the nested objects contained within each.
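To illustrate the difference between the default dotted-key response and the nested shape requested with `keys_nested=true`, here is a client-side Python sketch that converts one form into the other. It is not the server's implementation: the function name `nest_dotted_keys` and the sample record are made up for illustration, and the sketch assumes no key is both a value and a prefix of another key.

```python
def nest_dotted_keys(record):
    """Turn {"a.b": 1} style keys into nested dictionaries such as {"a": {"b": 1}}."""
    nested = {}
    for dotted_key, value in record.items():
        parts = dotted_key.split(".")
        node = nested
        for part in parts[:-1]:
            # Descend into (or create) the intermediate object for this path segment.
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


# Illustrative record in the default (dotted) shape.
flat_record = {"id": 1, "school.name": "Example College", "school.state": "NY"}
nested_record = nest_dotted_keys(flat_record)
print(nested_record)
```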
CONTRIBUTING.md (5 additions, 41 deletions)
@@ -1,7 +1,7 @@
 ## Contributing

 We aspire to create a welcoming environment for collaboration on this project.
-To that end, we follow the [18F Code of Conduct](https://github.com/18F/code-of-conduct/blob/master/code-of-conduct.md) and ask that all contributors do the same.
+

 ### Public domain

@@ -15,11 +15,7 @@ with this waiver of copyright interest.

 ## Communication

-There are a few ways to communicate with other folks working on this project:
-
-* For general questions, discussion and announcements, please join [Google Group]
-* For noisy, informal chatter, you can join us on the [open-data-maker-pub Slack Channel](https://chat.18f.gov). Notifications from github are posted here.
-* For bug reports, please [file an issue](https://github.com/18F/open-data-maker/issues).
+For bug reports, please [file an issue](https://github.com/18F/open-data-maker/issues).

 ## About the Tech

@@ -46,7 +42,7 @@ This project follows the [git flow](http://nvie.com/posts/a-successful-git-branc
 for review by our design and product folks, then to master.

 This project is in alpha, so things are fast moving! We hope you consider it
-a fun time to get involved. In the near term, we have a very specific focus for this app, but we expect it will be generally useful for other projects as well. If you are thinking about deploying this app at your agency or organization, please let us know by introducing yourself in the [Google Group] and telling us a bit about your project or idea.
+a fun time to get involved. In the near term, we have a very specific focus for this app, but we expect it will be generally useful for other projects as well.

 ### Testing

@@ -98,7 +94,7 @@ chances of your issue being dealt with quickly:
 ### Submitting a Pull Request
 Before you submit your pull request consider the following guidelines:

-* Search [GitHub](https://github.com/18F/open-data-maker/pulls) for an open or closed Pull Request that relates to your submission. You don't want to duplicate effort.
+* Search [GitHub](https://github.com/RTICWDT/open-data-maker/pulls) for an open or closed Pull Request that relates to your submission. You don't want to duplicate effort.
 * Make your changes in a new git branch

104
100
```shell
@@ -137,37 +133,7 @@ That's it! Thank you for your contribution!
137
133
138
134
#### After your pull request is merged
139
135
140
-
After your pull request is merged, you can safely delete your branch and pull the changes from the main (upstream) repository:
141
-
142
-
* Check out the dev branch:
143
-
144
-
```shell
145
-
git checkout dev -f
146
-
```
147
-
148
-
* Delete the local branch:
149
-
150
-
```shell
151
-
git branch -D dev-my-fix
152
-
```
153
-
154
-
* Update with the latest upstream version:
155
-
156
-
```shell
157
-
git pull --ff upstream dev
158
-
```
159
-
Note: this assumes that you have already added the `upstream` remote repository, using this command:
DICTIONARY.md (19 additions, 0 deletions)
@@ -1,3 +1,22 @@
+# Data
+
+Details about the data are specified by `DATA_PATH/data.yaml`,
+where `DATA_PATH` is an environment variable, which may be:
+
+* `s3://username:password@bucket_name/path`
+* `s3://bucket_name/path`
+* `s3://bucket_name`
+* a local path like: `./data`
+
+
+This file is loaded the first time it is needed and then stored in memory. The contents of `data.yaml` are stored as JSON in Elasticsearch in a single document of type `config` with id `1`.
+
+The version field of this document is checked at startup. If the new config has a new version, then we delete the whole index and re-index all of the files referred to in the files section of `data.yaml`.
+
+If no data.yml or data.yaml file is found, then all CSV files in `DATA_PATH` will be loaded, and all fields in their headers will be used.
+
+For an example data file, visit https://collegescorecard.ed.gov/data/ and download the full data package. A data.yaml file will be included in the ZIP file download.
+
 # Dictionary Format

 The data dictionary format may be (optionally) specified in the `data.yaml` file. If unspecified, all columns are imported as strings.
INSTALL.md (11 additions, 27 deletions)
@@ -19,22 +19,14 @@ To run Open Data Maker, you will need to have the following software installed o
 * [Elasticsearch] 1.7.3
 * [Ruby] 2.2.2

-**NOTE: Open Data Maker does not currently work with Elasticsearch versions 2.x and above.**
-You can follow or assist our progress towards 2.x compatibility [at this GitHub issue](https://github.com/18F/open-data-maker/issues/248).
+**NOTE: Open Data Maker indexing is currently very slow on Elasticsearch 2.x; however, an index created on 1.x can be restored to 2.x.**

 ### Mac OS X

-On a Mac, we recommend installing Ruby 2.2.2 via [RVM], and Elasticsearch 1.7.3 via
-[Homebrew]. If you don't want to use the bootstrap script above, you can install
-elasticsearch 1.7 with brew using the following command:
-
-```
-brew install elasticsearch17
-```
+On a Mac, we recommend installing [RVM].

 If you are contributing to development, you will also need [Git].
-If you don't already have these tools, the 18F [laptop] script will install
-them for you.
+

 ## Get the Source Code

@@ -48,14 +40,6 @@ cd open-data-maker

 ## Run the App

-### Make sure Elasticsearch is up and running
-If you just ran `script/bootstrap`, then Elasticsearch should already be
-running. But if you stopped it or restarted your computer, you'll need to
-start it back up. Assuming you installed Elasticsearch via our `bootstrap`
-script, you can restart it with this command:
-
-```brew services restart elasticsearch```
-

 ### Import the data

@@ -116,24 +100,24 @@ rake es:delete[_all]
 The data directory can optionally include a file called `data.yaml` (see [the sample one](sample-data/data.yaml) for its schema) that references one or more `.csv` files and specifies data types,
 field name mapping, and other support data.

-## Experimental web UI for indexing

-Optionally, you can enable indexing from webapp, but this option is still experimental:
-* `export INDEX_APP=enable`
-* in your browser, go to /index/reindex
+## Debugging
+
+Setting the `ES_DEBUG` environment variable turns on the verbose tracer in the Elasticsearch client.
+
+Optional performance profiling for rake import is available with `rake import[profile=true]`.

-the old index (if present) will be deleted and re-created from source files at DATA_PATH.
README.md (0 additions, 15 deletions)
@@ -84,21 +84,6 @@ options:
84
84
```


-
-## Help Wanted
-
-1. Try out importing multiple data sets with different endpoints and data.yaml configuration
-2. Take a look at our [open issues](https://github.com/18F/open-data-maker/issues) and our [Contribution Guide](CONTRIBUTING.md)
-
-## More Info
-
-Here's how it might look in the future:
-
-
-
-
-
-
 ### Acknowledgements
 Zipcode latitude and longitude provided by [GeoNames](http://www.geonames.org/) under a [Creative Commons Attribution 3.0 License](http://creativecommons.org/licenses/by/3.0/).