This repository was archived by the owner on Nov 7, 2018. It is now read-only.
API.md (+72 -1)
@@ -5,6 +5,7 @@ This document explains:
 * How to define and execute queries as URLs
 * Refining query results using option parameters
 * Extracting query results in JSON and CSV format
+* Generating aggregate data using statistics queries
 * Detecting query errors

 ## Introduction to Queries
@@ -18,10 +19,12 @@ Each query is expressed as a URL, containing:
 * The **API Version String**. Currently the only supported version string is: `v1`
 * The **Endpoint** representing a particular dataset, e.g. `schools`. Endpoint
   names are usually plural.
+* An optional **Query Type**, added to the Endpoint's path. Currently the only
+  additional type is `stats`; see the section on [Statistics Queries](#statistics-queries) for more information.
 * The **Format** for the result data. The default output format is JSON ([JavaScript Object Notation](http://json.org/)); CSV is
   also available.
 * The **Query String** containing a set of named key-value pairs that
-  represent the query, which incude
+  represent the query, which include
   * **Field Parameters**, specifying a value (or set of values) to match
     against a particular field, and
   * **Option Parameters**, which affect the filtering and output of the
@@ -215,3 +218,71 @@ When the dataset includes a `location` at the root level (`location.lat` and
 * By default, any number passed in the `_distance` parameter is treated as a number of miles, but you can specify miles or kilometers by appending `mi` or `km` respectively.
 * Distances are calculated from the center of the given zip code, not the boundary.
 * Only U.S. zip codes are supported.
+
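The `_distance` rules in the list above can be illustrated with a small Python sketch. The helper name `format_distance` is hypothetical, and the assumption that the unit suffix is attached directly to the number (for example `10km`) is inferred from the wording "appending `mi` or `km`" rather than confirmed by the document.

```python
def format_distance(value, unit=None):
    """Return a string for the `_distance` parameter (hypothetical helper).

    With no unit, the bare number is sent, which the API treats as miles.
    """
    if unit is None:
        return f"{value:g}"
    if unit not in ("mi", "km"):
        raise ValueError("unit must be 'mi' or 'km'")
    # Assumption: the suffix follows the number with no separator.
    return f"{value:g}{unit}"


print(format_distance(10))
print(format_distance(10, "km"))
```

Rejecting unknown units on the client side is a design choice in this sketch; the document does not say how the API itself responds to other suffixes.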
+## Statistics Queries
+
+The queries discussed so far return only individual records and selected values from those records. However, it's also possible to generate aggregate data from a specified set of records by using Statistics Queries.
+* `/stats` is appended to the Endpoint. This is the key indicator that
+  statistics should be returned instead of individual records.
+* `school.degrees_awarded.predominant=2,3` is a Field Parameter. In this case, it searches for records that have a `school.degrees_awarded.predominant` value of either `2` or `3`. The aggregated statistics will be generated from this subset of records.
+* `_fields=2013.student.size` restricts the aggregation to the `2013.student.size` field. Multiple fields can be specified and aggregated in a single query, but only those with numeric data can be used.
+* `_metrics` is an Option Parameter only available to statistics queries, and limits the kinds of aggregations performed. See below for more information.
+
+This is the JSON document returned:
+
+```json
+{
+  "metadata": {
+    "total": 3667,
+    "page": 0,
+    "per_page": 20
+  },
+  "results": [],
+  "aggregations": {
+    "school.tuition_revenue_per_fte": {
+      "avg": "0.1088815711947627E5",
+      "sum": 73288234,
+      "std_deviation": "0.75913587304684015E4",
+      "std_deviation_bounds": {
+        "upper": "0.26070874580413074E5",
+        "lower": "-0.4294560341460534E4"
+      }
+    }
+  }
+}
+```
+
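A statistics query like the one described above could be assembled programmatically. The following Python sketch is illustrative only: `BASE_URL` is a placeholder, the function name `build_stats_url` is hypothetical, the path order (version, then endpoint, then `stats`) is an assumption based on the URL components listed in this document, and joining multiple `_fields` or `_metrics` values with commas is assumed by analogy with the `2,3` Field Parameter value.

```python
from urllib.parse import urlencode

# Placeholder host: substitute the base URL of the actual deployment.
BASE_URL = "https://api.example.com"


def build_stats_url(endpoint, field_params, fields, metrics=None, version="v1"):
    """Build a statistics query URL (hypothetical helper).

    field_params: dict of Field Parameters used to select records.
    fields:       numeric fields to aggregate, sent as `_fields`.
    metrics:      optional aggregation names, sent as `_metrics`.
    """
    params = dict(field_params)
    params["_fields"] = ",".join(fields)
    if metrics:
        params["_metrics"] = ",".join(metrics)
    # Keep commas literal so multi-value parameters stay readable.
    query = urlencode(params, safe=",")
    return f"{BASE_URL}/{version}/{endpoint}/stats?{query}"


url = build_stats_url(
    "schools",
    {"school.degrees_awarded.predominant": "2,3"},
    ["2013.student.size"],
    ["avg", "sum", "std_deviation", "std_deviation_bounds"],
)
print(url)
```

The sketch omits the Format component, so it relies on JSON being the default output format.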
+Note that the top-level elements returned by a statistics query differ from those returned by other kinds of queries:
+
+* **`metadata`** provides the same information as it does in other queries.
+  * **`total`** provides the number of records matching the query (in this case, all those schools with a `school.degrees_awarded.predominant` of 2 or 3). This is the subset of records from which the statistics are calculated.
+  * **`page`** and **`per_page`** are irrelevant in statistics queries, and will likely be removed in a future version of the API.
+* **`results`** is always empty in statistics queries, and may be removed in a future version of the API.
+* **`aggregations`** contains a JSON Object for every field specified in the `_fields` parameter. Within these Objects there's an entry for every type of aggregation performed. In this case, use of the `_metrics` parameter has limited the returned aggregations to `avg`, `sum`, `std_deviation` and `std_deviation_bounds`. See below for more information.
+
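In the example response, several aggregation values arrive as strings in scientific notation (for example `"0.1088815711947627E5"`), while `sum` is a plain number. A client will usually want to normalize these. The Python sketch below shows one way to do that; the function name `numeric_aggregations` is hypothetical, and the handling of `null` values is an assumption, since the document does not say whether they can occur.

```python
import json

# The example response from this document, trimmed to the relevant parts.
RESPONSE = """
{
  "metadata": {"total": 3667, "page": 0, "per_page": 20},
  "results": [],
  "aggregations": {
    "school.tuition_revenue_per_fte": {
      "avg": "0.1088815711947627E5",
      "sum": 73288234,
      "std_deviation": "0.75913587304684015E4",
      "std_deviation_bounds": {
        "upper": "0.26070874580413074E5",
        "lower": "-0.4294560341460534E4"
      }
    }
  }
}
"""


def numeric_aggregations(document):
    """Return the `aggregations` object with every value converted to float."""
    def convert(value):
        if isinstance(value, dict):
            # Nested objects such as `std_deviation_bounds`.
            return {key: convert(inner) for key, inner in value.items()}
        if value is None:
            # Assumption: tolerate missing values rather than fail.
            return None
        return float(value)

    return {field: convert(stats) for field, stats in document["aggregations"].items()}


stats = numeric_aggregations(json.loads(RESPONSE))["school.tuition_revenue_per_fte"]
# In this example the bounds sit two standard deviations either side of the mean.
expected_upper = stats["avg"] + 2 * stats["std_deviation"]
expected_lower = stats["avg"] - 2 * stats["std_deviation"]
print(stats["avg"], stats["std_deviation_bounds"], expected_upper, expected_lower)
```

The two-standard-deviation relationship matches Elasticsearch's default `sigma` for extended stats; whether a given deployment overrides that default is not stated here.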
+### Specifying aggregations with `_metrics`
+
+By default, the full set of available aggregations is calculated and returned for each field specified in the `_fields` parameter. These aggregations are calculated by Elasticsearch's [Extended Stats Aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/1.7/search-aggregations-metrics-extendedstats-aggregation.html):
+
+* `count`
+* `min`
+* `max`
+* `avg`
+* `sum`
+* `sum_of_squares`
+* `variance`
+* `std_deviation`
+* `std_deviation_bounds`
+
+Each of these provides a single value, with the exception of `std_deviation_bounds`, which provides a JSON Object containing `upper` and `lower` bounds.
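
Because the set of aggregation names is fixed, a client can check a requested selection before sending it. The Python sketch below is a minimal illustration; the helper name `metrics_param` is hypothetical, and joining the names with commas is an assumption, since the exact syntax of the `_metrics` value is not shown in this section.

```python
# Aggregation names listed above for the `_metrics` parameter.
AVAILABLE_METRICS = (
    "count",
    "min",
    "max",
    "avg",
    "sum",
    "sum_of_squares",
    "variance",
    "std_deviation",
    "std_deviation_bounds",
)


def metrics_param(requested):
    """Validate aggregation names and return a `_metrics` value (hypothetical helper)."""
    unknown = [name for name in requested if name not in AVAILABLE_METRICS]
    if unknown:
        raise ValueError("unsupported metrics: " + ", ".join(unknown))
    # Assumption: multiple names are comma-separated.
    return ",".join(requested)


print(metrics_param(["avg", "sum"]))
```

Omitting `_metrics` altogether requests the full set, so the helper is only needed when narrowing the output.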