Skip to content
This repository was archived by the owner on Nov 7, 2018. It is now read-only.

Conversation

@siruguri
Copy link
Contributor

@siruguri siruguri commented Sep 2, 2015

  • Syntax is WIP; needs to be reviewed - currently, localhost:3000/v1/cities/stats?state=TX&fields=population,land_area&_metrics=avg,max,sum_of_squares works
  • We should have an API spec test - edit: added an integration test in a second commit - shd probably be added to spec/features/api_spec.rb

* Syntax is WIP; needs to be reviewed
* We should have an API spec test - tbd.
@ultrasaurus
Copy link
Contributor

@siruguri this is pretty sweet. Since the metrics are delivered with the results anyhow, I don't think need the /stats endpoint, especially since @yozlet added the _ prefix to all of the option params. I think it would be nice to have consistent naming -- I like the option _metrics and would use that same name for DataMagic.search option instead of add_aggregations.

let's get feedback from @shawnbot and/or @yozlet before finalizing the naming

Thank you so much!

@siruguri
Copy link
Contributor Author

All sounds good ... will wait for more fdbk. Will rebase to @yozley's
changes shortly.

On Sun, Sep 13, 2015 at 11:44 PM, Sarah Allen [email protected]
wrote:

@siruguri https://github.com/siruguri this is pretty sweet. Since the
metrics are delivered with the results anyhow, I don't think need the
/stats endpoint, especially since @yozlet https://github.com/yozlet
added the _ prefix to all of the option params. I think it would be nice to
have consistent naming -- I like the option _metrics and would use that
same name for DataMagic.search option instead of add_aggregations.

let's get feedback from @shawnbot https://github.com/shawnbot and/or
@yozlet https://github.com/yozlet before finalizing the naming

Thank you so much!


Reply to this email directly or view it on GitHub
#181 (comment).

@shawnbot
Copy link
Contributor

@ultrasaurus I still think that it'd be better to have a separate endpoint for stats. In our case, we need to display stats for a larger subset of the data than we're fetching to display.

For instance, we need the median value of three fields for a large subset of schools (common to all searches), but then we need to query the API for a specific subset determined by the search parameters. Having these separated into two different requests should actually improve performance because the stats queries can be cached separately.

@ultrasaurus
Copy link
Contributor

For reference this is the current output of the API:

{
  "metadata": {
    "total": 100,
    "page": 0,
    "per_page": 20
  },
  "results": [
  ...
  ],
  "aggregations": {
    "population": {
      "max": 8175133,
      "avg": 600361.57
    }
  }
}

@siruguri
Copy link
Contributor Author

I think @ultrasaurus is implying that without the _metrics option, metrics are NOT calculated. So the use case @shawnbot describes would be something like:

  • /cities?state=TX&_metrics=...&_fields=my,three,fields - this would be contain the median values and would be cached separately
  • /cities?state=TX - no metrics, just the cities in Texas.

Does that describe what both of you are looking to do?

@shawnbot
Copy link
Contributor

@siruguri I think it does, but in that case I don't care about the results in the first request, and in the case of College Scorecard they could increase the payload size significantly. I guess that limiting the fields would help, but you'd still end up with 20 objects * 3 properties in the results array that just get ignored.

Also with College Scorecard (and, I imagine, in most apps), there are lots of fields that we need to display that we don't necessarily want stats for. In fact, I'm having a hard time imagining a scenario in which you would only want to get the fields that you've summarized. What happens, for instance, when you want the name and population of your cities, but want to summarize only the population?

I guess what I'm suggesting is that there aren't many use cases for getting results and stats in the same request, and that overloading the results endpoint with stats capabilities might lead to confusion, both in the code and among API users.

@siruguri
Copy link
Contributor Author

I could implement it so that _metrics takes one of two values - included
and only. The latter would not generate search hits; the former would. One
endpoint; but optimal behavior ... might that be an acceptable compromise?

Tbc, I don't really care - if you give me the final word, I'll go that way
:)

On Tue, Sep 15, 2015 at 11:08 AM, Shawn Allen [email protected]
wrote:

@siruguri https://github.com/siruguri I think it does, but in that case
I don't care about the results in the first request, and in the case of
College Scorecard they could increase the payload size significantly. I
guess that limiting the fields would help, but you'd still end up with 20
objects * 3 properties in the results array that just get ignored.

Also with College Scorecard (and, I imagine, in most apps), there are lots
of fields that we need to display that we don't necessarily want stats for.
In fact, I'm having a hard time imagining a scenario in which you would
only want to get the fields that you've summarized. What happens, for
instance, when you want the name and population of your cities, but want
to summarize only the population?

I guess what I'm suggesting is that there aren't many use cases for
getting results and stats in the same request, and that overloading the
results endpoint with stats capabilities might lead to confusion, both in
the code and among API users.


Reply to this email directly or view it on GitHub
#181 (comment).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this addition about? Looks like it's related to the test you added to api_spec.rb - is that right?

@yozlet
Copy link
Contributor

yozlet commented Sep 18, 2015

I'm inclined towards @shawnbot's take on the API design, specifically that

  1. aggregations and results should not be returned in the same request; it should be one or the other, for the reasons Shawn gave. (This also has a slight cacheability benefit)
  2. Given the above, aggregations render half of the option parameters useless (along with _metrics, only _fields, _distance and _zip remain)
  3. And given that aggregations also completely change the shape of the data being returned, I support the idea of appending /stats to the endpoint.

@ultrasaurus
Copy link
Contributor

I like this direction! I think we're talking about an API that looks like this:

http://localhost:3000/v1/cities/stats?state=TX&_fields=population,land_area&_metrics=avg,max

that would generate results that look like this:

{
  "metadata": {
      total: 30
  },
  "metrics": {
    "population": {
      "max": 8175133,
      "avg": 600361.57
    },
    "land_area": {
      "max": 3467.78,
      "avg": 2296.44
    }
  }
}

@siruguri
Copy link
Contributor Author

What branch shd I rebase to before continuing to work on this? dev?

@yozlet
Copy link
Contributor

yozlet commented Sep 22, 2015

Yep.

@yozlet
Copy link
Contributor

yozlet commented Sep 22, 2015

BTW, I recommend breaking the stats endpoint into its own controller method, and maybe add some methods down the line for it in DataMagic, QueryBuilder and ErrorChecker, rather than cramming more logic into the existing methods. Big thumbs up to any refactoring you can do here, and I'm happy to help with that too.

@siruguri
Copy link
Contributor Author

@yozlet - QueryBuilder has a stats-specific method, but the other classes don't. I'm not super conversant with Padrino/Sinatra coding idioms, but it seems like you're suggesting something like:

- app
  - controller
    - search.rb # This would be the main endpoint's controller code
    - stats.rb # /stats

For re-factoring, maybe pull the params management out into a module in lib.
Am I getting that right?

I am not sure if the routing/controller code's ready for re-factoring already ... maybe a third endpoint will inform it a bit better? I can see cleaning up DataMagic a bit but the routing isn't super complicated yet, imo.

@yozlet
Copy link
Contributor

yozlet commented Sep 24, 2015

Oh, sorry, I didn't mean to go as far as breaking everything out into separate files (though if that's where refactoring takes us, then sure). More just adding an additional get for /stats into the existing controllers.rb.

@siruguri
Copy link
Contributor Author

Ah, ok. Got it, yes, that makes sense.

On Thu, Sep 24, 2015 at 10:56 AM, Yoz Grahame [email protected]
wrote:

Oh, sorry, I didn't mean to go as far as breaking everything out into
separate files (though if that's where refactoring takes us, then sure).
More just adding an additional get for /stats into the existing
controllers.rb.


Reply to this email directly or view it on GitHub
#181 (comment).

@yozlet
Copy link
Contributor

yozlet commented Oct 22, 2015

Hey @siruguri, many apologies for the slowness around this PR. We're about to get funding to continue work on Open Data Maker so hopefully things should move faster soon. Many thanks for sticking with us!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the if clause returns a boolean anyway, you can just have it be the body of the method without the if/else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, idk why my brain doesn't notice these redundant blocks.

* Allow /stats to not be given _metrics option
* Re-factor code in controllers to separate /stats from main endpoint
* Remove result hits from aggregation payload in /stats
* Remove stray debugging cruft
* Refactor error checking
* Fix comments position
@siruguri siruguri force-pushed the siruguri-stats-endpoint branch from ed5922b to f93aece Compare October 23, 2015 04:16
@siruguri
Copy link
Contributor Author

FYI, I pushed an amended commit with the edits from today's discussion, but without rebasing to dev.

@yozlet
Copy link
Contributor

yozlet commented Nov 9, 2015

@siruguri Thanks for the update! However, it looks like you have a failing test - see the Travis CI failure.

@siruguri
Copy link
Contributor Author

Sorry, I didn't realize that the test would fail on Travis - it's because
the test is too specific to the structure of the Hash response. FYI, I am
shortly going to be outside Internet contact for a few weeks, and will
probably have to get back to this in Dec. Hope that's okay.

On Mon, Nov 9, 2015 at 3:21 PM, Yoz Grahame [email protected]
wrote:

@siruguri https://github.com/siruguri Thanks for the update! However,
it looks like you have a failing test - see the Travis CI failure.


Reply to this email directly or view it on GitHub
#181 (comment).

@ultrasaurus
Copy link
Contributor

@siruguri that is likely fine -- plenty of other things we need to do on this project, though I feel bad that you keep having to rebase! Let us know if you prefer we pick it up and integrate it. Otherwise we'll only do that if it (unexpectedly) becomes a critical feature in the interim.

@siruguri siruguri force-pushed the siruguri-stats-endpoint branch from 5d0adcc to b662c95 Compare December 5, 2015 17:31
@siruguri
Copy link
Contributor Author

siruguri commented Dec 5, 2015

Ok, I'm back from vacation! :) I merged in dev and fixed the tests that failed earlier. This should be good to go now.

@yozlet
Copy link
Contributor

yozlet commented Dec 8, 2015

YAY! Almost there, just one last question: What's that application.css for? (If it's just for making the front page look nicer, I suggest either (a) filing a separate PR or (b) waiting for #236)

@yozlet
Copy link
Contributor

yozlet commented Dec 8, 2015

Thank you so much for sticking with us!

@yozlet
Copy link
Contributor

yozlet commented Dec 8, 2015

Oh, duh, sorry, I see what happened - the application.css(.map) was generated from application.sass when you ran the server, and it stayed there and got checked in. I suggest removing it. (Maybe we should add it to .gitignore, but we can do that in a separate commit.)

@siruguri
Copy link
Contributor Author

siruguri commented Dec 9, 2015

They were generated by rake spec, interestingly enough. I did not in fact
start the Rack server... should I just do a git rm -r public/stylesheets?

On Tue, Dec 8, 2015 at 3:32 PM, Yoz Grahame [email protected]
wrote:

Oh, duh, sorry, I see what happened - the application.css(.map) was
generated from application.sass when you ran the server, and it stayed
there and got checked in. I suggest removing it. (Maybe we should add it to
.gitignore, but we can do that in a separate commit.)


Reply to this email directly or view it on GitHub
#181 (comment).

@yozlet
Copy link
Contributor

yozlet commented Dec 9, 2015

No, because you'll delete the application.sass from #235 and that's meant to be there. But git rm public/stylesheets/application.cs* should do the trick.

@yozlet
Copy link
Contributor

yozlet commented Dec 9, 2015

@siruguri WOOHOO! Thank you so much for this fantastic contribution!

you are clear to merge

yozlet added a commit that referenced this pull request Dec 9, 2015
@yozlet yozlet merged commit b295242 into 18F:dev Dec 9, 2015
@siruguri
Copy link
Contributor Author

siruguri commented Dec 9, 2015

:) Love the meme.

On Tue, Dec 8, 2015 at 6:04 PM, Yoz Grahame [email protected]
wrote:

Merged #181 #181.


Reply to this email directly or view it on GitHub
#181 (comment).

@ultrasaurus
Copy link
Contributor

👍 🎉 🎈

@siruguri siruguri deleted the siruguri-stats-endpoint branch January 26, 2016 05:51
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants