Programmatic access to queries & modularity #106
|
@jangorecki @Tmonster Would like to solicit your feedback on these edits before I go too far down this path. I fully intend to extend this approach to other dask (and python) tasks to hopefully help there too. But please let me know your opinions so I can make sure to make updates that reflect the needs of the project as well |
|
Thank you for the PR! I'll have a look. I know we usually like to have the scripts formatted so that they are easy to run by hand, but these changes seem close enough. Dask has also been one of the more difficult solutions to get working, so I am happy to have someone improve it. Let me get a PR up that fixes the regression tests and I'll take a closer look at everything |
|
@Tmonster Awesome 😎 looking forward to your feedback! 100% understand the desire to run it like a script, I think that is a good idea, and it looks like the Happy to take feedback then try to tackle |
|
@CarterFendley can you rebase and push again? The regression tests should run this time and not produce errors everywhere 😅 |
583bd5d to
fbde233
|
@Tmonster Done, looks like it is waiting for approval to run the workflow |
|
The main reason against these kinds of ideas was, and for me personally still is, being able to copy-paste the script line by line into a console. This eliminates extra surface where performance regressions might appear, if not now, then eventually in the future. As I am no longer a maintainer, it is not my decision. |
|
Looks good from my side. Eventually I'd like to get the back to a state where I can paste commands line by line. But it is already a step in the right direction for debugging and maintenance. In some future PR we can get it to line by line debugging |
|
Sounds good! That's an interesting perspective Jan. I definitely want to prevent regressions. I was hoping that a single As I have mentioned, one of my main purposes is to be able to programmatically update things such as the I am happy to instrument code coverage measurements or other things to help assure there are no regressions as you all see fit. Want to assure the stability of the benchmark! |
This PR is intended to start a discussion on how to re-use as much of the benchmark code as possible. Specifically, this proposes the creation of a function similar to run_query(...), as prototyped in this commit editing dask groupby query logic:
There are a number of benefits to this approach:
In addition to the primary changes stated above, there are a few other changes:
Move the __main__ guard from a wrapper to the main file itself (remove groupby-dask2.py)
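
For readers following the discussion, the general shape of a run_query(...)-style helper combined with a __main__ guard might look roughly like the sketch below. This is only an illustration under stated assumptions, not the code from the commit: the signature, the names of the returned fields, the number of runs, and the toy in-memory rows used in place of a dask DataFrame are all hypothetical.

```python
import gc
import time
from typing import Any, Callable, Dict, List


def run_query(
    data: Any,
    name: str,
    query: Callable[[Any], Any],
    runs: int = 2,
) -> List[Dict[str, Any]]:
    """Run `query` on `data` `runs` times and record wall-clock timings.

    Hypothetical helper: the signature and the record fields are
    assumptions made for illustration only.
    """
    records = []
    for run in range(1, runs + 1):
        gc.collect()  # start each run from a comparable memory state
        start = time.perf_counter()
        result = query(data)
        elapsed = time.perf_counter() - start
        records.append(
            {"question": name, "run": run, "time_sec": elapsed, "result": result}
        )
    return records


def sum_v1_by_id1(rows: List[Dict[str, Any]]) -> Dict[str, int]:
    """Toy stand-in for a groupby query: sum of v1 per id1."""
    totals: Dict[str, int] = {}
    for row in rows:
        totals[row["id1"]] = totals.get(row["id1"], 0) + row["v1"]
    return totals


def main() -> None:
    # Toy data standing in for the benchmark dataset (assumption).
    rows = [
        {"id1": "a", "v1": 1},
        {"id1": "b", "v1": 2},
        {"id1": "a", "v1": 3},
    ]
    for rec in run_query(rows, "sum v1 by id1", sum_v1_by_id1):
        print(rec["question"], rec["run"], f"{rec['time_sec']:.6f}s", rec["result"])


if __name__ == "__main__":
    # Guard lives in the main file, so importing this module runs nothing.
    main()
```

With a layout like this, each query body stays a small function that can still be called by hand in a console, while the timing and bookkeeping live in one place and the module can be imported by other tools without side effects.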