-
-
Notifications
You must be signed in to change notification settings - Fork 469
Add ability to export tables #4090
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@ghislaineguerin @kgodey I've assigned you to this PR to get a sign off on the UX. |
mathemancer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overall method seems reasonable. Nice work on that!
However, I think most of the SQL code and a large part of the Python code could be removed if you can get ServerCursor.fetchmany working (or just use the builtin itersize param and loop through rows one at a time clientside).
If there's a really good reason that doesn't work, what is it?
Also, please just pull apart the msar.list_records_from_table result to get the actual records to return. I don't want to have to maintain that formatted query in multiple places if we can avoid it. Alternatively, you could extract the CTE-generating code and call it from both the record lister and the iterable record lister SQL functions. I prefer the former approach, since it prioritizes the efficiency of the hot path, i.e., listing records for display normally.
db/connection.py
Outdated
| while True: | ||
| cursor.execute(fetch_query) | ||
| rows = cursor.fetchall() | ||
| if not rows: | ||
| break | ||
| yield rows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the fetchmany functionality of the psycopg.ServerCursor class? This seems like it's reinventing the wheel.
See the docs for ServerCursor.
I think it has all the pieces you need, and it'd be more convenient than rolling our own if it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We need a SQL call to build the select query which accepts filters and sorting in the exact format as
listdoes. - We need a SQL call to fetch the names of the columns (as the existing functions rely on oids).
- We need a server side cursor for the records.
If we use ServerCursor class for the records, we'd need to make additional calls for build_select_query and fetch_selectable_column_names. We'd also need to have additional logic to ensure that the column names are in the same order as the columns in the records result.
It's much simpler to fetch everything via a single plpgsql function.
db/sql/00_msar.sql
Outdated
| 'SELECT %1$s FROM %2$I.%3$I %7$s %6$s LIMIT %4$L OFFSET %5$L', | ||
| COALESCE(columns_sql, 'NULL'), | ||
| msar.get_relation_schema_name(tab_id), | ||
| msar.get_relation_name(tab_id), | ||
| limit_, | ||
| offset_, | ||
| msar.build_order_by_expr(tab_id, order_), | ||
| msar.build_where_clause(tab_id, filter_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code replication is avoidable by just calling the record lister, then pulling apart the result. Given that this is not exactly a "Hot Path", I think the very minor inefficiency involved is well worth the increase in maintainability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code replication is avoidable by just calling the record lister, then pulling apart the result.
The record lister queries all the records and stores it into a single json, so calling that function means that we cannot use a cursor to iterate through the results. That would make the entire process inefficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless, maintaining that SQL construction in multiple different functions isn't a good idea. We'll need to find a better way.
|
Update for anyone following the PR: @mathemancer and I had a call discussing the approach to take and we'll be continuing the discussion after the holidays. This PR might remain idle until then. |
…esar into export_tables
… jsonb in table columns & records returning function
…esar into export_tables
74614c6 to
f675ceb
Compare
|
@mathemancer The PR is ready for review again, with the following discussed changes:
This also moves some common expressions and ctes into a different function. |
mathemancer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I left some comments of things I'd improve, but nothing blocking.
I'll leave it to you to merge the specific suggested function (if you want; it's not required), and then merge this PR.
db/sql/00_msar.sql
Outdated
| $$ LANGUAGE SQL STABLE RETURNS NULL ON NULL INPUT; | ||
|
|
||
|
|
||
| CREATE OR REPLACE FUNCTION msar.build_expr_and_ctes_for_enriched_results( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think blobbing all this logic into this function is a good idea.
- It obscures the logic and what the inputs/outputs are from each spot the various returned pieces are used
- It returns a complex object that isn't useful unless you look at the function's source code: It's not possible from the signature to determine what the function actually returns
- It's not documented, exacerbating the above problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a commit to improve the name of this function and some comments: 56481fd.
I hope this should alleviate some of the concerns. However, I'm open to refactoring this later if required.
Fixes #4067
This PR implements downloads for tables using the following approach:
Querying data:
Transferring:
Approaches tested
Querying data:
COPY (SELECT * FROM table) TOinstead ofCOPY table TO(since we need filtering & sorting and support for explorations in the future). This syntax has negligible performance difference compared to reading from a cursor.COPYis not supported inside a db function.Transferring:
What is not implemented in this PR:
UX changes from the design
Screen.Recording.2024-12-18.at.7.32.06.PM.mov
Checklist
Update index.md).developbranch of the repositoryvisible errors.
Developer Certificate of Origin
Developer Certificate of Origin