feat: Implement option 'delete_rows' of argument 'if_exists' in 'DataFrame.to_sql' API. #60376
@WillAyd I chose the name …
@erfannariman tagging you due to your help/interest during the lifecycle of this issue.
I'm not sure that the test failures are related. Restarted so let's see...
My remaining feedback is rather minor; overall I think the implementation looks good.
@mroeschke care to take a look?
Blocked by #60748. Changing to draft.
Hello @WillAyd and @mroeschke,
Nice work, this is looking a lot cleaner after the precursor PR.
def delete_rows(self, name: str, schema: str | None = None) -> None:
    table_name = f"{schema}.{name}" if schema else name
    if self.has_table(name, schema):
        self.execute(f"DELETE FROM {table_name}").close()
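To make the semantics concrete, here is a minimal sketch using only the standard-library `sqlite3` module rather than pandas' own classes. It assumes the intent of `delete_rows` is to empty an existing table while keeping its definition (column types, constraints), in contrast to `if_exists="replace"`, which in pandas drops the table and then recreates it from the DataFrame. The `scalar` helper is hypothetical.

```python
import sqlite3
from contextlib import closing


def scalar(conn: sqlite3.Connection, sql: str):
    # Hypothetical helper: fetch a single value and close the cursor right away.
    with closing(conn.execute(sql)) as cur:
        return cur.fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temp_frame (a INTEGER PRIMARY KEY, b TEXT NOT NULL)"
).close()
conn.executemany(
    "INSERT INTO temp_frame (b) VALUES (?)", [("x",), ("y",)]
).close()

# delete_rows-style: DELETE empties the table but keeps its definition.
conn.execute("DELETE FROM temp_frame").close()
rows_left = scalar(conn, "SELECT COUNT(*) FROM temp_frame")
ddl_after_delete = scalar(
    conn, "SELECT sql FROM sqlite_master WHERE name = 'temp_frame'"
)

# replace-style first step: DROP discards the table and its definition,
# so constraints like NOT NULL are gone unless something recreates them.
conn.execute("DROP TABLE temp_frame").close()
tables_left = scalar(
    conn, "SELECT COUNT(*) FROM sqlite_master WHERE name = 'temp_frame'"
)
```

After the `DELETE`, the stored `CREATE TABLE` statement (and therefore the `PRIMARY KEY` and `NOT NULL` constraints) is still there; after the `DROP`, nothing is.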
I'm still a bit unclear why we are calling .close() in this implementation but not in the others.
It is good (recommended) practice to close a cursor after use, because doing so releases resources. I believe we agree on that?
OK... but putting that aside for a moment, the answer is that the ADBC driver raises an error when a cursor is not explicitly closed, which makes some tests fail. There's no such check in sqlalchemy / sqlite3, which is why a missing .close is "overlooked" there. Example:
So do sqlalchemy and sqlite3 just leak the cursor then? Or should they be adding this?
I am just confused as to why we are offering potentially different cursor lifetime management across the implementations.
In this case pandas is leaking the cursor; sqlalchemy and sqlite3 just don't provide a cool and friendly message alerting the developer to it. On the other hand, it is standard practice to always close it.
I'm in favor of adding .close to all calls (so we guarantee there's no leak), but we had this discussion previously.
Please let me know if I can help with more context.
> In this case pandas is leaking the cursor because sqlalchemy and sqlite3
Oh OK. I thought that previously you had tried to .close those as well, but that caused other test failures because the lifecycle of the cursor was tied to the class.
We shouldn't be leaking resources in any of these implementations, but the challenge is that we also need to stay consistent in how we manage lifecycles. If it is as simple as calling .close (or, preferably, using a context manager) in all implementations, then let's do that. If calling .close on everything but ADBC causes an error, then we need to align the cursor lifecycle management of the ADBC subclass with the others.
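If the fix is simply "call .close everywhere", one consistent rule is that whoever obtains a cursor closes it, including when the statement fails. A minimal sketch of that rule for a DB-API 2.0 connection, shown with `sqlite3`; `execute_and_close` is a hypothetical helper, not part of pandas.

```python
import sqlite3
from typing import Any, Sequence


def execute_and_close(
    conn: sqlite3.Connection, sql: str, params: Sequence[Any] = ()
) -> int:
    """Run one statement and always close its cursor (hypothetical helper)."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        # Read what we need before closing; rowcount is -1 when it
        # cannot be determined.
        return cur.rowcount
    finally:
        # Runs on success and on error, so the cursor is never leaked.
        cur.close()


conn = sqlite3.connect(":memory:")
execute_and_close(conn, "CREATE TABLE t (x INTEGER)")
inserted = execute_and_close(conn, "INSERT INTO t VALUES (?)", (1,))

failed_cleanly = False
try:
    execute_and_close(conn, "DELETE FROM no_such_table")
except sqlite3.OperationalError:
    # The error still propagates, but the finally clause closed the cursor.
    failed_cleanly = True
```

The `try`/`finally` form works with any driver that exposes `close()`, which sidesteps the question of which cursors support `with`.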
There you go @WillAyd :). Sorry for not sending it as a separate commit. I'm keeping a close eye on CI.
All good. CI passed.
OK great. And none of these offer a context manager, right? And we don't want to be using self.pd_sql.execute either?
I'm still a little unclear about the difference in usage between self.execute and self.pd_sql.execute within this PR, but I also don't want to bikeshed if that's the path it takes us down.
> OK great. And none of these offer a context manager, right? And we don't want to be using self.pd_sql.execute either?
The sqlite3 cursor implementation does not support the context manager protocol (it has .close, but cannot be used directly in a with statement). This is one of the reasons I decided to standardize on explicit .close calls. Of course there's a workaround for that (contextlib.closing), and we can discuss it if you think it's worth it.
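For reference, a minimal sketch of the `contextlib.closing` workaround with a plain `sqlite3` cursor. It only illustrates the standard-library behaviour; it is not how pandas is wired.

```python
import sqlite3
from contextlib import closing

conn = sqlite3.connect(":memory:")

# sqlite3.Cursor has close() but does not implement __enter__/__exit__,
# so it cannot be used directly in a with statement.
cursor_supports_with = hasattr(sqlite3.Cursor, "__enter__")

# contextlib.closing adapts any object with close(): cur.close() is called
# when the block exits, whether normally or via an exception.
with closing(conn.cursor()) as cur:
    cur.execute("CREATE TABLE t (x INTEGER)")
    cur.executemany("INSERT INTO t VALUES (?)", [(1,), (2,)])
    cur.execute("SELECT COUNT(*) FROM t")
    count = cur.fetchone()[0]

# After the block the cursor is closed, so further use is rejected.
try:
    cur.execute("SELECT 1")
    cursor_closed = False
except sqlite3.ProgrammingError:
    cursor_closed = True
```

Note that `sqlite3.Connection` is a context manager, but it manages transactions (commit or rollback) rather than closing anything, so it does not solve the cursor-lifetime question.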
> I'm still a little unclear as to the difference in usage of self.execute versus self.pd_sql.execute within this PR, but I also don't want to bikeshed if that's the path it takes us down
No worries, I don't think you're bikeshedding. It is a bit confusing for sure. Let me try to break it down:
- self.pd_sql.execute should be used only in the SQLiteTable or SQLTable classes. Unlike SQLiteDatabase and SQLDatabase (respectively), these classes don't implement an execute method, which is why we have to go through self.pd_sql.
- Remember we wanted to stop using con or cursor objects directly...
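A heavily simplified sketch of the layering described above, assuming the split the comment gives: the database-level object owns the connection and exposes `execute`, while the table-level object only holds a reference to it (`pd_sql`) and goes through that reference instead of touching `con` or a cursor directly. `Database` and `Table` are hypothetical stand-ins for `SQLiteDatabase`/`SQLDatabase` and `SQLiteTable`/`SQLTable`; this is not pandas' actual code.

```python
import sqlite3
from contextlib import closing


class Database:
    """Hypothetical stand-in for SQLiteDatabase / SQLDatabase."""

    def __init__(self, con: sqlite3.Connection) -> None:
        self.con = con

    def execute(self, sql: str, params: tuple = ()) -> sqlite3.Cursor:
        # The single place that touches the connection; callers close the cursor.
        cur = self.con.cursor()
        try:
            cur.execute(sql, params)
        except Exception:
            cur.close()  # do not leak the cursor if the statement fails
            raise
        return cur

    def scalar(self, sql: str):
        with closing(self.execute(sql)) as cur:
            return cur.fetchone()[0]

    def delete_rows(self, name: str) -> None:
        # Database-level code calls its own method: self.execute.
        self.execute(f'DELETE FROM "{name}"').close()


class Table:
    """Hypothetical stand-in for SQLiteTable / SQLTable."""

    def __init__(self, name: str, pd_sql: Database) -> None:
        self.name = name
        # A table has no execute of its own; it keeps a reference to its database.
        self.pd_sql = pd_sql

    def insert(self, rows: list) -> None:
        placeholders = ", ".join("?" * len(rows[0]))
        sql = f'INSERT INTO "{self.name}" VALUES ({placeholders})'
        for row in rows:
            # Table-level code goes through self.pd_sql.execute.
            self.pd_sql.execute(sql, row).close()


db = Database(sqlite3.connect(":memory:"))
db.execute("CREATE TABLE t (a INTEGER, b TEXT)").close()
Table("t", db).insert([(1, "x"), (2, "y")])
before = db.scalar("SELECT COUNT(*) FROM t")
db.delete_rows("t")
after = db.scalar("SELECT COUNT(*) FROM t")
```

Keeping `execute` on one class means cursor lifetime is decided in one place, whichever object initiates the statement.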
OK, that's helpful. Sounds like self.pd_sql.execute is just poorly named, but that's not a problem for this PR to fix.
@@ -2069,6 +2080,16 @@ def drop_table(self, table_name: str, schema: str | None = None) -> None:
        self.get_table(table_name, schema).drop(bind=self.con)
        self.meta.clear()

    def delete_rows(self, table_name: str, schema: str | None = None) -> None:
        schema = schema or self.meta.schema
        if self.has_table(table_name, schema):
So if the table does not exist, we just ignore the instruction to perform a DELETE? I'm somewhat wary of assuming the user doesn't want an error here.
Yeah, me neither.
We can remove this check, let the driver error out, and eventually raise a DatabaseError.
What do you think?
On second thought, I think this is OK. It follows the same pattern as replace, which will create the table if it does not exist.
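The behaviour being weighed here can be reproduced with plain `sqlite3`: without an existence check, `DELETE` on a missing table makes the driver raise, while with the check the call silently does nothing and table creation is left to the normal write path, the same pattern the reviewer points to for `replace`. `table_exists` and `guarded_delete` are hypothetical helpers; this sketch stops at the raw `sqlite3.OperationalError` and does not model how pandas would wrap it in a `DatabaseError`.

```python
import sqlite3
from contextlib import closing


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    query = "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?"
    with closing(conn.execute(query, (name,))) as cur:
        return cur.fetchone() is not None


def guarded_delete(conn: sqlite3.Connection, name: str) -> bool:
    """Hypothetical helper: empty the table if it exists; report whether DELETE ran."""
    if not table_exists(conn, name):
        return False  # skip silently, like the has_table check in the diff
    conn.execute(f'DELETE FROM "{name}"').close()
    return True


conn = sqlite3.connect(":memory:")

# Without the guard, the driver rejects DELETE on a missing table.
unguarded_error = None
try:
    conn.execute('DELETE FROM "missing_table"')
except sqlite3.OperationalError as exc:
    unguarded_error = str(exc)

# With the guard, the call is a no-op and no exception is raised.
ran_on_missing = guarded_delete(conn, "missing_table")

conn.execute('CREATE TABLE "present_table" (x INTEGER)').close()
conn.execute('INSERT INTO "present_table" VALUES (1)').close()
ran_on_present = guarded_delete(conn, "present_table")
```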
LGTM. @mroeschke, care to review?
pandas/tests/io/test_sql.py
@@ -2698,6 +2700,58 @@ def test_drop_table(conn, request):
    assert not insp.has_table("temp_frame")


@pytest.mark.parametrize("conn_name", all_connectable)
def test_delete_rows_success(conn_name, test_frame1, request):
    table_name = "temp_frame"
Nit: could you make table_name a bit more unique between these two tests?
We have an existing issue about eventually having tests consistently clean up the tables they create, but in the meantime unique names will ensure your two added tests won't clobber each other if one fails.
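The reviewer's point is isolation between tests. One way to get it, sketched here with `sqlite3` and a hypothetical `unique_table_name` helper (the thread does not show which names the PR actually chose), is to add a random suffix so a table left behind by a failed test cannot collide with the next one.

```python
import sqlite3
import uuid
from contextlib import closing


def unique_table_name(prefix: str) -> str:
    """Hypothetical helper: prefix plus a short random hex suffix."""
    return f"{prefix}_{uuid.uuid4().hex[:8]}"


conn = sqlite3.connect(":memory:")

# Simulate a test that failed after creating its table and never cleaned up.
leftover = unique_table_name("temp_frame")
conn.execute(f'CREATE TABLE "{leftover}" (x INTEGER)').close()
conn.execute(f'INSERT INTO "{leftover}" VALUES (1)').close()

# The next test gets its own name, so CREATE TABLE does not collide with the
# leftover and the new table starts empty instead of inheriting its row.
fresh = unique_table_name("temp_frame")
conn.execute(f'CREATE TABLE "{fresh}" (x INTEGER)').close()
with closing(conn.execute(f'SELECT COUNT(*) FROM "{fresh}"')) as cur:
    fresh_rows = cur.fetchone()[0]
```

Fixed but distinct literal names per test achieve the same separation between two tests; the random suffix additionally protects repeated runs against the same database.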
Alright 👍. Done!
Thanks @gmcrocetti
Thanks a lot for reviewing it, folks. Special thanks to @WillAyd, who reviewed the two PRs multiple times 🙇.
doc/source/whatsnew/v3.0.0.rst
file if fixing a bug or adding a new feature.