feat: Implement option 'delete_rows' of argument 'if_exists' in 'DataFrame.to_sql' API. #60376

gmcrocetti · 2024-11-20T13:31:48Z

closes ENH: DataFrame.to_sql with if_exists='replace' should do truncate table instead of drop table #37210
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/v3.0.0.rst file if fixing a bug or adding a new feature.

gmcrocetti · 2024-11-20T13:37:13Z

@WillAyd I chose the name delete_rows instead of delete_replace because, as you mentioned, replace currently recreates the table, whereas delete_rows describes exactly what happens behind the scenes.
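A minimal sketch of that distinction, using only the standard-library sqlite3 module rather than the pandas code in this PR. The table name `demo` and the helpers `recreate`, `delete_rows` and `table_sql` are hypothetical and exist only for illustration; the point is that dropping and recreating a table loses its original definition, while deleting rows keeps it.

```python
import sqlite3


def recreate(con: sqlite3.Connection) -> None:
    # Roughly what a drop-and-recreate strategy does: the old definition is lost.
    con.execute("DROP TABLE IF EXISTS demo")
    con.execute("CREATE TABLE demo (a)")


def delete_rows(con: sqlite3.Connection) -> None:
    # Only the rows go away; the table definition is left untouched.
    con.execute("DELETE FROM demo")


def table_sql(con: sqlite3.Connection) -> str:
    # sqlite_master stores the CREATE statement text for each table.
    row = con.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'demo'"
    ).fetchone()
    return row[0]


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE demo (a INTEGER PRIMARY KEY)")
con.execute("INSERT INTO demo VALUES (1)")

delete_rows(con)
rows_after_delete = con.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
sql_after_delete = table_sql(con)  # original definition, constraint included

recreate(con)
sql_after_recreate = table_sql(con)  # new definition, constraint gone
```

Comparing `sql_after_delete` with `sql_after_recreate` shows why the two behaviors deserve different names.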

@erfannariman tagging you due to your help/interest during the lifecycle of this issue.

pandas/tests/io/test_sql.py

pandas/io/sql.py

pandas/tests/io/test_sql.py

WillAyd

I'm not sure that the test failures are related. Restarted so let's see...

My remaining feedback is rather minor; overall I think the implementation looks good.

@mroeschke care to take a look?

pandas/io/sql.py

pandas/tests/io/test_sql.py

pandas/io/sql.py

gmcrocetti · 2025-01-23T00:14:37Z

blocked by #60748. Changing to draft.

gmcrocetti · 2025-02-12T20:53:57Z

Hello @WillAyd and @mroeschke,
I believe the merge of #60748 has unblocked this one. Would you mind taking a look? I'm resolving all conversations so we can start fresh on this.

WillAyd

Nice work - this is looking a lot cleaner after the precursor

pandas/io/sql.py

WillAyd · 2025-02-18T01:27:28Z

pandas/io/sql.py

+    def delete_rows(self, name: str, schema: str | None = None) -> None:
+        table_name = f"{schema}.{name}" if schema else name
+        if self.has_table(name, schema):
+            self.execute(f"DELETE FROM {table_name}").close()


I'm still a bit unclear why we are calling .close() in this implementation but not in the others

Closing a cursor after use is recommended practice because it releases resources. I believe we agree on that?

OK, but putting that aside for a moment, the answer is that the ADBC driver raises an error when a cursor is not explicitly closed, which causes some tests to fail. There's no such check in sqlalchemy / sqlite3, which is why a missing close is "overlooked" there. Example:

So do sqlalchemy and sqlite3 just leak the cursor then? Or should they be adding this?

I am just confused as to why we are offering potentially different cursor lifetime management across the implementations

In this case pandas is leaking the cursor because sqlalchemy and sqlite3 do not emit a friendly message alerting the developer. On the other hand, it is standard practice to always close it.

I'm in favor of adding .close to all calls (so we guarantee there's no leak), but we had this discussion previously.

Please let me know if I can help with more context

> In this case pandas is leaking the cursor because sqlalchemy and sqlite3

Oh OK - I thought that previously you tried to .close on those as well but they would cause other test failures, as the lifecycle of the cursor was tied to the class.

We shouldn't be leaking resources across any of these implementations, but the challenge is that we also need to stay consistent in how we are managing lifecycles. If it is as simple as calling .close (or preferably using a context manager) for all implementations, then let's do that. If calling .close on everything but ADBC is causing an error, then we need to align the cursor lifecycle management of the ADBC subclass with the others.

There you go @WillAyd :). Sorry for not sending it as a separate commit. I'm keeping a close eye on CI.

All good. CI passed.

OK great. And none of these offer a context manager, right? And we don't want to be using self.pd_sql.execute either?

I'm still a little unclear as to the difference in usage of self.execute versus self.pd_sql.execute within this PR, but I also don't want to bikeshed if that's the path it takes us down

> OK great. And none of these offer a context manager, right? And we don't want to be using self.pd_sql.execute either?

The sqlite3 cursor implementation does not provide a context manager. This is one of the reasons I decided to standardize on .close calls. Of course there's a workaround for that (contextlib.closing), and we can discuss it if you find it worth it.
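For reference, a minimal sketch of the contextlib.closing workaround mentioned above, using only the standard library. The in-memory database and the table name `demo` are assumptions for illustration; this is not code from the PR.

```python
import sqlite3
from contextlib import closing

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE demo (a INTEGER)")
con.execute("INSERT INTO demo VALUES (1)")

# closing() calls cur.close() when the block exits, even if an error is raised,
# which gives the cursor context-manager behavior it does not have on its own.
with closing(con.cursor()) as cur:
    cur.execute("DELETE FROM demo")

# Same pattern for a read: the cursor is closed as soon as the value is fetched.
with closing(con.execute("SELECT COUNT(*) FROM demo")) as cur:
    remaining = cur.fetchone()[0]

con.close()
```

Whether this reads better than an explicit .close call is a style choice; both release the cursor.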

> I'm still a little unclear as to the difference in usage of self.execute versus self.pd_sql.execute within this PR, but I also don't want to bikeshed if that's the path it takes us down

No worries, I don't think you're bikeshedding. It is a bit confusing for sure. Let me try to break it down:

self.pd_sql.execute should be used only in the SQLiteTable and SQLTable classes. Unlike SQLiteDatabase and SQLDatabase, respectively, these classes don't implement an execute method, which is why we have to go through self.pd_sql.

Remember we wanted to stop using con or cursor objects directly...
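A simplified sketch of the delegation pattern described above. The classes `Database` and `Table` are hypothetical stand-ins, not the actual pandas classes; only the attribute name `pd_sql` is borrowed from the discussion. The table object never touches the connection or a cursor factory directly and instead asks its parent database object to execute.

```python
import sqlite3
from contextlib import closing


class Database:
    """Owns the connection and is the single place that runs SQL."""

    def __init__(self, con: sqlite3.Connection) -> None:
        self.con = con

    def execute(self, sql: str, params: tuple = ()) -> sqlite3.Cursor:
        return self.con.execute(sql, params)


class Table:
    """Has no execute of its own; it goes through its parent database."""

    def __init__(self, name: str, pd_sql: Database) -> None:
        self.name = name
        self.pd_sql = pd_sql

    def count(self) -> int:
        query = f"SELECT COUNT(*) FROM {self.name}"
        with closing(self.pd_sql.execute(query)) as cur:
            return cur.fetchone()[0]


db = Database(sqlite3.connect(":memory:"))
db.execute("CREATE TABLE demo (a INTEGER)").close()
db.execute("INSERT INTO demo VALUES (?)", (1,)).close()
n_rows = Table("demo", db).count()
```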

OK that's helpful. Sounds like self.pd_sql.execute is just poorly named, but that's not a problem for this PR to fix

WillAyd · 2025-02-18T01:29:53Z

pandas/io/sql.py

@@ -2069,6 +2080,16 @@ def drop_table(self, table_name: str, schema: str | None = None) -> None:
                self.get_table(table_name, schema).drop(bind=self.con)
            self.meta.clear()

+    def delete_rows(self, table_name: str, schema: str | None = None) -> None:
+        schema = schema or self.meta.schema
+        if self.has_table(table_name, schema):


So in the case the table does not exist we are just ignoring any instruction to perform a DELETE? I'm somewhat wary of assuming a user may not want an error here

Yeah, me neither.
We can remove this check and let the driver error, which would eventually raise a DatabaseError.
What do you think?

On second thought I think this is OK. It follows the same pattern as replace which will create the table if it does not exist
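The trade-off discussed in this thread can be sketched with the standard-library sqlite3 module. The helpers `has_table` and `delete_rows` below are hypothetical simplifications that only mirror the shape of the guarded version in the diff; they are not the pandas implementation.

```python
import sqlite3


def has_table(con: sqlite3.Connection, name: str) -> bool:
    query = "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?"
    cur = con.execute(query, (name,))
    try:
        return cur.fetchone() is not None
    finally:
        cur.close()


def delete_rows(con: sqlite3.Connection, name: str) -> None:
    # Guarded variant: a missing table is a silent no-op, so a later
    # step is free to create the table and insert into it.
    if has_table(con, name):
        con.execute(f'DELETE FROM "{name}"').close()


con = sqlite3.connect(":memory:")
delete_rows(con, "missing")  # no error, nothing to delete
```

Without the guard, running `DELETE FROM` against a missing table makes sqlite3 raise `sqlite3.OperationalError`, which is the alternative behavior that was considered.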

WillAyd

lgtm @mroeschke care to review?

mroeschke · 2025-02-19T02:43:15Z

pandas/tests/io/test_sql.py

@@ -2698,6 +2700,58 @@ def test_drop_table(conn, request):
        assert not insp.has_table("temp_frame")


+@pytest.mark.parametrize("conn_name", all_connectable)
+def test_delete_rows_success(conn_name, test_frame1, request):
+    table_name = "temp_frame"


Nit: Could you make table_name a bit more unique between these two tests?

We have an existing issue about eventually having tests consistently clean up the tables they create, but in the meantime unique names will ensure your two added tests won't clobber each other if one fails.

Alright 👍. Done!

mroeschke · 2025-02-19T17:07:45Z

Thanks @gmcrocetti

gmcrocetti · 2025-02-19T17:43:50Z

Thanks a lot for reviewing it, folks. A special thanks to @WillAyd, who reviewed the two PRs multiple times 🙇.
Appreciated!

gmcrocetti force-pushed the issue-37210-to-sql-truncate branch from e443d8d to 33cd0d6 Compare November 20, 2024 14:04

gmcrocetti marked this pull request as draft November 20, 2024 14:04

WillAyd requested changes Nov 20, 2024

pandas/tests/io/test_sql.py (outdated, resolved)

pandas/tests/io/test_sql.py (outdated, resolved)

pandas/tests/io/test_sql.py (resolved)

WillAyd added the IO SQL to_sql, read_sql, read_sql_query label Nov 20, 2024

gmcrocetti force-pushed the issue-37210-to-sql-truncate branch 3 times, most recently from 3c33249 to 1ef5a87 Compare November 22, 2024 01:10

gmcrocetti requested a review from WillAyd November 22, 2024 12:26

gmcrocetti marked this pull request as ready for review November 22, 2024 12:26

gmcrocetti force-pushed the issue-37210-to-sql-truncate branch 3 times, most recently from b71c0d9 to 1843040 Compare December 17, 2024 13:45

WillAyd requested changes Dec 26, 2024

pandas/io/sql.py (outdated, resolved)

WillAyd requested changes Dec 27, 2024

pandas/tests/io/test_sql.py (outdated, resolved)

gmcrocetti force-pushed the issue-37210-to-sql-truncate branch from 15bda94 to 2eb19e7 Compare December 27, 2024 18:03

gmcrocetti requested a review from WillAyd December 27, 2024 18:04

WillAyd requested changes Dec 30, 2024

pandas/io/sql.py (resolved)

pandas/tests/io/test_sql.py (outdated, resolved)

mroeschke reviewed Dec 30, 2024

pandas/io/sql.py (outdated, resolved)

mroeschke reviewed Dec 30, 2024

pandas/io/sql.py (outdated, resolved)

gmcrocetti commented Jan 3, 2025

pandas/io/sql.py (resolved)

gmcrocetti requested review from WillAyd and mroeschke January 3, 2025 14:38

gmcrocetti force-pushed the issue-37210-to-sql-truncate branch from 5f6ab41 to d1b01d2 Compare January 3, 2025 14:42

gmcrocetti requested review from MarcoGorelli, Dr-Irv and datapythonista as code owners January 3, 2025 14:42

gmcrocetti force-pushed the issue-37210-to-sql-truncate branch from d1b01d2 to 3e8813f Compare January 3, 2025 14:45

WillAyd reviewed Jan 3, 2025

pandas/io/sql.py (outdated, resolved)

gmcrocetti force-pushed the issue-37210-to-sql-truncate branch from 77dc01c to 4c8fcda Compare January 3, 2025 17:43

gmcrocetti force-pushed the issue-37210-to-sql-truncate branch from 8cadb78 to b0c2eff Compare January 17, 2025 20:55

gmcrocetti mentioned this pull request Jan 22, 2025

refactor: deprecate usage of cursor.execute statements in favor of the in class implementation of execute. #60748

Merged

5 tasks

gmcrocetti marked this pull request as draft January 23, 2025 00:14

gmcrocetti mentioned this pull request Feb 12, 2025

ENH: DataFrame.to_sql with if_exists='replace' should do truncate table instead of drop table #37210

Closed

WillAyd reviewed Feb 13, 2025

pandas/io/sql.py (outdated, resolved)

gmcrocetti force-pushed the issue-37210-to-sql-truncate branch 4 times, most recently from e67c0a2 to 2186e34 Compare February 17, 2025 18:04

gmcrocetti marked this pull request as ready for review February 17, 2025 19:00

gmcrocetti requested a review from WillAyd February 17, 2025 19:00

WillAyd requested changes Feb 18, 2025

gmcrocetti force-pushed the issue-37210-to-sql-truncate branch from 2186e34 to f5bc6ff Compare February 18, 2025 17:24

gmcrocetti requested review from WillAyd and arkpope February 18, 2025 19:11

WillAyd approved these changes Feb 18, 2025

gmcrocetti force-pushed the issue-37210-to-sql-truncate branch from 9b015c0 to 908bd2f Compare February 19, 2025 01:27

mroeschke reviewed Feb 19, 2025

gmcrocetti force-pushed the issue-37210-to-sql-truncate branch from 908bd2f to 7034f68 Compare February 19, 2025 12:11

feat: implement option 'delete_rows' of argument 'if_exists' in 'DataFrame.to_sql' API.

a52d2e2

gmcrocetti force-pushed the issue-37210-to-sql-truncate branch from 7034f68 to a52d2e2 Compare February 19, 2025 12:11

gmcrocetti requested a review from mroeschke February 19, 2025 12:26

mroeschke approved these changes Feb 19, 2025

mroeschke added this to the 3.0 milestone Feb 19, 2025

mroeschke merged commit 4c3b573 into pandas-dev:main Feb 19, 2025
42 checks passed

gmcrocetti mentioned this pull request Feb 25, 2025

docs: include option 'delete_rows' into DataFrame.to_sql #61008

Merged

5 tasks
