[Draft] Try to add python table provider interface #264

Draft pull request: 17 commits to be merged into base branch main

@CrystalZhou0529 CrystalZhou0529 commented Mar 16, 2025

Expose table providers in Python via FFI

Summary

  1. Created a python subcrate for Python-related development
  2. Reorganized Cargo dependencies and moved the shared ones (arrow, datafusion, duckdb) into the workspace Cargo.toml
  3. Implemented the sqlite and duckdb table provider interfaces and tested them in examples
  4. Updated duckdbconn.rs to start a new tokio runtime when called from PyO3
  5. Updated the GitHub CI/CD testing command

Testing

  • Verified that all existing Rust tests pass
  • Verified that Python examples can execute correctly via the following script:
cd python
maturin develop
cd examples
python3 duckdb_demo.py

Follow-up work

  • Improve duckdb enum interface (ongoing)

The rest is left for future work:

  • Python ODBC table provider + example
  • Python Postgres table provider + example
  • Python MySQL table provider + example
  • Integration testing
  • Finalize the Python interface design, e.g. new_memory() or new(":memory:")

@timsaucer timsaucer left a comment

Overall, this looks very nice. I suggest we merge this in but not publish to PyPI yet. There should be no breaking changes to the Rust code, so anyone who depends on it should be okay. I think we will need to bump the minor version number if you haven't done so (I didn't check).

@phillipleblanc Would you be comfortable if we merged in partially complete work on the Python side so we can have a series of smaller PRs until we're ready to publish?

core/Cargo.toml Outdated
Comment on lines 11 to 19
arrow = { workspace = true }
arrow-array = { version = "54.2.1", optional = true }
arrow-flight = { version = "54.2.1", optional = true, features = [
"flight-sql-experimental",
"tls",
] }
arrow-schema = { version = "54.2.1", optional = true, features = ["serde"] }
arrow-json = "54.2.1"
arrow-odbc = { version = "=15.1.1", optional = true }

I recommend we put all of the arrow dependencies in one place: the workspace Cargo.toml. That way, when someone goes to update them, they are less likely to miss one.
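
A minimal sketch of that layout, assuming the versions quoted in the diff above. The `arrow` version is an assumption (the diff only shows it as already inherited from the workspace), and the two-file split is illustrative rather than the PR's final layout. Cargo does not accept `optional` inside `[workspace.dependencies]`, so optionality stays in the member crate, while features listed in the workspace entry are inherited by members.

```toml
# --- Workspace root Cargo.toml (sketch) ---
[workspace.dependencies]
arrow = "54.2.1" # assumed version; not shown in the diff
arrow-array = "54.2.1"
arrow-flight = { version = "54.2.1", features = ["flight-sql-experimental", "tls"] }
arrow-schema = { version = "54.2.1", features = ["serde"] }
arrow-json = "54.2.1"
arrow-odbc = "=15.1.1"

# --- core/Cargo.toml (sketch) ---
[dependencies]
arrow = { workspace = true }
# `optional` is declared per member, not in the workspace table.
arrow-array = { workspace = true, optional = true }
arrow-flight = { workspace = true, optional = true }
arrow-schema = { workspace = true, optional = true }
arrow-json = { workspace = true }
arrow-odbc = { workspace = true, optional = true }
```

With this arrangement a version bump touches only the workspace table.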

core/Cargo.toml Outdated
Comment on lines 31 to 32
datafusion = { version = "45", default-features = false }
datafusion-expr = { version = "45", optional = true }

Same comment as above: I recommend putting all datafusion dependencies in the workspace Cargo.toml.
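
For the two datafusion entries, this could look like the sketch below. The versions are copied from the diff; the arrangement itself is an assumption, not the change that was eventually committed.

```toml
# --- Workspace root Cargo.toml (sketch) ---
[workspace.dependencies]
# Members that inherit this entry also inherit default-features = false.
datafusion = { version = "45", default-features = false }
datafusion-expr = { version = "45" }

# --- core/Cargo.toml (sketch) ---
[dependencies]
datafusion = { workspace = true }
datafusion-expr = { workspace = true, optional = true }
```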

Comment on lines +1 to +8
[package]
name = "datafusion-table-providers-python"
version = { workspace = true }
readme = { workspace = true }
edition = { workspace = true }
repository = { workspace = true }
license = { workspace = true }
description = { workspace = true }

One thing I'm not sure about - do we want this crate (the python crate) published to crates.io or not? @phillipleblanc do you have an opinion? I don't expect there would be a reason for someone to need it as a dependency.

Collaborator

Let's not publish it for now - if someone needs it later we can always publish it then.

Great! @CrystalZhou0529 we want a publish = false line like in this example from datafusion: https://github.com/apache/datafusion/blob/main/datafusion-examples/Cargo.toml
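
As a sketch, the `[package]` table quoted above with the requested line added might read as follows. Only the last key is new; its position within the table is a free choice.

```toml
[package]
name = "datafusion-table-providers-python"
version = { workspace = true }
readme = { workspace = true }
edition = { workspace = true }
repository = { workspace = true }
license = { workspace = true }
description = { workspace = true }
# Keeps this crate off crates.io: `cargo publish` will refuse to upload it.
publish = false
```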

Author

Sounds good!

"Programming Language :: Python",
"Programming Language :: Rust",
]
dependencies = ["datafusion>=43.0.0"]

Can you double check which version of datafusion-python first has the tokio runtime? I think it might be 44. It will be important.

This is the commit where the tokio runtimes were added on the FFI side: apache/datafusion#13937 and that is in datafusion 45. I think we need that version as the minimum.
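
A sketch of the corresponding floor in the Python package metadata, assuming the `dependencies` line shown above lives in the `[project]` table of pyproject.toml. The exact specifier used in the final commit is not shown in this thread.

```toml
# pyproject.toml (sketch)
[project]
# Require datafusion-python 45 or newer, the release identified above as
# containing the FFI-side tokio runtime support.
dependencies = ["datafusion>=45.0.0"]
```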

Author

Fixed!

@phillipleblanc
Collaborator

> Overall, this looks very nice. I suggest we merge this in but not publish to PyPI yet. There should be no breaking changes to the Rust code, so anyone who depends on it should be okay. I think we will need to bump the minor version number if you haven't done so (I didn't check).
>
> @phillipleblanc Would you be comfortable if we merged in partially complete work on the Python side so we can have a series of smaller PRs until we're ready to publish?

Yes, that works for me - just mark this PR as ready to review and tag me when you are ready to merge. Thanks for working on this!
