-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Draft] Try to add python table provider interface #264
base: main
Are you sure you want to change the base?
[Draft] Try to add python table provider interface #264
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks very nice. I suggest we merge this in but do not yet publish to pypi. There should be no breaking changes to the rust code, so anyone who depends on it should be okay. I think we will need to bump the minor version number if you haven't done so (I didn't check).
@phillipleblanc Would you be comfortable if we merged in partially complete work on the python side so we can have a series of smaller PRs until we're ready to publish?
core/Cargo.toml
Outdated
arrow = { workspace = true } | ||
arrow-array = { version = "54.2.1", optional = true } | ||
arrow-flight = { version = "54.2.1", optional = true, features = [ | ||
"flight-sql-experimental", | ||
"tls", | ||
] } | ||
arrow-schema = { version = "54.2.1", optional = true, features = ["serde"] } | ||
arrow-json = "54.2.1" | ||
arrow-odbc = { version = "=15.1.1", optional = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend we put all of the arrow dependencies in one place - the workspace Cargo.toml. That way when someone goes to update them, they don't miss one easily.
core/Cargo.toml
Outdated
datafusion = { version = "45", default-features = false } | ||
datafusion-expr = { version = "45", optional = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment as above, recommend putting all datafusion dependencies in workspace Cargo.toml
[package] | ||
name = "datafusion-table-providers-python" | ||
version = { workspace = true } | ||
readme = { workspace = true } | ||
edition = { workspace = true } | ||
repository = { workspace = true } | ||
license = { workspace = true } | ||
description = { workspace = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I'm not sure about - do we want this crate (the python crate) published to crates.io or not? @phillipleblanc do you have an opinion? I don't expect there would be a reason for someone to need it as a dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not publish it for now - if someone needs it later we can always publish it then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! @CrystalZhou0529 we want a publish = false
line like in this example from datafusion: https://github.com/apache/datafusion/blob/main/datafusion-examples/Cargo.toml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
python/pyproject.toml
Outdated
"Programming Language :: Python", | ||
"Programming Language :: Rust", | ||
] | ||
dependencies = ["datafusion>=43.0.0"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you double check which version of datafusion-python first has the tokio runtime? I think it might be 44. It will be important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly it is in 43? Ref: https://github.com/apache/datafusion-python/blob/b8dd97bc8eefcfecfa8dcc864c4898c654b236a9/dev/changelog/pre-43.0.0.md?plain=1#L53
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the commit where the tokio runtimes were added on the FFI side: apache/datafusion#13937 and that is in datafusion 45. I think we need that version as the minimum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
Yes, that works for me - just mark this PR as ready to review and tag me when you are ready to merge. Thanks for working on this! |
Expose table providers in Python via FFI
Summary
duckdbconn.rs
to start new tokio runtime when triggered from PyO3Testing
Follow-up works
The rest will be left for future works:
new_memory()
ornew(":memory:")