feat(pyspark)!: forward kwargs in `create_table` to pyspark methods #11120
Conversation
feat(pyspark)!: forward kwargs in `create_table` to pyspark methods

Forward kwargs in the `create_table` method to the respective pyspark methods. Also, we removed the `partition_by` and `format` kwargs of `create_table` in favor of forwarding kwargs directly to pyspark methods. This is consistent with `to_delta` and `to_parquet` for the pyspark backend. This is breaking only for users of the `partition_by` kwarg in `create_table`, which will now need to be passed as `partitionBy`.

closes ibis-project#10984
ACTION NEEDED: Ibis follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. See https://github.com/ibis-project/ibis/blob/main/.releaserc.js
Still running into testing issues w/ spark connect. Looks like it is an issue specific to delta. I am looking into it, but lmk if anything is immediately obvious. Update: see #11123 for a working solution to the testing issues.
+1, but let's cut one more feature release (10.5) before shipping this PR in 11.0! Appreciate the work here!
… (#11123)

## Description of changes

1. Upgrade `delta-spark` from v3.2.1 to [v3.3.0](https://github.com/delta-io/delta/releases/tag/v3.3.0) for pyspark backend testing in GitHub Actions
2. Upgrade the `delta-spark` [maven package](https://mvnrepository.com/artifact/io.delta/delta-spark) from v2.1.0 to v3.3.0 in the spark-connect container 🐳, for consistency w/ local pyspark testing & [compatibility](https://docs.delta.io/latest/releases.html#compatibility-with-apache-spark) with pyspark v3.5.5
3. Upgrade the spark-connect configuration to enable proper delta & catalog functionality

2 + 3 together resolved the spark-connect testing issues from #11120

Co-authored-by: Phillip Cloud <[email protected]>
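Point 3 concerns the Spark session settings that enable Delta; below is a minimal sketch of the standard delta-spark configuration (the actual spark-connect container settings in #11123 may differ):

```python
from pyspark.sql import SparkSession

# Standard delta-spark session config; the exact #11123 container settings may differ.
spark = (
    SparkSession.builder.appName("delta-test")
    # Register Delta's SQL extension (Delta DDL/DML support).
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    # Route the default catalog through Delta's catalog implementation.
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)
```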
Description of changes
Follow-up on feat(pyspark): expose merge_schema option in create_table #11071
Forward kwargs in the `create_table` method to the respective pyspark methods: `pyspark.sql.DataFrameWriter.saveAsTable` if `obj` is passed, or `pyspark.sql.Catalog.createTable` if `schema` is passed. If the `mode` kwarg is passed, it takes precedence over `overwrite`.
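A minimal sketch of the two dispatch paths, assuming a local Spark session (table, column, and connection names are illustrative):

```python
from pyspark.sql import SparkSession

import ibis

# Hypothetical setup: a local SparkSession and an ibis pyspark connection.
spark = SparkSession.builder.getOrCreate()
con = ibis.pyspark.connect(spark)

df = ibis.memtable({"id": [1, 2], "event_date": ["2024-01-01", "2024-01-02"]})

# obj path: extra kwargs are forwarded to pyspark.sql.DataFrameWriter.saveAsTable.
con.create_table(
    "events",
    obj=df,
    format="parquet",          # saveAsTable kwarg
    partitionBy="event_date",  # saveAsTable kwarg (previously partition_by)
    mode="overwrite",          # takes precedence over overwrite=
)

# schema path: extra kwargs are forwarded to pyspark.sql.Catalog.createTable.
con.create_table(
    "events_empty",
    schema=ibis.schema({"id": "int64", "event_date": "string"}),
)
```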
These changes improve API consistency with `to_delta` and `to_parquet` for the pyspark backend, while increasing the flexibility of the `create_table` method in correspondence with `pyspark.sql.DataFrameWriter.saveAsTable`.

Breaking: We removed the `partition_by` and `format` kwargs of `create_table` in favor of forwarding kwargs directly to pyspark methods. This is breaking for users of the `partition_by` kwarg in `create_table`, which will now need to be passed as `partitionBy`.
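A migration sketch for existing `partition_by` callers (reusing `con` and `df` from the sketch above):

```python
# Before this PR: dedicated create_table kwargs.
con.create_table("events", obj=df, partition_by="event_date", format="parquet")

# After this PR: pyspark-native kwarg names, forwarded to saveAsTable.
con.create_table("events", obj=df, partitionBy="event_date", format="parquet")
```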
Testing: `test_create_table_kwargs` demonstrates & tests some general usage patterns of the `create_table` method. Note, in example 2, when `append` mode is passed with an input table of the same schema as the db table, the behavior will be similar to that of the `insert` method, mirroring the correspondence between `create_table` & `pyspark.sql.DataFrameWriter.saveAsTable` and `insert` & `pyspark.sql.DataFrameWriter.insertInto` for pyspark users, as sketched below.
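For instance (again reusing the setup above), the two calls below should behave similarly when the input schema matches the table:

```python
new_rows = ibis.memtable({"id": [3], "event_date": ["2024-01-03"]})

# Appending via create_table with mode="append" ...
con.create_table("events", obj=new_rows, mode="append")

# ... is similar to inserting into the existing table.
con.insert("events", new_rows)
```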
Issues closed

- #10984