
feat(pyspark)!: forward kwargs in create_table to pyspark methods #11120


Open · wants to merge 5 commits into main

Conversation

@jakepenzak (Contributor) commented Apr 13, 2025

Description of changes

  • Follow-up to feat(pyspark): expose merge_schema option in create_table #11071

  • Forward kwargs in the `create_table` method to the corresponding PySpark method: `pyspark.sql.DataFrameWriter.saveAsTable` if `obj` is passed, or `pyspark.sql.Catalog.createTable` if `schema` is passed.

    • If the `mode` kwarg is passed, it takes precedence over `overwrite`.

  • These changes improve API consistency with `to_delta` and `to_parquet` for the PySpark backend, while increasing the flexibility of the `create_table` method in line with `pyspark.sql.DataFrameWriter.saveAsTable`.

  • Breaking: we removed the `partition_by` and `format` kwargs of `create_table` in favor of forwarding kwargs directly to the PySpark methods. This breaks existing uses of the `partition_by` kwarg in `create_table`, which must now be passed as `partitionBy`.

  • Testing: `test_create_table_kwargs` demonstrates and tests some general usage patterns of the `create_table` method (see the sketch after this list). Note that in example 2, passing `append` with an input table whose schema matches the database table behaves similarly to the `insert` method.
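
A minimal usage sketch of the forwarding behavior described above; the session, table names, and column names are hypothetical, and the `source` option in the second call is an assumption about what `createTable` accepts:

```python
import ibis

con = ibis.pyspark.connect(session)  # `session` is an existing SparkSession

t = con.table("events")  # hypothetical source table

# `obj` is passed, so kwargs are forwarded to
# pyspark.sql.DataFrameWriter.saveAsTable:
con.create_table(
    "events_copy",
    obj=t,
    format="delta",            # forwarded; no longer a dedicated ibis kwarg
    partitionBy="event_date",  # formerly ibis's `partition_by` kwarg
    mode="append",             # takes precedence over `overwrite=`
)

# `schema` is passed instead, so kwargs are forwarded to
# pyspark.sql.Catalog.createTable:
con.create_table(
    "events_empty",
    schema=ibis.schema({"event_date": "date", "n": "int64"}),
    source="parquet",  # createTable's format parameter (assumed forwardable)
)
```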

Issues closed

  • feat: add **options to pyspark backend create_table #10984

feat(pyspark)!: forward kwargs in create_table to pyspark methods

Forward kwargs in the `create_table` method to the respective pyspark methods.
Also, we removed the `partition_by` and `format` kwargs of `create_table` in favor
of forwarding kwargs directly to the pyspark methods. This is consistent
with `to_delta` and `to_parquet` for the pyspark backend.

This is breaking only for users of the `partition_by` kwarg in `create_table`,
which must now be passed as `partitionBy` (see the sketch below).

closes ibis-project#10984
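
To illustrate the migration, a before/after sketch (`con` and `expr` are hypothetical):

```python
# Before this change: ibis-specific kwargs
con.create_table("t", obj=expr, partition_by="event_date", format="parquet")

# After this change: kwargs are forwarded verbatim to saveAsTable,
# so PySpark's spelling is used
con.create_table("t", obj=expr, partitionBy="event_date", format="parquet")
```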
github-actions bot added the `tests` (Issues or PRs related to tests) and `pyspark` (The Apache PySpark backend) labels Apr 13, 2025

ACTION NEEDED

Ibis follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message.

Please update your PR title and description to match the specification.

See https://github.com/ibis-project/ibis/blob/main/.releaserc.js
for the list of acceptable prefixes, e.g. "feat:", "fix:", "chore:", etc.

The commitlint output is:

⧗   input: test(pyspark): Fix tests
✖   subject must not be sentence-case, start-case, pascal-case, upper-case [subject-case]

✖   found 1 problems, 0 warnings
ⓘ   Get help: https://github.com/conventional-changelog/commitlint/#what-is-commitlint
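
For reference, lower-casing the subject satisfies the `subject-case` rule flagged above; a compliant title would be, for example:

```
test(pyspark): fix tests
```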

@jakepenzak (Contributor, Author) commented Apr 13, 2025

Still running into testing issues with spark connect. It looks like an issue specific to delta. I am looking into it, but let me know if anything is immediately obvious.

Update: see #11123 for a working solution to the testing issues.

@cpcloud (Member) commented Apr 14, 2025

+1, but let's cut one more feature release (10.5) before shipping this PR in 11.0!

Appreciate the work here!

@gforsyth gforsyth added this to the 11.0 milestone Apr 14, 2025
cpcloud added a commit that referenced this pull request Apr 14, 2025
…#11123)

## Description of changes

1. Upgrade `delta-spark` from v3.2.1 to
[v3.3.0](https://github.com/delta-io/delta/releases/tag/v3.3.0) for
pyspark backend testing in github actions
2. Upgrade the `delta-spark` [maven
package](https://mvnrepository.com/artifact/io.delta/delta-spark) from
v2.1.0 to v3.3.0 in the spark-connect container 🐳
- Ensures consistency with local pyspark testing and
[compatibility](https://docs.delta.io/latest/releases.html#compatibility-with-apache-spark)
with pyspark v3.5.5
3. Upgrade the spark-connect configuration to enable proper delta & catalog
functionality (a sketch of typical settings follows this list)
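
For context, enabling Delta with proper catalog support on a Spark session typically involves the standard Delta Lake settings below; this is a sketch of the documented configuration, not the exact diff from this PR:

```python
from pyspark.sql import SparkSession

# Standard Delta Lake session settings from the Delta docs; the
# spark-connect container in this PR may set additional options.
spark = (
    SparkSession.builder.appName("delta-example")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)
```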

----
- Items **2** and **3** together resolved the spark-connect testing issues from
#11120

---------

Co-authored-by: Phillip Cloud <[email protected]>
Labels

`pyspark` (The Apache PySpark backend), `tests` (Issues or PRs related to tests)

Successfully merging this pull request may close these issues:

  • feat: add **options to pyspark backend create_table

3 participants