Fix read option name for change data feed for Delta [databricks]#13532
Pull Request Overview
This PR fixes an incorrect option name used when reading Delta tables with change data feed (CDF) functionality. The option name was changed from the invalid "readChangeDataFeed" to the correct "readChangeFeed", and schema validation assertions were added to verify the presence of required CDF columns.
- Fixed incorrect option name from "readChangeDataFeed" to "readChangeFeed" for Delta CDF reads
- Added schema validation assertions to ensure CDF columns are present in the returned DataFrame
- Improved code formatting and added explanatory comment about dropping the commit timestamp column
Signed-off-by: Jihoon Son <ghoonson@gmail.com>
9613b65 to 8a08a74
build |
Does not delta_lake_write_test.py https://github.com/search?q=repo%3ANVIDIA%2Fspark-rapids%20readChangeDataFeed&type=code |
We should run CI with |
Good catch! Modified them to use the util function as well. Thanks. |
I'm confused. Does pre-merge CI run delta tests if |
build |
The blossom-ci failure looks like some intermittent issue. Re-triggering it. |
build |
I assumed yes based on but now that I see runit.sh I am not sure which script is run when. cc @pxLi ? |
build |
I xfailed the failing low shuffle merge tests and filed #13552 to fix them. @gerashegalov could you please have another look? |
Fixes #12796
Description
We are currently using "readChangeDataFeed" as a read option name to read a table with change data feed in our tests. However, this option name is invalid. We should use "readChangeFeed" instead. This PR fixes the option name and adds asserts to validate the schema of the data read.
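As a rough illustration of the fix described above, the sketch below reads a Delta table's change feed with the "readChangeFeed" option and asserts that the CDF metadata columns are present. The helper names `missing_cdf_columns` and `read_change_feed` are hypothetical and are not the util function used in this PR; the `startingVersion` default and the decision to drop `_commit_timestamp` before returning are assumptions made for the example.

```python
# Minimal sketch, not the PR's actual test utility. Helper names are hypothetical.

# Metadata columns that Delta adds to a change data feed read.
CDF_COLUMNS = ("_change_type", "_commit_version", "_commit_timestamp")


def missing_cdf_columns(columns):
    """Return the CDF metadata columns that are absent from `columns`."""
    present = set(columns)
    return [c for c in CDF_COLUMNS if c not in present]


def read_change_feed(spark, path, starting_version=0):
    """Read the change feed of the Delta table at `path` and check its schema.

    `spark` is an existing SparkSession with Delta configured (assumed).
    """
    df = (spark.read.format("delta")
          .option("readChangeFeed", "true")  # the corrected option name
          .option("startingVersion", starting_version)
          .load(path))
    missing = missing_cdf_columns(df.columns)
    assert not missing, f"CDF columns missing from schema: {missing}"
    # Assumption: the commit timestamp varies between runs, so drop it
    # before comparing results.
    return df.drop("_commit_timestamp")
```

With a check like this in place, a read that does not actually return the change feed would presumably fail on the schema assertion instead of passing silently.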
I manually ran delta_lake_delete_test.py, delta_lake_update_test.py, and delta_lake_merge_test.py, which are impacted by this change. delta_lake_low_shuffle_merge_test.py is also impacted, but it does not run with Delta 3.3. I used Spark 3.5.5 and Delta 3.3.0 for my testing.
Checklists
(Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)