Description
Environment
Delta-rs version: 0.16.4
Binding: Python 0.16.4
Environment:
- Cloud provider: None
- OS: MacOS Sonoma 14.4
- Other:
Bug
What happened:
TableMerger.when_matched_update_all() inserts records when there is a match on multiple source columns
TableMerger.when_matched_update() inserts records when there is a match on multiple source columns
What you expected to happen:
TableMerger.when_matched_update_all() should throw an error if a target record matches multiple source records
TableMerger.when_matched_update() should throw an error if a target record matches multiple source records
They should never insert new records.
How to reproduce it:
Using polars 0.20.7
import polars as pl
from deltalake import DeltaTable
base_df = pl.DataFrame(
{
"id": [1, 2],
"attr": ["x", "y"]
}
)
base_df.write_delta("./test_cdc", mode="overwrite")
dt = DeltaTable("./test_cdc")
print(pl.DataFrame(dt.to_pyarrow_table()))
cdc_df = pl.DataFrame(
{
"id": [1,1,1,2,2],
"attr": ["a","b","c","d","e"],
"op": ["U", "U", "U", "U", "U"]
}
)
(
dt.merge(
cdc_df.to_arrow(),
"s.id = t.id",
source_alias="s",
target_alias="t",
)
.when_matched_update(
updates={"t.attr": "s.attr"},
predicate="s.op = 'U'")
.execute()
)
print(pl.DataFrame(dt.to_pyarrow_table()))
Gives the following output
base table
shape: (2, 2)
┌─────┬──────┐
│ id ┆ attr │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪══════╡
│ 1 ┆ x │
│ 2 ┆ y │
└─────┴──────┘
After TableMerger is executed
shape: (5, 2)
┌─────┬──────┐
│ id ┆ attr │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪══════╡
│ 2 ┆ d │
│ 2 ┆ e │
│ 1 ┆ a │
│ 1 ┆ b │
│ 1 ┆ c │
└─────┴──────┘
More details:
Info on the specific case in slack: https://delta-users.slack.com/archives/C013LCAEB98/p1712673309723829