
MERGE should raise when multiple source rows match target row #2407

Open
@mrjsj

Environment

Delta-rs version: 0.16.4

Binding: Python 0.16.4

Environment:

  • Cloud provider: None
  • OS: MacOS Sonoma 14.4
  • Other:

Bug

What happened:
TableMerger.when_matched_update_all() inserts extra records when a target record matches multiple source records
TableMerger.when_matched_update() inserts extra records when a target record matches multiple source records

What you expected to happen:
TableMerger.when_matched_update_all() should throw an error if a target record matches multiple source records
TableMerger.when_matched_update() should throw an error if a target record matches multiple source records

They should never insert new records.
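The expected behaviour can be sketched as a pre-merge check. This is only an illustration in plain Python, not delta-rs code: the function name `check_unique_matches` and the list-of-dicts row representation are hypothetical, and it considers only the join key (`s.id = t.id`), ignoring any extra clause predicate such as `s.op = 'U'`.

```python
from collections import Counter


def check_unique_matches(source_rows, target_rows, key):
    """Raise ValueError if any target key is matched by more than one source row.

    Hypothetical helper for illustration; not part of the deltalake API.
    """
    target_keys = {row[key] for row in target_rows}
    # Count source rows per key, but only for keys that exist in the target.
    counts = Counter(row[key] for row in source_rows if row[key] in target_keys)
    ambiguous = sorted(k for k, n in counts.items() if n > 1)
    if ambiguous:
        raise ValueError(
            f"multiple source rows match target rows with {key} in {ambiguous}"
        )


# Same data as the reproduction in this issue.
target = [{"id": 1, "attr": "x"}, {"id": 2, "attr": "y"}]
source = [
    {"id": i, "attr": a, "op": "U"}
    for i, a in zip([1, 1, 1, 2, 2], ["a", "b", "c", "d", "e"])
]

try:
    check_unique_matches(source, target, "id")
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)
```

With the data from this report the check raises, because both `id = 1` and `id = 2` are matched by more than one source row.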

How to reproduce it:
Using polars 0.20.7

import polars as pl
from deltalake import DeltaTable

base_df = pl.DataFrame(
    {
        "id": [1, 2],
        "attr": ["x", "y"]
    }
)

base_df.write_delta("./test_cdc", mode="overwrite")

dt = DeltaTable("./test_cdc")

print(pl.DataFrame(dt.to_pyarrow_table()))

cdc_df = pl.DataFrame(
    {
        "id": [1,1,1,2,2],
        "attr": ["a","b","c","d","e"],
        "op": ["U", "U", "U", "U", "U"]
    }
)


(
    dt.merge(
        cdc_df.to_arrow(),
        "s.id = t.id",
        source_alias="s",
        target_alias="t",
    )
    .when_matched_update(
        updates={"t.attr": "s.attr"},
        predicate="s.op = 'U'")
    .execute()
)

print(pl.DataFrame(dt.to_pyarrow_table()))

This gives the following output:

Base table

shape: (2, 2)
┌─────┬──────┐
│ id  ┆ attr │
│ --- ┆ ---  │
│ i64 ┆ str  │
╞═════╪══════╡
│ 1   ┆ x    │
│ 2   ┆ y    │
└─────┴──────┘

After TableMerger is executed

shape: (5, 2)
┌─────┬──────┐
│ id  ┆ attr │
│ --- ┆ ---  │
│ i64 ┆ str  │
╞═════╪══════╡
│ 2   ┆ d    │
│ 2   ┆ e    │
│ 1   ┆ a    │
│ 1   ┆ b    │
│ 1   ┆ c    │
└─────┴──────┘
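The row count above (2 rows growing to 5) is consistent with an update implemented as a join that emits one output row per matched (target, source) pair. The following is a simplified model of that fan-out in plain Python, assuming this is what happens; it is not the actual delta-rs implementation, and `naive_matched_update` is a hypothetical name.

```python
def naive_matched_update(target_rows, source_rows, key):
    """Model a join-based update with no uniqueness check (illustration only)."""
    out = []
    for t in target_rows:
        matches = [s for s in source_rows if s[key] == t[key]]
        if not matches:
            # Unmatched target rows are carried over unchanged.
            out.append(dict(t))
        for s in matches:
            # Each matching source row yields its own "updated" copy of the target row.
            out.append({**t, "attr": s["attr"]})
    return out


target = [{"id": 1, "attr": "x"}, {"id": 2, "attr": "y"}]
source = [
    {"id": i, "attr": a, "op": "U"}
    for i, a in zip([1, 1, 1, 2, 2], ["a", "b", "c", "d", "e"])
]

result = naive_matched_update(target, source, "id")
for row in result:
    print(row)
```

Under this model the two target rows turn into five rows, one per source row, which matches the table shown after the merge (row order aside).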

More details:
Info on the specific case in slack: https://delta-users.slack.com/archives/C013LCAEB98/p1712673309723829

Labels: binding/python, binding/rust, bug, good first issue, help wanted
