Commit 5020701

Liam Brannigan authored and committed
Update merge
1 parent 6717d3c commit 5020701

1 file changed: +6 -8 lines changed


docs/usage/working-with-partitions.md

@@ -135,12 +135,12 @@ print(pdf)
 
 ## Updating Partitioned Tables with Merge
 
-You can perform merge operations on partitioned tables in the same way you do on non-partitioned ones. If only a subset of existing partitions need to be read then provide a matching predicate that references the partition columns represented in the source data. The predicate then allows `deltalake` to skip reading the partitions not referenced by the predicate.
+You can perform merge operations on partitioned tables in the same way you do on non-partitioned ones. If only a subset of existing partitions are present in the source (i.e. new) data then `deltalake` can skip reading the partitions not present in the source data. You can do this by providing a predicate that specifies which partition values are in the source data.
 
-This example shows a merge operation that checks both the partition column (`"country"`) and another column (`"num"`) when merging:
-- The merge condition (predicate) matches target rows where both "country" and "num" align with the source.
+This example shows an upsert merge operation:
+- The merge condition (`predicate`) matches rows between source and target based on the partition column and specifies which partitions are present in the source data
 - If a match is found between a source row and a target row, the `"letter"` column is updated with the source data
-- Otherwise if no match is found for a source row it inserts the new row, creating a new partition if necessary
+- Otherwise if no match is found for a source row then the row is inserted, creating a new partition if necessary
 
 ```python
 dt = DeltaTable("tmp/partitioned-table")
@@ -150,7 +150,7 @@ source_data = pd.DataFrame({"num": [1, 101], "letter": ["A", "B"], "country": ["
 (
     dt.merge(
         source=source_data,
-        predicate="target.country = source.country AND target.num = source.num",
+        predicate="target.country = source.country AND target.country in ('US','CH')",
         source_alias="source",
         target_alias="target"
     )
@@ -170,15 +170,13 @@ print(pdf)
     num letter country
 0   101      B      CH
 1     1      A      US
-2     2      b      US
+2     2      A      US
 3   900      m      DE
 4  1000      n      DE
 5    10      x      CA
 6     3      c      CA
 ```
 
-This approach ensures that only rows in the relevant partition ("US") are processed, keeping operations efficient.
-
 ## Deleting Partition Data
 
 You may want to delete all rows from a specific partition. For example:
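
Note on the changed output row: the last hunk changes `2 b US` to `2 A US`. This follows from the new predicate, which matches on `country` alone rather than on `country` and `num`, so the single `US` source row now matches every `US` target row. The sketch below is a plain-Python model of those upsert semantics and is not how `deltalake` implements merge. The pre-merge target rows are an assumption reconstructed from the printed output, the original letter for `num=1` is a guess, and the source countries are inferred from the output because the hunk header truncates them. The helper names `upsert` and `us_letters` are hypothetical.

```python
# Plain-Python model of the upsert in this commit; NOT deltalake's implementation.

# Assumed pre-merge target, reconstructed from the printed output in the diff.
# The original letter for num=1 is not visible in the diff, so "a" is a guess.
target = [
    {"num": 1, "letter": "a", "country": "US"},
    {"num": 2, "letter": "b", "country": "US"},
    {"num": 900, "letter": "m", "country": "DE"},
    {"num": 1000, "letter": "n", "country": "DE"},
    {"num": 10, "letter": "x", "country": "CA"},
    {"num": 3, "letter": "c", "country": "CA"},
]

# Source rows; the countries are inferred from the output (assumption).
source = [
    {"num": 1, "letter": "A", "country": "US"},
    {"num": 101, "letter": "B", "country": "CH"},
]


def upsert(target_rows, source_rows, predicate):
    """Update `letter` on matched target rows; insert unmatched source rows."""
    result = [dict(row) for row in target_rows]  # copy so the input is untouched
    inserts = []
    for src in source_rows:
        matched = False
        for row in result:
            if predicate(row, src):
                row["letter"] = src["letter"]
                matched = True
        if not matched:
            inserts.append(dict(src))
    return result + inserts


def old_predicate(t, s):
    # target.country = source.country AND target.num = source.num
    return t["country"] == s["country"] and t["num"] == s["num"]


def new_predicate(t, s):
    # target.country = source.country AND target.country in ('US','CH')
    return t["country"] == s["country"] and t["country"] in ("US", "CH")


def us_letters(rows):
    """Map num -> letter for the rows in the US partition."""
    return {r["num"]: r["letter"] for r in rows if r["country"] == "US"}


old_result = upsert(target, source, old_predicate)
new_result = upsert(target, source, new_predicate)

print(us_letters(old_result))  # {1: 'A', 2: 'b'}
print(us_letters(new_result))  # {1: 'A', 2: 'A'}
```

In this model the `DE` and `CA` rows can never satisfy the new predicate, which corresponds to the partitions the rewritten paragraph says `deltalake` can skip reading.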
