
Commit a5d1766

Liam Brannigan authored and committed
Edits
1 parent e008bc8 commit a5d1766


1 file changed: +11 -23 lines changed


docs/usage/working-with-partitions.md

@@ -116,17 +116,22 @@ If you have a more fine-grained predicate than a partition filter, you can use t
 
 ## Updating Partitioned Tables with Merge
 
-You can perform merge operations on partitioned tables in the same way you do on non-partitioned ones—simply provide a matching predicate that references partition columns if needed.
+You can perform merge operations on partitioned tables in the same way you do on non-partitioned ones. Simply provide a matching predicate that references partition columns if needed.
+
+You can match on both the partition column (country) and some other condition. This example shows a merge operation that checks both the partition column (“country”) and a numeric column (“num”) when merging:
+- The table is partitioned by “country,” so underlying data is physically split by each country value.
+- The merge condition (predicate) matches target rows where both “country” and “num” align with the source.
+- When a match occurs, it updates “letter”; otherwise, it inserts the new row.
+- This approach ensures that only rows in the relevant partition (“US”) are processed, keeping operations efficient.
 
-For example, you can match on both the partition column (country) and some other condition:
 ```python
 from deltalake import DeltaTable
 import pyarrow as pa
 
 dt = DeltaTable("tmp/partitioned-table")
 
-# Source data referencing an existing partition "US"
-source_data = pa.table({"num": [100, 101], "letter": ["A", "B"], "country": ["US", "US"]})
+# New data that references an existing partition "US"
+source_data = pa.table({"num": [1, 101], "letter": ["A", "B"], "country": ["US", "US"]})
 
 (
     dt.merge(
@@ -143,23 +148,6 @@ source_data = pa.table({"num": [100, 101], "letter": ["A", "B"], "country": ["US
 )
 ```
 
-If the partition does not exist (say for a new country value), a new partition folder will be created automatically.
-
-(See more in the docs on merging tables.)
-
-## Query Optimizations with Partitions
-
-Partitions allow data skipping for queries that include the partition columns. For example, if your partition column is date, any query with a clause like WHERE date = '2023-01-01' or WHERE date >= '2023-01-01' AND date < '2023-01-10' can skip reading all files not in those partitions.
-
-You can confirm partition-based skipping by:
-
-```python
-dt = DeltaTable("path/to/table")
-df = dt.to_pandas(partitions=[("date", "=", "2023-01-01")])
-```
-Using pushdown predicates in DataFusion or DuckDB from Rust/Python.
-(See more details in the Querying Delta Tables docs.)
-
 ## Deleting Partition Data
 
 You may want to delete all rows from a specific partition. For example:
@@ -169,13 +157,13 @@ dt = DeltaTable("tmp/partitioned-table")
 # Delete all rows from the 'US' partition:
 dt.delete("country = 'US'")
 ```
-This command logically deletes the data by creating a new transaction. (See docs on deleting rows for more.)
+This command logically deletes the data by creating a new transaction.
 
 ## Maintaining Partitioned Tables
 
 ### Optimize & Vacuum
 
-Partitioned tables can suffer from many small files if frequently appended to. If needed, you can run optimize compaction on a specific partition:
+Partitioned tables can accumulate many small files if a partition is frequently appended to. You can compact these into larger files on a specific partition:
 ```python
 dt.optimize(partition_filters=[("country", "=", "US")])
 ```
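
Reading aid for the first hunk: the added bullets describe an upsert keyed on both "country" and "num", but the diff cuts the `dt.merge(...)` call off before its predicate and clauses. The sketch below mimics that rule in plain Python so the matching behaviour is easy to follow. It is not how deltalake implements merge. The helper name `merge_upsert` and the target rows are hypothetical, invented for illustration, and it assumes each (country, num) pair appears at most once in the target. Only the source rows come from the commit.

```python
def merge_upsert(target_rows, source_rows):
    """Update "letter" where (country, num) matches, otherwise insert the source row."""
    # Index copies of the target rows by the two columns used in the merge condition.
    merged = {(row["country"], row["num"]): dict(row) for row in target_rows}
    for row in source_rows:
        key = (row["country"], row["num"])
        if key in merged:
            # Matched: only "letter" is updated, as the bullets describe.
            merged[key]["letter"] = row["letter"]
        else:
            # Not matched: the whole source row is inserted.
            merged[key] = dict(row)
    return list(merged.values())


# Hypothetical target contents (not taken from the commit).
target = [
    {"num": 1, "letter": "z", "country": "US"},
    {"num": 2, "letter": "y", "country": "CA"},
]

# Source rows as in the commit's updated example.
source = [
    {"num": 1, "letter": "A", "country": "US"},
    {"num": 101, "letter": "B", "country": "US"},
]

result = merge_upsert(target, source)

# Partitions referenced by the source; rows in other partitions can never match.
touched_partitions = {row["country"] for row in source}
```

With these inputs the row with num 1 in "US" gets its letter updated, the row with num 101 is inserted, and the "CA" row is left alone, which is the sense in which only the "US" partition is involved.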
