|
| 1 | +--- |
| 2 | +title: "PK Clustering Override" |
| 3 | +weight: 10 |
| 4 | +type: docs |
| 5 | +aliases: |
| 6 | +- /primary-key-table/pk-clustering-override.html |
| 7 | +--- |
| 8 | +<!-- |
| 9 | +Licensed to the Apache Software Foundation (ASF) under one |
| 10 | +or more contributor license agreements. See the NOTICE file |
| 11 | +distributed with this work for additional information |
| 12 | +regarding copyright ownership. The ASF licenses this file |
| 13 | +to you under the Apache License, Version 2.0 (the |
| 14 | +"License"); you may not use this file except in compliance |
| 15 | +with the License. You may obtain a copy of the License at |
| 16 | +
|
| 17 | + http://www.apache.org/licenses/LICENSE-2.0 |
| 18 | +
|
| 19 | +Unless required by applicable law or agreed to in writing, |
| 20 | +software distributed under the License is distributed on an |
| 21 | +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| 22 | +KIND, either express or implied. See the License for the |
| 23 | +specific language governing permissions and limitations |
| 24 | +under the License. |
| 25 | +--> |
| 26 | + |
| 27 | +# PK Clustering Override |
| 28 | + |
| 29 | +By default, data files in a primary key table are physically sorted by the primary key. This is optimal for point |
| 30 | +lookups but can hurt scan performance when queries filter on non-primary-key columns. |
| 31 | + |
| 32 | +**PK Clustering Override** mode changes the physical sort order of data files from the primary key to user-specified |
| 33 | +clustering columns. This significantly improves scan performance for queries that filter or group by clustering columns, |
| 34 | +while still maintaining primary key uniqueness through deletion vectors. |
| 35 | + |
| 36 | +## Quick Start |
| 37 | + |
| 38 | +```sql |
| 39 | +CREATE TABLE my_table ( |
| 40 | + id BIGINT, |
| 41 | + dt STRING, |
| 42 | + city STRING, |
| 43 | + amount DOUBLE, |
| 44 | + PRIMARY KEY (id) NOT ENFORCED |
| 45 | +) WITH ( |
| 46 | + 'pk-clustering-override' = 'true', |
| 47 | + 'clustering.columns' = 'city', |
| 48 | + 'deletion-vectors.enabled' = 'true', |
| 49 | + 'bucket' = '4' |
| 50 | +); |
| 51 | +``` |
| 52 | + |
| 53 | +After this, data files within each bucket will be physically sorted by `city` instead of `id`. Queries like |
| 54 | +`SELECT * FROM my_table WHERE city = 'Beijing'` can skip irrelevant data files by checking their min/max statistics |
| 55 | +on the clustering column. |
| 56 | + |
| 57 | +## How It Works |
| 58 | + |
| 59 | +PK Clustering Override replaces the default LSM compaction with a two-phase clustering compaction: |
| 60 | + |
| 61 | +**Phase 1 — Sort by Clustering Columns**: Newly flushed (level 0) files are read, sorted by the configured clustering |
| 62 | +columns, and rewritten as sorted (level 1) files. A key index tracks each primary key's file and row position to |
| 63 | +maintain uniqueness. |
| 64 | + |
| 65 | +**Phase 2 — Merge Overlapping Sections**: Sorted files are grouped into sections based on clustering column range |
| 66 | +overlap. Overlapping sections are merged together. Adjacent small sections are also consolidated to reduce file count |
| 67 | +and IO amplification. Non-overlapping large files are left untouched. |
| 68 | + |
| 69 | +During both phases, deduplication is handled via deletion vectors: |
| 70 | + |
| 71 | +- **Deduplicate mode**: When a key already exists in an older file, the old row is marked as deleted. |
| 72 | +- **First-row mode**: When a key already exists, the new row is marked as deleted, keeping the first-seen value. |
| 73 | + |
| 74 | +When the number of files to merge exceeds `sort-spill-threshold`, smaller files are first spilled to row-based |
| 75 | +temporary files to reduce memory consumption, preventing OOM during multi-way merge. |
| 76 | + |
| 77 | +## Requirements |
| 78 | + |
| 79 | +| Option | Requirement | |
| 80 | +|--------|-------------| |
| 81 | +| `pk-clustering-override` | `true` | |
| 82 | +| `clustering.columns` | Must be set (one or more non-primary-key columns) | |
| 83 | +| `deletion-vectors.enabled` | Must be `true` | |
| 84 | +| `merge-engine` | `deduplicate` (default) or `first-row` only | |
| 85 | +| `sequence.fields` | Must **not** be set | |
| 86 | +| `record-level.expire-time` | Must **not** be set | |
| 87 | + |
| 88 | +## Related Options |
| 89 | + |
| 90 | +| Option | Default | Description | |
| 91 | +|--------|---------|-------------| |
| 92 | +| `clustering.columns` | (none) | Comma-separated column names used as the physical sort order for data files. | |
| 93 | +| `sort-spill-threshold` | (auto) | When the number of merge readers exceeds this value, smaller files are spilled to row-based temp files to reduce memory usage. | |
| 94 | +| `sort-spill-buffer-size` | `64 mb` | Buffer size used for external sort during Phase 1 rewrite. | |
| 95 | + |
| 96 | +## When to Use |
| 97 | + |
| 98 | +PK Clustering Override is beneficial when: |
| 99 | + |
| 100 | +- Analytical queries frequently filter or aggregate on non-primary-key columns (e.g., `WHERE city = 'Beijing'`). |
| 101 | +- The table uses `deduplicate` or `first-row` merge engine. |
| 102 | +- You want data files physically co-located by a business dimension rather than the primary key. |
| 103 | + |
| 104 | +It is **not** suitable when: |
| 105 | + |
| 106 | +- Point lookups by primary key are the dominant access pattern (default LSM sort is already optimal). |
| 107 | +- You need `partial-update` or `aggregation` merge engine. |
| 108 | +- `sequence.fields` or `record-level.expire-time` is required. |
0 commit comments