|
| 1 | +--- |
| 2 | +title: "PARTITION BY" |
| 3 | +description: "Specifying the sort order of the data." |
| 4 | +aliases: |
| 5 | + - /guides/partition-by/ |
| 6 | + - /sql/patterns/partition-by/ |
| 7 | +menu: |
| 8 | + main: |
| 9 | + parent: 'sql-patterns' |
| 10 | +--- |
| 11 | + |
| 12 | +A few types of Materialize collections are durably written to storage: [materialized views](/sql/create-materialized-view/), [tables](/sql/create-table), and [sources](/sql/create-source). |
| 13 | + |
| 14 | +Internally, Materialize stores these durable collections in an [LSM-tree](https://en.wikipedia.org/wiki/Log-structured_merge-tree)-like structure. Each collection is made up of a set of |
| 15 | +**runs** of data. Each run is sorted and then split up into individual **parts**; those parts are written to object storage and retrieved only when necessary to satisfy a query. Materialize also periodically **compacts** the data it stores to consolidate small parts into larger ones or discard deleted rows. |
| 16 | + |
| 17 | +Materialize lets you specify the ordering it will use to sort these runs of data internally. A well-chosen sort order can unlock optimizations like [filter pushdown](#filter-pushdown), which in turn can make queries and other operations more efficient. |
| 18 | + |
| 19 | +## Syntax |
| 20 | + |
| 21 | +The option `PARTITION BY <column list>` declares that a materialized view, table, or source table should be ordered by the listed columns. For example, a table that stores an append-only collection of events may look like: |
| 22 | + |
| 23 | +```mzsql |
| 24 | +CREATE TABLE events (created_at timestamp, body jsonb) |
| 25 | +WITH ( |
| 26 | + PARTITION BY (created_at) |
| 27 | +); |
| 28 | +``` |
| 29 | + |
| 30 | +When multiple columns are specified, rows are sorted lexicographically: |
| 31 | +`PARTITION BY (created_date, created_time)` would sort first by the created date, then order rows with equal created date by time. Durable collections without a `PARTITION BY` option might be stored in any order. |
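A minimal sketch of the multi-column case described above. The table name `events_by_day` and its columns are hypothetical, chosen only for illustration; the partition columns are declared first, in the same order as in the clause.

```mzsql
-- Hypothetical table: rows are sorted by created_date first,
-- then by created_time among rows that share the same date.
CREATE TABLE events_by_day (
    created_date date,
    created_time time,
    body jsonb
) WITH (
    PARTITION BY (created_date, created_time)
);
```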
| 32 | + |
| 33 | +Note that this declaration has no impact on the order in which records are returned by queries. If you want to return results in a specific order, use an |
| 34 | +`ORDER BY` clause on your select statement. |
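For instance, assuming the `events` table from the example above, a query that needs its results newest-first still has to say so explicitly:

```mzsql
-- PARTITION BY only affects how data is stored;
-- ORDER BY controls the order of the returned rows.
SELECT * FROM events ORDER BY created_at DESC;
```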
| 35 | + |
| 36 | +## Filter pushdown |
| 37 | + |
| 38 | +As mentioned above, Materialize stores durable collections as a set of parts. If it can prove that a particular part in a collection is not needed to answer a particular query, it can skip fetching it, saving a substantial amount of time and computation. For example: |
| 39 | + |
| 40 | +- If a materialized view has a temporal filter that preserves only the last two days of data, we'd like to filter out parts that contain data from a month ago. |
| 41 | +- If a select statement filters for only data from a particular country, we'd like to avoid fetching parts that only contain data from other countries. |
| 42 | + |
| 43 | +This optimization tends to be important for append-only timeseries datasets, but it can be useful for any dataset where most queries look at only a particular "range" of the data. If the query is a select statement, it can make that select statement much quicker to return; if the query is an index or a materialized view, it can make it much faster to bootstrap. |
| 44 | + |
| 45 | +Materialize will always try to filter out parts in this way, but that filtering is usually only effective when similar rows are stored together. If you want to make sure that filter pushdown reliably kicks in for your query, you should: |
| 46 | + |
| 47 | +- Use a |
| 48 | + `PARTITION BY` clause on the relevant column to ensure that data with similar values for that column are stored together. |
| 49 | +- Add a filter to your query that only returns true for a particular range of values in that column. |
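The two steps above can be sketched together as follows. The `page_views` table and `recent_page_views` view are hypothetical names used only for illustration; the filter keeps roughly the last two days of data, matching the temporal-filter scenario described earlier.

```mzsql
-- Step 1 (assumed schema): store rows with similar timestamps together.
CREATE TABLE page_views (
    viewed_at timestamp,
    url text
) WITH (
    PARTITION BY (viewed_at)
);

-- Step 2: filter on a range of the partition column. Parts that only
-- contain older data are candidates for being skipped.
CREATE MATERIALIZED VIEW recent_page_views AS
SELECT * FROM page_views
WHERE mz_now() <= viewed_at + INTERVAL '2 days';
```

Whether a given part is actually skipped depends on how the data has been written and compacted, so treat this as a pattern rather than a guarantee.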
| 50 | + |
| 51 | +Filters that consist of arithmetic, date math, and comparisons are generally eligible for pushdown, including all the examples in this page. However, more complex filters might not be. You can check whether the filters in your query can be pushed down by using [the |
| 52 | +`filter_pushdown` option](/sql/explain-plan/#output-modifiers) in an `EXPLAIN` statement. |
| 53 | + |
| 54 | +Some common functions, such as casting from a string to a timestamp, can prevent filter pushdown for a query. For similar functions that _do_ allow pushdown, see [the pushdown functions documentation](/sql/functions/pushdown/). |
| 55 | + |
| 56 | +## Requirements |
| 57 | + |
| 58 | +Materialize imposes some restrictions on the list of columns in the partition-by clause. |
| 59 | + |
| 60 | +- This clause must list a prefix of the columns in the collection. For example, if you're creating a table that partitions by a single column, that column must be the first column in the table's schema definition. |
| 61 | +- Only certain types of columns are supported. This includes: |
| 62 | + - all fixed-width integer types, including `smallint`, `integer`, and `bigint`; |
| 63 | + - date and time types, including `date`, `time`, `timestamp`, `timestamptz`, and `mz_timestamp`; |
| 64 | + - string types like `text` and `bytea`; |
| 65 | + - `boolean` and `uuid`; |
| 66 | + - `record` types where all field types are supported. |
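As a rough illustration of the prefix rule, consider a hypothetical `orders` table (the names are assumptions, not part of the original examples). The first statement lists a prefix of the schema; the commented-out one does not, so per the rule above it should be rejected.

```mzsql
-- (region, id) is a prefix of the column list, and both types are supported.
CREATE TABLE orders (
    region text,
    id bigint,
    note text
) WITH (
    PARTITION BY (region, id)
);

-- Not a prefix: `id` is the second column, so this is expected to fail.
-- CREATE TABLE orders_bad (region text, id bigint)
-- WITH (PARTITION BY (id));
```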
| 67 | + |
| 68 | +We intend to relax some of these restrictions in the future. |
| 69 | + |
| 70 | +## Examples |
| 71 | + |
| 72 | +These examples create real objects. After you have tried the examples, make sure to drop these objects and spin down any resources you may have created. |
| 73 | + |
| 74 | +### Partitioning by timestamp |
| 75 | + |
| 76 | +For timeseries or "event"-type collections, it's often useful to partition the data by timestamp. |
| 77 | + |
| 78 | +1. First, create a table called `events`. |
| 79 | + ```mzsql |
| 80 | + -- Create a table of timestamped events. Note that the `event_ts` column is |
| 81 | + -- first in the column list and in the partition-by clause. |
| 82 | + CREATE TABLE events ( |
| 83 | + event_ts timestamp, |
| 84 | + content text |
| 85 | + ) WITH ( |
| 86 | + PARTITION BY (event_ts) |
| 87 | + ); |
| 88 | + ``` |
| 89 | +
|
| 90 | +1. Insert a few records, one "older" record and one more recent. |
| 91 | + ```mzsql |
| 92 | + INSERT INTO events VALUES (now()::timestamp - '1 minute', 'hello'); |
| 93 | + INSERT INTO events VALUES (now(), 'world'); |
| 94 | + ``` |
| 95 | +
|
| 96 | +1. If we run a select statement against the data between one and two minutes after the inserts, the older row has aged out of the two-minute window, so the query returns one row but not the other. |
| 97 | + ```mzsql |
| 98 | + SELECT * FROM events WHERE event_ts + '2 minutes' > mz_now(); |
| 99 | + ``` |
| 100 | +
|
| 101 | +1. An `EXPLAIN FILTER PUSHDOWN` statement shows that we're able to fetch a subset of the parts in our collection: only those parts that contain data with recent timestamps. |
| 102 | + ```mzsql |
| 103 | + EXPLAIN FILTER PUSHDOWN FOR |
| 104 | + SELECT * FROM events WHERE event_ts + '2 minutes' > mz_now(); |
| 105 | + ``` |
| 106 | +
|
| 107 | +If you wait a few minutes longer until there are no events that match the temporal filter, you'll notice that not only does the query return zero rows, but the explain shows that we fetched zero parts. |
| 108 | +
|
| 109 | +Note that the exact numbers you see here may vary: parts can be much larger than a single row, and the actual level of filtering may vary for small datasets as data is compacted together internally. However, datasets of a few gigabytes or larger should reliably see benefits from this optimization. |
| 110 | +
|
| 111 | +### Partitioning by category |
| 112 | +
|
| 113 | +Other datasets don't have a strong timeseries component, but they do have a clear notion of type or category. For example, let's suppose we manage a collection of music venues spread across the world, but regularly want to target queries to just those that exist in a single country. |
| 114 | +
|
| 115 | +1. First, create a table called `venues`, partitioned by country. |
| 116 | + ```mzsql |
| 117 | + -- Create a table for our venue data. |
| 118 | + -- Once again, the partition column is listed first. |
| 119 | + CREATE TABLE venues ( |
| 120 | + country_code text, |
| 121 | + id uuid |
| 122 | + ) WITH ( |
| 123 | + PARTITION BY (country_code) |
| 124 | + ); |
| 125 | + ``` |
| 126 | +
|
| 127 | +1. Insert a few records, each for a venue in a different country. |
| 128 | + ```mzsql |
| 129 | + INSERT INTO venues VALUES ('US', uuid_generate_v5('venue', 'Rock World')); |
| 130 | + INSERT INTO venues VALUES ('CA', uuid_generate_v5('venue', 'Friendship Cove')); |
| 131 | + ``` |
| 132 | +
|
| 133 | +1. We can filter down our list of venues to just those in a specific country. |
| 134 | + ```mzsql |
| 135 | + SELECT * FROM venues WHERE country_code = 'US'; |
| 136 | + ``` |
| 137 | +
|
| 138 | +1. An `EXPLAIN FILTER PUSHDOWN` statement shows that we're only fetching a subset of parts we've stored. |
| 139 | + ```mzsql |
| 140 | + EXPLAIN FILTER PUSHDOWN FOR |
| 141 | + SELECT * FROM venues WHERE country_code = 'US'; |
| 142 | + ``` |
| 143 | +
|
| 144 | +As before, we're not guaranteed to see much or any benefit from filter pushdown on small collections, but for datasets larger than a few gigabytes, we should reliably be able to filter down to a subset of the parts we'd otherwise need to fetch. |
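Once you have finished with the examples, a cleanup along these lines should remove the objects they created. It assumes the tables are still named `events` and `venues` and that nothing else depends on them.

```mzsql
-- Drop the example tables; IF EXISTS avoids an error if one was already removed.
DROP TABLE IF EXISTS events;
DROP TABLE IF EXISTS venues;
```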