Skip to content

Commit fb753f7

Browse files
committed
First draft of PARTITION BY documentation
1 parent 37dd04e commit fb753f7

File tree

1 file changed

+144
-0
lines changed

1 file changed

+144
-0
lines changed
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,144 @@
1+
---
2+
title: "PARTITION BY"
3+
description: "Specifying the sort order of the data."
4+
aliases:
5+
- /guides/partition-by/
6+
- /sql/patterns/partition-by/
7+
menu:
8+
main:
9+
parent: 'sql-patterns'
10+
---
11+
12+
A few types of Materialize collections are durably written to storage: [materialized views](/sql/create-materialized-view/), [tables](/sql/create-table), and [sources](/sql/create-source).
13+
14+
Internally, Materialize stores these durable collections in an [LSM-tree](https://en.wikipedia.org/wiki/Log-structured_merge-tree)-like structure. Each collection is made up of a set of
15+
**runs** of data, each run is sorted and then split up into individual **parts**, and those parts are written to object storage and retrieved only when necessary to satisfy a query. Materialize will also periodically **compact** the data it stores to consolidate small parts into larger ones or discard deleted rows.
16+
17+
Materialize lets you specify the ordering it will use to sort these runs of data internally. A well-chosen sort order can unlock optimizations like [filter pushdown](#filter-pushdown), which in turn can make queries and other operations more efficient.
18+
19+
## Syntax
20+
21+
The option `PARTITION BY <column list>` declares that a materialized view, table, or source table should be ordered by the listed columns. For example, a table that stores an append-only collection of events may look like:
22+
23+
```mzsql
24+
CREATE TABLE events (timestamp created_at, jsonb body)
25+
WITH (
26+
PARTITION BY (created_at)
27+
);
28+
```
29+
30+
When multiple columns are specified, rows are sorted lexicographically:
31+
`PARTITION BY (created_date, created_time)` would sort first by the created date, then order rows with equal created date by time. Durable collections without a `PARTITION BY` option might be stored in any order.
32+
33+
Note that this declaration has no impact on the order in which records are returned by queries. If you want to return results in a specific order, use an
34+
`ORDER BY` clause on your select statement.
35+
36+
## Filter pushdown
37+
38+
As mentioned above, Materialize stores durable collections as a set of parts. If it can prove that a particular part in a collection is not needed to answer a particular query, it can skip fetching it, saving a substantial amount of time and computation. For example:
39+
40+
- If a materialized view has a temporal filter that preserves only the last two days of data, we'd like to filter out parts that contain data from a month ago.
41+
- If a select statement filters for only data from a particular country, we'd like to avoid fetching parts that only contain data from other countries.
42+
43+
This optimization tends to be important for append-only timeseries datasets, but it can be useful for any dataset where most queries look at only a particular "range" of the data. If the query is a select statement, it can make that select statement much quicker to return; if the query is an index or a materialized view, it can make it much faster to bootstrap.
44+
45+
Materialize will always try to filter out parts in this way, but that filtering is usually only effective when similar rows are stored together. If you want to make sure that filter pushdown reliably kicks in for your query, you should:
46+
47+
- Use a
48+
`PARTITION BY` clause on the relevant column to ensure that data with similar values for that column are stored together.
49+
- Add a filter to your query that only returns true for a particular range of values in that column.
50+
51+
Filters that consist of arithmetic, date math, and comparisons are generally eligible for pushdown, including all the examples in this page. However, more complex filters might not be. You can check whether the filters in your query can be pushed down by using [the
52+
`filter_pushdown` option](/sql/explain-plan/#output-modifiers) in an `EXPLAIN` statement.
53+
54+
Some common functions, such as casting from a string to a timestamp, can prevent filter pushdown for a query. For similar functions that _do_ allow pushdown, see [the pushdown functions documentation](/sql/functions/pushdown/).
55+
56+
## Requirements
57+
58+
Materialize imposes some restrictions on the list of columns in the partition-by clause.
59+
60+
- This clause must list a prefix of the columns in the collection. For example, if you're creating a table that partitions by a single column, that column must be the first column in the table's schema definition.
61+
- Only certain types of columns are supported. This includes:
62+
- all fixed-width integer types, including `smallint`, `integer`, and `bigint`;
63+
- date and time types, including `date`, `time`, `timestamp`, `timestamptz`, and `mz_timestamp`;
64+
- string types like `text` and `bytea`;
65+
- `boolean` and `uuid`;
66+
- `record` types where all fields types are supported.
67+
68+
We intend to relax some of these restrictions in the future.
69+
70+
## Examples
71+
72+
These examples create real objects. After you have tried the examples, make sure to drop these objects and spin down any resources you may have created.
73+
74+
### Partitioning by timestamp
75+
76+
For timeseries or "event"-type collections, it's often useful to partition the data by timestamp.
77+
78+
1. First, create a table called `events`.
79+
```mzsql
80+
-- Create a table of timestamped events. Note that the `event_ts` column is
81+
-- first in the column list and in the parition-by clause.
82+
CREATE TABLE events (
83+
event_ts timestamp,
84+
content text
85+
) WITH (
86+
PARTITION BY (event_ts)
87+
);
88+
```
89+
90+
1. Insert a few records, one "older" record and one more recent.
91+
```mzsql
92+
INSERT INTO events VALUES (now()::timestamp - '1 minute', 'hello');
93+
INSERT INTO events VALUES (now(), 'world');
94+
```
95+
96+
1. If we run a select statement against the data sometime in the next couple of minutes, we return one row but not the other.
97+
```mzsql
98+
SELECT * FROM events WHERE event_ts > mz_now() - '2 minutes';
99+
```
100+
101+
1. An `EXPLAIN FILTER PUSHDOWN` statement shows that we're able to fetch a subset of the parts in our collection... only those parts that contain data with recent timestamps.
102+
```mzsql
103+
EXPLAIN FILTER PUSHDOWN FOR
104+
SELECT * FROM events WHERE event_ts + '2 minutes' > mz_now();
105+
```
106+
107+
If you wait a few minutes longer until there are no events that match the temporal filter, you'll notice that not only does the query return zero rows, but the explain shows that we fetched zero parts.
108+
109+
Note that the exact numbers you see here may very: parts can be much larger than a single row, and the actual level of filtering may vary for small datasets as data is compacted together internally. However, datasets of a few gigabytes or larger should reliably see benefits from this optimization.
110+
111+
### Partitioning by primary key
112+
113+
Other datasets don't have a strong timeseries component, but they do have a clear notion of type or category. For example, let's suppose we manage a collection of music venues spread across the world, but regularly want to target queries to just those that exist in a single country.
114+
115+
1. First, create a table called `venues`, partitioned by country.
116+
```mzsql
117+
-- Create a table for our venue data.
118+
-- Once again, the partition column is listed first.
119+
CREATE TABLE venues (
120+
country_code text,
121+
id uuid,
122+
) WITH (
123+
PARTITION BY (country_code)
124+
);
125+
```
126+
127+
1. Insert a few records, one "older" record and one more recent.
128+
```mzsql
129+
INSERT INTO venues VALUES ('US', uuid_generate_v5('venue', 'Rock World'));
130+
INSERT INTO venues VALUES ('CA', uuid_generate_v5('venue', 'Friendship Cove'));
131+
```
132+
133+
1. We can filter down our list of venues to just those in a specific country.
134+
```mzsql
135+
SELECT * FROM venues WHERE country_code = 'US';
136+
```
137+
138+
1. An `EXPLAIN FILTER PUSHDOWN` statement shows that we're only fetching a subset of parts we've stored.
139+
```mzsql
140+
EXPLAIN FILTER PUSHDOWN FOR
141+
SELECT * FROM venues WHERE country_code = 'US';
142+
```
143+
144+
As before, we're not guaranteed to see much or any benefit from filter pushdown on small collections... but for datasets of over a few gigabytes, we should reliably be able to filter down to a subset of the parts we'd otherwise need to fetch.

0 commit comments

Comments
 (0)