Commit 1d643a2

Implement encode/decode for ID columns + tests, docs
This PR lets users set `p_epoch` to the special value 'func' so that custom functions encode and decode time-ordered integers that are not the classic seconds, ms, us or ns since the UNIX epoch. Resolves #729.
1 parent 949bf56 commit 1d643a2

15 files changed

+2819
-9
lines changed

doc/pg_partman.md

Lines changed: 4 additions & 4 deletions
@@ -207,12 +207,12 @@ RETURNS boolean
207207
* An ACCESS EXCLUSIVE lock is taken on the parent table during the running of this function. No data is moved when running this function, so lock should be brief
208208
* A default partition and template table are created by default unless otherwise configured
209209
* `p_parent_table` - the existing parent table. MUST be schema qualified, even if in public schema
210-
* `p_control` - the column that the partitioning will be based on. Must be a time, integer, text or uuid based column. When control is of type text/uuid, p_time_encoder and p_time_decoder must be set.
210+
* `p_control` - the column that the partitioning will be based on. Must be a time, integer, text or UUID based column. When control is of type text/UUID, p_time_encoder and p_time_decoder must be set.
211211
* `p_interval` - the time or integer range interval for each partition. No matter the partitioning type, value must be given as text.
212212
+ *\<interval\>* - Any valid value for the interval data type. Do not type cast the parameter value, just leave as text.
213213
+ *\<integer\>* - For ID based partitions, the integer value range of the ID that should be set per partition. Enter this as an integer in text format ('100' not 100). If the interval is >=2, then the `p_type` must be `range`. If the interval equals 1, then the `p_type` must be `list`. Also note that while numeric values are supported for id-based partitioning, the interval must still be a whole number integer.
214214
* `p_type` - the type of partitioning to be done. Currently only **range** and **list** are supported. See `p_interval` parameter for special conditions concerning type.
215-
* `p_epoch` - tells `pg_partman` that the control column is an integer type, but actually represents and epoch time value. Valid values for this option are: 'seconds', 'milliseconds', 'microseconds', 'nanoseconds', and 'none'. The default is 'none'. All table names will be time-based. In addition to a normal index on the control column, be sure you create a functional, time-based index on the control column (to_timestamp(controlcolumn)) as well so this works efficiently.
215+
* `p_epoch` - tells `pg_partman` that the control column is an integer type, but actually represents an epoch time value or integer containing an encoded timestamp. Valid values for this option are: 'seconds', 'milliseconds', 'microseconds', 'nanoseconds', 'func', and 'none'. The default is 'none'. All table names will be time-based. For 'func', encode/decode functions between the integer type used and `timestamptz` are required. In addition to a normal index on the control column, be sure you create a functional, time-based index on the control column (to_timestamp(controlcolumn)) as well so this works efficiently.
216216
* `p_premake` - is how many additional partitions to always stay ahead of the current partition. Default value is 4. This will keep at minimum 5 partitions made, including the current one. For example, if today was Sept 6th, and `premake` was set to 4 for a daily partition, then partitions would be made for the 6th as well as the 7th, 8th, 9th and 10th. Note some intervals may occasionally cause an extra partition to be premade or one to be missed due to leap years, differing month lengths, etc. This usually won't hurt anything and should self-correct (see **About** section concerning timezones and non-UTC). If partitioning ever falls behind the `premake` value, normal running of `run_maintenance()` and data insertion should automatically catch things up.
217217
* `p_start_partition` - allows the first partition of a set to be specified instead of it being automatically determined. Must be a valid timestamp (for time-based) or positive integer (for id-based) value. Be aware, though, the actual parameter data type is text. For time-based partitioning, all partitions starting with the given timestamp up to CURRENT_TIMESTAMP (plus `premake`) will be created. For id-based partitioning, only the partition starting at the given value (plus `premake`) will be made. Note that for subpartitioning, this only applies during initial setup and not during ongoing maintenance.
218218
* `p_default_table` - boolean flag to determine whether a default table is created. Defaults to true.
@@ -222,8 +222,8 @@ RETURNS boolean
222222
* `p_jobmon` - allow `pg_partman` to use the `pg_jobmon` extension to monitor that partitioning is working correctly. Defaults to TRUE.
223223
* `p_date_trunc_interval` - By default, pg_partman's time-based partitioning will truncate the child table starting values to line up at the beginning of typical boundaries (midnight for daily, day 1 for monthly, Jan 1 for yearly, etc). If a partitioning interval that does not fall on those boundaries is desired, this option may be required to ensure the child table has the expected boundaries (especially if you also set `p_start_partition`). The valid values allowed for this parameter are the interval values accepted by PostgreSQL's built-in `date_trunc()` function (day, week, month, etc). For example, if you set a 9-week interval, by default pg_partman would truncate the tables by month (since the interval is greater than one month but less than 1 year) and unexpectedly start on the first of the month in some cases. Set this parameter value to `week`, so that the child table start values are properly truncated on a weekly basis to line up with the 9-week interval. If you are using a custom time interval, please experiment with this option to get the expected set of child tables you desire or use a more typical partitioning interval to simplify partition management.
224224
* `p_control_not_null` - By default, this value is true and the control column must be set to NOT NULL. Setting this to false allows the control column to be NULL. Allowing this is not advised without very careful review and an explicit use-case defined as it can cause excessive data in the DEFAULT child partition.
225-
* `p_time_encoder` - name of function that encodes a timestamp into a string representing your partition bounds. Setting this implicitly enables time based partitioning and is mandatory for text/uuid control column types. This enables partitioning tables using time based identifiers like uuidv7, ulid, snowflake ids and others. The function must handle NULL input safely. See test-time-daily.sql and test-uuid-daily for usage examples.
226-
* `p_time_decoder` - name of function that decodes a text/uuid control value into a timestamp. Setting this implicitly enables time based partitioning and is mandatory for text/uuid control column types. This enables partitioning tables using time based identifiers like uuidv7, ulid, snowflake ids and others. The function must handle NULL input safely. See test-time-daily.sql and test-uuid-daily for usage examples.
225+
* `p_time_encoder` - name of function that encodes a `timestamp` into a string or integer representing your partition bounds. Setting this implicitly enables time based partitioning and is mandatory for text/UUID control column types, or for an integer control column with `p_epoch` = 'func'. This enables partitioning tables using time based identifiers like UUIDv7, ULID, snowflake ids and others. The function must handle NULL input safely. See test-time-daily.sql and test-uuid-daily for usage examples.
226+
* `p_time_decoder` - name of function that decodes a text/UUID or integer control value into a `timestamptz`. Setting this implicitly enables time based partitioning and is mandatory for text/UUID control column types, or for an integer control column with `p_epoch` = 'func'. This enables partitioning tables using time based identifiers like UUIDv7, ULID, snowflake ids and others. The function must handle NULL input safely. See test-time-daily.sql and test-uuid-daily for usage examples.
227227

228228

229229
<a id="create_sub_parent"></a>

doc/pg_partman_howto.md

Lines changed: 102 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,7 @@ Example Guide On Setting Up Native Partitioning
44
- [Simple Time Based: 1 Partition Per Day](#simple-time-based-1-partition-per-day)
55
- [Simple Time Based with UUIDv7 type: 1 Partition Per Day](#simple-time-based-with-uuidv7-type-1-partition-per-day)
66
- [Simple Time Based with Text Type: 1 Partition Per Day](#simple-time-based-with-text-type-1-partition-per-day)
7+
- [Simple Time Based with Snowflake IDs: 1 Partition Per Hour](#simple-time-based-with-snowflake-ids-1-partition-per-hour)
78
- [Simple Serial ID: 1 Partition Per 10 ID Values](#simple-serial-id-1-partition-Per-10-id-values)
89
- [Partitioning an Existing Table](#partitioning-an-existing-table)
910
* [Offline Partitioning](#offline-partitioning)
@@ -285,6 +286,107 @@ Indexes:
285286
"time_taptest_table_p20240815_pkey" PRIMARY KEY, btree (col3)
286287
Access method: heap
287288
```
289+
290+
### Simple Time Based with Snowflake IDs: 1 Partition Per Hour
291+
This example demonstrates how to partition on an integer control column whose values encode a timestamp together with other data.
292+
293+
```sql
294+
CREATE SCHEMA IF NOT EXISTS partman_test;
295+
296+
CREATE TABLE partman_test.time_taptest_table(
297+
col1 BIGINT NOT NULL PRIMARY KEY,
298+
col2 text default 'stuff')
299+
PARTITION BY RANGE (col1);
300+
```
301+
302+
```sql
303+
\d+ partman_test.time_taptest_table
304+
Partitioned table "partman_test.time_taptest_table"
305+
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
306+
--------+--------+-----------+----------+---------------+----------+-------------+--------------+-------------
307+
col1 | bigint | | not null | | plain | | |
308+
col2 | text | | | 'stuff'::text | extended | | |
309+
Partition key: RANGE (col1)
310+
Indexes:
311+
"time_taptest_table_pkey" PRIMARY KEY, btree (col1)
312+
Number of partitions: 0
313+
```
314+
315+
Snowflake IDs are used in some distributed systems to generate unique, time-ordered IDs without centralization or coordination between nodes. X, Discord, Mastodon and Instagram are known to use these identifiers, and this example will use [Discord's scheme](https://discord.com/developers/docs/reference#snowflakes). The timestamp, in milliseconds, is encoded in the top 42 bits of a 64-bit integer, and the remaining 22 bits hold worker data and a counter. Discord also measures time from the start of 2015 UTC instead of the UNIX epoch of 1970 UTC, a gap of 1420070400 seconds. The BIGINT type is a signed integer, so only 63 bits are available for positive values, but that is sufficient to hold Discord IDs until September 2084.
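
The September 2084 limit can be checked with a little arithmetic. The sketch below uses only built-in PostgreSQL operators and assumes that a positive BIGINT leaves 41 bits for the millisecond timestamp (63 usable bits minus the 22 low bits), so the largest offset from the Discord epoch is 2^41 - 1 milliseconds. The column alias is illustrative only.

```sql
-- Largest millisecond offset that fits in 41 bits, converted to seconds and
-- shifted from the Discord epoch (1420070400) back to the UNIX epoch.
SELECT to_timestamp(
         (((1::bigint << 41) - 1) / 1000.0 + 1420070400)::double precision
       ) AT TIME ZONE 'UTC' AS last_representable_utc;
```

The result falls on 6 September 2084 (UTC), which agrees with the limit stated above.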
316+
317+
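The following functions encode a timestamp as a snowflake ID and decode a snowflake ID back to a timestamp. Note that when encoding the timestamp, the worker/counter bits are zero, so the returned value is useful as a partition boundary, not as a real ID. Both functions work at whole-second precision: the encoder casts the seconds to BIGINT before multiplying by 1000, and the decoder uses integer division. That is sufficient for partition boundaries.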
318+
319+
```sql
320+
CREATE FUNCTION public.timestamp_to_snowflake(p_timestamp timestamptz, OUT encoded bigint)
321+
RETURNS bigint
322+
LANGUAGE plpgsql IMMUTABLE STRICT PARALLEL SAFE
323+
AS $$
324+
BEGIN
325+
SELECT 1000*(EXTRACT(epoch FROM p_timestamp) - 1420070400)::BIGINT << 22 INTO encoded;
326+
END
327+
$$;
328+
329+
CREATE FUNCTION public.snowflake_to_timestamp(p_snowflake bigint, OUT ts timestamptz)
330+
RETURNS TIMESTAMPTZ
331+
LANGUAGE plpgsql IMMUTABLE STRICT PARALLEL SAFE
332+
AS $$
333+
BEGIN
334+
SELECT TO_TIMESTAMP((p_snowflake >> 22)/1000 + 1420070400) INTO ts;
335+
END
336+
$$;
337+
```
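
As a sanity check on the arithmetic, the sketch below repeats the same expressions inline with built-in operators only, so it runs without the two functions being installed. The timestamp literal and the column aliases are chosen for illustration. The encoded value is the same as the lower bound of the `p20250107_030000` partition in the listing further down.

```sql
-- Encode 2025-01-07 03:00 UTC the same way timestamp_to_snowflake() does:
-- whole seconds since the Discord epoch, times 1000, shifted left by 22 bits.
SELECT (1000 * (EXTRACT(epoch FROM timestamptz '2025-01-07 03:00:00+00') - 1420070400)::bigint) << 22
       AS boundary;
-- boundary = 1326022498713600000

-- Decode it again the same way snowflake_to_timestamp() does.
SELECT to_timestamp((1326022498713600000 >> 22) / 1000 + 1420070400) AT TIME ZONE 'UTC'
       AS decoded_utc;
-- decoded_utc = 2025-01-07 03:00:00

-- The low 22 bits carry worker data and the counter; they are zero for a boundary value.
SELECT 1326022498713600000 & ((1::bigint << 22) - 1) AS low_bits;
-- low_bits = 0
```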
338+
339+
Now we will instruct partman to use the snowflake encoder and decoder functions with the special value 'func' for `p_epoch`.
340+
341+
```sql
342+
SELECT partman.create_parent('partman_test.time_taptest_table'
343+
, p_control := 'col1'
344+
, p_interval := '1 hour'
345+
, p_epoch := 'func'
346+
, p_time_encoder := 'public.timestamp_to_snowflake'
347+
, p_time_decoder := 'public.snowflake_to_timestamp'
348+
);
349+
create_parent
350+
---------------
351+
t
352+
(1 row)
353+
```
354+
355+
```sql
356+
\d+ partman_test.time_taptest_table
357+
Partitioned table "partman_test.time_taptest_table"
358+
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
359+
--------+--------+-----------+----------+---------------+----------+-------------+--------------+-------------
360+
col1 | bigint | | not null | | plain | | |
361+
col2 | text | | | 'stuff'::text | extended | | |
362+
Partition key: RANGE (col1)
363+
Indexes:
364+
"time_taptest_table_pkey" PRIMARY KEY, btree (col1)
365+
Partitions: partman_test.time_taptest_table_p20250107_030000 FOR VALUES FROM ('1326022498713600000') TO ('1326037598208000000'),
366+
partman_test.time_taptest_table_p20250107_040000 FOR VALUES FROM ('1326037598208000000') TO ('1326052697702400000'),
367+
partman_test.time_taptest_table_p20250107_050000 FOR VALUES FROM ('1326052697702400000') TO ('1326067797196800000'),
368+
partman_test.time_taptest_table_p20250107_060000 FOR VALUES FROM ('1326067797196800000') TO ('1326082896691200000'),
369+
partman_test.time_taptest_table_p20250107_070000 FOR VALUES FROM ('1326082896691200000') TO ('1326097996185600000'),
370+
partman_test.time_taptest_table_p20250107_080000 FOR VALUES FROM ('1326097996185600000') TO ('1326113095680000000'),
371+
partman_test.time_taptest_table_p20250107_090000 FOR VALUES FROM ('1326113095680000000') TO ('1326128195174400000'),
372+
partman_test.time_taptest_table_p20250107_100000 FOR VALUES FROM ('1326128195174400000') TO ('1326143294668800000'),
373+
partman_test.time_taptest_table_p20250107_110000 FOR VALUES FROM ('1326143294668800000') TO ('1326158394163200000'),
374+
partman_test.time_taptest_table_default DEFAULT
375+
```
376+
```sql
377+
\d+ partman_test.time_taptest_table_p20250107_030000
378+
Table "partman_test.time_taptest_table_p20250107_030000"
379+
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
380+
--------+--------+-----------+----------+---------------+----------+-------------+--------------+-------------
381+
col1 | bigint | | not null | | plain | | |
382+
col2 | text | | | 'stuff'::text | extended | | |
383+
Partition of: partman_test.time_taptest_table FOR VALUES FROM ('1326022498713600000') TO ('1326037598208000000')
384+
Partition constraint: ((col1 IS NOT NULL) AND (col1 >= '1326022498713600000'::bigint) AND (col1 < '1326037598208000000'::bigint))
385+
Indexes:
386+
"time_taptest_table_p20250107_030000_pkey" PRIMARY KEY, btree (col1)
387+
Access method: heap
388+
```
389+
288390
### Simple Serial ID: 1 Partition Per 10 ID Values
289391
For this use-case, the template table is not created manually before calling `create_parent()`. So it shows that if a primary/unique key is added later, it does not apply to the currently existing child tables. That will have to be done manually.
290392

sql/functions/create_parent.sql

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -182,7 +182,9 @@ IF v_control_type NOT IN ('time', 'id', 'text', 'uuid') THEN
182182
RAISE EXCEPTION 'Only date/time, text/uuid or integer types are allowed for the control column.';
183183
ELSIF v_control_type IN ('text', 'uuid') AND (p_time_encoder IS NULL OR p_time_decoder IS NULL) THEN
184184
RAISE EXCEPTION 'p_time_encoder and p_time_decoder needs to be set for text/uuid type control column.';
185-
ELSIF v_control_type NOT IN ('text', 'uuid') AND (p_time_encoder IS NOT NULL OR p_time_decoder IS NOT NULL) THEN
185+
ELSIF v_control_type = 'id' AND p_epoch = 'func' AND (p_time_encoder IS NULL OR p_time_decoder IS NULL) THEN
186+
RAISE EXCEPTION 'p_time_encoder and p_time_decoder functions need to be set for p_epoch=func to work.';
187+
ELSIF v_control_type NOT IN ('text', 'uuid', 'id') AND (p_time_encoder IS NOT NULL OR p_time_decoder IS NOT NULL) THEN
186188
RAISE EXCEPTION 'p_time_encoder and p_time_decoder can only be used with text/uuid type control column.';
187189
END IF;
188190

sql/functions/create_partition_time.sql

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,7 @@ ex_hint text;
1414
ex_message text;
1515
v_control text;
1616
v_control_type text;
17+
v_time_decoder text;
1718
v_time_encoder text;
1819
v_datetime_string text;
1920
v_epoch text;
@@ -45,6 +46,8 @@ v_sub_timestamp_max timestamptz;
4546
v_sub_timestamp_min timestamptz;
4647
v_template_table text;
4748
v_time timestamptz;
49+
v_partition_id_start bigint;
50+
v_partition_id_end bigint;
4851
v_partition_text_start text;
4952
v_partition_text_end text;
5053

@@ -54,6 +57,7 @@ BEGIN
5457
*/
5558

5659
SELECT control
60+
, time_decoder
5761
, time_encoder
5862
, partition_interval::interval -- this shared field also used in partition_id as bigint
5963
, epoch
@@ -62,6 +66,7 @@ SELECT control
6266
, template_table
6367
, inherit_privileges
6468
INTO v_control
69+
, v_time_decoder
6570
, v_time_encoder
6671
, v_partition_interval
6772
, v_epoch
@@ -123,6 +128,7 @@ v_partition_expression := CASE
123128
WHEN v_epoch = 'milliseconds' THEN format('to_timestamp((%I/1000)::float)', v_control)
124129
WHEN v_epoch = 'microseconds' THEN format('to_timestamp((%I/1000000)::float)', v_control)
125130
WHEN v_epoch = 'nanoseconds' THEN format('to_timestamp((%I/1000000000)::float)', v_control)
131+
WHEN v_epoch = 'func' THEN format('%s(%I)', v_time_decoder, v_control)
126132
ELSE format('%I', v_control)
127133
END;
128134
RAISE DEBUG 'create_partition_time: v_partition_expression: %', v_partition_expression;
@@ -237,7 +243,17 @@ FOREACH v_time IN ARRAY p_partition_times LOOP
237243
, v_partition_text_start
238244
, v_partition_text_end);
239245
END IF;
246+
ELSIF v_epoch = 'func' THEN
247+
EXECUTE format('SELECT %s(%L)', v_time_encoder, v_partition_timestamp_start) INTO v_partition_id_start;
248+
EXECUTE format('SELECT %s(%L)', v_time_encoder, v_partition_timestamp_end) INTO v_partition_id_end;
240249

250+
EXECUTE format('ALTER TABLE %I.%I ATTACH PARTITION %I.%I FOR VALUES FROM (%L) TO (%L)'
251+
, v_parent_schema
252+
, v_parent_tablename
253+
, v_parent_schema
254+
, v_partition_name
255+
, v_partition_id_start
256+
, v_partition_id_end);
241257
ELSE
242258
-- Must attach with integer based values for built-in constraint and epoch
243259
IF v_epoch = 'seconds' THEN
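
For readers following the `format('%s(%I)', v_time_decoder, v_control)` line added above, the sketch below shows what that call produces. The function and column names are taken from the how-to example and are used here only for illustration. The contrast case is not something this commit does.

```sql
-- %s inserts the function name as-is, so a schema-qualified name stays usable;
-- %I quotes the column name as an identifier only when quoting is needed.
SELECT format('%s(%I)', 'public.snowflake_to_timestamp', 'col1') AS partition_expression;
-- partition_expression = public.snowflake_to_timestamp(col1)

-- For contrast, %I would treat the whole qualified name as one identifier.
SELECT format('%I(%I)', 'public.snowflake_to_timestamp', 'col1') AS quoted_as_one_identifier;
-- quoted_as_one_identifier = "public.snowflake_to_timestamp"(col1)
```

Because `%s` does no quoting, the stored encoder and decoder names are interpolated into the generated SQL as written.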

sql/functions/partition_data_time.sql

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -38,6 +38,7 @@ v_source_schemaname text;
3838
v_source_tablename text;
3939
v_rowcount bigint;
4040
v_start_control timestamptz;
41+
v_time_decoder text;
4142
v_total_rows bigint := 0;
4243

4344
BEGIN
@@ -49,10 +50,12 @@ SELECT partition_interval::interval
4950
, control
5051
, datetime_string
5152
, epoch
53+
, time_decoder
5254
INTO v_partition_interval
5355
, v_control
5456
, v_datetime_string
5557
, v_epoch
58+
, v_time_decoder
5659
FROM @extschema@.part_config
5760
WHERE parent_table = p_parent_table;
5861
IF NOT FOUND THEN
@@ -133,6 +136,7 @@ v_partition_expression := CASE
133136
WHEN v_epoch = 'milliseconds' THEN format('to_timestamp((%I/1000)::float)', v_control)
134137
WHEN v_epoch = 'microseconds' THEN format('to_timestamp((%I/1000000)::float)', v_control)
135138
WHEN v_epoch = 'nanoseconds' THEN format('to_timestamp((%I/1000000000)::float)', v_control)
139+
WHEN v_epoch = 'func' THEN format('%s(%I)', v_time_decoder, v_control)
136140
ELSE format('%I', v_control)
137141
END;
138142
