
Commit b0ec36d

docs: create table update syntax (part 2)
1 parent 7fe4869 commit b0ec36d

4 files changed: +147 −1 lines changed


doc/user/config.toml

+5

@@ -198,6 +198,11 @@ weight = 30

  # allow <a name="link-target">, the old syntax no longer works
  unsafe = true

+ [markup]
+ [markup.highlight]
+ noClasses = false
+ style = "monokai"
+
  [[deployment.targets]]
  name = "production"
  url = "s3://materialize-website?region=us-east-1"

doc/user/content/sql/create-table.md

+47 −1

@@ -1,6 +1,6 @@

  ---
  title: "CREATE TABLE"
- description: "`CREATE TABLE` creates a table that is persisted in durable storage."
+ description: "Reference page for `CREATE TABLE`. `CREATE TABLE` creates a table that is persisted in durable storage."
  pagerank: 40
  menu:
  # This should also have a "non-content entry" under Reference, which is

@@ -186,6 +186,52 @@ See also [Materialize SQL data types](/sql/types/).

  {{</ tab >}}

+ {{< tab "Source-populated tables (Kafka/Redpanda source)" >}}
+
+ To create a table from a source, where the source maps to an external
+ Kafka/Redpanda system:
+
+ {{< note >}}
+
+ Users cannot write to source-populated tables; i.e., users cannot perform
+ [`INSERT`](/sql/insert/)/[`UPDATE`](/sql/update/)/[`DELETE`](/sql/delete/)
+ operations on source-populated tables.
+
+ {{</ note >}}
+
+ ```mzsql
+ CREATE TABLE <table_name> FROM SOURCE <source_name> [(REFERENCE <ref_object>)]
+ [FORMAT <format> | KEY FORMAT <format> VALUE FORMAT <format>
+   -- <format> can be:
+   --   AVRO USING CONFLUENT SCHEMA REGISTRY CONNECTION <conn_name>
+   --     [KEY STRATEGY
+   --       INLINE <schema> | ID <schema_registry_id> | LATEST ]
+   --     [VALUE STRATEGY
+   --       INLINE <schema> | ID <schema_registry_id> | LATEST ]
+   --   | PROTOBUF USING CONFLUENT SCHEMA REGISTRY CONNECTION <conn_name>
+   --   | PROTOBUF MESSAGE <msg_name> USING SCHEMA <encoded_schema>
+   --   | CSV WITH HEADER ( <col_name>[, ...]) [DELIMITED BY <char>]
+   --   | CSV WITH <num> COLUMNS DELIMITED BY <char>
+   --   | JSON | TEXT | BYTES
+ ]
+ [INCLUDE
+   KEY [AS <name>] | PARTITION [AS <name>] | OFFSET [AS <name>]
+   | TIMESTAMP [AS <name>] | HEADERS [AS <name>] | HEADER <key_name> AS <name> [BYTES]
+   [, ...]
+ ]
+ [ENVELOPE
+   NONE -- Default. Uses the append-only envelope.
+   | DEBEZIUM
+   | UPSERT [(VALUE DECODING ERRORS = INLINE [AS name])]
+ ]
+ ;
+ ```
+
+ {{% yaml-table data="syntax_options/create_table_options_source_populated_kafka"
+ %}}
+
+ {{</ tab >}}

  {{</ tabs >}}
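To make the new tab's syntax concrete, here is a minimal sketch of a statement it permits; the source name (`kafka_src`), reference (`purchases`), and table name are hypothetical and not part of this commit:

```mzsql
-- Hypothetical names: kafka_src is an existing Kafka/Redpanda source and
-- purchases is one of the topics (reference objects) it exposes.
CREATE TABLE purchases_tbl
FROM SOURCE kafka_src (REFERENCE purchases)
FORMAT JSON
ENVELOPE UPSERT;
```

Per the note above, such a table is read-only from SQL; its contents change only as the source ingests data.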

syntax_options/create_table_options_source_populated_kafka (new data file, referenced by the yaml-table shortcode above)

@@ -0,0 +1,87 @@
columns:
  - column: "Parameter"
  - column: "Description"
rows:
  - "Parameter": "`<table_name>`"
    "Description": |

      The name of the table to create. Names for tables must follow the [naming
      guidelines](/sql/identifiers/#naming-restrictions).

  - "Parameter": "`<source_name>`"
    "Description": |

      The name of the [source](/sql/create-source/kafka/) created for the Kafka topic.

  - "Parameter": "**(REFERENCE <ref_object>)**"
    "Description": |

      *Optional.* If specified, the topic (which should match the topic
      specified in the source) from which to create the table. You can create
      multiple tables from the same reference object.

      To find the reference objects available in your
      [source](/sql/create-source/), you can use the following query,
      substituting your source name for `<source_name>`:

      <br>

      ```mzsql
      SELECT refs.*
      FROM mz_internal.mz_source_references refs, mz_sources s
      WHERE s.name = '<source_name>' -- substitute with your source name
        AND refs.source_id = s.id;
      ```

  - "Parameter": |
      **FORMAT \<format\> |
      KEY FORMAT \<format\> VALUE FORMAT \<format\>**
    "Description": |

      *Optional.* If specified, use the specified format to decode the data. The following `<format>`s are supported:

      | Format | Description |
      |--------|-------------|
      | `AVRO USING CONFLUENT SCHEMA REGISTRY CONNECTION <csr_connection> [KEY STRATEGY <strategy> VALUE STRATEGY <strategy>]` | Decode the data as Avro, specifying the [Confluent Schema Registry connection](/sql/create-connection/#confluent-schema-registry) to use. You can also specify the `KEY STRATEGY` and `VALUE STRATEGY` to use: <table> <thead> <tr> <th>Strategy</th> <th>Description</th> </tr> </thead> <tbody> <tr> <td><code>LATEST</code></td> <td>(Default) Use the latest writer schema from the schema registry as the reader schema.</td> </tr> <tr> <td><code>ID</code></td> <td>Use a specific schema from the registry.</td> </tr> <tr> <td><code>INLINE</code></td> <td>Use the inline schema.</td> </tr> </tbody> </table>|
      | `PROTOBUF USING CONFLUENT SCHEMA REGISTRY CONNECTION <csr_connection>` | Decode the data as Protocol Buffers, specifying the [Confluent Schema Registry connection](/sql/create-connection/#confluent-schema-registry) to use. |
      | `PROTOBUF MESSAGE <msg_name> USING SCHEMA <encoded_schema>` | Decode the data as Protocol Buffers, specifying the `<msg_name>` and the inline `<encoded_schema>` descriptor to use. |
      | `JSON` | Decode the data as JSON. |
      | `TEXT` | Decode the data as TEXT. |
      | `BYTES` | Decode the data as BYTES. |
      | `CSV WITH HEADER ( <col_name>[, ...]) [DELIMITED BY <char>]` | Parse the data as CSV with a header row. Materialize uses this header to infer both the number of columns and their names. The header is **not** ingested as data. The optional `DELIMITED BY <char>` clause specifies the delimiter character. <br><br>The data is decoded as [`text`](/sql/types/text). You can convert the data to other types using explicit [casts](/sql/functions/cast/) when creating views.|
      | `CSV WITH <num> COLUMNS DELIMITED BY <char>` | Parse the data as CSV with a specified number of columns and a specified delimiter. The columns are named `column1`, `column2`...`columnN`. <br><br>The data is decoded as [`text`](/sql/types/text). You can convert the data to other types using explicit [casts](/sql/functions/cast/) when creating views.|

      {{< include-md file="shared-content/kafka-format-envelope-compat-table.md"
      >}}

      For more information, see [Creating a source](/sql/create-source/kafka/#creating-a-source).

  - "Parameter": |
      **INCLUDE \<include_option\>**
    "Description": |

      *Optional.* If specified, include the additional information as column(s) in the table. The following `<include_option>`s are supported:

      | Option | Description |
      |--------|-------------|
      | **KEY [AS \<name\>]** | Include a column containing the Kafka message key. If the key is encoded using a format that includes schemas, the column will take its name from the schema. For unnamed formats (e.g., `TEXT`), the column will be named `key`. The column can be renamed with the optional **AS** *name* clause.
      | **PARTITION [AS \<name\>]** | Include a `partition` column containing the Kafka message partition. The column can be renamed with the optional **AS** *name* clause.
      | **OFFSET [AS \<name\>]** | Include an `offset` column containing the Kafka message offset. The column can be renamed with the optional **AS** *name* clause.
      | **TIMESTAMP [AS \<name\>]** | Include a `timestamp` column containing the Kafka message timestamp. The column can be renamed with the optional **AS** *name* clause. <br><br>Note that the timestamp of a Kafka message depends on how the topic and its producers are configured. See the [Confluent documentation](https://docs.confluent.io/3.0.0/streams/concepts.html?#time) for details.
      | **HEADERS [AS \<name\>]** | Include a `headers` column containing the Kafka message headers as a list of records of type `(key text, value bytea)`. The column can be renamed with the optional **AS** *name* clause.
      | **HEADER \<key\> AS \<name\> [BYTES]** | Include a *name* column containing the Kafka message header *key* parsed as a UTF-8 string. To expose the header value as `bytea`, use the `BYTES` option.

  - "Parameter": |
      **ENVELOPE \<envelope\>**
    "Description": |

      *Optional.* If specified, use the specified envelope. The following `<envelope>`s are supported:

      | Envelope | Description |
      |----------|-------------|
      | **ENVELOPE NONE** | *Default*. Use an append-only envelope. This means that records will only be appended and cannot be updated or deleted.
      | **ENVELOPE DEBEZIUM** | Use the Debezium envelope, which uses a diff envelope to handle CRUD operations. This envelope can lead to **high memory utilization** in the cluster maintaining the source. Materialize can automatically offload processing to disk as needed. See [spilling to disk](/sql/create-source/kafka/#spilling-to-disk) for details. For more information, see [Using Debezium](/sql/create-source/kafka/#using-debezium).
      | **ENVELOPE UPSERT** [**(VALUE DECODING ERRORS = INLINE)**] | Use the upsert envelope, which uses message keys to handle CRUD operations. To handle value decoding errors, use the `(VALUE DECODING ERRORS = INLINE)` option. For more information, see [Handling upserts](/sql/create-source/kafka/#handling-upserts) and [Value decoding errors](/sql/create-source/kafka/#value-decoding-errors).

      {{< include-md file="shared-content/kafka-format-envelope-compat-table.md" >}}
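
Read together, the FORMAT, INCLUDE, and ENVELOPE options compose into a single statement. As a hedged sketch only (the connection `csr_conn`, source `kafka_src`, and topic `orders` are hypothetical names, not part of this commit):

```mzsql
-- Hypothetical names throughout: csr_conn is a Confluent Schema Registry
-- connection, kafka_src is a Kafka source, orders is a topic it references.
CREATE TABLE orders_tbl
FROM SOURCE kafka_src (REFERENCE orders)
KEY FORMAT TEXT
VALUE FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY CONNECTION csr_conn
INCLUDE KEY AS order_key, TIMESTAMP AS kafka_ts
ENVELOPE UPSERT (VALUE DECODING ERRORS = INLINE);
```

And because CSV-decoded columns arrive as `text`, the usual follow-up is a view with explicit casts; again a sketch with hypothetical names:

```mzsql
-- Assumes a table created with CSV WITH 2 COLUMNS, yielding column1, column2 as text.
CREATE VIEW orders_typed AS
SELECT column1::int     AS order_id,
       column2::numeric AS amount
FROM orders_csv_tbl;
```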

shared-content/kafka-format-envelope-compat-table.md (new file)

@@ -0,0 +1,8 @@
The following table specifies the format and envelope compatibility:

| Format          | Append-only envelope | Upsert envelope | Debezium envelope |
|-----------------|:--------------------:|:---------------:|:-----------------:|
| Avro            | ✓                    | ✓               | ✓                 |
| Protobuf        | ✓                    | ✓               |                   |
| JSON/Text/Bytes | ✓                    | ✓               |                   |
| CSV             | ✓                    |                 |                   |
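
As an illustration of the Avro row (with hypothetical connection, source, and topic names), Avro-formatted data can pair with the Debezium envelope like so:

```mzsql
-- Hypothetical names: csr_conn, cdc_kafka_src, and customers are placeholders.
CREATE TABLE customers_tbl
FROM SOURCE cdc_kafka_src (REFERENCE customers)
FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY CONNECTION csr_conn
ENVELOPE DEBEZIUM;
```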
