site/docs/expressions/table_functions.md
84 additions & 30 deletions
@@ -12,7 +12,9 @@ Table functions (0-input, currently supported) are **leaf operators** in the que
 - Take a **fixed number of constant arguments** (literals or expressions that can be evaluated without input data)
 - Produce **zero or more records** as output (a relation/table)
 - Do **not consume an input relation** - they generate data from constants
-- Have either a **derived schema** (determinable from function signature) or an **explicit schema** (depends on runtime data)
+- Have either a **derived schema** (determinable from the YAML `return` field and argument types) or an **explicit schema** (determined at runtime when YAML omits the `return` field)
+
+See [Schema Determination](#schema-determination) for details on how schemas are specified.
 
 Future extensions may add support for transformation table functions that consume and transform input relations by adding an optional input field to `TableFunctionRel`.
 
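The derived/explicit distinction this hunk introduces reduces to one check on the YAML definition: a `return` field means `derived: true`, and its absence means `derived: false`. A minimal sketch of that check follows, assuming the YAML impl and the plan's `table_schema` have already been loaded into plain Python dicts. The function name `check_derived_flag`, the dict layout, and the `string` type of the `read_parquet` argument are hypothetical; the sketch compares column names only and leaves type-parameter resolution aside.

```python
from typing import Any


def check_derived_flag(
    yaml_impl: dict[str, Any],
    derived: bool,
    table_schema: dict[str, Any],
) -> list[str]:
    """Return rule violations (an empty list means the plan is consistent).

    Hypothetical helper: `yaml_impl` is one entry of a function's `impls`
    list; `derived` and `table_schema` are taken from the plan.
    """
    problems: list[str] = []
    declared = yaml_impl.get("return")

    if declared is not None:
        # YAML has a `return` field, so the plan must mark the schema as derived.
        if not derived:
            problems.append("YAML declares `return`; `derived` must be true")
        # Column names are fixed by the YAML; only types may need resolving.
        if table_schema.get("names") != declared.get("names"):
            problems.append("`table_schema` names differ from YAML `return` names")
    elif derived:
        # No `return` field: the producer supplies the schema explicitly.
        problems.append("YAML omits `return`; `derived` must be false")

    return problems


# Dict form of the `generate_series` definition (has a `return` field).
generate_series = {
    "args": [{"name": n, "value": "i64"} for n in ("start", "stop", "step")],
    "return": {"names": ["value"], "struct": {"types": ["i64"]}},
}
# Assumed shape for a `read_parquet(path)` definition without a `return` field.
read_parquet = {"args": [{"name": "path", "value": "string"}]}

print(check_derived_flag(generate_series, True, generate_series["return"]))
print(check_derived_flag(read_parquet, True, {"names": ["id", "name", "age"]}))
```

Returning a list of messages rather than raising on the first mismatch lets a plan validator report every inconsistency in one pass.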
@@ -21,76 +23,126 @@ Future extensions may add support for transformation table functions that consum
 Table functions are defined in YAML extension files, similar to scalar, aggregate, and window functions. A table function signature specifies:
 
 - **Arguments**: The parameters the function accepts (must be constant expressions)
-- **Schema**: The output schema of the generated relation
+- **Schema**: The output schema of the generated relation (may or may not be specified in YAML)
 - **Determinism**: Whether the function produces the same output for the same inputs
 - **Session Dependency**: Whether the function depends on session state
 
 ## Schema Determination
 
-Like scalar functions' return types, table function schemas follow a clear pattern:
+Table function schemas can be specified in two ways, depending on whether the YAML definition includes a `return` field:
+
+### Derived Schemas (`derived: true`)
+
+When a table function's YAML definition **includes a `return` field**, the schema can be deterministically derived from the function signature and the types of the bound arguments.
+
+- In the plan: Set `derived: true` and include the schema from YAML in `table_schema` with any type parameters resolved
+- The schema is fully determinable from type information alone
+# Concrete type example - schema is always {value: i64}
+- name: "generate_series"
+  impls:
+    - args:
+        - name: start
+          value: i64
+        - name: stop
+          value: i64
+        - name: step
+          value: i64
+      return:
+        names:
+          - value
+        struct:
+          types:
+            - i64
+
+# Type-parameterized example - schema is {element: T} where T comes from list<T>
+- name: "unnest"
+  impls:
+    - args:
+        - name: input
+          value: "list<T>"
+      return:
+        names:
+          - element
+        struct:
+          types:
+            - T
+```
+
+### Explicit Schemas (`derived: false`)
+
+When a table function's YAML definition **omits the `return` field**, the schema depends on runtime data content and cannot be determined from type information alone.
+
+- In the plan: Set `derived: false` and provide the schema in `table_schema`
+- The plan producer determines the schema (e.g., by inspecting file contents, database metadata, etc.)
+
+**Example scenario**: A function like `read_parquet(path)` where the schema depends on the actual Parquet file's structure.
 
 !!! note "Required Constraint"
     **If a table function's YAML definition includes a `return` field, the `derived` field MUST be set to `true` in the plan, and the `table_schema` field MUST match the YAML definition (with any type parameters resolved based on the bound argument types).**
 
-**Derived schemas (`derived: true`)** - The schema can be **deterministically derived from the function signature**, including:
-Both cases use `derived: true` because the schema is fully determinable from the function signature and bound argument types.
+### Plan Examples
 
-**Explicit schemas (`derived: false`)** - The schema **depends on runtime data content** and cannot be determined from the function signature alone.
+Now let's see how these two cases appear in actual Substrait plans:
 
-### Derived Schema Examples
+#### Derived Schema Examples
 
-For functions where the schema is determinable from the function signature (either concrete or type-parameterized), set `derived: true`. The `table_schema` field contains the schema derived from the YAML definition:
+For functions where the YAML includes a `return` field, set `derived: true`. The schema is derived from the YAML definition, with any type parameters resolved based on argument types.
-derived: false  // Schema was determined by the plan producer
+derived: false  // No return field in YAML - schema from runtime inspection
 table_schema: {
   names: ["id", "name", "age"]
   struct: {
@@ -113,18 +165,20 @@ Table functions are represented as their own relation type, `TableFunctionRel`.
 - **function_reference**: Points to a function anchor referencing the table function definition
 - **arguments**: Must be constant expressions (currently; literals or expressions evaluable without input data)
 - **derived**: Boolean flag indicating schema source:
-  - `true` - Schema determinable from function signature (concrete types or type parameters). **Required when the YAML definition includes a `return` field.**
-  - `false` - Schema depends on runtime data content. **Only allowed when the YAML definition omits the `return` field.**
-- **table_schema**: The output schema (always present). Must match the YAML definition if derived is true (with type parameters resolved). Contains the actual schema whether derived from YAML or provided by the producer.
+  - `true` - Schema is determinable from the YAML `return` field and argument types (includes both concrete and type-parameterized schemas)
+  - `false` - Schema depends on runtime data content (no `return` field in YAML)
+- **table_schema**: The output schema (always present). For `derived: true`, must match the YAML `return` field (with type parameters resolved). For `derived: false`, provided by the plan producer.
 - **common**: Standard relation properties (emit, hints, etc.)
 
-**The key distinction:** Set `derived: true` if the schema can be determined by looking at the function signature and argument types in the YAML definition. Set `derived: false` only if the YAML definition omits the `return` field because it requires inspecting runtime data content.
+**Quick reference for setting `derived`:**
+- YAML has `return` field → `derived: true`
+- YAML omits `return` field → `derived: false`
 
 Table functions can be used anywhere a relation is expected - as a leaf node, or as input to other relational operators like `FilterRel`, `ProjectRel`, etc.
 
 ## Examples
 
-### Example 1: Generating a Sequence
+### Example 1: Generating a Sequence (Derived Schema - Concrete Types)
 
 Generate integers from 1 to 100:
 
@@ -158,7 +212,7 @@ value
 100
 ```
 
-### Example 2: Unnest a Literal Array
+### Example 2: Unnest a Literal Array (Derived Schema - Type-Parameterized)
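
For a type-parameterized signature such as `unnest`, filling in `table_schema` means binding `T` against the concrete argument type and substituting it into the YAML `return` struct. A minimal sketch of that substitution follows, assuming types are handled as plain strings such as `list<i64>`. `resolve_unnest_schema` is a hypothetical helper, not a Substrait API, and real type strings (nullability markers, multiple parameters) would likely need a proper parser rather than a regular expression.

```python
import re

# Matches the `list<T>` argument pattern from the YAML and captures T.
_LIST_PATTERN = re.compile(r"list<(.+)>")


def resolve_unnest_schema(arg_type: str) -> dict:
    """Resolve the `unnest` return schema {element: T} for a concrete list type."""
    match = _LIST_PATTERN.fullmatch(arg_type.strip())
    if match is None:
        raise ValueError(f"expected a list<T> argument, got {arg_type!r}")
    element_type = match.group(1)  # the concrete type bound to T
    # Names come straight from the YAML `return`; only the type is substituted.
    return {"names": ["element"], "struct": {"types": [element_type]}}


print(resolve_unnest_schema("list<i64>"))
```

Because the result depends only on the argument type and never on the list's contents, this case still counts as a derived schema.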