Commit b090c4e

docs: small improvements to docs for table functions
1 parent 26fec6b commit b090c4e

1 file changed: +84 -30 lines changed

site/docs/expressions/table_functions.md

Lines changed: 84 additions & 30 deletions
@@ -12,7 +12,9 @@ Table functions (0-input, currently supported) are **leaf operators** in the que
1212
- Take a **fixed number of constant arguments** (literals or expressions that can be evaluated without input data)
1313
- Produce **zero or more records** as output (a relation/table)
1414
- Do **not consume an input relation** - they generate data from constants
15-
- Have either a **derived schema** (determinable from function signature) or an **explicit schema** (depends on runtime data)
15+
- Have either a **derived schema** (determinable from the YAML `return` field and argument types) or an **explicit schema** (determined at runtime when YAML omits the `return` field)
16+
17+
See [Schema Determination](#schema-determination) for details on how schemas are specified.
1618

1719
Future extensions may add support for transformation table functions that consume and transform input relations by adding an optional input field to `TableFunctionRel`.
1820

@@ -21,76 +23,126 @@ Future extensions may add support for transformation table functions that consum
2123
Table functions are defined in YAML extension files, similar to scalar, aggregate, and window functions. A table function signature specifies:
2224

2325
- **Arguments**: The parameters the function accepts (must be constant expressions)
24-
- **Schema**: The output schema of the generated relation
26+
- **Schema**: The output schema of the generated relation (may or may not be specified in YAML)
2527
- **Determinism**: Whether the function produces the same output for the same inputs
2628
- **Session Dependency**: Whether the function depends on session state
2729

2830
## Schema Determination
2931

30-
Like scalar functions' return types, table function schemas follow a clear pattern:
32+
Table function schemas can be specified in two ways, depending on whether the YAML definition includes a `return` field:
33+
34+
### Derived Schemas (`derived: true`)
35+
36+
When a table function's YAML definition **includes a `return` field**, the schema can be deterministically derived from the function signature and the types of the bound arguments.
37+
38+
- In the plan: Set `derived: true` and include the schema from YAML in `table_schema` with any type parameters resolved
39+
- The schema is fully determinable from type information alone
40+
41+
This includes both:
42+
- **Concrete types**: Schema is fixed (e.g., `generate_series` always produces `{value: i64}`)
43+
- **Type-parameterized**: Schema depends on argument types (e.g., `unnest(list<T>)` produces `{element: T}` where `T` is resolved from the argument)
44+
45+
**Example YAML definitions** (from `functions_table.yaml`):
46+
47+
```yaml
48+
# Concrete type example - schema is always {value: i64}
49+
- name: "generate_series"
50+
impls:
51+
- args:
52+
- name: start
53+
value: i64
54+
- name: stop
55+
value: i64
56+
- name: step
57+
value: i64
58+
return:
59+
names:
60+
- value
61+
struct:
62+
types:
63+
- i64
64+
65+
# Type-parameterized example - schema is {element: T} where T comes from list<T>
66+
- name: "unnest"
67+
impls:
68+
- args:
69+
- name: input
70+
value: "list<T>"
71+
return:
72+
names:
73+
- element
74+
struct:
75+
types:
76+
- T
77+
```
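
The type-parameter resolution described above can be sketched in a few lines of Python. This is only an illustration, not part of Substrait or any of its libraries: the helper names `bind` and `resolve_return` are hypothetical, types are handled as plain strings, only `list<...>` nesting is understood, and all-uppercase names such as `T` are assumed to be type parameters. The dictionaries mirror the YAML definitions shown above.

```python
def bind(declared, actual, bindings):
    """Match a declared argument type (e.g. "list<T>") against a concrete one."""
    declared, actual = declared.strip(), actual.strip()
    if declared.startswith("list<") and declared.endswith(">"):
        if not (actual.startswith("list<") and actual.endswith(">")):
            raise ValueError(f"expected a list type, got {actual!r}")
        bind(declared[5:-1], actual[5:-1], bindings)
    elif declared.isupper():
        # Assumption: all-uppercase names (T, U, ...) are type parameters.
        if bindings.setdefault(declared, actual) != actual:
            raise ValueError(f"conflicting binding for {declared}")
    elif declared != actual:
        raise ValueError(f"expected {declared!r}, got {actual!r}")


def resolve_return(impl, actual_arg_types):
    """Return (names, types) for the derived schema of one YAML impl."""
    if len(impl["args"]) != len(actual_arg_types):
        raise ValueError("argument count mismatch")
    bindings = {}
    for arg, actual in zip(impl["args"], actual_arg_types):
        bind(arg["value"], actual, bindings)
    ret = impl["return"]
    # Only bare parameters in the return struct are substituted in this sketch.
    types = [bindings.get(t, t) for t in ret["struct"]["types"]]
    return ret["names"], types


# Dictionaries mirroring the two YAML impls above.
GENERATE_SERIES = {
    "args": [{"name": n, "value": "i64"} for n in ("start", "stop", "step")],
    "return": {"names": ["value"], "struct": {"types": ["i64"]}},
}
UNNEST = {
    "args": [{"name": "input", "value": "list<T>"}],
    "return": {"names": ["element"], "struct": {"types": ["T"]}},
}

print(resolve_return(GENERATE_SERIES, ["i64", "i64", "i64"]))
print(resolve_return(UNNEST, ["list<string>"]))
```

For `unnest` bound to a `list<string>` argument, the sketch resolves `T` to `string`, which is the schema a producer would place in `table_schema` alongside `derived: true`.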
78+
79+
### Explicit Schemas (`derived: false`)
80+
81+
When a table function's YAML definition **omits the `return` field**, the schema depends on runtime data content and cannot be determined from type information alone.
82+
83+
- In the plan: Set `derived: false` and provide the schema in `table_schema`
84+
- The plan producer determines the schema (e.g., by inspecting file contents, database metadata, etc.)
85+
86+
**Example scenario**: A function like `read_parquet(path)` where the schema depends on the actual Parquet file's structure.
3187

3288
!!! note "Required Constraint"
3389
**If a table function's YAML definition includes a `return` field, the `derived` field MUST be set to `true` in the plan, and the `table_schema` field MUST match the YAML definition (with any type parameters resolved based on the bound argument types).**
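
A consumer could check this constraint with something like the following Python sketch. It is an assumption-laden illustration rather than an actual Substrait API: `validate_table_function` is a hypothetical helper, the plan's `table_schema` is simplified to a dictionary of name and type strings, and the caller is assumed to have already resolved any type parameters into `expected_types`. The `read_parquet` definition and its column types are made up for the example.

```python
def validate_table_function(impl, derived, table_schema, expected_types=None):
    """Return a list of constraint violations (empty when the plan is valid).

    impl           -- dict for one YAML impl (may or may not have "return")
    derived        -- the plan's `derived` flag
    table_schema   -- simplified plan schema: {"names": [...], "types": [...]}
    expected_types -- return types with type parameters already resolved;
                      defaults to the types written in the YAML
    """
    ret = impl.get("return")
    if ret is None:
        # YAML omits `return`: the schema must be supplied explicitly.
        if derived:
            return ["derived must be false when the YAML omits `return`"]
        return []

    errors = []
    if not derived:
        errors.append("derived must be true when the YAML includes `return`")
    if expected_types is None:
        expected_types = ret["struct"]["types"]
    if table_schema["names"] != ret["names"]:
        errors.append("table_schema names do not match the YAML definition")
    if table_schema["types"] != expected_types:
        errors.append("table_schema types do not match the YAML definition")
    return errors


generate_series = {
    "args": [{"name": n, "value": "i64"} for n in ("start", "stop", "step")],
    "return": {"names": ["value"], "struct": {"types": ["i64"]}},
}
# Hypothetical definition with no `return` field.
read_parquet = {"args": [{"name": "path", "value": "string"}]}

ok_schema = {"names": ["value"], "types": ["i64"]}
# Hypothetical column types, chosen only for illustration.
file_schema = {"names": ["id", "name", "age"], "types": ["i64", "string", "i32"]}

print(validate_table_function(generate_series, True, ok_schema))
print(validate_table_function(generate_series, False, ok_schema))
print(validate_table_function(read_parquet, False, file_schema))
```

Returning a list of messages instead of raising on the first problem lets a validator report a wrong `derived` flag and a mismatched schema together.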
3490

35-
**Derived schemas (`derived: true`)** - The schema can be **deterministically derived from the function signature**, including:
36-
- **Static schemas**: Fixed output regardless of argument values (e.g., `generate_series` always produces `{value: i64}`)
37-
- **Type-parameterized schemas**: Schema depends on argument types (e.g., `unnest(list<T>)` produces `{element: T}`)
38-
39-
Both cases use `derived: true` because the schema is fully determinable from the function signature and bound argument types.
91+
### Plan Examples
4092

41-
**Explicit schemas (`derived: false`)** - The schema **depends on runtime data content** and cannot be determined from the function signature alone.
93+
Now let's see how these two cases appear in actual Substrait plans:
4294

43-
### Derived Schema Examples
95+
#### Derived Schema Examples
4496

45-
For functions where the schema is determinable from the function signature (either concrete or type-parameterized), set `derived: true`. The `table_schema` field contains the schema derived from the YAML definition:
97+
For functions where the YAML includes a `return` field, set `derived: true`. The schema is derived from the YAML definition, with any type parameters resolved based on argument types.
4698

47-
**Static schema example:**
99+
**Concrete type example** (`generate_series`):
48100
```
49101
TableFunctionRel {
50102
function_reference: <generate_series>
51103
arguments: [
52104
{ value: { literal: { i64: 1 } } },
53-
{ value: { literal: { i64: 100 } } }
105+
{ value: { literal: { i64: 100 } } },
106+
{ value: { literal: { i64: 1 } } }
54107
]
55-
derived: true // Schema came from YAML definition
108+
derived: true // Schema from YAML return field
56109
table_schema: {
57110
names: ["value"]
58111
struct: {
59-
types: [{ i64: {} }]
112+
types: [{ i64: {} }] // Matches YAML definition exactly
60113
}
61114
}
62115
}
63116
```
64117
65-
**Type-parameterized schema example:**
118+
**Type-parameterized example** (`unnest`):
66119
```
67120
TableFunctionRel {
68121
function_reference: <unnest>
69122
arguments: [
70123
{ value: { literal: { list: [...] } } } // list<string>
71124
]
72-
derived: true // Schema from YAML with T resolved to string
125+
derived: true // Schema from YAML with T resolved
73126
table_schema: {
74127
names: ["element"]
75128
struct: {
76-
types: [{ string: {} }] // T resolved to string from list<string>
129+
types: [{ string: {} }] // T resolved to string from list<string> argument
77130
}
78131
}
79132
}
80133
```
81134
82-
### Explicit Schema Examples
135+
#### Explicit Schema Example
83136
84-
For functions where the schema depends on runtime data content, set `derived: false` and provide the schema in `table_schema`:
137+
For functions where the YAML omits the `return` field, set `derived: false` and provide the schema determined by the plan producer:
85138
86139
```
87140
TableFunctionRel {
88-
common: { ... }
89-
function_reference: <some_function>
141+
function_reference: <read_parquet>
90142
arguments: [
91-
// Function arguments
143+
{ value: { literal: { string: "data.parquet" } } }
92144
]
93-
derived: false // Schema was determined by the plan producer
145+
derived: false // No return field in YAML - schema from runtime inspection
94146
table_schema: {
95147
names: ["id", "name", "age"]
96148
struct: {
@@ -113,18 +165,20 @@ Table functions are represented as their own relation type, `TableFunctionRel`.
113165
- **function_reference**: Points to a function anchor referencing the table function definition
114166
- **arguments**: Must be constant expressions (currently; literals or expressions evaluable without input data)
115167
- **derived**: Boolean flag indicating schema source:
116-
- `true` - Schema determinable from function signature (concrete types or type parameters). **Required when the YAML definition includes a `return` field.**
117-
- `false` - Schema depends on runtime data content. **Only allowed when the YAML definition omits the `return` field.**
118-
- **table_schema**: The output schema (always present). Must match the YAML definition if derived is true (with type parameters resolved). Contains the actual schema whether derived from YAML or provided by the producer.
168+
- `true` - Schema is determinable from the YAML `return` field and argument types (includes both concrete and type-parameterized schemas)
169+
- `false` - Schema depends on runtime data content (no `return` field in YAML)
170+
- **table_schema**: The output schema (always present). For `derived: true`, must match the YAML `return` field (with type parameters resolved). For `derived: false`, provided by the plan producer.
119171
- **common**: Standard relation properties (emit, hints, etc.)
120172
121-
**The key distinction:** Set `derived: true` if the schema can be determined by looking at the function signature and argument types in the YAML definition. Set `derived: false` only if the YAML definition omits the `return` field because it requires inspecting runtime data content.
173+
**Quick reference for setting `derived`:**
174+
- YAML has `return` field → `derived: true`
175+
- YAML omits `return` field → `derived: false`
122176
123177
Table functions can be used anywhere a relation is expected - as a leaf node, or as input to other relational operators like `FilterRel`, `ProjectRel`, etc.
124178
125179
## Examples
126180
127-
### Example 1: Generating a Sequence
181+
### Example 1: Generating a Sequence (Derived Schema - Concrete Types)
128182
129183
Generate integers from 1 to 100:
130184
@@ -158,7 +212,7 @@ value
158212
100
159213
```
160214
161-
### Example 2: Unnest a Literal Array
215+
### Example 2: Unnest a Literal Array (Derived Schema - Type-Parameterized)
162216
163217
Unnest a literal list into rows:
164218
