102 changes: 102 additions & 0 deletions extensions/functions_table.yaml
@@ -0,0 +1,102 @@
%YAML 1.2
---
# Table functions: Functions that produce relations (zero or more records).
# Currently, only 0-input functions are supported - these take constant arguments
# and generate data as leaf operators.
urn: extension:io.substrait:functions_table
table_functions:
  - name: "generate_series"
    description: >-
      Generates a series of integer values from start to stop, incrementing by step.

      Takes constant arguments and produces zero or more records containing a single
      integer value. The series includes both the start and stop values if they fall
      on a step boundary. If step is positive, stops when the value exceeds stop.
      If step is negative, stops when the value is less than stop. Returns empty if
      step is zero or if the step direction doesn't allow reaching stop from start.
    impls:
      - args:
          - name: start
            value: i64
            description: The starting value of the series
          - name: stop
            value: i64
            description: The ending value of the series (inclusive)
          - name: step
            value: i64
            description: The increment between values
            constant: true
        deterministic: true
        sessionDependent: false
        return:
          names:
            - value
          struct:
            types:
              - i64
      - args:
          - name: start
            value: i32
            description: The starting value of the series
          - name: stop
            value: i32
            description: The ending value of the series (inclusive)
          - name: step
            value: i32
            description: The increment between values
            constant: true
        deterministic: true
        sessionDependent: false
        return:
          names:
            - value
          struct:
            types:
              - i32
      - args:
          - name: start
            value: i64
            description: The starting value of the series
          - name: stop
            value: i64
            description: The ending value of the series (inclusive)
        deterministic: true
        sessionDependent: false
        return:
          names:
            - value
          struct:
            types:
              - i64
      - args:
          - name: start
            value: i32
            description: The starting value of the series
          - name: stop
            value: i32
            description: The ending value of the series (inclusive)
        deterministic: true
        sessionDependent: false
        return:
          names:
            - value
          struct:
            types:
              - i32
  - name: "unnest"
    description: Expands a list literal into a set of rows, one row per element.
    impls:
      - args:
          - name: input
            value: "list<T>"
            description: The list to unnest
        deterministic: true
        sessionDependent: false
        # Schema references type parameter T from list<T>
        # The field type is derived from the list element type
        return:
          names:
            - element
          struct:
            types:
              - T
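
As a reading aid for the two function descriptions in this file, the stated semantics can be sketched in Python. This is not part of the PR and is not how any engine implements these functions; the helper names are hypothetical. The two-argument overloads of generate_series do not state a default step in the YAML, so the sketch assumes nothing there and takes step explicitly.

```python
from typing import Iterator, List, TypeVar

T = TypeVar("T")


def generate_series(start: int, stop: int, step: int) -> Iterator[int]:
    """Yield start, start + step, ... up to and including stop when reachable."""
    if step == 0:
        return  # per the description, a zero step yields an empty series
    value = start
    if step > 0:
        while value <= stop:  # stops once the value exceeds stop
            yield value
            value += step
    else:
        while value >= stop:  # stops once the value is less than stop
            yield value
            value += step


def unnest(items: List[T]) -> Iterator[T]:
    """Yield one row per list element, mirroring the list<T> -> T schema."""
    yield from items
```

For example, `list(generate_series(1, 10, 3))` gives `[1, 4, 7, 10]`, while `generate_series(5, 1, 1)` is empty because a positive step cannot reach a smaller stop.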
46 changes: 46 additions & 0 deletions proto/substrait/algebra.proto
@@ -552,6 +552,51 @@ message ExpandRel {
}
}

// Invokes a table-valued function that produces a relation (zero or more records).
//
// Table functions produce a table with either:
// - A schema that can be derived based on argument types (type-parameterized functions)
// - A schema that depends on runtime data (use derived: false)
//
// Future extensions may add an optional input field to support transformation
// table functions that operate on input relations.
message TableFunctionRel {
Member Author

Would GeneratorTableFunctionRel, as introduced by @jacques-n here, be a more appropriate name for this relation?

In my mind there are two possible paths we can go down which determine the appropriate name.

  1. We could explore expanding this proto (by e.g. adding a relations input field) so that it appropriately models both kinds of table functions. In this case, keeping the name as TableFunctionRel makes sense.
  2. We could introduce a brand new relation to represent table functions which take relations as input. In this case, it would make sense to rename this relation to GeneratorTableRel or some name which distinguishes it from the other kind of table function.

RelCommon common = 1;

// Points to a function_anchor defined in this plan, which must refer
// to a table function in the associated YAML file. Avoid using
// anchor/reference zero.
uint32 function_reference = 2;

// The arguments to be bound to the function. This must have exactly the
// number of arguments specified in the function definition from the YAML file,
// and the argument types must also match exactly:
//
// - Value arguments must be bound using FunctionArgument.value.
// Currently (0-input functions only), expressions must be constants
// (literals or expressions evaluable without input data).
// - Type arguments must be bound using FunctionArgument.type.
// - Enum arguments must be bound using FunctionArgument.enum with a
// string that case-insensitively matches one of the allowed options.
repeated FunctionArgument arguments = 3;

// The derived field indicates whether or not the YAML file produced the schema:
Member Author

@benbellick benbellick Oct 25, 2025

Every other function type has a FunctionOption field around here. Do we need that for table functions as well?

message TableFunctionRel {
    ...
    repeated FunctionOption options = 4;
    ...
}

// - If true, the table_schema was produced purely from the type expressions in the
// YAML file + the types of the provided arguments
// - If false, the table_schema was produced by the plan producer
//
// This value is required to be true if and only if a schema is provided in the YAML
// definition of this function.
bool derived = 4;

// The schema of the output relation. If the YAML definition provides a schema,
// this schema is required to match the schema implied by that definition.
NamedStruct table_schema = 5;

substrait.extensions.AdvancedExtension advanced_extension = 10;
}
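
The constraints stated in the TableFunctionRel comments (exact argument count, derived true if and only if the YAML definition provides a schema, and table_schema matching the implied schema) could be checked along these lines. This is only a sketch, not part of the PR: the class and function names are hypothetical, the schema is a simplified stand-in for NamedStruct, and it assumes any type parameters such as T have already been resolved from the argument types.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (field names, field types): a simplified, hypothetical stand-in for NamedStruct.
Schema = Tuple[Tuple[str, ...], Tuple[str, ...]]


@dataclass
class TableFunctionDef:
    """Hypothetical view of one YAML impl: arity plus an optional schema."""
    num_args: int
    schema: Optional[Schema]  # None when the YAML gives no return schema


@dataclass
class TableFunctionCall:
    """Hypothetical view of the relevant TableFunctionRel fields."""
    num_args: int
    derived: bool
    table_schema: Schema


def validate(defn: TableFunctionDef, call: TableFunctionCall) -> List[str]:
    """Return a list of violated constraints (empty when the call is valid)."""
    errors: List[str] = []
    if call.num_args != defn.num_args:
        errors.append("argument count does not match the YAML definition")
    # derived must be true if and only if the YAML definition has a schema.
    if call.derived != (defn.schema is not None):
        errors.append("derived must be true iff the YAML definition provides a schema")
    if defn.schema is not None and call.table_schema != defn.schema:
        errors.append("table_schema does not match the schema implied by the YAML")
    return errors
```

A producer-supplied schema (derived set to false) passes only when the definition carries no schema of its own, which is the case the proto comment describes as a schema that depends on runtime data.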

// A relation with output field names.
//
// This is for use at the root of a `Rel` tree.
@@ -581,6 +626,7 @@ message Rel {
WriteRel write = 19;
DdlRel ddl = 20;
UpdateRel update = 22;
TableFunctionRel table_function = 23;
// Physical relations
HashJoinRel hash_join = 13;
MergeJoinRel merge_join = 14;
13 changes: 13 additions & 0 deletions proto/substrait/function.proto
@@ -106,6 +106,19 @@ message FunctionSignature {
}
}

message Table {
repeated Argument arguments = 2;
repeated string name = 3;
Description description = 4;

bool deterministic = 7;
bool session_dependent = 8;

NamedStruct schema = 9;

repeated Implementation implementations = 10;
Member Author

Other functions have variadic final argument logic like:

message TableFunction {
    ...
    oneof final_variable_behavior {
      FinalArgVariadic variadic = 10;
      FinalArgNormal normal = 11;
    }
    ...
}

However, I skipped it for now for simplicity. Is it necessary on a first pass?

}

message Description {
string language = 1;
string body = 2;