feat: introduction of simple table functions #876
base: main
Changes from all commits
1635a4d
092ba9d
26fec6b
b090c4e
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,102 @@ | ||
| %YAML 1.2 | ||
| --- | ||
| # Table functions: Functions that produce relations (zero or more records). | ||
| # Currently, only 0-input functions are supported - these take constant arguments | ||
| # and generate data as leaf operators. | ||
| urn: extension:io.substrait:functions_table | ||
| table_functions: | ||
| - name: "generate_series" | ||
| description: >- | ||
| Generates a series of integer values from start to stop, incrementing by step. | ||
|
|
||
| Takes constant arguments and produces zero or more records containing a single | ||
| integer value. The series includes both the start and stop values if they fall | ||
| on a step boundary. If step is positive, stops when the value exceeds stop. | ||
| If step is negative, stops when the value is less than stop. Returns empty if | ||
| step is zero or if the step direction doesn't allow reaching stop from start. | ||
| impls: | ||
| - args: | ||
| - name: start | ||
| value: i64 | ||
| description: The starting value of the series | ||
| - name: stop | ||
| value: i64 | ||
| description: The ending value of the series (inclusive) | ||
| - name: step | ||
| value: i64 | ||
| description: The increment between values | ||
| constant: true | ||
| deterministic: true | ||
| sessionDependent: false | ||
| return: | ||
| names: | ||
| - value | ||
| struct: | ||
| types: | ||
| - i64 | ||
| - args: | ||
| - name: start | ||
| value: i32 | ||
| description: The starting value of the series | ||
| - name: stop | ||
| value: i32 | ||
| description: The ending value of the series (inclusive) | ||
| - name: step | ||
| value: i32 | ||
| description: The increment between values | ||
| constant: true | ||
| deterministic: true | ||
| sessionDependent: false | ||
| return: | ||
| names: | ||
| - value | ||
| struct: | ||
| types: | ||
| - i32 | ||
| - args: | ||
| - name: start | ||
| value: i64 | ||
| description: The starting value of the series | ||
| - name: stop | ||
| value: i64 | ||
| description: The ending value of the series (inclusive) | ||
| deterministic: true | ||
| sessionDependent: false | ||
| return: | ||
| names: | ||
| - value | ||
| struct: | ||
| types: | ||
| - i64 | ||
| - args: | ||
| - name: start | ||
| value: i32 | ||
| description: The starting value of the series | ||
| - name: stop | ||
| value: i32 | ||
| description: The ending value of the series (inclusive) | ||
| deterministic: true | ||
| sessionDependent: false | ||
| return: | ||
| names: | ||
| - value | ||
| struct: | ||
| types: | ||
| - i32 | ||
| - name: "unnest" | ||
| description: Expands a list literal into a set of rows, one row per element. | ||
| impls: | ||
| - args: | ||
| - name: input | ||
| value: "list<T>" | ||
| description: The list to unnest | ||
| deterministic: true | ||
| sessionDependent: false | ||
| # Schema references type parameter T from list<T> | ||
| # The field type is derived from the list element type | ||
| return: | ||
| names: | ||
| - element | ||
| struct: | ||
| types: | ||
| - T |
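The row-producing behaviour described in the YAML above can be sketched in Python. This is only an illustration of the documented semantics, not an implementation from the PR. The default `step=1` for the two-argument overloads is an assumption (the YAML does not state a default), and rows are modelled as bare values rather than single-field structs.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def generate_series(start: int, stop: int, step: int = 1) -> Iterator[int]:
    """Yield start, start + step, ... while the value has not passed stop.

    step=1 as the default for the two-argument form is an assumption.
    """
    if step == 0:
        # The description says a zero step returns an empty relation.
        return
    value = start
    # Positive step: stop once value exceeds stop.
    # Negative step: stop once value is less than stop.
    while (step > 0 and value <= stop) or (step < 0 and value >= stop):
        yield value
        value += step


def unnest(items: Iterable[T]) -> Iterator[T]:
    """Yield one row per list element, in list order."""
    for item in items:
        yield item
```

Both bounds are inclusive only when `stop` lands on a step boundary, so `generate_series(1, 10, 3)` ends at 10 while `generate_series(1, 9, 3)` ends at 7. A step whose direction cannot reach `stop` yields nothing because the loop condition fails on the first check.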
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -552,6 +552,51 @@ message ExpandRel { | |
| } | ||
| } | ||
|
|
||
| // Invokes a table-valued function that produces a relation (zero or more records). | ||
| // | ||
| // | ||
| // Table functions produce a table with either: | ||
| // - A schema that can be derived based on argument types (type-parameterized functions) | ||
| // - A schema that depends on runtime data (use derived: false) | ||
| // | ||
| // Future extensions may add an optional input field to support transformation | ||
| // table functions that operate on input relations. | ||
| message TableFunctionRel { | ||
| RelCommon common = 1; | ||
|
|
||
| // Points to a function_anchor defined in this plan, which must refer | ||
| // to a table function in the associated YAML file. Avoid using | ||
| // anchor/reference zero. | ||
| uint32 function_reference = 2; | ||
|
|
||
| // The arguments to be bound to the function. This must have exactly the | ||
| // number of arguments specified in the function definition from the YAML file, | ||
| // and the argument types must also match exactly: | ||
| // | ||
| // - Value arguments must be bound using FunctionArgument.value. | ||
| // Currently (0-input functions only), expressions must be constants | ||
| // (literals or expressions evaluable without input data). | ||
| // - Type arguments must be bound using FunctionArgument.type. | ||
| // - Enum arguments must be bound using FunctionArgument.enum with a | ||
| // string that case-insensitively matches one of the allowed options. | ||
| repeated FunctionArgument arguments = 3; | ||
|
|
||
| // The derived field indicates whether or not the YAML file produced the schema: | ||
|
Every other function type has an options field:
message TableFunctionRel {
  ...
  repeated FunctionOption options = 4;
  ...
} |
||
| // - If true, the table_schema was produced purely from the type expressions in the | ||
| // YAML file + the types of the provided arguments | ||
| // - If false, the table_schema was produced by the plan producer | ||
| // | ||
| // This value is required to be true if and only if a schema is provided in the YAML | ||
| // definition of this function. | ||
| bool derived = 4; | ||
|
|
||
| // The schema of the output relation. This schema is required to match the schema implied | ||
| // by the YAML definition, if a schema is present in the definition. | ||
| NamedStruct table_schema = 5; | ||
|
|
||
| substrait.extensions.AdvancedExtension advanced_extension = 10; | ||
| } | ||
|
|
||
| // A relation with output field names. | ||
| // | ||
| // This is for use at the root of a `Rel` tree. | ||
|
|
@@ -581,6 +626,7 @@ message Rel { | |
| WriteRel write = 19; | ||
| DdlRel ddl = 20; | ||
| UpdateRel update = 22; | ||
| TableFunctionRel table_function = 23; | ||
| // Physical relations | ||
| HashJoinRel hash_join = 13; | ||
| MergeJoinRel merge_join = 14; | ||
|
|
||
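The consistency rules stated in the `TableFunctionRel` comments (exact argument count and types, `derived` true if and only if the YAML provides a schema, and `table_schema` matching the YAML schema when one exists) can be sketched as a small validator. The class and function names below are hypothetical stand-ins, not Substrait APIs, and the sketch assumes concrete types only (no resolution of type parameters such as `T`).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A schema is modelled as an ordered tuple of (field name, type string) pairs.
Schema = Tuple[Tuple[str, str], ...]


@dataclass
class YamlTableImpl:
    """Hypothetical mirror of one `impls` entry in the YAML file."""
    arg_types: List[str]
    schema: Optional[Schema]  # None when the YAML declares no return schema


@dataclass
class TableFunctionCall:
    """Hypothetical mirror of the fields of TableFunctionRel that matter here."""
    arg_types: List[str]
    derived: bool
    table_schema: Schema


def validate(call: TableFunctionCall, impl: YamlTableImpl) -> List[str]:
    """Return a list of rule violations (empty when the call is consistent)."""
    errors: List[str] = []
    if len(call.arg_types) != len(impl.arg_types):
        errors.append("argument count does not match the YAML definition")
    elif call.arg_types != impl.arg_types:
        errors.append("argument types do not match the YAML definition")
    # derived must be true if and only if the YAML provides a schema.
    if call.derived != (impl.schema is not None):
        errors.append("derived must be true iff the YAML defines a schema")
    # When the YAML provides a schema, table_schema must match it.
    if impl.schema is not None and call.table_schema != impl.schema:
        errors.append("table_schema does not match the YAML schema")
    return errors
```

For example, a call to the three-argument `i64` overload of `generate_series` with `derived=True` and a single `value: i64` field passes, while the same call with `derived=False` is flagged.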
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -106,6 +106,19 @@ message FunctionSignature { | |
| } | ||
| } | ||
|
|
||
| message Table { | ||
| repeated Argument arguments = 2; | ||
| repeated string name = 3; | ||
| Description description = 4; | ||
|
|
||
| bool deterministic = 7; | ||
| bool session_dependent = 8; | ||
|
|
||
| NamedStruct schema = 9; | ||
|
|
||
| repeated Implementation implementations = 10; | ||
|
Other functions have variadic final argument logic like:
message TableFunction {
  ...
  oneof final_variable_behavior {
    FinalArgVariadic variadic = 10;
    FinalArgNormal normal = 11;
  }
  ...
}
However, I skipped it for now for simplicity. Is it necessary on a first pass? |
||
| } | ||
|
|
||
| message Description { | ||
| string language = 1; | ||
| string body = 2; | ||
|
|
||
Is a more appropriate name for this `GeneratorTableFunctionRel`, as introduced by @jacques-n here?
In my mind there are two possible paths we can go down, which determine the appropriate name:
- `TableFunctionRel` makes sense.
- `GeneratorTableRel`, or some name which distinguishes it from the other kind of table function.