-
Notifications
You must be signed in to change notification settings - Fork 187
feat: introduction of simple table functions #876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| // string that case-insensitively matches one of the allowed options. | ||
| repeated FunctionArgument arguments = 3; | ||
|
|
||
| // The derived fields indicates whether or not the YAML file produced the schema: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every other function type has a FunctionOption field around here. Do we need that for table functions as well?
message TableFunctionRel {
...
repeated FunctionOption options = 4;
...
}
text/simple_extensions_schema.yaml
Outdated
| $ref: "#/$defs/sessionDependent" | ||
| deterministic: | ||
| $ref: "#/$defs/deterministic" | ||
| schema: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to come up with a more explicit way to mark the inability of deriving an output schema for a particular table function? Or is it sufficient to just let the absence of the schema field indicate this?
| // | ||
| // Future extensions may add an optional input field to support transformation | ||
| // table functions that operate on input relations. | ||
| message TableFunctionRel { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a more appropriate name for this GeneratorTableFunctionRel as introduced by
@jacques-n here?
In my mind there are two possible paths we can go down which determine the appropriate name.
- We could explore expanding this proto (by e.g. adding a relations input field) so that it appropriately models both kinds of table functions. In this case, keeping the name as
TableFunctionRelmakes sense. - We could introduce a brand new relation to represent table functions which take relations as input. In this case, it would make sense to rename this relation to
GeneratorTableRelor some name which distinguishes it from the other kind of table function.
proto/substrait/function.proto
Outdated
| } | ||
| } | ||
|
|
||
| message TableFunction { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this to be consistent with the other function definitions, though AFAICT, this isn't really used anywhere. I found #193 from a few years ago which suggests that this should continue to be updated, but I'm not sure if that is still the prevailing philosophy.
|
|
||
| NamedStruct schema = 9; | ||
|
|
||
| repeated Implementation implementations = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other functions have variadic final argument logic like:
message TableFunction {
...
oneof final_variable_behavior {
FinalArgVariadic variadic = 10;
FinalArgNormal normal = 11;
}
...
}However, I skipped it for now for simplicity. Is it necessary on a first pass?
|
|
||
| - **Transformation table functions**: Functions that take an input relation and transform it (by adding an optional `input` field to `TableFunctionRel`) | ||
| - **Set-returning functions**: Functions that process input records and produce multiple output records per input | ||
| - **Lateral joins**: Applying table functions to each row of an input relation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, we would be able to model something like:
SELECT unnest(col) FROM table;However, my understanding is that this is really syntactic sugar for:
SELECT u.value
FROM table t
CROSS JOIN LATERAL unnest(t.col) AS u(value);If that is correct, then we really need LateralJoinRel to properly model this usage of unnest.
It is probably worth then clarifying somewhere in the documentation introduced in this PR that the arguments to a table function can't be field references.
| # If the return value for the function implementation is multiple lines long, | ||
| # print each line separately. This is the case for some functions in | ||
| # functions_arithmetic_decimal.yaml | ||
| if "\n" in impl["return"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: this could probably use a little work to make it a bit nicer. I'll work on that later once we have some general agreement on the spec changes.
The change here is for now just to get CI/CD to pass.
62ca59b to
2b04636
Compare
introduces a notion of YAML-defined functions which are represented as a new `TableFunctionRel`. Closes #823
This is for consistency with other functions
2b04636 to
26fec6b
Compare
| - **Determinism**: Whether the function produces the same output for the same inputs | ||
| - **Session Dependency**: Whether the function depends on session state | ||
|
|
||
| ## Schema Determination |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: I think this section could use a little bit of work:
- add some examples with YAML files to demonstrate the difference between derived / explicit
- Make the fact that there are these two kinds a bit clearer.
f707684 to
b090c4e
Compare
TableFunctionRel(take in no relations)Closes #823