-
Notifications
You must be signed in to change notification settings - Fork 9
feat: extended expression builders #71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: extended expression builders #71
Conversation
ACTION NEEDED Substrait follows the Conventional Commits The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. |
elif kind == "string": | ||
literal = stalg.Expression.Literal(string=value, nullable=type.string.nullability == stp.Type.NULLABILITY_NULLABLE) | ||
else: | ||
raise Exception(f"Unknown literal type - {type}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are also some unimplemented types here (such as dates, times, binary, uuid, etc.). Is it worth providing guidance here as to how to request/implement support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a good point. I'll follow up with those implementations shortly and let's discuss this in that PR, if that's okay. Honestly I'm not sure how a literal builder for a "complex" type (something like IntervalCompound for example) should look like.
|
||
UnboundExpression = Callable[[stp.NamedStruct, FunctionRegistry], stee.ExtendedExpression] | ||
|
||
def literal(value: Any, type: stp.Type, alias: str = None) -> UnboundExpression: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to create a literal builder that works for both expressions and extended expressions (avoiding code duplication)? Perhaps we need to add more support to the other builder parts to accept either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a simple Expression builder makes a lot of sense, especially for literals. I'm simply trying to keep things uniform for starters as I don't have a clear idea how to tackle Expression builders for non-literal types.
src/substrait/extended_expression.py
Outdated
|
||
return resolve | ||
|
||
def column(name: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is working over a name instead of a column number. Substrait doesn't need names but I can see this being useful as a convenience. There should be a version that accepts a field number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added field number resolver as well.
|
||
UnboundExpression = Callable[[stp.NamedStruct, FunctionRegistry], stee.ExtendedExpression] | ||
|
||
def literal(value: Any, type: stp.Type, alias: str = None) -> UnboundExpression: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add comments for these methods (for instance, note how names are used).
src/substrait/extended_expression.py
Outdated
|
||
def column(name: str): | ||
def resolve(base_schema: stp.NamedStruct, registry: FunctionRegistry) -> stee.ExtendedExpression: | ||
column_index = list(base_schema.names).index(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this resolution logic be handled in a separate function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll factor out the logic in a separate util as soon as another use for it appears. Does that sound okay?
|
||
return resolve | ||
|
||
def scalar_function(uri: str, function: str, *expressions: UnboundExpression, alias: str = None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this accept regular expressions and extended expressions as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are 3 possible types to be used here: Expression
, ExtendedExpression
and UnboundExpression
. My thinking about pros/cons of each option is:
-
The downside of using a regular proto
Expression
type is thatExpression
s can't carry additional information regarding resources that are used in the expression tree, meaning that if you use an extension function or a cte (once this substrait PR is merged) that context is thrown away with a regularExpression
type.The same applies to a more Rel-level builders as well, for example a possible join builder can have one of these following signatures:
join(left: Rel, right: Rel, join_type: str, condition: Expression)
orjoin(left: Plan, right: Plan, join_type: str, condition: ExtendedExpression)
. I think we should prefer the second one because it allows for each object to independently hold full context of the extenstions/ctes used while building each object and allows them to be merged by the builder function later on. -
ExtendedExpression
is perfectly okay to use here with the only downside being that in order to build an ExtendedExpression the user needs to be aware of precisely in what context the expression is supposed to be used, in other words onceExtendedExpression
is built, you've already tied it to a specific Rel schema where it can be used. -
UnboundExpression
is the currying equivalent ofExtendedExpression
which allows the user to defer resolving precise Rel schema until later when it's actually used in the context of a specific Rel.
I guess we can probably introduce a type like Union[Expression, ExtendedExpression, UnboundExpression]
and use it in similar builder signatures, if we want to cover as many use cases as possible. We will have to then do some light type matching on the builder side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use case I'd like to enable is modifying an existing plan. I suppose you could convert an existing plan into a series of extended expressions. We don't need to do this right away though. A set of builders that work from scratch is a reasonable place to start and we can look at the other use cases later.
else: | ||
return 1 | ||
|
||
def merge_extension_uris(*extension_uris: Iterable[ste.SimpleExtensionURI]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment explaining what merge does (maintains order but removes duplicates) would be useful here.
Adds extended expression builder functions:
column
,literal
andscalar_function