-
Notifications
You must be signed in to change notification settings - Fork 5.5k
Add function metadata ability to push down struct argument in optimizer #25175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -24,6 +24,7 @@ | |
import com.facebook.presto.spi.function.Signature; | ||
import com.facebook.presto.spi.function.SqlInvokedScalarFunction; | ||
import com.facebook.presto.spi.function.SqlType; | ||
import com.facebook.presto.spi.function.TypeVariableConstraint; | ||
import com.google.common.collect.ImmutableList; | ||
import com.google.common.collect.ImmutableSet; | ||
|
||
|
@@ -106,6 +107,7 @@ private static SqlScalarFunction parseParametricScalar(ScalarHeaderAndMethods sc | |
Map<SpecializedSignature, ParametricScalarImplementation.Builder> signatures = new HashMap<>(); | ||
for (Method method : scalar.getMethods()) { | ||
ParametricScalarImplementation implementation = ParametricScalarImplementation.Parser.parseImplementation(header, method, constructor); | ||
checkPushdownSubfieldArgIndex(implementation, header, method); | ||
if (!signatures.containsKey(implementation.getSpecializedSignature())) { | ||
ParametricScalarImplementation.Builder builder = new ParametricScalarImplementation.Builder( | ||
implementation.getSignature(), | ||
|
@@ -155,4 +157,21 @@ public Set<Method> getMethods() | |
return methods; | ||
} | ||
} | ||
|
||
private static void checkPushdownSubfieldArgIndex(ParametricScalarImplementation implementation, ScalarImplementationHeader header, Method method) | ||
{ | ||
Optional<Integer> pushdownSubfieldArgIndex = header.getHeader().getPushdownSubfieldArgIndex(); | ||
if (pushdownSubfieldArgIndex.isPresent()) { | ||
Map<String, String> typeConstraintMapping = new HashMap<>(); | ||
Signature signature = implementation.getSignature(); | ||
for (TypeVariableConstraint constraint : signature.getTypeVariableConstraints()) { | ||
typeConstraintMapping.put(constraint.getName(), constraint.getVariadicBound()); | ||
} | ||
|
||
checkCondition(pushdownSubfieldArgIndex.get() >= 0, FUNCTION_IMPLEMENTATION_ERROR, "Method [%s] has negative pushdown subfield arg index", method); | ||
checkCondition(signature.getArgumentTypes().size() > pushdownSubfieldArgIndex.get(), FUNCTION_IMPLEMENTATION_ERROR, "Method [%s] has out of range pushdown subfield arg index", method); | ||
String typeVariableName = signature.getArgumentTypes().get(pushdownSubfieldArgIndex.get()).toString(); | ||
checkCondition(typeVariableName.equals(com.facebook.presto.common.type.StandardTypes.ROW) || typeConstraintMapping.get(typeVariableName).equals(com.facebook.presto.common.type.StandardTypes.ROW), FUNCTION_IMPLEMENTATION_ERROR, "Method [%s] does not have a struct or row type as pushdown subfield arg", method); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same comment about maybe this is too restrictive and need to add some other conditions. |
||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -33,4 +33,6 @@ | |
boolean deterministic() default true; | ||
|
||
boolean calledOnNullInput() default false; | ||
|
||
int pushdownSubfieldArgIndex() default -1; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This interface method, and the ones on There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes they are specified in the annotation. If it is not specified, then it is -1 by default Annotation looks like this
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add some documentation on how this needs to be used to our documentation? Also, why not add an annotation to the argument itself? Something like, Also, your example reference internal queries, can you add pastes of the explain plans? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you also add some tests? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Instead of using annotation on the argument, I have chosen to pass the argindex in the codegen decorator because this allows it to be inside the FunctionMetadata. I added some tests to TestHiveLogicalPlanner. One more change I will make is to perform some validation that the argIndex specified does correspond to a rowtype. And throw a warning when the code path is not reached due to invalid index There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will add this description to the documentation once this change is merged in pushdownSubfieldArgIndex is used to specify which parameter in the scalar function corresponds to the parameter that should have its subfields pushed down to filter plan nodes during query planning and optimization. This is helpful to reduce the amount of scanning done for queries involving structs, and it ensures that only utilized subfields of the struct scanned and unused subfields can be pruned from the query plan. In the below example, the pushdownSubfieldArgIndex is set to 0, which means that the first parameter of the custom_struct_with_passthrough function will have its subfields be pushed down to filter plan nodes. So for a query such as
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a non-trivial example you can add? And can you please add the documentation with the current PR? Correct me if I'm wrong, this field basically says to the planner, "I'm OK if the function receives a pruned version of the struct, because this function just transparently utilizes and returns this argument." But I'm really struggling to understand under what circumstance that would ever apply (or, how you would ever come up with a non-trivial example). How does If this acknowledged to be esoteric, wouldn't it just make sense to add it as a field in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. fb_reshape_row is basically a specialized cast. it makes it easier to work with rows that might have had subfields (or with any layer of nesting) added to them, types changed, etc by being able to take a row of one form and cast it to some other shape. it doesn't care if any fields are null/empty/missing. |
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or has no typeConstraintMapping entry (maybe no constraint on the type) or typeConstraintMapping value is null and TypeVariableConstraint.nonDecimalNumericRequired is false (i.e. there's no constraint preventing row type?) ?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is too restrictive. The argument index is only checked for these conditions if the pushdownSubfieldArgIndex.isPresent().
Also, the typeVariableName should always be present because signature.getArgumentTypes() always contains the same number of parameters as the function signature. There is always a type constraint imposed on a positional argument.
I think you might have it confused with typeVariableConstraints. typeVariableConstraints might be empty if we don't use an alias for a type (T for RowType). typeConstraintMapping is built using typeVariableConstraints.
So my check ensures that the parameter is either directly named a "row" type, or it has a type alias that maps to a "row" type
In summary: typeVariableConstraints is retreived from this annotation:
@TypeParameter(value = "T", boundedBy = ROW)
argumentTypes is retrieved from the function definition
public static Block customStructWithPassthrough(@SqlType("T") Block struct)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make it clearer, heres an example of what the variables look like when the function takes in more than one parameter