-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add registry for extension functions #68
Conversation
This is currently blocked by #69 |
6628bfe
to
1efb3f0
Compare
src/substrait/function_registry.py
Outdated
for fpath in importlib_files("substrait.extensions").glob( # type: ignore | ||
"functions*.yaml" | ||
): | ||
uri = f"https://github.com/substrait-io/substrait/blob/main/extensions/{fpath.name}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If someone has their own set of extension files they would be unable to use this routine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only meant to preload extensions from substrait repo by default for convenience, they would still be able to ignore these and call register_extension_yaml
method for their own extension files instead. Do you mean that they would waste resources by loading default extensions if they know they won't use it? I guess we can make it configurable in the constructor:
def __init__(self, load_default_extensions: bool = True)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If someone had their own directory of YAML files and wanted to use this routine they would instead have to do their own directory visiting sequence. Not a big deal but if we start going down the dialect path more (as discussed at the community meeting) this may be more common in the future (which means this is fine for now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. The reason I went with register_extension_yaml
and register_extension_dict
only was because I needed them for default extensions and tests, respectively. Beyond that, it's a bit unclear right now whether people will have a local directory of yamls available or maybe just a collection of http paths, so I decided to leave it up to the users. Having said that, we can always add other helper methods to make popular cases convenient, of course.
elif typ.startswith("u!"): | ||
return typ | ||
else: | ||
return _normalized_key_names[typ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the short name is used instead? Consider throwing a specific exception here so it's more clear what an end user should do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is using short names in extension yamls legal? I changed dict search to throw a specific exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A quick glance of functions_arithmetic.yaml contains a lot of i8 and i16 arguments. If you put together a test which verifies that you pass all of the core YAML extensions you should be good to go.
My general practice when implementing a spec is to be exact as possible on output and lenient on input. But since we have the nearly full test set that should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instantiating the registry loads all extensions by default, so they are all implicitly being tested. I'll work on more explicit tests, as well, though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the work!
I'm okay with putting this PR in as is but I'd like to see the test that runs against the extensions in Substrait Core to shake out any issues.
adds FunctionRegistry that handles lookup and type inference of extension functions.
uses derivation expressions under the hood to generate return types.
pyyaml is now part of
extensions
extra.