
feat: add registry for extension functions #68

Merged (7 commits) on Mar 14, 2025


@tokoko (Contributor) commented Mar 2, 2025

Adds FunctionRegistry, which handles lookup and type inference for extension functions.
Uses derivation expressions under the hood to generate return types.
pyyaml is now part of the extensions extra.
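A minimal sketch of the lookup and return-type inference described above, assuming extension definitions arrive as dicts shaped like Substrait extension YAML (`scalar_functions` with `impls`, each having `args` and `return`). `ToyFunctionRegistry` and its methods are hypothetical and are not the substrait-python API. Two simplifications are assumed: matching is on exact argument type strings (no type variables or nullability handling), and the return type is the literal `return` string, whereas the PR evaluates derivation expressions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FunctionEntry:
    uri: str
    name: str
    arg_types: tuple[str, ...]
    return_type: str


class ToyFunctionRegistry:
    """Hypothetical, simplified registry; not the substrait-python FunctionRegistry."""

    def __init__(self) -> None:
        # Keyed by (function name, exact argument type strings).
        self._entries: dict[tuple[str, tuple[str, ...]], FunctionEntry] = {}

    def register_extension_dict(self, definitions: dict[str, Any], uri: str) -> None:
        for func in definitions.get("scalar_functions", []):
            for impl in func.get("impls", []):
                # Only value arguments are recorded; enum arguments are skipped here.
                arg_types = tuple(a["value"] for a in impl.get("args", []) if "value" in a)
                entry = FunctionEntry(uri, func["name"], arg_types, impl["return"])
                self._entries[(func["name"], arg_types)] = entry

    def lookup(self, name: str, arg_types: list[str]) -> FunctionEntry | None:
        return self._entries.get((name, tuple(arg_types)))

    def infer_return_type(self, name: str, arg_types: list[str]) -> str | None:
        # Simplification: returns the literal `return` string rather than
        # evaluating a derivation expression.
        entry = self.lookup(name, arg_types)
        return entry.return_type if entry is not None else None


registry = ToyFunctionRegistry()
registry.register_extension_dict(
    {
        "scalar_functions": [
            {
                "name": "add",
                "impls": [
                    {"args": [{"name": "x", "value": "i8"}, {"name": "y", "value": "i8"}], "return": "i8"},
                    {"args": [{"name": "x", "value": "i16"}, {"name": "y", "value": "i16"}], "return": "i16"},
                ],
            }
        ]
    },
    uri="https://example.com/functions_arithmetic.yaml",  # placeholder URI
)
print(registry.infer_return_type("add", ["i16", "i16"]))  # i16
```

Keying on the full argument signature keeps overload resolution a single dict lookup; a real implementation would need a matching step for type variables such as `any1` and for parameterized types.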

@tokoko (Contributor Author) commented Mar 8, 2025

This is currently blocked by #69.

@tokoko force-pushed the function-registry branch from 6628bfe to 1efb3f0 on March 10, 2025 18:11
for fpath in importlib_files("substrait.extensions").glob(  # type: ignore
    "functions*.yaml"
):
    uri = f"https://github.com/substrait-io/substrait/blob/main/extensions/{fpath.name}"

Member

If someone has their own set of extension files, they would be unable to use this routine.

Contributor Author

This is only meant to preload the extensions from the substrait repo by default, for convenience; users can still ignore them and call the register_extension_yaml method with their own extension files instead. Do you mean that they would waste resources loading the default extensions when they know they won't use them? I guess we can make it configurable in the constructor:

def __init__(self, load_default_extensions: bool = True)
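A sketch of how the proposed `load_default_extensions` flag could gate preloading, assuming the bundled YAML files live in the `substrait.extensions` package as in the snippet under review and that pyyaml is installed through the extensions extra. `ToyRegistry` is hypothetical: it only mirrors the method names mentioned in this thread and stores raw dicts instead of indexing functions. It filters with `iterdir()` and `fnmatch` rather than `glob`, because the `Traversable` protocol returned by `importlib.resources.files` does not declare `glob` (presumably why the original line needs `# type: ignore`).

```python
from __future__ import annotations

from fnmatch import fnmatch
from importlib.resources import files
from typing import Any

# Base URI for the bundled files, taken from the snippet under review.
DEFAULT_URI_PREFIX = "https://github.com/substrait-io/substrait/blob/main/extensions/"


class ToyRegistry:
    """Hypothetical registry illustrating an opt-out for default extensions."""

    def __init__(self, load_default_extensions: bool = True) -> None:
        self._extensions: dict[str, dict[str, Any]] = {}
        if load_default_extensions:
            self._load_default_extensions()

    def register_extension_dict(self, definitions: dict[str, Any], uri: str) -> None:
        # Stores parsed definitions under their URI; a real registry would index functions.
        self._extensions[uri] = definitions

    def register_extension_yaml(self, fpath: Any, uri: str) -> None:
        # fpath may be a pathlib.Path or an importlib.resources Traversable; both have open().
        import yaml  # pyyaml, imported lazily so the dict-only path works without it

        with fpath.open() as f:
            self.register_extension_dict(yaml.safe_load(f), uri)

    def _load_default_extensions(self) -> None:
        for fpath in files("substrait.extensions").iterdir():
            if fnmatch(fpath.name, "functions*.yaml"):
                self.register_extension_yaml(fpath, DEFAULT_URI_PREFIX + fpath.name)

    @property
    def uris(self) -> list[str]:
        return sorted(self._extensions)


# Opting out skips the bundled files; users register only what they need.
registry = ToyRegistry(load_default_extensions=False)
registry.register_extension_dict({"scalar_functions": []}, uri="https://example.com/my_functions.yaml")
print(registry.uris)  # ['https://example.com/my_functions.yaml']
```

Defaulting the flag to `True` keeps the convenient behavior for most users while letting anyone with their own extension set avoid parsing files they will never use.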

Member

If someone had their own directory of YAML files and wanted to use this routine, they would have to write their own directory traversal instead. Not a big deal, but if we go further down the dialect path (as discussed at the community meeting), this may become more common in the future, so it's fine for now.
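One way the directory traversal described above could be factored out so users with their own YAML directory don't have to write it. `register_extension_directory` is a hypothetical helper, not part of the merged PR. It takes the registration step as a callback (for example a registry's `register_extension_yaml`, whose `(path, uri)` signature is assumed here) and derives each URI from a caller-supplied prefix, the same way the default loader builds GitHub URLs.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def register_extension_directory(
    register: Callable[[Path, str], None],
    directory: str | Path,
    uri_prefix: str,
    pattern: str = "*.yaml",
) -> list[str]:
    """Hypothetical helper: call register(path, uri) for each matching file.

    Files are visited in sorted order so registration is deterministic;
    the URIs that were registered are returned.
    """
    uris: list[str] = []
    for fpath in sorted(Path(directory).glob(pattern)):
        if not fpath.is_file():
            continue  # skip subdirectories that happen to match the pattern
        uri = uri_prefix.rstrip("/") + "/" + fpath.name
        register(fpath, uri)
        uris.append(uri)
    return uris
```

For example, `register_extension_directory(registry.register_extension_yaml, "my_extensions", "https://example.com/extensions")` would register every `*.yaml` file under `my_extensions/`; the registry object, directory, and prefix are placeholders. Taking a callback rather than a registry object keeps the helper usable for local directories today and for other sources later.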

Contributor Author

Understood. I only added register_extension_yaml and register_extension_dict because I needed them for the default extensions and for tests, respectively. Beyond that, it's unclear right now whether people will have a local directory of YAMLs or just a collection of HTTP paths, so I decided to leave it up to users. That said, we can always add helper methods to make popular cases convenient.

elif typ.startswith("u!"):
    return typ
else:
    return _normalized_key_names[typ]

Member

What happens if the short name is used instead? Consider throwing a specific exception here so it's clearer what an end user should do.

Contributor Author

Is using short names in extension YAMLs legal? I changed the dict lookup to throw a specific exception.
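A minimal sketch of the kind of specific exception discussed here, built on the excerpt above. `UnknownTypeNameError`, `normalize_type_name`, and the contents of `TYPE_KEYS` are hypothetical stand-ins, since the PR's `_normalized_key_names` table is not shown in this thread. The point is that the error names the offending type and the accepted forms instead of surfacing a bare `KeyError`.

```python
from __future__ import annotations

# Hypothetical table: YAML type name -> internal key. Illustrative only; the
# PR's `_normalized_key_names` contents are not shown in this thread.
TYPE_KEYS: dict[str, str] = {
    "boolean": "bool",
    "i8": "i8",
    "i16": "i16",
    "string": "string",
}


class UnknownTypeNameError(ValueError):
    """Raised when an extension YAML uses a type name the registry does not recognize."""


def normalize_type_name(typ: str) -> str:
    if typ.startswith("u!"):
        # User-defined types pass through unchanged, as in the excerpt above.
        return typ
    try:
        return TYPE_KEYS[typ]
    except KeyError:
        known = ", ".join(sorted(TYPE_KEYS))
        raise UnknownTypeNameError(
            f"unknown type name {typ!r} in extension YAML; expected one of "
            f"[{known}] or a user-defined type prefixed with 'u!'"
        ) from None
```

With this table, `normalize_type_name("bool")` raises `UnknownTypeNameError` with a message naming `'bool'`, which is exactly the short-name case raised in review; `from None` hides the internal `KeyError` so the user sees only the actionable message.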

Member

A quick glance at functions_arithmetic.yaml shows a lot of i8 and i16 arguments. If you put together a test that verifies all of the core YAML extensions are handled, you should be good to go.

My general practice when implementing a spec is to be as exact as possible on output and lenient on input. But since we have a nearly complete test set, that should be enough.
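The "exact on output, lenient on input" practice could look like the sketch below: accept either the short or the long spelling of a type name, keep a trailing `?` nullability marker, and always emit the long name. `canonical_type_name` is hypothetical, and the alias table is an illustrative subset assumed from the Substrait type documentation rather than taken from the PR; the spec's type table is the authority for the full list.

```python
from __future__ import annotations

# Illustrative subset of short -> long type names (assumed from the Substrait
# type docs); consult the spec for the complete table.
SHORT_TO_LONG: dict[str, str] = {
    "bool": "boolean",
    "str": "string",
    "vbin": "binary",
}


def canonical_type_name(typ: str) -> str:
    """Lenient on input (short or long name, optional '?'), exact on output (long name)."""
    typ = typ.strip()
    nullable = typ.endswith("?")
    base = typ[:-1] if nullable else typ
    base = SHORT_TO_LONG.get(base, base)  # names without a short alias pass through
    return f"{base}?" if nullable else base


print(canonical_type_name("str?"))  # string?
```

Names with no alias entry (such as `i8`, whose short and long forms coincide) pass through unchanged, so an unknown-name check like the one discussed earlier can still run afterwards.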

Contributor Author

Instantiating the registry loads all extensions by default, so they are all being tested implicitly. I'll work on more explicit tests as well, though.

Member

@EpsilonPrime left a comment

Thanks for all the work!

I'm okay with putting this PR in as is, but I'd like to see the test that runs against the extensions in Substrait Core to shake out any issues.
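A sketch of an explicit test over the core extension files, assuming each file can be registered through a callable that raises on failure. Unlike relying on the constructor's default loading, which stops at the first bad file, it attempts every file and reports all failures at once. `collect_registration_failures` is hypothetical; the commented pytest usage assumes the `substrait.extensions` package location from the snippet under review, the `load_default_extensions` flag proposed in the thread, and a `(path, uri)` signature for `register_extension_yaml`, none of which are confirmed here.

```python
from __future__ import annotations

from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def collect_registration_failures(
    items: Iterable[T],
    register: Callable[[T], object],
    key: Callable[[T], str] = str,
) -> dict[str, Exception]:
    """Register every item, returning {key(item): exception} for each one that failed."""
    failures: dict[str, Exception] = {}
    for item in items:
        try:
            register(item)
        except Exception as exc:  # keep going so all failures are reported together
            failures[key(item)] = exc
    return failures


# Possible pytest usage (API details assumed, see above):
#
# def test_core_extensions_register():
#     from fnmatch import fnmatch
#     from importlib.resources import files
#     core = [p for p in files("substrait.extensions").iterdir()
#             if fnmatch(p.name, "functions*.yaml")]
#     registry = FunctionRegistry(load_default_extensions=False)
#     failures = collect_registration_failures(
#         core, lambda p: registry.register_extension_yaml(p, p.name), key=lambda p: p.name
#     )
#     assert not failures, failures
```

Because the assertion message is the whole failure dict, a single CI run names every core YAML file the registry cannot handle, together with its exception.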

@EpsilonPrime merged commit 912fc09 into substrait-io:main on Mar 14, 2025
17 checks passed
@tokoko deleted the function-registry branch on March 14, 2025 19:37