@benbellick benbellick commented Sep 18, 2025

closes #164

This PR handles the migration from URI -> URN. At extension load time, it requires that every simple extension provide both a URI and a URN, and it then converts everything to a URN-based representation internally.

When FromProto is called on a plan, each extension is resolved by its URN reference if one exists. Otherwise, resolution is attempted via the URI reference, which only works if an appropriate URI -> URN mapping exists (so that the extension can still be represented internally as a URN).

When ToProto is called, both URI and URN information is produced.

The above changes are reflected in the tests. Once the migration is complete, we essentially want to return the codebase to an intermediate commit of this PR, in which only URNs are handled. To see only the commits after that intermediate commit, i.e. the changes that make the migration preserve URIs, see here.

I took great care to make the git history readable, so I recommend reviewing the PR one commit at a time.

Basically just a grep-and-replace, with adjustments to tests to get
everything passing. This does _not_ handle backwards compatibility;
it fully deletes any notion of URI. The next commit will add the
backwards-compat feature.

The strategy here is to leverage the Collection to perform the uri ->
urn resolution. When an extension is encountered in a plan, it checks
for the URN anchor reference, and uses it if it is found. If it is not
found, then it looks for a URI anchor reference and a corresponding
URI -> URN mapping in the collection. If that is also not found, then
we are in an error state.
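
A minimal Go sketch of the fallback described in this commit message, assuming simplified stand-in types. The names `collection`, `planAnchors`, and `resolveURN` are hypothetical and are not the substrait-go API; the URI and URN strings are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// collection is a hypothetical stand-in for the extension Collection;
// it records the URI -> URN mapping captured at extension load time.
type collection struct {
	uriToURN map[string]string
}

// planAnchors holds the anchor tables declared by a plan:
// anchor -> URN and anchor -> URI.
type planAnchors struct {
	urns map[uint32]string
	uris map[uint32]string
}

var errUnresolved = errors.New("extension reference resolves to neither a URN nor a mapped URI")

// resolveURN prefers the URN anchor and falls back to the URI anchor
// combined with the collection's URI -> URN mapping.
func resolveURN(c collection, p planAnchors, urnRef, uriRef uint32) (string, error) {
	if urn, ok := p.urns[urnRef]; ok {
		return urn, nil
	}
	if uri, ok := p.uris[uriRef]; ok {
		if urn, ok := c.uriToURN[uri]; ok {
			return urn, nil
		}
	}
	return "", errUnresolved
}

func main() {
	c := collection{uriToURN: map[string]string{
		"https://example.com/functions_arithmetic.yaml": "extension:example:functions_arithmetic",
	}}
	// A legacy plan that only declares a URI anchor.
	legacy := planAnchors{uris: map[uint32]string{1: "https://example.com/functions_arithmetic.yaml"}}
	urn, err := resolveURN(c, legacy, 0, 1)
	fmt.Println(urn, err)
}
```

The sketch ignores the question of zero-valued anchors, which comes up later in the review.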
@benbellick benbellick changed the title Gracefully handle URI->URN migration Gracefully handle URI->URN migration (Attempt 2) Sep 18, 2025
@benbellick benbellick marked this pull request as ready for review September 18, 2025 19:33
@vbarua vbarua self-requested a review September 18, 2025 21:38

codecov bot commented Sep 18, 2025

Codecov Report

❌ Patch coverage is 66.66667% with 80 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.79%. Comparing base (1803339) to head (c6bdd57).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| extensions/extension_mgr.go | 65.90% | 50 Missing and 10 partials ⚠️ |
| extensions/variants.go | 56.25% | 7 Missing ⚠️ |
| extensions/simple_extension.go | 50.00% | 6 Missing ⚠️ |
| expr/expression.go | 70.00% | 2 Missing and 1 partial ⚠️ |
| plan/plan.go | 57.14% | 2 Missing and 1 partial ⚠️ |
| plan/builders.go | 75.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #166      +/-   ##
==========================================
+ Coverage   66.35%   66.79%   +0.44%     
==========================================
  Files          45       45              
  Lines       11688    11839     +151     
==========================================
+ Hits         7755     7908     +153     
+ Misses       3602     3586      -16     
- Partials      331      345      +14     

@vbarua vbarua left a comment

Found time to do a partial review. Will finish the rest tomorrow.

// ID is the unique identifier for a substrait object
type ID struct {
URI string
URN string

merge comment note: This is a breaking change

Args() FuncParameterList
Options() map[string]Option
URI() string
URN() string

pr merge note: this is a breaking change


type Function interface {
ResolveURI(uri string) []FunctionVariant
ResolveURN(urn string) []FunctionVariant

pr merge note: this is a breaking change

@vbarua vbarua changed the title Gracefully handle URI->URN migration (Attempt 2) feat: enable handling of URNs alongside URIs Sep 18, 2025
@benbellick benbellick requested a review from vbarua September 22, 2025 15:18
{
MappingType: &extensionspb.SimpleExtensionDeclaration_ExtensionFunction_{
ExtensionFunction: &extensionspb.SimpleExtensionDeclaration_ExtensionFunction{
ExtensionUrnReference: 0, // Zero reference - should be treated as "no reference"
Member

> Zero reference - should be treated as "no reference"

I don't think this is true? The spec recommends against using 0 as an anchor, but doesn't prohibit it as far as I know.

Member Author

Hmm... this could pose an issue. We can't differentiate between a zero reference and an absent reference. I was making the implicit assumption that zero reference could be assumed to be an absent reference, but that may not be the case.

Member Author

We discussed a fix offline. In order of precedence:

  1. non-zero urn ref
  2. non-zero uri ref
  3. zero urn ref (if it resolves and uri doesn't)
  4. zero uri ref (if it resolves and urn doesn't)
  5. both, if both resolve, but throw an error if they differ.

@vbarua vbarua left a comment

Overall looks reasonable. Did leave another pair of comments about 0 value references.

Because zero references cannot be distinguished from absent references,
we resolve in the following order of precedence:
1. via non-zero urn
2. via non-zero uri
3. via both zero urn and zero uri if both resolve to a value (but check they are the same)
4. via zero urn if only it resolves to a value
5. via zero uri if only it resolves to a value
Otherwise, we encounter an error case.
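
The precedence above can be sketched in Go roughly as follows. This is an illustrative sketch under stated assumptions, not the merged implementation: `collection`, `planAnchors`, `viaURI`, and `resolveWithPrecedence` are hypothetical names, and the sketch assumes that a non-zero reference which fails to resolve is reported as an error rather than falling through to the next step.

```go
package main

import (
	"errors"
	"fmt"
)

// collection is a hypothetical stand-in for the extension Collection,
// holding the URI -> URN mapping recorded at extension load time.
type collection struct {
	uriToURN map[string]string
}

// planAnchors holds the anchor tables a plan declares (anchor -> URN, anchor -> URI).
type planAnchors struct {
	urns map[uint32]string
	uris map[uint32]string
}

// viaURI resolves a URI anchor to a URN through the collection's mapping.
func viaURI(c collection, p planAnchors, ref uint32) (string, bool) {
	uri, ok := p.uris[ref]
	if !ok {
		return "", false
	}
	urn, ok := c.uriToURN[uri]
	return urn, ok
}

// resolveWithPrecedence applies the five-step order from the commit message.
// Assumption: an unresolvable non-zero reference is an error, not a fall-through.
func resolveWithPrecedence(c collection, p planAnchors, urnRef, uriRef uint32) (string, error) {
	if urnRef != 0 { // 1. non-zero URN reference
		if urn, ok := p.urns[urnRef]; ok {
			return urn, nil
		}
		return "", fmt.Errorf("urn anchor %d is not declared", urnRef)
	}
	if uriRef != 0 { // 2. non-zero URI reference
		if urn, ok := viaURI(c, p, uriRef); ok {
			return urn, nil
		}
		return "", fmt.Errorf("uri anchor %d has no URN mapping", uriRef)
	}
	// Both references are zero, so each could be a real anchor or simply unset.
	fromURN, urnOK := p.urns[0]
	fromURI, uriOK := viaURI(c, p, 0)
	switch {
	case urnOK && uriOK: // 3. both resolve: they must agree
		if fromURN != fromURI {
			return "", fmt.Errorf("zero anchors disagree: %q vs %q", fromURN, fromURI)
		}
		return fromURN, nil
	case urnOK: // 4. only the zero URN anchor resolves
		return fromURN, nil
	case uriOK: // 5. only the zero URI anchor resolves
		return fromURI, nil
	}
	return "", errors.New("extension reference could not be resolved")
}

func main() {
	c := collection{uriToURN: map[string]string{"https://example.com/ext.yaml": "extension:example:ext"}}
	// A plan that legitimately uses anchor 0 for its only URI declaration.
	p := planAnchors{uris: map[uint32]string{0: "https://example.com/ext.yaml"}}
	urn, err := resolveWithPrecedence(c, p, 0, 0)
	fmt.Println(urn, err)
}
```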
@benbellick benbellick requested a review from vbarua September 23, 2025 15:12
@vbarua vbarua left a comment

Changes LGTM. The tests are very thorough. Did leave one suggestion for an error message.

@vbarua vbarua merged commit fd45ef9 into substrait-io:main Sep 25, 2025
7 checks passed
Development

Successfully merging this pull request may close these issues.

Migrate extensions from URIs to URNs
