- 
                Notifications
    
You must be signed in to change notification settings  - Fork 25
 
feat: enable handling of URNs alongside URIs #166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Basically just a grep replace with adjustments to tests to get everything passing. This does _not_ handle backwards compatibility, but rather just fully deletes any notion really of URI. The next commit will be to add in the backwards compat feature.
The strategy here is to leverage the Collection to perform the uri -> urn resolution. When an extension is encountered in a plan, it checks for the URN anchor reference, and uses it if it is found. If it is not found, then it looks for a URI anchor reference and a corresponding URI -> URN mapping in the collection. If that is also not found, then we are in an error state.
7cccd68    to
    9869bdc      
    Compare
  
    
          Codecov Report❌ Patch coverage is  
 Additional details and impacted files@@            Coverage Diff             @@
##             main     #166      +/-   ##
==========================================
+ Coverage   66.35%   66.79%   +0.44%     
==========================================
  Files          45       45              
  Lines       11688    11839     +151     
==========================================
+ Hits         7755     7908     +153     
+ Misses       3602     3586      -16     
- Partials      331      345      +14     ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found time to do a partial review. Will finish the rest tomorrow.
| // ID is the unique identifier for a substrait object | ||
| type ID struct { | ||
| URI string | ||
| URN string | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merge comment note: This is a breaking change
| Args() FuncParameterList | ||
| Options() map[string]Option | ||
| URI() string | ||
| URN() string | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pr merge note: this is a breaking change
| 
               | 
          ||
| type Function interface { | ||
| ResolveURI(uri string) []FunctionVariant | ||
| ResolveURN(urn string) []FunctionVariant | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pr merge note: this is a breaking change
| { | ||
| MappingType: &extensionspb.SimpleExtensionDeclaration_ExtensionFunction_{ | ||
| ExtensionFunction: &extensionspb.SimpleExtensionDeclaration_ExtensionFunction{ | ||
| ExtensionUrnReference: 0, // Zero reference - should be treated as "no reference" | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Zero reference - should be treated as "no reference"
I don't think this is true? The spec recommend against using 0 as an anchor, but doesn't prohibit it as a far as I know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... this could pose an issue. We can't differentiate between a zero reference and an absent reference. I was making the implicit assumption that zero reference could be assumed to be an absent reference, but that may not be the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed a fix offline. In order of precedence:
- non-zero urn ref
 - non-zero uri ref
 - zero urn ref (if it resolves and uri doesn't)
 - zero uri ref (if it resolves and urn doesn't)
 - both if both resolve but throw error if they differ.
 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks reasonable. Did leave another pair of comments about 0 value references.
Because of the inability to differentiate zero references from absent references, we resolve in the following order of precedence 1. via non-zero urn 2. via non-zero uri 3. via both zero urn and zero uri if both resolve to a value (but check they are the same) 4. via zero urn if only it resolves to a value 5. via zero uri if only it resolves to a value Otherwise, we encounter an error case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM. The tests are very thorough. Did leave one suggestion for an error message.
closes #164
This PR handles the migration from URI -> URN. It does so by forcing at extension load time the existence of both a URI and a URN for a simple extension. It then internally converts everything to be represented via URNs.
When
FromProtois called on plans, resolution on extensions is done by URN reference if it exists. Otherwise, an attempt is made to do it via URI, which only works if there is an appropriate mapping URI -> URN (so that internally it can be represented as a URN).When
ToProtois called, information on both URI and URN are produced.The above changes are reflected in the tests. Once the migration is complete, we essentially want to return the codebase to an intermediate commit of this PR, which reflects the codebase in a state where only URN is properly handled. To see only the commits after the previously linked commit, i.e. see all of the changes that actually reflect the logic to make the migration preserve URIs, see here.
I took great care to make the git history readable, so I recommend reviewing the PR one commit at a time.