Description
In Go, it is possible to have a protocol buffer namespace conflict as all protocol buffer declarations in a binary are added to a global namespace.
This turns out to cause issues if you want to utilize substrait-go AND you also generate protobuf bindings directly from https://github.com/substrait-io/substrait for use in extensions, specifically if the versions don't line up. This is especially tricky in a multi-language environment, as the various libraries use different versions of the spec and updating all of them in lockstep is quite challenging.
The protobuf docs specifically indicate that vendoring protobufs, as we do in substrait-go, is a common cause of these issues and should be avoided. The docs state:
"Users should avoid vendoring and instead depend on a centralized Go package for that .proto file."
Proposal
We should remove the vendored protobufs in substrait-go and replace them with a dedicated package. We can introduce https://github.com/substrait-io/substrait-protobuf as a dedicated repo for this generated code. To start with, it will only contain generated code for Go, but could potentially include other languages (e.g. Java, Rust, etc.) which the utility libraries can then depend on.
The idea with substrait-protobuf would be that once a week, we would regenerate the protobufs based on the most recently released tag of the spec. This is possible using buf with something like
    buf generate https://github.com/substrait-io/substrait.git\#tag\=v0.64.0
To start with, this would be a manual process, but the intent would be to automate this.
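As a rough sketch of what the automated step might look like, the snippet below picks the highest release tag from a list and builds the matching `buf generate` arguments. The helper names (`parseTag`, `latestTag`, `bufGenerateArgs`) and the sample tag list are assumptions for illustration. How the tags are fetched and how the command is actually invoked are left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTag splits a tag of the form "vMAJOR.MINOR.PATCH" into numeric parts.
// The second result is false when the tag does not have that shape.
func parseTag(tag string) ([3]int, bool) {
	var v [3]int
	if !strings.HasPrefix(tag, "v") {
		return v, false
	}
	parts := strings.Split(strings.TrimPrefix(tag, "v"), ".")
	if len(parts) != 3 {
		return v, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, false
		}
		v[i] = n
	}
	return v, true
}

// less reports whether version a sorts strictly before version b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// latestTag returns the highest well-formed release tag, or "" if none parse.
func latestTag(tags []string) string {
	best, bestV := "", [3]int{-1, -1, -1}
	for _, t := range tags {
		if v, ok := parseTag(t); ok && less(bestV, v) {
			best, bestV = t, v
		}
	}
	return best
}

// bufGenerateArgs builds the arguments for `buf generate` against a git tag.
// No backslashes are needed here: they only escape '#' and '=' for a shell.
func bufGenerateArgs(repoURL, tag string) []string {
	return []string{"generate", repoURL + "#tag=" + tag}
}

func main() {
	// Sample tag list for illustration; a real job would query the spec repo.
	tags := []string{"v0.63.1", "v0.64.0", "v0.9.0", "not-a-release"}
	args := bufGenerateArgs("https://github.com/substrait-io/substrait.git", latestTag(tags))
	fmt.Println("buf", strings.Join(args, " "))
}
```

Comparing the version parts numerically rather than as strings matters here, since a plain string sort would rank v0.9.0 above v0.64.0.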
Steps
Immediate
- Agree on substrait-protobuf as the repo for generated code, and review the structure of the generated Go code in https://github.com/substrait-io/substrait-protobuf/tree/main/substraitpb-go
- Update substrait-go to depend on the protos in substrait-protobuf
- Update the go_package of the protos themselves to point to substrait-protobuf
Longer Term
- Automate the generation, release and tagging of new spec versions.
 
Alternatives
Generated Code in substrait
An alternative would be to host the generated code in https://github.com/substrait-io/substrait. However, that would mean regenerating code every time a protobuf change is made, which would make PRs for specification changes much noisier. In the interest of making spec changes as easy and clear as possible, it is the opinion of the author that a separate repo is worth it even though it will require additional automation work.