Resolve protobuf namespacing conflicts #128

@vbarua

Description

In Go, it is possible to have a protocol buffer namespace conflict as all protocol buffer declarations in a binary are added to a global namespace.

This turns out to cause issues if you want to utilize substrait-go AND you also generate protobuf bindings directly from https://github.com/substrait-io/substrait for use in extensions, specifically if the versions don't line up. This is especially tricky in a multi-language environment, as the various libraries use different versions of the spec and updating all of them in lockstep is quite challenging.
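To make the failure mode concrete, here is a minimal sketch of the mechanism. It is a toy model, not the real protobuf-go registry API: the `registry` type, its `register` method, and the Go package paths are all hypothetical and exist only for illustration. It assumes that declarations are keyed by their .proto file path in one process-wide table, so a second Go package generated from the same file collides with the first.

```go
package main

import "fmt"

// registry is a simplified, hypothetical stand-in for the process-wide
// table that generated protobuf code registers itself with at init time.
type registry struct {
	files map[string]string // .proto file path -> Go package that registered it
}

func newRegistry() *registry {
	return &registry{files: map[string]string{}}
}

// register records that goPkg provides the declarations for protoPath.
// It returns an error if a different Go package already registered the
// same .proto file, which is the conflict described in this issue.
func (r *registry) register(protoPath, goPkg string) error {
	if owner, ok := r.files[protoPath]; ok && owner != goPkg {
		return fmt.Errorf("%s is already registered by %s; cannot register it again from %s",
			protoPath, owner, goPkg)
	}
	r.files[protoPath] = goPkg
	return nil
}

func main() {
	r := newRegistry()

	// Bindings vendored inside a library (illustrative package path).
	fmt.Println(r.register("substrait/plan.proto", "example.com/lib/vendored/proto"))

	// Bindings generated separately for an extension from the same .proto
	// file (illustrative package path). This second registration conflicts.
	fmt.Println(r.register("substrait/plan.proto", "example.com/extension/gen/proto"))
}
```

With a single shared package, every consumer imports the same generated code, so each .proto file is only ever registered once.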

The protobuf docs specifically indicate that vendoring protobufs, as we do in substrait-go, is a common cause of these issues and should be avoided. The docs state:

Users should avoid vendoring and instead depend on a centralized Go package for that .proto file.

Proposal

We should remove the vendored protobufs in substrait-go and replace them with a dedicated package. We can introduce https://github.com/substrait-io/substrait-protobuf as a dedicated repo for this generated code. To start with, it will only contain generated code for Go, but could potentially include other languages (e.g. Java, Rust, etc.) which the utility libraries can then depend on.

The idea with substrait-protobuf would be that once a week, we would regenerate the protobufs based on the most recently released tag of the spec. This is possible using buf with something like

buf generate https://github.com/substrait-io/substrait.git\#tag\=v0.64.0

To start with, this would be a manual process, but the intent would be to automate this.
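As a starting point for that automation, the command above could be assembled for any release tag with something like the following. This is only a sketch: `bufGenerateArgs` and `specRepo` are hypothetical names, and it only builds and prints the invocation rather than running buf. Passing the arguments directly to a process (for example via `os/exec`) would avoid the shell escaping of `#` and `=` seen in the command above.

```go
package main

import (
	"fmt"
	"strings"
)

// specRepo is the Git URL of the spec repository named in this issue.
const specRepo = "https://github.com/substrait-io/substrait.git"

// bufGenerateArgs is a hypothetical helper that returns the arguments for
// `buf generate` pinned to a released tag of the spec, mirroring the
// manual command shown above.
func bufGenerateArgs(tag string) []string {
	return []string{"generate", specRepo + "#tag=" + tag}
}

func main() {
	args := bufGenerateArgs("v0.64.0")
	// Print the command instead of executing it, since buf may not be installed.
	fmt.Println("buf " + strings.Join(args, " "))
}
```

A weekly job would then only need to look up the latest released tag and pass it in.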

Steps

Immediate

  1. Agree on substrait-protobuf as the repo for generated code, and review the structure of the generated Go code in https://github.com/substrait-io/substrait-protobuf/tree/main/substraitpb-go
  2. Update substrait-go to depend on the protos in substrait-protobuf
  3. Update the go_package of the protos themselves to point to substrait-protobuf

Longer Term

  • Automate the generation, release and tagging of new spec versions.

Alternatives

Generated Code in substrait

An alternative would be to host the generated code in https://github.com/substrait-io/substrait. However, that would mean regenerating the code every time a protobuf change is made, which would make PRs for specification changes much noisier. In the interest of making spec changes as easy and clear as possible, it is the opinion of the author that a separate repo is worth it, even though it will require additional automation work.
