Description
A protobuf-encoded Substrait plan contains function and type anchors, which are used to reference functions and types within a plan. Using these anchors improves plan readability by avoiding the need to inline functions and types everywhere in the plan. They may also help reduce plan size if a function is used multiple times in a plan. These are good optimizations for the serialization format.
In substrait-go, the ScalarFunctions, AggregateFunctions, and WindowFunctions structs each contain both an anchor and a reference to the underlying function variant, which effectively capture the same information (i.e., what specific function is this?).
The UserDefinedType struct does not contain an inlined type definition, so it is only meaningful when paired with a struct implementing the Set interface, which maps anchors to types. As a result, when working with or processing types in substrait-go we must always have access to a Set. This is easily overlooked. For example, the underlying issue in #147 was that the ResolveType method neither had access to nor used a Set to register user-defined types when handling functions that returned user-defined types.
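To make the dependency concrete, here is a minimal sketch of why an anchor-only type cannot be interpreted on its own. The names `anchoredType`, `typeID`, `typeRegistry`, and `describe` are hypothetical stand-ins invented for illustration; they are not the substrait-go API, and the URI and type name are made up.

```go
package main

import "fmt"

// anchoredType is a hypothetical stand-in for an anchor-only user-defined
// type: it carries a numeric anchor but no name or extension URI.
type anchoredType struct {
	Anchor uint32
}

// typeID is a hypothetical fully qualified type identity.
type typeID struct {
	URI  string
	Name string
}

// typeRegistry is a hypothetical stand-in for the anchor-to-type mapping.
type typeRegistry map[uint32]typeID

// describe resolves an anchored type through the registry. It fails when the
// anchor was never registered, which is the class of failure described above.
func describe(t anchoredType, reg typeRegistry) (string, error) {
	id, ok := reg[t.Anchor]
	if !ok {
		return "", fmt.Errorf("type anchor %d is not registered", t.Anchor)
	}
	return id.URI + "#" + id.Name, nil
}

func main() {
	reg := typeRegistry{1: {URI: "extension:example:types", Name: "point"}}

	for _, t := range []anchoredType{{Anchor: 1}, {Anchor: 2}} {
		name, err := describe(t, reg)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("resolved:", name)
	}
}
```

Every function that touches such a type has to accept the registry as well, and any code path that forgets to register a type only fails later, at lookup time.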
One way to address this would be to fully inline a type definition into the UserDefinedType struct, like we do for functions. We should do this, but then we should go further. A question to ask ourselves here is: why do we even need the anchors at the library level? The anchor is redundant information except when we are serializing and deserializing. We are carrying this extra bit of information through the whole system even though it's only needed for serde, and periodically we shoot ourselves in the foot because we fail to update it.
What we could do instead is fully remove the anchors from the substrait-go domain model and make them purely a deserialization and serialization concern. Effectively, when converting a protobuf plan to its Go representation, we use the anchors in the protobuf plan to look up the associated functions and user-defined types and inline full definitions into the Go plan. When converting from a Go plan to a protobuf plan, we can generate anchors dynamically as we process functions and user-defined types. This is effectively the strategy used by substrait-java in its ProtoPlanConverter and PlanProtoConverter.
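A rough sketch of that split is shown here, under heavy assumptions. All of the types (`wireDecl`, `wireCall`, `wirePlan`, `funcID`, `plan`) and both functions (`fromWire`, `toWire`) are hypothetical and greatly simplified; they mirror neither the real protobuf messages nor the substrait-go or substrait-java APIs. The sketch shows only functions, but user-defined types could be handled the same way.

```go
package main

import "fmt"

// Hypothetical wire-level shapes: calls reference functions by anchor, and
// declarations map each anchor to a full function identity.
type wireDecl struct {
	Anchor    uint32
	URI, Name string
}
type wireCall struct{ Anchor uint32 }
type wirePlan struct {
	Decls []wireDecl
	Calls []wireCall
}

// Hypothetical domain-level shapes: each call carries its full identity and
// there is no anchor field at all.
type funcID struct{ URI, Name string }
type plan struct{ Calls []funcID }

// fromWire resolves every anchor and inlines the full definition.
func fromWire(w wirePlan) (plan, error) {
	byAnchor := make(map[uint32]funcID, len(w.Decls))
	for _, d := range w.Decls {
		byAnchor[d.Anchor] = funcID{URI: d.URI, Name: d.Name}
	}
	var p plan
	for _, c := range w.Calls {
		id, ok := byAnchor[c.Anchor]
		if !ok {
			return plan{}, fmt.Errorf("unknown function anchor %d", c.Anchor)
		}
		p.Calls = append(p.Calls, id)
	}
	return p, nil
}

// toWire assigns an anchor the first time a function is seen and reuses it
// for every later call to the same function.
func toWire(p plan) wirePlan {
	anchors := make(map[funcID]uint32)
	var w wirePlan
	for _, id := range p.Calls {
		a, seen := anchors[id]
		if !seen {
			a = uint32(len(anchors)) + 1
			anchors[id] = a
			w.Decls = append(w.Decls, wireDecl{Anchor: a, URI: id.URI, Name: id.Name})
		}
		w.Calls = append(w.Calls, wireCall{Anchor: a})
	}
	return w
}

func main() {
	// The URI and function names are made up for illustration.
	add := funcID{URI: "extension:example:arithmetic", Name: "add"}
	lt := funcID{URI: "extension:example:comparison", Name: "lt"}

	w := toWire(plan{Calls: []funcID{add, lt, add}})
	fmt.Println("declarations:", len(w.Decls), "calls:", len(w.Calls))

	back, err := fromWire(w)
	fmt.Println("round trip preserved calls:", err == nil && back.Calls[2] == add)
}
```

In this sketch the anchor map lives only inside `fromWire` and `toWire`, so nothing in the domain model can hold a stale or unregistered anchor, and repeated functions still share a single declaration on the wire.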
This approach allows us to isolate the usage of anchors to the serde domain, and keep the pure Go API simpler.