Description
This project seems to upgrade to the latest version of Substrait on a fairly regular basis. This is nice for users who want the latest version, but it also makes it difficult to find and use the version of substrait-python that matches a specific version of Substrait.
For example, substrait-mlir uses Substrait v0.42.1 and substrait-python v0.12.1 for testing some end-to-end Python use cases. Now, we can't easily upgrade to pyarrow v16 or higher because those versions require substrait-python v0.15 or higher (due to how the generated proto files are packaged, AFAIU), which includes a different version of Substrait than what we use.
Note that Substrait regularly introduces breaking changes, in particular, to the text format, so it is not generally possible to use the protobuf definitions of a newer version.
To remedy the situation, I can imagine three things:
- Add a table to the README and/or PyPI that shows the correspondence between the version(s) of `substrait-python` and Substrait. This would make it easier to find the right version of `substrait-python`. In some situations, that may be all that is necessary. Currently, the only way I know how to do this is to install several versions of `substrait-python` and print `substrait.__substrait_version__`, which isn't ideal.
- We might want to investigate whether it is possible to package several versions of Substrait inside of `substrait-python`, for example, under `substrait.v0_74_0` and `substrait.v0_42_1`, possibly with the latest version living in `substrait` as well.
- Another possibility would be to separate the packaging of the generated proto files from the other functionality of this repository. `substrait-mlir` and, AFAIU, `pyarrow` only need the former. Such a package could either aim to be "released once" such that, for every version of Substrait, there would be a release of that package that would never need to be updated (ideally with the same versioning scheme as Substrait itself), or use the multi-version scheme from the previous point.
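To make the first two ideas more concrete, here is a rough sketch. It assumes only that the installed package exposes `substrait.__substrait_version__`, as mentioned above. The helper names (`substrait_version_of`, `render_version_table`, `versioned_module_name`) are hypothetical and not part of `substrait-python`, and the `substrait.vX_Y_Z` layout is just the naming proposed in this issue, not something that exists today. The single table row is the pairing from the substrait-mlir example above; I have not verified it independently.

```python
import importlib
from typing import Mapping, Optional


def substrait_version_of(module_name: str = "substrait") -> Optional[str]:
    """Return the Substrait version bundled with an installed package.

    Returns None if the package is not installed or does not expose
    a `__substrait_version__` attribute.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__substrait_version__", None)


def render_version_table(pairs: Mapping[str, str]) -> str:
    """Render a Markdown table mapping substrait-python to Substrait versions."""
    lines = ["| substrait-python | Substrait |", "| --- | --- |"]
    for python_version, substrait_version in pairs.items():
        lines.append(f"| {python_version} | {substrait_version} |")
    return "\n".join(lines)


def versioned_module_name(substrait_version: str) -> str:
    """Map a Substrait version to the (hypothetical) submodule name proposed above."""
    return "substrait.v" + substrait_version.lstrip("v").replace(".", "_")


if __name__ == "__main__":
    # The only pairing stated in this issue; further rows would have to be
    # collected by installing each release and calling substrait_version_of().
    print(render_version_table({"v0.12.1": "v0.42.1"}))
    print(versioned_module_name("v0.42.1"))
```

A release script could call `substrait_version_of()` once per release and append the result to the table, so the README never has to be updated by hand.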