
feat: python native JSON format parser #53


Merged: 9 commits merged into substrait-io:main on Apr 17, 2024


@amol- (Contributor) commented Apr 10, 2024

Add substrait.json.load_json and substrait.json.parse_json functions that load
the JSON representation of a Substrait plan into a substrait.proto.Plan object.

It also adds the substrait-cpp repository as a git submodule to reuse its test files.
This is reasonable because we might end up using the C++ library in the future
to create bindings for other features too, so it's helpful to already have it as a submodule.

ACTION NEEDED

Substrait follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@EpsilonPrime (Member)

You can also use the Protobuf json_format module to load/save these files directly:

https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html
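A minimal sketch of what load/save helpers built on json_format could look like. The names parse_json and load_json come from the PR description, but their signatures here, the hypothetical dump_json counterpart, and passing the target message in (for example substrait.proto.Plan()) are assumptions, chosen so the sketch can be tried with any message type. google.protobuf.struct_pb2.Struct stands in for a Plan in the quick check at the end.

```python
from google.protobuf import json_format, struct_pb2
from google.protobuf.message import Message


def parse_json(text: str, message: Message) -> Message:
    """Merge a proto3 JSON document into `message` and return it (signature assumed)."""
    # Parse raises json_format.ParseError on unknown fields unless
    # ignore_unknown_fields=True is passed.
    return json_format.Parse(text, message)


def load_json(path: str, message: Message) -> Message:
    """Read a JSON file from disk into `message` (signature assumed)."""
    with open(path, encoding="utf-8") as f:
        return parse_json(f.read(), message)


def dump_json(message: Message) -> str:
    """Hypothetical save counterpart: serialize `message` to proto3 JSON."""
    return json_format.MessageToJson(message)


# Quick check with a well-known type standing in for substrait.proto.Plan.
example = parse_json('{"a": 1, "b": "x"}', struct_pb2.Struct())
round_trip = parse_json(dump_json(example), struct_pb2.Struct())
```

With the Substrait package installed, a call would look like load_json("plan.json", substrait.proto.Plan()), where the file path is only illustrative.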

@amol- (Contributor, Author) commented Apr 10, 2024

That's great. I see it also deals with the key-name transposition (preserving_proto_field_name). Do you know if it handles the base64 translation too?
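The key-name transposition works in both directions: MessageToDict and MessageToJson emit lowerCamelCase JSON names by default and the original .proto names when preserving_proto_field_name=True, while Parse and ParseDict accept either spelling. The sketch uses protobuf's own descriptor_pb2.FileDescriptorProto only because it ships with protobuf and has a snake_case field (message_type); it is a stand-in, not a Substrait message.

```python
from google.protobuf import descriptor_pb2, json_format

# Stand-in message with a snake_case field (message_type).
msg = descriptor_pb2.FileDescriptorProto(
    name="plan.proto",
    message_type=[descriptor_pb2.DescriptorProto(name="Plan")],
)

# Default output uses lowerCamelCase JSON names ...
camel = json_format.MessageToDict(msg)
# ... while preserving_proto_field_name=True keeps the .proto names.
snake = json_format.MessageToDict(msg, preserving_proto_field_name=True)

# The parser accepts either spelling and rebuilds the same message.
from_camel = json_format.ParseDict(camel, descriptor_pb2.FileDescriptorProto())
from_snake = json_format.ParseDict(snake, descriptor_pb2.FileDescriptorProto())
```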

Co-authored-by: Matthijs Brobbel
@amol- changed the title from "feat: Python native JSON format parser" to "feat: python native JSON format parser" on Apr 10, 2024
@EpsilonPrime (Member)

It should handle binary content correctly as it already knows the final types of the data. That's probably why it doesn't matter that length and offset are strings.
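The proto3 JSON mapping covers both points: bytes fields are written as base64 strings, and 64-bit integer fields are written as decimal strings (the parser also accepts plain numbers). Assuming the length and offset fields mentioned here are 64-bit integers, that would explain why they appear as strings. The sketch uses the BytesValue and UInt64Value wrapper types as stand-ins for the corresponding Substrait fields.

```python
from google.protobuf import json_format, wrappers_pb2

# bytes: the JSON value is base64 text, decoded back to raw bytes on parse.
blob = json_format.Parse('"aGVsbG8="', wrappers_pb2.BytesValue())

# uint64: the JSON value may be a decimal string or a plain number.
length_from_string = json_format.Parse('"42"', wrappers_pb2.UInt64Value())
length_from_number = json_format.Parse("42", wrappers_pb2.UInt64Value())

# Serializing reverses both conversions.
blob_json = json_format.MessageToJson(blob)  # '"aGVsbG8="'
length_json = json_format.MessageToJson(length_from_number)  # '"42"'
```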

I doubt it handles the comments I put at the top of those files though - that's nonstandard but I found it useful while developing to understand what the plans were doing.
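Since Python's json module rejects comments, one option is to strip them before parsing. The exact comment syntax used in the substrait-cpp files isn't shown in this thread, so this sketch assumes full-line comments starting with # or //; the helper name strip_json_comment_lines is hypothetical and is not the PR's _strip_json_comments. Dropping whole lines is safe because a JSON string cannot contain a raw newline, so no line of valid JSON can start with either marker.

```python
import json


def strip_json_comment_lines(text: str) -> str:
    """Drop lines whose first non-blank characters are '#' or '//' (assumed syntax)."""
    kept = [
        line
        for line in text.splitlines()
        if not line.lstrip().startswith(("#", "//"))
    ]
    return "\n".join(kept)


# A plan file prefixed with a SQL comment, as described in the thread.
sample = """# select * from lineitem
// another comment line
{"relations": []}
"""
parsed = json.loads(strip_json_comment_lines(sample))
```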

@amol- (Contributor, Author) commented Apr 10, 2024

OK, I'll try switching the implementation to google.protobuf.json_format.Parse and see if anything breaks.

@amol- (Contributor, Author) commented Apr 10, 2024

Moved to using json_format.Parse; all tests pass as before.

@amol- marked this pull request as ready for review on April 10, 2024 16:28
@@ -0,0 +1,25 @@
from google.protobuf import json_format
Member

There are three Google-provided ways of loading/saving protobufs:

binary
JSON format
text format

It's worth considering how you want to handle all four formats (including the Substrait text format). The C++ package implements methods to read and write any of them.
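A sketch of a loader that dispatches over protobuf's three standard encodings. The function name load_message and the format labels are hypothetical; the Substrait text plan format is deliberately left unsupported because it is not a protobuf encoding and would need a separate parser. wrappers_pb2.StringValue stands in for substrait.proto.Plan in the quick check.

```python
from google.protobuf import json_format, text_format, wrappers_pb2
from google.protobuf.message import Message


def load_message(data: bytes, fmt: str, message: Message) -> Message:
    """Populate `message` from binary, JSON, or text-format protobuf data."""
    if fmt == "binary":
        # Wire format; ParseFromString clears the message before parsing.
        message.ParseFromString(data)
    elif fmt == "json":
        json_format.Parse(data.decode("utf-8"), message)
    elif fmt == "text":
        text_format.Parse(data.decode("utf-8"), message)
    else:
        # e.g. the Substrait text plan format, which needs its own parser.
        raise ValueError(f"unsupported format: {fmt!r}")
    return message


# Quick check: encode one message three ways and load each back.
original = wrappers_pb2.StringValue(value="hi")
encodings = {
    "binary": original.SerializeToString(),
    "json": json_format.MessageToJson(original).encode("utf-8"),
    "text": text_format.MessageToString(original).encode("utf-8"),
}
loaded = {
    fmt: load_message(data, fmt, wrappers_pb2.StringValue())
    for fmt, data in encodings.items()
}
```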

def _strip_json_comments(jsonfile):
    # The JSON files in the cpp testsuite are prefixed with
    # a comment containing the SQL that matches the json plan.
    # As Python JSON parser doesn't support comments,
Member

Technically, none of the parsers support these comments, including the json_format library.

Contributor Author

Well, there are parsers that support comments, usually in the C/C++ style, but yes, in general the specification avoided comments to make sure they couldn't be abused to implement parser directives.
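For C/C++-style comments, a preprocessing pass has to track string literals so that sequences like // inside a URL are left alone. This is a minimal sketch of that approach (a swapped-in technique, not a comment-aware JSON parser); the helper name strip_c_style_comments is hypothetical. Block comments are replaced with a space so they cannot glue two tokens together.

```python
import json


def strip_c_style_comments(text: str) -> str:
    """Remove // and /* */ comments that appear outside JSON string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character too, so \" does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Line comment: skip up to, but not including, the newline.
            end = text.find("\n", i)
            i = n if end == -1 else end
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated block comment")
            out.append(" ")  # keep neighbouring tokens separated
            i = end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


sample = '{"url": "http://x/*y*/", /* note */ "n": 1} // trailing'
parsed = json.loads(strip_c_style_comments(sample))
```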

@amol- (Contributor, Author) commented Apr 17, 2024

Is this OK to merge? I'd like to use it for further work I'm planning to contribute to Substrait, so it would be helpful to have it published.

@gforsyth (Member)

Not a blocker, but adding the substrait-cpp repo as a submodule just for the json test files feels a little bloated.

@amol- (Contributor, Author) commented Apr 17, 2024

I could have copied the data, but given that we are already planning to add it as a dependency for textplan, it made sense to avoid a copy and use the test files directly from the submodule.

@gforsyth (Member)

That's reasonable. I prefer subtree to submodule but I think that ship has sailed.

@gforsyth merged commit 783e68a into substrait-io:main on Apr 17, 2024
13 checks passed
4 participants