feat: python native JSON format parser #53
ACTION NEEDED: Substrait follows the Conventional Commits specification. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.
You can also use the Protobuf json_format module to load/save these files directly: https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html
That's great, I see it also deals with the keyname transposition.
Co-authored-by: Matthijs Brobbel <[email protected]>
It should handle binary content correctly, as it already knows the final types of the data. That's probably why it doesn't matter that length and offset are strings. I doubt it handles the comments I put at the top of those files, though. That's nonstandard, but I found it useful while developing to understand what the plans were doing.
Ok, I'll try to switch the implementation to use |
Moved to using |
@@ -0,0 +1,25 @@
from google.protobuf import json_format |
There are three Google-provided ways of loading/saving protocol buffers:
binary
JSON format
text format
It's worth considering how you want to handle the four different formats (the three above plus the Substrait text format). The C++ package implements methods to read/write any of them.
def _strip_json_comments(jsonfile):
    # The JSON files in the cpp testsuite are prefixed with
    # a comment containing the SQL that matches the json plan.
    # As Python JSON parser doesn't support comments,
Technically, none of the parsers support the comments, including the json_format library.
Well, there are parsers that support comments, usually in the C/C++ style, but yes, in general the specification avoided comments to make sure they couldn't be abused to implement parser directives.
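The stripping step discussed in this thread could be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code from the PR: the helper name `strip_leading_comments`, the `#` comment marker, and the sample plan text are all assumptions made for the example.

```python
import io
import json


def strip_leading_comments(text, marker="#"):
    """Return `text` without the lines whose first non-blank character is `marker`.

    Hypothetical helper: the "#" marker is an assumption about how the
    test files are commented, not something this thread confirms.
    """
    kept = io.StringIO()
    for line in text.splitlines(keepends=True):
        if line.lstrip().startswith(marker):
            continue  # whole-line comment, drop it
        kept.write(line)
    return kept.getvalue()


# Made-up sample: a SQL comment followed by a tiny JSON document.
sample = '# select a from t\n{"relations": []}\n'
plan_dict = json.loads(strip_leading_comments(sample))
print(plan_dict)
```

Because no JSON token can start with `#`, dropping whole lines that begin with the marker should not remove valid JSON content. The cleaned text can then be handed to whichever parser is in use.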
Is this ok to merge? I'd like to use this for further work I'm planning to contribute to Substrait, so it would be helpful to have it published.
Not a blocker, but adding the |
I could have copied the data, but given that we are already planning to add it as a dependency for textplan, it made sense to avoid a copy and use the test files directly from the submodule.
That's reasonable. I prefer |
Add substrait.json.load_json and substrait.json.parse_json functions able to load the JSON representation of a Substrait plan to a substrait.proto.Plan object.
It also adds the substrait-cpp repository as a git submodule to reuse the test files.
This is reasonable because we might end up using the cpp library in the future to create bindings to other features too, so it's helpful to already have it as a submodule.