-
Notifications
You must be signed in to change notification settings - Fork 9
feat: python native JSON format parser #53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
cae8153
dfe234f
7154e69
e6fb60a
d77f615
904db71
fe7d6c1
00e9b4b
a813599
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -32,4 +32,4 @@ jobs: | |
python -m pip install ".[test]" | ||
- name: Run tests | ||
run: | | ||
python -m pytest | ||
python -m pytest tests | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,6 @@ | ||
[submodule "third_party/substrait"] | ||
path = third_party/substrait | ||
url = https://github.com/substrait-io/substrait | ||
[submodule "third_party/substrait-cpp"] | ||
path = third_party/substrait-cpp | ||
url = https://github.com/substrait-io/substrait-cpp |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
from google.protobuf import json_format | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are three Google ways of loading/saving protobuffers: binary It's worth considering how you want to handle the four different types (including the Substrait text format). The C++ package implements methods to read/write any. |
||
|
||
from substrait.proto import Plan | ||
|
||
|
||
def load_json(filename): | ||
"""Load a Substrait Plan from a json file""" | ||
with open(filename, encoding="utf-8") as f: | ||
return parse_json(f.read()) | ||
|
||
|
||
def parse_json(text): | ||
"""Generate a Substrait Plan from its JSON definition""" | ||
return json_format.Parse(text=text, message=Plan()) | ||
|
||
|
||
def write_json(plan, filename): | ||
"""Write a Substrait Plan to a json file""" | ||
with open(filename, "w+") as f: | ||
f.write(dump_json(plan)) | ||
|
||
|
||
def dump_json(plan): | ||
"""Dump a Substrait Plan to a string in JSON format""" | ||
return json_format.MessageToJson(plan) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
import os | ||
import pathlib | ||
import tempfile | ||
import json | ||
|
||
from substrait.proto import Plan | ||
from substrait.json import load_json, parse_json, dump_json, write_json | ||
|
||
import pytest | ||
|
||
|
||
JSON_FIXTURES = ( | ||
pathlib.Path(os.path.dirname(__file__)) | ||
/ ".." | ||
EpsilonPrime marked this conversation as resolved.
Show resolved
Hide resolved
|
||
/ "third_party" | ||
/ "substrait-cpp" | ||
/ "src" | ||
/ "substrait" | ||
/ "textplan" | ||
/ "data" | ||
) | ||
JSON_TEST_FILE = sorted(JSON_FIXTURES.glob("*.json")) | ||
JSON_TEST_FILENAMES = [path.name for path in JSON_TEST_FILE] | ||
EpsilonPrime marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
|
||
@pytest.mark.parametrize("jsonfile", JSON_TEST_FILE, ids=JSON_TEST_FILENAMES) | ||
def test_json_load(jsonfile): | ||
with open(jsonfile) as f: | ||
jsondata = _strip_json_comments(f) | ||
parsed_plan = parse_json(jsondata) | ||
|
||
# Save to a temporary file so we can test load_json | ||
# on content stripped of comments. | ||
with tempfile.TemporaryDirectory() as tmpdir: | ||
# We use a TemporaryDirectory as on Windows NamedTemporaryFile | ||
# doesn't allow for easy reopening of the file. | ||
with open(pathlib.Path(tmpdir) / "jsonfile.json", "w+") as stripped_file: | ||
stripped_file.write(jsondata) | ||
loaded_plan = load_json(stripped_file.name) | ||
|
||
# The Plan constructor itself will throw an exception | ||
# in case there is anything wrong in parsing the JSON | ||
# so we can take for granted that if the plan was created | ||
# it is a valid plan in terms of protobuf definition. | ||
assert type(loaded_plan) is Plan | ||
|
||
# Ensure that when loading from file or from string | ||
# the outcome is the same | ||
assert parsed_plan == loaded_plan | ||
|
||
|
||
@pytest.mark.parametrize("jsonfile", JSON_TEST_FILE, ids=JSON_TEST_FILENAMES) | ||
def test_json_roundtrip(jsonfile): | ||
with open(jsonfile) as f: | ||
jsondata = _strip_json_comments(f) | ||
|
||
parsed_plan = parse_json(jsondata) | ||
assert parse_json(dump_json(parsed_plan)) == parsed_plan | ||
|
||
# Test with write/load | ||
with tempfile.TemporaryDirectory() as tmpdir: | ||
filename = pathlib.Path(tmpdir) / "jsonfile.json" | ||
write_json(parsed_plan, filename) | ||
assert load_json(filename) == parsed_plan | ||
|
||
|
||
def _strip_json_comments(jsonfile): | ||
# The JSON files in the cpp testsuite are prefixed with | ||
# a comment containing the SQL that matches the json plan. | ||
# As Python JSON parser doesn't support comments, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Technically none of the parsers support the comments including the json_format library. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, there are parsers that support comments, usually in the C/C++ style, but yes in general the specification had avoided comments to make sure they couldn't be abused to implement parser directives. |
||
# we have to strip them to make the content readable | ||
return "\n".join(l for l in jsonfile.readlines() if l[0] != "#") |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
#!/bin/bash | ||
|
||
echo "Updating substrait-cpp submodule..." | ||
git submodule update --remote third_party/substrait-cpp | ||
|
Uh oh!
There was an error while loading. Please reload this page.