Commit 52d50fa

feat: Somewhat Useful Exceptions for Unsupported Objects (#7)
* Support for Null Objects
* Raise Exceptions on Unsupported Objects
* Formatting
* Support For Glue Type Override
* Type Override Tests
* Typing & Documentation, Fix Checks
* Attempt to Fix CI Caching Issue
* Stop Running CI On Fork Branch
1 parent 3bde935 commit 52d50fa

4 files changed

+158
-5
lines changed

.github/workflows/ci.yml

+1
@@ -23,6 +23,7 @@ jobs:
     steps:
       - uses: actions/checkout@v4
       - name: Set up Python ${{ matrix.python-version }}
+        id: setup-python
         uses: actions/setup-python@v5
         with:
           python-version: ${{ matrix.python-version }}

README.md

+57-1
@@ -17,7 +17,7 @@ When using `AWS Kinesis Firehose` in a configuration that receives JSONs and wri
 one needs to define a `AWS Glue` table so Firehose knows what schema to use when creating the parquet files.

 AWS Glue lets you define a schema using `Avro` or `JSON Schema` and then to create a table from that schema,
-but as of *May 2022`
+but as of *May 2022*
 there are limitations on AWS that tables that are created that way can't be used with Kinesis Firehose.

 <https://stackoverflow.com/questions/68125501/invalid-schema-error-in-aws-glue-created-via-terraform>
@@ -100,6 +100,62 @@ Alternatively you can run CLI with `-o` flag to set output file location:
 pydantic-glue -f example.py -c Foo -o example.json -l
 ```

+## Override the type for the AWS Glue Schema
+
+Wherever there is a `type` key in the input JSON Schema, an additional key `glue_type` may be
+defined to override the type that is used in the AWS Glue Schema. This is, for example, useful for
+a pydantic model that has a field of type `int` that is unix epoch time, while the column type you
+would like in Glue is of type `timestamp`.
+
+Additional JSON Schema keys to a pydantic model can be added by using the
+[`Field` function](https://docs.pydantic.dev/latest/api/fields/#pydantic.fields.Field)
+with the argument `json_schema_extra` like so:
+
+```python
+from pydantic import BaseModel, Field
+
+class A(BaseModel):
+    epoch_time: int = Field(
+        ...,
+        json_schema_extra={
+            "glue_type": "timestamp",
+        },
+    )
+```
+
+The resulting JSON Schema will be:
+
+```json
+{
+  "properties": {
+    "epoch_time": {
+      "glue_type": "timestamp",
+      "title": "Epoch Time",
+      "type": "integer"
+    }
+  },
+  "required": [
+    "epoch_time"
+  ],
+  "title": "A",
+  "type": "object"
+}
+```
+
+And the result after processing with pydantic-glue:
+
+```json
+{
+  "//": "Generated by pydantic-glue at 2022-05-25 12:35:55.333570. DO NOT EDIT",
+  "columns": {
+    "epoch_time": "timestamp",
+  }
+}
+```
+
+Recursing through object properties terminates when you supply a `glue_type` to use. If the type is
+complex, you must supply the full complex type yourself.
+
 ## How it works?

 * `pydantic` gets converted to JSON Schema
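
Note on the README paragraph about recursion: a minimal sketch of why a complex `glue_type` has to be spelled out in full. It runs on a hand-written JSON Schema dict rather than a pydantic model. `to_glue` and `PRIMITIVES` are hypothetical stand-ins written for this note, not pydantic-glue functions; the only behaviour taken from the diff is the early return when `glue_type` is present.

```python
from typing import Any

# Hypothetical primitive mapping, for illustration only.
PRIMITIVES = {"integer": "int", "string": "string", "boolean": "boolean"}


def to_glue(schema: dict[str, Any]) -> str:
    """Simplified stand-in for the converter's dispatch (not the library code)."""
    override = schema.get("glue_type")
    if override is not None:
        # Short-circuit: nothing below this node is inspected.
        return str(override)
    if schema["type"] == "object":
        fields = [f"{k}:{to_glue(v)}" for k, v in schema["properties"].items()]
        return f"struct<{','.join(fields)}>"
    return PRIMITIVES[schema["type"]]


nested = {
    "type": "object",
    "properties": {
        # Scalar override: integer in the schema, timestamp in Glue.
        "created": {"type": "integer", "glue_type": "timestamp"},
        # Complex override: the whole struct is written out by hand.
        "window": {
            "type": "object",
            "glue_type": "struct<start:timestamp,end:timestamp>",
            "properties": {
                "start": {"type": "integer"},
                "end": {"type": "integer"},
            },
        },
    },
}

print(to_glue(nested))
# struct<created:timestamp,window:struct<start:timestamp,end:timestamp>>
```

The `window` node still carries `properties`, but they are never visited, so whatever string the override holds is emitted verbatim.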

pydantic_glue/handler.py

+17-2
@@ -4,14 +4,18 @@


 def dispatch(v: dict[str, Any]) -> str:
+
+    glue_type = v.get("glue_type", None)
+
+    if glue_type is not None:
+        return str(glue_type)
+
     if "anyOf" in v:
         return handle_union(v)

     t = v["type"]

     if t == "object":
-        if "additionalProperties" in v:
-            return handle_map(v)
         return handle_object(v)

     if t == "array":
@@ -55,6 +59,17 @@ def map_dispatch(o: dict[str, Any]) -> list[tuple[str, str]]:


 def handle_object(o: dict[str, Any]) -> str:
+    if "additionalProperties" in o:
+        if o["additionalProperties"] is True:
+            raise Exception("Glue Cannot Support a Map Without Types")
+        elif o["additionalProperties"]:
+            if "properties" in o:
+                raise NotImplementedError("Merging types of properties and additionalProperties")
+            return handle_map(o)
+
+    if "properties" not in o:
+        raise Exception("Object without properties or additionalProperties can't be represented")
+
     res = [f"{k}:{v}" for (k, v) in map_dispatch(o)]
     return f"struct<{','.join(res)}>"
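
The branch order added to `handle_object` can be summarised as a small decision function. The sketch below is a restatement under assumptions, not the library code: `classify_object` and `outcome` are hypothetical names, and the function returns a label instead of building the Glue type string.

```python
from typing import Any


def classify_object(o: dict[str, Any]) -> str:
    """Return "map" or "struct", following the branch order in the diff."""
    extra = o.get("additionalProperties")
    if extra is True:
        # Values of unknown type cannot be expressed as a Glue map.
        raise Exception("Glue Cannot Support a Map Without Types")
    if extra:
        if "properties" in o:
            raise NotImplementedError("Merging types of properties and additionalProperties")
        return "map"
    # additionalProperties is absent, False, or an empty schema: fall through.
    if "properties" not in o:
        raise Exception("Object without properties or additionalProperties can't be represented")
    return "struct"


def outcome(o: dict[str, Any]) -> str:
    """Hypothetical helper: report the label or the exception class name."""
    try:
        return classify_object(o)
    except Exception as exc:
        return type(exc).__name__


cases = {
    "typed map": {"type": "object", "additionalProperties": {"type": "string"}},
    "plain struct": {"type": "object", "properties": {"a": {"type": "integer"}}},
    "untyped map": {"type": "object", "additionalProperties": True},
    "bare object": {"type": "object"},
    "mixed": {
        "type": "object",
        "properties": {"a": {"type": "integer"}},
        "additionalProperties": {"type": "string"},
    },
}

for name, schema in cases.items():
    print(f"{name}: {outcome(schema)}")
```

One consequence worth noting: `additionalProperties: false` (or an empty schema `{}`) is falsy, so such an object is treated as a struct when it has `properties` and rejected when it does not.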

tests/unit/test_convert.py

+83-2
@@ -2,7 +2,8 @@
 import json
 from typing import Optional, Union

-from pydantic import BaseModel
+import pytest
+from pydantic import BaseModel, Field, field_serializer
 from pydantic_glue import convert


@@ -34,6 +35,31 @@ class A(BaseModel):
     assert convert(json.dumps(A.model_json_schema())) == expected


+def test_single_dict_as_string_column():
+    class A(BaseModel):
+        as_string: dict
+
+        @field_serializer("as_string")
+        def dump_json(self, value: dict) -> str:
+            return json.dumps(value)
+
+    expected = [("as_string", "string")]
+    assert convert(json.dumps(A.model_json_schema(mode="serialization"))) == expected
+
+
+def test_single_type_override_column():
+    class A(BaseModel):
+        special: int = Field(
+            ...,
+            json_schema_extra={
+                "glue_type": "gluetype",
+            },
+        )
+
+    expected = [("special", "gluetype")]
+    assert convert(json.dumps(A.model_json_schema())) == expected
+
+
 def test_single_date_column():
     class A(BaseModel):
         modifiedOn: datetime.date
@@ -155,7 +181,62 @@ class A(BaseModel):
         ("other", "string"),
     ]

-    assert convert(json.dumps(A.model_json_schema())) == expected
+    assert (
+        convert(
+            json.dumps(A.model_json_schema()),
+        )
+        == expected
+    )
+
+
+def test_custom_type():
+    class A(BaseModel):
+        unixtime: int = Field(
+            ...,
+            json_schema_extra={
+                "glue_type": "timestamp",
+            },
+        )
+        optional_unixtime: Optional[int] = Field(
+            ...,
+            json_schema_extra={
+                "glue_type": "timestamp",
+            },
+        )
+        clobber_union_unixtime: Optional[Union[int, str]] = Field(
+            ...,
+            json_schema_extra={
+                "glue_type": "timestamp",
+            },
+        )
+        correct_union_unixtime: Optional[Union[int, str]] = Field(
+            ...,
+            json_schema_extra={
+                "glue_type": "union<timestamp,string>",
+            },
+        )
+
+    expected = [
+        ("unixtime", "timestamp"),
+        ("optional_unixtime", "timestamp"),
+        ("clobber_union_unixtime", "timestamp"),
+        ("correct_union_unixtime", "union<timestamp,string>"),
+    ]
+
+    assert (
+        convert(
+            json.dumps(A.model_json_schema(mode="serialization")),
+        )
+        == expected
+    )
+
+
+def test_invalid_object_raises():
+    class A(BaseModel):
+        map_serialized_as_object: dict
+
+    with pytest.raises(Exception):
+        convert(json.dumps(A.model_json_schema()))


 def test_union_of_string_and_int():
