[Issue #185] Support mapping-based transformations #193
Conversation
Adds methods to serialize and deserialize schemas using a mapping config
Switches to using `Self` so that IntelliSense type hints display the correct model
```yaml
push:
  paths:
    - 'lib/python-sdk/**'
    - '.github/workflows/ci-python-sdk.yml'
pull_request:
  paths:
    - 'lib/python-sdk/**'
    - '.github/workflows/ci-python-sdk.yml'
```
Removes the `push` trigger because it was causing duplicate runs of the CI checks on each PR. Not a big deal, but HHS admins recently asked us to try to reduce GitHub Actions usage wherever we can.
```json
{
  "python.analysis.typeCheckingMode": "basic"
}
```
Enables Python type checking when using VSCode
```diff
 @classmethod
-def from_json(cls, json_str: str) -> "CommonGrantsBaseModel":
+def from_json(cls, json_str: str) -> Self:
```
This improves the intellisense output, so that the class method displays the name of the sub-class being instantiated (if it inherits from this base model) instead of always displaying "CommonGrantsBaseModel".
```python
DEFAULT_HANDLERS: dict[str, handle_func] = {
    "field": handle_field,
    "const": handle_const,
    "match": handle_match,
}
```
Right now we just define a base set of reserved words and their transformation functions, but this pattern makes it really easy to extend the mapping with additional keywords.
```python
mapping: dict,
depth: int = 0,
max_depth: int = 500,
handlers: dict[str, handle_func] = DEFAULT_HANDLERS,
```
Allows library users to provide their own custom handlers if they'd like to extend the base set.
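A hypothetical sketch of that extension point. The `handle_func` alias and the `field`/`const` handlers mirror the diff above, but their bodies and the custom `concat` handler are invented for illustration:

```python
from typing import Any, Callable

# A handler receives the full source data plus the current mapping node.
handle_func = Callable[[dict, dict], Any]


def handle_const(data: dict, node: dict) -> Any:
    # Emit the literal value stored under "const".
    return node["const"]


def handle_field(data: dict, node: dict) -> Any:
    # Look up a dotted path like "opportunity.title" in the source data.
    value: Any = data
    for part in node["field"].split("."):
        value = value[part]
    return value


DEFAULT_HANDLERS: dict[str, handle_func] = {
    "field": handle_field,
    "const": handle_const,
}


def handle_concat(data: dict, node: dict) -> str:
    # Custom handler (not in the PR): join several "field" lookups.
    return " ".join(handle_field(data, {"field": f}) for f in node["concat"])


# Library users can merge their own keywords into the defaults:
custom_handlers = {**DEFAULT_HANDLERS, "concat": handle_concat}
```

Passing `custom_handlers` as the `handlers` argument would then make `concat` a recognized reserved word alongside the built-in ones.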
Overall LGTM. I left a small number of suggestions and questions.
Consider adding a README to explain usage of `transform_from_mapping()`.
```python
    Returns:
        A new dictionary containing the transformed data according to the mapping
    """
    if depth > max_depth:
```
What is the purpose of enforcing a max depth? I agree 500 is an absurd depth, but curious what the intent is. Fear of runaway recursion?
I added a note explaining max depth here: docs(py-sdk): Add note explaining max_depth
It's mainly a check to avoid exceeding Python's default recursion limit (1000), since this transformation function might be running on third-party (untrusted) mapping inputs.
In a future iteration I might consider refactoring this so we're using a loop instead of recursion and stress testing how depth is incremented.
```python
for k in node:
    if k in handlers:
        return handlers[k](data, node)
return {k: transform_node(v, depth + 1) for k, v in node.items()}
```
There's a lot going on in these 4 lines! Maybe add a brief comment to explain what's happening.
Good call, I added some comments and example to show what happens at each step in this commit: refactor(py-sdk): Adds docs to transformation functions
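An annotated sketch of the dispatch logic in those four lines; the parameter list and handler signature are assumptions based on the surrounding diff:

```python
def transform_node(node, data: dict, handlers: dict, depth: int = 0):
    if isinstance(node, dict):
        # Reserved-word check: if any key of this node is a handler keyword
        # ("field", "const", "match"), hand the entire node to that handler
        # and stop descending.
        for k in node:
            if k in handlers:
                return handlers[k](data, node)
        # Plain mapping: rebuild it key by key, transforming each value
        # one level deeper.
        return {k: transform_node(v, data, handlers, depth + 1) for k, v in node.items()}
    # Non-dict leaves pass through unchanged.
    return node


# Toy handlers standing in for handle_field / handle_const:
toy_handlers = {
    "field": lambda data, node: data[node["field"]],
    "const": lambda data, node: node["const"],
}
```

For example, `{"name": {"field": "title"}}` recurses into the outer dict, then dispatches the inner node to the `field` handler.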
- Uses more descriptive names for each handler function
- Changes how the known keys for handler functions are fetched
- Adds an example to the docstrings
- Adds more code comments to the recursive `transform_node()` step
The transformation function already handles literal values, so an extra reserved word just adds confusion and makes the mapping more verbose.
Looks good. Nice work!
Summary
Adds support for mapping-based transformations to the Python SDK.
Changes proposed
- Adds a `transform_from_mapping()` function that transforms arbitrary JSON from one format to another using a mapping format that serves as a kind of lightweight DSL.
- Adds `validate_with_mapping()` to the `CommonGrantsBaseModel` to deserialize a dictionary after transforming it with a mapping.
- Adds `dump_with_mapping()` to `CommonGrantsBaseModel` to serialize and transform a model to a Python dictionary.
- Removes the `push` trigger from the `ci-python-sdk.yaml` to avoid duplicate runs of the CI checks on each PR.

Context for reviewers
`python-sdk`: `cd lib/python-sdk/`
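The two new model methods listed under "Changes proposed" can be sketched end to end. This is a hypothetical, dependency-free stand-in (the real SDK model and `transform_from_mapping()` signature may differ); it shows the intended round trip of transform-then-validate and its reverse:

```python
def transform_from_mapping(data: dict, mapping: dict) -> dict:
    # Minimal stand-in for the real function: "field" copies a top-level
    # key from the source, "const" emits a literal value.
    out = {}
    for key, node in mapping.items():
        if "field" in node:
            out[key] = data[node["field"]]
        elif "const" in node:
            out[key] = node["const"]
    return out


class CommonGrantsBaseModel:
    def __init__(self, **data):
        self.__dict__.update(data)

    @classmethod
    def validate_with_mapping(cls, data: dict, mapping: dict):
        # Transform the raw dictionary first, then deserialize the result.
        return cls(**transform_from_mapping(data, mapping))

    def dump_with_mapping(self, mapping: dict) -> dict:
        # Serialize the model to a dict, then transform that dict.
        return transform_from_mapping(self.__dict__, mapping)
```

For instance, `Opportunity.validate_with_mapping(raw, mapping)` would map a third-party payload into the model's shape before validation, and `dump_with_mapping()` would do the opposite on the way out.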
Additional information