[Issue #185] Support mapping-based transformations #193

Merged on May 27, 2025 (12 commits)
68 changes: 32 additions & 36 deletions .github/workflows/ci-python-sdk.yml
@@ -1,14 +1,10 @@
name: Python SDK CI

on:
push:
paths:
- 'lib/python-sdk/**'
- '.github/workflows/ci-python-sdk.yml'
pull_request:
paths:
- 'lib/python-sdk/**'
- '.github/workflows/ci-python-sdk.yml'
- "lib/python-sdk/**"
- ".github/workflows/ci-python-sdk.yml"

jobs:
build:
@@ -18,33 +14,33 @@ jobs:
working-directory: lib/python-sdk

steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Install Poetry
run: |
curl -sSL https://install.python-poetry.org | python3 -
- name: Configure Poetry
run: |
poetry config virtualenvs.create true
poetry config virtualenvs.in-project true
- name: Install dependencies
run: poetry install
- name: Run linting
run: poetry run ruff check .
- name: Run type checking
run: poetry run mypy .
- name: Run tests
run: poetry run pytest
- name: Build package
run: poetry build
- uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.11"

- name: Install Poetry
run: |
curl -sSL https://install.python-poetry.org | python3 -

- name: Configure Poetry
run: |
poetry config virtualenvs.create true
poetry config virtualenvs.in-project true

- name: Install dependencies
run: poetry install

- name: Run linting
run: poetry run ruff check .

- name: Run type checking
run: poetry run mypy .

- name: Run tests
run: poetry run pytest

- name: Build package
run: poetry build
3 changes: 3 additions & 0 deletions lib/python-sdk/.vscode/settings.json
@@ -0,0 +1,3 @@
{
"python.analysis.typeCheckingMode": "basic"
}
86 changes: 81 additions & 5 deletions lib/python-sdk/README.md
@@ -25,11 +25,14 @@ poetry add common-grants-python-sdk
from datetime import datetime, date, UTC
from uuid import uuid4

from common_grants.schemas.fields import Money, Event
from common_grants.schemas.models.opp_base import OpportunityBase
from common_grants.schemas.models.opp_funding import OppFunding
from common_grants.schemas.models.opp_status import OppStatus, OppStatusOptions
from common_grants.schemas.models.opp_timeline import OppTimeline
from common_grants_sdk.schemas.fields import Money, Event
from common_grants_sdk.schemas.models import (
OpportunityBase,
OppFunding,
OppStatus,
OppStatusOptions,
OppTimeline,
)

# Create a new opportunity
opportunity = OpportunityBase(
@@ -90,3 +93,76 @@ loaded_opportunity = OpportunityBase.from_json(json_data)
- `OppStatus`: Opportunity status tracking
- `OppTimeline`: Key dates and milestones

## Transformation utilities

The SDK includes a utility for transforming data according to a mapping specification.

The mapping supports both literal values and transformations keyed by the following reserved words:

- `field`: Extracts a value from the data using a dot-notation path
- `switch`: Performs a case-based lookup based on a field value

Here's an example of how to use the transformation utility to reshape arbitrary data:

```python
from common_grants_sdk.utils.transformation import transform_from_mapping

source_data = {
    "opportunity_id": 12345,
    "opportunity_title": "Research into ABC",
    "opportunity_status": "posted",
    "summary": {
        "award_ceiling": 100000,
        "award_floor": 10000,
        "forecasted_close_date": "2025-07-15",
        "forecasted_post_date": "2025-05-01",
    },
}

mapping = {
    "id": {"field": "opportunity_id"},
    "title": {"field": "opportunity_title"},
    "status": {
        "switch": {
            "field": "opportunity_status",
            "case": {
                "posted": "open",
                "closed": "closed",
            },
            "default": "custom",
        }
    },
    "funding": {
        "minAwardAmount": {
            "amount": {"field": "summary.award_floor"},
            "currency": "USD",
        },
        "maxAwardAmount": {
            "amount": {"field": "summary.award_ceiling"},
            "currency": "USD",
        },
    },
    "keyDates": {
        "appOpens": {"field": "summary.forecasted_post_date"},
        "appDeadline": {"field": "summary.forecasted_close_date"},
    },
}

transformed_data = transform_from_mapping(source_data, mapping)

# The `field` transformation copies the source value unchanged,
# so `id` is the original integer rather than a generated UUID.
assert transformed_data == {
    "id": 12345,
    "title": "Research into ABC",
    "status": "open",
    "funding": {
        "minAwardAmount": {"amount": 10000, "currency": "USD"},
        "maxAwardAmount": {"amount": 100000, "currency": "USD"},
    },
    "keyDates": {
        "appOpens": "2025-05-01",
        "appDeadline": "2025-07-15",
    },
}
```
20 changes: 17 additions & 3 deletions lib/python-sdk/common_grants_sdk/schemas/base.py
@@ -1,5 +1,9 @@
from pydantic import BaseModel, ConfigDict
import json
from typing import Self

from pydantic import BaseModel, ConfigDict

from common_grants_sdk.utils.transformation import transform_from_mapping


class CommonGrantsBaseModel(BaseModel):
@@ -18,13 +22,17 @@ def dump_json(self) -> str:
"""Convert model to JSON string (alias for model_dump_json for backward compatibility)."""
return self.model_dump_json()

def dump_with_mapping(self, mapping: dict) -> dict:
"""Convert model to dictionary with mapping."""
return transform_from_mapping(self.model_dump(mode="json"), mapping)

@classmethod
def from_json(cls, json_str: str) -> "CommonGrantsBaseModel":
def from_json(cls, json_str: str) -> Self:
"""Create model instance from JSON string (alias for model_validate_json for backward compatibility)."""
return cls.model_validate_json(json_str)

@classmethod
def from_dict(cls, data: dict) -> "CommonGrantsBaseModel":
def from_dict(cls, data: dict) -> Self:
"""Create model instance from dictionary (alias for model_validate for backward compatibility)."""
# If the data already contains datetime objects, use model_validate directly
try:
@@ -35,3 +43,9 @@ def from_dict(cls, data: dict) -> "CommonGrantsBaseModel":
data, default=str
) # Use str for non-serializable objects
return cls.model_validate_json(json_str)

@classmethod
def validate_with_mapping(cls, data: dict, mapping: dict) -> Self:
"""Validate model with mapping."""
new_data = transform_from_mapping(data, mapping)
return cls.model_validate(new_data)
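
The two helpers added in this file read naturally as an inbound/outbound pair: `validate_with_mapping` reshapes external data before validation, and `dump_with_mapping` reshapes the serialized model on the way out. A minimal stdlib-only sketch of that pattern follows. It is not the SDK code: the `Opportunity` dataclass stands in for a pydantic model (so nothing is actually validated), and `apply_field_mapping` is a hypothetical, simplified replacement for `transform_from_mapping` that handles only flat `field` lookups and literals.

```python
from dataclasses import asdict, dataclass


def apply_field_mapping(data: dict, mapping: dict) -> dict:
    """Hypothetical stand-in for transform_from_mapping (flat lookups and literals only)."""
    out = {}
    for key, spec in mapping.items():
        if isinstance(spec, dict) and "field" in spec:
            out[key] = data.get(spec["field"])
        else:
            out[key] = spec  # literal value
    return out


@dataclass
class Opportunity:
    """Stand-in for a CommonGrantsBaseModel subclass; a dataclass does no validation."""

    id: int
    title: str

    @classmethod
    def validate_with_mapping(cls, data: dict, mapping: dict) -> "Opportunity":
        # Reshape the external record first, then build the model from it.
        return cls(**apply_field_mapping(data, mapping))

    def dump_with_mapping(self, mapping: dict) -> dict:
        # Serialize the model first, then reshape the serialized form.
        return apply_field_mapping(asdict(self), mapping)


inbound = {
    "id": {"field": "opportunity_id"},
    "title": {"field": "opportunity_title"},
}
outbound = {
    "opportunity_id": {"field": "id"},
    "opportunity_title": {"field": "title"},
    "source": "sdk",
}

source = {"opportunity_id": 12345, "opportunity_title": "Research into ABC"}
opp = Opportunity.validate_with_mapping(source, inbound)
exported = opp.dump_with_mapping(outbound)
```

In the PR itself, `dump_with_mapping` calls `model_dump(mode="json")` before applying the mapping, so the mapping sees JSON-compatible values rather than Python objects.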
174 changes: 174 additions & 0 deletions lib/python-sdk/common_grants_sdk/utils/transformation.py
@@ -0,0 +1,174 @@
"""
This module provides a utility function for transforming data using a mapping.

The transform_from_mapping function takes a data dictionary and a mapping dictionary.
The mapping dictionary describes how to transform the data dictionary into a new dictionary.
"""

from typing import Any, Callable

handle_func = Callable[[dict, Any], Any]


def get_from_path(data: dict, path: str, default: Any = None) -> Any:
    """
    Gets a value from a dictionary using dot notation.

    Args:
        data: The dictionary to extract the value from
        path: A dot-separated string representing the path to the value
        default: The default value to return if the path doesn't exist

    Returns:
        The value at the specified path, or the default value if the path doesn't exist
    """
    parts = path.split(".")
    for part in parts:
        if isinstance(data, dict) and part in data:
            data = data[part]
        else:
            return default
    return data
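
A few edge cases of the dot-notation lookup are easy to miss when reading the function. The sketch below repeats the same logic so that it runs on its own; the `record` data and the variable names are made up for illustration.

```python
from typing import Any


def get_from_path(data: dict, path: str, default: Any = None) -> Any:
    """Same logic as the function in this diff, repeated so the sketch is self-contained."""
    for part in path.split("."):
        if isinstance(data, dict) and part in data:
            data = data[part]
        else:
            return default
    return data


record = {"summary": {"award_floor": 10000, "notes": None}, "tags": ["a", "b"]}

# A nested path resolves one key at a time.
floor = get_from_path(record, "summary.award_floor")

# A missing key anywhere along the path yields the default.
missing = get_from_path(record, "summary.award_ceiling", default=0)

# Lists are not traversed: "tags.0" reaches a non-dict value and falls back to the default.
first_tag = get_from_path(record, "tags.0", default="n/a")

# A key that exists with a stored None returns None, not the default.
notes = get_from_path(record, "summary.notes", default="no notes")
```

One consequence is that a caller cannot distinguish "path missing" from "value is None" unless it passes a sentinel as `default`.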


def pluck_field_value(data: dict, field_path: str) -> Any:
    """
    Handles a field transformation by extracting a value from the data using the specified field path.

    Args:
        data: The source data dictionary
        field_path: A dot-separated string representing the path to the value

    Returns:
        The value from the specified field path in the data
    """
    return get_from_path(data, field_path)


def switch_on_value(data: dict, switch_spec: dict) -> Any:
    """
    Handles a switch transformation by looking up a value in a case dictionary.

    Args:
        data: The source data dictionary
        switch_spec: A dictionary containing:
            - 'field': The field path to get the value from
            - 'case': A dictionary mapping values to their transformations
            - 'default': (optional) The default value if no match is found

    Returns:
        The transformed value based on the match, or the default value if no match is found
    """
    val = get_from_path(data, switch_spec.get("field", ""))
    lookup = switch_spec.get("case", {})
    return lookup.get(val, switch_spec.get("default"))


# Registry for handlers
DEFAULT_HANDLERS: dict[str, handle_func] = {
    "field": pluck_field_value,
    "switch": switch_on_value,
}
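
Because handlers are looked up in a plain dictionary, the registry can in principle be extended with additional reserved words. The sketch below illustrates the idea with a hypothetical `concat` handler, which is not part of this PR. The `transform` function is a compact stand-in for `transform_from_mapping` without the depth guard, and `pluck` is a simplified `field` handler that reads top-level keys only.

```python
from typing import Any, Callable

Handler = Callable[[dict, Any], Any]


def pluck(data: dict, key: str) -> Any:
    """Simplified `field` handler: top-level keys only, no dot paths."""
    return data.get(key)


def concat(data: dict, spec: dict) -> str:
    """Hypothetical `concat` handler: joins several top-level fields."""
    parts = [str(data.get(key, "")) for key in spec.get("fields", [])]
    return spec.get("sep", " ").join(parts)


def transform(data: dict, node: Any, handlers: dict[str, Handler]) -> Any:
    """Compact stand-in for transform_from_mapping, without the depth guard."""
    if not isinstance(node, dict):
        return node  # literal value
    for key, value in node.items():
        if key in handlers:
            # A reserved key takes over the whole node; sibling keys are ignored.
            return handlers[key](data, value)
    return {key: transform(data, value, handlers) for key, value in node.items()}


handlers = {"field": pluck, "concat": concat}
data = {"first": "Ada", "last": "Lovelace"}
mapping = {
    "name": {"concat": {"fields": ["first", "last"]}},
    "mixed": {"field": "first", "note": "dropped"},
}
result = transform(data, mapping, handlers)
```

The `mixed` entry shows a side effect of the reserved-word check: once a node contains a reserved key, its other keys do not appear in the output. With the function from this PR, the equivalent call would presumably pass `handlers={**DEFAULT_HANDLERS, "concat": concat}`.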


def transform_from_mapping(
    data: dict,
    mapping: dict,
    depth: int = 0,
    max_depth: int = 500,
    handlers: dict[str, handle_func] = DEFAULT_HANDLERS,
) -> dict:
    """
    Transforms a data dictionary according to a mapping specification.

    The mapping supports both literal values and transformations keyed by
    the following reserved words:
    - `field`: Extracts a value from the data using a dot-notation path
    - `switch`: Performs a case-based lookup based on a field value

    Args:
        data: The source data dictionary to transform
        mapping: A dictionary describing how to transform the data
        depth: Current recursion depth (used internally)
        max_depth: Maximum allowed recursion depth
        handlers: A dictionary of handler functions to use for the transformations

    Returns:
        A new dictionary containing the transformed data according to the mapping

    Example:

    ```python
    source_data = {
        "opportunity_status": "closed",
        "opportunity_amount": 1000,
    }

    mapping = {
        "status": { "field": "opportunity_status" },
        "amount": {
            "value": { "field": "opportunity_amount" },
            "currency": "USD",
        },
    }

    result = transform_from_mapping(source_data, mapping)

    assert result == {
        "status": "closed",
        "amount": {
            "value": 1000,
            "currency": "USD",
        },
    }
    ```
    """
    # Check for maximum depth
    # This is a sanity check to prevent stack overflow from deeply nested mappings
    # which may be a concern when running this function on third-party mappings
    if depth > max_depth:
Review comment (Collaborator):

What is the purpose of enforcing a max depth? I agree 500 is an absurd depth, but curious what the intent is. Fear of runaway recursion?

Reply (Collaborator, Author, @widal001, May 23, 2025):

I added a note explaining max depth here: docs(py-sdk): Add note explaining max_depth

It's mainly a check to avoid exceeding Python's default recursion limit (1000), since this transformation function might be running on third-party (untrusted) mapping inputs.

In a future iteration I might consider refactoring this to use a loop instead of recursion, and stress testing how depth is incremented.

        raise ValueError("Maximum transformation depth exceeded.")

    def transform_node(node: Any, depth: int) -> Any:
        # Check for maximum depth
        # This is a sanity check to prevent stack overflow from deeply nested mappings
        # which may be a concern when running this function on third-party mappings
        if depth > max_depth:
            raise ValueError("Maximum transformation depth exceeded.")

        # If the node is not a dictionary, return as is
        # This allows users to set a key to a constant value (string or number)
        if not isinstance(node, dict):
            return node

        # Walk through each key in the current node
        for k, v in node.items():

            # If the key is a reserved word, call the matching handler function
            # on the value and return the result.
            # Node: `{ "field": "opportunity_status" }`
            # Returns: `pluck_field_value(data, "opportunity_status")`
            if k in handlers:
                handler_func = handlers[k]
                return handler_func(data, v)

        # Otherwise, preserve the dictionary structure and
        # recursively apply the transformation to each value.
        # Node:
        # ```
        # {
        #   "status": { "field": "opportunity_status" },
        #   "amount": { "field": "opportunity_amount" },
        # }
        # ```
        # Returns:
        # ```
        # {
        #   "status": transform_node({ "field": "opportunity_status" }, depth + 1)
        #   "amount": transform_node({ "field": "opportunity_amount" }, depth + 1)
        # }
        # ```
        return {k: transform_node(v, depth + 1) for k, v in node.items()}

    # Recursively walk the mapping until all nested transformations are applied
    return transform_node(mapping, depth)
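
The review thread on `max_depth` mentions possibly replacing recursion with a loop in a future iteration. A minimal sketch of one way that could look is given below, using an explicit stack. This is an illustration and not the author's planned refactor: the name `transform_iteratively`, the stack layout, and the lambda `field` handler are all assumptions made here.

```python
from typing import Any, Callable

Handler = Callable[[dict, Any], Any]


def transform_iteratively(data: dict, mapping: Any, handlers: dict[str, Handler]) -> Any:
    """Walk the mapping with an explicit stack instead of recursive calls."""
    root: dict[str, Any] = {}
    # Each entry is (mapping node, output container, key to write in that container).
    stack: list[tuple[Any, dict, str]] = [(mapping, root, "result")]
    while stack:
        node, parent, key = stack.pop()
        if not isinstance(node, dict):
            parent[key] = node  # literal value
            continue
        reserved = next((k for k in node if k in handlers), None)
        if reserved is not None:
            # A reserved key replaces the whole node with the handler's result.
            parent[key] = handlers[reserved](data, node[reserved])
            continue
        child: dict[str, Any] = {}
        parent[key] = child
        # Push in reverse so keys are processed in their original order.
        for k, v in reversed(list(node.items())):
            stack.append((v, child, k))
    return root["result"]


# Hypothetical top-level-only `field` handler for the demo.
handlers: dict[str, Handler] = {"field": lambda d, k: d.get(k)}

source_data = {"opportunity_status": "closed", "opportunity_amount": 1000}
mapping = {
    "status": {"field": "opportunity_status"},
    "amount": {"value": {"field": "opportunity_amount"}, "currency": "USD"},
}
result = transform_iteratively(source_data, mapping, handlers)

# A mapping nested far deeper than the default recursion limit still works.
deep: Any = "leaf"
for _ in range(5000):
    deep = {"level": deep}
deep_result = transform_iteratively({}, deep, handlers)
```

With this shape the work is bounded by the size of the mapping rather than by the interpreter's call stack, although a size or depth cap may still be worth keeping for untrusted input.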