Skip to content

[Issue 191] Transformer Scaffolding #192

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

DavidDudas-Intuitial
Copy link
Collaborator

@DavidDudas-Intuitial DavidDudas-Intuitial commented May 17, 2025

Summary

Changes proposed

What was added, updated, or removed in this PR.

Creates scaffolding for custom data transformers.

Context for reviewers

Testing instructions, background context, more in-depth details of the implementation, and anything else you'd like to call out or ask reviewers. Explain how the changes were verified.

Grant platform data may come from many heterogeneous sources (e.g. XML APIs, proprietary CSV formats, legacy JSON, etc). This PR introduces an abstract base class designed for runtime transformations from arbitrary data structures to canonical data models as defined by the CommonGrants protocol. This class defines a standard interface for transforming raw input data into structured OpportunityBase instances. Developers may subclass the abstract base class to implement custom transformation logic for normalizing various source formats.

Features

  • Enforces a clear contract for field-by-field transformation
  • Automatically generates certain fields if not provided
  • Provides a reusable and extensible template method pattern
  • Built on Pydantic for type safety and validation

Additional information

Screenshots, GIF demos, code examples or output to help show the changes working as expected.

Copy link
Collaborator

@widal001 widal001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate you taking a first stab at this, but this isn't quite the approach we want to take.

I implemented a basic pattern for automating transformations using a serializable mapping format in #193 sorry I didn't get a chance to do this before I went OOO to provide an example.

Comment on lines +119 to +123
### Transform Data Sources and Models

When extending an existing system to adopt the CommonGrants protocol, a developer might need to transform existing data model implementations or data sources into canonical model instances. Such custom transformations can be easily implemented by leveraging the abstract base class `OpportunityTransformer`.

The `OpportunityTransformer` base class defines a standard interface for transforming raw input data (from third-party feeds, legacy formats, custom JSON, etc.) into structured `OpportunityBase` instances. As an abstract base class, `OpportunityTransformer` itself *does not contain any transformation logic*; custom transformation logic must be implemented by subclassing the abstract base class (see following example).
Copy link
Collaborator

@widal001 widal001 May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't align with the pattern that we're trying to establish here. Adopters shouldn't have to define a custom transformer in Python code.

The goal is to define simple transformations using a serializable mapping that is portable across languages and provide language-specific utilities to automate transformations using that mapping.

Copy link
Collaborator Author

@DavidDudas-Intuitial DavidDudas-Intuitial May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss. IMO this is compatible. A transformer pattern and portable serializable mapping formats are not mutually exclusive and developers can benefit from one, the other, or both combined.

"Adopters shouldn't have to define a custom transformer in Python code"

Nor should they have to learn a complicated deeply-nested mapping format if they simply want to instantiate an object from a known data source! :) IMO we should give them options.

The spirit of this PR is to provide a very simple scaffold for enabling the ADR stated goal ("flexibly map existing data structures to the canonical models") within the accepted tradeoffs ("requires a custom runtime to apply transformations in each new language or SDK") which can be extended to support any number of formats and transformers (such as the draft PR you opened today) OR used as-is without use of any mapping format.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree these aren't incompatible approaches, maybe we can talk tomorrow about the benefit that this transformer ABC is providing if we want to keep it as an option?

Maybe you could outline the why someone might implement this ABC versus creating their own custom transformer function / class? Typically, the main use case for ABCs in Python is to support polymorphism, providing a consistent interface for downstream consumers of objects that implement this interface. Did you anticipate someone implementing multiple versions of this transformer? Or was there some specific downstream consumer you had in mind that would create the need for a consistent interface for the transformer class?

I could see the value of providing a standard Python interface for transformations that would be consistent across CommonGrants models, like Opportunity (current) and Organization, Person, etc. (future), but it would probably make more sense to define an ABC that is generic (e.g. ModelTransformer) rather than specific to a given model.

I'll also take some time to provide feedback on some specific design choices in this transformer ABC, because even if we decide to proceed with this as an option, there are a few things I'd recommend tweaking.

Comment on lines +90 to +93
@final
def _generate_id(self) -> UUID:
"""Generate unique ID for the opportunity."""
return uuid4()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might either want to replace this with a deterministic UUID method (e.g. v5) or at least provide an alternative internal function that supports deterministic UUID generation.

The current approach would generate two different UUIDs for the same input data.

),
]

def transform_opportunity(self, id: UUID | None = None) -> OpportunityBase:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to my comments below on the self._generate_id() method, I think it would make more sense to have id be a required value that is either a uuid or a string that is used to generate a UUID so that successive runs of this function with the same input data will yield identical results.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or if we're sticking with the pattern of having one instance of this transformer per record, expecting an ID value on initialization or generating and saving the UUID when the transformer is instantiated with input data.

Comment on lines +41 to +42
created_at=self._current_timestamp(),
last_modified_at=self._current_timestamp(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll likely want to allow consumers of this function to optionally provide existing created_at and last_modified_at attributes, for example if these values are already coming from an existing database like we might expect for the Simpler.Grants.gov API.

from common_grants_sdk.schemas.models.opp_timeline import OppTimeline


class OpportunityTransformer(CommonGrantsBaseModel, ABC):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for inheriting from the CommonGrantsBaseModel and thus making this transformer class a pydantic schema?

Is there a reason someone implementing this function would call transform_opportunity() multiple times after instantiating a given transformer with a single input record?

Some alternative patterns that I think would make more sense:

  • Treating a given transformer as a singleton class that accepts arbitrary input data to the transform_opportunity() method, so it can be used in a loop to transform a list of records without having to instantiate the class, then invoke transform_opportunity()
  • Removing the inheritance from CommonGrantsBaseModel so it's not a pydantic schema, accepting input data and invoking transform_opportunity() as part of the __init__() method and storing the resulting model in a model so that folks could simply run LegacyTransformer(data={...}).model to get the resulting opportunity instance.

into a canonical OpportunityBase model instance.
"""

return OpportunityBase(
Copy link
Collaborator

@widal001 widal001 May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works if folks only want to use the default OpportunityBase class, but if they want to extend the OpportunityBase class with custom fields (like outlined in this tutorial) this approach won't work.

An alternative would be to make this a generic class to which developers can pass a sub-class of OpportunityBase when they are implementing this ABC.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the generic classes docs for more info

@DavidDudas-Intuitial DavidDudas-Intuitial marked this pull request as draft May 20, 2025 22:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Py SDK] Transformer scaffolding
2 participants