-
Notifications
You must be signed in to change notification settings - Fork 3
[Issue 191] Transformer Scaffolding #192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Appreciate you taking a first stab at this, but this isn't quite the approach we want to take.
I implemented a basic pattern for automating transformations using a serializable mapping format in #193 sorry I didn't get a chance to do this before I went OOO to provide an example.
### Transform Data Sources and Models | ||
|
||
When extending an existing system to adopt the CommonGrants protocol, a developer might need to transform existing data model implementations or data sources into canonical model instances. Such custom transformations can be easily implemented by leveraging the abstract base class `OpportunityTransformer`. | ||
|
||
The `OpportunityTransformer` base class defines a standard interface for transforming raw input data (from third-party feeds, legacy formats, custom JSON, etc.) into structured `OpportunityBase` instances. As an abstract base class, `OpportunityTransformer` itself *does not contain any transformation logic*; custom transformation logic must be implemented by subclassing the abstract base class (see following example). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't align with the pattern that we're trying to establish here. Adopters shouldn't have to define a custom transformer in Python code.
The goal is to define simple transformations using a serializable mapping that is portable across languages and provide language-specific utilities to automate transformations using that mapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss. IMO this is compatible. A transformer pattern and portable serializable mapping formats are not mutually exclusive and developers can benefit from one, the other, or both combined.
"Adopters shouldn't have to define a custom transformer in Python code"
Nor should they have to learn a complicated deeply-nested mapping format if they simply want to instantiate an object from a known data source! :) IMO we should give them options.
The spirit of this PR is to provide a very simple scaffold for enabling the ADR stated goal ("flexibly map existing data structures to the canonical models") within the accepted tradeoffs ("requires a custom runtime to apply transformations in each new language or SDK") which can be extended to support any number of formats and transformers (such as the draft PR you opened today) OR used as-is without use of any mapping format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I agree these aren't incompatible approaches, maybe we can talk tomorrow about the benefit that this transformer ABC is providing if we want to keep it as an option?
Maybe you could outline the why someone might implement this ABC versus creating their own custom transformer function / class? Typically, the main use case for ABCs in Python is to support polymorphism, providing a consistent interface for downstream consumers of objects that implement this interface. Did you anticipate someone implementing multiple versions of this transformer? Or was there some specific downstream consumer you had in mind that would create the need for a consistent interface for the transformer class?
I could see the value of providing a standard Python interface for transformations that would be consistent across CommonGrants models, like Opportunity (current) and Organization, Person, etc. (future), but it would probably make more sense to define an ABC that is generic (e.g. ModelTransformer
) rather than specific to a given model.
I'll also take some time to provide feedback on some specific design choices in this transformer ABC, because even if we decide to proceed with this as an option, there are a few things I'd recommend tweaking.
@final | ||
def _generate_id(self) -> UUID: | ||
"""Generate unique ID for the opportunity.""" | ||
return uuid4() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might either want to replace this with a deterministic UUID method (e.g. v5) or at least provide an alternative internal function that supports deterministic UUID generation.
The current approach would generate two different UUIDs for the same input data.
), | ||
] | ||
|
||
def transform_opportunity(self, id: UUID | None = None) -> OpportunityBase: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to my comments below on the self._generate_id()
method, I think it would make more sense to have id
be a required value that is either a uuid or a string that is used to generate a UUID so that successive runs of this function with the same input data will yield identical results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or if we're sticking with the pattern of having one instance of this transformer per record, expecting an ID value on initialization or generating and saving the UUID when the transformer is instantiated with input data.
created_at=self._current_timestamp(), | ||
last_modified_at=self._current_timestamp(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll likely want to allow consumers of this function to optionally provide existing created_at
and last_modified_at
attributes, for example if these values are already coming from an existing database like we might expect for the Simpler.Grants.gov API.
from common_grants_sdk.schemas.models.opp_timeline import OppTimeline | ||
|
||
|
||
class OpportunityTransformer(CommonGrantsBaseModel, ABC): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the motivation for inheriting from the CommonGrantsBaseModel
and thus making this transformer class a pydantic schema?
Is there a reason someone implementing this function would call transform_opportunity()
multiple times after instantiating a given transformer with a single input record?
Some alternative patterns that I think would make more sense:
- Treating a given transformer as a singleton class that accepts arbitrary input data to the
transform_opportunity()
method, so it can be used in a loop to transform a list of records without having to instantiate the class, then invoketransform_opportunity()
- Removing the inheritance from
CommonGrantsBaseModel
so it's not a pydantic schema, accepting input data and invokingtransform_opportunity()
as part of the__init__()
method and storing the resulting model in amodel
so that folks could simply runLegacyTransformer(data={...}).model
to get the resulting opportunity instance.
into a canonical OpportunityBase model instance. | ||
""" | ||
|
||
return OpportunityBase( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works if folks only want to use the default OpportunityBase
class, but if they want to extend the OpportunityBase
class with custom fields (like outlined in this tutorial) this approach won't work.
An alternative would be to make this a generic class to which developers can pass a sub-class of OpportunityBase
when they are implementing this ABC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the generic classes docs for more info
Summary
Changes proposed
Creates scaffolding for custom data transformers.
Context for reviewers
Grant platform data may come from many heterogeneous sources (e.g. XML APIs, proprietary CSV formats, legacy JSON, etc). This PR introduces an abstract base class designed for runtime transformations from arbitrary data structures to canonical data models as defined by the CommonGrants protocol. This class defines a standard interface for transforming raw input data into structured
OpportunityBase
instances. Developers may subclass the abstract base class to implement custom transformation logic for normalizing various source formats.Features
Additional information