[Issue #6980] ADR for SGM Data Flow #7683
base: main
64ed70f to e27ee12
…s checked by the linter so we know that one's up-to-date
e27ee12 to 460f78f
Not sure why the check is failing. The file it's mad about is linked in the Summary, just like all the other pages Brandon added along with it...
## Context and Problem Statement
Given the different approach to modernizing GrantSolutions, we need ongoing interoperability with the existing system rather than a strangler (rebuild-and-replace) pattern. To allow for that, we need strategies for accessing data across the existing and modernized systems beyond those already designed and built for the Simpler Grants.gov work. How will we allow for bi-directional, near-real-time data flow between GrantSolutions (GS) and Simpler Grants Management (SGM)?
Assuming this last sentence is a question
How will we allow for bi-directional, near real-time, data flow between GrantSolutions (GS) and Simpler Grants Management (SGM)?
## Decision Drivers
- Optimize User Experience
  - Users should be able to move between systems as seamlessly as possible
Suggested change: "User's should be able to move between systems as seamlessly as possible" → "Users should be able to move between systems as seamlessly as possible"
  - Workflows or processes should move from system to system as needed without long delays or user intervention
Does this mean the workflows themselves should move, as in a workflow defined in Simpler should be accessible from GrantSolutions? Or is it more that, as part of a user's "workflow", they should be able to track data between the two systems? "Workflow" is a tricky term, so I think this needs some clarification.
### Bulk data copy on a scheduled basis
This is the approach we took on Simpler Grants.gov. Every hour, we Extract, Load, and Transform (ELT) all of the table data we need from Grants.gov's database into Simpler Grants.gov's database. Whenever possible, we pull only the records updated since the last run, to minimize data volume and the associated load on the existing system. We do not currently send data back to Grants.gov's database, but in the SGM work that would also be a requirement, so we would modify our existing processes to support bi-directional data transfer. We could also improve the existing code base so it can run more frequently without collisions, and add filtering so that we don't fetch rows whose foreign key (FK) parent records we haven't seen and then fail to create them (for example, only processing a record if we've already seen its parent record in this run).
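A minimal Python sketch of the incremental pull described above, assuming every source row carries an `updated_at` timestamp and at most one parent foreign key. `SourceRow`, `incremental_sync`, and `known_ids` are hypothetical names, not part of the existing Simpler Grants.gov code. Rows are processed oldest first, and a row is loaded only if its parent already exists in the target or was loaded earlier in the same run; otherwise it is deferred.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class SourceRow:
    """Hypothetical source row: primary key, optional parent FK, last-update timestamp."""
    id: int
    parent_id: Optional[int]
    updated_at: datetime


def incremental_sync(
    rows: List[SourceRow],
    last_run: datetime,
    known_ids: Set[int],
) -> Tuple[List[SourceRow], List[SourceRow], datetime]:
    """Run one scheduled pull; return (loaded, deferred, new_watermark).

    `known_ids` holds IDs already present in the target database (an assumption).
    """
    # Only consider rows changed since the previous run, oldest first.
    changed = sorted(
        (r for r in rows if r.updated_at > last_run),
        key=lambda r: r.updated_at,
    )
    loaded: List[SourceRow] = []
    deferred: List[SourceRow] = []
    seen = set(known_ids)
    for row in changed:
        # Load a row only if its parent exists already or was loaded earlier this run.
        if row.parent_id is None or row.parent_id in seen:
            loaded.append(row)
            seen.add(row.id)
        else:
            deferred.append(row)

    if deferred:
        # Hold the watermark just before the earliest deferred row so it is re-fetched.
        cutoff = deferred[0].updated_at
        earlier = [r.updated_at for r in changed if r.updated_at < cutoff]
        new_watermark = max(earlier, default=last_run)
    else:
        new_watermark = max((r.updated_at for r in changed), default=last_run)
    return loaded, deferred, new_watermark
```

Holding the watermark back means the deferred row is picked up again on the next run, at the cost of re-fetching rows loaded after it; that trade-off assumes loads are idempotent upserts.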
what are FK records here?
I'd probably leave out the detail about filtering based on the foreign key issue. While that is an annoying issue, it's probably too specific to call out here.
I will say that our current approach relies heavily on Grants.gov's method of storing data in its DB (having created/updated timestamps); it wouldn't be possible to replicate if SGM doesn't have that working properly.
Also, bidirectional transfer might not work well at all with our current approach: writing back could put us in an endless loop (we write an update to legacy, which now has an updated timestamp, so we pull it and process that update, and now we have an update to write back to legacy ...).
Batch processing is likely something we'll want where timeliness isn't a concern, but we might need to start at least partially fresh depending on how SGM works.
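One possible guard against the write-back loop described above is echo suppression: remember a hash of the business fields we last wrote to the legacy system for each record, and skip any pulled change whose hash matches. This is a sketch under assumptions, not an existing mechanism: `EchoSuppressor`, `filter_echoes`, and `IGNORED_FIELDS` (volatile columns such as `updated_at` that the legacy system bumps on every write) are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List, Mapping, Tuple

# Hypothetical: columns the legacy system changes on every write, so they must not
# count as a real change when comparing what we pulled with what we wrote.
IGNORED_FIELDS = frozenset({"updated_at", "created_at"})


def payload_hash(payload: Mapping[str, Any]) -> str:
    """Stable hash of a record's business fields; key order does not matter."""
    business = {k: v for k, v in payload.items() if k not in IGNORED_FIELDS}
    canonical = json.dumps(business, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class EchoSuppressor:
    """Remembers what SGM last wrote to the legacy system, per record ID."""

    def __init__(self) -> None:
        self._last_written: Dict[str, str] = {}

    def record_write(self, record_id: str, payload: Mapping[str, Any]) -> None:
        """Call after a successful write-back to the legacy system."""
        self._last_written[record_id] = payload_hash(payload)

    def is_echo(self, record_id: str, payload: Mapping[str, Any]) -> bool:
        """True if a pulled change is just our own write coming back."""
        return self._last_written.get(record_id) == payload_hash(payload)


def filter_echoes(
    pulled: List[Tuple[str, Dict[str, Any]]],
    suppressor: EchoSuppressor,
) -> List[Tuple[str, Dict[str, Any]]]:
    """Drop pulled changes that originated from SGM, breaking the write/pull loop."""
    return [(rid, p) for rid, p in pulled if not suppressor.is_echo(rid, p)]
```

A legacy edit that happens to restore exactly the values we last wrote would also be skipped, which is harmless because our copy already matches. A real implementation would need to persist the hashes rather than keep them in memory.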
- **Cons**
  - Still learning what data has existing APIs that will make this possible
What happens if a call fails? A con of using APIs might be that we end up in a weird state because an API call failed and we don't have a way to pipe that failure through. There's also some complexity in identifying that two records across systems represent the same thing: opportunities have 3 different IDs (legacy integers, our UUIDs, and the opportunity number, which isn't unique but is often what other systems use). Even if we keep IDs in sync, the IDs needed to connect records might not exist yet (a legacy integer opportunity ID can't be created anywhere BUT Grants.gov, which adds an entire dependency there).
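To make the ID problem above concrete, here is a hypothetical crosswalk sketch. It resolves opportunities by the unique IDs (Simpler UUID, legacy integer), refuses to guess when an opportunity number matches more than one record, and tracks records that have no legacy ID yet because only Grants.gov can assign one. `OpportunityXref`, `Crosswalk`, and all field and method names are illustrative assumptions.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class OpportunityXref:
    """One row of a hypothetical cross-system ID mapping for a single opportunity."""
    opportunity_id: uuid.UUID             # Simpler's UUID, always present
    legacy_opportunity_id: Optional[int]  # Grants.gov integer; None until Grants.gov assigns it
    opportunity_number: str               # not unique, so never trusted on its own


class Crosswalk:
    def __init__(self) -> None:
        self._by_uuid: Dict[uuid.UUID, OpportunityXref] = {}
        self._by_legacy: Dict[int, OpportunityXref] = {}
        self._by_number: Dict[str, List[OpportunityXref]] = {}

    def add(self, xref: OpportunityXref) -> None:
        self._by_uuid[xref.opportunity_id] = xref
        if xref.legacy_opportunity_id is not None:
            self._by_legacy[xref.legacy_opportunity_id] = xref
        self._by_number.setdefault(xref.opportunity_number, []).append(xref)

    def by_legacy_id(self, legacy_id: int) -> Optional[OpportunityXref]:
        return self._by_legacy.get(legacy_id)

    def by_opportunity_number(self, number: str) -> Optional[OpportunityXref]:
        """Resolve by opportunity number only when exactly one record matches."""
        matches = self._by_number.get(number, [])
        if len(matches) > 1:
            raise LookupError(f"opportunity number {number!r} matches {len(matches)} records")
        return matches[0] if matches else None

    def link_legacy_id(self, opportunity_id: uuid.UUID, legacy_id: int) -> None:
        """Attach the legacy integer once Grants.gov has created the record."""
        xref = self._by_uuid[opportunity_id]
        xref.legacy_opportunity_id = legacy_id
        self._by_legacy[legacy_id] = xref

    def unlinked(self) -> List[OpportunityXref]:
        """Records that cannot be matched to Grants.gov yet (no legacy ID)."""
        return [x for x in self._by_uuid.values() if x.legacy_opportunity_id is None]
```

Raising on an ambiguous opportunity number, instead of picking the first match, surfaces the mismatch rather than silently linking the wrong records.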
### Call GS APIs directly as needed, store additional data points in SGM (without duplicating existing data)
We won't always be able to change the GS data model as quickly as we'd want to iterate on SGM. In those cases, we would store the new fields in the Simpler DB along with the identifier of the corresponding GS record. The data from both systems could then be combined either in the API layer or in the frontend via two API calls, depending on whether we wrap calls to GS in the Simpler API.
I'd say we should keep the frontend from being aware of SGM as much as possible. Since the problem is largely a data issue, having the data "merged" in the API itself makes more sense. It also means that whether something exists only in our system or across two systems, the frontend (and the user experience) doesn't really need to be aware of that.
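A minimal sketch of merging in the API layer, assuming SGM-only fields are stored keyed by the GS record identifier. `get_merged_record`, the `fetch_from_gs` callable, and the `sgm_extensions` mapping are hypothetical; the collision rule (GS values win over SGM extension fields with the same name) is an illustrative design choice, not a decided policy.

```python
from typing import Any, Callable, Dict, Mapping, Optional

# Hypothetical signature for whatever client wraps the GS API.
GsFetcher = Callable[[str], Optional[Mapping[str, Any]]]


def get_merged_record(
    gs_id: str,
    fetch_from_gs: GsFetcher,
    sgm_extensions: Mapping[str, Mapping[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Combine a GS record with SGM-only fields stored under the same GS identifier."""
    gs_record = fetch_from_gs(gs_id)
    if gs_record is None:
        return None
    # Start from the SGM extension fields (copied, so the store is not mutated) ...
    merged: Dict[str, Any] = dict(sgm_extensions.get(gs_id, {}))
    # ... then overlay GS fields, so GS stays the source of truth on any name collision.
    merged.update(gs_record)
    return merged
```

The frontend gets one flat object from a single call and never needs to know which system each field came from.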
Summary
Fixes #6980
Adds an ADR describing the approaches we've identified for accessing GS data directly from SGM