Context / Goal
In #18 we persisted the target hashes to the DB. This is currently done sequentially, so there is no race between streams of data being loaded and hashed from source and target DBs.
It may be advantageous to do both at the same time:
- if the queries against source and target themselves take a long time
- to avoid I/O waits when persisting to the Recce DB

On the other hand, it may simply add contention without any benefit.
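The following minimal Python sketch makes the trade-off concrete by contrasting sequential and concurrent loading. It is not Recce's actual implementation, which this issue does not describe. `stream_rows`, `load_and_hash` and the `latency_s` parameter are hypothetical. The query wait is simulated with `asyncio.sleep`, and hashes go to an in-memory dict rather than the Recce DB. Because asyncio runs on a single thread, the dict updates here cannot race. A real persist step would reintroduce the race described under Expected Outcome.

```python
import asyncio
import hashlib
import time

# Hypothetical stand-in for a query against a source or target DB: yields
# (migration_key, row_value) pairs, sleeping to simulate I/O latency per row.
async def stream_rows(rows, latency_s):
    for key, value in rows:
        await asyncio.sleep(latency_s)
        yield key, value

async def load_and_hash(side, rows, latency_s, store):
    """Hash each row from one side ("source" or "target") into a shared dict."""
    async for key, value in stream_rows(rows, latency_s):
        digest = hashlib.sha256(value.encode()).hexdigest()
        # Safe without locking only because asyncio is single-threaded here.
        store.setdefault(key, {})[side] = digest

async def load_sequentially(source_rows, target_rows, latency_s):
    store = {}
    await load_and_hash("source", source_rows, latency_s, store)
    await load_and_hash("target", target_rows, latency_s, store)
    return store

async def load_concurrently(source_rows, target_rows, latency_s):
    store = {}
    # Run both streams at once so waits on one side overlap with the other.
    await asyncio.gather(
        load_and_hash("source", source_rows, latency_s, store),
        load_and_hash("target", target_rows, latency_s, store),
    )
    return store

if __name__ == "__main__":
    rows = [(f"k{i}", f"value-{i}") for i in range(20)]
    for loader in (load_sequentially, load_concurrently):
        start = time.perf_counter()
        result = asyncio.run(loader(rows, rows, 0.01))
        print(f"{loader.__name__}: {len(result)} keys in {time.perf_counter() - start:.2f}s")
```

With simulated latency, the concurrent version finishes in roughly the time of the slower stream rather than the sum of both. Real queries may behave this way, or they may simply contend for the same resources. Measuring which one happens is the job of the performance testing in #26.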
Expected Outcome
- Be in a position to evaluate, through performance testing, whether this strategy is effective (Exploratory performance testing of load and persist #26)
- Change the loading mechanism to query and stream both source and target at the same time
  - Should be able to handle both orderings:
    - source row inserted first, then target hash updated on it
    - target row inserted first, then source hash updated on it
  - Some mechanism will need to be introduced to avoid race conditions and to retry with a different strategy when an insert fails
  - This may have a negative effect on the ability to batch inserts, if batching is done in Implement tunable batching of queries/inserts #40
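The following Python sketch shows one possible shape for the insert-failure fallback, using `sqlite3`. It assumes a hypothetical `data_record` table keyed by `migration_key`, with one hash column per side. The real Recce schema may differ. Whichever stream reaches a key first inserts the row. The other stream hits the primary-key violation and retries as an `UPDATE` of its own column.

```python
import sqlite3

# Hypothetical schema: one row per migration key, holding each side's hash.
SCHEMA = """
CREATE TABLE IF NOT EXISTS data_record (
    migration_key TEXT PRIMARY KEY,
    source_data   TEXT,
    target_data   TEXT
)
"""

# Whitelist of columns, so interpolating them into SQL below is safe.
COLUMN_FOR_SIDE = {"source": "source_data", "target": "target_data"}

def persist_hash(conn, side, key, digest):
    """Record one side's hash, whichever stream reaches the key first.

    Try an INSERT first; if the other stream already inserted this key, the
    primary-key violation is caught and we retry as an UPDATE of our column.
    Returns "inserted" or "updated" to show which path was taken.
    """
    column = COLUMN_FOR_SIDE[side]
    try:
        with conn:  # commits on success, rolls back and re-raises on error
            conn.execute(
                f"INSERT INTO data_record (migration_key, {column}) VALUES (?, ?)",
                (key, digest),
            )
        return "inserted"
    except sqlite3.IntegrityError:
        # Row already exists: fall back to updating only this side's hash.
        with conn:
            conn.execute(
                f"UPDATE data_record SET {column} = ? WHERE migration_key = ?",
                (digest, key),
            )
        return "updated"

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    print(persist_hash(conn, "source", "k1", "aaa"))
    print(persist_hash(conn, "target", "k1", "bbb"))
```

There are two caveats. First, in PostgreSQL a failed statement aborts the surrounding transaction, so the retry must run in a fresh transaction or after rolling back to a savepoint. Second, PostgreSQL (9.5+) and SQLite (3.24+) support `INSERT ... ON CONFLICT (migration_key) DO UPDATE`, which handles both orderings in a single statement. That form may be easier to batch than a per-row try/retry, but this would need confirming as part of #40.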
Out of Scope
Additional context / implementation notes