
Interleave persistence of source + target rows #41

@chadlwilson

Context / Goal

In #18 we persisted the target hashes to the DB. This is currently done sequentially, so there is no race between streams of data being loaded and hashed from source and target DBs.

It may be advantageous to do these at the same time:

  • if the queries themselves take time against the source and target
  • to avoid I/O waits when persisting to the Recce DB

It may also just add more contention and not be advantageous.

Expected Outcome

  • Be able to evaluate, through performance testing, whether this strategy is effective (Exploratory performance testing of load and persist #26)
  • Change the loading mechanism to query and stream both source and target at the same time
    • Should be able to handle either arrival order:
      • source row inserted first, then target hash updated
      • target row inserted first, then source hash updated
    • Some mechanism will need to be introduced to avoid race conditions, and to retry with a different strategy when an insert fails
    • This may reduce the ability to batch inserts, if batching is implemented in Implement tunable batching of queries/inserts #40
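One way the "insert, and on failure fall back to updating the other side's hash" handling could look is sketched below in Python with the standard-library `sqlite3` module. This is a minimal sketch under stated assumptions, not Recce's implementation: the `data_set_row` table, its columns, and the `persist` / `interleave_and_persist` helpers are all hypothetical. Alternating between the two streams in a single thread only mimics interleaved arrival; it does not reproduce a true concurrent race.

```python
import sqlite3
from itertools import zip_longest

# Hypothetical schema: one row per migration key, with one hash column per side.
SCHEMA = """
CREATE TABLE IF NOT EXISTS data_set_row (
    migration_key TEXT PRIMARY KEY,
    source_hash   TEXT,
    target_hash   TEXT
)
"""

# Fixed whitelist, so interpolating the column name into the SQL below is safe.
HASH_COLUMNS = {"source": "source_hash", "target": "target_hash"}


def persist(conn: sqlite3.Connection, side: str, key: str, row_hash: str) -> None:
    """Insert a row for `key`, or fill in this side's hash if the key already exists."""
    column = HASH_COLUMNS[side]
    try:
        with conn:  # commits on success, rolls back and re-raises on error
            conn.execute(
                f"INSERT INTO data_set_row (migration_key, {column}) VALUES (?, ?)",
                (key, row_hash),
            )
    except sqlite3.IntegrityError:
        # The other stream inserted this key first: retry as an update instead.
        with conn:
            conn.execute(
                f"UPDATE data_set_row SET {column} = ? WHERE migration_key = ?",
                (row_hash, key),
            )


def interleave_and_persist(conn: sqlite3.Connection, source_rows, target_rows) -> None:
    """Alternate between the two streams to mimic rows arriving interleaved."""
    for src, tgt in zip_longest(source_rows, target_rows):
        if src is not None:
            persist(conn, "source", *src)
        if tgt is not None:
            persist(conn, "target", *tgt)


# Usage: k1 arrives from source first, k2 from target first, k3 only on target.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
source = [("k1", "aaa"), ("k2", "bbb")]
target = [("k2", "bbb"), ("k3", "ccc"), ("k1", "xxx")]
interleave_and_persist(conn, source, target)
rows = conn.execute(
    "SELECT migration_key, source_hash, target_hash FROM data_set_row ORDER BY migration_key"
).fetchall()
print(rows)  # [('k1', 'aaa', 'xxx'), ('k2', 'bbb', 'bbb'), ('k3', None, 'ccc')]
```

With genuinely concurrent writers, the fallback update assumes rows are never deleted mid-run. Databases with native upsert support (for example PostgreSQL's `INSERT ... ON CONFLICT ... DO UPDATE`) could avoid the failed-insert round trip entirely, which may also matter for batching, since a single failed insert can abort a whole batched transaction.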

Out of Scope

Additional context / implementation notes

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request), size:M (medium items)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
