
Create a realistic test env to show scaling capabilities of the pipeline #967

Open
@bashir2

Description

The main reason our pipelines are implemented using Apache Beam is to ensure they are horizontally scalable and can process large input FHIR data quickly. We have demonstrated this scalability with JSON input files (on a distributed file system), but a more realistic scenario is a FHIR server backed by a database with multiple replicas. This issue is to create and test the following two scenarios:

  • A HAPI FHIR server with a large amount of data being queried through the search API.
  • Same as above but through the direct DB access mode.
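For the search-API scenario, the pipeline's workers page through FHIR search results in parallel. A minimal sketch of the kind of request involved (the server URL, resource type, and page size below are hypothetical placeholders, not details from this issue):

```python
from urllib.parse import urlencode


def fhir_search_url(base_url: str, resource_type: str, params: dict) -> str:
    """Build a FHIR search URL, e.g. GET [base]/Patient?_count=100."""
    return f"{base_url}/{resource_type}?{urlencode(params)}"


# _count controls the page size each parallel worker fetches per request;
# the server and resource type here are placeholders.
url = fhir_search_url("http://hapi-server.example/fhir", "Patient", {"_count": 100})
print(url)  # → http://hapi-server.example/fhir/Patient?_count=100
```

In a real run, each worker would follow the `next` link in the returned search Bundle to fetch subsequent pages; with enough DB replicas behind the HAPI server, these paged reads can fan out without a single database becoming the bottleneck.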

The data for the above cases can come from the Synthea-HIV module. The test environment should be easy and quick to deploy; i.e., we should save the DB snapshot so that it can be redeployed quickly whenever needed. We will run the pipelines on the Dataflow service of Google Cloud, and the DB should be on Cloud SQL (with enough read replicas enabled). So part of this issue is to create a test environment on GCP with a replicated HAPI server and DB replicas backing it.
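One possible shape for the snapshot-and-redeploy step on Cloud SQL (the instance names are hypothetical; this is a sketch of one setup, not a prescribed configuration):

```shell
# Snapshot the populated Cloud SQL instance so the test env can be
# recreated quickly instead of re-running Synthea uploads each time.
gcloud sql backups create --instance=hapi-test-db

# Add read replicas so parallel pipeline workers can fan out DB reads.
gcloud sql instances create hapi-test-db-replica-1 \
  --master-instance-name=hapi-test-db
```

The number of replicas would be chosen to match the read parallelism of the Dataflow job.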

This can also be used as a test bed for the Bulk Export API once we are done with its implementation (#533 is related).

Metadata

Labels

  • P2:should: An issue to be addressed in a quarter or so.
  • enhancement: New feature or request
  • good first issue: Good for newcomers
