
Create a realistic test env to show scaling capabilities of the pipeline #967

Open
@bashir2

Description

The main reason our pipelines are implemented using Apache Beam is to ensure they are horizontally scalable and can process large input FHIR data quickly. We have demonstrated this scalability with JSON input files (on a distributed file system), but a more realistic scenario is a FHIR server backed by a database with multiple replicas. This issue is to create and test the following two scenarios:

  • A HAPI FHIR server with a large amount of data being queried through the search API.
  • Same as above but through the direct DB access mode.
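For the search-API scenario, the pipeline's workers page through FHIR search results in parallel. A minimal sketch of the kind of request involved (the server URL, resource type, and page size below are hypothetical placeholders, not details from this issue):

```python
from urllib.parse import urlencode


def fhir_search_url(base_url: str, resource_type: str, params: dict) -> str:
    """Build a FHIR search URL, e.g. GET [base]/Patient?_count=100."""
    return f"{base_url}/{resource_type}?{urlencode(params)}"


# _count controls the page size each parallel worker fetches per request;
# the server and resource type here are placeholders.
url = fhir_search_url("http://hapi-server.example/fhir", "Patient", {"_count": 100})
print(url)  # → http://hapi-server.example/fhir/Patient?_count=100
```

In a real run, each worker would follow the `next` link in the returned search Bundle to fetch subsequent pages; with enough DB replicas behind the HAPI server, these paged reads can fan out without a single database becoming the bottleneck.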

The data for the above cases can come from the Synthea-HIV module. The test environment should be easy and quick to deploy; i.e., we should save the DB snapshot so that it can be redeployed quickly whenever needed. We will run the pipelines on the Dataflow service of Google Cloud, and the DB should be on Cloud SQL (with enough read replicas enabled). So part of this issue is to create a test environment on GCP with a replicated HAPI server and DB replicas backing it.
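One possible shape for the snapshot-and-redeploy step on Cloud SQL (the instance names are hypothetical; this is a sketch of one setup, not a prescribed configuration):

```shell
# Snapshot the populated Cloud SQL instance so the test env can be
# recreated quickly instead of re-running Synthea uploads each time.
gcloud sql backups create --instance=hapi-test-db

# Add read replicas so parallel pipeline workers can fan out DB reads.
gcloud sql instances create hapi-test-db-replica-1 \
  --master-instance-name=hapi-test-db
```

The number of replicas would be chosen to match the read parallelism of the Dataflow job.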

This can also be used as a test bed for the Bulk Export API once we are done with its implementation (#533 is related).

Metadata

Labels

  • P2:should: An issue to be addressed in a quarter or so.
  • enhancement: New feature or request
  • good first issue: Good for newcomers
