-
Notifications
You must be signed in to change notification settings - Fork 0
Description
WIP
Description
Most build steps dynamically generate themselves based on the domain tree. For example, when the building "agencies", the "model-transform" step uses the "buildSources" object to determine at run-time what other domains must also be built. However, the "model-transform" and "target-populate" steps are manually rewritten every time the builds change. For example, the db/pg/target-populate/agencies.sql script manually lists every table that is dependent on the agencies domain, (rather than just the tables that are specific to its own domain).
Goal
The goal is to refactor these steps so that each domain only lists its own tables and other required domains are added dynamically based on the build tree.
Limitations
Currently, the steps which use the build tree are also using "node-postgres" to operate on the database. "node-postgres" has two limitations that prevent its use in the "model-create" and "target-populate" steps.
First, it can only use the "COPY" SQL command. It cannot use the "\copy" psql command. The difference in these commands is where the "copied" data exists. "COPY" requires that the "copied" data exists in the same system as the database it's being copied into. As we are using managed databases, we cannot guarantee that we will have "file write" privileges. Even if we did have the privileges, we shouldn't pollute the production database with files. psql's "\copy" command allows data stored on an external system be copied into the target system. This let's us store the files in the flow database and then populate the target database.
Second, "node-postgres" cannot "dump" the model tables from the target database. We use the "pg_dump" for this, ex) db/pg/model-create/agencies.sh#L10
Acceptance criteria
- The model create step dynamically generates its domains based on the build tree
- It continues to source the model tables from the target database
- The target populate step dynamically generates its domains based on the build tree
- It continues to copy from files in the flow database to tables in the target database
- It continues to use a single transaction to both truncate and populate the target tables
Existing tooling
For 'pg_dump', the pg-dump-restore package provides a node-interface for 'pg_dump' functionality. Under the hood, it is a wrapper for execa.
For psql, there is no package to provide a wrapper for the postgres flavor of sql.
It is likely best to use 'execa' directly for both "pg_dump" and "psql" capabilities. We do not need all the features of pg-dump-restore; it's not worth adding another layer of dependecy above "execa".