Add optional Citus (distributed PostgreSQL) support to advanced compose setup#94
Add optional Citus (distributed PostgreSQL) support to advanced compose setup#94ViacheslavKlimov wants to merge 2 commits into
Conversation
ViacheslavKlimov
left a comment
There was a problem hiding this comment.
Review summary
Reviewed 13 changed files in Add optional Citus (distributed PostgreSQL) support to advanced compose setup. Left 5 comments inline.
The flag wiring is clean and mirrors the existing additionalComposeEdqsArgs() pattern well, and the inert-by-default behavior holds (with CITUS_ENABLED=false nothing changes). The two main concerns: suppressing the postgres service breaks docker compose when Trendz is also enabled, and the documented postgres→Citus conversion appears to start from an empty coordinator data directory.
Additional findings
These observations are about existing files outside the PR's diff — spotted while reading surrounding context.
- .gitignore — The new Citus bind mounts write live PostgreSQL data to
tb-node/citus/coordinator,tb-node/citus/worker-1, andtb-node/citus/worker-2, but.gitignoreonly excludestb-node/postgres/**(plus the cassandra/valkey dirs). These Citus data directories aren't ignored, so agit addin a user's checkout could accidentally commit database files. Consider addingtb-node/citus/**. - docker-install-tb.sh (install timing) — On a fresh
CITUS_ENABLED=trueinstall,up -dstarts coordinator/workers/manager and the very next command runsINSTALL_TB=trueagainst the coordinator.citus-managerregisters the workers asynchronously after the coordinator becomes healthy. Is anything guaranteeing the workers are registered before the schema is distributed? Ifcreate_distributed_tableruns before registration, shards could land only on the coordinator. (May well be handled on the backend — flagging in case it isn't.)
This review was auto-generated. Findings may contain errors — please verify before applying changes.
| ADDITIONAL_COMPOSE_ARGS="-f docker-compose.postgres.yml" | ||
| if [ "$CITUS_ENABLED" = true ] && [ "$TB_SETUP" = advanced ]; then | ||
| # Citus replaces the single postgres service (see additionalComposeCitusArgs). | ||
| ADDITIONAL_COMPOSE_ARGS="" |
There was a problem hiding this comment.
Suppressing the postgres service here collides with Trendz. docker-compose.trendz.yml still declares depends_on: - postgres (and trendz.env points SPRING_DATASOURCE_URL at host postgres, which trendz-db-init.sh also connects to). With CITUS_ENABLED=true and TRENDZ_ENABLED=true, the merged project includes trendz.yml but not postgres.yml, so postgres is undefined and docker compose fails the project validation with service "trendz" depends on undefined service "postgres" — breaking start/stop/install/upgrade. Since docker-install-trendz.sh/docker-upgrade-trendz.sh were updated to also pass ${ADDITIONAL_COMPOSE_CITUS_ARGS}, the two are clearly meant to coexist, but as wired they can't. Trendz would also need its datasource pointed at citus-coordinator instead of postgres.
| POSTGRES_USER: postgres | ||
| POSTGRES_PASSWORD: postgres | ||
| volumes: | ||
| - ../tb-node/citus/coordinator:/var/lib/postgresql/data |
There was a problem hiding this comment.
The coordinator binds a fresh ../tb-node/citus/coordinator directory, separate from the existing single-postgres data in ../tb-node/postgres. For the postgres-to-citus conversion documented in the README, the coordinator comes up empty — the existing data is never mounted or reachable — so the upgrade would run against an empty database rather than converting data in place. Is there a missing step to migrate the existing data (or point the coordinator at the old tb-node/postgres dir) before converting?
| restart: always | ||
| image: "citusdata/membership-manager:0.3.0" | ||
| volumes: | ||
| - /var/run/docker.sock:/var/run/docker.sock |
There was a problem hiding this comment.
Mounting the host Docker socket into citus-manager gives that container root-equivalent access to the host. It's the standard membership-manager pattern, so this may be acceptable, but worth calling out as an added attack surface for a production deployment.
| - citus-worker-2-data:/var/lib/postgresql/data | ||
|
|
||
| volumes: | ||
| citus-coordinator-data: |
There was a problem hiding this comment.
This deviates from the other *.volumes.yml files: postgres.volumes.yml, edqs.volumes.yml, and docker-compose.volumes.yml all use external: name: ${SOME_VOLUME} with an env-var indirection, whereas this hardcodes the volume names and uses the short external: true form (and there are no matching env vars in .env). Worth aligning with the existing convention. Note also that, like the sibling volumes files, this one isn't referenced by any script, so it is currently inert.
| When `CITUS_ENABLED=false` (the default) the deployment uses the single `postgres` service exactly as before; the Citus | ||
| files are inert. | ||
|
|
||
| A **fresh install** with `CITUS_ENABLED=true` distributes the schema automatically. To **convert an existing single-postgres |
There was a problem hiding this comment.
This conversion procedure looks incomplete given the volume layout. With CITUS_ENABLED=true the coordinator uses the fresh ../tb-node/citus/coordinator data dir and the old postgres service isn't started, so the existing data in ../tb-node/postgres is never reachable. Running --fromVersion=postgres-to-citus against an empty coordinator wouldn't convert the existing database. If a manual data migration (or repointing the coordinator at the old data dir) is required first, that step should be documented here.
Adds an opt-in
CITUS_ENABLEDflag that, in the advanced setup withDATABASE=postgres(PE only), replaces the singlepostgresservice with a Citus cluster.What it does (when
CITUS_ENABLED=true)citus-coordinator+citus-worker-1/2(citusdata/citus:12.1,shared_preload_libraries=citus, healthchecks) pluscitus-manager(citusdata/membership-manager) which auto-registers the labelled workers with the coordinator — no manualcitus_add_node.tb-nodeservices at the coordinator and passesDATABASE_CITUS_ENABLED=trueandDATABASE_CITUS_SHARD_COUNT=${CITUS_SHARD_COUNT}.additionalComposeCitusArgs()(mirroringadditionalComposeEdqsArgs()) merges the Citus compose file, suppresses the postgres compose file, and starts the Citus services instead ofpostgres. Wired into all lifecycle scripts (start/stop/remove/update/install/upgrade).Inert by default
With
CITUS_ENABLED=false(the default) there is no change to existing deployments: the postgres service is used as before and noDATABASE_CITUS_*env is rendered. The Citus env lives only in the Citus compose override (not in the sharedtb-node.postgres.env), so plain-postgres setups are unaffected.Files
.env—CITUS_ENABLED,CITUS_SHARD_COUNTadvanced/docker-compose.citus.yml,advanced/docker-compose.citus.volumes.ymlcompose-utils.sh+ lifecycle scriptsREADME.mdNotes
CITUS_SHARD_COUNTmust matchDATABASE_CITUS_SHARD_COUNTand is fixed at distribution time (documented in README).DATABASE_CITUS_*support, distribution on fresh install,postgres-to-citusconversion) live in the mainthingsboard-perepo.docker compose config(enabled and disabled).