Skip to content

Add optional Citus (distributed PostgreSQL) support to advanced compose setup#94

Open
ViacheslavKlimov wants to merge 2 commits into
masterfrom
feature/citus
Open

Add optional Citus (distributed PostgreSQL) support to advanced compose setup#94
ViacheslavKlimov wants to merge 2 commits into
masterfrom
feature/citus

Conversation

@ViacheslavKlimov
Copy link
Copy Markdown
Member

Adds an opt-in CITUS_ENABLED flag that, in the advanced setup with DATABASE=postgres (PE only), replaces the single postgres service with a Citus cluster.

What it does (when CITUS_ENABLED=true)

  • Brings up citus-coordinator + citus-worker-1/2 (citusdata/citus:12.1, shared_preload_libraries=citus, healthchecks) plus citus-manager (citusdata/membership-manager) which auto-registers the labelled workers with the coordinator — no manual citus_add_node.
  • Points tb-node services at the coordinator and passes DATABASE_CITUS_ENABLED=true and DATABASE_CITUS_SHARD_COUNT=${CITUS_SHARD_COUNT}.
  • additionalComposeCitusArgs() (mirroring additionalComposeEdqsArgs()) merges the Citus compose file, suppresses the postgres compose file, and starts the Citus services instead of postgres. Wired into all lifecycle scripts (start/stop/remove/update/install/upgrade).

Inert by default

With CITUS_ENABLED=false (the default) there is no change to existing deployments: the postgres service is used as before and no DATABASE_CITUS_* env is rendered. The Citus env lives only in the Citus compose override (not in the shared tb-node.postgres.env), so plain-postgres setups are unaffected.

Files

  • .envCITUS_ENABLED, CITUS_SHARD_COUNT
  • advanced/docker-compose.citus.yml, advanced/docker-compose.citus.volumes.yml
  • compose-utils.sh + lifecycle scripts
  • README.md

Notes

  • CITUS_SHARD_COUNT must match DATABASE_CITUS_SHARD_COUNT and is fixed at distribution time (documented in README).
  • The companion backend changes (DATABASE_CITUS_* support, distribution on fresh install, postgres-to-citus conversion) live in the main thingsboard-pe repo.
  • Validated with docker compose config (enabled and disabled).

Copy link
Copy Markdown
Member Author

@ViacheslavKlimov ViacheslavKlimov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review summary

Reviewed 13 changed files in Add optional Citus (distributed PostgreSQL) support to advanced compose setup. Left 5 comments inline.

The flag wiring is clean and mirrors the existing additionalComposeEdqsArgs() pattern well, and the inert-by-default behavior holds (with CITUS_ENABLED=false nothing changes). The two main concerns: suppressing the postgres service breaks docker compose when Trendz is also enabled, and the documented postgres→Citus conversion appears to start from an empty coordinator data directory.

Additional findings

These observations are about existing files outside the PR's diff — spotted while reading surrounding context.

  • .gitignore — The new Citus bind mounts write live PostgreSQL data to tb-node/citus/coordinator, tb-node/citus/worker-1, and tb-node/citus/worker-2, but .gitignore only excludes tb-node/postgres/** (plus the cassandra/valkey dirs). These Citus data directories aren't ignored, so a git add in a user's checkout could accidentally commit database files. Consider adding tb-node/citus/**.
  • docker-install-tb.sh (install timing) — On a fresh CITUS_ENABLED=true install, up -d starts coordinator/workers/manager and the very next command runs INSTALL_TB=true against the coordinator. citus-manager registers the workers asynchronously after the coordinator becomes healthy. Is anything guaranteeing the workers are registered before the schema is distributed? If create_distributed_table runs before registration, shards could land only on the coordinator. (May well be handled on the backend — flagging in case it isn't.)

This review was auto-generated. Findings may contain errors — please verify before applying changes.

Comment thread compose-utils.sh
ADDITIONAL_COMPOSE_ARGS="-f docker-compose.postgres.yml"
if [ "$CITUS_ENABLED" = true ] && [ "$TB_SETUP" = advanced ]; then
# Citus replaces the single postgres service (see additionalComposeCitusArgs).
ADDITIONAL_COMPOSE_ARGS=""
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppressing the postgres service here collides with Trendz. docker-compose.trendz.yml still declares depends_on: - postgres (and trendz.env points SPRING_DATASOURCE_URL at host postgres, which trendz-db-init.sh also connects to). With CITUS_ENABLED=true and TRENDZ_ENABLED=true, the merged project includes trendz.yml but not postgres.yml, so postgres is undefined and docker compose fails the project validation with service "trendz" depends on undefined service "postgres" — breaking start/stop/install/upgrade. Since docker-install-trendz.sh/docker-upgrade-trendz.sh were updated to also pass ${ADDITIONAL_COMPOSE_CITUS_ARGS}, the two are clearly meant to coexist, but as wired they can't. Trendz would also need its datasource pointed at citus-coordinator instead of postgres.

POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
volumes:
- ../tb-node/citus/coordinator:/var/lib/postgresql/data
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The coordinator binds a fresh ../tb-node/citus/coordinator directory, separate from the existing single-postgres data in ../tb-node/postgres. For the postgres-to-citus conversion documented in the README, the coordinator comes up empty — the existing data is never mounted or reachable — so the upgrade would run against an empty database rather than converting data in place. Is there a missing step to migrate the existing data (or point the coordinator at the old tb-node/postgres dir) before converting?

restart: always
image: "citusdata/membership-manager:0.3.0"
volumes:
- /var/run/docker.sock:/var/run/docker.sock
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mounting the host Docker socket into citus-manager gives that container root-equivalent access to the host. It's the standard membership-manager pattern, so this may be acceptable, but worth calling out as an added attack surface for a production deployment.

- citus-worker-2-data:/var/lib/postgresql/data

volumes:
citus-coordinator-data:
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deviates from the other *.volumes.yml files: postgres.volumes.yml, edqs.volumes.yml, and docker-compose.volumes.yml all use external: name: ${SOME_VOLUME} with an env-var indirection, whereas this hardcodes the volume names and uses the short external: true form (and there are no matching env vars in .env). Worth aligning with the existing convention. Note also that, like the sibling volumes files, this one isn't referenced by any script, so it is currently inert.

Comment thread README.md
When `CITUS_ENABLED=false` (the default) the deployment uses the single `postgres` service exactly as before; the Citus
files are inert.

A **fresh install** with `CITUS_ENABLED=true` distributes the schema automatically. To **convert an existing single-postgres
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conversion procedure looks incomplete given the volume layout. With CITUS_ENABLED=true the coordinator uses the fresh ../tb-node/citus/coordinator data dir and the old postgres service isn't started, so the existing data in ../tb-node/postgres is never reachable. Running --fromVersion=postgres-to-citus against an empty coordinator wouldn't convert the existing database. If a manual data migration (or repointing the coordinator at the old data dir) is required first, that step should be documented here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant