A locally-run Docker-based Salesforce-to-PostgreSQL data pipeline. The database is the product. External apps connect directly to Postgres. A React web UI handles configuration. Everything runs via docker compose up.
Full scope: docs/scope.md
Agents must commit their work after completing each task. Do not batch multiple tasks into one commit. One task = one commit (minimum).
<type>(<scope>): <short description>
<optional body — what changed and why>
Co-Authored-By: Claude <noreply@anthropic.com>
Types:
feat— new functionalityfix— bug fixchore— config, tooling, setuprefactor— restructure without behavior changedocs— documentation only
Scopes: scaffold, docker, db, auth, bulk-api, ddl, delta-sync, reconciliation, scheduler, api, ui, build
Examples:
feat(auth): add SF org token reader from ~/.sf JSON files
feat(delta-sync): implement initial full load when last_delta_sync is NULL
chore(scaffold): initialize project structure and package.json files
feat(ui): add objects page with sync toggle and row count display
- After completing a full task from the task list
- After a meaningful sub-step within a large task (e.g. after each route module, after each UI page)
- Always before starting a new task
- Never commit broken or partially-wired code — a commit should represent a working unit
Stage specific files by name — never git add . or git add -A blindly. Check git status first. Never stage:
data/(git-ignored, but double-check).env(git-ignored, but double-check)- Unrelated files touched incidentally
All work goes to main for this project. It is a local-only tool with a single developer. No feature branches required unless explicitly requested.
| Layer | Technology |
|---|---|
| Database | PostgreSQL (Docker) |
| Backend | Node.js + TypeScript + Express |
| Frontend | React + TypeScript + shadcn/ui + Tailwind |
| SF Auth | Read ~/.sf/ JSON directly (no sf binary in container) |
| SF Data | jsforce + Bulk API 2.0 |
| Scheduler | node-cron |
| Containers | Docker + Docker Compose |
sf-db/
├── src/
│ ├── api/ # Express backend + sync engine
│ └── ui/ # React frontend
├── docker/ # Dockerfiles only
├── data/ # LOCAL ONLY — git-ignored
│ ├── docker/ # Postgres volume mount
│ └── downloads/ # Future file exports
├── docs/ # Project documentation
├── docker-compose.yml
├── .env # Git-ignored — ports, DB creds, LOG_RETENTION_DAYS
├── .env.example # Committed — template with empty values
└── CLAUDE.md
- Strict mode on (
"strict": truein tsconfig) - No
any— useunknownand narrow properly - Prefer
interfaceovertypefor object shapes - Async/await over raw promises
- All database queries go through the pg pool — never create ad-hoc connections
- Internal app tables →
sfdbschema - Synced Salesforce data → one schema per registered org named
org_<lowercased orgid>(e.g.org_00d5g000001abcdeaa) - All
sfdb.*per-object/per-field tables (sync_config,field_config,field_metadata,sync_log,sync_lock) are keyed by(org_id, ...)withON DELETE CASCADEfromsfdb.orgs - The active UI/sync context is stored in
sfdb.active_org(single row); the API resolves it fromX-Org-Idrequest header first, falling back to that pointer - Every synced table must have:
id,sf_created_at,sf_updated_at,sf_deleted_at,synced_at - Field names are lowercase snake_case versions of SF API names
- DDL is always idempotent (
IF NOT EXISTS/IF EXISTS); identifiers are always quoted (objects likeOrder/Usercollide with PG reserved words)
- Every sync entry point takes
orgIdas its primary key; alias is only used to look up an~/.sfdxtoken viasfdb.orgs sfdb.sync_lockis per-org (one row per registered org). Acquire before any sync; always release in afinallyblock- Different orgs sync in parallel; one sync per org is serialized via that org's lock
- If
last_delta_syncis NULL → initial full load (no SystemModstamp WHERE clause) - Stale lock threshold: 30 minutes
- Log purge runs at the start of every sync (delete rows older than
LOG_RETENTION_DAYS) - The cron scheduler runs as one process with two ticks (delta per minute, full daily 02:00) that iterate every registered org
- All routes under
/api/prefix - Non-API routes serve the React SPA (
dist/index.html) - Return consistent error shape:
{ error: string, details?: unknown } - No authentication on API — local-only tool, localhost only
- Components in
src/ui/src/components/ - Pages in
src/ui/src/pages/ - API calls through a single typed client (
src/ui/src/lib/api.ts) - Use shadcn/ui components — do not build primitives from scratch
- Confirmation modals required before any destructive action (drop column, drop table)
Only bootstrap values live in .env — values needed before the DB exists.
POSTGRES_USER=sfdb
POSTGRES_PASSWORD=changeme
POSTGRES_PORT=7745
APP_PORT=7743
LOG_RETENTION_DAYS=14All runtime config (active org alias, sync intervals, enabled objects/fields) lives in the sfdb schema in the database.
- sf CLI binary is NOT in the Docker image. Auth tokens are read directly from the
~/.sfdx/JSON files mounted into the container. Nosf org displaycommand. - The API is not a data API. It serves the UI and orchestrates syncs only. External tools connect directly to Postgres.
- Deletions are soft.
sf_deleted_atis set — records are never hard-deleted from the local DB. - Bulk API 2.0 by default. REST query fallback only for objects under 2,000 records.
- Config in DB, not
.env..envis infrastructure only. Org registry, object selection, field selection, and schedule config all live insfdb.orgs/sfdb.sync_config/sfdb.field_config/sfdb.app_config. - Multi-org by schema. Every registered org gets its own
org_<orgid>schema. Removing an org drops the schema and cascades throughsfdb.*via the FKs onsfdb.orgs(org_id). - Schema name is derived from the immutable Salesforce org id, not the user-editable alias — aliases can be renamed without affecting where the data lives.