
Commit 21a9cc7

feat: initial release of sfetch
Self-hosted Salesforce-to-PostgreSQL sync pipeline. Run docker compose up, then connect any BI tool or SQL client directly to Postgres.
0 parents

78 files changed

Lines changed: 19731 additions & 0 deletions

.dockerignore

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
.env
data/
.git/
node_modules/
*.log
*.tsbuildinfo
.DS_Store
src/api/dist/
src/ui/dist/
src/ui/node_modules/

.env.example

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
# Database
POSTGRES_USER=sfdb
POSTGRES_PASSWORD= # required: no default, must be set before starting
POSTGRES_DB=sfdb
POSTGRES_PORT=7745
READONLY_PASSWORD= # optional: password for the sfdb_readonly role

# App
APP_PORT=7743
NODE_ENV=production

# Sync log retention (days)
LOG_RETENTION_DAYS=14

.gitignore

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
# Local data: never commit
data/

# Environment: never commit
.env

# Dependencies
node_modules/

# Build output
dist/
build/
.vite/

# TypeScript
*.tsbuildinfo

# Logs
*.log
npm-debug.log*

# OS
.DS_Store
Thumbs.db

# IDE
.vscode/
.idea/

# Test coverage
coverage/

# Misc
.cache/
tmp/

CLAUDE.md

Lines changed: 159 additions & 0 deletions
@@ -0,0 +1,159 @@
# sfetch: Agent Rules & Project Conventions

## What this project is

A locally run, Docker-based Salesforce-to-PostgreSQL data pipeline. The database is the product: external apps connect directly to Postgres. A React web UI handles configuration. Everything runs via `docker compose up`.

Full scope: `docs/scope.md`

---

## Commit Rules for Agents

**Agents must commit their work after completing each task.** Do not batch multiple tasks into one commit. One task = at least one commit.

### Commit format

```
<type>(<scope>): <short description>

<optional body: what changed and why>

Co-Authored-By: Claude <noreply@anthropic.com>
```

**Types:**
- `feat`: new functionality
- `fix`: bug fix
- `chore`: config, tooling, setup
- `refactor`: restructure without behavior change
- `docs`: documentation only

**Scopes:** `scaffold`, `docker`, `db`, `auth`, `bulk-api`, `ddl`, `delta-sync`, `reconciliation`, `scheduler`, `api`, `ui`, `build`

**Examples:**
```
feat(auth): add SF org token reader from ~/.sf JSON files
feat(delta-sync): implement initial full load when last_delta_sync is NULL
chore(scaffold): initialize project structure and package.json files
feat(ui): add objects page with sync toggle and row count display
```
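
The subject-line convention above is regular enough to check mechanically. A minimal sketch, assuming a hypothetical `parseCommitSubject` helper (not part of sfetch) that a commit hook or CI step could call; the allowed types and scopes are copied from the lists above.

```typescript
// Hypothetical helper, not part of sfetch: validates a commit subject line
// against "<type>(<scope>): <short description>" using the lists above.
const TYPES = ["feat", "fix", "chore", "refactor", "docs"] as const;
const SCOPES = [
  "scaffold", "docker", "db", "auth", "bulk-api", "ddl", "delta-sync",
  "reconciliation", "scheduler", "api", "ui", "build",
] as const;

type CommitType = (typeof TYPES)[number];
type CommitScope = (typeof SCOPES)[number];

interface CommitSubject {
  type: CommitType;
  scope: CommitScope;
  description: string;
}

function isCommitType(value: string): value is CommitType {
  return (TYPES as readonly string[]).includes(value);
}

function isCommitScope(value: string): value is CommitScope {
  return (SCOPES as readonly string[]).includes(value);
}

// Returns the parsed subject, or null if the line breaks the convention.
function parseCommitSubject(line: string): CommitSubject | null {
  const match = /^([a-z]+)\(([a-z-]+)\): (\S.*)$/.exec(line);
  if (match === null) return null;
  const [, type, scope, description] = match;
  if (!isCommitType(type) || !isCommitScope(scope)) return null;
  return { type, scope, description };
}
```

Using type guards instead of casts keeps the sketch in line with the "no `any`, narrow properly" rule below.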

### When to commit

- After completing a full task from the task list
- After a meaningful sub-step within a large task (e.g. after each route module, after each UI page)
- Always before starting a new task
- Never commit broken or partially wired code; a commit should represent a working unit

### What to stage

Stage specific files by name; never run `git add .` or `git add -A` blindly. Check `git status` first. Never stage:
- `data/` (git-ignored, but double-check)
- `.env` (git-ignored, but double-check)
- Unrelated files touched incidentally

---

## Branch Strategy

All work goes to `main` for this project. It is a local-only tool with a single developer. No feature branches are required unless explicitly requested.

---

## Tech Stack (quick reference)

| Layer | Technology |
|---|---|
| Database | PostgreSQL (Docker) |
| Backend | Node.js + TypeScript + Express |
| Frontend | React + TypeScript + shadcn/ui + Tailwind |
| SF Auth | Read `~/.sf/` JSON directly (no sf binary in container) |
| SF Data | jsforce + Bulk API 2.0 |
| Scheduler | node-cron |
| Containers | Docker + Docker Compose |

---

## Directory Structure

```
sf-db/
├── src/
│   ├── api/            # Express backend + sync engine
│   └── ui/             # React frontend
├── docker/             # Dockerfiles only
├── data/               # LOCAL ONLY (git-ignored)
│   ├── docker/         # Postgres volume mount
│   └── downloads/      # Future file exports
├── docs/               # Project documentation
├── docker-compose.yml
├── .env                # Git-ignored: ports, DB creds, LOG_RETENTION_DAYS
├── .env.example        # Committed: template, passwords left empty
└── CLAUDE.md
```

---

## Code Conventions

### TypeScript
- Strict mode on (`"strict": true` in tsconfig)
- No `any`: use `unknown` and narrow properly
- Prefer `interface` over `type` for object shapes
- Async/await over raw promises
- All database queries go through the pg pool; never create ad-hoc connections

### Postgres
- Synced Salesforce data → `salesforce` schema
- Internal app tables → `sfdb` schema
- Every synced table must have: `id`, `sf_created_at`, `sf_updated_at`, `sf_deleted_at`, `synced_at`
- Field names are lowercase snake_case versions of SF API names
- DDL is always idempotent (`IF NOT EXISTS` / `IF EXISTS`)
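
The naming rule can be read as a camelCase-to-snake_case conversion. A minimal sketch, assuming a hypothetical `toColumnName` helper; sfetch's real mapping may treat edge cases such as acronyms and digits differently.

```typescript
// Hypothetical helper: one reading of "lowercase snake_case versions of
// SF API names". sfetch's actual rule may differ on edge cases.
function toColumnName(apiName: string): string {
  return apiName
    // Split an acronym from a following word: "SLAExpiration" -> "SLA_Expiration"
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1_$2")
    // Split a lowercase letter or digit from a capital: "AccountId" -> "Account_Id"
    .replace(/([a-z0-9])([A-Z])/g, "$1_$2")
    .toLowerCase();
}
```

Custom-field suffixes such as `__c` pass through unchanged. Some Salesforce names (for example the `Order` object) are Postgres reserved words, so any DDL built from these names should quote identifiers.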

### Sync engine
- Always acquire `sfdb.sync_lock` before running any sync
- Always release the lock in a `finally` block; never leave it held after an error
- If `last_delta_sync` is NULL → initial full load (no SystemModstamp WHERE clause)
- Stale lock threshold: 30 minutes
- Log purge runs at the start of every sync (delete rows older than `LOG_RETENTION_DAYS`)
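
The lock rules above (acquire first, release in `finally`, treat locks older than 30 minutes as stale) can be sketched as a wrapper around each sync run. This assumes a pg-style `query(text, params)` method and hypothetical `id`, `locked_at`, and `locked_by` columns on `sfdb.sync_lock`; the real table layout is not shown in this document.

```typescript
// Minimal slice of a pg Pool/Client used here: query() resolving to a result
// with rowCount. Table columns (id, locked_at, locked_by) are hypothetical.
interface Queryable {
  query(text: string, params?: unknown[]): Promise<{ rowCount: number | null }>;
}

const STALE_LOCK_MINUTES = 30;

// Takes the single-row lock only if it is free or older than the threshold.
const ACQUIRE_SQL = `
  UPDATE sfdb.sync_lock
     SET locked_at = now(), locked_by = $1
   WHERE id = 1
     AND (locked_at IS NULL OR locked_at < now() - make_interval(mins => $2))`;

// Releases only our own lock, in case a stale one was reclaimed by someone else.
const RELEASE_SQL = `
  UPDATE sfdb.sync_lock
     SET locked_at = NULL, locked_by = NULL
   WHERE id = 1 AND locked_by = $1`;

// Runs `run` while holding the lock; returns null if another sync holds it.
async function withSyncLock<T>(
  db: Queryable,
  owner: string,
  run: () => Promise<T>,
): Promise<T | null> {
  const acquired = await db.query(ACQUIRE_SQL, [owner, STALE_LOCK_MINUTES]);
  if (acquired.rowCount !== 1) return null;
  try {
    return await run();
  } finally {
    // Executes after success and after a thrown error alike.
    await db.query(RELEASE_SQL, [owner]);
  }
}
```

Because the "is it free?" check and the claim happen in a single `UPDATE`, two processes cannot both see the lock as free. This variant reclaims a stale lock at acquire time; the README describes reclaiming stale locks on startup, which would be a separate step.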

### API
- All routes under `/api/` prefix
- Non-API routes serve the React SPA (`dist/index.html`)
- Return a consistent error shape: `{ error: string, details?: unknown }`
- No authentication on the API: local-only tool, localhost only
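
A small helper can keep every error response in the documented shape while following the "no `any`" rule by narrowing `unknown`. This is a hypothetical sketch; in Express it would typically be called from a four-argument error-handling middleware, e.g. `res.status(500).json(toApiError(err))`.

```typescript
// Hypothetical helper: maps any thrown value to the documented
// { error: string, details?: unknown } response body without using `any`.
interface ApiErrorBody {
  error: string;
  details?: unknown;
}

function toApiError(err: unknown): ApiErrorBody {
  if (err instanceof Error) {
    // Only the message is exposed; the stack trace is not included.
    return { error: err.message };
  }
  if (typeof err === "string") {
    return { error: err };
  }
  // Non-Error throwables are passed through as details for debugging.
  return { error: "Unknown error", details: err };
}
```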

### React / UI
- Components in `src/ui/src/components/`
- Pages in `src/ui/src/pages/`
- API calls through a single typed client (`src/ui/src/lib/api.ts`)
- Use shadcn/ui components; do not build primitives from scratch
- Confirmation modals required before any destructive action (drop column, drop table)
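
One way to shape the single typed client is a small `request` function plus named endpoint methods. A sketch, assuming hypothetical `/api/objects` endpoints and a hypothetical `SyncedObject` shape; the fetch function is injected so the client can be exercised without a browser, and the browser's global `fetch` should satisfy the `Fetcher` type.

```typescript
// Hypothetical sketch of src/ui/src/lib/api.ts; endpoint paths and the
// SyncedObject shape are assumptions, not sfetch's actual API.
interface FetchResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

interface FetchInit {
  method?: string;
  headers?: Record<string, string>;
  body?: string;
}

type Fetcher = (url: string, init?: FetchInit) => Promise<FetchResponse>;

interface SyncedObject {
  name: string;
  enabled: boolean;
}

// Narrows an unknown JSON body to the server's { error, details? } shape.
function hasErrorMessage(body: unknown): body is { error: string } {
  return (
    typeof body === "object" &&
    body !== null &&
    typeof (body as { error?: unknown }).error === "string"
  );
}

async function request<T>(fetcher: Fetcher, path: string, init?: FetchInit): Promise<T> {
  const res = await fetcher(`/api${path}`, init);
  const body = await res.json();
  if (!res.ok) {
    throw new Error(hasErrorMessage(body) ? body.error : `HTTP ${res.status}`);
  }
  // Trusts the server to return T; a runtime schema check could go here.
  return body as T;
}

function createApiClient(fetcher: Fetcher) {
  return {
    listObjects: () => request<SyncedObject[]>(fetcher, "/objects"),
    setSyncEnabled: (name: string, enabled: boolean) =>
      request<SyncedObject>(fetcher, `/objects/${encodeURIComponent(name)}`, {
        method: "PATCH",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ enabled }),
      }),
  };
}
```

Surfacing the server's `error` string keeps UI messages consistent with the API error shape above. A non-JSON error response (for example an HTML proxy page) would make `res.json()` reject, which a fuller client might handle separately.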

---

## Environment Variables

Only bootstrap values (those needed before the DB exists) live in `.env`.

```env
POSTGRES_USER=sfdb
POSTGRES_PASSWORD=changeme
POSTGRES_PORT=7745
APP_PORT=7743
LOG_RETENTION_DAYS=14
```

All runtime config (active org alias, sync intervals, enabled objects/fields) lives in the `sfdb` schema in the database.

---

## Key Design Decisions (do not revisit without good reason)

- **sf CLI binary is NOT in the Docker image.** Auth tokens are read directly from the `~/.sf/` JSON files mounted into the container; `sf org display` is never run.
- **The API is not a data API.** It serves the UI and orchestrates syncs only. External tools connect directly to Postgres.
- **Deletions are soft.** `sf_deleted_at` is set; records are never hard-deleted from the local DB.
- **Bulk API 2.0 by default.** REST query fallback only for objects under 2,000 records.
- **Config in DB, not `.env`.** `.env` is infrastructure only. Org alias, object selection, field selection, and schedule config all live in `sfdb.app_config` / `sfdb.sync_config` / `sfdb.field_config`.
- **One active org at a time.** Multi-org simultaneous sync is out of scope for v1.
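
The Bulk-versus-REST rule reduces to a threshold check. A minimal sketch with a hypothetical `chooseQueryApi` helper; how sfetch learns an object's record count beforehand is not described here, so the count is simply taken as an input.

```typescript
// Hypothetical helper encoding "Bulk API 2.0 by default; REST query fallback
// only for objects under 2,000 records". The record count is an input.
const REST_FALLBACK_LIMIT = 2000;

type QueryApi = "bulk2" | "rest";

function chooseQueryApi(recordCount: number): QueryApi {
  return recordCount < REST_FALLBACK_LIMIT ? "rest" : "bulk2";
}
```

Keeping the cutoff in one named constant makes it easy to find if it is ever tuned.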

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Seth Sheppard

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
# sfetch

A self-hosted Salesforce-to-PostgreSQL sync pipeline. Run it with `docker compose up`, point a BI tool or SQL client at the local Postgres instance, and query your Salesforce data like a normal database.

**The database is the product.** The web UI configures which objects and fields to sync. External tools connect directly to Postgres; there is no intermediate API.

---

> **Security note:** The web UI and API have no authentication. This is intentional: it is a localhost-only tool. Both services bind to `127.0.0.1` and must not be exposed to a network. Do not run this on a shared or internet-accessible host without adding an auth layer.

---

## Features

- Configure sync from a web UI, with no config files to edit
- Select individual objects and fields to sync
- Delta sync (frequent; catches creates and updates) plus full ID reconciliation (nightly; catches hard deletes)
- Soft deletes: records deleted in Salesforce get `sf_deleted_at` set and are never hard-deleted locally
- Salesforce Bulk API 2.0 for large volumes; REST fallback for small objects (under 2,000 records)
- Auth via `~/.sfdx` files, so no Salesforce credentials are stored in the project
- Sync logs with per-object record counts, errors, and duration

## Prerequisites

- [Docker Desktop](https://www.docker.com/products/docker-desktop/)
- [Salesforce CLI](https://developer.salesforce.com/tools/salesforcecli) (`sf`) with at least one org authenticated

## Quick start

```bash
# 1. Authenticate a Salesforce org (skip if already done)
sf org login web --alias my-org

# 2. Configure environment
cp .env.example .env
# Edit .env: set POSTGRES_PASSWORD at minimum

# 3. Start
docker compose up -d

# 4. Open the UI (`open` is macOS-only; elsewhere, visit the URL in a browser)
open http://localhost:7743
```

The first start takes about 30 seconds while Postgres initializes and the API container builds.

The onboarding screen detects your authenticated orgs and asks you to pick one. Then go to the Objects page and enable the Salesforce objects you want to sync.

## Connect a BI tool or SQL client

Once data is syncing, connect any Postgres-compatible tool directly:

| Setting  | Default value         |
|----------|-----------------------|
| Host     | `localhost`           |
| Port     | `7745`                |
| Database | `sfdb`                |
| Schema   | `salesforce`          |
| User     | `sfdb`                |
| Password | *(your `.env` value)* |

The Settings page in the UI shows a copyable connection string.

A read-only role is also available: set `READONLY_PASSWORD` in `.env` and connect as user `sfdb_readonly`.
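
Many clients also accept a single connection URI. A sketch of assembling one from the settings in the table, using a hypothetical `toConnectionUri` helper; the password is percent-encoded so characters such as `@` or `/` cannot break the URI.

```typescript
// Hypothetical helper: builds a postgresql:// URI from the settings above.
interface PgConnection {
  host: string;
  port: number;
  database: string;
  user: string;
  password: string;
}

function toConnectionUri(c: PgConnection): string {
  // Percent-encode user, password and database so reserved characters are safe.
  const user = encodeURIComponent(c.user);
  const password = encodeURIComponent(c.password);
  const database = encodeURIComponent(c.database);
  return `postgresql://${user}:${password}@${c.host}:${c.port}/${database}`;
}
```

Once connected, synced tables live in the `salesforce` schema, so a query might look like `SELECT id, name FROM salesforce.account WHERE sf_deleted_at IS NULL` (assuming the `Name` field is enabled for `Account`).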

## Ports

| Service    | Default | Set via         |
|------------|---------|-----------------|
| UI + API   | `7743`  | `APP_PORT`      |
| PostgreSQL | `7745`  | `POSTGRES_PORT` |

Both defaults are chosen to avoid conflicts with common local services, such as a host Postgres on its standard port `5432`.

## Environment variables

`.env` holds only bootstrap config: values needed before the database exists. All runtime config (active org, sync intervals, enabled objects) lives in the database.

```env
POSTGRES_USER=sfdb
POSTGRES_PASSWORD= # required
POSTGRES_DB=sfdb
POSTGRES_PORT=7745
READONLY_PASSWORD= # optional read-only role password

APP_PORT=7743
NODE_ENV=production

LOG_RETENTION_DAYS=14 # days of sync logs to keep
```

Copy `.env.example` to `.env` and set at least `POSTGRES_PASSWORD`.
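
A service reading these variables would typically validate them once at startup. A minimal sketch, assuming a hypothetical `loadBootstrapConfig` helper (not sfetch's actual code) that applies the documented defaults and fails fast when `POSTGRES_PASSWORD` is empty.

```typescript
// Hypothetical sketch, not sfetch's code: validates the bootstrap variables,
// applies the documented defaults, and fails fast without POSTGRES_PASSWORD.
type Env = Record<string, string | undefined>;

interface BootstrapConfig {
  postgresUser: string;
  postgresPassword: string;
  postgresDb: string;
  postgresPort: number;
  readonlyPassword: string | null; // null: read-only role not configured
  appPort: number;
  logRetentionDays: number;
}

function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = (env[name] ?? "").trim();
  if (raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadBootstrapConfig(env: Env): BootstrapConfig {
  const password = env.POSTGRES_PASSWORD ?? "";
  if (password.trim() === "") {
    throw new Error("POSTGRES_PASSWORD is required; set it in .env");
  }
  const readonlyPassword = env.READONLY_PASSWORD ?? "";
  return {
    postgresUser: env.POSTGRES_USER || "sfdb",
    postgresPassword: password,
    postgresDb: env.POSTGRES_DB || "sfdb",
    postgresPort: readPositiveInt(env, "POSTGRES_PORT", 7745),
    readonlyPassword: readonlyPassword.trim() === "" ? null : readonlyPassword,
    appPort: readPositiveInt(env, "APP_PORT", 7743),
    logRetentionDays: readPositiveInt(env, "LOG_RETENTION_DAYS", 14),
  };
}
```

In Node this would be called as `loadBootstrapConfig(process.env)`; taking a plain record instead of reading `process.env` directly keeps it testable.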

## How sync works

### Delta sync (default: every hour)

Queries `WHERE SystemModstamp >= last_delta_sync` via Bulk API 2.0, streams the CSV result, and batch-upserts the rows into Postgres. On the first sync (`last_delta_sync` is NULL) there is no WHERE clause, so all records are pulled.
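
Both halves of a delta sync can be sketched as pure query builders. This is not sfetch's code: the helper names are hypothetical, Bulk API job handling and CSV streaming (via jsforce) are omitted, and the upsert assumes the caller has already mapped CSV columns to the snake_case column names.

```typescript
// Hypothetical sketch, not sfetch's code: builds the delta-sync SOQL and a
// parameterized batch upsert. Job submission and CSV streaming are omitted.

// SOQL datetime literals are unquoted ISO 8601; milliseconds are dropped.
function soqlDateTime(d: Date): string {
  return d.toISOString().replace(/\.\d{3}Z$/, "Z");
}

function buildDeltaSoql(objectName: string, fields: string[], lastDeltaSync: Date | null): string {
  // Id, CreatedDate and SystemModstamp back id, sf_created_at and sf_updated_at.
  const select = Array.from(new Set(["Id", "CreatedDate", "SystemModstamp", ...fields]));
  const base = `SELECT ${select.join(", ")} FROM ${objectName}`;
  // NULL watermark means initial full load: no WHERE clause.
  return lastDeltaSync === null
    ? base
    : `${base} WHERE SystemModstamp >= ${soqlDateTime(lastDeltaSync)}`;
}

const quoteIdent = (name: string): string => `"${name.replace(/"/g, '""')}"`;

// One INSERT ... ON CONFLICT statement per batch; columns must include "id".
function buildUpsert(
  schema: string,
  table: string,
  columns: string[],
  rows: unknown[][],
): { text: string; values: unknown[] } {
  if (rows.length === 0) throw new Error("empty batch");
  const values: unknown[] = [];
  const tuples = rows.map((row) => {
    if (row.length !== columns.length) throw new Error("row/column count mismatch");
    const slots = row.map((v) => {
      values.push(v);
      return `$${values.length}`;
    });
    return `(${[...slots, "now()"].join(", ")})`; // last slot fills synced_at
  });
  const cols = [...columns, "synced_at"].map(quoteIdent);
  const updates = cols
    .filter((c) => c !== quoteIdent("id"))
    .map((c) => `${c} = EXCLUDED.${c}`);
  const text =
    `INSERT INTO ${quoteIdent(schema)}.${quoteIdent(table)} (${cols.join(", ")}) ` +
    `VALUES ${tuples.join(", ")} ` +
    `ON CONFLICT (${quoteIdent("id")}) DO UPDATE SET ${updates.join(", ")}`;
  return { text, values };
}
```

Because the filter is `>=`, records stamped exactly at the watermark are fetched again; the upsert makes that repeat harmless. Batch sizes must keep `rows × columns` under Postgres's limit of 65,535 bind parameters per statement.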

### Full ID reconciliation (default: nightly)

Queries `SELECT Id FROM <Object>` for the full set of live IDs, diffs it against local rows, and sets `sf_deleted_at` on any local row whose ID is no longer present. This is the only way to catch hard deletes, merges, and cascade deletes.
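
The diff step is a set difference between locally live IDs and the IDs Salesforce returns. A minimal sketch with hypothetical `findDeletedIds` and `buildSoftDelete` helpers; the real implementation may stream IDs instead of holding them all in memory.

```typescript
// Hypothetical sketch of the diff step in full ID reconciliation. Both ID
// lists must use the same 18-character form, since comparison is exact.
function findDeletedIds(localLiveIds: readonly string[], remoteIds: readonly string[]): string[] {
  const remote = new Set(remoteIds);
  // Locally live (sf_deleted_at IS NULL) but missing from Salesforce: deleted.
  return localLiveIds.filter((id) => !remote.has(id));
}

const quoteIdent = (name: string): string => `"${name.replace(/"/g, '""')}"`;

// Soft delete: only stamps rows that are not already marked deleted.
// node-postgres sends a JS string array as a Postgres array parameter.
function buildSoftDelete(schema: string, table: string, ids: string[]): { text: string; values: unknown[] } {
  return {
    text:
      `UPDATE ${quoteIdent(schema)}.${quoteIdent(table)} SET sf_deleted_at = now() ` +
      `WHERE id = ANY($1::text[]) AND sf_deleted_at IS NULL`,
    values: [ids],
  };
}
```

The `sf_deleted_at IS NULL` guard keeps the original deletion timestamp when reconciliation runs again on an already-deleted row.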

### Concurrency

Only one sync runs at a time. A single-row lock table (`sfdb.sync_lock`) prevents overlap. Stale locks (held longer than 30 minutes) are automatically reclaimed on startup.

## Database schema

**`salesforce` schema**: one table per enabled Salesforce object, e.g. `salesforce.account`

| Column | Type | Notes |
|---|---|---|
| `id` | `text PRIMARY KEY` | 18-character Salesforce ID |
| *(enabled fields)* | *(mapped type)* | Lowercase snake_case versions of the API names |
| `sf_created_at` | `timestamptz` | Salesforce `CreatedDate` |
| `sf_updated_at` | `timestamptz` | Salesforce `SystemModstamp` |
| `sf_deleted_at` | `timestamptz NULL` | NULL = live; set when a deletion is detected |
| `synced_at` | `timestamptz` | When this tool last wrote the row |
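
The table layout above maps naturally onto idempotent DDL. A minimal sketch, assuming a hypothetical `buildTableDdl` helper and an illustrative subset of Salesforce field types; sfetch's real type mapping is not shown in this document.

```typescript
// Hypothetical sketch (not sfetch's DDL module): idempotent DDL for one synced
// object. The SF-to-Postgres type map is an illustrative subset only.
const quoteIdent = (name: string): string => `"${name.replace(/"/g, '""')}"`;

const PG_TYPES: Record<string, string> = {
  id: "text", reference: "text", string: "text", picklist: "text", textarea: "text",
  boolean: "boolean", int: "integer", double: "double precision",
  currency: "numeric", percent: "numeric", date: "date", datetime: "timestamptz",
};

interface FieldSpec {
  column: string; // already-converted snake_case column name
  sfType: string; // Salesforce describe type, e.g. "currency"
}

function buildTableDdl(table: string, fields: FieldSpec[]): string[] {
  const qualified = `${quoteIdent("salesforce")}.${quoteIdent(table)}`;
  const statements = [
    `CREATE TABLE IF NOT EXISTS ${qualified} (` +
      `"id" text PRIMARY KEY, "sf_created_at" timestamptz, "sf_updated_at" timestamptz, ` +
      `"sf_deleted_at" timestamptz NULL, "synced_at" timestamptz)`,
  ];
  for (const f of fields) {
    const pgType = PG_TYPES[f.sfType] ?? "text"; // unknown types fall back to text
    statements.push(`ALTER TABLE ${qualified} ADD COLUMN IF NOT EXISTS ${quoteIdent(f.column)} ${pgType}`);
  }
  return statements;
}
```

Quoting every identifier lets objects like `Order` become `salesforce."order"` even though `order` is a reserved word, and the `IF NOT EXISTS` clauses let the same statements run safely on every sync.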

**`sfdb` schema**: internal app tables (sync config, logs, lock, field metadata)

## Tech stack

| Layer | Technology |
|---|---|
| Database | PostgreSQL 16 |
| Backend | Node.js + TypeScript + Express |
| Frontend | React + TypeScript + shadcn/ui + Tailwind |
| Salesforce auth | `~/.sfdx` files read directly via Node `fs` (no `sf` binary in container) |
| Salesforce data | jsforce + Bulk API 2.0 |
| Scheduling | node-cron |
| Containers | Docker + Docker Compose |

## Stopping and data persistence

```bash
docker compose down      # stop containers; data persists
docker compose down -v   # also removes named and anonymous Docker volumes
```

Postgres data lives in `data/docker/postgres/` on the host (git-ignored), so it survives stop/start cycles. Because it is a host directory rather than a named volume, `docker compose down -v` does not delete it; to reset all data, stop the containers and remove `data/docker/postgres/`.

## Rebuilding after code changes

```bash
docker compose down
docker compose build
docker compose up -d
```

## License

MIT; see [LICENSE](LICENSE).
