Portable Over Unreliable Changes Handler that pipes CouchDB / Sync Gateway / Capella _changes feeds into clean, reliable downstream pipelines.
A production-ready, async Python 3 processor for the Couchbase _changes feed. It connects to Sync Gateway, Capella App Services, Couchbase Edge Server, or Apache CouchDB, consumes document changes via longpoll or continuous streaming, and forwards them to a downstream consumer: stdout, an HTTP endpoint, an RDBMS (PostgreSQL, MySQL, MS SQL, Oracle), or cloud blob storage (AWS S3, MinIO, S3-compatible).
Built for real-world workloads: multi-job pipelines, checkpoint management so you never re-process, schema mapping with 58 transform functions, throttled feed consumption for large datasets, configurable retry with exponential backoff, and full async concurrency control.
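As an illustration of the retry behavior mentioned above, here is a minimal sketch of exponential backoff around an async operation. The names `backoff_delays` and `call_with_retry`, and all default values, are hypothetical and chosen for the example; they are not the worker's actual API or configuration keys.

```python
import asyncio
import random


def backoff_delays(base=1.0, factor=2.0, max_delay=60.0, retries=5):
    """Yield the delay before each retry: base, base*factor, ... capped at max_delay."""
    delay = base
    for _ in range(retries):
        yield min(delay, max_delay)
        delay *= factor


async def call_with_retry(op, *, retries=5, base=1.0, factor=2.0,
                          max_delay=60.0, jitter=0.0, sleep=asyncio.sleep):
    """Await op() until it succeeds or the retry budget is exhausted."""
    delays = list(backoff_delays(base, factor, max_delay, retries))
    last_error = None
    for attempt in range(retries + 1):
        try:
            return await op()
        except Exception as exc:  # a real worker would catch narrower error types
            last_error = exc
            if attempt == retries:
                break
            # Optional random jitter spreads out retries from parallel jobs
            await sleep(delays[attempt] + random.uniform(0, jitter))
    raise last_error
```

Passing `sleep` as a parameter keeps the sketch easy to test without real waiting; in normal use the default `asyncio.sleep` applies.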
When attachment processing is enabled, the pipeline includes an attachment stage:
v2.0 is a major architecture redesign that replaces the monolithic config.json with a job-centric, composable document model stored in Couchbase Lite collections.
- Multi-job pipelines: Run multiple independent `_changes` feed pipelines concurrently, each with its own source, output, schema mapping, and checkpoint.
- Reusable inputs & outputs: Define a source once, wire it to multiple outputs via jobs. No more duplicating config.
- Job lifecycle control: Start, stop, restart, and monitor individual jobs via REST API or the dashboard.
- PipelineManager: Thread-per-job orchestrator with crash detection, exponential-backoff restart, and graceful shutdown.
- v1.x auto-migration: Existing `config.json` is automatically migrated to the new model on first startup.
Full architecture details: docs/DESIGN_2_0.md
Release notes: RELEASE_NOTES.md
┌──────────────────────┐           ┌────────────────────┐           ┌───────────────────────┐
│ Sync Gateway /       │           │                    │           │ HTTP Endpoint         │
│ App Services /       │ ◄──GET─── │ changes_worker     │ ───PUT─── │ (any REST API)        │
│ Edge Server /        │ _changes  │                    │   POST    ├───────────────────────┤
│ CouchDB              │ ───JSON─► │ • Schema Mapping   │   DELETE  │ RDBMS                 │
│                      │           │ • Serialize        │ ────────► │ (Postgres/MySQL/      │
│ /{db}/_changes       │           │ • Checkpoint       │           │  MSSQL/Oracle)        │
│                      │           │ • Dead Letter Q    │           ├───────────────────────┤
│                      │           │ • Attachments      │           │ Cloud Storage         │
│                      │           │                    │           │ (AWS S3/MinIO)        │
│                      │           │                    │           ├───────────────────────┤
│                      │           │                    │           │ stdout                │
└──────────────────────┘           └────────────────────┘           └───────────────────────┘
- Consume: Longpoll, continuous, or WebSocket `_changes` feed with auto-reconnect
- Filter: Skip deletes, removes, or limit to specific channels
- Fetch: Bulk or individual doc fetching when `include_docs=false`
- Map: Schema mappings transform JSON documents into SQL rows, remapped JSON, etc.
- Attachments (optional): Detect, fetch, and upload document attachments to cloud storage, HTTP, or filesystem
- Forward: Serialize (JSON, XML, msgpack, etc.) and send to stdout, HTTP, RDBMS, or S3
- Checkpoint: Save `last_seq` as a `_local/` doc so restarts resume exactly where they left off
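The steps above can be sketched for a single batch as follows. This is a simplified illustration, not the worker's code: `process_batch`, the flat field-to-column `mapping` format, and the shape of the returned checkpoint are assumptions made for the example. The input follows the standard `_changes` response layout (`results` entries with `seq`, `id`, `changes`, optional `deleted` and `doc`, plus a top-level `last_seq`).

```python
import json


def process_batch(feed_response, mapping, skip_deletes=True):
    """Filter, map, and serialize one _changes response.

    Returns (serialized_rows, checkpoint).
    """
    rows = []
    for change in feed_response.get("results", []):
        if skip_deletes and change.get("deleted"):
            continue  # Filter: drop deleted documents
        doc = change.get("doc")
        if doc is None:
            # Fetch: with include_docs=false the body would be loaded here
            continue
        # Map: copy each source field to its target column (hypothetical format)
        row = {column: doc.get(field) for field, column in mapping.items()}
        # Forward: serialize as JSON; a real output would send it downstream
        rows.append(json.dumps(row, sort_keys=True))
    # Checkpoint: remember the sequence to resume from after a restart
    checkpoint = {"last_seq": feed_response.get("last_seq")}
    return rows, checkpoint
```

A caller would persist the returned checkpoint only after the rows have been delivered, so that a crash between the two steps leads to a re-send rather than a lost change.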
- Python 3.11+
- A running Sync Gateway, Capella App Services, Edge Server, or CouchDB instance
pip install -r requirements.txt
# Test connectivity first
python main.py --config config.json --test
# Run the worker
python main.py --config config.json
docker build -t changes-worker .
docker run --rm \
  -v $(pwd)/config.json:/app/config.json \
  changes-worker
# Headless: worker + Prometheus metrics only (port 9090)
docker compose up --build
# With Admin UI: worker + metrics + web dashboard (ports 9090 + 8080)
docker compose --profile ui up --build
Set "admin_ui": { "enabled": false } in config.json for headless deployments where you only need /_metrics on port 9090.
| Flag | Description |
|---|---|
| `--config <path>` | Path to config.json (default: config.json) |
| `--test` | Test connectivity to source + output, then exit (exit code 0/1) |
| `--version` | Print version and exit |
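For readers who want to wrap the worker in their own tooling, the flag table can be mirrored with a small `argparse` parser. This is only a sketch of the documented interface: `build_parser` and the `VERSION` placeholder are hypothetical and say nothing about how `main.py` parses its arguments internally.

```python
import argparse

VERSION = "0.0.0-sketch"  # placeholder value, not the real release version


def build_parser():
    """Build a parser that accepts the three documented flags."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--config", default="config.json", metavar="<path>",
                        help="Path to config.json (default: config.json)")
    parser.add_argument("--test", action="store_true",
                        help="Test connectivity to source + output, then exit")
    # argparse prints the version string and exits when this flag is given
    parser.add_argument("--version", action="version", version=VERSION)
    return parser
```

With this parser, `build_parser().parse_args(["--test"])` yields `test=True` and the default `config="config.json"`.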
| Feature | Description |
|---|---|
| Multi-job pipelines | Run multiple independent _changes pipelines concurrently, each with its own source, output, mapping, and checkpoint |
| Job lifecycle control | Start, stop, restart, and monitor jobs via REST API or dashboard |
| Multi-source | Sync Gateway, App Services, Edge Server, CouchDB, with automatic compatibility handling |
| Multiple outputs | stdout, HTTP endpoint, RDBMS (Postgres/MySQL/MSSQL/Oracle), AWS S3 (MinIO/S3-compatible) |
| Feed modes | Longpoll, continuous, WebSocket, SSE/EventSource |
| Schema mapping | Transform JSON docs into SQL table rows with 58 built-in transform functions |
| Checkpoint | CBL-style _local/ doc checkpoints, so documents are never re-processed on restart |
| Dead letter queue | Failed docs saved for later retry (CBL or JSONL file) |
| Attachment processing | Detect, fetch, and upload document attachments to S3, HTTP, or filesystem with optional post-processing |
| Retry + backoff | Configurable exponential backoff on both source and output sides |
| Prometheus metrics | Built-in /_metrics endpoint with pipeline, system, and runtime metrics |
| Admin UI | Web dashboard with real-time monitoring, job management, schema editor, and setup wizard |
| Startup validation | Every config setting validated before launch, with clear error messages |
| Dry run | Process the feed and log what would be sent without sending |
| Embedded storage | Couchbase Lite CE for config, checkpoints, mappings, and DLQ in Docker |
| Structured logging | SG-inspired log keys, per-key levels, file rotation, and sensitive data redaction |
Full feature details: docs/FEATURES.md
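To make the JSONL variant of the dead letter queue concrete, here is a minimal sketch of appending failed documents to a JSON Lines file and reading them back for retry. The function names and entry fields (`job_id`, `doc_id`, `reason`, `failed_at`) are assumptions for illustration; the real DLQ format is described in docs/DLQ.md and may differ.

```python
import json
import time
from pathlib import Path


def dlq_append(path, job_id, doc, reason):
    """Append one failed document to a JSON Lines file (one JSON object per line)."""
    entry = {
        "job_id": job_id,
        "doc_id": doc.get("_id"),
        "reason": reason,
        "failed_at": time.time(),
        "doc": doc,
    }
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def dlq_read(path):
    """Return every queued entry, oldest first; an absent file means an empty queue."""
    p = Path(path)
    if not p.exists():
        return []
    with p.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Append-only writes keep each failure record independent, so a partially written last line affects at most one entry.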
| Capability | Sync Gateway | App Services | Edge Server | CouchDB |
|---|---|---|---|---|
| Feed types | longpoll, continuous, websocket | longpoll, continuous, websocket | longpoll, continuous, sse | longpoll, continuous, eventsource |
| `_bulk_get` | β | β | β (individual GET) | β |
| Bearer auth | β | β | β | β |
| Session cookie auth | β | β | β | β |
| Channels filter | β | β | β | β |
| `active_only` | β | β | β | β |
| Scoped keyspace | β | β | β | β |
Full compatibility matrix & auto-behaviors: docs/SOURCE_TYPES.md
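The feed-type row of the matrix lends itself to a simple startup check. The sketch below encodes that row as a lookup table; the source-type keys (`sync_gateway`, `app_services`, `edge_server`, `couchdb`) and the function name `check_feed` are hypothetical identifiers, not the worker's actual config values.

```python
# Feed types per source, copied from the "Feed types" row of the matrix
SUPPORTED_FEEDS = {
    "sync_gateway": {"longpoll", "continuous", "websocket"},
    "app_services": {"longpoll", "continuous", "websocket"},
    "edge_server": {"longpoll", "continuous", "sse"},
    "couchdb": {"longpoll", "continuous", "eventsource"},
}


def check_feed(source_type, feed):
    """Return feed unchanged, or raise ValueError if the source does not offer it."""
    allowed = SUPPORTED_FEEDS[source_type]
    if feed not in allowed:
        raise ValueError(
            f"{source_type} does not support feed={feed!r}; "
            f"choose one of {sorted(allowed)}"
        )
    return feed
```

Failing fast on an unsupported combination gives a clear message at startup instead of a confusing HTTP error from the source later on.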
In v2.0, the monolithic config.json is replaced by a composable document model:
┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│ Input A       │     │ Job 1         │     │ Output X      │
│ (SG prices)   │────►│ A → X         │────►│ (PostgreSQL)  │
└───────────────┘     │ + mapping     │     └───────────────┘
                      └───────────────┘
┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│ Input A       │     │ Job 2         │     │ Output Y      │
│ (SG prices)   │────►│ A → Y         │────►│ (HTTP API)    │
└───────────────┘     │ + mapping     │     └───────────────┘
                      └───────────────┘
┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│ Input B       │     │ Job 3         │     │ Output X      │
│ (SG orders)   │────►│ B → X         │────►│ (PostgreSQL)  │
└───────────────┘     │ + mapping     │     └───────────────┘
                      └───────────────┘
- Inputs: Reusable `_changes` feed source definitions (host, auth, feed settings)
- Outputs: Reusable output configs by type: RDBMS, HTTP, Cloud (S3), stdout
- Jobs: Connect one input → one output with a schema mapping, checkpoint, and lifecycle
Each job runs in its own thread with an isolated asyncio event loop, HTTP session, and checkpoint.
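The thread-per-job model can be illustrated with a small sketch in which each thread calls `asyncio.run`, which creates an event loop that exists only inside that thread. `JobThread` and `demo_job` are hypothetical names for this example; the actual wrapper lives in pipeline.py and is not reproduced here.

```python
import asyncio
import threading


class JobThread(threading.Thread):
    """Run one job coroutine in a dedicated thread with its own event loop."""

    def __init__(self, job_id, job_coro_factory):
        super().__init__(name=f"job-{job_id}", daemon=True)
        self.job_id = job_id
        self._factory = job_coro_factory
        self.result = None
        self.error = None

    def run(self):
        try:
            # asyncio.run creates a fresh event loop owned by this thread only
            self.result = asyncio.run(self._factory(self.job_id))
        except Exception as exc:
            # Recorded so an orchestrator could detect the crash and restart
            self.error = exc


async def demo_job(job_id):
    """Stand-in for a job that would consume a _changes feed."""
    await asyncio.sleep(0.01)
    return job_id, threading.current_thread().name
```

Because no loop, session, or checkpoint object is shared between threads, a crash in one job leaves the others running, which is what makes per-job restart possible.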
Document model & collections: docs/DESIGN_2_0.md
Job lifecycle & document schema: docs/JOBS.md
| Method | Path | Description |
|---|---|---|
| `GET` | `/api/jobs` | List all jobs with state |
| `GET` | `/api/jobs/{id}` | Get a single job |
| `POST` | `/api/jobs` | Create a new job |
| `PUT` | `/api/jobs/{id}` | Update a job |
| `DELETE` | `/api/jobs/{id}` | Delete a job |
| `POST` | `/api/jobs/{id}/start` | Start a job |
| `POST` | `/api/jobs/{id}/stop` | Graceful stop |
| `POST` | `/api/jobs/{id}/restart` | Stop + start |
| `GET` | `/api/jobs/{id}/state` | Job status, uptime, error count |
| `POST` | `/api/_restart` | Restart all jobs |
| `POST` | `/api/_offline` | Stop all jobs (keep config) |
| `POST` | `/api/_online` | Resume all jobs after offline |
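The job endpoints in the table can be called from any HTTP client. The sketch below builds the requests with Python's standard library without sending them. `BASE_URL` assumes the API is served on the same port as the Admin UI (8080), and the job ID `prices-pg` in the usage note is made up; adjust both to your deployment.

```python
import json
import urllib.request

# Assumption: the REST API shares the Admin UI port
BASE_URL = "http://localhost:8080"


def job_request(method, path, payload=None):
    """Build (but do not send) a request for one of the job endpoints."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def restart_job(job_id):
    """Request for POST /api/jobs/{id}/restart."""
    return job_request("POST", f"/api/jobs/{job_id}/restart")


def job_state(job_id):
    """Request for GET /api/jobs/{id}/state."""
    return job_request("GET", f"/api/jobs/{job_id}/state")
```

To execute a request, pass it to `urllib.request.urlopen`, for example `urllib.request.urlopen(restart_job("prices-pg"))`.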
| Method | Path | Description |
|---|---|---|
| `GET` | `/api/inputs_changes` | Get all input definitions |
| `POST` | `/api/inputs_changes` | Save inputs document |
| `PUT` | `/api/inputs_changes/{id}` | Update one input entry |
| `DELETE` | `/api/inputs_changes/{id}` | Delete one input entry |
| `GET` | `/api/outputs_{type}` | Get outputs (type = rdbms, http, cloud, stdout) |
| `POST` | `/api/outputs_{type}` | Save outputs document |
| `PUT` | `/api/outputs_{type}/{id}` | Update one output entry |
| `DELETE` | `/api/outputs_{type}/{id}` | Delete one output entry |
| Method | Path | Description |
|---|---|---|
| `GET` | `/api/config` | Get infrastructure config |
| `POST` | `/api/config` | Save infrastructure config |
| `GET` | `/_metrics` | Prometheus metrics endpoint |
| `GET` | `/_status` | Health check |
A web-based admin dashboard at http://localhost:8080:
| Page | Path | Description |
|---|---|---|
| Dashboard | `/` | Multi-job status table with per-job start/stop/restart controls, live charts, architecture diagram |
| Settings | `/settings` | Infrastructure config (logging, metrics, admin UI, CBL, shutdown) |
| Schema Mappings | `/schema` | Visual drag-and-drop field mapping with transforms, AI assist, and coverage stats |
| Setup Wizard | `/wizard` | Guided setup: connect source → configure output → map fields → create job |
| Logs | `/logs` | Real-time log viewer with job filter, log key filter, and level filter |
| Dead Letter Queue | `/dlq` | Browse, retry, and purge failed documents with job and reason filtering |
| Glossary | `/glossary` | Reference for all 58 built-in transform functions |
| Help | `/help` | Documentation and getting started guide |
Full documentation: docs/ADMIN_UI.md
Each job monitors exactly one scope/collection. To process multiple collections, create multiple jobs:
                 ┌──────────────────────────────────────┐
                 │         changes_worker (v2.0)        │
                 │                                      │
config.json ──►  │  PipelineManager                     │
                 │  ├── Thread 1: Job "prices→PG"       │
                 │  │   └── us.prices _changes          │
                 │  ├── Thread 2: Job "orders→PG"       │
                 │  │   └── us.orders _changes          │
                 │  └── Thread 3: Job "prices→HTTP"     │
                 │      └── us.prices _changes          │
                 │                                      │
                 │  Shared: metrics :9090, UI :8080,    │
                 │  CBL store, maintenance              │
                 └──────────────────────────────────────┘
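Since each job covers exactly one scope/collection, covering several collections means generating one job per collection. The sketch below shows one way to derive those definitions. It assumes Sync Gateway's `{db}.{scope}.{collection}` keyspace form for the `_changes` URL, and the job fields (`id`, `changes_url`, `output`) are hypothetical; the real job schema is documented in docs/JOBS.md.

```python
def changes_url(base_url, db, scope=None, collection=None):
    """Build a _changes URL, using db.scope.collection when a collection is named."""
    keyspace = f"{db}.{scope}.{collection}" if scope and collection else db
    return f"{base_url.rstrip('/')}/{keyspace}/_changes"


def jobs_for_collections(base_url, db, scope, collections, output_id):
    """Return one hypothetical job definition per collection."""
    return [
        {
            "id": f"{collection}-to-{output_id}",
            "changes_url": changes_url(base_url, db, scope, collection),
            "output": output_id,
        }
        for collection in collections
    ]
```

For example, `jobs_for_collections("http://localhost:4984", "db", "us", ["prices", "orders"], "pg")` produces two job definitions, one per collection, both pointing at the same output.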
change_stream_db/
├── main.py                       # Main worker entry point + poll_changes logic
├── pipeline.py                   # Per-job thread wrapper (v2.0)
├── pipeline_manager.py           # Multi-job thread orchestrator (v2.0)
├── pipeline_logging.py           # Structured logging (log keys, redaction, rotation)
├── cbl_store.py                  # Couchbase Lite CE storage layer (v2.0 collections)
├── config.json                   # Configuration (v1.x format, auto-migrated to v2.0)
├── requirements.txt              # Python dependencies
├── Dockerfile                    # Container image (includes CBL-C 3.2.1)
├── docker-compose.yml            # Docker Compose setup
├── rest/
│   ├── api_v2.py                 # v2.0 REST API: inputs, outputs, jobs CRUD
│   ├── api_v2_jobs_control.py    # Job lifecycle endpoints (start/stop/restart)
│   ├── changes_http.py           # _changes feed HTTP client logic
│   ├── output_http.py            # HTTP output, dead letter queue, serialization
│   ├── attachments.py            # Attachment processor orchestrator
│   ├── attachment_config.py      # Attachment configuration parser
│   ├── attachment_stream.py      # Streaming attachment download
│   ├── attachment_upload.py      # Upload to S3/HTTP/filesystem
│   ├── attachment_multipart.py   # Multipart attachment handling
│   └── attachment_postprocess.py # Post-upload actions (update doc, delete, purge)
├── cloud/
│   ├── cloud_base.py             # Abstract base forwarder + CloudMetrics
│   └── cloud_s3.py               # AWS S3 / MinIO / S3-compatible output
├── db/
│   ├── db_base.py                # Base DB forwarder + schema mapping
│   ├── db_postgres.py            # PostgreSQL output
│   ├── db_mysql.py               # MySQL output
│   ├── db_mssql.py               # MS SQL Server output
│   └── db_oracle.py              # Oracle output
├── schema/
│   ├── mapper.py                 # Schema mapper (JSON → SQL operations)
│   └── validator.py              # Mapping file validator
├── web/
│   ├── server.py                 # Web server module
│   ├── templates/                # Admin UI HTML pages (8 pages)
│   └── static/                   # CSS, JS, icons, favicon
├── mappings/                     # Schema mapping files (JSON)
├── tests/                        # Unit tests (24 test files, 775+ tests)
├── docs/                         # Documentation (22 docs)
├── img/                          # Architecture diagrams
├── guide/                        # Developer guides (release checklist, style guide)
├── logs/                         # Log output (gitignored)
└── release_works/                # Release planning documents
| Document | Description |
|---|---|
| docs/DESIGN_2_0.md | v2.0 architecture: job model, CBL collections, PipelineManager, phases |
| docs/JOBS.md | Job document schema, lifecycle, and checkpoint isolation |
| docs/UI_JOBS_MANAGEMENT.md | Multi-job UI design: dashboard, logs, DLQ filtering |
| docs/CONFIGURATION.md | Full config.json reference with all settings |
| docs/FEATURES.md | Detailed feature documentation (feeds, output, metrics, etc.) |
| docs/SOURCE_TYPES.md | Source compatibility matrix and auto-behaviors |
| docs/DESIGN.md | Pipeline architecture, failure modes, checkpoint strategies |
| docs/SCHEMA_MAPPING.md | Schema mapping format, transforms, and examples |
| docs/ADMIN_UI.md | Admin dashboard documentation |
| docs/WIZARD.md | Setup wizard guide |
| docs/ATTACHMENTS.md | Attachment processing: modes, detection, fetch, upload, post-process |
| docs/CLOUD_BLOB_PLAN.md | Cloud blob storage design document |
| docs/LOGGING.md | Structured logging: log keys, redaction, rotation, TRACE level |
| docs/DLQ.md | Dead letter queue: lifecycle, replay, retention |
| docs/CBL_DATABASE.md | Couchbase Lite database schema |
| docs/CBL_STORE.md | CBL store API reference |
| docs/CHANGES_PROCESSING.md | Changes feed processing internals |
| docs/DEBUGGING.md | Debugging and troubleshooting guide |
| docs/RDBMS_IMPLEMENTATION.md | RDBMS output implementation details |
| docs/HA.md | High Availability via CBL replication (v3.0 roadmap) |
| docs/MULTI_PIPELINE_PLAN.md | Multi-pipeline threading design reference |
| guide/RELEASE.md | Release checklist and best practices |
v2.0 is backward compatible at startup. On first run, the worker automatically migrates your existing config.json into the new document model:
- Creates an input entry from `gateway` + `auth` + `changes_feed`
- Creates an output entry from `output`
- Creates a default job connecting them
- Preserves your checkpoint
Your original config.json is not modified, so you can roll back to v1.7.0 at any time. See RELEASE_NOTES.md for rollback details.
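The migration steps can be pictured with the following sketch, which splits a v1.x config dictionary into input, output, and job documents. Only the top-level keys `gateway`, `auth`, `changes_feed`, and `output` come from the description above; the function name, the generated IDs, and the shape of the resulting documents are assumptions, and checkpoint preservation is left out.

```python
import copy


def migrate_v1_config(v1):
    """Split a v1.x config dict into (input_doc, output_doc, job_doc)."""
    v1 = copy.deepcopy(v1)  # work on a copy so the original stays unmodified
    input_doc = {
        "id": "input-1",  # hypothetical generated ID
        "gateway": v1.get("gateway", {}),
        "auth": v1.get("auth", {}),
        "changes_feed": v1.get("changes_feed", {}),
    }
    output_doc = {"id": "output-1", **v1.get("output", {})}
    # The default job simply wires the single input to the single output
    job_doc = {"id": "job-1", "input": input_doc["id"], "output": output_doc["id"]}
    return input_doc, output_doc, job_doc
```

Working on a deep copy mirrors the guarantee that the original file is left untouched, which is what keeps rollback possible.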
See LICENSE.


