Problem: Writing to Couchbase Lite is expensive. Each checkpoint save can trigger multiple CBL operations.
Finding: The checkpoint save is the primary CBL operation in the hot path, happening every N documents (configurable).
- Entry: `rest/changes_http.py:843-1477` - `_process_changes_batch()`
- Checkpoint Call: `rest/changes_http.py:1284, 1331, 1467, 1477`
- Checkpoint Class: `main.py:2180-2450` - `Checkpoint` class
When SG checkpoint save fails (network error, unavailable, etc.), the system falls back to CBL storage.
- `checkpoint.save()` → tries SG (`PUT {keyspace}/_local/checkpoint-{uuid}`)
  - On Exception → `_save_fallback(seq)`
- `_save_fallback()` → `CBLStore().save_checkpoint(uuid, seq, client_id)`

`storage/cbl_store.py:816-839`

    1 × GET (mutable) [_coll_get_mutable_doc @ L820]
      └─ CBLCollection_GetMutableDocument()
    1 × SAVE [_coll_save_doc @ L827]
      └─ CBLCollection_SaveDocumentWithConcurrencyControl()

Total: 2 CBL operations per fallback checkpoint save

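
The save path described above can be sketched as a try/except around the remote write. This is a minimal illustration, not the project's code: `LocalStore`, `save_checkpoint`, `ok_put`, and `failing_put` are hypothetical stand-ins for `CBLStore` and the SG client, and the local store only counts the two operations (GET mutable + SAVE) listed in the call tree.

```python
class LocalStore:
    """Hypothetical stand-in for CBLStore: counts ops per fallback save."""

    def __init__(self):
        self.ops = 0
        self.docs = {}

    def save_checkpoint(self, uuid, seq):
        self.ops += 1  # GET (mutable) of the existing checkpoint doc
        doc = dict(self.docs.get(uuid, {}))
        doc["seq"] = seq
        self.ops += 1  # SAVE of the updated doc
        self.docs[uuid] = doc


def save_checkpoint(remote_put, local, uuid, seq):
    """Try the remote save first; on any exception use the local store.

    Returns "remote" or "fallback" so callers can count which path ran.
    """
    try:
        remote_put(uuid, seq)
        return "remote"
    except Exception:
        local.save_checkpoint(uuid, seq)
        return "fallback"


def ok_put(uuid, seq):
    """Hypothetical remote write that succeeds."""
    return None


def failing_put(uuid, seq):
    """Hypothetical remote write that simulates SG being unreachable."""
    raise ConnectionError("SG unreachable")
```

With this shape, a successful remote save costs zero local operations, and each failed one costs exactly two.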
`rest/changes_http.py:1198-1287`

    if every_n_docs > 0 and sequential:
        for i in range(0, len(filtered), every_n_docs):
            sub_batch = filtered[i : i + every_n_docs]
            # ... process sub_batch ...
            await checkpoint.save(since, ...)  # AFTER each sub-batch

Saves per batch: len(filtered) / every_n_docs (rounded up)

`rest/changes_http.py:1290-1340`

    else:
        # Sequential checkpoint stride: save every N docs rather than
        # after every single doc
        checkpoint_stride = proc_cfg.get("checkpoint_stride", 100)

Saves per batch: len(filtered) / checkpoint_stride

`rest/changes_http.py:1345-1400`

    # Wait for all tasks to complete, then checkpoint once
    tasks = [...]
    await asyncio.gather(*tasks)
    await checkpoint.save(since, ...)  # ONCE per entire batch

Saves per batch: 1 (single checkpoint after entire parallel batch completes)

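
The three save frequencies above can be compared with a small helper. This is a sketch under stated assumptions: the function name and the `mode` labels (`"sub_batch"`, `"stride"`, `"parallel"`) are invented for illustration, and the stride case is assumed to round up the same way the sub-batch case does, which the source does not state explicitly.

```python
import math


def saves_per_batch(n_docs, mode, interval=100):
    """Estimate checkpoint saves for one batch.

    mode "sub_batch": one save per every_n_docs sub-batch (rounded up).
    mode "stride":    one save per checkpoint_stride docs (assumed rounded up).
    mode "parallel":  a single save after the whole batch completes.
    """
    if n_docs <= 0:
        return 0
    if mode == "parallel":
        return 1
    if mode in ("sub_batch", "stride"):
        if interval <= 0:
            raise ValueError("interval must be positive")
        return math.ceil(n_docs / interval)
    raise ValueError(f"unknown mode: {mode!r}")


print(saves_per_batch(100, "sub_batch", 10))  # 10
print(saves_per_batch(105, "sub_batch", 10))  # 11
print(saves_per_batch(100, "stride", 100))    # 1
print(saves_per_batch(100, "parallel"))       # 1
```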
- Batch size: 100 documents
- checkpoint_stride: 100 (default)
- SG availability: 10% (fails 90% of time, triggering fallback)
Per batch:
- 1 checkpoint save
- ~90% trigger fallback path
0.9 × 2 CBL ops = 1.8 CBL operations per batch
- Batch size: 100 documents
- every_n_docs: 10
- SG availability: 10%
Per batch:
- 10 checkpoint saves (100 / 10)
- ~90% trigger fallback per save
10 saves × 0.9 fallback rate × 2 CBL ops = 18 CBL operations per batch
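
The arithmetic in the two scenarios above is an expected value: saves per batch, times the probability that a save falls back, times 2 CBL operations per fallback. A minimal sketch, where the function and constant names are illustrative rather than taken from the codebase:

```python
CBL_OPS_PER_FALLBACK_SAVE = 2  # 1 GET (mutable) + 1 SAVE


def expected_cbl_ops(saves, sg_availability):
    """Expected CBL operations per batch.

    saves:           checkpoint saves attempted in the batch
    sg_availability: fraction of SG saves that succeed (0.0 to 1.0)
    """
    if not 0.0 <= sg_availability <= 1.0:
        raise ValueError("sg_availability must be between 0 and 1")
    fallback_rate = 1.0 - sg_availability
    return saves * fallback_rate * CBL_OPS_PER_FALLBACK_SAVE


# Scenario with 1 save per batch and 10% SG availability
print(round(expected_cbl_ops(1, 0.10), 2))   # 1.8
# Scenario with 10 saves per batch and 10% SG availability
print(round(expected_cbl_ops(10, 0.10), 2))  # 18.0
```

When SG is fully available the result is 0, which matches the point that the CBL cost exists only on the fallback path.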
`storage/cbl_store.py:199-209`

    def _coll_get_mutable_doc(db, collection_name: str, doc_id: str):
        coll = _get_collection(db, CBL_SCOPE, collection_name)
        doc_ref = lib.CBLCollection_GetMutableDocument(
            coll, stringParam(doc_id), _cbl_gError
        )
        # ...

Cost: Medium
- Collection lookup (O(1))
- Document fetch from CBL storage
- Tracked as: `"operation": "SELECT"` in logs

`storage/cbl_store.py:212-223`

    def _coll_save_doc(db, collection_name: str, doc) -> None:
        coll = _get_collection(db, CBL_SCOPE, collection_name)
        doc._prepareToSave()
        ok = lib.CBLCollection_SaveDocumentWithConcurrencyControl(
            coll,
            doc._ref,
            0,  # kCBLConcurrencyControlLastWriteWins
            _cbl_gError,
        )

Cost: High
- Serialization (`_prepareToSave()`)
- Disk I/O (CBL database file write)
- Concurrency control check
- Tracked as: `"operation": "INSERT"` or `"operation": "UPDATE"`

`main.py:2250-2333` - `Checkpoint.load()`

When SG checkpoint fails to load (network error), fallback calls:

`main.py:2408-2432`

    def _load_fallback(self) -> str:
        if USE_CBL:
            data = self._get_fallback_store().load_checkpoint(self._uuid)

Which does:

`storage/cbl_store.py:780-814`

    def load_checkpoint(self, uuid: str) -> dict | None:
        doc = _coll_get_doc(self.db, COLL_CHECKPOINTS, doc_id)  # 1 GET

Cost: 1 CBL operation (GET) during startup

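Default: 100 → Recommended: 500-1000

    {
      "processing": {
        "checkpoint_stride": 500
      }
    }

Impact: Reduces checkpoint saves by 5-10× for batches larger than the current stride
- 100-doc batch: 1 save instead of ~10 saves (relative to a stride of 10; the default stride of 100 already gives 1 save)
- If in fallback mode: 2 CBL ops instead of 20
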
The fallback triggers only on SG errors, so the closer SG availability stays to 100%, the fewer CBL writes occur.
Current health check: None. Consider adding SG heartbeat monitoring.
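
One possible shape for the suggested heartbeat is a small monitor that counts consecutive probe failures. This is only a sketch: `HeartbeatMonitor`, `probe`, and `failure_threshold` are hypothetical names, and how the probe actually reaches SG (an HTTP request, a client ping) is deliberately left to the caller rather than assumed here.

```python
import time


class HeartbeatMonitor:
    """Tracks SG reachability from periodic probe results.

    `probe` is any zero-argument callable that returns True when SG
    answered. It is injected so this sketch assumes no specific SG API.
    """

    def __init__(self, probe, failure_threshold=3):
        self._probe = probe
        self._failure_threshold = failure_threshold
        self._consecutive_failures = 0
        self.last_checked = None

    def check(self):
        """Run one probe and return True while SG is considered healthy."""
        try:
            ok = bool(self._probe())
        except Exception:
            ok = False  # a probe that raises counts as a failed heartbeat
        self.last_checked = time.monotonic()
        # Any success resets the streak; a failure extends it
        self._consecutive_failures = 0 if ok else self._consecutive_failures + 1
        return self.healthy

    @property
    def healthy(self):
        return self._consecutive_failures < self._failure_threshold
```

Requiring several consecutive failures before reporting unhealthy avoids reacting to a single dropped request. The threshold of 3 is an arbitrary illustrative default.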
Sequential mode can checkpoint more frequently.

    {
      "processing": {
        "sequential": false,
        "max_concurrent": 10
      }
    }

Impact: Single checkpoint per batch (regardless of size)

If you guarantee SG is always reachable, disable fallback:

    {
      "checkpoint": {
        "enabled": false
      }
    }

Impact: Zero CBL operations (no checkpoint storage)

All checkpoint operations are tracked:
| Metric | Path | Meaning |
|---|---|---|
| `checkpoint_saves_total` | `rest/changes_http.py:1286` | How many checkpoint saves attempted |
| `checkpoint_load_errors_total` | `main.py:2318, 2331` | SG load failures triggering fallback |
| `checkpoint_save_errors_total` | `main.py:2398` | SG save failures triggering fallback |

Log Events (`storage/cbl_store.py`):
- `"CBL checkpoint saved"` with `operation="INSERT"` or `operation="UPDATE"`
- `"CBL checkpoint loaded"` with `operation="SELECT"`
- Duration in `duration_ms`

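
The counters in the metrics table can be combined to estimate how often saves are hitting the fallback path. The sketch below assumes the counters are cumulative and that every save error corresponds to one fallback save, as the table's descriptions suggest. The function name and the dictionary-based snapshot format are illustrative only.

```python
def fallback_save_rate(prev, curr):
    """Fraction of checkpoint saves that fell back between two snapshots.

    `prev` and `curr` are dicts of cumulative counter values sampled at
    two points in time (hypothetical format for this example).
    """
    attempted = curr["checkpoint_saves_total"] - prev["checkpoint_saves_total"]
    failed = (
        curr["checkpoint_save_errors_total"]
        - prev["checkpoint_save_errors_total"]
    )
    if attempted <= 0:
        return 0.0  # no saves in the window, so no fallback activity
    return failed / attempted


earlier = {"checkpoint_saves_total": 100, "checkpoint_save_errors_total": 5}
later = {"checkpoint_saves_total": 150, "checkpoint_save_errors_total": 50}
print(fallback_save_rate(earlier, later))  # 0.9
```

Multiplying this rate by the saves per batch and by 2 gives the CBL operation estimates used in the summary table.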
| Scenario | Checkpoints/Batch | CBL Ops/Checkpoint | Total CBL Ops/Batch |
|---|---|---|---|
| Parallel (100 docs) | 1 | 0 (SG success) | 0 |
| Parallel (100 docs, SG fails) | 1 | 2 (fallback) | 2 |
| Sequential (stride=100) | 1 | 2 (fallback) | 2 |
| Sequential (stride=10) | 10 | 2 each (fallback) | 20 |
| Sequential (stride=1) | 100 | 2 each (fallback) | 200 🔴 |
Recommendation: Use Parallel mode or increase checkpoint_stride to 500+.