This document describes the long-running system tests for Couchbase Lite Edge Server replication and resilience, including a stability run and a chaos run with periodic Edge Server kill/restart.
Test that create, update, and delete operations replicate correctly between Sync Gateway and Edge Server over an extended period (up to 6 hours), with cycle (sync_gateway or edge_server) and operations (create, create_update_delete, or create_delete) randomly chosen per document.
Steps:
1. Setup
   - Create a bucket `bucket-1` on Couchbase Server
   - Add 10 documents `doc_1` through `doc_10` to the bucket (each with `id`, `channels: ["public"]`, `timestamp`)
   - Create a database `db-1` on Sync Gateway with bucket `bucket-1`, sync function `function(doc){channel(doc.channels);}`, num_index_replicas: 0
   - Add role `stdrole` and user `sync_gateway` to Sync Gateway with collection access `_default._default: ["public"]`
   - Create database `db` on Edge Server using config `test_e2e_empty_database.json` with replication source set to Sync Gateway URL for `db-1`
   - Wait for Edge Server replication to become idle
   - Get all documents from Sync Gateway `db-1`; verify count is 10
   - Get all documents from Edge Server `db`; verify count is 10
2. Set document counter to 11; run loop until 6 hours have elapsed (or test stop).
3. For each iteration: set `doc_id` to `doc_{counter}`; randomly choose `cycle` and `operations` as above.
4. If cycle is `sync_gateway`:
   - Create document `doc_id` via Sync Gateway in `db-1` with standard body; verify create succeeds
   - Wait a random 1–5 seconds
   - Get document `doc_id` from Edge Server `db`; verify it exists, `id` matches, and `_rev` is present; capture revision ID
   - If `operations` includes update: update document via Sync Gateway with body including `changed: "yes"` and captured revision; verify update succeeds; get document from Edge Server again and verify revision ID changed; set captured revision to new value
   - If `operations` includes delete: delete document via Edge Server using captured revision; verify delete succeeds; get document from Edge Server and verify request fails; wait 2 seconds; get document from Sync Gateway and verify request fails
5. If cycle is `edge_server`:
   - Create document `doc_id` via Edge Server in `db` with standard body; verify create succeeds
   - Wait 5 seconds
   - Get document `doc_id` from Sync Gateway `db-1`; verify it exists, `id` matches, and `_rev` is present; capture revision ID
   - If `operations` includes update: update document via Edge Server with body including `changed: "yes"` and captured revision; verify update succeeds; get document from Sync Gateway and verify revision ID changed; set captured revision to new value
   - If `operations` includes delete: delete document via Sync Gateway using captured revision; wait 2 seconds; get document from Edge Server and verify request fails
6. Increment document counter; repeat from step 3.
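The per-iteration random selection in the steps above can be sketched as follows. This is a minimal illustration, not the test's actual code: the names `plan_iteration` and `steps_for` are hypothetical, and only the `cycle` and `operations` values come from the test description.

```python
import random

# Values taken from the test description.
CYCLES = ("sync_gateway", "edge_server")
OPERATIONS = ("create", "create_update_delete", "create_delete")


def plan_iteration(counter, rng):
    """Pick the document ID, cycle, and operation set for one loop iteration.

    Hypothetical helper: the real test may structure this differently.
    """
    return {
        "doc_id": f"doc_{counter}",
        "cycle": rng.choice(CYCLES),
        "operations": rng.choice(OPERATIONS),
    }


def steps_for(operations):
    """Split an operation set such as 'create_update_delete' into its steps."""
    return operations.split("_")


if __name__ == "__main__":
    rng = random.Random()
    plan = plan_iteration(11, rng)  # the loop starts at counter 11
    print(plan["doc_id"], plan["cycle"], steps_for(plan["operations"]))
```

Passing the random generator in as a parameter is a design choice of this sketch; it lets a run be reproduced by seeding `random.Random`.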
Test that replication and document counts remain consistent when the Edge Server is periodically killed and restarted (40% chance per iteration, 1-minute down window), with random create/update/delete operations over an extended period.
Steps:
1. Setup: Same as `test_system_one_client_l` (bucket, 10 docs, SG db `db-1`, ES db `db`, verify 10 docs on both). Initialize `edge_server_down = false` and chaos window end (e.g. far future). Set document counter to 11.
2. Run loop until 6 hours have elapsed (or test stop).
3. Chaos recovery: If current time is past the chaos window end:
   - Start the Edge Server; wait 10 seconds; set `edge_server_down = false`
   - Get document counts from Sync Gateway and Edge Server; verify they are equal
4. Set `doc_id` to `doc_{counter}`; randomly choose `cycle` and `operations` as in `test_system_one_client_l`.
5. Chaos trigger: If Edge Server is up and random value ≤ 0.4:
   - Kill the Edge Server; set chaos window end to current time + 1 minute; wait 10 seconds; set `edge_server_down = true`
6. If cycle is `sync_gateway`:
   - Create document `doc_id` via Sync Gateway in `db-1`; verify create succeeds; wait 1–5 seconds
   - If not `edge_server_down`: get document from Edge Server `db` and verify it exists with matching `id` and present `_rev`; capture revision from created doc for update/delete
   - If `operations` includes update: update via Sync Gateway with `changed: "yes"` and revision; verify update succeeds; if not `edge_server_down`, get document from Edge Server and verify revision changed; set revision to updated doc’s revision
   - If `operations` includes delete and not `edge_server_down`: delete via Edge Server; verify success; verify get from Edge Server fails; wait 2 seconds; verify get from Sync Gateway fails
7. If cycle is `edge_server` and not `edge_server_down`:
   - Create document `doc_id` via Edge Server in `db`; verify create succeeds; get from Sync Gateway and verify exists with matching `id` and `_rev`; capture revision
   - If `operations` includes update: update via Edge Server with `changed: "yes"` and revision; verify update succeeds; get from Sync Gateway and verify revision changed; set revision to new value
   - If `operations` includes delete: delete via Sync Gateway using revision; wait 2 seconds; verify get from Edge Server fails
8. Increment document counter; repeat from step 2.
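The chaos trigger and recovery bookkeeping described above can be sketched as a small state object. The class and method names are hypothetical. Two details are assumptions of this sketch rather than statements from the test description: recovery only fires while the server is marked down, and the window end is reset to "far future" after recovery.

```python
from dataclasses import dataclass

CHAOS_PROBABILITY = 0.4      # kill when the random value is <= 0.4
DOWN_WINDOW_SECONDS = 60.0   # 1-minute down window


@dataclass
class ChaosState:
    """Tracks whether the Edge Server is down and when it may be restarted."""
    edge_server_down: bool = False
    window_end: float = float("inf")  # "far future" until a kill happens

    def should_recover(self, now):
        # Assumption: only recover if the server is currently marked down.
        return self.edge_server_down and now > self.window_end

    def recover(self):
        # Assumption: reset the window so recovery does not fire again.
        self.edge_server_down = False
        self.window_end = float("inf")

    def maybe_kill(self, now, roll):
        """Mark the server down if it is up and the roll is within the threshold."""
        if self.edge_server_down or roll > CHAOS_PROBABILITY:
            return False
        self.edge_server_down = True
        self.window_end = now + DOWN_WINDOW_SECONDS
        return True
```

In a real loop, `now` would come from a clock such as `time.monotonic()` and `roll` from `random.random()`; the actual kill, restart, and 10-second waits are left out here.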
Test that 4 concurrent async client coroutines can independently perform create, update, and delete operations via both Sync Gateway and Edge Server over an extended period (up to 6 hours) without interference, and that document counts reconcile correctly at the end.
Steps:
1. Setup: Same as `test_system_one_client_l` (bucket-1 with 10 docs, SG db `db-1`, ES db `db` with replication, verify 10 docs on both).
2. Define `client_worker(client_id)` as an async coroutine. Each worker initialises its own `doc_counter` at 1 and loops until 6 hours have elapsed.
3. For each iteration in `client_worker`: set `doc_id` to `c{client_id}_doc_{doc_counter}`; randomly choose `cycle` and `operations`.
4. If cycle is `sync_gateway`:
   - Create document `doc_id` via Sync Gateway in `db-1` with standard body; verify create succeeds
   - Wait a random 1–5 seconds (`asyncio.sleep`, non-blocking)
   - Get document `doc_id` from Edge Server `db`; verify it exists, `id` matches, and `_rev` is present; capture revision ID
   - If `operations` includes update: update document via Sync Gateway with body including `changed: "yes"` and captured revision; verify update succeeds; get document from Edge Server and verify revision changed; set captured revision to new value
   - If `operations` includes delete: delete document via Edge Server using captured revision; verify delete succeeds; verify get from Edge Server raises `CblEdgeServerBadResponseError`; wait 2 seconds; verify get from Sync Gateway raises `CblSyncGatewayBadResponseError`
5. If cycle is `edge_server`:
   - Create document `doc_id` via Edge Server in `db` with standard body; verify create succeeds
   - Wait 5 seconds (`asyncio.sleep`, non-blocking)
   - Get document `doc_id` from Sync Gateway `db-1`; verify it exists, `id` matches, and `_rev` is present; capture revision ID
   - If `operations` includes update: update document via Edge Server with body including `changed: "yes"` and captured revision; verify update succeeds; get document from Sync Gateway and verify revision changed; set captured revision to new value
   - If `operations` includes delete: delete document via Sync Gateway using captured revision; wait 2 seconds; verify get from Edge Server raises `CblEdgeServerBadResponseError`
6. Increment `doc_counter`; repeat from step 3.
7. Launch all 4 `client_worker` coroutines concurrently with `asyncio.gather`; all run simultaneously for the full 6-hour window.
8. Final reconciliation: Get all documents from Sync Gateway and Edge Server; verify counts are equal.
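The worker launch pattern can be sketched as below, under several assumptions: the run duration is shortened to a fraction of a second, client IDs are taken to start at 1 (the description does not say), and an in-memory list stands in for the Sync Gateway and Edge Server calls. Only the `c{client_id}_doc_{doc_counter}` naming and the use of `asyncio.gather` come from the steps above.

```python
import asyncio
import time

NUM_CLIENTS = 4


async def client_worker(client_id, duration_seconds, created):
    """Create uniquely named documents until the deadline passes.

    `created` is a stand-in for the real create/verify calls.
    """
    deadline = time.monotonic() + duration_seconds
    doc_counter = 1
    while time.monotonic() < deadline:
        doc_id = f"c{client_id}_doc_{doc_counter}"
        created.append(doc_id)
        await asyncio.sleep(0.001)  # non-blocking wait; yields to other workers
        doc_counter += 1


async def run_clients(duration_seconds):
    """Run all workers concurrently and return every document ID they created."""
    created = []
    await asyncio.gather(
        *(client_worker(i, duration_seconds, created)
          for i in range(1, NUM_CLIENTS + 1))  # assumption: IDs 1..4
    )
    return created


if __name__ == "__main__":
    ids = asyncio.run(run_clients(0.05))
    print(len(ids), "documents created by", NUM_CLIENTS, "workers")
```

Because each worker prefixes its IDs with its own `client_id`, the workers never write the same document, which is what keeps them from interfering with one another.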
Test that 4 concurrent async client coroutines continue to operate correctly while a dedicated chaos_controller coroutine kills and restarts the Edge Server at random intervals (every 5–20 minutes, with a 1-minute down window), verifying document count consistency after each restart and at the end of the 6-hour run.
Steps:
1. Setup: Same as `test_system_one_client_l` (bucket-1 with 10 docs, SG db `db-1`, ES db `db` with replication, verify 10 docs on both). Initialise `shared = {"edge_server_down": False}` and `recent_docs = []`.
2. Define `chaos_controller()` as an async coroutine that loops until 6 hours have elapsed:
   - Wait a random 5–20 minutes (`asyncio.sleep(300–1200)`, non-blocking)
   - If the 6-hour window has expired, exit the loop
   - Kill the Edge Server; set `shared["edge_server_down"] = True`; wait 10 seconds
   - Wait an additional 60 seconds (1-minute down window)
   - Restart the Edge Server; wait 10 seconds; set `shared["edge_server_down"] = False`
   - Get all documents from Sync Gateway and Edge Server; verify counts are equal
3. Define `fire_read_burst(doc_id)` as an async helper: if `shared["edge_server_down"]` is `True`, return immediately; otherwise issue `NUM_CLIENTS` concurrent `get_document` calls for `doc_id` against Edge Server using `asyncio.gather(return_exceptions=True)`; for each non-exception result assert the document is not None.
4. Define `client_worker(client_id)` as an async coroutine. Each worker initialises `doc_counter` at 1 and loops until 6 hours have elapsed.
5. For each iteration in `client_worker`: set `doc_id` to `cc{client_id}_doc_{doc_counter}`; randomly choose `cycle` and `operations`.
6. If cycle is `sync_gateway`:
   - Create document `doc_id` via Sync Gateway in `db-1`; verify create succeeds; wait 1–5 seconds
   - If not `edge_server_down`: get document from Edge Server; verify it exists with matching `id` and `_rev`; append `doc_id` to `recent_docs` (evict oldest if list exceeds 10); call `fire_read_burst(doc_id)`; capture `rev_id` from created doc
   - If `operations` includes update: update via Sync Gateway with `changed: "yes"` and `rev_id`; verify update succeeds; if not `edge_server_down`, get document from Edge Server and verify revision changed; update `rev_id` from updated doc
   - If `operations` includes delete and not `edge_server_down`: delete via Edge Server; verify delete response has `ok: true`; verify get from Edge Server raises `CblEdgeServerBadResponseError`; wait 2 seconds; verify get from Sync Gateway raises `CblSyncGatewayBadResponseError`
7. If cycle is `edge_server` and not `edge_server_down`:
   - Create document `doc_id` via Edge Server in `db`; verify create succeeds; wait 5 seconds
   - Get document from Sync Gateway; verify it exists with matching `id` and `_rev`; capture `rev_id`; append `doc_id` to `recent_docs` (evict oldest if list exceeds 10); call `fire_read_burst(doc_id)`
   - If `operations` includes update: update via Edge Server with `changed: "yes"` and `rev_id`; verify update succeeds; get from Sync Gateway and verify revision changed; update `rev_id`
   - If `operations` includes delete: delete via Sync Gateway using `rev_id`; wait 2 seconds; verify get from Edge Server raises `CblEdgeServerBadResponseError`
8. Increment `doc_counter`; repeat from step 5.
9. Launch all 4 `client_worker` coroutines and `chaos_controller` concurrently with `asyncio.gather` (5 total tasks); all run simultaneously for the full 6-hour window.
10. Final reconciliation: Get all documents from Sync Gateway and Edge Server; verify counts are equal.