Couchbase SDK Connection — Shared Concepts

Platform-agnostic reference shared by all server-connection-* skills.


The One Rule: One Cluster Per Process

Create one Cluster object per application process and reuse it for the lifetime of the application. Creating a new Cluster per request pays the TCP + TLS + auth overhead (50–500 ms) on every request and, if the previous object is never closed, leaks connections.

The SDK maintains an internal connection pool to every data node. Reusing the Cluster object means reusing those connections. Destroying and recreating it on every request is the single most common cause of Couchbase performance problems in production.
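As a minimal sketch of the rule above, the holder below creates the shared object once and returns the same instance to every caller. `ClusterHolder` and `fake_connect` are hypothetical names used for illustration only; in a real application the factory would wrap the SDK's own connect call, which is not shown here.

```python
import threading
from typing import Callable, List, Optional


class ClusterHolder:
    """Lazily creates one shared cluster object and hands it to every caller."""

    def __init__(self, factory: Callable[[], object]) -> None:
        self._factory = factory  # hypothetical: wraps the SDK's connect call
        self._lock = threading.Lock()
        self._cluster: Optional[object] = None

    def get(self) -> object:
        # Double-checked locking: only the first caller pays the connection cost.
        if self._cluster is None:
            with self._lock:
                if self._cluster is None:
                    self._cluster = self._factory()
        return self._cluster


# Stand-in factory so the sketch runs without a server.
calls: List[int] = []


def fake_connect() -> object:
    calls.append(1)
    return object()


holder = ClusterHolder(fake_connect)
first = holder.get()
second = holder.get()
print(first is second, len(calls))  # True 1
```

The lock only matters on the first call; afterwards every request thread reads the already-built object without contention.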


Connection String Formats

couchbase://localhost                          # single node (dev/test)
couchbase://node1,node2,node3                  # multi-node (any node works as seed)
couchbases://cb.xxxxx.cloud.couchbase.com      # TLS — Capella or self-managed with TLS
  • couchbase:// — plaintext, port 11210 for KV, 8091 for management
  • couchbases:// — TLS, port 11207 for KV, 18091 for management
  • Capella always requires couchbases:// and a valid TLS certificate
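To make the scheme-to-port mapping above concrete, here is a small illustrative parser. It is not how any SDK parses connection strings; `SeedInfo` and `parse_connection_string` are hypothetical names, and the sketch deliberately ignores explicit `host:port` overrides and any path component.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SeedInfo:
    tls: bool
    hosts: List[str]
    kv_port: int
    mgmt_port: int


def parse_connection_string(conn: str) -> SeedInfo:
    scheme, sep, rest = conn.partition("://")
    if not sep or scheme not in ("couchbase", "couchbases"):
        raise ValueError(f"unsupported connection string: {conn!r}")
    # Drop any query-string options, then split the comma-separated seed list.
    host_part = rest.split("?", 1)[0]
    hosts = [h.strip() for h in host_part.split(",") if h.strip()]
    if not hosts:
        raise ValueError("connection string has no hosts")
    tls = scheme == "couchbases"
    # Default ports as listed above: plaintext 11210/8091, TLS 11207/18091.
    return SeedInfo(tls, hosts, 11207 if tls else 11210, 18091 if tls else 8091)


info = parse_connection_string("couchbase://node1,node2,node3")
print(info.hosts, info.kv_port)  # ['node1', 'node2', 'node3'] 11210
```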

Timeout Semantics

All SDKs expose the same logical timeouts under different names:

| Concept | What it covers | Recommended default |
| --- | --- | --- |
| Connect timeout | Initial TCP + TLS + auth + topology fetch | 10 s |
| KV timeout | Single key-value operation (get, upsert, etc.) | 2.5 s |
| Query timeout | N1QL / SQL++ query execution | 75 s |
| Search timeout | Full-text or vector search query | 75 s |
| Analytics timeout | Analytics (CBAS / columnar) query | 75 s |
| Management timeout | Admin API calls (bucket create, index create) | 75 s |

When to adjust:

  • Increase connect_timeout only in high-latency or slow-DNS environments
  • Increase kv_timeout only if P99 KV latency data justifies it — a high timeout masks cluster health problems
  • Increase query_timeout for known long-running analytical queries; keep it low for OLTP
  • Never set timeouts to 0 or very large values in production
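One way to enforce the guidance above is to keep the timeouts in a single validated settings object. The sketch below is SDK-independent: the `Timeouts` class, its field names, and the `MAX_REASONABLE_S` ceiling are assumptions made for illustration, and each SDK has its own option names for these values.

```python
from dataclasses import dataclass, fields

# Assumed ceiling for "very large"; pick a value that suits your workload.
MAX_REASONABLE_S = 600.0


@dataclass(frozen=True)
class Timeouts:
    # Defaults mirror the recommended values in the table above (seconds).
    connect_s: float = 10.0
    kv_s: float = 2.5
    query_s: float = 75.0
    search_s: float = 75.0
    analytics_s: float = 75.0
    management_s: float = 75.0

    def __post_init__(self) -> None:
        # Reject zero, negative, and implausibly large values up front.
        for f in fields(self):
            value = getattr(self, f.name)
            if value <= 0:
                raise ValueError(f"{f.name} must be positive, got {value}")
            if value > MAX_REASONABLE_S:
                raise ValueError(f"{f.name}={value} exceeds {MAX_REASONABLE_S} s")


oltp = Timeouts(query_s=10.0)  # keep query timeouts low for OLTP
print(oltp.kv_s, oltp.query_s)  # 2.5 10.0
```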

Serverless / Cloud Functions

The Cluster must be initialized outside the handler function to survive across warm invocations. Cold starts pay the connection cost; warm invocations reuse the existing connection.

Key points:

  • Use a module-level or static variable to hold the Cluster / Collection reference
  • Call waitUntilReady / WaitUntilReady once during initialization, not per request
  • Keep connect_timeout at 10 s and kv_timeout at 2.5 s; serverless functions have strict execution budgets, so longer timeouts can consume an entire invocation
  • Store credentials in environment variables, not in code

See references/deployment-scenarios.md in each language skill for scenario-specific timeout values and code examples.
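The shape of a handler that follows these points might look like the sketch below. `_connect` is a hypothetical stand-in for the SDK's connect and wait-until-ready calls, and the handler signature is a generic assumption rather than any specific cloud provider's API. The environment variable names follow the sdk-doctor example in this document.

```python
import os
from typing import Any, Dict, Optional

_cluster: Optional[Any] = None  # module-level: survives warm invocations


def _connect(conn_str: str, username: str, password: str) -> Any:
    """Hypothetical stand-in for the SDK connect + wait-until-ready call."""
    return {"conn_str": conn_str, "user": username}


def get_cluster() -> Any:
    global _cluster
    if _cluster is None:  # true only on a cold start
        _cluster = _connect(
            os.environ["CB_CONNECTION_STRING"],  # credentials come from the
            os.environ["CB_USERNAME"],           # environment, not from code
            os.environ["CB_PASSWORD"],
        )
    return _cluster


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    cluster = get_cluster()  # warm invocations reuse the existing object
    return {"cluster_id": id(cluster), "key": event.get("key")}
```

Connecting eagerly at import time is an equally valid variant; the lazy form shown here only defers the cost to the first invocation.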


Durability Levels

Durability controls how many nodes must acknowledge a write, and whether in memory or on disk, before it is considered successful.

| Level | Guarantee | Latency impact |
| --- | --- | --- |
| None | Acknowledged by active node memory only | Lowest |
| Majority | Held in memory on a majority of the nodes that store the document (active + replicas) | Low |
| MajorityAndPersistActive | Majority in memory + persisted on active node disk | Medium |
| PersistToMajority | Persisted on a majority of nodes' disks | Highest |

Guidance:

  • Use Majority for most production writes — protects against active node failure without high latency
  • Use PersistToMajority for financial transactions or data that must survive simultaneous node + disk failure
  • None is acceptable for ephemeral data (caches, sessions) where loss is tolerable
  • Durability is measured against the bucket's configured copies (active + replicas); with fewer than 2 replicas a single node failure leaves no majority, so plan on ≥ 2 replicas for PersistToMajority
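The guidance above can be captured as a small decision helper. This is only a sketch: the `Durability` enum and `pick_durability` function are hypothetical, and real SDKs expose their own durability enum with similar member names.

```python
from enum import Enum


class Durability(Enum):
    NONE = "None"
    MAJORITY = "Majority"
    MAJORITY_AND_PERSIST_ACTIVE = "MajorityAndPersistActive"
    PERSIST_TO_MAJORITY = "PersistToMajority"


def pick_durability(ephemeral: bool = False,
                    must_survive_disk_loss: bool = False) -> Durability:
    # The strictest requirement wins if both flags are set.
    if must_survive_disk_loss:
        return Durability.PERSIST_TO_MAJORITY  # e.g. financial transactions
    if ephemeral:
        return Durability.NONE                 # caches, sessions
    return Durability.MAJORITY                 # default for production writes


print(pick_durability().value)                # Majority
print(pick_durability(ephemeral=True).value)  # None
```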

Sub-Document Operations

Sub-document operations read or mutate a specific path inside a document without fetching the entire body. Use them when:

  • The document is large and you only need one field
  • You need atomic field-level updates (counter increments, array appends)
  • You want to avoid read-modify-write races on a single field

Sub-document operations use the same KV path as full-document operations and have the same low latency.
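To illustrate path addressing only, here is a toy in-memory model. It is not SDK code: `lookup_path` and `increment_path` are hypothetical helpers operating on a plain dict, whereas a real sub-document operation runs on the server and is applied atomically there.

```python
from typing import Any, Dict


def lookup_path(doc: Dict[str, Any], path: str) -> Any:
    """Read one dotted path without touching the rest of the document."""
    node: Any = doc
    for part in path.split("."):
        node = node[part]
    return node


def increment_path(doc: Dict[str, Any], path: str, delta: int = 1) -> int:
    """Add delta to the counter at a dotted path, creating it if missing."""
    *parents, leaf = path.split(".")
    node = doc
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = node.get(leaf, 0) + delta
    return node[leaf]


profile = {"name": "Ada", "stats": {"logins": 41}, "bio": "x" * 10_000}
print(lookup_path(profile, "stats.logins"))     # 41
print(increment_path(profile, "stats.logins"))  # 42
```

In the real operation, only the path and the small result travel over the network, which is why a large field such as `bio` does not add to the cost.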


Troubleshooting

sdk-doctor

sdk-doctor diagnose "$CB_CONNECTION_STRING" -u "$CB_USERNAME" -p "$CB_PASSWORD"

Checks DNS resolution, port reachability (8091, 11210), TLS certificate validity, authentication, and topology discovery. Run this first when a connection fails.

Common errors

| Error | Cause | Fix |
| --- | --- | --- |
| UnambiguousTimeoutException | Operation timed out; cluster did not receive it | Retry safely; check cluster health and kv_timeout |
| AmbiguousTimeoutException | Timeout after send; operation may have completed | Verify state before retrying mutations |
| AuthenticationFailureException | Wrong credentials or missing RBAC role | Check user roles in Couchbase UI → Security → Users |
| BucketNotFoundException | Wrong bucket name (case-sensitive) | Verify exact name in Couchbase UI |
| Connection refused / ECONNREFUSED | Port 8091 or 11210 blocked | SDK must reach all data nodes directly; check firewall rules |
| TLS certificate error | Self-signed cert not trusted | Add cluster CA cert to trust store; use couchbases:// |
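The difference between the two timeout errors matters most when retrying mutations. The sketch below shows one possible retry loop. The exception classes are local stand-ins that borrow the names from the table, not imports from any SDK, and `run_with_retry` and `already_applied` are hypothetical names. No backoff is included, for brevity.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class UnambiguousTimeoutException(Exception):
    """Stand-in: the operation is known not to have reached the cluster."""


class AmbiguousTimeoutException(Exception):
    """Stand-in: the operation may or may not have been applied."""


def run_with_retry(op: Callable[[], T],
                   already_applied: Callable[[], bool],
                   attempts: int = 3) -> Optional[T]:
    """Retry op; after an ambiguous timeout, check state before resending."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return op()
        except UnambiguousTimeoutException as exc:
            last_error = exc  # never applied: safe to send again
        except AmbiguousTimeoutException as exc:
            last_error = exc
            if already_applied():  # the write landed despite the timeout
                return None
    assert last_error is not None
    raise last_error
```

Here `already_applied` would typically read the document back and compare it with the intended state; returning `None` in that branch signals that no fresh result is available even though the write succeeded.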

Port reference

| Port | Protocol | Service |
| --- | --- | --- |
| 8091 | HTTP | Management API (bootstrap) |
| 18091 | HTTPS | Management API (TLS) |
| 11210 | TCP | KV (data service) |
| 11207 | TLS | KV (data service, TLS) |
| 8093 | HTTP | Query service (N1QL) |
| 8094 | HTTP | Search service (FTS) |
| 8095 | HTTP | Analytics service |

The SDK must be able to reach all data nodes on port 11210/11207, not just the bootstrap node. NAT and port-forwarding setups that expose only one node will fail after topology discovery.
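A quick way to test this requirement from the application host is to attempt a TCP connection to the KV port on every data node. The sketch below uses only the standard library; `port_open` and `check_kv_reachability` are hypothetical helper names, and a successful connect only proves the port is reachable, not that TLS or authentication will succeed.

```python
import socket
from typing import Dict, Iterable


def port_open(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, timed out, or DNS failure
        return False


def check_kv_reachability(hosts: Iterable[str], tls: bool = False) -> Dict[str, bool]:
    """Probe the KV port (11210 plaintext, 11207 TLS) on every listed node."""
    port = 11207 if tls else 11210
    return {host: port_open(host, port) for host in hosts}
```

Calling `check_kv_reachability(["node1", "node2", "node3"])` returns a per-node map; any `False` entry points at a node the SDK would fail to reach after topology discovery.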