Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 26 additions & 9 deletions base/bootstrap.go
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ import (
"context"
"errors"
"fmt"
"net/http"
"reflect"
"sort"
"sync"
Expand Down Expand Up @@ -53,13 +54,14 @@ type BootstrapConnection interface {

// CouchbaseClusterSpec define how to make a connection to Couchbase Server
type CouchbaseClusterSpec struct {
Server string // connection string to connect to the Couchbase cluster
Username string // RBAC username to authenticate with the cluster
Password string // RBAC password to authenticate with the cluster
X509Certpath string // X.509 cert path to authenticate with the cluster
X509Keypath string // X.509 key path to authenticate with the cluster
CACertpath string // CA cert path to use for TLS connections
TLSSkipVerify bool // If true, do not validate TLS certificate
Server string // connection string to connect to the Couchbase cluster
Username string // RBAC username to authenticate with the cluster
Password string // RBAC password to authenticate with the cluster
X509Certpath string // X.509 cert path to authenticate with the cluster
X509Keypath string // X.509 key path to authenticate with the cluster
CACertpath string // CA cert path to use for TLS connections
TLSSkipVerify bool // If true, do not validate TLS certificate
UseGOCBFastFailRetry bool // When true, readiness checks fail fast instead of using the best-effort retry strategy
}

// CouchbaseCluster is a GoCBv2 implementation of BootstrapConnection
Expand All @@ -73,6 +75,7 @@ type CouchbaseCluster struct {
cachedBucketConnections cachedBucketConnections // Per-bucket cached connections
cachedConnectionLock sync.Mutex // mutex for access to cachedBucketConnections
configPersistence ConfigPersistence // ConfigPersistence mode
useGOCBFastFailRetry bool // When true, readiness checks fail fast instead of using the best-effort retry strategy
}

type BucketConnectionMode int
Expand Down Expand Up @@ -198,6 +201,7 @@ func NewCouchbaseCluster(ctx context.Context, clusterSpec CouchbaseClusterSpec,
perBucketAuth: perBucketAuth,
clusterOptions: clusterOptions,
bucketConnectionMode: bucketMode,
useGOCBFastFailRetry: clusterSpec.UseGOCBFastFailRetry,
}

if bucketMode == CachedClusterConnections {
Expand All @@ -212,6 +216,12 @@ func NewCouchbaseCluster(ctx context.Context, clusterSpec CouchbaseClusterSpec,
return cbCluster, nil
}

// UseGOCBFastFailRetry returns whether gocb operations on this cluster should fail fast instead of using the
// best-effort retry strategy.
func (cc *CouchbaseCluster) UseGOCBFastFailRetry() bool {
return cc.useGOCBFastFailRetry
}

// connect attempts to open a gocb.Cluster connection. Callers will be responsible for closing the connection.
// Pass an authenticator to use that to connect instead of using the cluster credentials.
func (cc *CouchbaseCluster) connect(auth *gocb.Authenticator) (*gocb.Cluster, error) {
Expand All @@ -230,7 +240,7 @@ func (cc *CouchbaseCluster) connect(auth *gocb.Authenticator) (*gocb.Cluster, er
err = cluster.WaitUntilReady(time.Second*10, &gocb.WaitUntilReadyOptions{
DesiredState: gocb.ClusterStateOnline,
ServiceTypes: []gocb.ServiceType{gocb.ServiceTypeManagement},
RetryStrategy: &goCBv2FailFastRetryStrategy{},
RetryStrategy: goCBRetryStrategy(cc.useGOCBFastFailRetry),
})
if err != nil {
_ = cluster.Close(nil)
Expand Down Expand Up @@ -579,7 +589,7 @@ func (cc *CouchbaseCluster) connectToBucket(ctx context.Context, bucketName stri
b = connection.Bucket(bucketName)
err = b.WaitUntilReady(time.Second*10, &gocb.WaitUntilReadyOptions{
DesiredState: gocb.ClusterStateOnline,
RetryStrategy: &goCBv2FailFastRetryStrategy{},
RetryStrategy: goCBRetryStrategy(cc.useGOCBFastFailRetry),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nit but I find the capitalization on this to be somewhat sanity inducing:

I'd prefer: gocbRetryStrategy to goCBRetryStrategy.

I think for this function, I would do something like: func (cc *CouchbaseCluster) getRetryStrategy() to make this code easier to read.

ServiceTypes: []gocb.ServiceType{gocb.ServiceTypeKeyValue},
})
if err != nil {
Expand All @@ -589,6 +599,13 @@ func (cc *CouchbaseCluster) connectToBucket(ctx context.Context, bucketName stri
return nil, nil, ErrAuthError
}

// In best-effort retry mode a missing/unreachable bucket surfaces as a timeout rather than an
// auth failure. Classify it as a connection error so callers translate it to a 502 instead of a
// generic 500, mirroring the per-database connection path (see db.connectToBucketErrorHandling).
if !cc.useGOCBFastFailRetry {
return nil, nil, HTTPErrorf(http.StatusBadGateway,
"Unable to connect to Couchbase Server. Please ensure it is running and reachable, and that bucket %q exists. Error: %s", MD(bucketName).Redact(), err)
}
return nil, nil, err
}

Expand Down
1 change: 1 addition & 0 deletions base/bucket.go
Original file line number Diff line number Diff line change
Expand Up @@ -121,6 +121,7 @@ type BucketSpec struct {
ViewQueryTimeoutSecs *uint32 // the view query timeout in seconds (default: 75 seconds)
MaxConcurrentQueryOps *int // maximum number of concurrent query operations (default: DefaultMaxConcurrentQueryOps)
BucketOpTimeout *time.Duration // How long bucket ops should block returning "operation timed out". If nil, uses GoCB default. GoCB buckets only.
UseGOCBFastFailRetry bool // When true, gocb readiness checks and index lookups fail fast instead of using the best-effort retry strategy
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about doing something like: InitialConnectionRetryStrategy *gocb.RetryStrategy with something like "If unset, use the default retry strategy for sync gateway.

The other idea if this is confusing is to create an enum for types of strategies:

  1. DefaultRetryStrategy
  2. FastFailOnInitialConnectRetryStrategy

}

const defaultNumRetries = 10
Expand Down
54 changes: 32 additions & 22 deletions base/bucket_gocb_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -1987,31 +1987,41 @@ func TestCouchbaseServerIncorrectLogin(t *testing.T) {
if UnitTestUrlIsWalrus() {
t.Skip("This test only works against Couchbase Server")
}
LongRunningTest(t)

ctx := TestCtx(t)
for _, tls := range []bool{true, false} {
t.Run(fmt.Sprintf("tls=%v", tls), func(t *testing.T) {
testBucket := GetTestBucket(t)
defer testBucket.Close(ctx)

// Override test bucket spec with invalid creds
testBucket.BucketSpec.Auth = TestAuthenticator{
Username: "invalid_username",
Password: "invalid_password",
BucketName: testBucket.BucketSpec.BucketName,
}
if tls {
testBucket.BucketSpec.Server = strings.ReplaceAll(testBucket.BucketSpec.Server, "couchbase://", "couchbases://")
testBucket.BucketSpec.TLSSkipVerify = true // test env isn't always using valid certs
} else {
testBucket.BucketSpec.Server = strings.ReplaceAll(testBucket.BucketSpec.Server, "couchbases://", "couchbase://")
}

// Attempt to open the bucket again using invalid creds. We should expect an error.
bucket, err := GetBucket(TestCtx(t), testBucket.BucketSpec)
assert.Equal(t, ErrAuthError, err)
assert.Nil(t, bucket)
})
for _, fastFail := range []bool{true, false} {
t.Run(fmt.Sprintf("tls=%v/fastFail=%v", tls, fastFail), func(t *testing.T) {
testBucket := GetTestBucket(t)
defer testBucket.Close(ctx)

// Override test bucket spec with invalid creds
testBucket.BucketSpec.Auth = TestAuthenticator{
Username: "invalid_username",
Password: "invalid_password",
BucketName: testBucket.BucketSpec.BucketName,
}
testBucket.BucketSpec.UseGOCBFastFailRetry = fastFail
if tls {
testBucket.BucketSpec.Server = strings.ReplaceAll(testBucket.BucketSpec.Server, "couchbase://", "couchbases://")
testBucket.BucketSpec.TLSSkipVerify = true // test env isn't always using valid certs
} else {
testBucket.BucketSpec.Server = strings.ReplaceAll(testBucket.BucketSpec.Server, "couchbases://", "couchbase://")
}

// Attempt to open the bucket again using invalid creds. We should expect an error.
bucket, err := GetBucket(TestCtx(t), testBucket.BucketSpec)
assert.Nil(t, bucket)
if fastFail {
// Fail-fast strategy surfaces the authentication failure directly.
assert.Equal(t, ErrAuthError, err)
} else {
// Best-effort strategy retries the auth failure until WaitUntilReady times out.
assert.ErrorIs(t, err, gocb.ErrUnambiguousTimeout)
}
})
}
}
}

Expand Down
21 changes: 12 additions & 9 deletions base/cluster_n1ql.go
Original file line number Diff line number Diff line change
Expand Up @@ -36,15 +36,17 @@ type ClusterOnlyN1QLStore struct {
collectionName string // Used to build keyspace for query when not otherwise set
supportsCollections bool
supportsIfNotExistsInDDL bool // 7.1.0+ MB-38737
useGOCBFastFailRetry bool // When true, index lookups fail fast instead of using the best-effort retry strategy
}

func NewClusterOnlyN1QLStore(cluster *gocb.Cluster, bucketName, scopeName, collectionName string) (*ClusterOnlyN1QLStore, error) {
func NewClusterOnlyN1QLStore(cluster *gocb.Cluster, bucketName, scopeName, collectionName string, useGOCBFastFailRetry bool) (*ClusterOnlyN1QLStore, error) {

clusterOnlyn1qlStore := &ClusterOnlyN1QLStore{
cluster: cluster,
bucketName: bucketName,
scopeName: scopeName,
collectionName: collectionName,
cluster: cluster,
bucketName: bucketName,
scopeName: scopeName,
collectionName: collectionName,
useGOCBFastFailRetry: useGOCBFastFailRetry,
}

major, minor, err := GetClusterVersion(cluster)
Expand Down Expand Up @@ -209,10 +211,11 @@ func (cl *ClusterOnlyN1QLStore) runQuery(statement string, n1qlOptions *gocb.Que

func (cl *ClusterOnlyN1QLStore) indexManager(scopeName, collectionName string) *indexManager {
return &indexManager{
cluster: cl.cluster.QueryIndexes(),
bucketName: cl.bucketName,
scopeName: scopeName,
collectionName: collectionName,
cluster: cl.cluster.QueryIndexes(),
bucketName: cl.bucketName,
scopeName: scopeName,
collectionName: collectionName,
useGOCBFastFailRetry: cl.useGOCBFastFailRetry,
Copy link
Copy Markdown
Collaborator

@torcolvin torcolvin May 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually don't need this, because the RetryStrategy is inherited from the cluster and/or Bucket, so we can remove the complexity of this code and only set this for the Cluster/Bucket which makes this simpler.

At this point of making index calls, we have already made a bucket connection and we don't care about surfacing an authenitcation error.

}
}

Expand Down
12 changes: 3 additions & 9 deletions base/collection.go
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,7 @@ func GetGoCBv2Bucket(ctx context.Context, spec BucketSpec) (*GocbV2Bucket, error
err = cluster.WaitUntilReady(time.Second*30, &gocb.WaitUntilReadyOptions{
DesiredState: gocb.ClusterStateOnline,
ServiceTypes: []gocb.ServiceType{gocb.ServiceTypeManagement},
RetryStrategy: &goCBv2FailFastRetryStrategy{},
RetryStrategy: goCBRetryStrategy(spec.UseGOCBFastFailRetry),
})

if err != nil {
Expand All @@ -89,7 +89,7 @@ func GetGoCBv2Bucket(ctx context.Context, spec BucketSpec) (*GocbV2Bucket, error
return nil, err
}

return GetGocbV2BucketFromCluster(ctx, cluster, spec, connString, time.Second*30, true)
return GetGocbV2BucketFromCluster(ctx, cluster, spec, connString, time.Second*30, spec.UseGOCBFastFailRetry)

}

Expand All @@ -111,14 +111,8 @@ func GetGocbV2BucketFromCluster(ctx context.Context, cluster *gocb.Cluster, spec
// Connect to bucket
bucket := cluster.Bucket(spec.BucketName)

var retryStrategy gocb.RetryStrategy
if failFast {
retryStrategy = &goCBv2FailFastRetryStrategy{}
} else {
retryStrategy = gocb.NewBestEffortRetryStrategy(nil)
}
err := bucket.WaitUntilReady(waitUntilReady, &gocb.WaitUntilReadyOptions{
RetryStrategy: retryStrategy,
RetryStrategy: goCBRetryStrategy(failFast),
})
if err != nil {
_ = cluster.Close(&gocb.ClusterCloseOptions{})
Expand Down
7 changes: 4 additions & 3 deletions base/collection_n1ql.go
Original file line number Diff line number Diff line change
Expand Up @@ -64,9 +64,10 @@ func (c *Collection) BucketName() string {

func (c *Collection) indexManager() *indexManager {
m := &indexManager{
bucketName: c.BucketName(),
collectionName: c.CollectionName(),
scopeName: c.ScopeName(),
bucketName: c.BucketName(),
collectionName: c.CollectionName(),
scopeName: c.ScopeName(),
useGOCBFastFailRetry: c.Bucket.Spec.UseGOCBFastFailRetry,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, we should rely on RetryStrategy from the cluster/bucket.

}
if !c.IsSupported(sgbucket.BucketStoreFeatureCollections) {
m.cluster = c.Bucket.cluster.QueryIndexes()
Expand Down
13 changes: 7 additions & 6 deletions base/collection_n1ql_common.go
Original file line number Diff line number Diff line change
Expand Up @@ -101,16 +101,17 @@ func ExplainQuery(ctx context.Context, store N1QLStore, statement string, params
}

type indexManager struct {
cluster *gocb.QueryIndexManager
collection *gocb.CollectionQueryIndexManager
bucketName string
scopeName string
collectionName string
cluster *gocb.QueryIndexManager
collection *gocb.CollectionQueryIndexManager
bucketName string
scopeName string
collectionName string
useGOCBFastFailRetry bool
}

func (im *indexManager) GetAllIndexes() ([]gocb.QueryIndex, error) {
opts := &gocb.GetAllQueryIndexesOptions{
RetryStrategy: &goCBv2FailFastRetryStrategy{},
RetryStrategy: goCBRetryStrategy(im.useGOCBFastFailRetry),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably wrote this code, but we should rely on the cluster/bucket parameters and we can drop this line entirely, regardless of the state of this PR.

}

if im.collection != nil {
Expand Down
9 changes: 9 additions & 0 deletions base/gocb_utils.go
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,15 @@ func (rs *goCBv2FailFastRetryStrategy) RetryAfter(req gocb.RetryRequest, reason
return &gocb.NoRetryRetryAction{}
}

// goCBRetryStrategy returns the fail-fast retry strategy when useFailFast is true, otherwise the best-effort
// strategy that retries until the operation's timeout. Gated by the unsupported.use_gocb_fast_fail_retry config.
func goCBRetryStrategy(useFailFast bool) gocb.RetryStrategy {
if useFailFast {
return &goCBv2FailFastRetryStrategy{}
}
return gocb.NewBestEffortRetryStrategy(nil)
}

// GOCBCORE Utilities

// CertificateAuthenticator allows for certificate auth in gocbcore
Expand Down
6 changes: 6 additions & 0 deletions base/gocb_utils_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ package base
import (
"testing"

"github.com/couchbase/gocb/v2"
"github.com/stretchr/testify/require"

"github.com/stretchr/testify/assert"
Expand Down Expand Up @@ -95,3 +96,8 @@ func TestGoCBCoreAuthConfigInvalidPaths(t *testing.T) {
_, err := GoCBCoreAuthConfig("", "", "/non/existent/cert", "/non/existent/key")
assert.Error(t, err)
}

func TestGoCBRetryStrategy(t *testing.T) {
assert.IsType(t, &goCBv2FailFastRetryStrategy{}, goCBRetryStrategy(true))
assert.IsType(t, gocb.NewBestEffortRetryStrategy(nil), goCBRetryStrategy(false))
}
1 change: 1 addition & 0 deletions db/indextest/indextest_dual_metadata_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,7 @@ func TestDualMetadataStoreIndexes(t *testing.T) {
gocbBucket.BucketName(),
store.ScopeName(),
store.CollectionName(),
gocbBucket.Spec.UseGOCBFastFailRetry,
)
require.NoError(t, err)
indexesMeta, err = base.GetSystemCollectionIndexesMeta(ctx, n1qlStore, base.SystemScope, base.SystemCollectionMobile, expectedIndexes)
Expand Down
2 changes: 1 addition & 1 deletion db/indextest/indextest_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -355,7 +355,7 @@ func TestInitializeIndexes(t *testing.T) {
gocbBucket, err := base.AsGocbV2Bucket(database.Bucket)
require.NoError(t, err)

n1qlStore, err := base.NewClusterOnlyN1QLStore(gocbBucket.GetCluster(), gocbBucket.BucketName(), collection.ScopeName, collection.Name)
n1qlStore, err := base.NewClusterOnlyN1QLStore(gocbBucket.GetCluster(), gocbBucket.BucketName(), collection.ScopeName, collection.Name, gocbBucket.Spec.UseGOCBFastFailRetry)
require.NoError(t, err)

// add and drop indexes that may be different from the way the bucket pool expects, so use specific options here for test
Expand Down
2 changes: 1 addition & 1 deletion db/indextest/util.go
Original file line number Diff line number Diff line change
Expand Up @@ -78,7 +78,7 @@ func initializeCollectionIndexes(ctx context.Context, t *testing.T, testBucket b
gocbBucket, err := base.AsGocbV2Bucket(testBucket)
require.NoError(t, err)

n1qlStore, err := base.NewClusterOnlyN1QLStore(gocbBucket.GetCluster(), gocbBucket.BucketName(), dsName.ScopeName(), dsName.CollectionName())
n1qlStore, err := base.NewClusterOnlyN1QLStore(gocbBucket.GetCluster(), gocbBucket.BucketName(), dsName.ScopeName(), dsName.CollectionName(), gocbBucket.Spec.UseGOCBFastFailRetry)
require.NoError(t, err)

ctx = base.CollectionLogCtx(ctx, dsName.ScopeName(), dsName.CollectionName())
Expand Down
4 changes: 4 additions & 0 deletions docs/api/components/schemas.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -2436,6 +2436,10 @@ Startup-config:
description: Store database configurations in system xattrs
type: boolean
default: false
use_gocb_fast_fail_retry:
description: When true, gocb cluster/bucket readiness checks and index lookups (on both the bootstrap and per-database connections) fail on the first error instead of retrying. When false, they use the best-effort retry strategy, retrying until their timeout when Couchbase Server is unavailable or failing over.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason you want to enable this is if you expect to have authentication errors. I think it's worthwhile to call this out.

When true, errors on initial connection to Couchbase Server will fail instantaneously. Enabling this will surface authentication errors quickly, but can cause some Sync Gateway operations to shut down databases with intermittent Couchbase Server connection errors.

Given this text, it seems low value to even expose this option and I can't imagine people really wanting to enable it. If we do have a flag, I'd be inclined to make this unsupported so we aren't committed to having this option for all eternity.

The reasons that this can be useful:

  1. If using persistent configuration with a single set of credentials, Sync Gateway will fail to start up in 1sec rather than 30sec. If you are using persistent configuration however, we actually would only care about this for the very initial bootstrap connection if you are not using custom bucket/database credentials. While custom credentials are supported, I have never seen them used in the wild.
  2. If you are not using persistent configuration, the credentials are at the database level, and basically you are protecting against typos in the configuration. In this case, the potential for failure would occur at startup only.

It is possible that the RBAC user would be able to make a cluster connection but then not have bucket permissions, so it is possible to fail under each bucket.

type: boolean
default: false
readOnly: true
couchbase_keepalive_interval:
description: TCP keep-alive interval between SG and Couchbase server. This is unused.
Expand Down
Loading
Loading