CBG-5184 use cbgt one shot mode #8266

Open
torcolvin wants to merge 10 commits into main from CBG-5184
Conversation

@torcolvin
Collaborator

@torcolvin torcolvin commented May 12, 2026

CBG-5184 use cbgt one shot mode

This does not yet work in multi-node mode (expected), and not all tests are enabled. I'm still investigating why some of the tests fail or flake. I do not expect the failing tests to require major changes, so I am putting this up for review as is. Most tests do pass in distributed resync mode (where distributed resync is only run on a single node).

  • Register the high sequence numbers (endSeqNos) with cbgt via sourceParams. These mark the point at which cbgt will stop processing documents.
  • When a high sequence number is specified to kv, it will always stream sequences to the end of the snapshot, which might be higher than the specified endSeqNo. Avoid checkpointing these sequence numbers in DCPCommon.updateSeq to avoid an ERANGE error when re-opening the feed. https://github.com/couchbase/cbgt/blob/6e34ff79a0e6e97d5fb5b548c217a8a9d279a94b/feed_dcp_gocbcore.go#L1034
  • Add ForceCheckpointWrite, called when the feed shuts down, to force the checkpoints to be written. This is really only helpful for tests: DCPCommon.setMetaData is only called on a 1 minute interval, so the last checkpoints are otherwise not written. A test that makes sure some documents are processed and then resumes from that point fails without this call.
  • cbgt.EventHandler.OnUnregisterFeed is now used to discover when vBuckets are completed. This is tracked locally (for now, see TODO) and used to close dcpDoneChan to indicate when the DCP operations should finish. cbgt uses N feeds for N partitions. This function is called whenever a feed (the set of vBuckets for a partition) completes, which happens on reaching the endSeqNos or when unregistering a feed (normal shutdown, such as /db/resync?action=stop).

Small fixes:

  • Created an options struct for DCPDestOptions to avoid having another nil-able option.
  • cbgt changed a type from string to ConsistencyLevel; this is a type alias.
  • Allow dbExpVarsStat to be nil, especially in tests. I do not have a place for this stat yet in distributed resync.
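
As a reminder of why an alias change is source-compatible, here is a small sketch. It assumes the declaration is an alias, as the description says; the types below are declared locally for illustration and are not copied from cbgt.

```go
package main

import "fmt"

// ConsistencyLevel is declared locally as an alias for illustration only.
// An alias is the same type as string, so no conversion is needed.
type ConsistencyLevel = string

// DefinedLevel is a defined (non-alias) type, shown for contrast.
type DefinedLevel string

// takesLevel is a hypothetical function accepting the alias type.
func takesLevel(l ConsistencyLevel) string { return "level=" + l }

func main() {
	s := "example"
	fmt.Println(takesLevel(s)) // a plain string is accepted directly

	d := DefinedLevel(s) // a defined type needs an explicit conversion
	fmt.Println(string(d))
}
```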

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Integration Tests

Copilot AI review requested due to automatic review settings May 12, 2026 21:01
@torcolvin torcolvin marked this pull request as draft May 12, 2026 21:01
Contributor

Copilot AI left a comment

Pull request overview

This WIP/demo PR experiments with running Sync Gateway’s cbgt-backed DCP in “one-shot” mode by providing end sequence numbers and wiring cbgt feed shutdown events back into the resync background process so it can terminate when all vbuckets are complete.

Changes:

  • Switches cbgt to a fork via go.mod replace and updates go.sum accordingly.
  • Refactors DCP destination construction to use a DCPDestOptions struct and threads one-shot end-seq handling into cbgt feed params and DCP dest/common logic.
  • Adds an OnUnregisterFeed callback path in sharded DCP manager event handlers and tracks completion in resync; updates resync tests/logging.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| go.sum | Removes upstream cbgt sums and adds sums for the forked cbgt module. |
| go.mod | Adds a replace directive to point github.com/couchbase/cbgt at github.com/torcolvin/cbgt. |
| db/import_listener.go | Updates DCP dest creation to use base.DCPDestOptions. |
| db/background_mgr_resync_dcp.go | Adds sharded one-shot end-seq support, feed-unregister callback completion, and new run-state fields. |
| db/background_mgr_resync_dcp_test.go | Adjusts resync test behavior and enables debug logging (with leftover commented code). |
| base/util.go | Introduces a generic MutexMap helper used for completion tracking. |
| base/dcp_sharded.go | Extends sharded DCP options and cbgt manager event handlers to support unregister-feed callbacks and end-seq feed params. |
| base/dcp_feed_type.go | Extends cbgt feed params to include stop-after settings and adds GetHighSeqNos. |
| base/dcp_dest.go | Introduces DCPDestOptions and updates dest creation/signatures accordingly. |
| base/dcp_common.go | Threads one-shot end-seq handling into DCP common processing. |

Comment thread base/dcp_common.go Outdated
Comment thread base/dcp_dest.go
Comment thread base/dcp_feed_type.go Outdated
Comment thread base/dcp_sharded.go
Comment thread db/background_mgr_resync_dcp.go Outdated
Comment on lines 35 to 37
// TODO distributed resync: this value has to be serialized and evaluated in a distributed context.
completedVBs base.MutexMap[uint16, struct{}]
useXattrs bool
Comment thread db/background_mgr_resync_dcp.go Outdated
Comment thread db/background_mgr_resync_dcp_test.go Outdated
if distributed && base.UnitTestUrlIsWalrus() {
t.Skip("Distribute resync not supported for rosmar")
}
//base.SetUpTestLogging(t, base.LevelDebug, base.KeyDCP, base.KeyCRUD)
Comment thread go.mod Outdated
Comment thread base/dcp_common.go Outdated
@torcolvin torcolvin self-assigned this May 21, 2026
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

Comment thread db/background_mgr_resync_dcp.go Outdated
Comment thread db/background_mgr_resync_dcp.go Outdated
Comment thread base/dcp_common.go Outdated
Comment thread base/dcp_feed_type.go
Comment thread base/dcp_sharded.go
@torcolvin torcolvin changed the title from "WIP: CBG-5184 use cbgt one shot mode" to "CBG-5184 use cbgt one shot mode" May 27, 2026
@torcolvin torcolvin marked this pull request as ready for review May 27, 2026 15:46
@torcolvin torcolvin requested a review from adamcfraser May 27, 2026 15:46
@torcolvin torcolvin assigned adamcfraser and unassigned torcolvin May 27, 2026
Collaborator

@adamcfraser adamcfraser left a comment

Generally looks fine, a few comments to look at.

Comment thread base/dcp_common.go Outdated
defer c.m.Unlock()

// Check the expected maximum sequence number when running a one shot feed. Do not checkpoint if the incoming
// sequence is greater than the expected maximum sequence number.
Collaborator

Can we do this check earlier (in dataUpdate) and also avoid callback processing for any sequences higher than endSeqNo?

Comment thread base/dcp_dest.go
type SGDest interface {
cbgt.Dest
cbgt.DestEx
ForceCheckpointWrite()
Collaborator

As discussed, PR doesn't currently have any callers of ForceCheckpointWrite. Review whether that's intentional.

Comment thread db/background_mgr_resync_dcp.go Outdated
// doneChan when all vBuckets have completed, which will allow the resync process to finish.
func (r *ResyncManagerDCP) getUnregisterFeedFunc(ctx context.Context, totalVBuckets uint16) base.CbgtUnregisterFeedCallback {
return func(feed cbgt.Feed) {
for _, d := range feed.Dests() {
Collaborator

As discussed, don't think we have a use case where our feed will have multiple Dests, but to be defensive can make this code just look for at least one SGDest to use for the ForceCheckpoint call (and exit the loop when it finds it)
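
A minimal sketch of the defensive loop suggested here, using local stand-in types rather than the real cbgt and Sync Gateway interfaces. Dest, SGDest, plainDest, sgDest, and forceCheckpointOnFirstSGDest are all illustrative names; only ForceCheckpointWrite comes from the snippet above.

```go
package main

import "fmt"

// Dest is a local stand-in for cbgt.Dest, reduced to one method.
type Dest interface {
	Close() error
}

// SGDest is a local stand-in for the interface in the snippet above.
type SGDest interface {
	Dest
	ForceCheckpointWrite()
}

// plainDest implements only Dest.
type plainDest struct{}

func (plainDest) Close() error { return nil }

// sgDest implements SGDest and counts forced checkpoint writes.
type sgDest struct{ writes int }

func (d *sgDest) Close() error          { return nil }
func (d *sgDest) ForceCheckpointWrite() { d.writes++ }

// forceCheckpointOnFirstSGDest calls ForceCheckpointWrite on the first value
// that implements SGDest, then stops. It reports whether one was found.
func forceCheckpointOnFirstSGDest(dests map[string]Dest) bool {
	for _, d := range dests {
		sg, ok := d.(SGDest)
		if !ok {
			continue // not an SGDest, keep looking
		}
		sg.ForceCheckpointWrite()
		return true // exit as soon as one SGDest has been used
	}
	return false
}

func main() {
	sg := &sgDest{}
	dests := map[string]Dest{"p0": plainDest{}, "p1": sg}
	fmt.Println(forceCheckpointOnFirstSGDest(dests), sg.writes)
}
```

Returning a bool lets the caller log or assert when no SGDest is present, which would indicate an unexpected feed configuration.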

Collaborator

@adamcfraser adamcfraser left a comment

Changes look good - one minor question on handling the error case in shouldProcessSequence

Comment thread base/dcp_common.go Outdated
// DCP will provide mutations that run to the end of the snapshot that contains the end sequence number.
endSeq, ok := c.endSeqNos[vBucketID]
if !ok {
AssertfCtx(c.loggingCtx, "Received DCP event for vbno %d which is not tracked by the expected endSeqNos %#+v. This means that endSeqNos was specified with the incorrect number of vBuckets", vBucketID, c.endSeqNos)
Collaborator

In the case where the endSeq isn't found, it looks like the code will skip the mutation (endSeq will be zero for line 237). Would it be more defensive to return true in that scenario?
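
To make the question concrete, here is a standalone sketch of the "return true when the vBucket is untracked" option. It is not the PR's shouldProcessSequence: the signature is hypothetical, a nil map is assumed to mean the feed is not one-shot, and the end sequence is assumed to be inclusive.

```go
package main

import "fmt"

// shouldProcessSequence is a hypothetical standalone version of the check
// under discussion. endSeqNos maps vBucket number to the expected maximum
// sequence for a one-shot feed.
func shouldProcessSequence(endSeqNos map[uint16]uint64, vbNo uint16, seq uint64) bool {
	if endSeqNos == nil {
		// Assumption: no end sequences means the feed is not one-shot.
		return true
	}
	endSeq, ok := endSeqNos[vbNo]
	if !ok {
		// A missing entry yields endSeq == 0, so comparing against it would
		// skip every mutation. Processing instead is the defensive choice
		// raised in the review comment.
		return true
	}
	// Mutations past endSeq belong to the tail of the snapshot and are skipped.
	return seq <= endSeq
}

func main() {
	ends := map[uint16]uint64{0: 10}
	fmt.Println(shouldProcessSequence(ends, 0, 10)) // at the end sequence
	fmt.Println(shouldProcessSequence(ends, 0, 11)) // past the end sequence
	fmt.Println(shouldProcessSequence(ends, 1, 5))  // untracked vBucket
}
```

The trade-off is that processing an untracked vBucket may do extra work, while skipping it risks silently dropping mutations; the assertion in the snippet above still flags the misconfiguration either way.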

3 participants