Skip to content

[draft] cmd/tap: use internal/group for managing the various goroutine components of a tap instance#1295

Draft
davecheney wants to merge 4114 commits into
mainfrom
dfc/tap-group
Draft

[draft] cmd/tap: use internal/group for managing the various goroutine components of a tap instance#1295
davecheney wants to merge 4114 commits into
mainfrom
dfc/tap-group

Conversation

@davecheney

Copy link
Copy Markdown
Contributor

This is a worked example of switching from using go to a group like structure to manage the lifecycle of the various workers inside a tap instance.

Because of the way some components, kick off their own child goroutines, this required stacking groups -- not something I'd done before -- but worked surprisingly well. I think the size of the change could be reduced by a preparatory PR which moves more of the worker creation into tap.Run.

bnewbold and others added 30 commits November 11, 2025 21:55
This is at a point where the new lexgen code roughly reproduces the
current ("legacy") codegen behavior. This means it can replace the
previous `lex/*.go` implementation. This version is somewhat more
complete and correct about the lexicon language in general, not just the
features we have used in the `com.atproto.*` and `app.bsky.*` lexicons
to date.

Things I might try to hit before merging:

- [ ] add some serialization tests (`./testing/`): records with
strongRef, recordWithMedia, unions, etc
- [x] resolve some more XXX and TODO

For later follow-up:

- [ ] API server handlers (for query and procedure), and ability to
disable client method output
- [ ] alternative CBOR library support (eg, annotations for `go-dasl`)
- [ ] ability to configure more package mappings (not just the indigo
pages)
- [ ] string, boolean, and integer query params as pointers
(configurable?)
- [ ] more correctness around `$type` serialization
- [ ] validation methods (for limits, etc)
- [ ] option to break out large query param lists as a struct
This reverts commit 5b368e3, reversing
changes made to 7ad48e1.
This branch wasn't quite ready yet. In particular, issues came up with
subscribeRepos, which is pretty load-bearing for the Go codebase.
Some big repos take more than 30 seconds to fetch as I have learned the
hard way.
This is an effort to get some of the lexgen refactor merged as a
package.

I want to start using this code from `glot`, but am not ready to run and
merge updates to the existing / "legacy" code in `api/`.
bnewbold and others added 26 commits January 19, 2026 21:15
As indicated in the comment, adding an actual foreign key relationship
would probably help with this. But doing it as a query tweak is a
simpler change to deploy (no schema change).

I have tested this in prod and it helped a bunch with perf. I have not
tested even locally with an actual call to listRepos, should do that
before merging+deploying.

closes: #1236
Re-adds `recId` similar to #1263, but as string (see #1271 for context)
This was a clever struct wrapped, but that was weird and inconsistent
with all the other types.

This updates it to be a simple string alias type.

This is (slightly) breaking, and will require updates in downstream
packages. Particularly those doing OAuth.
Fix `PasswordAuth#Refresh` using the wrong method.
This check was overly aggressive. There are a lot of situations where an
account needs to authenticate in "deactivated" state.

I'm not even sure if a log line is necessary here, but figure this is a
change in SDK behavior so it should at least be a bit vocal.

This also fixes a bug where a pointer to the status string was printed,
instead of the status string itself.
The majority of deliverEvent was managing the case where a did worker
was active. As the reference to these workers are already maintained in
an xsync.Map, we can outsource the paperwork of looking up or creating
the worker to Map's LoadOrCompute method.

LoadOrCompute does hold a write lock over the Map during the compute
function, which is a potenial bottleneck, althought the lock time is
O(1) for the allocation of the &DIDWorker so I am hoping for now.
Fixes a few issues in Tap. The most important is that we were not
updating prev data & rev when we didn't need to process any ops from a
commit. This would cause commits to appear as disjunctions in commit
history.

This also rearranges some logic in `ProcessCommit` which would cause us
to mark a repo "desynced" if an event came in while resyncing rather
than adding that event to the resync buffer.

For events that we add to the resync buffer, we hold off checking that
they play neatly on the current state of the repo (through prevData)
until we actually go to play them back.
Settin modules-download-mode: readonly causes the linter to complain,
rather than silently letting the tooling fix things up behind the
scenes. This is a good belt and suspenders check to ensure whats in the
commit is what is built at that commit.
Rather than mentioning the Go version in several places,
actions/setup-go can source the correct version from the go.mod file,
removing one place where they can get out of sync.

I wasn't sure if the go-version-file syntax is available with setup-go
v4, so I went ahead and bumped up the checkout and setup actions to
their latest versions. I use these versions on several projects as they
are what dependabot will upgrade you too by default, so feel pretty
confident this is a safe change.
Prev data can be nil in the case of legacy (non sync1.1) commits,
causing a nil pointer error.
…1286)

Rather than mentioning the Go version in several places,
actions/setup-go can source the correct version from the go.mod file,
removing one place where they can get out of sync.

I wasn't sure if the go-version-file syntax is available with setup-go
v4, so I went ahead and bumped up the checkout and setup actions to
their latest versions. I use these versions on several projects as they
are what dependabot will upgrade you too by default, so feel pretty
confident this is a safe change.
Setting modules-download-mode: readonly causes the linter to complain,
rather than silently letting the tooling fix things up behind the
scenes. This is a good belt and suspenders check to ensure whats in the
commit is what is built at that commit.
Rather than making runTap responsible for wrapping the context.Context
passsed to it in a signal handler, push that responsibility to the
caller as SIGTERM'ness is really a property of the process wrapped
around tap, rather than runTap itself managing its own lifecycle. The
logic is now runTap is pass a parent context, when that context is done,
everyone shuts down. The why it was cancelled is not something runTap
has to manage.
The majority of `deliverEvent` was managing the case where a did worker
was active. As the reference to these workers are already maintained in
an xsync.Map, we can outsource the paperwork of looking up or creating
the worker to Map's LoadOrCompute method.

LoadOrCompute does hold a write lock over the Map during the compute
function, which is a potential bottleneck, althought the lock time is
O(1) for the allocation of the &DIDWorker so I am hoping this is ok for
now.
Rather than making runTap responsible for wrapping the context.Context
passsed to it in a signal handler, push that responsibility to the
caller as SIGTERM'ness is really a property of the process wrapped
around tap, rather than runTap itself managing its own lifecycle. The
logic is now runTap is pass a parent context, when that context is done,
everyone shuts down. The why it was cancelled is not something runTap
has to manage.
runResyncWorker takes a context for lifecycle management, however if
that context is cancelled -- err != nil -- then the rsync worker will
sleep for a second, hit WaitForReady, which returns immediately because
the context is cancelled, then claimResyncJob will also return
immediately with context.Cancelled, err != nil, so the worker sleeps for
a second and tries again, etc.

Fix the loop by explicitly checking for the context.Cancelled case and
exiting the worker.

No test yet, mia culpa.
Naming is hard, group.G, group.Group, both aren't great. I also
considered work.Group, but none really light my fire. As a saving grace,
this is internal to indigo so we can iterate on the name later.

@jcalabro jcalabro left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like! Can we just import github.com/pkg/group rather than copy+pasting though?

@davecheney davecheney changed the title [draft] cmd/tap: use internal/group for managing the various coroutine components of a tap instance [draft] cmd/tap: use internal/group for managing the various goroutine components of a tap instance Jan 27, 2026
@davecheney

Copy link
Copy Markdown
Contributor Author

I like! Can we just import github.com/pkg/group rather than copy+pasting though?

This was just sample code I wrote for a talk, I'd rather not adopt another open source project, so for the moment, copying it into an internal package means nobody has to commit to is as something others can use directly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.