identity.CacheDirectory should not cache context.Canceled / context.DeadlineExceeded errors#1345
Open
arushibandi wants to merge 4241 commits into
Open
identity.CacheDirectory should not cache context.Canceled / context.DeadlineExceeded errors#1345arushibandi wants to merge 4241 commits into
arushibandi wants to merge 4241 commits into
Conversation
Pulls in fixed MST key height examples, and adds a simple test to confirm heights. See: bluesky-social/atproto-interop-tests#11
Thought this was already getting checked! Caught by @matthieusieben over at: bluesky-social/atproto-interop-tests#8
This fixes a deadlock in the outbox that was causing Tap to permanently stall for a single DID when multiple live events for that DID would end up in the outbox. After processing, we check pending events before waiting on the `notifChan` to see if there are more events that we should immediately kick into processing. Also, in `processPendingEvents`, we could fail to set `blockedInLive` if there were already in-flight historical events for the DID. Resolves bluesky-social#1331
Test suite for Tap. This isn't fully-featured. Specifically, it doesn't do full integration testing for event/repo processing. So doesn't test most sync 1.1 semantics. However it does test: - outbox, delivery, and event ordering - the basics of the firehose state machine - the resync system - claiming resync jobs - draining the resync buffer Snagged a couple of bug fixes along the way as well.
…l#1330) This should: - Fix a potential `blockedOnLive` stall for individual DIDs when the wakeup signal has already been consumed. - Prevent historical events from bypassing pending live events via the fast path.
This fixes the issue that was getting repos into a "desynchronized" state. I was accidentally using `r.Commit`, not realizing that it created a new commit, not return the existing commit details. This would cause us to store & compare incorrect, locally-generated revs rather than the rev from the commit message and subsequently end up skipping valid commits, thinking they were an old rev. Resolves bluesky-social#1315
Need to kick off the CI to test this, I *think* this is all that's needed
@jazware gave the all-clear on this. This was used to do some tuning for the original postgres bsky appview, and this is no longer relevant.
There is Go code in the new did-method-plc/go-didplc package which supports PLC generation. For now I don't want to create a dependency loop between the two packages though, so just removing this functionality here.
…l#1342) These have been deprecated for a while. The same functionality is covered by: - `atproto/identity`: DID resolution and parsing - `atproto/atcrypto`: cryptographic keys - https://github.com/did-method-plc/did-method-plc: creating and verifying DID PLC operations The one piece of dropped functionality here was the ability for `gosky` to generate PLC operations and submit them to the directory. Currently the `go-didplc` package depends on `indigo` (for crypto and syntax packages), and I don't want to create a dependency loop (even if it isn't a package-level dependency loop). Also the `goat plc ...` commands support this functionality (using the `go-didplc` package). This is part of general cleanups.
arushibandi
commented
Feb 25, 2026
| } | ||
| return nil, false, fmt.Errorf("identity not found in cache after coalesce returned") | ||
| case <-ctx.Done(): | ||
| return nil, false, ctx.Err() |
Author
There was a problem hiding this comment.
this should not be cached right?
arushibandi
added a commit
to habitat-network/habitat
that referenced
this pull request
Feb 26, 2026
… on shutdown. (#146) Context errors are cached. We see this specifically during an OAuth flow, where the browser re-directs a number of times, and then on the redirect back to habitat, on page load, we fail to load because a context cancelled error is cached. This type of error should not be cached. I opened a PR in my fork of indigo to demonstrate the issue: bluesky-social/indigo#1345. Seems like these should never be cached, but I'm also wondering if we're doing something wrong in our flow that is causing this. This is a temporary workaround (I see this like 30 times a day, on each login, which I'm doing frequently for testing reasons); either we should stop this from happening or merge the fix in the upstream repo (maybe both?) Co-authored-by: Arushi Bandi <arushibandi@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
I'm using the
identity.DefaultDirectory()and noticed that I was seeing failed lookups due tocontext cancelled. This also seemed to coincide with logins/oauth redirect flows or refreshed. Restarting my process fixed this. After digging in, it seems that this is because the CacheDirectory caches all errors, not filtering for context cancelled / deadline exceeded.My workaround for this is always pass in
context.Background()into thedir.Lookup()function, but this behavior seems unintended.The tests fail before the fix and pass after.
closes #1346