-
Notifications
You must be signed in to change notification settings - Fork 494
Add tracing spans for input and output gates for hold and wait #5666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
e7436b0 to
71c15e3
Compare
71c15e3 to
79c3cb4
Compare
This comment was marked as outdated.
This comment was marked as outdated.
justin-mp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is trending in the right direction.
Where do we capture the spans for SQLite operations?
| // before the gate is unlocked. | ||
| Lock addRef() { | ||
| return Lock(*gate); | ||
| return Lock(*gate, nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know how often this is called? It would be nice to preserve the span if we can.
| auto& context = IoContext::current(); | ||
| auto userSpan = context.makeUserTraceSpan("durable_object_storage_sync"_kjc); | ||
| KJ_IF_SOME(p, cache->onNoPendingFlush()) { | ||
| KJ_IF_SOME(p, cache->onNoPendingFlush(context.getCurrentTraceSpan())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this better to do than context.makeTraceSpan like we do in DurableObjectStorage::getCurrentBookmark()?
| void shutdown(kj::Maybe<const kj::Exception&> maybeException) override; | ||
| kj::OneOf<CancelAlarmHandler, RunAlarmHandler> armAlarmHandler( | ||
| kj::Date scheduledTime, bool noCache = false, kj::StringPtr actorId = "") override; | ||
| kj::Date scheduledTime, bool noCache, kj::StringPtr actorId, SpanParent parentSpan) override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can keep the optional parameters by putting the parentSpan before them. Unless we always provide the optional parameters, it's probably easier to keep the previous interface.
|
|
||
| // Trace span for the current commit operation. Captured from the first write | ||
| // that triggers a commit, used for the output gate lock hold trace. | ||
| SpanParent currentCommitSpan = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the first write is allowUnconfirmed? At that point, we don't actually lock the output gate.
| // Reset the commit span after the commit completes | ||
| auto resetSpan = kj::defer([this]() { | ||
| currentCommitSpan = nullptr; | ||
| hasCommitSpan = false; | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New commits can start before the previous commit has finished. The right thing to do is to kj::mv the currentCommitSpan into the commitImpl and reset it it immediately.
| // Capture trace span from the alarm handler for the commit batch. | ||
| if (!hasCommitSpan) { | ||
| currentCommitSpan = kj::mv(deferredAlarmSpan); | ||
| hasCommitSpan = true; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand why we're capturing the span here?
| // Capture trace span from the first write in this commit batch. | ||
| if (!hasCommitSpan) { | ||
| currentCommitSpan = kj::mv(traceSpan); | ||
| hasCommitSpan = true; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a better pattern would be to capture the currentCommitSpan on every write and then use whatever is the current capture when you actually do lockWhile. That way if you have a allowUnconfirmed write, you won't capture that one, which won't actually wait for the output gate. (As an optimization, you could stop overwriting the currentCommitSpan once you send it to the lockWhile.)
Also, this really should be a method that we call rather than duplicating it in every method.
| // Trace span for the deferred alarm deletion, captured from armAlarmHandler and used when | ||
| // the alarm is actually deleted. | ||
| SpanParent deferredAlarmSpan = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this one separate from the currentCommitSpan?
| // Capture the first span for use at commit time | ||
| if (!hasCommitSpan) { | ||
| commitSpan = kj::mv(traceSpan); | ||
| hasCommitSpan = true; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment in the sqlite file. I think the same thing applies here, at least when it comes to upgrading the output gate.
These spans will be useful during latency investigations for Durable Objects
Previous context - #5631