🐛 Resync issue #181
base: main
Signed-off-by: clyang82 <[email protected]>
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
Pre-merge checks: ❌ Failed checks (1 inconclusive); ✅ Passed checks (2 passed)
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: clyang82. It still needs approval from an approver in each of the affected files.
Actionable comments posted: 0
🧹 Nitpick comments (1)
pkg/cloudevents/generic/clients/baseclient.go (1)
260-267: Excellent fix for the deadlock issue.
The non-blocking send correctly prevents the deadlock described in the PR objectives. When the subscription goroutine is blocked in `transport.Subscribe()`, it cannot read from `receiverChan`, so a blocking send would deadlock forever.
However, for consistency with `sendReconnectedSignal` (lines 289-293) and to aid debugging, consider logging when a signal is dropped.
🔎 Suggested enhancement: Log dropped signals

```diff
 func (c *baseClient) sendReceiverSignal(signal int) {
 	c.RLock()
 	defer c.RUnlock()
 	if c.receiverChan != nil {
 		select {
 		case c.receiverChan <- signal:
 			// Signal sent successfully
 		default:
-			// Receiver is busy/blocked, can't send now
-			// This prevents deadlock when receiver is stuck in Subscribe()
+			// Receiver is busy/blocked, can't send now
+			// This prevents deadlock when receiver is stuck in Subscribe()
+			klog.V(2).InfoS("receiver signal not sent, receiver is busy", "signal", signal)
 		}
 	}
 }
```

Note: You'll need to add `ctx context.Context` as a parameter to `sendReceiverSignal` to use `klog.FromContext(ctx)` (similar to `sendReconnectedSignal`), or use `klog` directly as shown above.
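For readers who want to try the pattern in isolation, here is a minimal, self-contained sketch of a non-blocking `sendReceiverSignal`. It assumes a stripped-down `baseClient` holding only a mutex and `receiverChan` (the real struct has more fields), returns a `bool` purely so the behaviour can be checked, and uses the standard library's `log/slog` (Go 1.21+) as a stand-in for `klog`, so it is an illustration rather than the code in this PR.

```go
package main

import (
	"fmt"
	"log/slog"
	"sync"
)

// baseClient is a hypothetical, minimal stand-in for the real client:
// only the fields needed to show the signalling pattern are included.
type baseClient struct {
	sync.RWMutex
	receiverChan chan int
}

// sendReceiverSignal tries to deliver signal without blocking. If no
// goroutine is ready to receive (for example because the receiver is stuck
// in Subscribe), the signal is dropped and logged instead of blocking the
// caller forever. The bool return value is an addition for illustration.
func (c *baseClient) sendReceiverSignal(signal int) bool {
	c.RLock()
	defer c.RUnlock()
	if c.receiverChan == nil {
		return false
	}
	select {
	case c.receiverChan <- signal:
		return true
	default:
		// slog.Debug is hidden at the default Info level, roughly like klog.V(2).
		slog.Debug("receiver signal not sent, receiver is busy", "signal", signal)
		return false
	}
}

func main() {
	// Unbuffered channel with no reader: the send cannot complete, so it is dropped.
	busy := &baseClient{receiverChan: make(chan int)}
	fmt.Println(busy.sendReceiverSignal(1))

	// Buffered channel with free capacity: the send succeeds immediately.
	ready := &baseClient{receiverChan: make(chan int, 1)}
	fmt.Println(ready.sendReceiverSignal(1))
}
```

The trade-off of `select` with `default` is that a signal may be lost when the receiver is busy; that is acceptable here only if the receiver re-checks state on its own once it is unblocked.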
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
pkg/cloudevents/generic/clients/baseclient.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-16T02:22:20.929Z
Learnt from: skeeey
Repo: open-cluster-management-io/sdk-go PR: 144
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:200-213
Timestamp: 2025-09-16T02:22:20.929Z
Learning: In the GRPC CloudEvents protocol implementation, when startEventsReceiver encounters a stream error, it sends the error to reconnectErrorChan. The consumer of this channel handles the error by calling Close() on the protocol, which triggers close(p.closeChan), causing OpenInbound to unblock and call cancel() to properly terminate both the events receiver and heartbeat watcher goroutines.
Applied to files:
pkg/cloudevents/generic/clients/baseclient.go
📚 Learning: 2025-09-01T03:34:05.141Z
Learnt from: morvencao
Repo: open-cluster-management-io/sdk-go PR: 138
File: pkg/cloudevents/server/grpc/metrics/metrics.go:231-254
Timestamp: 2025-09-01T03:34:05.141Z
Learning: In open-cluster-management.io/sdk-go gRPC CloudEvents metrics, processing duration metrics should only be recorded for unary RPCs, not stream RPCs. Stream RPCs can be long-lived connections that persist as long as the gRPC server runs, making duration metrics confusing and less useful for operators debugging issues.
Applied to files:
pkg/cloudevents/generic/clients/baseclient.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: unit
- GitHub Check: integration
- GitHub Check: verify
morvencao left a comment
/lgtm
/assign @qiujian16
Summary
When the gRPC broker is used as the message broker, the maestro agent cannot reconnect to the server after the server restarts.
Analysis based on the logs:
- The error handler is blocked in sendReceiverSignal, waiting for the subscription goroutine to read from receiverChan.
- The subscription goroutine is blocked in transport.Subscribe() at line 185, so it cannot read from receiverChan.
- The two goroutines therefore wait on each other indefinitely (a deadlock), and the agent never re-establishes the connection.
logs:
Related issue(s)
Fixes #
Summary by CodeRabbit