Skip to content

Fix panic when sending error to closed channel#1338

Open
zhengxiexie wants to merge 1 commit intovmware-tanzu:mainfrom
zhengxiexie:topic/zhengxie/main/clean_token
Open

Fix panic when sending error to closed channel#1338
zhengxiexie wants to merge 1 commit intovmware-tanzu:mainfrom
zhengxiexie:topic/zhengxie/main/clean_token

Conversation

@zhengxiexie
Copy link
Contributor

@zhengxiexie zhengxiexie commented Jan 13, 2026

✨ What's Changed

Safe Channel Error Handling

  • Add safeSendError helper function to prevent panics when sending errors to a potentially closed channel
  • Update PopulateResourcetoStore to use safe error sending with fallback logging
  • Ensure graceful handling when fatalErrors channel is already closed

Implementation Details

  • Added safeSendError function in pkg/nsx/services/common/store.go
  • Uses defer/recover pattern to catch panic from sending on closed channel
  • Falls back to logging the error when channel send fails
  • Early return after error handling to prevent further processing

🎯 Motivation

SR Background: Production environment reported panic during store initialization:

panic: send on closed channel

goroutine 3191563 [running]:
runtime.chansend(0xc0056bc770, 0xc005f27dd0, 0x1, 0x0?)
        /build/mts/release/bora-24933634/.../runtime/chan.go:217 +0x55e
github.com/vmware-tanzu/nsx-operator/pkg/nsx/services/common.(*Service).PopulateResourcetoStore(...)

Issue: When PopulateResourcetoStore encounters an error, it sends to the fatalErrors channel. If the channel receiver has already closed the channel (e.g., due to timeout or another goroutine's fatal error), this causes a panic that crashes the operator.

Solution: Wrap channel send in a safe helper that recovers from panics and logs the error as a fallback, ensuring the operator remains stable even when concurrent error handling races occur.

✅ Testing

  • Existing unit tests pass
  • The change is defensive and handles edge cases during concurrent operations

🔄 Backward Compatibility

This change is fully backward compatible - it only adds safety handling without changing existing behavior when the channel is open.

Add safeSendError helper function to handle the case where fatalErrors
channel may be closed when PopulateResourcetoStore tries to send errors.
This prevents potential panics during store initialization.

Change-Id: Ia5cdcba37b8080b65141584ab5393b2d2913a1c6
@zhengxiexie zhengxiexie force-pushed the topic/zhengxie/main/clean_token branch from 39124aa to 8924a31 Compare January 13, 2026 08:34
@codecov-commenter
Copy link

Codecov Report

❌ Patch coverage is 55.55556% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.36%. Comparing base (d4f9e3e) to head (8924a31).

Files with missing lines Patch % Lines
pkg/nsx/services/common/store.go 55.55% 2 Missing and 2 partials ⚠️

❌ Your patch status has failed because the patch coverage (55.55%) is below the target coverage (70.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1338      +/-   ##
==========================================
- Coverage   76.36%   76.36%   -0.01%     
==========================================
  Files         149      149              
  Lines       20679    20687       +8     
==========================================
+ Hits        15791    15797       +6     
- Misses       3757     3758       +1     
- Partials     1131     1132       +1     
Flag Coverage Δ
unit-tests 76.36% <55.55%> (-0.01%) ⬇️
Files with missing lines Coverage Δ
pkg/nsx/services/common/store.go 75.63% <55.55%> (-1.09%) ⬇️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants