-
Notifications
You must be signed in to change notification settings - Fork 15
Goroutine Leak in RateLimitMiddleware #147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Goroutine Leak in RateLimitMiddleware #147
Conversation
|
@Ahmedhossamdev could you please review the PR and let me know if the changes are valid or if anything else is required? |
|
@Bhup-GitHUB Could you please request review from @aaronbrethorst |
|
@aaronbrethorst could you please review the PR and let me know if the changes are valid or if anything else is required? |
aaronbrethorst
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for tackling this goroutine leak issue! The core approach you've chosen—using a stopChan with sync.Once for idempotent shutdown—is exactly the right pattern for graceful goroutine termination in Go. The API change to return *RateLimitMiddleware instead of just the handler function is also a good design decision that enables proper lifecycle management. Nice work on getting the fundamentals right!
I found a few issues we'll need to address before merging.
Issues to Fix
1. Shutdown Order Needs to be Reversed (Critical)
In cmd/api/app.go, the shutdown order has the GTFS manager shutting down before the API:
// Shutdown GTFS manager
if gtfsManager != nil {
gtfsManager.Shutdown()
}
if api != nil {
api.Shutdown()
}What needs to change:
The API rate limiter should be shut down before the GTFS manager. The correct shutdown order follows a "last in, first out" pattern:
- HTTP Server (already correct - stops accepting new connections)
- API/Rate Limiter (stops background goroutines for request handling)
- GTFS Manager (stops data fetching - the lowest-level dependency)
// Shutdown API rate limiter first
if api != nil {
api.Shutdown()
}
// Then shutdown GTFS manager
if gtfsManager != nil {
gtfsManager.Shutdown()
}This ensures we stop processing requests before we stop the data sources those requests depend on.
2. Tests Don't Verify the Goroutine Actually Exits (Important)
In internal/restapi/rate_limit_shutdown_test.go, the current tests verify that Stop() returns quickly, but they don't actually prove the goroutine terminates. This is important because the whole point of this PR is to fix a goroutine leak!
What needs to change:
Add a test that uses runtime.NumGoroutine() to verify the cleanup goroutine actually exits:
func TestRateLimitMiddleware_GoroutineActuallyExits(t *testing.T) {
// Get baseline goroutine count
initial := runtime.NumGoroutine()
middleware := NewRateLimitMiddleware(10, time.Second)
time.Sleep(10 * time.Millisecond) // Give goroutine time to start
afterCreate := runtime.NumGoroutine()
assert.Greater(t, afterCreate, initial, "cleanup goroutine should have started")
middleware.Stop()
time.Sleep(10 * time.Millisecond) // Give goroutine time to exit
afterStop := runtime.NumGoroutine()
assert.LessOrEqual(t, afterStop, initial, "cleanup goroutine should have exited")
}This test directly proves the fix works—something we can point to and say "this demonstrates the leak is fixed."
3. Test Goroutine Leaks from Other Tests Using createTestApi (Important)
While you correctly added defer middleware.Stop() to the rate limiter tests, other tests that use createTestApi() don't call api.Shutdown(). This means those tests leak goroutines during the test run.
What needs to change:
For consistency, tests using createTestApi should clean up:
func TestSomeHandler(t *testing.T) {
api := createTestApi(t)
defer api.Shutdown() // Add this line
// ... rest of test
}This isn't a production issue, but it keeps our test suite clean and consistent with the new lifecycle management pattern you've introduced.
Suggestions (Non-blocking)
-
Documentation: Consider adding a brief comment to
Stop()explaining that it doesn't affect in-flight requests—it only stops the background cleanup goroutine. -
Cleaner test structure: In
TestRateLimitMiddleware_Shutdown, you have bothdefer middleware.Stop()and then callStop()again in the goroutine. While this works due tosync.Once, it's a bit confusing. Consider removing the defer and just testing the explicit call.
Once these issues are addressed, this PR is ready to merge! The fundamental fix is solid—we just need to tighten up the shutdown ordering and add a test that definitively proves the goroutine leak is fixed.
|
Hey @aaronbrethorst ! Just wanted to give you a quick update on the changes I made based on your feedback.
implemented both optional suggestions:
2.Cleaned up TestRateLimitMiddleware_Shutdown by removing the redundant defer. All shutdown-related tests are passing now. I just wanted to double-check that using runtime.GC() in the goroutine exit test is okay with you — it seemed like the most reliable way to keep the test stable alongside others. Totally open to a different approach if you’d prefer one. Thanks again for the detailed review,really appreciated it! |
Fix Rate Limiter Cleanup Goroutine Leak
Problem #144
The
RateLimitMiddlewarestarts a background cleanup goroutine inNewRateLimitMiddleware()that never stops during graceful shutdown, causing a goroutine leak. This leads to:Root Cause
NewRateLimitMiddleware()returned onlyfunc(http.Handler) http.Handler, discarding the middleware instanceRestAPIstruct only stored the handler function, not the middleware referenceStop()method existed but was never called during shutdownStop()were called, the cleanup goroutine would still block indefinitely becauseticker.Stop()doesn't close the channel, and thefor rangeloop would continue waitingSolution
This fix ensures the cleanup goroutine properly exits during graceful shutdown by:
NewRateLimitMiddleware()to return*RateLimitMiddlewareinstead of just the handler functionstopChan chan struct{}to signal the goroutine to exitsync.Onceto ensureStop()can be safely called multiple timescleanup()to use aselectstatement that listens for both ticker events and stop signalsapi.Shutdown()which stops the rate limiterChanges Made
Core Changes
internal/restapi/rate_limit_middleware.go:stopChan chan struct{}andstopOnce sync.OncefieldsNewRateLimitMiddleware()return type to*RateLimitMiddlewareHandler()method to return the middleware handler functioncleanup()to useselectwith stop channelStop()to close stop channel usingsync.Oncefor idempotencyinternal/restapi/rest_api.go:rateLimiterfield type fromfunc(http.Handler) http.Handlerto*RateLimitMiddlewareShutdown()method to stop the rate limiterinternal/restapi/routes.go:api.rateLimiter.Handler()instead of calling the function directlycmd/api/app.go:CreateServer()to return both*http.Serverand*restapi.RestAPIRun()to accept*restapi.RestAPIparameterapi.Shutdown()call during graceful shutdowncmd/api/main.go:Run()Test Changes
internal/restapi/rate_limit_shutdown_test.go(NEW):internal/restapi/rate_limit_middleware_test.go:.Handler()methoddefer middleware.Stop()to prevent test-time goroutine leakscmd/api/app_test.go:CreateServer()calls to handle new return valuedefer api.Shutdown()to prevent test-time goroutine leaksTesting
All existing tests pass, and new tests verify:
Stop()calls are safe (idempotent)Test Commands
Run these commands to verify the fix:
Impact
Verification
The fix has been verified to:
gtfsManager.Shutdown()