Skip to content

Conversation

@Bhup-GitHUB
Copy link
Contributor

Fix Rate Limiter Cleanup Goroutine Leak

Problem #144

The RateLimitMiddleware starts a background cleanup goroutine in NewRateLimitMiddleware() that never stops during graceful shutdown, causing a goroutine leak. This leads to:

  • Goroutine leaks on every server shutdown
  • Cleanup ticker continues running after shutdown
  • Incomplete graceful shutdown
  • Unnecessary resource usage

Root Cause

  1. NewRateLimitMiddleware() returned only func(http.Handler) http.Handler, discarding the middleware instance
  2. RestAPI struct only stored the handler function, not the middleware reference
  3. Stop() method existed but was never called during shutdown
  4. Even if Stop() were called, the cleanup goroutine would still block indefinitely because ticker.Stop() doesn't close the channel, and the for range loop would continue waiting

Solution

This fix ensures the cleanup goroutine properly exits during graceful shutdown by:

  1. Retaining middleware instance: Changed NewRateLimitMiddleware() to return *RateLimitMiddleware instead of just the handler function
  2. Adding stop channel: Added stopChan chan struct{} to signal the goroutine to exit
  3. Making Stop() idempotent: Used sync.Once to ensure Stop() can be safely called multiple times
  4. Updating cleanup loop: Modified cleanup() to use a select statement that listens for both ticker events and stop signals
  5. Integrating with shutdown: Updated server shutdown flow to call api.Shutdown() which stops the rate limiter

Changes Made

Core Changes

  • internal/restapi/rate_limit_middleware.go:

    • Added stopChan chan struct{} and stopOnce sync.Once fields
    • Changed NewRateLimitMiddleware() return type to *RateLimitMiddleware
    • Added Handler() method to return the middleware handler function
    • Updated cleanup() to use select with stop channel
    • Updated Stop() to close stop channel using sync.Once for idempotency
  • internal/restapi/rest_api.go:

    • Changed rateLimiter field type from func(http.Handler) http.Handler to *RateLimitMiddleware
    • Added Shutdown() method to stop the rate limiter
  • internal/restapi/routes.go:

    • Updated to call api.rateLimiter.Handler() instead of calling the function directly
  • cmd/api/app.go:

    • Updated CreateServer() to return both *http.Server and *restapi.RestAPI
    • Updated Run() to accept *restapi.RestAPI parameter
    • Added api.Shutdown() call during graceful shutdown
  • cmd/api/main.go:

    • Updated to capture and pass API instance to Run()

Test Changes

  • internal/restapi/rate_limit_shutdown_test.go (NEW):

    • Added tests for proper shutdown behavior
    • Added tests for idempotent shutdown calls
  • internal/restapi/rate_limit_middleware_test.go:

    • Updated all tests to use .Handler() method
    • Added defer middleware.Stop() to prevent test-time goroutine leaks
  • cmd/api/app_test.go:

    • Updated all CreateServer() calls to handle new return value
    • Added defer api.Shutdown() to prevent test-time goroutine leaks

Testing

All existing tests pass, and new tests verify:

  • ✅ Shutdown completes within reasonable time
  • ✅ Multiple Stop() calls are safe (idempotent)
  • ✅ No goroutine leaks during shutdown
  • ✅ All existing functionality remains intact

Test Commands

Run these commands to verify the fix:

# Run all rate limiter tests
go test ./internal/restapi -run TestRateLimitMiddleware -v

# Run shutdown-specific tests
go test ./internal/restapi -run "TestRateLimitMiddleware_Shutdown|TestRestAPI_Shutdown" -v

# Run app tests
go test ./cmd/api -v

# Run full test suite
go test ./...

Impact

  • Low Risk: Changes are isolated to rate limiter and server lifecycle
  • Backward Compatible: External API unchanged
  • Pattern Consistency: Follows existing GTFS manager shutdown pattern
  • No Breaking Changes: All existing functionality preserved

Verification

The fix has been verified to:

  • ✅ Stop cleanup goroutine during graceful shutdown
  • ✅ Prevent goroutine leaks
  • ✅ Maintain all existing rate limiting functionality
  • ✅ Pass all existing tests
  • ✅ Follow the same shutdown pattern as gtfsManager.Shutdown()

@Bhup-GitHUB
Copy link
Contributor Author

@Ahmedhossamdev could you please review the PR and let me know if the changes are valid or if anything else is required?

@Ahmedhossamdev
Copy link
Member

@Bhup-GitHUB Could you please request review from @aaronbrethorst

@Bhup-GitHUB
Copy link
Contributor Author

@aaronbrethorst could you please review the PR and let me know if the changes are valid or if anything else is required?

Copy link
Member

@aaronbrethorst aaronbrethorst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this goroutine leak issue! The core approach you've chosen—using a stopChan with sync.Once for idempotent shutdown—is exactly the right pattern for graceful goroutine termination in Go. The API change to return *RateLimitMiddleware instead of just the handler function is also a good design decision that enables proper lifecycle management. Nice work on getting the fundamentals right!

I found a few issues we'll need to address before merging.


Issues to Fix

1. Shutdown Order Needs to be Reversed (Critical)

In cmd/api/app.go, the shutdown order has the GTFS manager shutting down before the API:

// Shutdown GTFS manager
if gtfsManager != nil {
    gtfsManager.Shutdown()
}

if api != nil {
    api.Shutdown()
}

What needs to change:

The API rate limiter should be shut down before the GTFS manager. The correct shutdown order follows a "last in, first out" pattern:

  1. HTTP Server (already correct - stops accepting new connections)
  2. API/Rate Limiter (stops background goroutines for request handling)
  3. GTFS Manager (stops data fetching - the lowest-level dependency)
// Shutdown API rate limiter first
if api != nil {
    api.Shutdown()
}

// Then shutdown GTFS manager
if gtfsManager != nil {
    gtfsManager.Shutdown()
}

This ensures we stop processing requests before we stop the data sources those requests depend on.


2. Tests Don't Verify the Goroutine Actually Exits (Important)

In internal/restapi/rate_limit_shutdown_test.go, the current tests verify that Stop() returns quickly, but they don't actually prove the goroutine terminates. This is important because the whole point of this PR is to fix a goroutine leak!

What needs to change:

Add a test that uses runtime.NumGoroutine() to verify the cleanup goroutine actually exits:

func TestRateLimitMiddleware_GoroutineActuallyExits(t *testing.T) {
    // Get baseline goroutine count
    initial := runtime.NumGoroutine()
    
    middleware := NewRateLimitMiddleware(10, time.Second)
    time.Sleep(10 * time.Millisecond) // Give goroutine time to start
    
    afterCreate := runtime.NumGoroutine()
    assert.Greater(t, afterCreate, initial, "cleanup goroutine should have started")
    
    middleware.Stop()
    time.Sleep(10 * time.Millisecond) // Give goroutine time to exit
    
    afterStop := runtime.NumGoroutine()
    assert.LessOrEqual(t, afterStop, initial, "cleanup goroutine should have exited")
}

This test directly proves the fix works—something we can point to and say "this demonstrates the leak is fixed."


3. Test Goroutine Leaks from Other Tests Using createTestApi (Important)

While you correctly added defer middleware.Stop() to the rate limiter tests, other tests that use createTestApi() don't call api.Shutdown(). This means those tests leak goroutines during the test run.

What needs to change:

For consistency, tests using createTestApi should clean up:

func TestSomeHandler(t *testing.T) {
    api := createTestApi(t)
    defer api.Shutdown()  // Add this line
    
    // ... rest of test
}

This isn't a production issue, but it keeps our test suite clean and consistent with the new lifecycle management pattern you've introduced.


Suggestions (Non-blocking)

  1. Documentation: Consider adding a brief comment to Stop() explaining that it doesn't affect in-flight requests—it only stops the background cleanup goroutine.

  2. Cleaner test structure: In TestRateLimitMiddleware_Shutdown, you have both defer middleware.Stop() and then call Stop() again in the goroutine. While this works due to sync.Once, it's a bit confusing. Consider removing the defer and just testing the explicit call.


Once these issues are addressed, this PR is ready to merge! The fundamental fix is solid—we just need to tighten up the shutdown ordering and add a test that definitively proves the goroutine leak is fixed.

@Bhup-GitHUB
Copy link
Contributor Author

Bhup-GitHUB commented Jan 7, 2026

Hey @aaronbrethorst ! Just wanted to give you a quick update on the changes I made based on your feedback.

  1. Shutdown order: Fixed the shutdown sequence so the API shuts down before the GTFS manager in cmd/api/app.go, following the last in, first out pattern you mentioned.

  2. Goroutine exit test: Added TestRateLimitMiddleware_GoroutineActuallyExits, which uses runtime.NumGoroutine() to verify that the cleanup goroutine actually terminates. I added a runtime.GC() call and a short sleep at the start to clear out any lingering goroutines from other tests, which made the test more stable when run as part of the full suite.

  3. Test cleanup: Added defer api.Shutdown() to all tests that use createTestApi() to avoid goroutine leaks during test runs.

implemented both optional suggestions:

  1. Added documentation to Stop() clarifying that it doesn’t affect in-flight requests.

2.Cleaned up TestRateLimitMiddleware_Shutdown by removing the redundant defer.

All shutdown-related tests are passing now. I just wanted to double-check that using runtime.GC() in the goroutine exit test is okay with you — it seemed like the most reliable way to keep the test stable alongside others. Totally open to a different approach if you’d prefer one.

Thanks again for the detailed review,really appreciated it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants