A lightweight, hierarchical goroutine supervision system for Go that centralizes goroutine creation, lifecycle control, and shutdown handling. This library prevents goroutine leaks, improves observability, and enforces safe concurrency patterns in production applications.
- Introduction
- Pain Points with Raw Goroutines
- Why GoRoutinesManager?
- Architecture
- Quick Start
- Features
- Metrics & Observability
- Best Practices
- API Reference
- Contributing
- License
Go's goroutines are powerful for concurrent programming, but managing them at scale in production applications presents significant challenges. Without proper supervision, goroutines can leak, become orphaned, or fail silently, leading to resource exhaustion and unpredictable behavior.
GoRoutinesManager provides a structured, hierarchical approach to goroutine management that brings enterprise-grade supervision, observability, and lifecycle control to your Go applications.
When using raw goroutines without a management system, developers face several critical challenges:
Problem: Goroutines can leak when parent contexts are cancelled or when goroutines become blocked. Without tracking, there's no guarantee of cleanup, leading to memory leaks that accumulate over time and are difficult to detect in production.
Issues:
- No tracking of spawned goroutines
- No guarantee of cleanup on context cancellation
- Memory leaks accumulate over time
- Difficult to detect in production
Problem: There's no visibility into goroutine lifecycle. You cannot track how many goroutines are active, monitor their health, or understand their execution patterns.
Issues:
- Cannot track how many goroutines are active
- No metrics on goroutine lifecycle (creation, completion, duration)
- No visibility into goroutine health
- Difficult to debug production issues
Problem: There's no coordinated shutdown mechanism. Manual wait group management is error-prone, and there's no timeout handling for stuck goroutines, risking data loss or corruption.
Issues:
- No coordinated shutdown mechanism
- Risk of data loss or corruption
- No timeout handling for stuck goroutines
- Manual wait group management is error-prone
Problem: All goroutines exist at the same level with no logical grouping. This makes it hard to manage large applications effectively or to shut down specific components independently.
Issues:
- Cannot group related goroutines (e.g., all API handlers)
- Cannot shut down specific groups independently
- Difficult to manage large applications
- No logical separation between components
Problem: Panics in goroutines can crash the entire application. There's no recovery mechanism by default, making it difficult to handle errors gracefully.
Issues:
- Panics in goroutines can crash the entire application
- No recovery mechanism by default
- Difficult to handle errors gracefully
Problem: Manual context creation and cancellation is error-prone. It's easy to forget cleanup, and there's no automatic propagation of cancellation signals.
Issues:
- Manual context creation and cancellation
- Easy to forget cleanup
- No automatic propagation of cancellation
- Context leaks when not properly managed
GoRoutinesManager addresses all these pain points with a comprehensive, production-ready solution:
- All goroutines are tracked and automatically cleaned up
- Context cancellation is handled automatically
- No orphaned goroutines
- Real-time metrics on all goroutines (count, age, duration)
- Prometheus integration for monitoring
- Detailed operation tracking (create, cancel, shutdown)
- Graceful shutdown with configurable timeouts
- Automatic force-cancellation of stuck goroutines
- Hierarchical shutdown (app → local → routine)
- Three manager levels (Global → App → Local), with individual routines as the leaves
- Logical grouping of related goroutines
- Independent lifecycle management per level
- Automatic panic recovery (configurable)
- Timeout support per goroutine
- Function-level wait groups for coordinated shutdown
- Thread-safe operations with proper locking
- High-performance with atomic counters
- Comprehensive error handling
- Extensive test coverage
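GoRoutinesManager uses a three-level hierarchical architecture that mirrors real-world application structure: a GlobalManager, AppManagers, and LocalManagers, with each LocalManager tracking its own routines.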
The singleton manager that orchestrates the entire application.
Responsibilities:
- Manages all AppManager instances
- Provides global context with signal handling (SIGINT, SIGTERM)
- Stores application-wide metadata and configuration
- Coordinates global shutdown
Key Features:
- Singleton pattern (one instance per process)
- Thread-safe with RWMutex
- Automatic signal handling for graceful shutdown
- Metadata management (metrics, timeouts, limits)
Manages a logical application or service within the system.
Responsibilities:
- Manages all LocalManager instances for a specific app
- Provides app-level context (derived from global)
- Coordinates app-level shutdown
- Groups related local managers
Use Cases:
- Separate API server from worker pool
- Different microservices in a monolith
- Different application modules
Manages goroutines for a specific module or file within an app.
Responsibilities:
- Spawns and tracks individual goroutines (Routine)
- Manages function-level wait groups
- Provides local context (derived from app context)
- Handles routine lifecycle (create, cancel, shutdown)
Use Cases:
- All HTTP handlers in handlers.go
- All database workers in db.go
- All background jobs in jobs.go
Represents a single tracked goroutine.
Properties:
- Unique ID (fast UUID generation ~40ns)
- Function name (for grouping and metrics)
- Context (for cancellation)
- Done channel (for completion signaling)
- Start timestamp (for age tracking)
Lifecycle:
- Created via LocalManager.Go()
- Added to tracking map
- Executes worker function
- Automatically removed on completion
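The properties and lifecycle above can be illustrated with a small standard-library sketch. It is not the library's implementation: the `routine` and `registry` types and the `spawn` and `count` helpers are hypothetical names, and a sequential counter stands in for the fast UUID generator.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// routine mirrors the listed properties. Field names are illustrative only.
type routine struct {
	id        uint64 // sequential ID standing in for a UUID (assumption)
	fnName    string
	ctx       context.Context
	cancel    context.CancelFunc
	done      chan struct{}
	startedAt time.Time
}

// registry is a hypothetical stand-in for a local manager's tracking map.
type registry struct {
	nextID   uint64 // kept first so 64-bit atomics stay aligned on 32-bit platforms
	mu       sync.RWMutex
	routines map[uint64]*routine
}

func newRegistry() *registry {
	return &registry{routines: make(map[uint64]*routine)}
}

// spawn registers a routine, runs worker in a new goroutine, and removes the
// entry from the map before signalling completion on the done channel.
func (r *registry) spawn(parent context.Context, fnName string, worker func(context.Context)) *routine {
	ctx, cancel := context.WithCancel(parent)
	rt := &routine{
		id:        atomic.AddUint64(&r.nextID, 1),
		fnName:    fnName,
		ctx:       ctx,
		cancel:    cancel,
		done:      make(chan struct{}),
		startedAt: time.Now(),
	}
	r.mu.Lock()
	r.routines[rt.id] = rt
	r.mu.Unlock()

	go func() {
		defer func() {
			cancel()
			r.mu.Lock()
			delete(r.routines, rt.id)
			r.mu.Unlock()
			close(rt.done)
		}()
		worker(ctx)
	}()
	return rt
}

// count returns the number of routines currently tracked.
func (r *registry) count() int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return len(r.routines)
}

// trackedLifecycle reports the tracked count while a worker runs and after it ends.
func trackedLifecycle() (during, after int) {
	reg := newRegistry()
	release := make(chan struct{})
	rt := reg.spawn(context.Background(), "worker", func(ctx context.Context) {
		<-release
	})
	during = reg.count()
	close(release)
	<-rt.done
	after = reg.count()
	return during, after
}

func main() {
	during, after := trackedLifecycle()
	fmt.Println("tracked while running:", during, "after completion:", after)
}
```

Because the entry is deleted before the done channel is closed, anything waiting on that channel observes an already-updated count.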
The Context system in GoRoutinesManager provides a hierarchical, process-wide context management solution with automatic signal handling and graceful shutdown capabilities.
The Context architecture consists of two main components:

- GlobalContext - Process-wide context with signal handling
- AppContext - Application-level contexts derived from global context
Purpose: Provides a single, process-wide context that serves as the root for all application contexts.
Key Features:
- Singleton Pattern: One global context per process, shared across all components
- Signal Handling: Automatically listens for SIGINT and SIGTERM signals
- Thread-Safe: Protected by RWMutex for concurrent access
- Idempotent Operations: Safe to call Init() or Get() multiple times
- Automatic Initialization: Get() automatically initializes if context doesn't exist
Architecture:
GlobalContext (package-level state)
├── Signal Handler (SIGINT, SIGTERM)
├── Global Cancel Function
└── App Context Registry
    ├── App Context 1
    ├── App Context 2
    └── App Context N
Signal Handling Flow:
- On first Init(), a signal handler is registered (using sync.Once to ensure single registration)
- Signal handler listens for SIGINT (Ctrl+C) and SIGTERM (termination signal)
- When signal is received, Shutdown() is called automatically
- All app-level contexts are cancelled first
- Global context is cancelled, propagating to all child contexts
Thread Safety:
- All operations protected by ctxMu (RWMutex)
- Read operations use RLock() for concurrent access
- Write operations use Lock() for exclusive access
- Signal handler setup uses sync.Once to prevent race conditions
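A minimal standard-library sketch of this pattern follows: a package-level root context guarded by a mutex, with the signal handler registered exactly once. The names `initGlobal`, `shutdownGlobal`, `globalCtx`, and `signalOnce` are assumptions made for illustration, and a plain `sync.Mutex` stands in for the `ctxMu` RWMutex. The library's own Init(), Get(), and Shutdown() may differ in detail.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"sync"
	"syscall"
)

var (
	ctxMu        sync.Mutex // a plain mutex stands in for the RWMutex here
	globalCtx    context.Context
	globalCancel context.CancelFunc
	signalOnce   sync.Once
)

// initGlobal creates the root context if it does not exist yet and registers
// the SIGINT/SIGTERM handler exactly once, so repeated calls are harmless.
func initGlobal() context.Context {
	ctxMu.Lock()
	if globalCtx == nil {
		globalCtx, globalCancel = context.WithCancel(context.Background())
	}
	ctx := globalCtx
	ctxMu.Unlock()

	signalOnce.Do(func() {
		sigCh := make(chan os.Signal, 1)
		signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
		go func() {
			<-sigCh
			shutdownGlobal()
		}()
	})
	return ctx
}

// shutdownGlobal cancels the root context and resets the package state.
func shutdownGlobal() {
	ctxMu.Lock()
	defer ctxMu.Unlock()
	if globalCancel != nil {
		globalCancel()
		globalCtx, globalCancel = nil, nil
	}
}

// initIsIdempotent reports whether two init calls return the same context
// and whether shutdown cancels it.
func initIsIdempotent() bool {
	a := initGlobal()
	b := initGlobal()
	shutdownGlobal()
	return a == b && a.Err() != nil
}

func main() {
	fmt.Println("idempotent init and cancelling shutdown:", initIsIdempotent())
}
```

Registering the handler inside `sync.Once` keeps a second `initGlobal` call from installing a duplicate listener, which is the race the list above refers to.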
Purpose: Provides application-level contexts that are children of the global context, allowing independent lifecycle management per application.
Key Features:
- Hierarchical Derivation: Each app context is derived from the global context
- Automatic Propagation: Cancelling global context automatically cancels all app contexts
- Independent Lifecycle: Each app can be shut down independently
- Context Registry: All app contexts are tracked in a map for management
Architecture:
GlobalContext
│
├── AppContext("api-server")
│   │
│   ├── LocalContext("handlers")
│   │   └── RoutineContext("httpHandler-1")
│   │
│   └── LocalContext("workers")
│       └── RoutineContext("worker-1")
│
└── AppContext("worker-pool")
    │
    └── LocalContext("jobs")
        └── RoutineContext("job-1")
Context Creation Flow:
- GetAppContext(appName) is called
- System checks if global context exists, initializes if needed
- System checks if app context already exists for the app name
- If exists and valid, returns existing context
- If not, creates new app context as child of global context
- App context is registered in appContexts map
- Cancel function is stored in appCancels map
Shutdown Flow:
- Shutdown() is called on app context
- App's cancel function is invoked
- App context is removed from registry
- All child contexts (local, routine) are automatically cancelled via context propagation
The context hierarchy ensures proper cancellation propagation:
Cancellation Rules:
- Global Context Cancellation: When global context is cancelled (via signal or manual shutdown), all app contexts are automatically cancelled, which cascades to all local and routine contexts
- App Context Cancellation: When an app context is cancelled, only that app's local and routine contexts are cancelled; other apps remain unaffected
- Local Context Cancellation: When a local context is cancelled, only that local manager's routine contexts are cancelled
- Routine Context Cancellation: When a routine context is cancelled, only that specific goroutine is affected
Benefits:
- Cascade Shutdown: Cancelling a parent automatically cancels all children
- Selective Shutdown: Can shut down specific apps or modules without affecting others
- Automatic Cleanup: No manual context management required
- Signal Integration: System signals automatically trigger graceful shutdown
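The cancellation rules can be checked with the standard `context` package alone. The sketch below builds a global → app → local chain for two apps (the variable names are illustrative, not library identifiers) and records which contexts are cancelled after one app is cancelled and then after the global context is cancelled.

```go
package main

import (
	"context"
	"fmt"
)

// cancellationDemo cancels one app context, then the global context, and
// reports the state of the local contexts at each step.
func cancellationDemo() (handlersAfterApp, jobsAfterApp, jobsAfterGlobal bool) {
	global, cancelGlobal := context.WithCancel(context.Background())
	defer cancelGlobal()

	// Two app-level contexts derived from the global context.
	apiApp, cancelAPI := context.WithCancel(global)
	defer cancelAPI()
	poolApp, cancelPool := context.WithCancel(global)
	defer cancelPool()

	// One local context under each app.
	handlers, cancelHandlers := context.WithCancel(apiApp)
	defer cancelHandlers()
	jobs, cancelJobs := context.WithCancel(poolApp)
	defer cancelJobs()

	// App-level cancellation reaches only that app's children.
	cancelAPI()
	handlersAfterApp = handlers.Err() != nil
	jobsAfterApp = jobs.Err() != nil

	// Global cancellation cascades to every remaining descendant.
	cancelGlobal()
	jobsAfterGlobal = jobs.Err() != nil
	return
}

func main() {
	a, b, c := cancellationDemo()
	fmt.Println("handlers cancelled after app cancel:", a)
	fmt.Println("jobs cancelled after app cancel:", b)
	fmt.Println("jobs cancelled after global cancel:", c)
}
```

The three values come out as true, false, true: cancelling the API app stops its handlers but leaves the worker pool's jobs context alive until the global context is cancelled.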
The Context package provides utilities for creating child contexts:
Functions:
- SpawnChild(ctx) - Creates a child context with cancel from any parent context
- SpawnChildWithTimeout(ctx, timeout) - Creates a child context with timeout
- NewChildContext() - Creates child from global context
- NewChildContextWithTimeout(timeout) - Creates child with timeout from global context
Usage Pattern:
Child contexts are automatically created by the manager system when spawning goroutines. The LocalManager creates routine contexts as children of the local context, which is a child of the app context, which is a child of the global context.
All context operations are thread-safe:
- Global State: Protected by ctxMu (RWMutex)
- Concurrent Reads: Multiple goroutines can read contexts simultaneously
- Exclusive Writes: Write operations (create, cancel, shutdown) are exclusive
- Signal Handler: Protected by sync.Once to ensure single registration
- Map Operations: All map access (appContexts, appCancels) is protected by mutex
- Uninitialized: Global context doesn't exist
- Initialized: Global context created, signal handler registered
- Active: Contexts are active and can be used
- Cancelled: Context is cancelled, ctx.Done() channel is closed
- Shutdown: All contexts cleaned up, state reset
The Context system integrates seamlessly with the manager hierarchy:
- GlobalManager uses GlobalContext for process-wide coordination
- AppManager uses AppContext for app-level coordination
- LocalManager creates routine contexts as children of app context
- Routine uses routine context for cancellation signaling
This integration ensures that:
- Manager shutdown automatically triggers context cancellation
- Context cancellation automatically signals goroutines to stop
- Signal handling works across the entire system
- No manual context management is required
- User calls LocalManager.Go() with function name and worker function
- LocalManager creates a new Routine instance with unique ID
- Routine is added to tracking map (atomic counter incremented)
- Context is derived from LocalManager's parent context (app context)
- Goroutine is spawned with context and done channel
- Worker function executes with the routine's context
- On completion: routine is removed from map, counter decremented, metrics updated
- If panic occurs (and recovery enabled): panic is caught, logged, and goroutine completes normally
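The panic-recovery step at the end of this flow can be sketched with a deferred `recover` inside the spawned goroutine. The `goSafe` helper below is hypothetical and only illustrates the idea; it is not the library's Go() method.

```go
package main

import (
	"context"
	"fmt"
	"log"
)

// goSafe runs worker in a new goroutine. A panic is recovered, logged, and
// reported as an error on the returned channel, which is closed on completion.
func goSafe(ctx context.Context, name string, worker func(context.Context) error) <-chan error {
	result := make(chan error, 1)
	go func() {
		defer close(result)
		defer func() {
			if r := recover(); r != nil {
				log.Printf("routine %q panicked: %v", name, r)
				result <- fmt.Errorf("panic in %s: %v", name, r)
			}
		}()
		if err := worker(ctx); err != nil {
			result <- err
		}
	}()
	return result
}

// panicIsContained reports whether a panicking worker surfaces as an error
// instead of crashing the process.
func panicIsContained() bool {
	err := <-goSafe(context.Background(), "crasher", func(ctx context.Context) error {
		panic("boom")
	})
	return err != nil
}

func main() {
	fmt.Println("panic contained:", panicIsContained())
	fmt.Println("process still running")
}
```

Without the deferred `recover`, the same panic would terminate the whole program, since a panic that unwinds past the top of any goroutine ends the process.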
Safe Shutdown (graceful → timeout → force):
- User calls GlobalManager.Shutdown(safe=true)
- GlobalManager iterates all AppManagers
- For each AppManager:
  a. Iterates all LocalManagers
  b. For each LocalManager:
     - Attempts graceful shutdown: cancels contexts and waits for WaitGroup
     - If timeout occurs: force cancels remaining routines
     - Removes all routines from tracking map
  c. Cancels app context
- Cancels global context
- All contexts propagate cancellation automatically via context hierarchy
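The graceful → timeout → force sequence rests on waiting for a `sync.WaitGroup` with a deadline. The following sketch uses only the standard library; `waitTimeout` and `safeShutdown` are hypothetical helpers, not library functions. Go offers no way to kill a goroutine from outside, so in this sketch "force" can only mean that the caller stops waiting.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// waitTimeout waits for wg for at most timeout and reports whether every
// worker finished in time.
func waitTimeout(wg *sync.WaitGroup, timeout time.Duration) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case <-done:
		return true
	case <-timer.C:
		return false
	}
}

// safeShutdown cancels the context first, then waits for the workers. A false
// result means the caller must give up on the stragglers.
func safeShutdown(cancel context.CancelFunc, wg *sync.WaitGroup, timeout time.Duration) bool {
	cancel()
	return waitTimeout(wg, timeout)
}

// shutdownDemo runs one cooperative worker and one stuck worker.
func shutdownDemo() (cooperative, stuck bool) {
	// This worker honours cancellation and exits promptly.
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		<-ctx.Done()
	}()
	cooperative = safeShutdown(cancel, &wg, time.Second)

	// This worker ignores its context, so the wait times out.
	_, cancel2 := context.WithCancel(context.Background())
	var wg2 sync.WaitGroup
	block := make(chan struct{})
	wg2.Add(1)
	go func() {
		defer wg2.Done()
		<-block
	}()
	stuck = safeShutdown(cancel2, &wg2, 50*time.Millisecond)
	close(block) // release the worker so the demo itself does not leak
	return
}

func main() {
	cooperative, stuck := shutdownDemo()
	fmt.Println("cooperative worker stopped in time:", cooperative)
	fmt.Println("stuck worker stopped in time:", stuck)
}
```

The cooperative worker reports true and the stuck one false, which is why worker functions need to watch their context for a safe shutdown to finish within the timeout.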
Unsafe Shutdown (immediate):
- User calls GlobalManager.Shutdown(safe=false)
- All contexts are cancelled immediately
- Routines are removed from tracking without waiting
- No graceful shutdown attempt
- User calls LocalManager.ShutdownFunction(functionName, timeout)
- System finds all routines with matching function name
- Cancels all routine contexts
- Waits for function wait group with timeout
- If timeout: removes remaining routines and cleans up wait group
- If success: all routines completed, wait group cleaned up
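Function-level shutdown depends on grouping routines by function name. A minimal sketch of that grouping, with one `sync.WaitGroup` per name, might look like the following. The `functionGroups` type and its methods are assumptions for illustration, and the cancel and timeout steps from the flow above are left out to keep the focus on the grouping.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// functionGroups keeps one WaitGroup per function name (hypothetical type).
type functionGroups struct {
	mu     sync.Mutex
	groups map[string]*sync.WaitGroup
}

// group returns the WaitGroup for name, creating it on first use.
func (g *functionGroups) group(name string) *sync.WaitGroup {
	g.mu.Lock()
	defer g.mu.Unlock()
	wg, ok := g.groups[name]
	if !ok {
		wg = &sync.WaitGroup{}
		g.groups[name] = wg
	}
	return wg
}

// spawn runs worker in a goroutine that is counted under the given name.
func (g *functionGroups) spawn(name string, worker func()) {
	wg := g.group(name)
	wg.Add(1)
	go func() {
		defer wg.Done()
		worker()
	}()
}

// wait blocks until every goroutine spawned under name has returned.
func (g *functionGroups) wait(name string) {
	g.group(name).Wait()
}

// waitForFunctionDemo waits for the "fetch" group while a "flush" routine is
// still running, and returns how many fetch workers completed.
func waitForFunctionDemo() int64 {
	g := &functionGroups{groups: make(map[string]*sync.WaitGroup)}
	var fetched int64
	for i := 0; i < 3; i++ {
		g.spawn("fetch", func() { atomic.AddInt64(&fetched, 1) })
	}
	hold := make(chan struct{})
	g.spawn("flush", func() { <-hold })

	g.wait("fetch") // returns even though "flush" has not finished
	n := atomic.LoadInt64(&fetched)

	close(hold)
	g.wait("flush")
	return n
}

func main() {
	fmt.Println("fetch workers completed:", waitForFunctionDemo())
}
```

Waiting on one name does not block on routines registered under another, which is what makes selective, per-function shutdown possible.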
All operations are thread-safe:
- GlobalManager: Protected by sync.RWMutex (read-write lock)
- AppManager: Protected by sync.RWMutex per instance
- LocalManager: Protected by sync.RWMutex per instance
- Routine Count: Atomic operations (sync/atomic) for lock-free reads
- Metadata: Protected by sync.RWMutex
- Context System: Protected by sync.RWMutex for all operations
Lock Strategy:
- Read Operations: Use RLock() allowing concurrent reads
- Write Operations: Use Lock() for exclusive access
- Atomic Operations: Use sync/atomic for counters (lock-free reads)
- Signal Handler: Use sync.Once to prevent race conditions
- Atomic Counters: Routine counts use atomic operations for O(1) lock-free reads
- Fast UUID Generation: Custom UUID generation (~40ns) vs crypto UUID (~microseconds)
- Efficient Map Operations: Direct map access with proper locking
- Minimal Allocations: Reuse of contexts and channels where possible
- Lock-Free Reads: Routine count reads don't require locks
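As an illustration of the atomic-counter point, the sketch below keeps a routine count with `sync/atomic` so that reads never take a lock. The `routineCounter` type is a hypothetical name, and the library's internal counter may be organised differently.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// routineCounter tracks a live count without a mutex (hypothetical type).
type routineCounter struct {
	n int64
}

func (c *routineCounter) inc()        { atomic.AddInt64(&c.n, 1) }
func (c *routineCounter) dec()        { atomic.AddInt64(&c.n, -1) }
func (c *routineCounter) load() int64 { return atomic.LoadInt64(&c.n) }

// countAfterChurn starts the given number of workers, reads the counter while
// all of them are running, and reads it again after all of them have exited.
func countAfterChurn(workers int) (running, final int64) {
	var c routineCounter
	var started, finished sync.WaitGroup
	release := make(chan struct{})

	for i := 0; i < workers; i++ {
		started.Add(1)
		finished.Add(1)
		go func() {
			defer finished.Done()
			c.inc()
			started.Done()
			<-release
			c.dec()
		}()
	}

	started.Wait() // every worker has incremented, none has decremented
	running = c.load()
	close(release)
	finished.Wait()
	final = c.load()
	return
}

func main() {
	running, final := countAfterChurn(100)
	fmt.Println("running:", running, "final:", final)
}
```

Each increment, decrement, and read is a single atomic operation, so a metrics collector can sample the count without contending with goroutine creation.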
go get github.com/JupiterMetaLabs/goroutine-orchestrator

- Initialize Global Manager - Creates the singleton global manager and sets up signal handling
- Create App Manager - Creates an app-level manager for your application or service
- Create Local Manager - Creates a local manager for a specific module or file
- Spawn Goroutines - Use LocalManager.Go() to spawn tracked goroutines
- Shutdown - System automatically handles shutdown on SIGINT/SIGTERM, or call Shutdown() manually
The global context automatically listens for SIGINT (Ctrl+C) and SIGTERM signals and triggers graceful shutdown of all managers and goroutines.
- ✅ Hierarchical Management: Three manager levels (Global → App → Local) with routines as the leaves
- ✅ Automatic Tracking: All goroutines tracked automatically with unique IDs
- ✅ Context Management: Automatic context creation and hierarchical cancellation
- ✅ Safe Shutdown: Graceful shutdown with timeout and force-cancel fallback
- ✅ Panic Recovery: Built-in panic recovery (configurable, enabled by default)
- ✅ Timeout Support: Per-goroutine timeout configuration
- ✅ Function Wait Groups: Coordinate shutdown of specific functions
- ✅ Thread Safety: All operations are thread-safe with proper locking
- ✅ High Performance: Optimized for low overhead with atomic operations
- ✅ Prometheus Metrics: Comprehensive metrics integration with 18+ metric types
- ✅ Real-time Tracking: Live goroutine counts, ages, durations
- ✅ Operation Metrics: Track all operations (create, cancel, shutdown)
- ✅ Error Tracking: Detailed error metrics with categorization
- ✅ Grafana Dashboards: Pre-built dashboards for visualization
- ✅ Metadata Management: Configure timeouts, limits, metrics via metadata API
- ✅ Selective Shutdown: Shutdown specific functions, apps, or modules
- ✅ Routine Inspection: Query routine status, context, uptime, completion state
- ✅ Signal Handling: Automatic SIGINT/SIGTERM handling via global context
- ✅ Builder Pattern: Fluent API for configuration and setup
GoRoutinesManager provides comprehensive Prometheus metrics for complete observability:
- goroutine_manager_global_initialized - Whether global manager is initialized
- goroutine_manager_global_app_managers_total - Total app managers
- goroutine_manager_global_local_managers_total - Total local managers
- goroutine_manager_global_goroutines_total - Total tracked goroutines
- goroutine_manager_global_shutdown_timeout_seconds - Configured shutdown timeout

- goroutine_manager_app_initialized - Whether app is initialized
- goroutine_manager_app_local_managers - Local managers per app
- goroutine_manager_app_goroutines - Goroutines per app

- goroutine_manager_local_goroutines - Goroutines per local manager
- goroutine_manager_local_function_waitgroups - Function wait groups per local manager

- goroutine_manager_goroutine_by_function - Goroutines grouped by function
- goroutine_manager_goroutine_duration_seconds - Goroutine execution duration (histogram)
- goroutine_manager_goroutine_age_seconds - Age of currently running goroutines

- goroutine_manager_operations_goroutine_operations_total - Goroutine operations counter
- goroutine_manager_operations_manager_operations_total - Manager operations counter
- goroutine_manager_operations_function_operations_total - Function operations counter
- goroutine_manager_operations_errors_total - Error counter with error types
- goroutine_manager_operations_goroutine_operation_duration_seconds - Operation duration (histogram)
- goroutine_manager_operations_manager_operation_duration_seconds - Manager operation duration (histogram)
- goroutine_manager_operations_shutdown_duration_seconds - Shutdown duration (histogram)
- goroutine_manager_operations_shutdown_goroutines_remaining - Goroutines remaining after shutdown timeout
The metrics system supports two integration patterns:
- Handler Integration (Recommended): Register the metrics handler with your existing HTTP server
- Standalone Server: Start a dedicated metrics server (useful for testing/demos)
The metrics collector runs periodically (configurable interval, default 5 seconds) and updates all metrics from the manager state.
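A periodic collector of this kind is commonly built around a `time.Ticker`. The sketch below is a generic illustration using only the standard library: the `collect` function, its parameters, and the plain int64 gauge are assumptions, and no Prometheus types are involved. A 5 ms interval is used so the demo finishes quickly.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// collect copies the value returned by read into gauge on every tick until
// ctx is cancelled. A non-blocking send on updates signals each refresh.
func collect(ctx context.Context, interval time.Duration, read func() int64, gauge *int64, updates chan<- struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			atomic.StoreInt64(gauge, read())
			select {
			case updates <- struct{}{}:
			default: // nobody is listening, skip the notification
			}
		}
	}
}

// collectorDemo runs the collector until the first refresh and returns the
// value it stored in the gauge.
func collectorDemo() int64 {
	var live, gauge int64
	atomic.StoreInt64(&live, 7) // pretend seven goroutines are tracked

	ctx, cancel := context.WithCancel(context.Background())
	updated := make(chan struct{}, 1)
	stopped := make(chan struct{})
	go func() {
		defer close(stopped)
		collect(ctx, 5*time.Millisecond, func() int64 {
			return atomic.LoadInt64(&live)
		}, &gauge, updated)
	}()

	<-updated // wait for the first refresh
	cancel()
	<-stopped // the collector exits once its context is cancelled
	return atomic.LoadInt64(&gauge)
}

func main() {
	fmt.Println("gauge after first collection:", collectorDemo())
}
```

Values exposed this way lag the live state by up to one interval, which is the trade-off behind making the interval configurable.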
A pre-built Grafana dashboard is available for visualizing all metrics, providing:
- Real-time goroutine counts and trends
- Goroutine age and duration distributions
- Operation rates and error rates
- Shutdown metrics and health indicators
For detailed metrics documentation, see:
- metrics/README.md - Metrics integration guide
- metrics/API.md - Complete API reference
- metrics/Dashboard/GRAFANA_DASHBOARD.md - Dashboard setup
Worker functions should always check ctx.Done() in loops to ensure they can exit gracefully when the context is cancelled. This prevents goroutines from running indefinitely.
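A worker loop that follows this advice usually selects on `ctx.Done()` alongside its input channel. The sketch below uses only the standard library; the `worker` function and its job channel are illustrative and not part of the library API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// worker consumes jobs until the channel is closed or the context is
// cancelled, whichever happens first.
func worker(ctx context.Context, jobs <-chan int) error {
	total := 0
	for {
		select {
		case <-ctx.Done():
			// Cancellation observed: return promptly.
			return ctx.Err()
		case j, ok := <-jobs:
			if !ok {
				return nil // no more work
			}
			total += j
		}
	}
}

// workerStopsOnCancel reports whether the worker returns context.Canceled
// once its context is cancelled.
func workerStopsOnCancel() bool {
	ctx, cancel := context.WithCancel(context.Background())
	jobs := make(chan int)
	errCh := make(chan error, 1)
	go func() { errCh <- worker(ctx, jobs) }()

	jobs <- 1 // the worker is running and has accepted one job
	cancel()
	return errors.Is(<-errCh, context.Canceled)
}

func main() {
	fmt.Println("worker stopped on cancel:", workerStopsOnCancel())
}
```

A loop that blocks on the job channel without the `ctx.Done()` case would wait forever here, because no further job is ever sent.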
When spawning multiple goroutines for the same function, use function wait groups to coordinate their completion. This allows you to wait for all instances of a function to finish before proceeding.
Panic recovery is enabled by default and should remain enabled in production. This prevents panics in individual goroutines from crashing the entire application. Only disable if you have specific error handling requirements.
Configure timeouts for goroutines that might run indefinitely. This ensures they are automatically cancelled after a specified duration, preventing resource leaks.
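In standard-library terms, a per-goroutine timeout amounts to deriving the routine's context with `context.WithTimeout`. The `runWithTimeout` helper below is a hypothetical sketch of that idea and is not the library's WithTimeout option.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runWithTimeout runs worker with a context that expires after timeout and
// returns whatever the worker returns.
func runWithTimeout(parent context.Context, timeout time.Duration, worker func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel() // release the timer even if the worker finishes early

	errCh := make(chan error, 1)
	go func() { errCh <- worker(ctx) }()
	return <-errCh
}

// timeoutFires reports whether a worker that waits indefinitely is stopped
// by its deadline.
func timeoutFires() bool {
	err := runWithTimeout(context.Background(), 20*time.Millisecond, func(ctx context.Context) error {
		<-ctx.Done() // stands in for long-running work that watches the context
		return ctx.Err()
	})
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("deadline stopped the worker:", timeoutFires())
}
```

The deadline only cancels the context. A worker that never looks at it keeps running, so timeouts and `ctx.Done()` checks need to be used together.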
Use the hierarchical structure to organize goroutines logically. Group related goroutines under the same app and local manager for better organization and easier management.
Configure appropriate shutdown timeouts based on your application's requirements. Use safe shutdown for graceful termination, which attempts graceful shutdown first and force-cancels only if timeout occurs.
Enable metrics and monitor them in production. Track goroutine counts, ages, and error rates to detect issues early. Use Grafana dashboards for visualization and alerting.
When possible, use selective shutdown (function-level or app-level) instead of global shutdown. This allows you to shut down specific components without affecting others.
Initialization:
- NewGlobalManager() - Creates a new global manager instance
- Init() - Initializes the global manager and sets up signal handling
Shutdown:
- Shutdown(safe bool) - Shuts down all app managers (safe = graceful, unsafe = immediate)
Metadata:
- GetMetadata() - Returns current metadata configuration
- UpdateMetadata(flag, value) - Updates metadata (timeouts, limits, metrics)
Listing:
- GetAllAppManagers() - Returns all app managers
- GetAppManagerCount() - Returns count of app managers
- GetAllLocalManagers() - Returns all local managers across all apps
- GetLocalManagerCount() - Returns total count of local managers
- GetAllGoroutines() - Returns all tracked goroutines
- GetGoroutineCount() - Returns total count of tracked goroutines
Creation:
- NewAppManager(appName) - Creates a new app manager instance
- CreateApp() - Creates and registers the app manager
Shutdown:
- Shutdown(safe bool) - Shuts down all local managers in the app
Local Managers:
- CreateLocal(localName) - Creates a new local manager
- GetAllLocalManagers() - Returns all local managers in the app
- GetLocalManagerCount() - Returns count of local managers
- GetLocalManagerByName(localName) - Returns specific local manager
Goroutines:
- GetAllGoroutines() - Returns all goroutines in the app
- GetGoroutineCount() - Returns count of goroutines in the app
Creation:
- NewLocalManager(appName, localName) - Creates a new local manager instance
- CreateLocal(localName) - Creates and registers the local manager
Goroutine Spawning:
- Go(functionName, workerFunc, opts...) - Spawns a tracked goroutine with optional configuration
Shutdown:
- Shutdown(safe bool) - Shuts down all goroutines in the local manager
- ShutdownFunction(functionName, timeout) - Shuts down all goroutines of a specific function
Wait Groups:
- NewFunctionWaitGroup(ctx, functionName) - Creates or retrieves a function wait group
- WaitForFunction(functionName) - Waits for all goroutines of a function to complete
- WaitForFunctionWithTimeout(functionName, timeout) - Waits with timeout
- GetFunctionGoroutineCount(functionName) - Returns count of goroutines for a function
Routine Management:
- GetAllGoroutines() - Returns all tracked goroutines
- GetGoroutineCount() - Returns count of tracked goroutines
- GetRoutine(routineID) - Returns a specific routine by ID
- GetRoutinesByFunctionName(functionName) - Returns all routines for a function
- CancelRoutine(routineID) - Cancels a specific routine
- WaitForRoutine(routineID, timeout) - Waits for a routine to complete
- IsRoutineDone(routineID) - Checks if a routine is done
- GetRoutineContext(routineID) - Returns a routine's context
- GetRoutineStartedAt(routineID) - Returns routine start timestamp
- GetRoutineUptime(routineID) - Returns routine uptime duration
- IsRoutineContextCancelled(routineID) - Checks if routine context is cancelled
- WithTimeout(duration) - Sets a timeout for the goroutine
- WithPanicRecovery(enabled) - Enables or disables panic recovery
- AddToWaitGroup(functionName) - Adds goroutine to a function wait group
- SET_METRICS_URL - Configure metrics (string URL, or [bool, string], or [bool, string, duration])
- SET_SHUTDOWN_TIMEOUT - Configure shutdown timeout (duration)
- SET_MAX_ROUTINES - Configure maximum routines limit (int)
- SET_UPDATE_INTERVAL - Configure metrics update interval (duration)
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Follow Go best practices and conventions
- Add tests for new features
- Update documentation
- Ensure all tests pass
- Run go fmt and go vet
See LICENSE file for details.
For issues, questions, or contributions, please open an issue on GitHub.
Built with ❤️ for the Go community
