Component(s)
receiver/windowseventlog
Is your feature request related to a problem? Please describe.
The evtNext function in api.go wraps the Windows EvtNext syscall. The current caller in subscription.go invokes it with timeout=0.
Timeout=0 tells the Windows API "don't wait for new events, return immediately." However, this parameter only controls the event-wait phase. It does not bound the time spent in the execution phase — including RPC transport, kernel I/O, storage driver operations, and security descriptor checks. These layers can hang indefinitely regardless of the timeout value.
This issue becomes significantly more severe in enterprise deployments where the OpenTelemetry Collector is configured to auto-discover and collect from Active Directory Domain Controllers (DCs).
Domain Controllers are particularly problematic here because of the high number of evtNext calls.
Describe the solution you'd like
Add a Go-level, context-aware timeout wrapper at the Subscription level in subscription.go, leaving api.go unchanged. Something like the following:
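var (
    errSyscallTimeout           = errors.New("evtNext syscall safety timeout exceeded")
    defaultSyscallSafetyTimeout = 10 * time.Second
)

// evtNextWithTimeout runs evtNext in a separate goroutine and stops waiting
// once the safety timeout elapses or ctx is cancelled.
// Assumes new Subscription fields: syscallSafetyTimeout (time.Duration),
// orphanedGoroutines (atomic.Int64), logger (*zap.Logger), channel (string).
func (s *Subscription) evtNextWithTimeout(
    ctx context.Context,
    resultSet uintptr,
    eventsSize uint32,
    events *uintptr,
    timeout, flags uint32,
    returned *uint32,
) error {
    // Buffered so the goroutine can still send and exit after we stop waiting.
    done := make(chan error, 1)
    go func() {
        done <- evtNext(resultSet, eventsSize, events, timeout, flags, returned)
    }()

    safetyTimeout := s.syscallSafetyTimeout
    if safetyTimeout == 0 {
        safetyTimeout = defaultSyscallSafetyTimeout
    }
    // The Windows wait timeout is in milliseconds; the safety margin is added on top.
    totalTimeout := time.Duration(timeout)*time.Millisecond + safetyTimeout
    timer := time.NewTimer(totalTimeout)
    defer timer.Stop()

    select {
    case err := <-done:
        return err
    case <-ctx.Done():
        return ctx.Err()
    case <-timer.C:
        // The syscall cannot be cancelled from Go, so its goroutine stays
        // blocked; count it and report the condition.
        s.orphanedGoroutines.Add(1)
        s.logger.Warn("evtNext did not return within safety timeout; goroutine orphaned",
            zap.Duration("safety_timeout", totalTimeout),
            zap.Int64("orphaned_goroutines", s.orphanedGoroutines.Load()),
            zap.String("channel", s.channel),
        )
        return errSyscallTimeout
    }
}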
Describe alternatives you've considered
No response
Additional context
No response