Log a warning when async shutdown stops with queued events still pending
Summary
When the PHP agent uses async backend communication, shutdown appears to stop draining the send queue once its drain deadline is reached. In that path, queued events may be left unsent without any dedicated warning in the agent log.
That makes missing transactions and spans difficult to diagnose. Users can see data loss in APM, but there is no clear log signal that shutdown ended before the queue was fully flushed.
Observed behavior
In a CLI batch workload, some transactions were missing from APM even though we did not see queue-overflow errors in the agent log.
While investigating, I noticed that the async backend shutdown logic seems to exit once the shouldExitBy deadline has passed, without emitting a warning in that branch:
// backgroundBackendCommThreadFunc_shouldBreakLoop in backend_comm.cpp
if ( compareAbsTimeSpecs( &sharedStateSnapshot->shouldExitBy, &now ) < 0 )
{
    *shouldBreakLoop = true;
    goto success;
}
I may be missing surrounding context in the function, but from reading this branch it looks like shutdown can stop without a user-visible indication that queued data was still pending.
Why this matters
The queue overflow path already logs a clear error when events are rejected because the queue limit is exceeded.
The shutdown-timeout path is a different failure mode. If the queue is still draining when the process exits, users need a comparable signal in logs so they can distinguish:
- queue overflow during runtime
- shutdown ending before async flush completed
- normal successful drain
Without that signal, the only visible symptom may be missing APM data after process exit.
Request
Please log a warning when async shutdown stops because the drain timeout/deadline is reached and there is still queued data waiting to be sent.
Even a simple warning would help, for example:
ELASTIC_APM_LOG_WARNING(
    "Async shutdown drain timed out with queued events still pending."
    " Remaining events may be discarded."
);
If queue count or queued byte size is already available at that point, including those values in the message would be even better. But the main request is the warning itself, not a specific implementation.
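To make the idea concrete, here is a minimal standalone sketch of the decision I have in mind. It is not the agent's actual code: QueueSnapshot, compareTimeSpecs and buildDrainTimeoutWarning are hypothetical names, and the comparison helper only assumes that compareAbsTimeSpecs returns a negative value when its first argument is earlier than its second, as the branch quoted above suggests.

```cpp
#include <cstddef>
#include <string>
#include <time.h>

// Hypothetical snapshot of the send queue at the moment the deadline is checked.
// Field names are assumptions, not the agent's real data structures.
struct QueueSnapshot
{
    std::size_t eventsCount;
    std::size_t bytesCount;
};

// Stand-in for the agent's compareAbsTimeSpecs (semantics assumed):
// negative if a is earlier than b, zero if equal, positive if a is later.
int compareTimeSpecs( const timespec& a, const timespec& b )
{
    if ( a.tv_sec != b.tv_sec ) return a.tv_sec < b.tv_sec ? -1 : 1;
    if ( a.tv_nsec != b.tv_nsec ) return a.tv_nsec < b.tv_nsec ? -1 : 1;
    return 0;
}

// Returns the warning text to log, or an empty string when no warning is needed
// (deadline not yet passed, or the queue was fully drained).
std::string buildDrainTimeoutWarning( const timespec& shouldExitBy
                                      , const timespec& now
                                      , const QueueSnapshot& queue )
{
    if ( compareTimeSpecs( shouldExitBy, now ) >= 0 ) return {}; // deadline not reached
    if ( queue.eventsCount == 0 ) return {};                     // normal successful drain

    return "Async shutdown drain timed out with queued events still pending"
           " (events: " + std::to_string( queue.eventsCount )
           + ", bytes: " + std::to_string( queue.bytesCount ) + ")."
           " Remaining events may be discarded.";
}
```

In the agent itself the equivalent check would presumably sit inside the timeout branch and pass the message to the existing logging macro. Keeping the "queue is empty" case silent means a clean drain that happens to finish right at the deadline does not produce a misleading warning.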
Relevant code
- agent/native/ext/backend_comm.cpp
- function backgroundBackendCommThreadFunc_shouldBreakLoop
Environment
- Ecentria fork: ecentria/apm-agent-php
- PHP 7.3
- Elastic APM PHP agent 1.15.1
- CLI SAPI
- async backend communication enabled
Additional context
- This request is about observability of the shutdown path, not necessarily changing shutdown behavior.
- If useful, I can provide the CLI workload pattern and agent logs from the failing run.