Log a warning when async shutdown stops with queued events still pending #1445

@githoober

Summary

When the PHP agent uses async backend communication, shutdown appears to stop draining the send queue once its drain deadline is reached. In that path, queued events may be left unsent without any dedicated warning in the agent log.

That makes missing transactions and spans difficult to diagnose. Users can see data loss in APM, but there is no clear log signal that shutdown ended before the queue was fully flushed.

Observed behavior

In a CLI batch workload, some transactions were missing from APM even though we did not see queue-overflow errors in the agent log.

While investigating, I noticed that the background send loop appears to exit once the shouldExitBy deadline has passed, and that this branch does not emit a warning:

// backgroundBackendCommThreadFunc_shouldBreakLoop in backend_comm.cpp
if ( compareAbsTimeSpecs( &sharedStateSnapshot->shouldExitBy, &now ) < 0 )
{
    *shouldBreakLoop = true;
    goto success;
}

I may be missing surrounding context in the function, but from reading this branch it looks like shutdown can stop without a user-visible indication that queued data was still pending.
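
To make the request concrete, here is a minimal, self-contained sketch of how that branch could decide whether a warning is needed. It is not the agent's code. TimeSpecSketch, SnapshotSketch, LoopDecision, compareTimeSpecsSketch and decideOnShutdown are hypothetical names, the queuedEvents field is an assumption (I have not checked whether the snapshot exposes a queue count), and the comparison helper only mirrors what I assume compareAbsTimeSpecs does.

```cpp
#include <cstddef>

// Hypothetical stand-in for an absolute time value (seconds + nanoseconds).
struct TimeSpecSketch
{
    long long sec;
    long nsec;
};

// Hypothetical stand-in for the shared-state snapshot. Only shouldExitBy is
// taken from the snippet above; the other fields are assumptions.
struct SnapshotSketch
{
    bool shouldExit;              // assumed: shutdown has been requested
    TimeSpecSketch shouldExitBy;  // absolute drain deadline
    std::size_t queuedEvents;     // assumed: events still waiting to be sent
};

struct LoopDecision
{
    bool shouldBreakLoop;  // stop the background send loop
    bool shouldWarn;       // deadline passed while events were still queued
};

// Returns <0, 0 or >0 when a is earlier than, equal to or later than b.
// Assumed to match the semantics of compareAbsTimeSpecs.
inline int compareTimeSpecsSketch( const TimeSpecSketch& a, const TimeSpecSketch& b )
{
    if ( a.sec != b.sec ) return a.sec < b.sec ? -1 : 1;
    if ( a.nsec != b.nsec ) return a.nsec < b.nsec ? -1 : 1;
    return 0;
}

// Models only the timeout branch: once the deadline is in the past the loop
// breaks, and a warning is requested only if something is still queued.
inline LoopDecision decideOnShutdown( const SnapshotSketch& snapshot, const TimeSpecSketch& now )
{
    LoopDecision decision = { false, false };
    if ( ! snapshot.shouldExit ) return decision;

    if ( compareTimeSpecsSketch( snapshot.shouldExitBy, now ) < 0 )
    {
        decision.shouldBreakLoop = true;
        decision.shouldWarn = snapshot.queuedEvents > 0;
    }
    return decision;
}
```

The point of separating shouldWarn from shouldBreakLoop is that a shutdown that hits the deadline with an empty queue stays silent, so the warning only appears when data may actually have been dropped.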

Why this matters

The queue overflow path already logs a clear error when events are rejected because the queue limit is exceeded.

The shutdown-timeout path is a different failure mode. If the queue is still draining when the process exits, users need a comparable signal in logs so they can distinguish:

  • queue overflow during runtime
  • shutdown ending before async flush completed
  • normal successful drain

Without that signal, the only visible symptom may be missing APM data after process exit.

Request

Please log a warning when async shutdown stops because the drain timeout/deadline is reached and there is still queued data waiting to be sent.

Even a simple warning would help, for example:

ELASTIC_APM_LOG_WARNING(
    "Async shutdown drain timed out with queued events still pending."
    " Remaining events may be discarded."
);

If queue count or queued byte size is already available at that point, including those values in the message would be even better. But the main request is the warning itself, not a specific implementation.
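
As an illustration of the richer message, the sketch below builds the warning text with both values appended. buildDrainTimeoutWarning is a hypothetical helper, and queuedEvents and queuedBytes are assumed inputs; I have not verified which of these the agent tracks at that point. It uses std::snprintf only to stay self-contained.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats the proposed warning and appends how much
// data was still queued when the drain deadline was reached.
inline std::string buildDrainTimeoutWarning( std::size_t queuedEvents, std::size_t queuedBytes )
{
    char buffer[ 256 ];
    // snprintf truncates instead of overflowing if the text were too long.
    std::snprintf( buffer, sizeof( buffer ),
        "Async shutdown drain timed out with queued events still pending."
        " Remaining events may be discarded."
        " Queued events: %zu. Queued bytes: %zu.",
        queuedEvents, queuedBytes );
    return std::string( buffer );
}
```

Keeping the first two sentences identical to the simple form means the same log search string would match whether or not the counts are included.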

Relevant code

  • agent/native/ext/backend_comm.cpp
  • function backgroundBackendCommThreadFunc_shouldBreakLoop

Environment

  • Ecentria fork: ecentria/apm-agent-php
  • PHP 7.3
  • Elastic APM PHP agent 1.15.1
  • CLI SAPI
  • async backend communication enabled

Additional context

  • This request is about observability of the shutdown path, not necessarily changing shutdown behavior.
  • If useful, I can provide the CLI workload pattern and agent logs from the failing run.
