Performance Investigation: CPU Spikes & System Call Overhead in New Relic PHP Agent #1021

@theophileds

Context

Following up on issue #806, we conducted an isolated investigation to better understand the CPU spikes observed when the New Relic PHP agent is enabled. Our testing was performed in a controlled environment with a single container on a dedicated Kubernetes node.

Environment

  • Kubernetes 1.30 (EKS)
  • Instance type: m7a
  • PHP-FPM 8.2
  • New Relic PHP Agent: Latest version with all features disabled

Findings

CPU Usage Pattern

Figure 1: Grafana CPU metrics showing distinct usage patterns:

  • Baseline period with normal activity
  • Spike to ~100% CPU with New Relic disabled (16:10)
  • Spike to ~300% CPU with New Relic enabled (16:15)

Flame Graph Comparison

Figure 2: System-wide flame graph (test 4) with New Relic disabled, showing normal system call patterns and CPU usage distribution

Figure 3: System-wide flame graph (test 4) with New Relic enabled, demonstrating significantly increased fstatat64 system calls and higher CPU utilization across all cores

This pattern remained consistent across multiple test runs and was not affected by:

  • Disabling all New Relic features
  • Using the latest agent version
  • Different sampling frequencies (99Hz and 997Hz)

System Call Analysis

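Through system-wide performance profiling, we identified a significant increase in fstatat64 system calls when the New Relic agent is enabled. This suggests the agent is performing an excessive number of file-status lookups: fstatat64 retrieves a file's metadata (as stat does) rather than reading or writing its contents.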
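To see which paths are being looked up, the strace output can be summarized per path. A minimal sketch, assuming the trace was written to a file, for example with `timeout 60 strace -tt -f -o trace.log -p "$(pgrep -d ' ' php-fpm)"`. Using `pgrep -d ' '` attaches to every php-fpm process; `pgrep -o` selects only the oldest one, which is typically the FPM master rather than a worker. The function name `stat_call_summary` is hypothetical. It matches both `fstatat64` and `newfstatat`, because on x86-64 the underlying syscall typically appears in strace as `newfstatat`.

```shell
#!/usr/bin/env bash
# Sketch: summarize stat-family calls in an strace log file.
# Assumption: the log came from `strace -f -o <file> ...` (one call per line).
# stat_call_summary is a hypothetical helper name, not part of strace.
stat_call_summary() {
  local log="$1" top="${2:-10}"
  local pattern='(newfstatat|fstatat64)\('

  # Total number of fstatat64/newfstatat calls in the log.
  local total
  total=$(grep -cE "$pattern" "$log")
  echo "total ${total}"

  # Most frequently looked-up paths: extract the first quoted argument
  # of each matching call, then count and rank them.
  grep -E "$pattern" "$log" \
    | sed -nE 's/.*(newfstatat|fstatat64)\([^"]*"([^"]*)".*/\2/p' \
    | sort | uniq -c | sort -rn | head -n "$top"
}
```

Running `stat_call_summary trace.log 20` prints the total followed by the 20 most frequently looked-up paths, which may help show whether the extra calls target PHP source files, agent configuration, or something else.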

Testing Methodology

We conducted extensive profiling using:

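  1. PHP-FPM specific profiling at 99Hz and 997Hz sampling rates:
    perf record -F 99 -p $(pgrep -o php-fpm) -a -g --call-graph fp -- sleep 60
    perf record -F 997 -p $(pgrep -o php-fpm) -a -g --call-graph fp -- sleep 60

  2. System-wide profiling at 99Hz and 997Hz sampling rates:
    perf record -F 99 -a -g -- sleep 60
    perf record -F 997 -a -g -- sleep 60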

  3. System call tracing:
    timeout 60 strace -tt -f -C -p $(pgrep -o php-fpm)
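strace attaches via ptrace and slows the traced processes considerably, which can distort both call timings and CPU figures. To compare fstatat call rates with New Relic on and off at lower cost, perf can count the syscall tracepoint system-wide. A minimal sketch, assuming an x86-64 kernel that exposes the `syscalls:sys_enter_newfstatat` tracepoint and output captured with `perf stat -x, -a -e syscalls:sys_enter_newfstatat -o nr_on.csv -- sleep 60`; the helper name `calls_per_second` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: convert `perf stat -x,` CSV output into a calls-per-second rate.
# Assumption: CSV fields are counter value, unit, event name, ...
# calls_per_second is a hypothetical helper name, not part of perf.
calls_per_second() {
  local csv="$1" event="$2" seconds="$3"
  awk -F, -v ev="$event" -v secs="$seconds" '
    # Skip the "# started on ..." header and blank lines.
    /^#/ || NF < 3 { next }
    # Print the rate for the requested event.
    $3 == ev { printf "%.1f\n", $1 / secs; found = 1 }
    # Return non-zero if the event was not in the file.
    END { if (!found) exit 1 }
  ' "$csv"
}
```

Repeating the same `perf stat` command with the agent disabled (writing to, say, `nr_off.csv`) gives a baseline, and `calls_per_second nr_on.csv syscalls:sys_enter_newfstatat 60` prints the rate for each run.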

Version Impact

This performance regression appears to have been introduced between versions 10.0.0.312 and 10.7.0.319: version 10.0.0.312 and earlier versions did not exhibit this behavior.
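To narrow this window, the intermediate agent releases can be bisected rather than tested one by one. A minimal sketch: `first_bad_version` and the caller-supplied `check_version` command are both hypothetical. `check_version` is assumed to install the given agent version, replay the workload, and exit 0 when the fstatat64 rate looks normal ("good") or non-zero when it is elevated ("bad").

```shell
#!/usr/bin/env bash
# Sketch: binary search over an ordered list of agent versions.
# Assumptions: versions are passed oldest first, the first is known good,
# the last is known bad, and the regression persists once introduced.
# first_bad_version and the check command are hypothetical.
first_bad_version() {
  local check="$1"; shift
  local versions=("$@")
  local lo=0 hi=$(( ${#versions[@]} - 1 ))
  # Invariant: versions[lo] is good and versions[hi] is bad.
  while (( hi - lo > 1 )); do
    local mid=$(( (lo + hi) / 2 ))
    if "$check" "${versions[mid]}"; then
      lo=$mid   # still good: the regression is later
    else
      hi=$mid   # already bad: the regression is here or earlier
    fi
  done
  echo "${versions[hi]}"
}
```

Called with `check_version`, then 10.0.0.312, each intermediate release in order, and 10.7.0.319, it prints the first release that shows the elevated rate, needing roughly log2(n) test runs for n releases.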

Supporting Evidence

All profiling results are attached to this issue in newrelic_profiling_results.zip, which includes:

PHP-FPM Specific Profiles

  • With New Relic disabled:
    • 99Hz sampling (phpfpm_nr_off_99hz.*)
    • 997Hz sampling (phpfpm_nr_off_997hz.*)
  • With New Relic enabled:
    • 99Hz sampling (phpfpm_nr_on_99hz.*)
    • 997Hz sampling (phpfpm_nr_on_997hz.*)

System-Wide Profiles

  • With New Relic disabled:
    • Test 3 (system_nr_off_99hz_test3.*)
    • Test 4 (system_nr_off_99hz_test4.*)
  • With New Relic enabled:
    • Test 3 (system_nr_on_99hz_test3.*)
    • Test 4 (system_nr_on_99hz_test4.*)

Questions

  • Is there a known reason for the increased frequency of fstatat64 calls?
  • Are there plans to optimize file operations in future releases?
  • Could this be related to the agent's file monitoring or instrumentation mechanisms?

newrelic_profiling_results.zip
