
Max CPU Time Per Tick measurements are misleading #116

@enicholson

Description


TL;DR: I suspect that using uv_getrusage to calculate tick times includes CPU time from all Node.js threads rather than wall-clock time, which can be much higher on multicore systems.

Earlier this year we upgraded a Node.js application that we've been monitoring with New Relic for ages from Node.js 8.x to 10.x (currently 10.20.1, to be specific). At the time we noticed that most of the metrics looked much better, but one that stood out was Max CPU Time per Tick, which got noticeably worse. Unfortunately I don't have a great comparison right now, but we've seen tick times maybe 50%-100% higher. Unexpectedly, total event loop ticks per minute is similar or higher.

We've recently been diagnosing some performance issues, and max time per tick has been my go-to measurement for spotting responsiveness problems, yet our high tick times weren't manifesting as GC pressure, transaction latency, etc.

Digging into the code a little, I found that LoopChecker::_checkCB uses uv_getrusage, which is really just a wrapper around the Unix getrusage call. I found this snippet on Wikipedia:
POSIX functions clock() and getrusage() can be used to get CPU time consumed by any process in a POSIX environment. If the process is multithreaded, the CPU time is the sum for all threads. With Linux starting from kernel 2.6.26 there is a parameter RUSAGE_THREAD which leads to resource usage statistics for the calling thread only.
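
To illustrate, here is a small standalone repro (my own sketch, not the agent's code): with a few busy worker threads, the getrusage(RUSAGE_SELF) delta over an interval comes out several times larger than the elapsed wall-clock time on a multicore machine.

```cpp
// Minimal repro sketch (assumes Linux/POSIX): compare the process-wide CPU
// time from getrusage(RUSAGE_SELF), i.e. what a uv_getrusage-based delta
// sees, against elapsed wall-clock time while several threads are busy.
#include <sys/resource.h>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

static double cpu_seconds() {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);  // CPU time summed across *all* threads
    return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6 +
           ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

int main() {
    auto wall_start = std::chrono::steady_clock::now();
    double cpu_start = cpu_seconds();

    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)  // four busy threads, like a 4-vCPU c5.xlarge
        workers.emplace_back([] {
            volatile unsigned long x = 0;
            for (unsigned long j = 0; j < 400000000UL; ++j) x += j;
        });
    for (auto& t : workers) t.join();

    double cpu = cpu_seconds() - cpu_start;
    double wall = std::chrono::duration<double>(
                      std::chrono::steady_clock::now() - wall_start).count();
    // On a machine with 4+ hardware threads, cpu comes out roughly 4x wall.
    std::printf("wall: %.2fs  cpu (all threads): %.2fs\n", wall, cpu);
}
```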

Expected Behavior

Event loop metrics would report wall-clock (single-thread) time, which is more relevant to a Node.js developer.
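
For comparison, here is a hedged sketch of what per-tick wall-clock timing could look like using libuv's prepare/check handles and uv_hrtime(). The handle names and structure are my own illustration, not New Relic's actual implementation:

```cpp
// Hypothetical sketch (my naming, not the agent's code): a check handle
// stamps the time when the loop wakes from polling, and a prepare handle
// measures the elapsed wall-clock time just before the loop polls again.
#include <uv.h>
#include <cstdio>

static uv_check_t check_h;
static uv_prepare_t prepare_h;
static uint64_t tick_start_ns = 0;
static uint64_t max_tick_ns = 0;

// Check handles fire right after the loop returns from polling for I/O,
// i.e. at the start of this tick's callback processing.
static void on_check(uv_check_t*) { tick_start_ns = uv_hrtime(); }

// Prepare handles fire just before the loop blocks in poll again, i.e. at
// the end of the callback processing that began at the last check.
static void on_prepare(uv_prepare_t*) {
    if (tick_start_ns == 0) return;
    uint64_t elapsed = uv_hrtime() - tick_start_ns;  // monotonic, ns
    if (elapsed > max_tick_ns) max_tick_ns = elapsed;
}

static void on_timer(uv_timer_t* t) {
    volatile unsigned long x = 0;                // simulate callback work
    for (unsigned long i = 0; i < 50000000UL; ++i) x += i;
    static int runs = 0;
    if (++runs == 5) {                           // after a few ticks, shut down
        uv_timer_stop(t);
        uv_check_stop(&check_h);
        uv_prepare_stop(&prepare_h);
    }
}

int main() {
    uv_loop_t* loop = uv_default_loop();
    uv_check_init(loop, &check_h);
    uv_check_start(&check_h, on_check);
    uv_prepare_init(loop, &prepare_h);
    uv_prepare_start(&prepare_h, on_prepare);

    uv_timer_t timer;
    uv_timer_init(loop, &timer);
    uv_timer_start(&timer, on_timer, 10, 10);    // busy callback every 10 ms

    uv_run(loop, UV_RUN_DEFAULT);
    std::printf("max tick (wall clock): %.3f ms\n", max_tick_ns / 1e6);
}
```

If sticking with getrusage is preferable, the RUSAGE_THREAD flag mentioned in the Wikipedia quote above would restrict the accounting to the event-loop thread, though it is Linux-only (kernel 2.6.26+).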

Your Environment

  • Node.js 10.20.1
  • AWS Linux
  • AWS c5.xlarge (4 threads / 2 cores) VM
