
/proc/cvitek/vi_dbg is read via popen("cat | grep | awk") on a hot poll path #790

@rohanpandula

Summary

On firmware 2.3.6 (PCIe variant), two processes (kvm_system and NanoKVM-Server) read /proc/cvitek/vi_dbg via shell pipelines on a recurring poll. The read itself takes about 1 second on my box, almost all of it spent blocked on the Cvitek driver. Each shell pipeline launches 3-4 short-lived processes that live for that full second, and the two callers don't share the result.

Even if the 1-second driver read is expected, userspace shouldn't be paying that cost through repeated shell pipelines from multiple independent callers.

Hardware / firmware

  • NanoKVM PCIe
  • Firmware 2.3.6 (image v1.4.1, application 2.3.6)
  • Kernel: Linux kvm-7c12 5.10.4-tag- riscv64
  • The box had 99 days of uptime when the numbers were worst

Measurements

/proc/cvitek/vi_dbg takes ~1 second per read

Five consecutive timed runs of the exact pipeline kvm_system uses:

$ /usr/bin/time -f "%e sec (%U user + %S sys)" sh -c \
    "cat /proc/cvitek/vi_dbg | grep VIFPS | awk '{print \$3}'"
60
1.01 sec (0.00 user + 0.04 sys)
60
1.01 sec (0.01 user + 0.03 sys)
60
1.01 sec (0.01 user + 0.03 sys)
60
1.01 sec (0.01 user + 0.03 sys)
60
1.01 sec (0.00 user + 0.03 sys)

~40 ms of CPU per call, the rest is sleep time waiting on the driver.

Process churn is continuous

Sampling /proc/stat's processes counter over 5 seconds on an otherwise idle box:

procs created in 5s: 21  (≈ 4/sec)

That's roughly one pipeline per second. Parent PIDs are kvm_system (2705) and NanoKVM-Server (2720):

PID  PPID  STAT  COMMAND
4008 2720  S     sh -c cat /proc/cvitek/vi_dbg | grep -A 17 VIDevFPS | awk '{print $3}'
4015 2705  S     sh -c cat /proc/cvitek/vi_dbg | grep VIFPS | awk '{print $3}'
4016 4015  D     cat /proc/cvitek/vi_dbg         <- uninterruptible sleep on the driver

Because each pipeline takes 1 second and they fire roughly every second, there's effectively always one process in D state contributing to load average.

Load average drifts up over uptime

Fresh boot: load ~1.0. After 99 days of uptime:

load average: 8.08 7.84 7.47     (single-core RISC-V SoC)
NanoKVM-Server %CPU: 77

A reboot resets it. On one core, load in the 4-8 range is enough that the NanoKVM-Server HTTP workers start getting blocked long enough for /api/auth/login to return 500 under any concurrency. I suspect that's what the "web UI takes 9 minutes to load" reports are seeing (#223), though I haven't verified that part.

Where the code is

Two C++ call sites, both in support/sg2002/:

  • support/sg2002/kvm_system/main/lib/system_state/system_state.cpp:245

    fp = popen("cat /proc/cvitek/vi_dbg | grep VIFPS | awk '{print $3}'", "r");

    In kvm_update_hdmi_state(). The counter is seeded with static uint8_t check_times = 4 and gated by ++check_times > 5, so it fires on iterations 2, 8, 14, … of the caller loop: once every 6 ticks.

  • support/sg2002/additional/kvm/src/kvm_vision.cpp:220

    char cmd[100] = "cat /proc/cvitek/vi_dbg | grep -A 17 VIDevFPS | awk '{print $3}'";
    FILE* fp = popen(cmd, "r");

The NanoKVM-Server side is linked against libkvm.so. server/common/kvm_vision.go has // #cgo LDFLAGS: -lkvm, and:

$ strings server/dl_lib/libkvm.so | grep vi_dbg
cat /proc/cvitek/vi_dbg | grep -A 17 VIDevFPS | awk '{print $3}'

So the Go server's shell children are coming from libkvm.so through cgo, not from anything in server/*.go.

Bonus bug worth fixing while you're in there

system_state.cpp:245-249:

fp = popen("cat /proc/cvitek/vi_dbg | grep VIFPS | awk '{print $3}'", "r");
if (fp == NULL) {
    pclose(fp);   // pclose(NULL) — undefined behavior
    return;
}

If popen returns NULL, pclose(NULL) is undefined. Doesn't matter in practice because popen rarely fails here, but worth cleaning up.
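A minimal sketch of the corrected error path. read_first_line() is a stand-in helper for illustration, not a function from the NanoKVM tree:

```cpp
#include <cstdio>
#include <cstring>

// If popen() fails there is no stream, so return without calling pclose().
// Only the stream we actually opened gets closed.
static int read_first_line(const char* cmd, char* buf, size_t len) {
    FILE* fp = popen(cmd, "r");
    if (fp == NULL) {
        return -1;              // nothing was opened, nothing to close
    }
    const bool ok = fgets(buf, static_cast<int>(len), fp) != NULL;
    pclose(fp);                 // close exactly once, on the valid stream
    return ok ? 0 : -1;
}
```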

Suggested directions

The 1-second driver read is the underlying cost and none of these eliminates it. Each one is still a real win on its own:

  • Replace the shell pipelines with native fopen/read plus a small parser. Removes the user/sys CPU of the shell children (a few percent of one core continuously) and all the fork/exec/pipe churn. Also fixes the pclose(NULL) bug.
  • Share the parsed result between callers. Right now kvm_system and libkvm.so are independently polling the same slow file and paying for it twice. One shared cache with a short TTL (seconds) would halve the read rate immediately.
  • Look at poll cadence. HDMI-state transitions don't happen at 1 Hz. Widening the interval to 10-30 seconds would remove most of the D-state time regardless of how the read is done.
  • Separately: investigate why the proc read blocks for ~1 second. That's in the Cvitek driver/stats layer and probably a different conversation with different maintainers.
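As a sketch of the first bullet: a direct read of the proc file plus a tiny parser, with no shell children. The "third whitespace-separated field of the VIFPS line" layout is inferred from the awk '{print $3}' pipeline, not from driver docs, so treat the parsing as an assumption:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Return the 3rd whitespace-separated field of the first line containing
// `key`, or "" if no such line/field exists. Mirrors `grep key | awk '{print $3}'`.
static std::string third_field_of_line(std::istream& in, const std::string& key) {
    std::string line;
    while (std::getline(in, line)) {
        if (line.find(key) == std::string::npos) continue;
        std::istringstream fields(line);
        std::string f1, f2, f3;
        return (fields >> f1 >> f2 >> f3) ? f3 : std::string();
    }
    return "";
}

// One open()+read() of the proc file: still pays the ~1 s driver stall,
// but no fork/exec churn and no pclose(NULL) hazard.
std::string read_vifps() {
    std::ifstream f("/proc/cvitek/vi_dbg");
    return f ? third_field_of_line(f, "VIFPS") : std::string();
}
```

Wrapping read_vifps() behind a timestamped cache (return the last value if it is younger than a few seconds) would cover the shared-result bullet as well.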

Related reports that might share this cause

The "slow after days of uptime, reboot fixes it" pattern in several of these lines up with what I'm measuring. Not all of them will trace to this (some are clearly HDMI or network), but enough of them fit the pattern to make it worth flagging.

Offer

I can submit a PR for the popen → native-read refactor (both C++ sites) and the pclose(NULL) fix if that'd be useful. The Go-side change would need someone who can rebuild libkvm.so.


Used Claude to help read the code and draft this. All measurements are from my own device; code references verified locally.
