
Fix lxc.hook.start not working on cgroup v2 systems #586

Open
mattelacchiato wants to merge 1 commit into alexreinert:master from mattelacchiato:fix/cgroupv2-hook-workaround

Problem

On systems using cgroup v2 (unified hierarchy), such as Debian Trixie with LXC 6.0, the lxc.hook.start container hook does not work correctly:

  1. The hook runs partially: its find /var/* -exec rm step deletes pre-generated files
  2. The subsequent mknod calls fail due to cgroup v2 device restrictions
  3. The hook aborts before writing /var/hm_mode

Without /var/hm_mode containing HM_MODE='NORMAL', lighttpd and ReGaHss refuse to start (S50lighttpd and S70ReGaHss both check [[ "${HM_MODE}" != "NORMAL" ]] && exit 0), making the WebUI completely inaccessible.
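The gate described above can be sketched as a small shell helper. This is an illustration only, not the code from S50lighttpd or S70ReGaHss: the function name `hm_mode_allows_start` and the path parameter are assumptions made so the check can be tried outside the container.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when the mode file sets HM_MODE to NORMAL.
# The path is a parameter (default /var/hm_mode) so the check can run anywhere.
hm_mode_allows_start() {
    mode_file="${1:-/var/hm_mode}"
    HM_MODE=""
    # A missing or unreadable file leaves HM_MODE empty, which mirrors the
    # situation after the hook aborts before writing /var/hm_mode.
    [ -r "$mode_file" ] && . "$mode_file"
    [ "$HM_MODE" = "NORMAL" ]
}
```

With an absent mode file the helper fails, which corresponds to the init scripts exiting early and the WebUI never coming up.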

Root Cause

cgroup v2 handles device access control differently from cgroup v1, and the lxc.cgroup.devices.allow directives in lxc.config are specific to cgroup v1. On cgroup v2 systems, device node creation via mknod inside the container hook fails silently. The hook is left in a broken state: it has already deleted the contents of /var but cannot complete its setup.

Fix

This commit detects cgroup v2 by checking for /sys/fs/cgroup/cgroup.controllers and, when present:

  • Removes lxc.hook.start from the generated LXC config (prevents the hook from deleting files and then failing)
  • Generates /var/hm_mode and related files (board_serial, rf_board_serial, etc.) from the host side by writing to /tmp/pivccu-var, which is bind-mounted as /var in the container
  • Creates device nodes (/dev/raw-uart, /dev/mmd_hmip, /dev/eq3loop, /dev/mmd_bidcos) via lxc-attach after container start, using full paths (/bin/mknod) since PATH may not include /sbin in systemd service context
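The three steps above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code in this commit: all function names and arguments are hypothetical, only the HM_MODE line of the mode file is shown, and the container name and device major/minor numbers are left to the caller because they are not given here.

```shell
#!/bin/sh
# Sketch only: function names, arguments, and file layout are assumptions.

# cgroup v2 (unified hierarchy) exposes cgroup.controllers at the cgroup root.
is_cgroup_v2() {
    cgroup_root="${1:-/sys/fs/cgroup}"
    [ -f "$cgroup_root/cgroup.controllers" ]
}

# Drop every lxc.hook.start line from a generated config file, in place.
strip_start_hook() {
    config_file="$1"
    stripped="$(mktemp)" || return 1
    # grep exits 1 when no lines are selected; only a higher status is an error.
    grep -v '^[[:space:]]*lxc\.hook\.start[[:space:]]*=' "$config_file" > "$stripped" \
        || [ "$?" -eq 1 ] || { rm -f "$stripped"; return 1; }
    cat "$stripped" > "$config_file" && rm -f "$stripped"
}

# Write the mode file from the host into the directory bind-mounted as /var.
# Only the HM_MODE line is shown; the real file may hold more settings.
write_hm_mode() {
    var_dir="${1:-/tmp/pivccu-var}"
    mkdir -p "$var_dir" && printf "HM_MODE='NORMAL'\n" > "$var_dir/hm_mode"
}

# Create one character device inside a running container via lxc-attach.
# Arguments: container name, device path, major number, minor number.
create_device_node() {
    lxc-attach -n "$1" -- /bin/mknod "$2" c "$3" "$4"
}
```

A caller would run the last three helpers only when `is_cgroup_v2` succeeds, and would call `create_device_node` once per device after the container has started.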

On cgroup v1 systems, behavior is completely unchanged: the hook runs as before.

Testing

Tested on:

  • Raspberry Pi 4 Model B
  • Debian Trixie (aarch64)
  • LXC 6.0.4
  • Kernel 6.12.75+rpt-rpi-v8
  • piVCCU3 3.85.7-98
  • HM-MOD-RPI-PCB (identified as RPI-RF-MOD)

Verified:

  • WebUI accessible (HTTP 200)
  • /var/hm_mode correctly populated with HM_MODE='NORMAL'
  • Device nodes created (/dev/raw-uart, /dev/mmd_hmip, /dev/eq3loop, /dev/mmd_bidcos)
  • lighttpd and ReGaHss start automatically
  • Fix survives service restarts
  • No impact on cgroup v1 systems (conditional check)
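Two of the checks in this list are easy to repeat by hand. The helpers below are a hedged sketch rather than the procedure actually used for this PR; the names `check_hm_mode` and `check_device_nodes` are hypothetical, and the paths are parameters so the functions can be run inside the container or against a host-side copy.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Succeeds when the mode file contains the expected HM_MODE assignment.
check_hm_mode() {
    grep -q "^HM_MODE='NORMAL'" "${1:-/var/hm_mode}" 2>/dev/null
}

# Print each given path that is not a character device; fail if any is missing.
check_device_nodes() {
    missing=0
    for node in "$@"; do
        if [ ! -c "$node" ]; then
            echo "missing: $node"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Inside the container, this could be invoked as `check_hm_mode && check_device_nodes /dev/raw-uart /dev/mmd_hmip /dev/eq3loop /dev/mmd_bidcos`.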

🤖 Generated with Claude Code

On systems using cgroup v2 (unified hierarchy), such as Debian Trixie
with LXC 6.0, the lxc.hook.start container hook does not work correctly:

1. The hook's `find /var/* -exec rm` deletes pre-generated files
2. The subsequent `mknod` calls fail due to cgroup v2 device restrictions
3. The hook aborts before writing /var/hm_mode

Without /var/hm_mode containing HM_MODE='NORMAL', lighttpd and ReGaHss
refuse to start, making the WebUI inaccessible.

This commit detects cgroup v2 by checking for
/sys/fs/cgroup/cgroup.controllers and, when present:

- Removes lxc.hook.start from the generated LXC config
- Generates /var/hm_mode and related files from the host side
  (writing to /tmp/pivccu-var which is bind-mounted as /var)
- Creates device nodes via lxc-attach after container start,
  using full paths (/bin/mknod) since PATH differs in service context

On cgroup v1 systems, behavior is unchanged — the hook runs as before.

Tested on Raspberry Pi 4, Debian Trixie, LXC 6.0.4, kernel 6.12.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>