Inherited File Descriptor Issue When Checkpointing Processes in Wayland/Hyprland Sessions #2951

@Abdullah-Badawy1

Hi,
I've been exploring CRIU and ran into an interesting real-world problem while trying to checkpoint a simple Python HTTP server running inside a Hyprland (Wayland compositor) session. I'd like to describe the issue and discuss whether it represents an opportunity for improvement in CRIU.

What I Was Doing

I started a Python HTTP server and attempted to checkpoint and restore it using CRIU:

python3 -m http.server 8080 &
SERVER_PID=$!

sudo criu dump \
    -t $SERVER_PID \
    -D /tmp/webserver-ckpt \
    --shell-job \
    --display-stats

sudo criu restore \
    -D /tmp/webserver-ckpt \
    --shell-job \
    -d

The Error I Got

The restore failed with:

504149: Error (criu/files-reg.c:2175): File run/user/1000/hypr/4b07770b9ef1cceb2e6f56d33538aaffb9186b9c_1773096523_1849152812/hyprland.log has bad size 5230917 (expect 5222646)
504149: Error (criu/files.c:1221): Unable to open fd=8 id=0x24
Error (criu/cr-restore.c:2324): Restoring FAILED.

Root Cause Analysis

After investigating, I traced the problem to Linux's file descriptor inheritance model. When the Python server was launched from a terminal inside a Hyprland session, it silently inherited open file descriptors from its parent processes, including Hyprland's active runtime log file at /run/user/1000/hypr/.../hyprland.log.

The process tree looked like this:

Hyprland (compositor) — has hyprland.log open for writing
    └── terminal emulator
            └── shell
                    └── python3 -m http.server  ← what we dumped
                            └── fd 8 → hyprland.log (inherited)
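
To check for this kind of inheritance before dumping, the target's descriptors can be listed under /proc. The following is a minimal sketch and not part of CRIU: list_regular_fds is a hypothetical helper name, and it assumes a Linux /proc filesystem and permission to read the target's fd directory (same user or root).

```shell
# Hypothetical helper: print "fd<TAB>path" for every descriptor of a process
# that points at a regular file (sockets, pipes and ttys are skipped).
# Usage: list_regular_fds PID
list_regular_fds() {
    local pid=$1 link fd target
    for link in /proc/"$pid"/fd/*; do
        # -f follows the /proc symlink, so it is true only for regular files;
        # it also skips entries that vanished while the loop was running.
        [ -f "$link" ] || continue
        fd=${link##*/}
        target=$(readlink "$link") || continue
        printf '%s\t%s\n' "$fd" "$target"
    done
}

# Example, using the variable from the commands above:
#   list_regular_fds "$SERVER_PID"
```

Any path in the output that the server did not open itself, such as a compositor log, is a candidate for the size check that failed above.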

CRIU correctly recorded the file size at dump time (5,222,646 bytes). However, because Hyprland kept writing to its log between the dump and the restore, the file had grown to 5,230,917 bytes by restore time, a difference of 8,271 bytes (about 8 KB). CRIU's file validation caught this mismatch and aborted the restore.
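
If this diagnosis is right, the mismatch can only occur while the log descriptor is open in the dumped process, so one way to test it is to relaunch the server with that descriptor closed and dump again. The sketch below is an assumption-laden illustration, not something CRIU provides: exec_with_fds_closed is a hypothetical helper name, and the fd number 8 is simply the one reported in the restore error.

```shell
# Hypothetical helper: close the listed fd numbers, then exec the command.
# It replaces the calling shell process, so invoke it only as a background
# job (cmd &) or inside ( ... ).
# Usage: exec_with_fds_closed "8 9" command [args...]
exec_with_fds_closed() {
    local fds=$1 fd
    shift
    for fd in $fds; do
        case $fd in
            ''|*[!0-9]*) echo "not an fd number: $fd" >&2; return 2 ;;
        esac
        eval "exec ${fd}>&-"    # closing an fd that is not open is harmless
    done
    exec "$@"
}

# Example, mirroring the commands above (fd 8 taken from the restore error):
#   exec_with_fds_closed 8 python3 -m http.server 8080 &
#   SERVER_PID=$!
```

Because the helper execs the command in the background child itself, $! still refers to the server process, so the dump command can be reused unchanged.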

Why This Matters

This issue is not specific to Hyprland. It can affect any process checkpointed inside an active desktop session on Wayland or X11 in which compositor or session manager log files are inherited by child processes. As CRIU adoption grows in developer workflows and desktop environments, this is likely to become a more common pain point. I could not find any documentation warning users about this, and the error message CRIU produces does not suggest a clear resolution path.

My Question

I'd love to get your perspective on whether this is worth pursuing as a proper fix or improvement in CRIU, and which angle — smarter fd classification, better error messaging, or documentation — would be most valuable and feasible.

Thank you for your time.
