vm-checkpoint task indefinitely stuck while interacting with GC #6032

Open
@ydirson


During a vm-checkpoint on XCP-ng 8.3 (so using xcp-emu-manager), I hit a case where xe vm-checkpoint never returned. According to the logs, xenopsd became unresponsive, but we cannot see why.
The logs show an SR GC running between the start of the checkpoint and its failure. The GC reports errors of its own, and it involves the VDI holding the VM we are attempting to checkpoint.

[10:13 host1 ~]# date
Wed Oct  2 10:13:41 CEST 2024
[10:13 host1 ~]# xe task-list uuid=42646262-60ba-6896-c759-e86f9551ee83 params=name-label,status,progress,created
name-label ( RO)    : VM.checkpoint
        status ( RO): pending
      progress ( RO): 0.056
       created ( RO): 20241001T11:11:07Z
[10:25 host1 ~]# xe vm-list  uuid=069bf5db-5e87-0f51-322d-901f8a01a742 params=VBDs
VBDs (SRO)    : 800c783f-5511-00a9-804e-76de14e89bcf; 0047698a-8f07-6366-7d30-ccaf7f2b5293

[10:25 host1 ~]# xe vbd-list uuid=0047698a-8f07-6366-7d30-ccaf7f2b5293 params=vdi-uuid 
vdi-uuid ( RO)    : 371e7067-d032-49ec-9dd1-552e0c5c68a9
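
To put a number on "never returning", the two timestamps in the transcript above can be compared. This is only a small sketch: it assumes the host's CEST clock is UTC+02:00 and that the "created" field is in UTC, as its trailing Z suggests.

```python
from datetime import datetime, timedelta, timezone

# Task creation time as printed by `xe task-list` (compact ISO 8601, UTC).
created = datetime.strptime("20241001T11:11:07Z", "%Y%m%dT%H:%M:%SZ")
created = created.replace(tzinfo=timezone.utc)

# Host clock as printed by `date`. Assumption: CEST is UTC+02:00.
cest = timezone(timedelta(hours=2), "CEST")
observed = datetime(2024, 10, 2, 10, 13, 41, tzinfo=cest)

# How long the VM.checkpoint task had been pending when it was inspected.
pending_for = observed - created
print(pending_for)  # 21:02:34
```

Under those assumptions the task had been pending for about 21 hours, with progress still at 0.056.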

The problem seems to be manifold:

  • a possible locking issue that allows the VDI of a VM being checkpointed to get involved in a GC
  • ... which likely causes the "Failed to read from xenopsd because timeout reached." error reported by emu-manager, although the xenopsd log shows nothing about it
  • XAPI not noticing that the emu-manager process it launched has in fact exited with an error, and so keeping the task pending
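
For the last point, the behaviour I would expect is roughly the following. This is a generic Python sketch and not XAPI or xenopsd code (those are written in OCaml); run_helper and the status strings are hypothetical names used only for illustration. The idea is that a helper's non-zero exit status, or a timeout, ends the task as failed instead of leaving it pending.

```python
import subprocess
import sys

def run_helper(argv, timeout_s):
    """Run a helper process and map its outcome to a task status string.

    Hypothetical illustration: not how XAPI supervises emu-manager.
    """
    try:
        result = subprocess.run(argv, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising the timeout.
        return "failure: helper timed out"
    if result.returncode != 0:
        return f"failure: helper exited with status {result.returncode}"
    return "success"

# Stand-in for a helper that exits with an error.
failing_helper = [sys.executable, "-c", "import sys; sys.exit(1)"]
status = run_helper(failing_helper, timeout_s=30)
print(status)  # failure: helper exited with status 1
```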

xensource.log
daemon.log
SMlog
