Skip to content

Conversation

@BengangY
Copy link
Contributor

@BengangY BengangY commented Jul 8, 2025

The user attempted to designate a new master, but the operation failed.

The root cause is as follows:
After the new proposed master successfully sent the commit_new_master API call to the old master, it attempted to send a logout request. However, at this point, the old master was already rebooting its xapi service, causing the logout to fail. As a result, the process of designating the new master was marked as failed, and the status changed to broken. In high-load environments, there can be a delay in sending the logout request, increasing the likelihood that it is sent after the old master has already started rebooting.

If commit_new_master has already been successful, the success of the subsequent logout operation should not be considered critical. Therefore, the solution is to ignore the result of the logout request if commit_new_master was successful.

@lindig
Copy link
Contributor

lindig commented Jul 8, 2025

I like the clear writeup.

@BengangY BengangY force-pushed the private/bengangy/CA-413412 branch from 7ec16e4 to 1e21df9 Compare July 9, 2025 01:44
with _ ->
(* This is an emergency mode function, so we don't care about the error
in logout *)
debug "%s: The logout failed in emergency mode function" __FUNCTION__ ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an anonymous function, thi means that using FUNCTION here will deliver a strange name, I recommend defining a new binding before `finally is called:

let __FUN = __FUNCTION__ in

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log is not too ugly - this is from an anonymous function defined in a named function:

Jul  7 14:16:49 eu1-dt034 xapi: [debug||2965 /var/lib/xcp/xapi|VM.sysprep R:0023873edb88|Vm_sysprep] Vm_sysprep.trigger.(fun): notified domain 54

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should add a mechanism to add the function name automatically. I always forget to add it...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would require a PPX because you can't refer to the calling function in the implementation of the logging function. I would be surprised if?(fun=__FUNCTION__) would work.

The user attempted to designate a new master, but the operation failed.

The root cause is as follows:
After the new proposed master successfully sent the `commit_new_master`
API call to the old master, it attempted to send a `logout` request.
However, at this point, the old master was already rebooting its xapi
service, causing the `logout` to fail. As a result, the process of
designating the new master was marked as failed, and the status changed
to `broken`. In high-load environments, there can be a delay in sending
the logout request, increasing the likelihood that it is sent after the
old master has already started rebooting.

If `commit_new_master` has already been successful, the success of the
subsequent `logout` operation should not be considered critical.
Therefore, the solution is to ignore the result of the `logout` request
if `commit_new_master` was successful.

Signed-off-by: Bengang Yuan <[email protected]>
@BengangY BengangY force-pushed the private/bengangy/CA-413412 branch from 1e21df9 to 12826e8 Compare July 9, 2025 09:08
@BengangY BengangY marked this pull request as ready for review July 9, 2025 09:08
@BengangY BengangY added this pull request to the merge queue Jul 9, 2025
Merged via the queue into xapi-project:master with commit 278ab35 Jul 9, 2025
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants