Skip to content

Conversation

BengangY
Copy link
Contributor

No description provided.

Christian Lindig and others added 26 commits August 28, 2025 16:06
Desclare the string parameter holding unattend.xml as secret to avoid
logging it.

Signed-off-by: Christian Lindig <[email protected]>

# Conflicts:
#	ocaml/idl/datamodel_vm.ml
#	ocaml/xapi/vm_sysprep.mli
#	ocaml/xapi/xapi_vm.mli
This is simply a cherry pick of @lindig 's commit from
#6547 now that we have fixed
#6622. Let me know if
anything more is needed.

> CP-308455 VM.sysprep declare XML content as SecretString
> Desclare the string parameter holding unattend.xml as secret to avoid
logging it.
>
> Signed-off-by: Christian Lindig <[email protected]>

Conflicts:
- ocaml/idl/datamodel_vm.ml
- ocaml/xapi/vm_sysprep.mli
- ocaml/xapi/xapi_vm.mli
…onf (#6586)"

This reverts commit 05e6317, reversing
changes made to 1fbdaae.

Signed-off-by: Gabriel Buica <[email protected]>
This reverts @psafont's PR, #6586, this causing the `/etc/resolve.conf`
to be overwritten during update, after the dhclient writes it. This
happens because `set_dns` no longer checks the mode before applying the
DNS config,

This reverts commit 05e6317, reversing
changes made to 1fbdaae.
Before shutting down, xapi calls `xapi_pre_shutdown` to execute several
pre-shutdown scripts, which require the current host's UUID as a
parameter.

Currently, xapi obtains this UUID in a redundant manner:
1. It retrieves the UUID from the local inventory file.
2. It queries the database for the host's reference using the UUID.
3. It queries the database again for the host's UUID using the reference
   obtained in step 2.

Steps 2 and 3 are unnecessary since the UUID is already available from
step 1. Moreover, when the master stops, the slave fails to query the
database, increasing xapi shutdown times on the slave.

The solution is to directly use the UUID obtained in step 1, eliminating
the redundant database queries.

Signed-off-by: Bengang Yuan <[email protected]>
Absolute metric should work as follows:

    Timestamp    = 300, 600, 900, 1200
    Step         = 300 seconds
    ABSOLUTE DS  =    1,  2,   3,    4

But they do not seem to have ever worked correctly - they were previously
divided by the interval but incorrectly handled NaNs, resulting in wrong
behaviour. Then the refactoring (including 73ca3cc ("CA-404597: rrd - Pass
Gauge and Absolute data source values as-is")) broke them further.

Divide absolute metrics by the interval in process_ds_value. Fix the unit test
to expect the right behaviour - it passes with this fix.

Signed-off-by: Andrii Sultanov <[email protected]>
This is a simplified version of commit 846ce82. The feature
introduced that commit is to remove the network device renaming
functionality. When the feature is merged, there will be 3 states of
releases:
1. without the feature.
2. with the feature but renaming is still working.
3. with the feature and renaming has been removed.

Originally, the upgrade path was intended to support transitions from
state 2 to state 3 only. However, it has become necessary to also
support upgrades directly from state 1 to state 3.

This commit is to enable the release in state 1 to upgrade to state 3.
The change is kept extremely small so it can be merged independently
without waiting for the full feature to be merged.

In details, during the upgrade, the host-installer can't know the
management interface as the eth<N> will not present in the environment
the host-installer running. So it needs the MAC address to find out the
management interface for setting up network.

Signed-off-by: Ming Lu <[email protected]>
This is a simplified version of commit 846ce82. The feature
introduced that commit is to remove the network device renaming
functionality. When the feature is merged, there will be 3 states of
releases:
1. without the feature.
2. with the feature but renaming is still working.
3. with the feature and renaming has been removed.

Originally, the upgrade path was intended to support transitions from
state 2 to state 3 only. However, it has become necessary to also
support upgrades directly from state 1 to state 3.

This commit is to enable the release in state 1 to upgrade to state 3.
The change is kept extremely small so it can be merged independently
without waiting for the full feature to be merged.

In details, during the upgrade, the host-installer can't know the
management interface as the eth<N> will not present in the environment
the host-installer running. So it needs the MAC address to find out the
management interface for setting up network.
Absolute metric should work as follows:

    Timestamp    = 300, 600, 900, 1200
    Step         = 300 seconds
    ABSOLUTE DS  =    1,  2,   3,    4

But they do not seem to have ever worked correctly - they were
previously divided by the interval but incorrectly handled NaNs,
resulting in wrong behaviour. Then the refactoring (including 73ca3cc
("CA-404597: rrd - Pass Gauge and Absolute data source values as-is"))
broke them further.

Divide absolute metrics by the interval in process_ds_value. Fix the
unit test to expect the right behaviour - it passes with this fix.

---

This means absolute metrics are now correctly calculated as unit/s,
changing the behaviour in
#6640

I am not 100% sure this fix goes all the way either - unit tests should
cover expected values in RRAs as well, and the logic is still really
confusing to me.
Cross-pool migrations set scheduled_to_be_resident_on to the destination host,
reserving memory and vGPUs. If the VM is still halted when migration is
finished, the field is not cleared on the destination, preserving the
reservations even though they're not necessarily going to be used anytime soon.

Call force_state_reset_keep_current_operations in pool_migrate_complete on the
destination to clear the reservations among other things at the end of the
migration.

This fixes an issue when VMs migrated across pools in a halted state would take
up memory in xapi's view (but not in RRD's view), which is not intuitive and
could prevent further migrations from claiming enough free memory on the host.

Signed-off-by: Andrii Sultanov <[email protected]>
This metric represents the rate of VM migrations with vGPUs per second.
The total count can be calculated as AVERAGE * seconds. For example,
for one-day data granularity, the total count for one day is
AVERAGE * 86400, in which 86400 is the seconds of one day.

Signed-off-by: Ming Lu <[email protected]>
  see cgroups v2 "no internal processes" rule

  if cgroup.subtree_control is not empty, and we attach a pid
  to cgroup.procs, kernel would return EBUSY

Signed-off-by: Chunjie Zhu <[email protected]>
)

Cross-pool migrations set `scheduled_to_be_resident_on` to the
destination host, reserving memory and vGPUs. If the VM is still halted
when migration is finished, the field is not cleared on the destination,
preserving the reservations even though they're not necessarily going to
be used anytime soon.

Call `force_state_reset_keep_current_operations` in
`pool_migrate_complete` on the destination to clear the reservations
among other things at the end of the migration.

This fixes an issue when VMs migrated across pools in a halted state
would take up memory in xapi's view (but not in RRD's view), which is
not intuitive and could prevent further migrations from claiming enough
free memory on the host.

---

I've tested this manually, confirming that migrating halted and running
VMs works as designed, with the `scheduled_to_be_resident_on` field
cleared at the end of the migration (in the first case it's because of
the fix, in the second one it's because the VM is actually started on
the host)
This metric represents the rate of VM migrations with vGPUs per second.
The total count can be calculated as AVERAGE * seconds. For example,
for one-day data granularity, the total count for one day is
AVERAGE * 86400, in which 86400 is the seconds of one day.
Currently a manually disabled host will be re-enabled on toolstack restarts
and host reboots, which will provoke VM migrations in an HA cluster. If
maintenance requires many restarts, that could be painful.

To allow for keeping a host persistently disabled across toolstack
restarts and host reboots, add a new localdb flag "host_auto_enable"
(set through the parameter on Host.disable). This coexists with the internal
flag of host_disabled_until_reboot, which is only set on host poweroff
internally and cannot be controlled by the user directly.

With host_auto_enable set to false, xapi will not re-enable a host on
its own no matter what: toolstack restarts, host reboots, calls to
consider_enabling_host (triggered by PBD plugs etc.) will have no effect.
Only a manual call to Host.enable will re-enable the host.

Expose the new parameter in the CLI. Also fix up the comment in
xapi_host_helpers.mli.

Signed-off-by: Andrii Sultanov <[email protected]>
Before shutting down, xapi calls `xapi_pre_shutdown` to execute several
pre-shutdown scripts, which require the current host's UUID as a
parameter.

Currently, xapi obtains this UUID in a redundant manner:
1. It retrieves the UUID from the local inventory file.
2. It queries the database for the host's reference using the UUID.
3. It queries the database again for the host's UUID using the reference
obtained in step 2.

Steps 2 and 3 are unnecessary since the UUID is already available from
step 1. Moreover, when the master stops, the slave fails to query the
database, increasing xapi shutdown times on the slave.

The solution is to directly use the UUID obtained in step 1, eliminating
the redundant database queries.
  see cgroups v2 "no internal processes" rule

  if cgroup.subtree_control is not empty, and we attach a pid
  to cgroup.procs, kernel would return EBUSY
* Use the decoder from the OCaml standard library instead of
  our own implementation, which this patch removes.
* Validate UTF-8/XML conformance for maps and sets, in addition to
  strings.

This is XSA-474 / CVE-2025-58146.

Signed-off-by: Christian Lindig <[email protected]>
Reviewed-by: Edwin Török <[email protected]>
Signed-off-by: Andrii Sultanov <[email protected]>
Currently `Host.disable` only disables the host for the current xapi's
runtime - it will be re-enabled on toolstack restarts and host reboots
(if other conditions are met). This greatly complicates manual
maintenance in an HA cluster when reboots are needed, since VMs will
start moving to the host as soon as it's enabled.

To allow for keeping a host persistently disabled across toolstack
restarts and host reboots, add a new localdb flag `host_auto_enable`
(set through the parameter on Host.disable). This coexists with the
internal flag of `host_disabled_until_reboot`, which is only set on host
poweroff internally and cannot be controlled by the user directly.

With `host_auto_enable` set to false, xapi will not re-enable a host on
its own no matter what: toolstack restarts, host reboots, calls to
consider_enabling_host (triggered by PBD plugs etc.) will have no
effect. Only a manual call to `Host.enable` will re-enable the host.

Expose the new parameter in the CLI. Also fix up the comment in
`xapi_host_helpers.mli`.

I've verified the new functionality manually.
* Use the decoder from the OCaml standard library instead of our own
implementation, which this patch removes.
* Validate UTF-8/XML conformance for maps and sets, in addition to
strings.

This is XSA-474 / CVE-2025-58146.

Reviewed-by: Edwin Török <[email protected]>

Patch from: https://xenbits.xen.org/xsa/advisory-474.html
@BengangY BengangY marked this pull request as ready for review September 11, 2025 07:54
@BengangY BengangY merged commit 343972b into feature/dynamic-firewalld-control Sep 11, 2025
73 of 74 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants