@x56Jason

Description

This PR backports the upstream dependency commits needed to support Intel's new ClearwaterForest platform.

Test

  • Sanity test, including building QEMU, running a VM, and running a memory
    stress test in the VM.
  • Run a VM on an EMR platform where the speculation control bits in
    CPUID.7.2.EDX are seen on the host; with the backport, the guest can also
    see these bits.
  • Run the QEMU scripts/kvm/vmxcap utility in a VM on an EMR platform; the
    "user wait pause" entry and the "tertiary processor-based controls"
    section can be seen in the VM.
  • Run a VM on an EMR platform with "-cpu host,-vmx-enable-user-wait-pause";
    we can still see "WAITPKG instructions = true" in cpuid in the VM.
  • Run a VM on an EMR platform; with this PR, bit 6 and bit 13 of
    CPUID.7.0.EBX can be seen in the VM, while without this PR, these 2 bits
    can't be seen in the VM.
  • Run VM migration on an EMR platform. When migrating a VM with
    "-cpu host,tsc-freq=950000000", the destination VM sees a stable TSC,
    while without the "tsc-freq" parameter, the destination VM prints the
    warning "tsc: Marking TSC unstable due to clocksource watchdog" after
    migration.

AkeKoomsin-IGEL and others added 19 commits March 20, 2025 09:45
commit 33cc882 upstream.

Current QEMU can expose waitpkg to guests when it is available. However,
VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE is still not recognized and is
masked by QEMU. This can lead to an unexpected situation when an L1
hypervisor wants to expose waitpkg to an L2 guest. The L1 hypervisor can
assume that VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE exists because waitpkg
is available. The L1 hypervisor can then accidentally expose waitpkg to the
L2 guest. This causes an invalid opcode exception in the L2 guest when
it executes waitpkg-related instructions.

This patch adds VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE support, and
sets up a dependency between the bit and CPUID_7_0_ECX_WAITPKG. QEMU should
not expose the waitpkg feature if VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE
is not available, to avoid unexpected invalid opcode exceptions in L2 guests.

Intel-SIG: commit 33cc882 target/i386: add support for VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE

Signed-off-by: Ake Koomsin <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jason Zeng <[email protected]>
commit 0c49c91 upstream.

On parts that enumerate bit 56 of the IA32_VMX_BASIC MSR as 1, any exception
vector can be delivered with or without an error code if the other
consistency checks are satisfied.

Intel-SIG: commit 0c49c91 target/i386: enumerate bit 56 of MSR_IA32_VMX_BASIC

Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jason Zeng <[email protected]>
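
As a rough illustration of the enumeration described in commit 0c49c91 above, the sketch below tests bit 56 of a raw IA32_VMX_BASIC value. The constant and function names are hypothetical and the MSR values are made up; this is not QEMU's code, only a model of the bit test.

```python
# Bit position taken from the commit title ("bit 56 of MSR_IA32_VMX_BASIC").
# The constant name is hypothetical, chosen for this illustration only.
VMX_BASIC_BIT_56 = 56


def any_vector_may_carry_error_code(vmx_basic):
    """Return True if bit 56 of a raw IA32_VMX_BASIC value is set."""
    return bool((vmx_basic >> VMX_BASIC_BIT_56) & 1)


# Made-up MSR values, used only to exercise the helper.
value_with_bit = 1 << 56
value_without_bit = (1 << 55) | (1 << 32)

print(any_vector_may_carry_error_code(value_with_bit))
print(any_vector_may_carry_error_code(value_without_bit))
```

A nested hypervisor would read the real MSR value instead of the made-up constants used here.
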
commit fe01af5 upstream.

The VMX feature bit depends on general availability of WAITPKG,
not the other way round.

Intel-SIG: commit fe01af5 target/i386: fix feature dependency for WAITPKG

Fixes: 33cc882 ("target/i386: add support for VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE", 2023-08-28)
Cc: [email protected]
Reviewed-by: Zhao Liu <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jason Zeng <[email protected]>
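
The dependency direction fixed by commit fe01af5 above can be sketched as a small model. It assumes a dependency list of (prerequisite, dependent) pairs and a hypothetical helper `apply_dependencies`; the feature names are the CPU flag spellings used in the test notes, and the code is not QEMU's implementation.

```python
# Each pair is (prerequisite, dependent): if the prerequisite is absent,
# the dependent feature is cleared. The reverse does not hold.
FEATURE_DEPENDENCIES = [
    ("waitpkg", "vmx-enable-user-wait-pause"),
]


def apply_dependencies(enabled):
    """Return the enabled feature set with unmet dependents removed."""
    result = set(enabled)
    changed = True
    while changed:
        changed = False
        for prerequisite, dependent in FEATURE_DEPENDENCIES:
            if dependent in result and prerequisite not in result:
                result.discard(dependent)
                changed = True
    return result


# Disabling the VMX control leaves WAITPKG visible.
print(apply_dependencies({"waitpkg"}))
# Disabling WAITPKG also removes the VMX control.
print(apply_dependencies({"vmx-enable-user-wait-pause"}))
```

The first case corresponds to the "-cpu host,-vmx-enable-user-wait-pause" test in the description, where WAITPKG is still reported in the guest.
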
commit c1acad9 upstream.

FRED, i.e., the Intel flexible return and event delivery architecture,
defines simple new transitions that change privilege level (ring
transitions).

The new transitions defined by the FRED architecture are FRED event
delivery and, for returning from events, two FRED return instructions.
FRED event delivery can effect a transition from ring 3 to ring 0, but
it is used also to deliver events incident to ring 0.  One FRED
instruction (ERETU) effects a return from ring 0 to ring 3, while the
other (ERETS) returns while remaining in ring 0.  Collectively, FRED
event delivery and the FRED return instructions are FRED transitions.

In addition to these transitions, the FRED architecture defines a new
instruction (LKGS) for managing the state of the GS segment register.
The LKGS instruction can be used by 64-bit operating systems that do
not use the new FRED transitions.

WRMSRNS is an instruction that behaves exactly like WRMSR, with the
only difference being that it is not a serializing instruction by
default.  Under certain conditions, WRMSRNS may replace WRMSR to improve
performance.  FRED uses it to switch RSP0 in a faster manner.

Search for the latest FRED spec in most search engines with this search
pattern:

  site:intel.com FRED (flexible return and event delivery) specification

The CPUID feature flag CPUID.(EAX=7,ECX=1):EAX[17] enumerates FRED, and
the CPUID feature flag CPUID.(EAX=7,ECX=1):EAX[18] enumerates LKGS, and
the CPUID feature flag CPUID.(EAX=7,ECX=1):EAX[19] enumerates WRMSRNS.

Add CPUID definitions for FRED/LKGS/WRMSRNS, and expose them to KVM guests.

Because FRED relies on LKGS and WRMSRNS, add those dependencies to the
feature dependency map.

Intel-SIG: commit c1acad9 target/i386: add support for FRED in CPUID enumeration

Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Message-ID: <[email protected]>
[Fix order of dependencies, add dependencies from LM to FRED. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jason Zeng <[email protected]>
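
The CPUID enumeration from commit c1acad9 above can be sketched as follows: decode bits 17, 18 and 19 of CPUID.(EAX=7,ECX=1):EAX, then drop FRED when LKGS or WRMSRNS is missing. The helper names are hypothetical, the input values are made up, and the additional dependency from LM to FRED mentioned in the commit note is left out of this model.

```python
# Bit positions in CPUID.(EAX=7,ECX=1):EAX, as listed in the commit message.
CPUID_7_1_EAX_BITS = {"fred": 17, "lkgs": 18, "wrmsrns": 19}

# FRED relies on LKGS and WRMSRNS.
FRED_PREREQUISITES = {"lkgs", "wrmsrns"}


def decode_cpuid_7_1_eax(eax):
    """Return the set of feature names whose bit is set in EAX."""
    return {name for name, bit in CPUID_7_1_EAX_BITS.items() if (eax >> bit) & 1}


def drop_fred_without_prerequisites(features):
    """Remove 'fred' unless both of its prerequisites are present."""
    result = set(features)
    if "fred" in result and not FRED_PREREQUISITES <= result:
        result.discard("fred")
    return result


all_three = (1 << 17) | (1 << 18) | (1 << 19)
no_wrmsrns = (1 << 17) | (1 << 18)

print(sorted(drop_fred_without_prerequisites(decode_cpuid_7_1_eax(all_three))))
print(sorted(drop_fred_without_prerequisites(decode_cpuid_7_1_eax(no_wrmsrns))))
```
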
commit f88ddc4 upstream.

The CR4.FRED bit, i.e., CR4[32], is no longer a reserved bit when FRED
is exposed to guests; otherwise it remains a reserved bit.

Intel-SIG: commit f88ddc4 target/i386: mark CR4.FRED not reserved

Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Reviewed-by: Zhao Liu <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jason Zeng <[email protected]>
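
The rule from commit f88ddc4 above can be modelled in a few lines. The mask name CR4_FRED_MASK appears later in this series; the function name and the sample CR4 values are hypothetical, and the sketch ignores every other reserved CR4 bit.

```python
# CR4[32], per the commit message.
CR4_FRED_MASK = 1 << 32


def cr4_sets_reserved_fred_bit(new_cr4, fred_exposed):
    """Return True if new_cr4 sets CR4.FRED while the bit is still reserved.

    fred_exposed: whether FRED is exposed to the guest in CPUID.
    """
    return bool(new_cr4 & CR4_FRED_MASK) and not fred_exposed


cr4_with_fred = CR4_FRED_MASK | (1 << 5)  # made-up value with CR4.FRED set

print(cr4_sets_reserved_fred_bit(cr4_with_fred, fred_exposed=False))
print(cr4_sets_reserved_fred_bit(cr4_with_fred, fred_exposed=True))
```
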
commit 2e64187 upstream.

Report secondary vm-exit controls and the VMX controls used to
save/load FRED MSRs.

Intel-SIG: commit 2e64187 vmxcap: add support for VMX FRED controls

Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jason Zeng <[email protected]>
commit ef202d6 upstream.

Allow VMX nested-exception support to be exposed in KVM guests, so that
nested KVM guests can enumerate it.

Intel-SIG: commit ef202d6 target/i386: enumerate VMX nested-exception support

Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jason Zeng <[email protected]>
commit 4ebd98e upstream.

FRED CPU states are managed in 9 new FRED MSRs, in addition to a few
existing CPU registers and MSRs, e.g., CR4.FRED and MSR_IA32_PL0_SSP.

Save/restore/migrate FRED MSRs if FRED is exposed to the guest.

Intel-SIG: commit 4ebd98e target/i386: Add get/set/migrate support for FRED MSRs

Tested-by: Shan Kang <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jason Zeng <[email protected]>
commit a23bc65 upstream.

Macro CR4_FRED_MASK is defined twice; delete one of the definitions.

Intel-SIG: commit a23bc65 target/i386: Delete duplicated macro definition CR4_FRED_MASK

Signed-off-by: Xin Li (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jason Zeng <[email protected]>
commit 7c6ec5b upstream.

Add definitions of
  1) VM-exit activate secondary controls bit
  2) VM-entry load FRED bit
which are required to enable nested FRED.

Intel-SIG: commit 7c6ec5b target/i386: Add VMX control bits for nested FRED support

Reviewed-by: Zhao Liu <[email protected]>
Signed-off-by: Xin Li (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jason Zeng <[email protected]>
commit ab89145 upstream.

Because the index value of the VMCS field encoding of FRED injected-event
data (one of the newly added VMCS fields for FRED transitions), 0x52, is
larger than any existing index value, raise the highest index value used
for any VMCS encoding to 0x52.

Because the index value of the VMCS field encoding of Secondary VM-exit
controls, 0x44, is larger than any existing index value, raise the highest
index value used for any VMCS encoding to 0x44.

Intel-SIG: commit ab89145 target/i386: Raise the highest index value used for any VMCS encoding

Co-developed-by: Xin Li <[email protected]>
Signed-off-by: Xin Li <[email protected]>
Signed-off-by: Lei Wang <[email protected]>
Signed-off-by: Xin Li (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jason Zeng <[email protected]>
commit 10eaf9c upstream.

The following 5 bits in CPUID.7.2.EDX are supported by KVM. Add support
for them in QEMU. Each of them indicates that certain bits of IA32_SPEC_CTRL
are supported. Those bits control CPU speculation behavior, which can
be used to defend against side-channel attacks.

bit0: intel-psfd
  If 1, indicates bit 7 of the IA32_SPEC_CTRL MSR is supported. Bit 7 of
  this MSR disables Fast Store Forwarding Predictor without disabling
  Speculative Store Bypass.

bit1: ipred-ctrl
  If 1, indicates bits 3 and 4 of the IA32_SPEC_CTRL MSR are supported.
  Bit 3 of this MSR enables IPRED_DIS control for CPL3. Bit 4 of this
  MSR enables IPRED_DIS control for CPL0/1/2.

bit2: rrsba-ctrl
  If 1, indicates bits 5 and 6 of the IA32_SPEC_CTRL MSR are supported.
  Bit 5 of this MSR disables RRSBA behavior for CPL3. Bit 6 of this MSR
  disables RRSBA behavior for CPL0/1/2.

bit3: ddpd-u
  If 1, indicates bit 8 of the IA32_SPEC_CTRL MSR is supported. Bit 8 of
  this MSR disables Data Dependent Prefetcher.

bit4: bhi-ctrl
  If 1, indicates bit 10 of the IA32_SPEC_CTRL MSR is supported. Bit 10
  of this MSR enables BHI_DIS_S behavior.

Intel-SIG: commit 10eaf9c target/i386: Add more features enumerated by CPUID.7.2.EDX

Signed-off-by: Chao Gao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jason Zeng <[email protected]>
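
The bit list in commit 10eaf9c above maps each CPUID.7.2.EDX bit to one or two IA32_SPEC_CTRL bits. A minimal sketch of that mapping follows; the table contents come from the commit message, while the table name, the function name and the sample EDX values are hypothetical.

```python
# CPUID.7.2.EDX bit -> (feature name, IA32_SPEC_CTRL bit positions),
# copied from the list in the commit message.
CPUID_7_2_EDX = {
    0: ("intel-psfd", (7,)),
    1: ("ipred-ctrl", (3, 4)),
    2: ("rrsba-ctrl", (5, 6)),
    3: ("ddpd-u", (8,)),
    4: ("bhi-ctrl", (10,)),
}


def spec_ctrl_bits_enumerated(edx):
    """Return (feature names, IA32_SPEC_CTRL mask) enumerated by an EDX value."""
    names = []
    mask = 0
    for bit, (name, ctrl_bits) in sorted(CPUID_7_2_EDX.items()):
        if (edx >> bit) & 1:
            names.append(name)
            for ctrl_bit in ctrl_bits:
                mask |= 1 << ctrl_bit
    return names, mask


names, mask = spec_ctrl_bits_enumerated(0b00011)  # made-up EDX value
print(names, hex(mask))
```

With all five bits set, the resulting mask covers IA32_SPEC_CTRL bits 3 through 8 and bit 10.
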
commit 8dee384 upstream.

This allows modifying the bits in "-cpu max"/"-cpu host" depending on
the guest CPU vendor (which, at least by default, is the host vendor in
the case of KVM).

For example, machine check architecture differs between Intel and AMD,
and bits from AMD should be dropped when configuring the guest for
an Intel model.

Intel-SIG: commit 8dee384 target/i386: pass X86CPU to x86_cpu_get_supported_feature_word

Cc: Xiaoyao Li <[email protected]>
Cc: John Allen <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jason Zeng <[email protected]>
commit a3b5376 upstream.

There is no constraint requiring the subleaf index to be less than 64.

Intel-SIG: commit a3b5376 i386/cpuid: Remove subleaf constraint on CPUID leaf 1F

Signed-off-by: Xiaoyao Li <[email protected]>
Reviewed-by: Yang Weijiang <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jason Zeng <[email protected]>
commit 00c8a93 upstream.

Currently, QEMU always constructs an all-zero CPUID entry for
CPUID[0xD 0x3f].

It's meaningless to construct such a leaf as the end of leaf 0xD. Rework
the logic of how subleaves of 0xD are constructed to get rid of the
all-zero subleaf 0x3f.

Intel-SIG: commit 00c8a93 target/i386: Don't construct a all-zero entry for CPUID[0xD 0x3f]

Signed-off-by: Xiaoyao Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jason Zeng <[email protected]>
commit 7dddc3b upstream.

- CPUID.(EAX=07H,ECX=0H):EBX[bit 6]: x87 FPU Data Pointer updated only
  on x87 exceptions if 1.

- CPUID.(EAX=07H,ECX=0H):EBX[bit 13]: Deprecates FPU CS and FPU DS
  values if 1. i.e., X87 FCS and FDS are always zero.

Define names for them so that they can be exposed to the guest with -cpu host.

Also define the bit field macros so that named CPU models can add them as
well in the future.

Intel-SIG: commit 7dddc3b target/i386: Enable fdp-excptn-only and zero-fcs-fds

Signed-off-by: Xiaoyao Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jason Zeng <[email protected]>
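
The check described in the PR test notes for commit 7dddc3b above (bits 6 and 13 of CPUID.7.0.EBX visible on the host but not in the guest without the backport) can be sketched as a host/guest comparison. The feature names come from the commit title; the function name and the EBX values are made up for illustration.

```python
# Bit positions in CPUID.(EAX=07H,ECX=0H):EBX, from the commit message.
FEATURE_BITS = {"fdp-excptn-only": 6, "zero-fcs-fds": 13}


def hidden_from_guest(host_ebx, guest_ebx):
    """Return names of the two features set on the host but clear in the guest."""
    return sorted(
        name
        for name, bit in FEATURE_BITS.items()
        if (host_ebx >> bit) & 1 and not (guest_ebx >> bit) & 1
    )


host_ebx = (1 << 6) | (1 << 13)  # made-up host value with both bits set

print(hidden_from_guest(host_ebx, guest_ebx=0))
print(hidden_from_guest(host_ebx, guest_ebx=host_ebx))
```
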
commit 5ab6391 upstream.

When times == 1, the CPUID leaf 2 is not stateful.

Intel-SIG: commit 5ab6391 target/i386: Construct CPUID 2 as stateful iff times > 1

Signed-off-by: Xiaoyao Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jason Zeng <[email protected]>
commit 87c88db upstream.

When the user sets tsc-frequency explicitly, the invtsc feature is actually
migratable because the tsc-frequency is supposed to be fixed during the
migration.

See commit d99569d ("kvm: Allow invtsc migration if tsc-khz
is set explicitly") for reference.

Intel-SIG: commit 87c88db target/i386: Make invtsc migratable when user sets tsc-khz explicitly

Signed-off-by: Xiaoyao Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Jason Zeng <[email protected]>
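
The rule from commit 87c88db above can be expressed as a small filter. It assumes a hypothetical helper that receives the guest feature names and the user-supplied TSC frequency in kHz (0 meaning "not set"); it models only the rule, not QEMU's migratable-flags handling.

```python
def migratable_features(features, user_tsc_khz):
    """Return the features that may be kept for a migratable guest.

    user_tsc_khz: TSC frequency set explicitly by the user, in kHz,
    or 0 when the user did not set one.
    """
    result = set(features)
    if "invtsc" in result and not user_tsc_khz:
        # Without a fixed frequency the TSC rate may differ on the destination.
        result.discard("invtsc")
    return result


# 950000000 Hz is the value used in the PR test notes, converted here to kHz.
tsc_khz = 950000000 // 1000

print(sorted(migratable_features({"invtsc", "waitpkg"}, tsc_khz)))
print(sorted(migratable_features({"invtsc", "waitpkg"}, 0)))
```
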
commit 93dcc93 upstream.

Intel-SIG: commit 93dcc93 target/i386/cpu: Fix notes for CPU models

Fixes: 644e3c5 ("missing vmx features for Skylake-Server and Cascadelake-Server")
Signed-off-by: Han Han <[email protected]>
Reviewed-by: Chenyi Qiang <[email protected]>
Reviewed-by: Michael Tokarev <[email protected]>
Signed-off-by: Michael Tokarev <[email protected]>
Signed-off-by: Jason Zeng <[email protected]>