Skip to content

Comments

i386,amd64: Explicitly set ECX=0 in do_cpuid() to be future-proof#2027

Closed
liweitianux wants to merge 1 commit intofreebsd:mainfrom
liweitianux:fix/do_cpuid
Closed

i386,amd64: Explicitly set ECX=0 in do_cpuid() to be future-proof#2027
liweitianux wants to merge 1 commit intofreebsd:mainfrom
liweitianux:fix/do_cpuid

Conversation

@liweitianux
Copy link
Contributor

@liweitianux liweitianux commented Feb 16, 2026

In principle, do_cpuid() should only be used for CPUID leaves without
sub-leaves. Even accessing sub-leaf zero (ECX=0), one must use
cpuid_count(ax, 0) rather than cpuid(ax).

However, one might assume do_cpuid(ax) is equivalent to
cpuid_count(ax, 0), but the old do_cpuid() did not initialize ECX before
executing the CPUID instruction. If ECX contained a non-zero value, the
instruction could return unexpected results, potentially leading to
subtle and hard-to-debug issues, especially in ported code.

To be future-proof and to help port code, adjust do_cpuid(ax) to be
cpuid_count(ax, 0) to explicitly set ECX=0.

It's believed that this change does not fix any real bugs in FreeBSD.

See also the DragonFly commit:
DragonFlyBSD/DragonFlyBSD@0087a1d

@github-actions
Copy link

github-actions bot commented Feb 16, 2026

Thank you for taking the time to contribute to FreeBSD!

All issues resolved.

@kostikbel
Copy link
Member

BTW, can you point out the locations in code where we use do_cpuid() for CPUID leaves with sub-leafs, instead of using cpuid_count()? I think that such places must be fixed.

I do not object against the change, but I think that it does not fix real bugs, unless you can provide examples from above.

@liweitianux
Copy link
Contributor Author

liweitianux commented Feb 18, 2026

BTW, can you point out the locations in code where we use do_cpuid() for CPUID leaves with sub-leafs, instead of using cpuid_count()? I think that such places must be fixed.

I do not object against the change, but I think that it does not fix real bugs, unless you can provide examples from above.

Yes I understand. It also surprised me when tuxillo showed me the do_cpuid() bug when we were investigating the NVMM+UEFI bug. It turned out the do_cpuid() change indeed fixed the FreeBSD boot panic in NVMM on Intel CPUs, as explained in my DragonFly commit:

This issue was observed on Intel CPUs when booting a FreeBSD 14.x/15.x
ISO under NVMM, where it caused a general protection fault (#GP) shortly
after the FreeBSD kernel was loaded:

qemu-system-x86_64: NVMM: Mem Assist Failed [gpa=0xbfff8]
qemu-system-x86_64: NVMM: Failed to execute a VCPU.
Abort trap (core dumped)

It occurred when NVMM tried to handle the reading of
IA32_ARCH_CAPABILITIES MSR but the second do_cpuid() returned the
incorrect results indicating that the MSR was unavailable.

The relevant code in NVMM is:

https://github.com/DragonFlyBSD/DragonFlyBSD/blob/0087a1d163488a57787a9a6431dd94070b1988d4/sys/dev/virtual/nvmm/x86/nvmm_x86_vmx.c#L1900-L1909

		if (exit->u.rdmsr.msr == MSR_IA32_ARCH_CAPABILITIES) {
			cpuid_desc_t descs;
			x86_get_cpuid(0x00000000, &descs);
			if (descs.eax < 7) {
				goto error;
			}
			x86_get_cpuid(0x00000007, &descs);
			if (!(descs.edx & CPUID_0_07_EDX_ARCH_CAP)) {
				goto error;  // XXX: incorrectly hit this
			}
			...

The second x86_get_cpuid() returned incorrect result and the code goto'ed error.

So we've been quite lucky with the old do_cpuid() :)

@liweitianux
Copy link
Contributor Author

liweitianux commented Feb 18, 2026

BTW, can you point out the locations in code where we use do_cpuid() for CPUID leaves with sub-leafs, instead of using cpuid_count()? I think that such places must be fixed.

Oh, I see your point. However, I think it's an extra burden to verify whether a CPUID leaf has sub-leaves. I think it also prevents misusing by explicitly setting ECX (sub-leaf) to 0.

@liweitianux
Copy link
Contributor Author

BTW, can you point out the locations in code where we use do_cpuid() for CPUID leaves with sub-leafs, instead of using cpuid_count()? I think that such places must be fixed.

Oh, I see your point. However, I think it's an extra burden to verify whether a CPUID leaf has sub-leaves. I think it also prevents misusing by explicitly setting ECX (sub-leaf) to 0.

I mean, I might simply assume do_cpuid(ax) == cpuid_count(ax, 0) and use the former for such CPUID leaves (such as the NVMM case), and that would lead to hard-to-debug bugs in the future. 😃

@kostikbel
Copy link
Member

IOW, this was some porting bug.

Another note, why re-ordering the functions? Make the patch where only do_cpuid() is changed to call cpuid_count(), leaving the functions in the place (AKA diff reduction).

@liweitianux
Copy link
Contributor Author

liweitianux commented Feb 18, 2026

IOW, this was some porting bug.

Yep. I agree with you. I'll update NVMM to be completely correct :)

Another note, why re-ordering the functions? Make the patch where only do_cpuid() is changed to call cpuid_count(), leaving the functions in the place (AKA diff reduction).

They're static inline functions, so do_cpuid() must be moved after cpuid_count() to use it.

@kostikbel
Copy link
Member

I suggest to adjust the commit message to note that this is not a fix for existing bug in FreeBSD, but a porting aid.

@liweitianux liweitianux changed the title i386,amd64: Fix do_cpuid() to explicitly set ECX=0 i386,amd64: Explicitly set ECX=0 in do_cpuid() to be future-proof Feb 19, 2026
In principle, do_cpuid() should only be used for CPUID leaves without
sub-leaves.  Even accessing sub-leaf zero (ECX=0), one must use
cpuid_count(ax, 0) rather than cpuid(ax).

However, one might assume do_cpuid(ax) is equivalent to
cpuid_count(ax, 0), but the old do_cpuid() did not initialize ECX before
executing the CPUID instruction.  If ECX contained a non-zero value, the
instruction could return unexpected results, potentially leading to
subtle and hard-to-debug issues, especially in ported code.

To be future-proof and to help port code, adjust do_cpuid(ax) to be
cpuid_count(ax, 0) to explicitly set ECX=0.

It's believed that this change does not fix any real bugs in FreeBSD.

See also the DragonFly commit:
DragonFlyBSD/DragonFlyBSD@0087a1d

Signed-off-by: Aaron LI <aly@aaronly.me>
@liweitianux
Copy link
Contributor Author

I suggest to adjust the commit message to note that this is not a fix for existing bug in FreeBSD, but a porting aid.

Hi. I've amended the commit message as you suggested. Please have another look. Thank you.

@emaste emaste closed this Feb 20, 2026
@emaste emaste added the merged Closed commit that's been merged label Feb 20, 2026
freebsd-git pushed a commit that referenced this pull request Feb 20, 2026
In principle, do_cpuid() should only be used for CPUID leaves without
sub-leaves.  Even accessing sub-leaf zero (ECX=0), one must use
cpuid_count(ax, 0) rather than cpuid(ax).

However, one might assume do_cpuid(ax) is equivalent to
cpuid_count(ax, 0), but the old do_cpuid() did not initialize ECX before
executing the CPUID instruction.  If ECX contained a non-zero value, the
instruction could return unexpected results, potentially leading to
subtle and hard-to-debug issues, especially in ported code.

To be future-proof and to help port code, adjust do_cpuid(ax) to be
cpuid_count(ax, 0) to explicitly set ECX=0.

It's believed that this change does not fix any real bugs in FreeBSD.

See also the DragonFly commit:
DragonFlyBSD/DragonFlyBSD@0087a1d

Signed-off-by: Aaron LI <aly@aaronly.me>
Reviewed by: kib
Pull Request: #2027
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merged Closed commit that's been merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants