AMD Venice MCA phase-1 patches for VeLinux (6.6 kernel) #110

mohanasv2 · 2026-01-27T05:36:34Z

x86/mce: Don't remove sysfs if thresholding sysfs init fails
x86/mce: Remove old CMCI storm mitigation code
x86/mce: Add per-bank CMCI storm mitigation
x86/mce: Ensure user polling settings are honored when restarting timer
x86/mce/amd: Add default names for MCA banks and blocks
x86/mce/amd: Fix threshold limit reset
x86/mce/amd: Rename threshold restart function
x86/mce/amd: Remove return value for mce_threshold_{create,remove}_device()
x86/mce/amd: Remove smca_banks_map
x86/mce/amd: Remove shared threshold bank plumbing
x86/mce/amd: Put list_head in threshold_bank
x86/mce: Remove __mcheck_cpu_init_early()
x86/mce: Set CR4.MCE last during init
x86/mce: Define BSP-only init
x86/mce: Define BSP-only SMCA init
x86/MCE/AMD: Split amd_mce_is_memory_error()
x86/mce: Define amd_mce_usable_address()
x86/mce: Cleanup mce_usable_address()
x86/mce: Remove redundant check from mce_device_create()
x86/mce: Dynamically size space for machine check records
x86/mce: Clean up TP_printk() output line of the 'mce_record' tracepoint
tracing: Add the ::ppin field to the mce_record tracepoint
tracing: Add the ::microcode field to the mce_record tracepoint
x86/mce: Switch to new Intel CPU model defines
x86/mce: Remove unused variable and return value in machine_check_poll()
x86/mce: Rename mce_setup() to mce_prep_record()
x86/mce: Define mce_prep_record() helpers for common and per-CPU fields
x86/mce: Use mce_prep_record() helpers for apei_smca_report_x86_error()
x86/mce: Add wrapper for struct mce to export vendor specific info
x86/mce: Make several functions return bool
x86/mce: Make four functions return bool
x86/mce: Break up __mcheck_cpu_apply_quirks()
x86/mce: Do 'UNKNOWN' vendor check early
x86/mce: Cleanup bank processing on init
x86/cpu/intel: Replace PAT erratum model/family magic numbers with symbolic IFM references
x86/mce: Convert family/model mixed checks to VFM-based checks
x86/mce: Separate global and per-CPU quirks
x86/mce: Move machine_check_poll() status checks to helper functions
x86/MCE/AMD: Add support for new MCA_SYND{1,2} registers
x86/msr: Rename 'mce_rdmsrl()' to 'mce_rdmsrq()'
x86/msr: Rename 'mce_wrmsrl()' to 'mce_wrmsrq()'
x86/mce: Add a clear_bank() helper

The Venice MCA backport to VeLinux includes 50+ patches, planned for integration in two phases. Phase 1 consists of 42 patches, which have been submitted as part of this PR. The remaining patches will be submitted in the next phase.

This patch series contains 42 commits that modernize and refactor x86 Machine Check Exception (MCE) handling across Intel and AMD platforms. The changes improve error reporting, tracepoint exposure, helper abstractions, vendor-specific quirk handling, and initialization robustness, while aligning with upstream kernel conventions in naming, readability, and sysfs infrastructure.

Key Bug Fixes

- Bank initialization cleanup: Unified bank preparation into __mcheck_cpu_init_prepare_banks(), removing redundant flags and ensuring vendor settings apply before polling.
- Return type consistency: Converted multiple functions to return bool instead of 0/1 for clarity and correctness.
- Redundant checks removed: Eliminated unnecessary MCA support checks in mce_device_create().
- Timeout/error handling: Simplified machine_check_poll() by removing unused variables and return values after CMCI storm rework.

Feature Additions

- New AMD registers: Added support for MCA_SYND1 and MCA_SYND2 on Zen4 systems, exporting supplemental error info (e.g., FRU text).
- Tracepoint extensions:
  - Added ::microcode field to record active microcode revision.
  - Added ::ppin field to expose Protected Processor Inventory Number.
  - Cleaned up TP_printk() output for better readability.
- Wrapper struct: Introduced mce_hw_err to encapsulate struct mce, preventing UAPI bloat and enabling vendor-specific extensions.
- Dynamic buffer sizing: Allocated machine check record space based on CPU count, scaling beyond the historical fixed buffer.

Logic & Performance Improvements

- Helper abstractions:
  - Split mce_prep_record() into common and per-CPU helpers.
  - Defined clear_bank() and status-check helpers for vendor-specific actions.
  - Split amd_mce_is_memory_error() into legacy and SMCA-specific helpers.
- Quirk handling:
  - Separated global vs per-CPU quirks.
  - Moved “UNKNOWN vendor” check to BSP-only init.
  - Broke up __mcheck_cpu_apply_quirks() into vendor-specific helpers.
- Naming consistency:
  - Renamed mce_setup() → mce_prep_record().
  - Renamed MSR accessors mce_wrmsrl() → mce_wrmsrq() and mce_rdmsrl() → mce_rdmsrq().
- Intel errata handling: Replaced magic family/model numbers with symbolic IFM macros for PAT erratum checks.

Robustness & Safety

- Initialization resilience:
  - BSP-only SMCA init ensures handlers are set once per system.
  - Vendor-specific quirks applied consistently across CPUs.
- Error address usability:
  - Defined amd_mce_usable_address() for AMD-specific validation.
  - Cleaned up mce_usable_address() with Intel-specific helpers.

Unit Test:

NOTE: The test cases listed below will work only after both Phase 1 and Phase 2 MCA patches are merged. Therefore, these test cases should be executed only after all Venice MCA patches have been integrated.

Without patches:

dmesg | grep -i mce
[ 0.000000] Linux version 6.6.95-base-mce+ (amd@host) (gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #59 SMP PREEMPT_DYNAMIC Mon Jan 12 16:22:36 IST 2026
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.6.95-base-mce+ root=UUID=7ee00f0d-484a-4038-bdea-ab9e659efaa5 ro quiet
[ 0.024602] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.6.95-base-mce+ root=UUID=7ee00f0d-484a-4038-bdea-ab9e659efaa5 ro quiet
[ 0.843764] BOOT_IMAGE=/boot/vmlinuz-6.6.95-base-mce+
[ 1.009275] usb usb1: Manufacturer: Linux 6.6.95-base-mce+ ehci_hcd
[ 1.011461] usb usb2: Manufacturer: Linux 6.6.95-base-mce+ xhci-hcd
[ 1.011585] usb usb3: Manufacturer: Linux 6.6.95-base-mce+ xhci-hcd
[ 1.012162] usb usb4: Manufacturer: Linux 6.6.95-base-mce+ xhci-hcd
[ 1.012250] usb usb5: Manufacturer: Linux 6.6.95-base-mce+ xhci-hcd
[ 1.013054] usb usb6: Manufacturer: Linux 6.6.95-base-mce+ xhci-hcd
[ 1.013129] usb usb7: Manufacturer: Linux 6.6.95-base-mce+ xhci-hcd
[ 2.213820] MCE: In-kernel MCE decoding enabled.

With patches:

dmesg | grep -i mce
[ 0.000000] Linux version 6.6.95-mce-full+ (amd@host) (gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #57 SMP PREEMPT_DYNAMIC Mon Jan 12 15:16:20 IST 2026
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.6.95-mce-full+ root=UUID=7ee00f0d-484a-4038-bdea-ab9e659efaa5 ro quiet
[ 0.024368] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.6.95-mce-full+ root=UUID=7ee00f0d-484a-4038-bdea-ab9e659efaa5 ro quiet
[ 0.533450] mce: HEST corrected error threshold limit: 10
[ 0.844504] BOOT_IMAGE=/boot/vmlinuz-6.6.95-mce-full+
[ 1.009294] usb usb1: Manufacturer: Linux 6.6.95-mce-full+ ehci_hcd
[ 1.011542] usb usb2: Manufacturer: Linux 6.6.95-mce-full+ xhci-hcd
[ 1.011663] usb usb3: Manufacturer: Linux 6.6.95-mce-full+ xhci-hcd
[ 1.012220] usb usb4: Manufacturer: Linux 6.6.95-mce-full+ xhci-hcd
[ 1.012309] usb usb5: Manufacturer: Linux 6.6.95-mce-full+ xhci-hcd
[ 1.013122] usb usb6: Manufacturer: Linux 6.6.95-mce-full+ xhci-hcd
[ 1.013202] usb usb7: Manufacturer: Linux 6.6.95-mce-full+ xhci-hcd
[ 2.066804] MCE: In-kernel MCE decoding enabled.

commit 4c113a5b28bfd589e2010b5fc8867578b0135ed7 upstream Currently, the MCE subsystem sysfs interface will be removed if the thresholding sysfs interface fails to be created. A common failure is due to new MCA bank types that are not recognized and don't have a short name set. The MCA thresholding feature is optional and should not break the common MCE sysfs interface. Also, new MCA bank types are occasionally introduced, and updates will be needed to recognize them. But likewise, this should not break the common sysfs interface. Keep the MCE sysfs interface regardless of the status of the thresholding sysfs interface. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-1-236dd74f645f@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 3ed57b4 upstream When a "storm" of corrected machine check interrupts (CMCI) is detected this code mitigates by disabling CMCI interrupt signalling from all of the banks owned by the CPU that saw the storm. There are problems with this approach: 1) It is very coarse grained. In all likelihood only one of the banks was generating the interrupts, but CMCI is disabled for all. This means Linux may delay seeing and processing errors logged from other banks. 2) Although CMCI stands for Corrected Machine Check Interrupt, it is also used to signal when an uncorrected error is logged. This is a problem because these errors should be handled in a timely manner. Delete all this code in preparation for a finer grained solution. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20231115195450.12963-2-tony.luck@intel.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 7eae17c upstream This is the core functionality to track CMCI storms at the machine check bank granularity. Subsequent patches will add the vendor-specific hooks to supply input to the storm detection and take actions on the start/end of a storm. machine_check_poll() is called both by the CMCI interrupt code, and for periodic polls from a timer. Add a hook in this routine to maintain a bitmap history for each bank showing whether the bank logged a corrected error or not each time it is polled. In normal operation the interval between polls of these banks determines how far to shift the history. The 64-bit width corresponds to about one second. When a storm is observed a CPU vendor-specific action is taken to reduce or stop CMCI from the bank that is the source of the storm. The bank is added to the bitmap of banks for this CPU to poll. The polling rate is increased to once per second. During a storm each bit in the history indicates the status of the bank each time it is polled. Thus the history covers just over a minute. Declare a storm for that bank if the number of corrected interrupts seen in that history is above some threshold (defined as 5 in this series, could be tuned later if there is data to suggest a better value). A storm on a bank ends if enough consecutive polls of the bank show no corrected errors (defined as 30, may also change). That calls the CPU vendor-specific function to revert to normal operational mode, and changes the polling rate back to the default. [ bp: Massage. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231115195450.12963-3-tony.luck@intel.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 00c092de6f28ebd32208aef83b02d61af2229b60 upstream Users can disable MCA polling by setting the "ignore_ce" parameter or by setting "check_interval=0". This tells the kernel to *not* start the MCE timer on a CPU. If the user did not disable CMCI, then storms can occur. When these happen, the MCE timer will be started with a fixed interval. After the storm subsides, the timer's next interval is set to check_interval. This disregards the user's input through "ignore_ce" and "check_interval". Furthermore, if "check_interval=0", then the new timer will run faster than expected. Create a new helper to check these conditions and use it when a CMCI storm ends. [ bp: Massage. ] Fixes: 7eae17c ("x86/mce: Add per-bank CMCI storm mitigation") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-2-236dd74f645f@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit d66e1e90b16055d2f0ee76e5384e3f119c3c2773 upstream Ensure that sysfs init doesn't fail for new/unrecognized bank types or if a bank has additional blocks available. Most MCA banks have a single thresholding block, so the block takes the same name as the bank. Unified Memory Controllers (UMCs) are a special case where there are two blocks and each has a unique name. However, the microarchitecture allows for five blocks. Any new MCA bank types with more than one block will be missing names for the extra blocks. The MCE sysfs will fail to initialize in this case. Fixes: 87a6d40 ("x86/mce/AMD: Update sysfs bank names for SMCA systems") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-3-236dd74f645f@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 5f6e3b720694ad771911f637a51930f511427ce1 upstream The MCA threshold limit must be reset after servicing the interrupt. Currently, the restart function doesn't have an explicit check for this. It makes some assumptions based on the current limit and what's in the registers. These assumptions don't always hold, so the limit won't be reset in some cases. Make the reset condition explicit. Either an interrupt/overflow has occurred or the bank is being initialized. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-4-236dd74f645f@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 9af8b441cf6953f683b825fbf241a979ea7521e8 upstream It operates per block rather than per bank. So rename it for clarity. No functional changes. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-5-236dd74f645f@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 4d2161b9e8ba64076f520ec2f00eefb00722c15e upstream The return values are not checked, so set return type to 'void'. Also, move function declarations to internal.h, since these functions are only used within the MCE subsystem. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-6-236dd74f645f@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit b249288abde5190bb113ea5acef8af4ceac4957c upstream The MCx_MISC0[BlkPtr] field was used on legacy systems to hold a register offset for the next MCx_MISC* register. In this way, an implementation-specific number of registers can be discovered at runtime. The MCAX/SMCA register space simplifies this by always including the MCx_MISC[1-4] registers. The MCx_MISC0[BlkPtr] field is used to indicate (true/false) whether any MCx_MISC[1-4] registers are present. Currently, MCx_MISC0[BlkPtr] is checked early and cached to be used during sysfs init later. This is unnecessary as the MCx_MISC0 register is read again later anyway. Remove the smca_banks_map variable as it is effectively redundant, and use a direct register/bit check instead. [ bp: Zap smca_get_block_address() too. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250825-wip-mca-updates-v5-3-865768a2eef8@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit d35fb3121a36170bba951c529847a630440e4174 upstream Legacy AMD systems include an integrated Northbridge that is represented by MCA bank 4. This is the only non-core MCA bank in legacy systems. The Northbridge is physically shared by all the CPUs within an AMD "Node". However, in practice the "shared" MCA bank can only be managed by a single CPU within that AMD Node. This is known as the "Node Base Core" (NBC). For example, only the NBC will be able to read the MCA bank 4 registers; they will be Read-as-Zero for other CPUs. Also, the MCA Thresholding interrupt will only signal the NBC; the other CPUs will not receive it. This is enforced by hardware, and it should not be managed by software. The current AMD Thresholding code attempts to deal with the "shared" MCA bank by micromanaging the bank's sysfs kobjects. However, this does not follow the intended kobject use cases. It is also fragile, and it has caused bugs in the past. Modern AMD systems do not need this shared MCA bank support, and it should not be needed on legacy systems either. Remove the shared threshold bank code. Also, move the threshold struct definitions to mce/amd.c, since they are no longer needed in amd_nb.c. [Backport Changes] 1. In arch/x86/include/asm/amd_nb.h, the upstream patch removes the refcount.h include, but this header is already removed in the current source tree. Therefore, the removal step was skipped since the expected change is already reflected in the existing code. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20241206161210.163701-2-yazen.ghannam@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit c4bac5c640e3782bf30c07c4d82042d0202fe224 upstream The threshold_bank structure is a container for one or more threshold_block structures. Currently, the container has a single pointer to the 'first' threshold_block structure which then has a linked list of the remaining threshold_block structures. This results in an extra level of indirection where the 'first' block is checked before iterating over the remaining blocks. Remove the indirection by including the head of the block list in the threshold_bank structure which already acts as a container for all the bank's thresholding blocks. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250624-wip-mca-updates-v4-8-236dd74f645f@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 9f34032ec0deef58bd0eb7475f1981adfa998648 upstream The __mcheck_cpu_init_early() function was introduced so that some vendor-specific features are detected before the first MCA polling event done in __mcheck_cpu_init_generic(). Currently, __mcheck_cpu_init_early() is only used on AMD-based systems and additional code will be needed to support various system configurations. However, the current and future vendor-specific code should be done during vendor init. This keeps all the vendor code in a common location and simplifies the generic init flow. Move all the __mcheck_cpu_init_early() code into mce_amd_feature_init(). Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250825-wip-mca-updates-v5-6-865768a2eef8@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit cfffcf97997bd35f4a59e035523d1762568bdbad upstream Set the CR4.MCE bit as the last step during init. This brings the MCA init order closer to what is described in the x86 docs.

x86 docs:

    AMD           Intel
                  MCG_CTL
    MCA_CONFIG    MCG_EXT_CTL
    MCi_CTL       MCi_CTL
    MCG_CTL
    CR4.MCE       CR4.MCE

Current Linux:

    AMD           Intel
    CR4.MCE       CR4.MCE
    MCG_CTL       MCG_CTL
    MCA_CONFIG    MCG_EXT_CTL
    MCi_CTL       MCi_CTL

Updated Linux:

    AMD           Intel
    MCG_CTL       MCG_CTL
    MCA_CONFIG    MCG_EXT_CTL
    MCi_CTL       MCi_CTL
    CR4.MCE       CR4.MCE

The new init flow will match Intel's docs, but there will still be a mismatch for AMD regarding MCG_CTL. However, there is no known issue with this ordering, so leave it for now. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/20250908-wip-mca-updates-v6-0-eef5d6c74b9c@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 669ce4984b729ad5b4c6249d4a8721ae52398bfb upstream Currently, MCA initialization is executed identically on each CPU as they are brought online. However, a number of MCA initialization tasks only need to be done once. Define a function to collect all 'global' init tasks and call this from the BSP only. Start with CPU features. [Backport Changes] 1. In file arch/x86/kernel/cpu/mce/core.c, within the newly added function mca_bsp_init(), the call to rdmsrq() was replaced with the existing equivalent call rdmsrl() because the upstream commit c435e608cf59f that globally renamed rdmsrl() to rdmsrq() is not available yet in the current source tree. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250908-wip-mca-updates-v6-0-eef5d6c74b9c@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit c6e465b8d45a1bc717d196ee769ee5a9060de8e2 upstream Currently, on AMD systems, MCA interrupt handler functions are set during CPU init. However, the functions only need to be set once for the whole system. Assign the handlers only during BSP init. Do so only for SMCA systems to maintain the old behavior for legacy systems. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/20250908-wip-mca-updates-v6-0-eef5d6c74b9c@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 495a91d upstream Define helper functions for legacy and SMCA systems in order to reuse individual checks in later changes. Describe what each function is checking for, and correct the XEC bitmask for SMCA. No functional change intended. [ bp: Use "else" in amd_mce_is_memory_error() to make the conditional balanced, for readability. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Link: https://lore.kernel.org/r/20230613141142.36801-2-yazen.ghannam@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 48da1ad upstream Currently, all valid MCA_ADDR values are assumed to be usable on AMD systems. However, this is not correct in most cases. Notifiers expecting usable addresses may then operate on inappropriate values. Define a helper function to do AMD-specific checks for a usable memory address. List out all known cases. [ bp: Tone down the capitalized words. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230613141142.36801-3-yazen.ghannam@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 1bae0cf upstream Move Intel-specific checks into a helper function. Explicitly use "bool" for return type. No functional change intended. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230613141142.36801-4-yazen.ghannam@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 612905e upstream mce_device_create() is called only from mce_cpu_online() which in turn will be called iff MCA support is available. That is, at the time of mce_device_create() call it's guaranteed that MCA support is available. No need to duplicate this check so remove it. [ bp: Massage commit message. ] Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231107165529.407349-1-nik.borisov@suse.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 108c649 upstream Systems with a large number of CPUs may generate a large number of machine check records when things go seriously wrong. But Linux has a fixed-size buffer that can only capture a few dozen errors. Allocate space based on the number of CPUs (with a minimum value based on the historical fixed buffer that could store 80 records). [ bp: Rename local var from tmpp to something more telling: gpool. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Avadhut Naik <avadhut.naik@amd.com> Link: https://lore.kernel.org/r/20240307192704.37213-1-tony.luck@intel.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit ac5e80e upstream - Only capitalize entries where that makes sense - Print separate values separately - Rename 'PROCESSOR' to vendor & CPUID Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Avadhut Naik <avadhut.naik@amd.com> Cc: "Tony Luck" <tony.luck@intel.com> Link: https://lore.kernel.org/r/ZgZpn/zbCJWYdL5y@gmail.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 9843064 upstream Machine Check Error information from 'struct mce' is exposed to userspace through the mce_record tracepoint. Currently, however, the PPIN (Protected Processor Inventory Number) field of 'struct mce' is not exposed. Add a PPIN field to the tracepoint as it provides a unique identifier for the system (or socket in case of multi-socket systems) on which the MCE has been received. Also, add a comment explaining the kind of information that can be and should be added to the tracepoint. Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20240401171455.1737976-2-avadhut.naik@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 186d7ef upstream Currently, the microcode field (Microcode Revision) of 'struct mce' is not exposed to userspace through the mce_record tracepoint. Knowing the microcode version on which the MCE was received is critical information for debugging. If the version is not recorded, later attempts to acquire the version might result in discrepancies since it can be changed at runtime. Add microcode version to the tracepoint to prevent ambiguity over the active version on the system when the MCE was received. Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20240401171455.1737976-3-avadhut.naik@amd.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 4a5f2dd upstream New CPU #defines encode vendor and family as well as model. [ bp: Squash *three* mce patches into one, fold in fix: https://lore.kernel.org/r/20240429022051.63360-1-tony.luck@intel.com ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/all/20240424181511.41772-1-tony.luck%40intel.com Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 5b9d292 upstream The recent CMCI storm handling rework removed the last case that checks the return value of machine_check_poll(). Therefore the "error_seen" variable is no longer used, so remove it. Fixes: 3ed57b4 ("x86/mce: Remove old CMCI storm mitigation code") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240523155641.2805411-3-yazen.ghannam@amd.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 5ad21a2 upstream There is no MCE "setup" done in mce_setup(). Rather, this function initializes and prepares an MCE record. Rename the function to highlight what it does. No functional change is intended. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/20240730182958.4117158-2-yazen.ghannam@amd.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit f9bbb8a upstream Generally, MCA information for an error is gathered on the CPU that reported the error. In this case, CPU-specific information from the running CPU will be correct. However, this will be incorrect if the MCA information is gathered while running on a CPU that didn't report the error. One example is creating an MCA record using mce_prep_record() for errors reported from ACPI. Split mce_prep_record() so that there is a helper function to gather common, i.e. not CPU-specific, information and another helper for CPU-specific information. Leave mce_prep_record() defined as-is for the common case when running on the reporting CPU. Get MCG_CAP in the global helper even though the register is per-CPU. This value is not already cached per-CPU like other values. And it does not assist with any per-CPU decoding or handling. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/20240730182958.4117158-3-yazen.ghannam@amd.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 793aa4b upstream Current AMD systems can report MCA errors using the ACPI Boot Error Record Table (BERT). The BERT entries for MCA errors will be an x86 Common Platform Error Record (CPER) with an MSR register context that matches the MCAX/SMCA register space. However, the BERT will not necessarily be processed on the CPU that reported the MCA errors. Therefore, the correct CPU number needs to be determined and the information saved in struct mce. Use the newly defined mce_prep_record_*() helpers to get the correct data. Also, add an explicit check to verify that a valid CPU number was found from the APIC ID search. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/r/20240730182958.4117158-4-yazen.ghannam@amd.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Abhishek Rajput <Abhishek.Rajput@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 750fd23926f1507cc826b5a4fdd4bfc7283e7723 upstream Currently, exporting new additional machine check error information involves adding new fields for the same at the end of the struct mce. This additional information can then be consumed through mcelog or tracepoint. However, as new MSRs are being added (and will be added in the future) by CPU vendors on their newer CPUs with additional machine check error information to be exported, the size of struct mce will balloon on some CPUs, unnecessarily, since those fields are vendor-specific. Moreover, different CPU vendors may export the additional information in varying sizes. The problem particularly intensifies since struct mce is exposed to userspace as part of UAPI. Its bloating through vendor-specific data should be avoided to limit the information being sent out to userspace. Add a new structure mce_hw_err to wrap the existing struct mce. The same will prevent its ballooning since vendor-specific data, if any, can now be exported through a union within the wrapper structure and through __dynamic_array in mce_record tracepoint. Furthermore, new internal kernel fields can be added to the wrapper struct without impacting the user space API. [ bp: Restore reverse x-mas tree order of function vars declarations. ] [Backport Changes] 1. In arch/x86/kernel/cpu/mce/core.c, within the function mce_panic() deviations are shown due to line number changes. This is because the declaration of struct page *p was removed from the top of the function and moved inside the if condition (if (final && (final->status & MCI_STATUS_ADDRV))) in upstream merge commit b4442ca. Backporting that commit would introduce additional dependencies.
Suggested-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Link: https://lore.kernel.org/r/20241022194158.110073-2-avadhut.naik@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
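The wrapper idea can be sketched in plain C as follows. This is a reduced illustration under stated assumptions: `struct mce_sketch` carries only three fields, the `_sketch` type names and `to_hw_err()` are hypothetical, and the `amd.synd1`/`amd.synd2` members are borrowed from the later SYND1/SYND2 patch in this series purely as an example of vendor data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed stand-in for the UAPI struct mce; its layout must stay stable. */
struct mce_sketch {
	uint64_t status;
	uint64_t addr;
	uint64_t synd;
};

/* Kernel-internal wrapper: vendor data lives outside the UAPI struct. */
struct mce_hw_err_sketch {
	struct mce_sketch m;
	union {
		struct {
			uint64_t synd1;
			uint64_t synd2;
		} amd;
	} vendor;
};

/* The embedded record sits first, so both pointers share an address. */
static_assert(offsetof(struct mce_hw_err_sketch, m) == 0,
	      "embedded record must be the first member");

/* Recover the wrapper from a pointer to the embedded record. */
static struct mce_hw_err_sketch *to_hw_err(struct mce_sketch *m)
{
	return (struct mce_hw_err_sketch *)
		((char *)m - offsetof(struct mce_hw_err_sketch, m));
}
```

Growing the `vendor` union changes the size of the wrapper only; the size of the embedded record, which is what userspace sees, stays the same.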

commit c845cb8dbd2e1a804babfd13648026c3a7cfbc0b upstream Make several functions that return 0 or 1 return a boolean value for better readability. No functional changes are intended. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20241212140103.66964-2-qiuxu.zhuo@intel.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit c46945c9cac8437a674edb9d8fbe71511fb4acee upstream Make those functions whose callers only care about success or failure return a boolean value for better readability. Also, update the call sites accordingly as the polarities of all the return values have been flipped. No functional changes. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20241212140103.66964-4-qiuxu.zhuo@intel.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
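The polarity flip mentioned above is easy to get wrong at call sites, so a minimal sketch may help. The function names here are hypothetical and do not correspond to the four functions touched by the commit; the point is only that a "0 means success" test must be negated once the callee returns `true` on success.

```c
#include <stdbool.h>
#include <stddef.h>

/* Before: 0 means success, a negative value means failure. */
static int pool_init_old(size_t nr_records)
{
	return nr_records ? 0 : -1;
}

/* After: true means success. */
static bool pool_init_new(size_t nr_records)
{
	return nr_records != 0;
}

/* Old call site: a non-zero result is the failure path. */
static bool feature_enabled_old(size_t nr_records)
{
	if (pool_init_old(nr_records))
		return false;
	return true;
}

/* New call site: the test is negated, the behavior is unchanged. */
static bool feature_enabled_new(size_t nr_records)
{
	if (!pool_init_new(nr_records))
		return false;
	return true;
}
```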

commit 51a12c28bb9a043e9444db5bd214b00ec161a639 upstream Split each vendor specific part into its own helper function. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Link: https://lore.kernel.org/r/20241212140103.66964-5-qiuxu.zhuo@intel.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit a46b2bbe1e36e7faab5010f68324b7d191c5c09f upstream The 'UNKNOWN' vendor check is handled as a quirk that is run on each online CPU. However, all CPUs are expected to have the same vendor. Move the 'UNKNOWN' vendor check to the BSP-only init so it is done early and once. Remove the unnecessary return value from the quirks check. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250908-wip-mca-updates-v6-0-eef5d6c74b9c@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 0f134c53246366c00664b640f9edc9be5db255b3 upstream Unify the bank preparation into __mcheck_cpu_init_clear_banks(), rename that function to what it does now - prepares banks. Do this so that generic and vendor banks init goes first so that settings done during that init can take effect before the first bank polling takes place. Move __mcheck_cpu_check_banks() into __mcheck_cpu_init_prepare_banks() as it already loops over the banks. The MCP_DONTLOG flag is no longer needed, since the MCA polling function is now called only if boot-time logging should be done. [Backport Changes] 1. In file arch/x86/kernel/cpu/mce/core.c, within the function __mcheck_cpu_check_banks(), the call to wrmsrq() was replaced with the existing equivalent call wrmsrl() because the upstream commit 78255eb239733 that globally renamed wrmsrl() to wrmsrq() is not available yet in the current source tree. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250825-wip-mca-updates-v5-5-865768a2eef8@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

…mbolic IFM references commit fd82221 upstream There's an erratum that prevents the PAT from working correctly: https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/pentium-dual-core-specification-update.pdf # Document 316515 Version 010 The kernel currently disables PAT support on those CPUs, but it does it with some magic numbers. Replace the magic numbers with the new "IFM" macros. Make the check refer to the last affected CPU (INTEL_CORE_YONAH) rather than the first fixed one. This makes it easier to find the documentation of the erratum since Intel documents where it is broken and not where it is fixed. I don't think the Pentium Pro (or Pentium II) is actually affected. But the old check included them, so it can't hurt to keep doing the same. I'm also not completely sure about the "Pentium M" CPUs (models 0x9 and 0xd). But, again, they were included in the old checks and were close Pentium III derivatives, so are likely affected. While we're at it, revise the comment referring to the erratum name and make sure it is a quote of the language from the actual errata doc. That should make it easier to find in the future when the URL inevitably changes. Why bother with this in the first place? It actually gets rid of one of the very few remaining direct references to c->x86{,_model}. No change in functionality intended. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Len Brown <len.brown@intel.com> Link: https://lore.kernel.org/r/20240829220042.1007820-1-dave.hansen@linux.intel.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 359d7a98e3e3f88dbf45411427b284bb3bbbaea5 upstream Convert family/model mixed checks to VFM-based checks to make the code more compact. Simplify. [ bp: Drop the "what" from the commit message - it should be visible from the diff alone. ] Suggested-by: Sohil Mehta <sohil.mehta@intel.com> Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20241212140103.66964-6-qiuxu.zhuo@intel.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
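For readers unfamiliar with VFM, the sketch below shows why a single packed comparison can replace a mixed family/model test. The `SK_` macros, the bit layout (model in bits 7:0, family in 15:8, vendor in 23:16), the vendor value, and the chosen model number are assumptions made for this illustration and should be checked against the kernel headers before being relied on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative packing: model in bits 7:0, family in 15:8, vendor in 23:16. */
#define SK_VFM_MAKE(vendor, family, model) \
	(((uint32_t)(vendor) << 16) | ((uint32_t)(family) << 8) | (uint32_t)(model))

#define SK_VENDOR_INTEL		0	/* assumed vendor encoding */
#define SK_INTEL_FAM6_FIRST	SK_VFM_MAKE(SK_VENDOR_INTEL, 6, 0x00)
#define SK_INTEL_FAM6_LIMIT	SK_VFM_MAKE(SK_VENDOR_INTEL, 6, 0x1a)

struct cpu_sketch {
	uint8_t vendor, family, model;
	uint32_t vfm;	/* packed form of the three fields above */
};

static struct cpu_sketch make_cpu(uint8_t vendor, uint8_t family, uint8_t model)
{
	struct cpu_sketch c = { vendor, family, model,
				SK_VFM_MAKE(vendor, family, model) };
	return c;
}

/* Mixed check: three separate field comparisons. */
static bool quirk_mixed(const struct cpu_sketch *c)
{
	return c->vendor == SK_VENDOR_INTEL && c->family == 6 && c->model < 0x1a;
}

/* VFM check: one range test on the packed value. */
static bool quirk_vfm(const struct cpu_sketch *c)
{
	return c->vfm >= SK_INTEL_FAM6_FIRST && c->vfm < SK_INTEL_FAM6_LIMIT;
}
```

Because vendor occupies the most significant bits and model the least, ordering on the packed value matches ordering by vendor, then family, then model, which is what makes the range test equivalent here.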

commit 7eee1e92684507f64ec6a75fecbd27e37174b888 upstream Many quirks are global configuration settings and a handful apply to each CPU. Move the per-CPU quirks to vendor init to execute them on each online CPU. Set the global quirks during BSP-only init so they're only executed once and early. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250908-wip-mca-updates-v6-0-eef5d6c74b9c@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 91af6842e9945d064401ed2d6e91539a619760d1 upstream There are a number of generic and vendor-specific status checks in machine_check_poll(). These are used to determine if an error should be skipped. Move these into helper functions. Future vendor-specific checks will be added to the helpers. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/20250908-wip-mca-updates-v6-0-eef5d6c74b9c@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit d4fca1358ea9096f2f6ed942e2cb3a820073dfc1 upstream Starting with Zen4, AMD's Scalable MCA systems incorporate two new registers: MCA_SYND1 and MCA_SYND2. These registers will include supplemental error information in addition to the existing MCA_SYND register. The data within these registers is considered valid if MCA_STATUS[SyndV] is set. Userspace error decoding tools like rasdaemon gather related hardware error information through the tracepoints. Therefore, export these two registers through the mce_record tracepoint so that tools like rasdaemon can parse them and output the supplemental error information like FRU text contained in them. [ bp: Massage. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Link: https://lore.kernel.org/r/20241022194158.110073-4-avadhut.naik@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>
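The validity rule stated above (the syndrome registers are meaningful only when MCA_STATUS[SyndV] is set) can be sketched as below. The bit position used for SyndV, the `synd_sketch` type, and the way raw register values are passed in as parameters are assumptions for illustration; real code would read the MSRs and should take the bit definition from the kernel headers.

```c
#include <stdint.h>

/* Assumed position of the SyndV bit in MCA_STATUS; verify before use. */
#define SK_MCI_STATUS_SYNDV	(1ULL << 53)

struct synd_sketch {
	uint64_t synd;
	uint64_t synd1;
	uint64_t synd2;
};

/*
 * Copy the syndrome values only when the status word marks them valid;
 * otherwise leave all three fields zero so consumers see no stale data.
 */
static struct synd_sketch collect_synd(uint64_t status, uint64_t raw_synd,
				       uint64_t raw_synd1, uint64_t raw_synd2)
{
	struct synd_sketch s = { 0, 0, 0 };

	if (status & SK_MCI_STATUS_SYNDV) {
		s.synd = raw_synd;
		s.synd1 = raw_synd1;
		s.synd2 = raw_synd2;
	}
	return s;
}
```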

commit ebe29309c4d2821d5fdccd5393eba9c77540e260 upstream Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 8e44e83f57c3289a41507eb79a315400629978ae upstream Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

commit 5c6f123c419b6e20f84ac1683089a52f449273aa upstream Add a helper at the end of the MCA polling function to collect vendor and/or feature actions. Start with a basic skeleton for now. Actions for AMD thresholding and deferred errors will be added later. [ bp: Drop the obvious comment too. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Link: https://lore.kernel.org/20250908-wip-mca-updates-v6-0-eef5d6c74b9c@amd.com Signed-off-by: Rahul Kumar <Kumar.Rahul2@amd.com> Signed-off-by: mohanasv2 <mohanasv@amd.com>

yghannam and others added 30 commits January 27, 2026 10:59

qzhuo2 and others added 12 commits January 27, 2026 10:59

6 participants

mohanasv2 commented Jan 27, 2026 • edited

Unit Test:
