Conversation

@ankita-nv ankita-nv commented Jan 21, 2026

This pull request handles the following:

Huge PFNMAP patch series backport from v6.19 [1]
ECC patch series backport from v6.19 [2][3][4]
GGL ECC series backport from v6.19 [5]
New EGM patch to handle ECC errors
Cleanup to remove older non-upstreamed ECC handling code.
Cleanup to remove the PASID patch, as it is handled in QEMU.
Cleanup to remove older KVM code for GPU memory mapping.

GPU and EGM functionality verified within the VM using the QEMU branch [6].

Link: https://lore.kernel.org/all/20251127170632.3477-1-ankita@nvidia.com/ [1]
Link: https://lore.kernel.org/all/20251102184434.2406-1-ankita@nvidia.com/ [2]
Link: https://lore.kernel.org/all/20260115202849.2921-1-ankita@nvidia.com/ [3]
Link: https://lore.kernel.org/all/20251211070603.338701-2-ankita@nvidia.com/ [4]
Link: https://lore.kernel.org/all/20251013185903.1372553-2-jiaqiyan@google.com/ [5]
Link: https://github.com/NVIDIA/QEMU/tree/nvidia_stable-10.1 [6]


LP: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia/+bug/2138892

ankita-nv and others added 23 commits January 21, 2026 04:01
…cacheable in VMA"

This reverts commit 93277da.

[ankita: fixed a minor merge conflict in the if check]
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
… poison errors handling"

This reverts commit 2ad32de.

[ankita: minor cleanup to fix header files and remove all
code within CONFIG_MEMORY_FAILURE in egm.c]
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
…oison handling"

This reverts commit eda3c2f.

[ankita: code changes to address conflicts: remove h_node, the code
within CONFIG_MEMORY_FAILURE, and header files in main.c. Also removes
the unused variable nvdev in nvgrace_gpu_remove]
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Poison (or ECC) errors can be very common on a large cluster.  The
kernel MM currently handles ECC errors / poison only on memory pages
backed by struct page.  The handling is missing for PFNMAP memory that
does not have struct pages.  The series adds such support.

Implement new ECC handling for memory without struct pages.  Kernel MM
exposes registration APIs to allow modules that manage the device to
register their device memory regions.  MM then tracks such regions using
an interval tree.
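
A minimal sketch of the registration contract being described; the
struct and function names here are illustrative assumptions, not
necessarily the series' exact API:

#include <linux/interval_tree.h>
#include <linux/fs.h>

/* Illustrative sketch only; see the series posting for the real API. */
struct pfn_address_space {
        struct interval_tree_node node;  /* [start_pfn, last_pfn] range */
        struct address_space *mapping;   /* used to find the mapping VMAs */
};

/* Called by the module managing the device (e.g. nvgrace-gpu-vfio-pci). */
int register_pfn_address_space(struct pfn_address_space *pfn_space);
void unregister_pfn_address_space(struct pfn_address_space *pfn_space);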

The mechanism is largely similar to that of ECC on PFNs with struct
pages.  If there is an ECC error on a PFN, all the mappings to it are
identified and a SIGBUS is sent to the user space processes owning those
mappings.  Note that there is one primary difference versus the handling
of poison on struct pages: unmapping of the faulty PFN is skipped.  This
is done to accommodate the huge PFNMAP support added recently [1] that
enables VM_PFNMAP VMAs to map at the PMD or PUD level.  Poison on a PFN
mapped in such a way would require breaking the PMD/PUD mapping into
PTEs, which would get mirrored into the S2.  This can greatly increase
the cost of table walks and have a major performance impact.

The nvgrace-gpu-vfio-pci module maps the device memory to user VA (QEMU)
using remap_pfn_range without adding it to the kernel [2].  These device
memory PFNs are not backed by struct page.  So make the
nvgrace-gpu-vfio-pci module use this mechanism to get poison handling
support on the device memory.

This patch (of 3):

The GHES code allows memory_failure() to be called only on PFNs that
pass the pfn_valid() check.  This contract is broken for remapped PFNs,
which fail the check, and ghes_do_memory_failure() returns without
triggering memory_failure().

Update the code to allow the memory_failure() call on PFNs failing
pfn_valid().
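
A simplified before/after sketch of the behavior change (not the
literal upstream diff):

/* Before: a remapped PFN failing pfn_valid() was dropped here. */
if (!pfn_valid(pfn))
        return false;
memory_failure_queue(pfn, flags);

/* After: queue the failure unconditionally and let memory_failure()
 * route PFNs without struct pages to the new handling path. */
memory_failure_queue(pfn, flags);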

Link: https://lkml.kernel.org/r/20251102184434.2406-1-ankita@nvidia.com
Link: https://lkml.kernel.org/r/20251102184434.2406-2-ankita@nvidia.com
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Aniket Agashe <aniketa@nvidia.com>
Cc: Ankit Agrawal <ankita@nvidia.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew R. Ochs <mochs@nvidia.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Neo Jia <cjia@nvidia.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Smita Koralahalli Channabasappa <smita.koralahallichannabasappa@amd.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tarun Gupta <targupta@nvidia.com>
Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Cc: Vikram Sethi <vsethi@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 30d0a12)
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Poison (or ECC) errors can be very common on a large cluster.  The
kernel MM currently does not handle ECC errors / poison on a memory
region that is not backed by struct pages.  If a memory region is mapped
using remap_pfn_range(), for example, but not added to the kernel, MM
will not have associated struct pages.  Add a new mechanism to handle
memory failure on such memory.

Make kernel MM expose a function to allow modules managing the device
memory to register the device memory SPA and the address_space
associated with it.  MM maintains this information as an interval tree.
On poison, MM can search for the range that the poisoned PFN belongs to
and use the address_space to determine the mapping VMA.

In this implementation, kernel MM follows a sequence largely similar to
the memory_failure() handler for struct page backed memory (condensed
into a sketch after the list):

1. memory_failure() is triggered on reception of a poison error.  An
   absence of struct page is detected and consequently
   memory_failure_pfn() is executed.

2. memory_failure_pfn() collects the processes mapped to the PFN.

3. memory_failure_pfn() sends SIGBUS to all the processes mapping the
   faulty PFN using kill_procs().
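
Condensed into a sketch, with collect_procs_pfn() and the kill_procs()
arguments as illustrative names:

static int memory_failure_pfn(unsigned long pfn, int flags)
{
        LIST_HEAD(to_kill);

        /* Step 2: collect the processes mapping the poisoned PFN. */
        collect_procs_pfn(pfn, &to_kill, flags);

        /* Step 3: SIGBUS the owners.  Unmapping is deliberately skipped
         * so huge PFNMAP PMD/PUD mappings need not be split (see the
         * note below). */
        kill_procs(&to_kill, pfn, flags);
        return 0;
}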

Note that there is one primary difference versus the handling of poison
on struct pages: unmapping of the faulty PFN is skipped.  This is done
to accommodate the huge PFNMAP support added recently [1] that enables
VM_PFNMAP VMAs to map at the PMD or PUD level.  Poison on a PFN mapped
in such a way would require breaking the PMD/PUD mapping into PTEs,
which would get mirrored into the S2.  This can greatly increase the
cost of table walks and have a major performance impact.

Link: https://lore.kernel.org/all/20240826204353.2228736-1-peterx@redhat.com/ [1]
Link: https://lkml.kernel.org/r/20251102184434.2406-3-ankita@nvidia.com
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Cc: Aniket Agashe <aniketa@nvidia.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew R. Ochs <mochs@nvidia.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Neo Jia <cjia@nvidia.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Smita Koralahalli Channabasappa <smita.koralahallichannabasappa@amd.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tarun Gupta <targupta@nvidia.com>
Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Cc: Vikram Sethi <vsethi@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 2ec4196)
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
When APEI fails to handle a stage-2 synchronous external abort (SEA),
today KVM injects an asynchronous SError into the vCPU and then resumes
it, which usually results in an unpleasant guest kernel panic.

One major source of guest SEA is a vCPU consuming a recoverable
uncorrected memory error (UER). Although the SError and guest kernel
panic effectively stop the propagation of corrupted memory, the guest
may re-use the corrupted memory if auto-rebooted; in the worst case,
guest boot may run into poisoned memory. So there is room to recover
from a UER in a more graceful manner.

Alternatively, KVM can redirect the synchronous SEA event to the VMM to:
- Reduce the blast radius if possible. The VMM can inject a SEA into
  the vCPU via KVM's existing KVM_SET_VCPU_EVENTS API. If the memory
  poison consumption or fault is not from the guest kernel, the blast
  radius can be limited to the triggering thread in guest userspace,
  so the VM can keep running.
- Allow the VMM to protect against future memory poison consumption by
  unmapping the page from stage-2, or to notify the guest of the
  poisoned page so the guest kernel can unmap it from its stage-1 page
  table.
- Allow the VMM to track SEA events that VM customers care about, to
  restart the VM when a certain number of distinct poison events have
  happened, and to provide observability to customers in a log
  management UI.

Introduce a userspace-visible feature to let the VMM handle SEA (a
hedged userspace-side sketch follows the list):
- KVM_CAP_ARM_SEA_TO_USER. As an alternative to the fallback behavior
  when host APEI fails to claim a SEA, userspace can opt in to this new
  capability to let KVM exit to userspace during SEA if it is not
  owned by the host.
- KVM_EXIT_ARM_SEA. A new exit reason is introduced for this.
  KVM fills kvm_run.arm_sea with as much information as possible about
  the SEA, enabling the VMM to emulate the SEA to the guest by itself.
  - Sanitized ESR_EL2. The general rule is to keep only the bits
    useful for userspace and relevant to guest memory.
  - Flags indicating whether the faulting guest physical address is
    valid.
  - Faulting guest physical and virtual addresses, if valid.
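
A hedged VMM-side sketch of consuming this exit; the kvm_run fields
follow the description above (esr, flags, gva, gpa), while the flag
constant and record_poisoned_gpa() are illustrative and the exact uapi
spellings come from this series' headers:

struct kvm_run *run = vcpu_run;         /* mmap'ed vcpu run area */

if (ioctl(vcpu_fd, KVM_RUN, 0) == 0 &&
    run->exit_reason == KVM_EXIT_ARM_SEA) {
        /* Illustrative flag name for "the gpa field is valid". */
        if (run->arm_sea.flags & KVM_EXIT_ARM_SEA_FLAG_GPA_VALID)
                record_poisoned_gpa(run->arm_sea.gpa);  /* VMM bookkeeping */

        /* Inject a synchronous external abort back into the vCPU to
         * keep the blast radius to the faulting guest context. */
        struct kvm_vcpu_events events = {
                .exception.ext_dabt_pending = 1,
        };
        ioctl(vcpu_fd, KVM_SET_VCPU_EVENTS, &events);
}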

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://msgid.link/20251013185903.1372553-2-jiaqiyan@google.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
(backported from commit ad9c62b)
[ankita: Minor conflict resolution to include KVM_CAP_GUEST_MEMFD_FLAGS]
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Test how KVM handles a guest SEA when APEI is unable to claim it and
KVM_CAP_ARM_SEA_TO_USER is enabled.

The behavior is triggered by consuming a recoverable memory error (UER)
injected via EINJ. The test asserts two major things:
1. KVM returns to userspace with the KVM_EXIT_ARM_SEA exit reason and
   has provided the expected fault information, e.g. esr, flags, gva,
   gpa.
2. Userspace is able to handle KVM_EXIT_ARM_SEA by injecting a SEA into
   the guest, and KVM injects the expected SEA into the vCPU.

Tested on a data center server running Siryn AmpereOne processor
that has RAS support.

Several things to note before attempting to run this selftest:
- The test relies on EINJ support in both firmware and kernel to
  inject the UER. Otherwise the test will be skipped.
- The platform under test's APEI should be unable to claim the SEA.
  Otherwise the test will be skipped.
- Some platforms don't support notrigger in EINJ, which may cause
  APEI and GHES to offline the memory before the guest can consume the
  injected UER, making the test unable to trigger the SEA.

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Link: https://msgid.link/20251013185903.1372553-3-jiaqiyan@google.com
Signed-off-by: Oliver Upton <oupton@kernel.org>
(cherry picked from commit feee9ef)
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Document the new userspace-visible features and APIs for handling
synchronous external abort (SEA)
- KVM_CAP_ARM_SEA_TO_USER: How userspace enables the new feature.
- KVM_EXIT_ARM_SEA: exit userspace gets when it needs to handle SEA
  and what userspace gets while taking the SEA.

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Link: https://msgid.link/20251013185903.1372553-4-jiaqiyan@google.com
[ oliver: make documentation concise, remove implementation detail ]
Signed-off-by: Oliver Upton <oupton@kernel.org>
(cherry picked from commit 4debb5e)
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Refactor vfio_pci_mmap_huge_fault to split out the implementation that
maps the VMA at the PTE/PMD/PUD level into a separate function.

Export the new function for use by the nvgrace-gpu module.

Move the alignment check, which verifies that the PFN and the VMA VA
are aligned to the page order, to the header file and make it inline
(sketched below).

No functional change is intended.
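
A sketch of the inlined alignment check; the helper's name and exact
checks are illustrative:

static inline bool vfio_pci_fault_aligned(struct vm_fault *vmf,
                                          unsigned long pfn,
                                          unsigned int order)
{
        /* The faulting VA and the PFN must both be aligned to the
         * mapping order, and the mapping must fit inside the VMA. */
        if (vmf->address & ((PAGE_SIZE << order) - 1))
                return false;
        if (pfn & ((1UL << order) - 1))
                return false;
        return vmf->address + (PAGE_SIZE << order) <= vmf->vma->vm_end;
}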

Cc: Shameer Kolothum <skolothumtho@nvidia.com>
Cc: Alex Williamson <alex@shazbot.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Reviewed-by: Shameer Kolothum <skolothumtho@nvidia.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251127170632.3477-2-ankita@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 9b92bc7)
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
NVIDIA's Grace-based systems have large device memory. The device
memory is mapped as VM_PFNMAP in the VMM VMA. The nvgrace-gpu
module can make use of the huge PFNMAP support added in mm [1].

To make use of the huge PFNMAP support, a fault/huge_fault ops based
mapping mechanism needs to be implemented. Currently the nvgrace-gpu
module relies on remap_pfn_range to do the mapping during VM bootup.
Replace it to instead rely on fault and use vfio_pci_vmf_insert_pfn
to set up the mapping.

Moreover, to enable huge PFNMAP, the nvgrace-gpu module is updated with
a huge_fault ops implementation (sketched below). The implementation
establishes the mapping according to the requested order. Note that if
the PFN or the VMA address is unaligned to the order, the mapping falls
back to the PTE level.

Link: https://lore.kernel.org/all/20240826204353.2228736-1-peterx@redhat.com/ [1]
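
A hedged sketch of that flow; apart from vfio_pci_vmf_insert_pfn (named
above), the identifiers and exact signatures are illustrative:

static vm_fault_t nvgrace_gpu_huge_fault(struct vm_fault *vmf,
                                         unsigned int order)
{
        /* Illustrative: PFN of the device page backing this fault. */
        unsigned long pfn = nvgrace_gpu_base_pfn(vmf->vma) + vmf->pgoff;

        /* Unaligned PFN or VA for this order: fall back so the fault
         * is retried at PTE level instead of failing the access. */
        if (!vfio_pci_fault_aligned(vmf, pfn, order))
                return VM_FAULT_FALLBACK;

        return vfio_pci_vmf_insert_pfn(vmf, pfn, order);
}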

Cc: Shameer Kolothum <skolothumtho@nvidia.com>
Cc: Alex Williamson <alex@shazbot.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Vikram Sethi <vsethi@nvidia.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Shameer Kolothum <skolothumtho@nvidia.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251127170632.3477-3-ankita@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 9db6548)
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Remove code duplication in vfio_pci_core_mmap by calling
vfio_pci_core_setup_barmap to perform the bar mapping.

No functional change is intended.

Cc: Donald Dutile <ddutile@redhat.com>
Reviewed-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Suggested-by: Alex Williamson <alex@shazbot.org>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251127170632.3477-4-ankita@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 7f5764e)
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Split out the function that checks for the GPU device being ready at
probe.

Move the code that waits for the GPU to be ready through BAR0 register
reads into a separate function. This helps reuse the code.

This also fixes a bug where the return status in case of timeout gets
overridden by the return from pci_enable_device. With the fix, a
timeout generates an error as originally intended (sketched below).
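
The bug, reduced to a sketch with illustrative names:

/* Before: a timeout reported in ret is immediately clobbered. */
ret = nvgrace_gpu_wait_device_ready(nvdev);
ret = pci_enable_device(pdev);

/* After: the timeout is propagated as intended. */
ret = nvgrace_gpu_wait_device_ready(nvdev);
if (ret)
        return ret;
ret = pci_enable_device(pdev);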

Fixes: d85f69d ("vfio/nvgrace-gpu: Check the HBM training and C2C link status")

Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Shameer Kolothum <skolothumtho@nvidia.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251127170632.3477-5-ankita@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 7d05507)
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Introduce a new flag, reset_done, to record that the GPU has just
been reset and the mapping to the GPU memory has been zapped.

Implement the reset_done handler to set this new variable. It will
be used later in the series to wait for the GPU memory to be ready
before doing any mapping or access.

Cc: Jason Gunthorpe <jgg@ziepe.ca>
Reviewed-by: Shameer Kolothum <skolothumtho@nvidia.com>
Suggested-by: Alex Williamson <alex@shazbot.org>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251127170632.3477-6-ankita@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(backported from commit dfe7654)
[ankita: Minor conflict resolution to include egm_node]
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Speculative prefetches from the CPU to GPU memory before the GPU is
ready after reset can cause harmless corrected RAS events to be
logged on Grace systems. It is thus preferred that the mapping not
be re-established until the GPU is ready post reset.

GPU readiness can be checked through BAR0 registers, similar to the
check done at the time of device probe.

It can take several seconds for the GPU to be ready, so it is
desirable that this time overlap as much of the VM startup as
possible to reduce the impact on VM bootup time. The GPU readiness
state is therefore checked on the first fault/huge_fault request or
read/write access, which amortizes the GPU readiness time.

The first fault and read/write check the GPU state when the
reset_done flag is set, which denotes that the GPU has just been
reset. The memory_lock is taken across map/access to avoid races
with GPU reset.

Also check that memory is enabled before waiting for the GPU to be
ready. Otherwise the readiness check would block for 30s.

Lastly, add PM handling wrapping around read/write access. A hedged
sketch of the readiness gate follows.
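
In sketch form; the identifiers and exact locking are illustrative,
while __vfio_pci_memory_enabled is the existing vfio-pci core helper:

int ret = 0;

down_read(&nvdev->core_device.memory_lock);
if (nvdev->reset_done) {
        /* Don't block ~30s polling a device whose memory is disabled. */
        if (!__vfio_pci_memory_enabled(&nvdev->core_device))
                ret = -EIO;
        else
                ret = nvgrace_gpu_wait_device_ready(nvdev); /* BAR0 poll */
        if (!ret)
                nvdev->reset_done = false;
}
up_read(&nvdev->core_device.memory_lock);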

Cc: Shameer Kolothum <skolothumtho@nvidia.com>
Cc: Alex Williamson <alex@shazbot.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Vikram Sethi <vsethi@nvidia.com>
Reviewed-by: Shameer Kolothum <skolothumtho@nvidia.com>
Suggested-by: Alex Williamson <alex@shazbot.org>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251127170632.3477-7-ankita@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(backported from commit a23b106)
[ankita: Minor conflict resolution for header files]
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
The memory failure handling implementation for PFNMAP memory with no
struct pages is faulty.  The VA of the mapping is determined based on
the PFN.  It should instead be based on the file mapping offset.

On the occurrence of poison, memory_failure_pfn is triggered on the
poisoned PFN.  Introduce a callback function that allows mm to translate
the PFN to the corresponding file page offset.  The kernel module using
the registration API must implement the callback function and provide
the translation.  The translated value is then used to determine the VA
information and send SIGBUS to the usermode processes mapped to the
poisoned PFN.

The callback is also useful for the driver to be notified of the
poisoned PFN, which it may then track.
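
The callback contract, sketched with illustrative op and member names:

struct pfn_address_space_ops {
        /*
         * Translate a poisoned PFN into an offset within the registered
         * address_space.  MM uses the returned file offset, not the
         * PFN, to find the mapping VMAs and deliver SIGBUS; the call
         * also serves as the driver's poison notification.
         */
        int (*pfn_to_offset)(struct pfn_address_space *pfn_space,
                             unsigned long pfn, pgoff_t *offset);
};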

Link: https://lkml.kernel.org/r/20251211070603.338701-2-ankita@nvidia.com
Fixes: 2ec4196 ("mm: handle poisoning of pfn without struct pages")
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Matthew R. Ochs <mochs@nvidia.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Neo Jia <cjia@nvidia.com>
Cc: Vikram Sethi <vsethi@nvidia.com>
Cc: Yishai Hadas <yishaih@nvidia.com>
Cc: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit e6dbcb7)
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>

nvmochs commented Jan 21, 2026

I reviewed Ankit's branch prior to him posting this PR, so we've already iterated on my feedback.

@ankita-nv I do have one additional comment. Now that your 2-patch series is present in linux-next, can you pick these patches from there instead of backporting from LKML?
linux-next:
e5f19b619fa0 vfio/nvgrace-gpu: register device memory for poison handling
205e6d17cdf5 mm: add stubs for PFNMAP memory failure registration functions

That will allow us to drop the NVIDIA: SAUCE tags from those 2 patches. Also please be sure to add "linux-next" after the SHA on the cherry-pick line (pick with -s -x).

e.g. (cherry picked from commit <sha> linux-next)

Add stubs to handle CONFIG_MEMORY_FAILURE being disabled (sketched below).
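
A sketch of the stubs, reusing the illustrative names from the
registration sketch earlier in this PR:

#ifdef CONFIG_MEMORY_FAILURE
int register_pfn_address_space(struct pfn_address_space *pfn_space);
void unregister_pfn_address_space(struct pfn_address_space *pfn_space);
#else
static inline int register_pfn_address_space(struct pfn_address_space *pfn_space)
{
        return 0;
}
static inline void unregister_pfn_address_space(struct pfn_address_space *pfn_space)
{
}
#endif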

Suggested-by: Alex Williamson <alex@shazbot.org>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20260115202849.2921-2-ankita@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit 205e6d17cdf5b7f7b221bf64be9850eabce429c9 linux-next)
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
The nvgrace-gpu module [1] maps the device memory to the user VA (QEMU)
without adding the memory to the kernel. The device memory pages are
PFNMAP and not backed by struct page. The module can thus utilize the
MM's PFNMAP memory_failure mechanism that handles ECC/poison on regions
with no struct pages.

The kernel MM code exposes register/unregister APIs allowing modules to
register the device memory for memory_failure handling. Make nvgrace-gpu
register the GPU memory with the MM on open.

The module registers its memory region and the address_space with the
kernel MM for ECC handling, and implements a callback function to
convert the PFN to the file page offset. The callback function checks
that the PFN belongs to the device memory region and is also contained
in the VMA range; an error is returned otherwise.

Link: https://lore.kernel.org/all/20240220115055.23546-1-ankita@nvidia.com/ [1]
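
A hedged sketch of that callback, with illustrative identifiers
consistent with the earlier sketches in this PR; the VMA containment
check is reduced to a comment:

static int nvgrace_gpu_pfn_to_offset(struct pfn_address_space *pfn_space,
                                     unsigned long pfn, pgoff_t *offset)
{
        struct nvgrace_gpu_pci_core_device *nvdev =
                container_of(pfn_space, typeof(*nvdev), pfn_space);
        unsigned long start = PHYS_PFN(nvdev->usemem.memphys);
        unsigned long npfns = nvdev->usemem.memlength >> PAGE_SHIFT;

        /* The PFN must fall inside the registered device memory region;
         * the real callback also verifies it lies within the VMA. */
        if (pfn < start || pfn >= start + npfns)
                return -EINVAL;

        *offset = pfn - start;
        return 0;
}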

Suggested-by: Alex Williamson <alex@shazbot.org>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: Jiaqi Yan <jiaqiyan@google.com>
Link: https://lore.kernel.org/r/20260115202849.2921-3-ankita@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
(cherry picked from commit e5f19b619fa0b691ccb537d72240bd20eb72087c linux-next)
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
…y_failure

EGM carveout memory is mapped directly into userspace (QEMU) and is not
added to the kernel. It is not managed by the kernel page allocator and
has no struct pages. The module can thus utilize the Linux memory manager's
memory_failure mechanism for regions with no struct pages. The Linux MM
code exposes register/unregister APIs allowing modules to register such
memory regions for memory_failure handling.

Register the EGM PFN range with the MM memory_failure infrastructure on
open, and unregister it on the last close. Provide a PFN-to-VMA offset
callback that validates the PFN is within the EGM region and the VMA,
then converts it to a file offset and records the poisoned offset in the
existing hashtable for reporting to userspace.
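
A hedged sketch of recording the poisoned offset; the type, field, and
hashtable names are illustrative:

#include <linux/hashtable.h>

struct egm_poison_entry {
        pgoff_t offset;
        struct hlist_node node;
};

/* Record one poisoned file offset so it can be reported to userspace. */
static void egm_record_poison(struct egm_dev *egmdev, pgoff_t offset)
{
        struct egm_poison_entry *entry;

        entry = kzalloc(sizeof(*entry), GFP_KERNEL);
        if (!entry)
                return;

        entry->offset = offset;
        hash_add(egmdev->poison_hash, &entry->node, offset);
}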

Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
@ankita-nv ankita-nv force-pushed the 24.04_linux-nvidia-6.17-next-0120-cleanup-ecc-pfnmap-egm-0121 branch from 16a866a to 72033a0 on January 21, 2026 18:26
@ankita-nv ankita-nv (Author) commented

Thanks Matt, branch refreshed.

@nvmochs nvmochs self-requested a review January 21, 2026 18:52
@nvmochs nvmochs requested review from clsotog and nirmoy January 21, 2026 18:52

@nvmochs nvmochs left a comment

No further comments/issues from me.

Acked-by: Matthew R. Ochs <mochs@nvidia.com>

clsotog commented Jan 22, 2026

@ankita-nv
The last patch has not been sent for upstream review?

@ankita-nv ankita-nv (Author) commented

No, not yet. It is an update to the EGM code based on the new poison handling code that was recently upstreamed.

clsotog commented Jan 22, 2026

> No, not yet. It is an update to the EGM code based on the new poison handling code that was recently upstreamed.

Alright thanks for the response.

@clsotog clsotog left a comment

Acked-by: Carol L Soto <csoto@nvidia.com>

nvmochs commented Jan 22, 2026

PR sent to Canonical.
