fix(visualizer): match Vulkan physical device to CUDA device on multi-GPU systems #1315
Merged
MrNeRF merged 1 commit on Jun 18, 2026
…-GPU
pickPhysicalDevice() selected the first discrete GPU in Vulkan's enumeration order without regard to which GPU CUDA uses. On multi-GPU systems (especially two identical cards), Vulkan's order can differ from CUDA's, so the viewer initializes on a different physical GPU than the trainer.
The CUDA<->Vulkan zero-copy interop then exports a memory block on the CUDA device and tries to import it into Vulkan on the other card. The import fails with VK_ERROR_OUT_OF_DEVICE_MEMORY, the legacy fallback path performs cross-device CUDA work that raises cudaErrorIllegalAddress, and the poisoned CUDA context makes every subsequent allocation fail. The user-visible result is an immediate, misleading "CUDA out of memory" crash on a GPU with tens of GB free.
Prefer the discrete GPU whose UUID matches CUDA device 0, falling back to the previous "first discrete GPU" behavior when no match is found, so single-GPU systems are unaffected.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
Updates Vulkan physical device selection to prefer the discrete GPU that matches the CUDA device UUID, improving CUDA↔Vulkan external-memory interop reliability on multi-GPU systems.
Changes:
- Added a UUID-based matcher between Vulkan physical devices and a CUDA device.
- Updated GPU selection to prefer the CUDA-matched discrete GPU while preserving legacy fallback behavior.
Comment on lines +1251 to +1253
if (vulkanDeviceMatchesCudaDevice(device, 0)) {
    physical_device_ = device;
    break;
Comment on lines +100 to +113
[[nodiscard]] bool vulkanDeviceMatchesCudaDevice(const VkPhysicalDevice device, const int cuda_device) {
    cudaDeviceProp cuda_props{};
    if (cudaGetDeviceProperties(&cuda_props, cuda_device) != cudaSuccess) {
        return false;
    }
    VkPhysicalDeviceIDProperties vk_id{};
    vk_id.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES;
    VkPhysicalDeviceProperties2 vk_props2{};
    vk_props2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    vk_props2.pNext = &vk_id;
    vkGetPhysicalDeviceProperties2(device, &vk_props2);
    static_assert(sizeof(cuda_props.uuid.bytes) == VK_UUID_SIZE);
    return std::memcmp(cuda_props.uuid.bytes, vk_id.deviceUUID, VK_UUID_SIZE) == 0;
}
// cards, Vulkan's enumeration order can differ from CUDA's; matching by
// UUID lets pickPhysicalDevice keep the viewer on the same card as the
// trainer so CUDA<->Vulkan external-memory interop can import the block.
[[nodiscard]] bool vulkanDeviceMatchesCudaDevice(const VkPhysicalDevice device, const int cuda_device) {
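The matcher compares cudaDeviceProp::uuid.bytes (a char array) against VkPhysicalDeviceIDProperties::deviceUUID (a uint8_t array) with std::memcmp. A minimal, API-free sketch of why a byte-wise memcmp suits that comparison: memcmp compares bytes as unsigned char, so the result does not depend on whether plain char is signed. The names uuidBytesEqual, exampleHighBit, and kUuidSize are hypothetical, and kUuidSize is assumed to equal VK_UUID_SIZE (16).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical constant; assumed to equal VK_UUID_SIZE (16 in the Vulkan headers).
constexpr std::size_t kUuidSize = 16;

// Hypothetical helper mirroring the field types: cudaUUID_t::bytes is char[16],
// VkPhysicalDeviceIDProperties::deviceUUID is uint8_t[16]. std::memcmp compares
// both as unsigned char, so the signedness of plain char does not matter.
bool uuidBytesEqual(const char (&cuda_bytes)[kUuidSize],
                    const std::uint8_t (&vk_bytes)[kUuidSize]) {
    return std::memcmp(cuda_bytes, vk_bytes, kUuidSize) == 0;
}

// Usage: a UUID byte with the high bit set.
void exampleHighBit() {
    char cuda[kUuidSize] = {};
    std::uint8_t vk[kUuidSize] = {};
    cuda[0] = static_cast<char>(0xFF);  // -1 where char is signed
    vk[0] = 0xFF;                       // 255
    assert(uuidBytesEqual(cuda, vk));   // memcmp sees the byte 0xFF in both
    // An element-wise cuda[i] == vk[i] check would compare -1 with 255
    // on signed-char platforms and report a false mismatch.
}
```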
Owner
Thx!
Summary
On multi-GPU systems, starting GUI training crashes immediately with a misleading
"CUDA out of memory" error - even with tens of GB of VRAM free. The root cause is
that the Vulkan viewer and the CUDA trainer can end up on different physical GPUs,
which breaks the CUDA↔Vulkan zero-copy interop.
Root cause
VulkanContext::pickPhysicalDevice() selects the first discrete GPU in Vulkan's enumeration order and never reconciles it with the GPU CUDA uses (device 0). On machines with two GPUs, especially two identical cards, Vulkan's enumeration order does not necessarily match CUDA's, so the viewer initializes on one card and the trainer on the other.
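To make this failure mode concrete, here is a small sketch that models both enumerations as plain vectors. It assumes two identical discrete cards that CUDA and Vulkan list in opposite orders. GpuEntry, firstDiscrete, exampleOrderMismatch, and the card name "GPU X" are hypothetical; firstDiscrete only models the pre-fix "first discrete GPU" policy, not the real VulkanContext code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

using Uuid = std::array<std::uint8_t, 16>;

// Hypothetical model of one GPU entry in an API's enumeration order.
struct GpuEntry {
    std::string name;  // two cards of the same model report the same name
    Uuid uuid;         // unique per physical card
    bool discrete;
};

// Models the pre-fix policy: first discrete GPU in Vulkan's enumeration order.
std::optional<std::size_t> firstDiscrete(const std::vector<GpuEntry>& vulkan_order) {
    for (std::size_t i = 0; i < vulkan_order.size(); ++i) {
        if (vulkan_order[i].discrete) return i;
    }
    return std::nullopt;
}

// Two identical cards, enumerated in opposite orders by CUDA and Vulkan.
void exampleOrderMismatch() {
    Uuid a{}; a[0] = 0xAA;
    Uuid b{}; b[0] = 0xBB;
    const std::vector<GpuEntry> cuda_order = {{"GPU X", a, true}, {"GPU X", b, true}};
    const std::vector<GpuEntry> vulkan_order = {{"GPU X", b, true}, {"GPU X", a, true}};
    const std::size_t vk = firstDiscrete(vulkan_order).value();
    // The names agree, so a name check cannot tell the cards apart...
    assert(vulkan_order[vk].name == cuda_order[0].name);
    // ...but the UUIDs show the viewer picked a different card than CUDA device 0.
    assert(vulkan_order[vk].uuid != cuda_order[0].uuid);
}
```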
When training starts, the exportable-interop allocator exports a CUDA VMM block on the
CUDA device and tries to import it into Vulkan on the other card:
exportable_storage.cpp Exportable CUDA block: device_ptr=0x... committed=1184 MiB
vulkan_context.cpp Vulkan: vkAllocateMemory(import) failed: VK_ERROR_OUT_OF_DEVICE_MEMORY
training_manager.cpp Exportable-interop allocator failed (...); falling back to legacy Vulkan-external allocator
pinned_memory_allocator.cpp cudaEventQuery failed: an illegal memory access was encountered
tensor.cpp cudaErrorIllegalAddress: an illegal memory access was encountered
training_manager.cpp Failed to initialize SplatData: CUDA out of memory: failed to allocate 21033300 bytes (0.02 GB).
The import fails with VK_ERROR_OUT_OF_DEVICE_MEMORY; the legacy fallback then performs cross-device CUDA work that raises cudaErrorIllegalAddress, which poisons the CUDA context so every subsequent allocation fails. The allocator surfaces those failures as "out of memory", hence the misleading message on a nearly empty GPU.
Notably, verifyCudaMatchesVulkanDevice() in cuda_vulkan_interop.cpp already detects this exact mismatch and suggests setting CUDA_VISIBLE_DEVICES, confirming that the two APIs can diverge; this PR makes the selection align automatically.
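A helper in the spirit of that check might look like this sketch. It is not the real verifyCudaMatchesVulkanDevice(), whose signature and message are not shown in this PR; mismatchHint, exampleHint, and the returned strings are assumptions. It takes the UUIDs in CUDA ordinal order plus the UUID of the Vulkan-selected card and names the CUDA ordinal that CUDA_VISIBLE_DEVICES would need to expose as device 0.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

using Uuid = std::array<std::uint8_t, 16>;

// Hypothetical helper, NOT the real verifyCudaMatchesVulkanDevice().
// cuda_uuids[i] is the UUID of CUDA ordinal i; vulkan_uuid is the card the
// viewer picked. Returns std::nullopt when CUDA device 0 is that card,
// otherwise a human-readable hint.
std::optional<std::string> mismatchHint(const std::vector<Uuid>& cuda_uuids,
                                        const Uuid& vulkan_uuid) {
    if (!cuda_uuids.empty() && cuda_uuids[0] == vulkan_uuid) {
        return std::nullopt;  // same physical GPU; interop can proceed
    }
    for (std::size_t i = 1; i < cuda_uuids.size(); ++i) {
        if (cuda_uuids[i] == vulkan_uuid) {
            // Exposing only ordinal i makes that card CUDA device 0.
            return "set CUDA_VISIBLE_DEVICES=" + std::to_string(i);
        }
    }
    return std::string("Vulkan device is not visible to CUDA");
}

// Usage: the viewer picked the card CUDA enumerates second.
void exampleHint() {
    Uuid a{}; a[0] = 0xAA;
    Uuid b{}; b[0] = 0xBB;
    const std::vector<Uuid> cuda_uuids = {a, b};
    assert(!mismatchHint(cuda_uuids, a).has_value());
    assert(mismatchHint(cuda_uuids, b).value() == "set CUDA_VISIBLE_DEVICES=1");
}
```

CUDA_VISIBLE_DEVICES indices follow CUDA's own enumeration order, which CUDA_DEVICE_ORDER=PCI_BUS_ID can change, so an index-based hint is only valid under the environment it was computed in. That fragility is one reason to align the selection automatically instead.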
Fix
In pickPhysicalDevice(), prefer the discrete GPU whose UUID matches CUDA device 0 (via VkPhysicalDeviceIDProperties::deviceUUID vs cudaDeviceProp::uuid). If no match is found, fall back to the previous "first discrete GPU" behavior, so single-GPU systems are completely unaffected.
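The selection policy can be sketched without any Vulkan or CUDA calls, assuming each Vulkan device has been reduced to a discrete flag plus its deviceUUID, and that CUDA device 0's UUID may be absent (for example, if cudaGetDeviceProperties fails, which the real matcher treats as "no match"). DeviceInfo, pickDevice, and exampleSelection are hypothetical names; the actual change lives in pickPhysicalDevice() and calls vulkanDeviceMatchesCudaDevice(device, 0). What the real code does when no discrete GPU exists is not modeled here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using Uuid = std::array<std::uint8_t, 16>;

// Hypothetical, API-free model of a Vulkan physical device.
struct DeviceInfo {
    bool discrete;  // stands in for deviceType == VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
    Uuid uuid;      // stands in for VkPhysicalDeviceIDProperties::deviceUUID
};

// Prefer the discrete GPU whose UUID matches CUDA device 0; otherwise keep the
// old "first discrete GPU" behavior. Returns an index into `devices`, or
// std::nullopt if there is no discrete GPU at all.
std::optional<std::size_t> pickDevice(const std::vector<DeviceInfo>& devices,
                                      const std::optional<Uuid>& cuda0_uuid) {
    if (cuda0_uuid) {
        for (std::size_t i = 0; i < devices.size(); ++i) {
            if (devices[i].discrete && devices[i].uuid == *cuda0_uuid) {
                return i;  // same physical card as the CUDA trainer
            }
        }
    }
    for (std::size_t i = 0; i < devices.size(); ++i) {
        if (devices[i].discrete) {
            return i;  // legacy fallback: first discrete GPU in Vulkan order
        }
    }
    return std::nullopt;
}

// Usage: two identical discrete cards enumerated in the opposite order to CUDA.
void exampleSelection() {
    Uuid a{}; a[0] = 0xAA;
    Uuid b{}; b[0] = 0xBB;
    const std::vector<DeviceInfo> vk = {{true, b}, {true, a}};
    assert(pickDevice(vk, a) == std::optional<std::size_t>(1));             // UUID match wins
    assert(pickDevice(vk, std::nullopt) == std::optional<std::size_t>(0));  // fallback
}
```

Running the UUID pass first and the legacy pass second means a machine with a single discrete GPU, or one where CUDA device 0 is not visible to Vulkan, ends up on the same device as before the change.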
Reproduction
"CUDA out of memory" crash.
set CUDA_VISIBLE_DEVICES=<index matching the Vulkan card>.
Testing
(Training tensors share one CUDA-exportable VMM block imported into Vulkan — zero-copy viewer interop), and training runs normally with no env var.