fix(visualizer): match Vulkan physical device to CUDA device on multi-GPU systems#1315

Merged: MrNeRF merged 1 commit into MrNeRF:master from iliesaya:fix/multi-gpu-cuda-vulkan-device-match on Jun 18, 2026

@iliesaya

Contributor

Summary

On multi-GPU systems, starting GUI training crashes immediately with a misleading
"CUDA out of memory" error - even with tens of GB of VRAM free. The root cause is
that the Vulkan viewer and the CUDA trainer can end up on different physical GPUs,
which breaks the CUDA↔Vulkan zero-copy interop.

Root cause

VulkanContext::pickPhysicalDevice() selects the first discrete GPU in Vulkan's
enumeration order and never reconciles it with the GPU CUDA uses (device 0). On
machines with two GPUs - especially two identical cards - Vulkan's enumeration order
does not necessarily match CUDA's, so the viewer initializes on one card and the
trainer on the other.

When training starts, the exportable-interop allocator exports a CUDA VMM block on the
CUDA device and tries to import it into Vulkan on the other card:

exportable_storage.cpp Exportable CUDA block: device_ptr=0x... committed=1184 MiB
vulkan_context.cpp Vulkan: vkAllocateMemory(import) failed: VK_ERROR_OUT_OF_DEVICE_MEMORY
training_manager.cpp Exportable-interop allocator failed (...); falling back to legacy Vulkan-external allocator
pinned_memory_allocator.cpp cudaEventQuery failed: an illegal memory access was encountered
tensor.cpp cudaErrorIllegalAddress: an illegal memory access was encountered
training_manager.cpp Failed to initialize SplatData: CUDA out of memory: failed to allocate 21033300 bytes (0.02 GB).

The import fails with VK_ERROR_OUT_OF_DEVICE_MEMORY; the legacy fallback then performs
cross-device CUDA work that raises cudaErrorIllegalAddress, which poisons the CUDA
context so every subsequent allocation fails. The allocator surfaces those failures as
"out of memory" — hence the misleading message on a nearly empty GPU.

Notably, verifyCudaMatchesVulkanDevice() in cuda_vulkan_interop.cpp already detects
this exact mismatch and suggests setting CUDA_VISIBLE_DEVICES, confirming the two APIs
can diverge — this PR makes the selection align automatically.
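
When diagnosing such a mismatch, it can help to log both 16-byte UUIDs in a readable form and compare them by eye. The following is a minimal sketch, not code from this PR: formatUuid is a hypothetical helper name, and it assumes the raw bytes have already been copied out of VkPhysicalDeviceIDProperties::deviceUUID and cudaDeviceProp::uuid.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of the PR): formats 16 raw UUID bytes as
// lowercase hex in the usual 8-4-4-4-12 grouping.
std::string formatUuid(const std::array<std::uint8_t, 16>& bytes) {
    static constexpr char kHex[] = "0123456789abcdef";
    std::string out;
    out.reserve(36);  // 32 hex digits + 4 dashes
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // Dashes go before byte 4, 6, 8 and 10 to form 8-4-4-4-12.
        if (i == 4 || i == 6 || i == 8 || i == 10) {
            out.push_back('-');
        }
        out.push_back(kHex[bytes[i] >> 4]);
        out.push_back(kHex[bytes[i] & 0x0F]);
    }
    return out;
}
```

Logging the result for the Vulkan device and for CUDA device 0 side by side shows at a glance whether the two APIs ended up on the same card.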

Fix

In pickPhysicalDevice(), prefer the discrete GPU whose UUID matches CUDA device 0
(via VkPhysicalDeviceIDProperties::deviceUUID vs cudaDeviceProp::uuid). If no match
is found, fall back to the previous "first discrete GPU" behavior, so single-GPU
systems are completely unaffected.
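
The selection rule described above can be sketched without any Vulkan or CUDA headers. This is an illustration under stated assumptions, not the PR's implementation: GpuInfo, Uuid, and pickDeviceIndex are hypothetical names, and the UUIDs are assumed to have been read from the two APIs beforehand.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-ins for what the real code queries from Vulkan:
// the 16-byte device UUID and whether the device is a discrete GPU.
using Uuid = std::array<std::uint8_t, 16>;

struct GpuInfo {
    Uuid uuid{};
    bool is_discrete = false;
};

// Returns the index of the GPU to use, in enumeration order.
// cuda_uuid is std::nullopt when the CUDA property query failed.
// Returns std::nullopt when the list contains no discrete GPU.
std::optional<std::size_t> pickDeviceIndex(const std::vector<GpuInfo>& gpus,
                                           const std::optional<Uuid>& cuda_uuid) {
    std::optional<std::size_t> first_discrete;
    for (std::size_t i = 0; i < gpus.size(); ++i) {
        if (!gpus[i].is_discrete) {
            continue;  // only discrete GPUs are candidates
        }
        if (cuda_uuid && gpus[i].uuid == *cuda_uuid) {
            return i;  // preferred: same physical card as CUDA
        }
        if (!first_discrete) {
            first_discrete = i;  // remember the legacy choice as fallback
        }
    }
    return first_discrete;
}
```

With two discrete GPUs, the function returns the one whose UUID equals the CUDA UUID regardless of enumeration order; with one GPU, or when the CUDA query fails, it returns the first discrete GPU, which is the pre-patch behavior.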

Reproduction

  • Hardware: 2× identical NVIDIA GPUs (reproduced on 2× RTX 4090, driver 572.61, CUDA 12.8).
  • Load any COLMAP/transforms dataset and start training in the GUI → instant
    "CUDA out of memory" crash.
  • Workaround without this patch: set CUDA_VISIBLE_DEVICES=<index matching the Vulkan card>.

Testing

  • Before: GUI training crashes at model init with the log above (GPU ~0.5 GB used of 24 GB).
  • After: Vulkan selects the CUDA-matched 4090, interop succeeds
    (Training tensors share one CUDA-exportable VMM block imported into Vulkan — zero-copy viewer interop), and training runs normally with no env var.
  • Single-GPU path unchanged (fallback preserves prior behavior).

…-GPU

pickPhysicalDevice() selected the first discrete GPU in Vulkan's
enumeration order without regard to which GPU CUDA uses. On multi-GPU
systems (especially two identical cards), Vulkan's order can differ from
CUDA's, so the viewer initializes on a different physical GPU than the
trainer.

The CUDA<->Vulkan zero-copy interop then exports a memory block on the
CUDA device and tries to import it into Vulkan on the other card. The
import fails with VK_ERROR_OUT_OF_DEVICE_MEMORY, the legacy fallback path
performs cross-device CUDA work that raises cudaErrorIllegalAddress, and
the poisoned CUDA context makes every subsequent allocation fail. The
user-visible result is an immediate, misleading "CUDA out of memory"
crash on a GPU with tens of GB free.

Prefer the discrete GPU whose UUID matches CUDA device 0, falling back to
the previous "first discrete GPU" behavior when no match is found so
single-GPU systems are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 17, 2026 13:34

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates Vulkan physical device selection to prefer the discrete GPU that matches the CUDA device UUID, improving CUDA↔Vulkan external-memory interop reliability on multi-GPU systems.

Changes:

  • Added a UUID-based matcher between Vulkan physical devices and a CUDA device.
  • Updated GPU selection to prefer the CUDA-matched discrete GPU while preserving legacy fallback behavior.

Comment on lines +1251 to +1253
    if (vulkanDeviceMatchesCudaDevice(device, 0)) {
        physical_device_ = device;
        break;
Comment on lines +100 to +113
    [[nodiscard]] bool vulkanDeviceMatchesCudaDevice(const VkPhysicalDevice device, const int cuda_device) {
        // Query the CUDA side; treat a failed query as "no match".
        cudaDeviceProp cuda_props{};
        if (cudaGetDeviceProperties(&cuda_props, cuda_device) != cudaSuccess) {
            return false;
        }
        // Query the Vulkan side: chain the ID properties struct into
        // VkPhysicalDeviceProperties2 so the driver fills in deviceUUID.
        VkPhysicalDeviceIDProperties vk_id{};
        vk_id.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ID_PROPERTIES;
        VkPhysicalDeviceProperties2 vk_props2{};
        vk_props2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
        vk_props2.pNext = &vk_id;
        vkGetPhysicalDeviceProperties2(device, &vk_props2);
        // Both UUIDs must be the same size for a byte-wise comparison.
        static_assert(sizeof(cuda_props.uuid.bytes) == VK_UUID_SIZE);
        return std::memcmp(cuda_props.uuid.bytes, vk_id.deviceUUID, VK_UUID_SIZE) == 0;
    }
// cards, Vulkan's enumeration order can differ from CUDA's; matching by
// UUID lets pickPhysicalDevice keep the viewer on the same card as the
// trainer so CUDA<->Vulkan external-memory interop can import the block.
[[nodiscard]] bool vulkanDeviceMatchesCudaDevice(const VkPhysicalDevice device, const int cuda_device) {
@shadygm self-requested a review June 17, 2026 18:32
@MrNeRF

MrNeRF commented Jun 18, 2026

Owner

Thx!

@MrNeRF merged commit e2182b4 into MrNeRF:master Jun 18, 2026
5 checks passed

3 participants