Skip to content

Vulkan mistakes #3

@Kbz-8

Description

@Kbz-8

Here lies a list of numerous mistakes in this Vulkan renderer that might become issues one day:

Not using a Vulkan loader

By using the direct Vulkan library linking (-lvulkan) you are missing some good features of this API like :

  • Loading only used Vulkan functions
  • Being able to use an ICD
  • Loading device-optimized function pointers
  • Avoiding layer functions between your app and the driver

I know this is only an mlx but if you want to use Vulkan you should go for the optimized way, otherwise just stick to a higher level API like SDL_GPU or WebGPU.

Not using a dedicated GPU allocator

Some drivers (like NVK) only allow 1024 living allocations per Vulkan applications. By doing a new allocation per image/buffer you are using a very not scalable method that will surely cause issues later. I've seen lot of students creating a single new images per frame or some other black magic things like so.

An easy fix would be to use a widely known and industry tested GPU allocator like VMA or make yours like I did here in C++ for educational purposes.

Difficult to understand function names

Some function names are just not helping to understand what the function does like mlx___vulkan_img_sampler, which creates a new sampler, or mlx___vulkan_img_mem_sync which flushes device memory (?) while I'd expect it to insert a memory barrier on it or something.

Not recreating the swapchain when necessary

I know this mlx can't handle window resize once it is created so swapchain recreation is not needed. Thus said, on Wayland, sometimes the compositor creates the window then resizes it on the first rendering which invalidates the swapchain and implies a recreation.

Not waiting for frame's fence before acquiring new swapchain image

This is linked to the issue just above. Waiting for the frame's fence after the swapchain image acquisition can lead to deadlocks which is explained here.

I'd recommend you to move the vkWaitForFences above the vkAcquireNextImageKHR.

Missing A LOT of potential error messages

A lot of times you just return 1; on a Vulkan issue without any error messages like here, here, or here.

Use the transfer queue

Almost all drivers have a dedicated transfer queue. Most of your GPU operations are transfers and memory synchronization. By only using the graphics queue you are bloating the rendering process with random memory management operations and you are missing some important drivers optimizations.

Try to use a push constant

Your uniform buffer may be optimized to reduce its size. If you manage to do that you should pass it to the shader via a push constant. This would remove memory allocations for the buffer, increase the rendering speed and simplify the rendering process.

Difficult to navigate and understand architecture and code

While navigating through the code to make this issue I found it hard to read as there are lots of poorly named variables and the renderer's global architecture is too simple. It would be better to have a separated file for pipelines, swapchain, descriptors and buffer managements, ...

Blob of minor mistakes

  • Using printf for some error messages like in here
  • Use better name for this and use direct Vulkan physical device types enums.
  • Why 3*(vkwin->swch_images_nb+1) ? I guess the 3 is the frame in flight count. If so, use your constant define. And if so this is useless. You just need a single semaphore per swapchain image, not FRAME_IN_FLIGHT_COUNT * SWAPCHAIN_IMAGE_COUNT.
  • Why do you have SPIR-V shader files AND them embedded in a header ?

This issue presents a non exhaustive list of mistakes that may be modified in the future as I did not have a lot a time to write it.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions