Vulkan mistakes

Here lies a list of numerous mistakes in this Vulkan renderer that might become issues one day:

## Not using a Vulkan loader
By using the direct Vulkan library linking (`-lvulkan`) you are missing some good features of this API like :
 * Loading only used Vulkan functions
 * Being able to use an ICD
 * Loading device-optimized function pointers
 * Avoiding layer functions between your app and the driver
 
I know this is only an mlx but if you want to use Vulkan you should go for the optimized way, otherwise just stick to a higher level API like SDL_GPU or WebGPU.

## Not using a dedicated GPU allocator
Some drivers (like NVK) only allow 1024 living allocations per Vulkan applications. By doing a new allocation per image/buffer you are using a *very not scalable* method that will surely cause issues later. I've seen lot of students creating a single new images per frame or some other black magic things like so.

An easy fix would be to use a widely known and industry tested GPU allocator like [VMA](https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator) or make yours like I did [here](https://github.com/Kbz-8/GimmeGPUMemory) in C++ for educational purposes.

## *Difficult to understand* function names
Some function names are just not helping to understand what the function does like `mlx___vulkan_img_sampler`, which creates a new sampler, or `mlx___vulkan_img_mem_sync` which flushes device memory (?) while I'd expect it to insert a memory barrier on it or something.

## Not recreating the swapchain when necessary
I know this mlx can't handle window resize once it is created so swapchain recreation is not needed. Thus said, on Wayland, sometimes the compositor creates the window then resizes it on the first rendering which invalidates the swapchain and implies a recreation.

## Not waiting for frame's fence before acquiring new swapchain image
This is linked to the issue just above. Waiting for the frame's fence [after the swapchain image acquisition](https://github.com/42school/mlx_CLXV/blob/4d1ddd11b55489196bbdd8eaf1b1d0700e779a71/src/gpu/mlx___vulkan_draw.c#L248) can lead to deadlocks which is explained [here](https://vulkan-tutorial.com/Drawing_a_triangle/Swap_chain_recreation#page_Fixing-a-deadlock).

I'd recommend you to move the `vkWaitForFences` above the `vkAcquireNextImageKHR`.

## Missing **A LOT** of potential error messages
A lot of times you just `return 1;` on a Vulkan issue without any error messages like [here](https://github.com/42school/mlx_CLXV/blob/4d1ddd11b55489196bbdd8eaf1b1d0700e779a71/src/gpu/mlx___vulkan_window.c#L236), [here](https://github.com/42school/mlx_CLXV/blob/4d1ddd11b55489196bbdd8eaf1b1d0700e779a71/src/gpu/mlx___vulkan_window.c#L248), or [here](https://github.com/42school/mlx_CLXV/blob/4d1ddd11b55489196bbdd8eaf1b1d0700e779a71/src/gpu/mlx___vulkan_draw.c#L276).

## Use the transfer queue
Almost all drivers have a dedicated transfer queue. Most of your GPU operations are transfers and memory synchronization. By only using the graphics queue you are bloating the rendering process with random memory management operations and you are missing some important drivers optimizations.

## Try to use a push constant
Your uniform buffer may be optimized to reduce its size. If you manage to do that you should pass it to the shader via a push constant. This would remove memory allocations for the buffer, increase the rendering speed and simplify the rendering process.

## Difficult to navigate and understand architecture and code
While navigating through the code to make this issue I found it hard to read as there are lots of poorly named variables and the renderer's global architecture is *too simple*. It would be better to have a separated file for pipelines, swapchain, descriptors and buffer managements, ...

## Blob of minor mistakes
* Using `printf` for some error messages like in [here](https://github.com/42school/mlx_CLXV/blob/4d1ddd11b55489196bbdd8eaf1b1d0700e779a71/src/gpu/mlx___vulkan_draw.c#L241C7-L241C13)
* Use better name for [this](https://github.com/42school/mlx_CLXV/blob/4d1ddd11b55489196bbdd8eaf1b1d0700e779a71/src/gpu/mlx___vulkan_init.c#L14) and use direct Vulkan physical device types enums.
* Why [`3*(vkwin->swch_images_nb+1)`](https://github.com/42school/mlx_CLXV/blob/4d1ddd11b55489196bbdd8eaf1b1d0700e779a71/src/gpu/mlx___vulkan_window.c#L231C1-L231C35) ? I guess the `3` is the frame in flight count. If so, use your constant define. And if so this is useless. You just need a single semaphore per swapchain image, not `FRAME_IN_FLIGHT_COUNT * SWAPCHAIN_IMAGE_COUNT`.
* Why do you have SPIR-V shader files AND them embedded in a header ?

<br>

### This issue presents a non exhaustive list of mistakes that may be modified in the future as I did not have a lot a time to write it.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vulkan mistakes #3

Not using a Vulkan loader

Not using a dedicated GPU allocator

Difficult to understand function names

Not recreating the swapchain when necessary

Not waiting for frame's fence before acquiring new swapchain image

Missing A LOT of potential error messages

Use the transfer queue

Try to use a push constant

Difficult to navigate and understand architecture and code

Blob of minor mistakes

This issue presents a non exhaustive list of mistakes that may be modified in the future as I did not have a lot a time to write it.

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Vulkan mistakes #3

Description

Not using a Vulkan loader

Not using a dedicated GPU allocator

Difficult to understand function names

Not recreating the swapchain when necessary

Not waiting for frame's fence before acquiring new swapchain image

Missing A LOT of potential error messages

Use the transfer queue

Try to use a push constant

Difficult to navigate and understand architecture and code

Blob of minor mistakes

This issue presents a non exhaustive list of mistakes that may be modified in the future as I did not have a lot a time to write it.

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions