**`next/getting-started/adapter-and-device/the-adapter.md`**
```{note}
As of version `v24.0.0.2`, `wgpu-native` does not implement `wgpuInstanceProcessEvents`. In this specific case, we may skip it, because the adapter request ends right within the call to `wgpuInstanceRequestAdapter`.
```
This is an OK solution, although we still need to manage the `requestEnded` test and the `sleep()` operation ourselves. This solution is **all right for the adapter/device request**, and I do not want to make this chapter any longer, so we will wait until chapter [Playing with buffers](../playing-with-buffers.md) to see **another way**, which gives finer control over the pending asynchronous operations.
**`next/getting-started/playing-with-buffers.md`**

**WIP** *In this version of the guide, this chapter moves back to the "getting started" section, between the command queue and the first (compute) shader.*
In this chapter:
- We see **how to create and manipulate buffers**.
- We refine our control of **asynchronous operations**.
Buffers
-------
Asynchronous operations
-----------------------
##### The good way
**To keep track of ongoing asynchronous operations**, each function that starts such an operation **returns a `WGPUFuture`**, which is some sort of internal ID that **identifies the operation**:

```{note}
Although it is technically just an integer value, the `WGPUFuture` should be treated as an **opaque handle**, i.e., one should not try to deduce anything from the very value of this ID.
```
This *future* can then be passed to `wgpuInstanceWaitAny` to mean "wait until this asynchronous operation completes"! Here is the signature of `wgpuInstanceWaitAny`:
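For reference, in recent versions of `webgpu.h` (from the webgpu-headers project) this signature looks as follows; check your own copy of the header, as the API is still evolving:

```C++
WGPUWaitStatus wgpuInstanceWaitAny(
    WGPUInstance instance,
    size_t futureCount,
    WGPUFutureWaitInfo * futures,
    uint64_t timeoutNS
);
```

Each `WGPUFutureWaitInfo` entry wraps a `WGPUFuture` together with a `completed` boolean that the call fills in, and `timeoutNS` bounds how long the call may block.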
**`next/getting-started/the-command-queue.md`**

One important thing to keep in mind when doing graphics programming: we have **two processors**:
1. **The code we write runs on the CPU**, and some of it triggers operations on the GPU. The only exception is *shaders*, which actually run on the GPU.
2. Processors are "**far away**", meaning that communicating between them **takes time**.
They are not too far, but for high performance applications like real-time graphics, or when manipulating large amounts of data like in machine learning, this matters, for two reasons:
### Bandwidth
Since the GPU is meant for **massive parallel data processing**, its performance can easily be **bound by the memory transfers** rather than the actual computation.
As it turns out, the **memory bandwidth limits** are more often hit within the GPU itself, **between its storage and its compute units**, but the CPU-GPU bandwidth is also limited, which one feels when trying to transfer large textures too often for instance.
```{note}
The connection between the **CPU memory** (RAM) and the **GPU memory** (VRAM) depends on the type of GPU. Some GPUs are **integrated** within the same chip as the CPU, so they share the same memory. A **discrete** GPU is typically connected through a PCIe bus. And an **external** GPU would be connected with a Thunderbolt cable for instance. Each has a different bandwidth.
```
### Latency
**Even the smallest bit of information** needs some time for the round trip to the GPU and back. As a consequence, functions that send instructions to the GPU return almost immediately: they **do not wait for the instruction to have actually been executed**, because that would require waiting for the GPU to transfer back the "I'm done" information.
Instead, the commands intended for the GPU are **batched** and fired through a **command queue**. The GPU consumes this queue **whenever it is ready**. This is what we detail in this chapter.
### Timelines
The CPU side of our program, i.e., the C++ code that we write, lives in the **Content timeline**. The other side of the command queue lives in the **Queue timeline**, running on the GPU.
```{note}
There is also a **Device timeline** defined in [WebGPU's documentation](https://www.w3.org/TR/webgpu/#programming-model-timelines). It corresponds to the GPU operations for which our code actually waits for an immediate answer (called "synchronous" calls), but unlike the JavaScript API, it is roughly the same as the content timeline in our C++ case.
```
In the remainder of this chapter:
- We see **how to manipulate the queue**.
- We refine our control of **asynchronous operations**.
Manipulating the Queue
----------------------
### Queue operations
Our WebGPU device has **a single queue**, which is used to send both **commands** and **data**. We can get it with `wgpuDeviceGetQueue`.
```{lit} C++, Get Queue
WGPUQueue queue = wgpuDeviceGetQueue(device);
```
Naturally, we must also release the queue once we no longer use it, at the end of the program:
```{lit} C++, Release things (prepend)
// At the end
wgpuQueueRelease(queue);
```
```{note}
**Other graphics APIs** allow one to build **multiple queues** per device, and future versions of WebGPU might as well. But for now, one queue is already more than enough for us to play with!
```
Looking at `webgpu.h`, we find mainly **3 different means** to submit work to this queue:
- `wgpuQueueSubmit` sends **commands**, i.e., instructions of what to execute on the GPU.
- `wgpuQueueWriteBuffer` sends **data** from a CPU-side buffer to a **GPU-side buffer**.
- `wgpuQueueWriteTexture` sends **data** from a CPU-side buffer to a **GPU-side texture**.
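
As a first taste of these, here is a minimal sketch of a `wgpuQueueWriteBuffer` call; the `buffer` object is a placeholder here, assumed to have been created beforehand with the `CopyDst` usage (buffer creation is the topic of the buffers chapter):

```C++
#include <vector>

// Upload 16 floats from CPU memory to the beginning of a GPU-side buffer
// (assumes `buffer` was created with WGPUBufferUsage_CopyDst):
std::vector<float> data(16, 1.0f);
wgpuQueueWriteBuffer(queue, buffer, 0, data.data(), data.size() * sizeof(float));
```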
We can note that all these functions have a `void` return type: they send instructions/data to the GPU and return immediately **without waiting for an answer from the GPU**.
The only way to **get information back** is through `wgpuQueueOnSubmittedWorkDone`, an **asynchronous operation** whose callback gets invoked once the GPU confirms that it has (tried to) execute the commands. We show an example below.
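
Here is what such an example may look like, assuming the callback info style of recent `webgpu.h` headers (exact callback signatures have changed across versions, so treat this as a sketch):

```C++
#include <iostream>

// This callback runs (asynchronously) once the GPU has processed
// all the work submitted before the call below:
auto onQueueWorkDone = [](WGPUQueueWorkDoneStatus status, void* /* userdata1 */, void* /* userdata2 */) {
    std::cout << "Queued work finished with status: " << status << std::endl;
};

WGPUQueueWorkDoneCallbackInfo callbackInfo = {};
callbackInfo.mode = WGPUCallbackMode_AllowProcessEvents;
callbackInfo.callback = onQueueWorkDone;
wgpuQueueOnSubmittedWorkDone(queue, callbackInfo);
```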
### Submitting commands
Commands are submitted through `wgpuQueueSubmit`, whose usage looks as follows:

```C++
wgpuQueueSubmit(queue, /* number of commands */, /* pointer to the command array */);
```
We recognize here the typical way of sending arrays (briefly mentioned in chapter [The Device](adapter-and-device/the-adapter.md)). WebGPU is a C API, so whenever it needs to receive an array of things, we first provide **the array size** and then **a pointer to the first element**.
#### Array argument
If we have a **single element**, it is simply done like so:
```C++
// With a single command:
WGPUCommandBuffer command = /* [...] */;
wgpuQueueSubmit(queue, 1, &command);
wgpuCommandBufferRelease(command); // release command buffer once submitted
```
If we know at **compile time** ("statically") the number of commands, we may use a C array, or a `std::array` (which is safer):
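
For instance, with a `std::array` (a minimal sketch; the command buffer contents are placeholders to fill in):

```C++
#include <array>

// With a statically known number of commands:
std::array<WGPUCommandBuffer, 2> commands = { /* [...] */ };
wgpuQueueSubmit(queue, commands.size(), commands.data());
```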
In any case, do not forget to **release** the command buffers once they have been submitted:
```C++
// Release:
for (auto cmd : commands) {
    wgpuCommandBufferRelease(cmd);
}
```
> 🤔 Hey but what about **creating these buffers**, to begin with?
A command buffer, which has type `WGPUCommandBuffer`, is not a buffer that we directly create! This buffer uses a special format that is left to the discretion of your driver/hardware. To build this buffer, we use a **command encoder**.
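
To give an idea of what this looks like, here is a hedged sketch of creating such an encoder (the descriptor is kept minimal; the encoder API itself comes next):

```C++
// Create a command encoder, from which command buffers are then built:
WGPUCommandEncoderDescriptor encoderDesc = {};
encoderDesc.nextInChain = nullptr;
WGPUCommandEncoder encoder = wgpuDeviceCreateCommandEncoder(device, &encoderDesc);
```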