Commit 0cb0b98

Start next version of The Command Queue

1 parent f432061

4 files changed: +143 -19 lines changed

next/getting-started/adapter-and-device/the-adapter.md

Lines changed: 1 addition & 1 deletion
@@ -223,7 +223,7 @@ This works, as long as the **callback mode** we set in the callback info is at l
As of version `v24.0.0.2`, `wgpu-native` does not implement `wgpuInstanceProcessEvents`. In this very case, we may skip it because the adapter request ends right within the call to `wgpuInstanceRequestAdapter`.
```

- This is an OK solution, although we still need to manage ourselves the `requestEnded` test and the `sleep()` operation. This solution is **all right for the adapter/device request** and I do not want to make this chapter any longer, so we will wait for chapter [The Command Queue](../the-command-queue.md) to see **another way**, which gives finer control over the pending asynchronous operations.
+ This is an OK solution, although we still need to manage the `requestEnded` test and the `sleep()` operation ourselves. This solution is **all right for the adapter/device request** and I do not want to make this chapter any longer, so we will wait for chapter [Playing with buffers](../playing-with-buffers.md) to see **another way**, which gives finer control over the pending asynchronous operations.
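
For reference, the busy-wait mentioned here looks roughly like the following sketch, where `requestEnded` is a flag raised by the request callback (illustrative names; requires `#include <thread>` and `#include <chrono>`):

```C++
// Minimal sketch of the busy-wait around the asynchronous request.
while (!requestEnded) {
    // Sleep a bit so that we do not monopolize the CPU while waiting.
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
```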

##### With emscripten

next/getting-started/index.md

Lines changed: 2 additions & 0 deletions
@@ -11,4 +11,6 @@ project-setup
hello-webgpu
adapter-and-device/index
the-command-queue
playing-with-buffers
our-first-shader
```

next/getting-started/playing-with-buffers.md

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
Playing with buffers
====================

**WIP** *In this version of the guide, this chapter moves back into the "getting started" section, between the command queue and the first (compute) shader.*

In this chapter:

- We see **how to create and manipulate buffers**.
- We refine our control of **asynchronous operations**.

Buffers
-------
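
As a first taste, creating a buffer may look as follows. This is a minimal sketch using `wgpuDeviceCreateBuffer`; the size and usage flags below are arbitrary illustrative choices:

```C++
// Describe and create a small buffer usable as a copy source and destination.
WGPUBufferDescriptor bufferDesc = {};
bufferDesc.size = 16; // 16 bytes
bufferDesc.usage = WGPUBufferUsage_CopyDst | WGPUBufferUsage_CopySrc;
bufferDesc.mappedAtCreation = false;
WGPUBuffer buffer = wgpuDeviceCreateBuffer(device, &bufferDesc);

// [...] use the buffer

// Release the buffer once we no longer use it:
wgpuBufferRelease(buffer);
```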

Asynchronous operations
-----------------------

##### The good way

**To keep track of ongoing asynchronous operations**, each function that starts such an operation **returns a `WGPUFuture`**, which is some sort of internal ID that **identifies the operation**:

```C++
WGPUFuture adapterRequest = wgpuInstanceRequestAdapter(instance, &options, callbackInfo);
```

```{note}
Although it is technically just an integer value, the `WGPUFuture` should be treated as an **opaque handle**, i.e., one should not try to deduce anything from the very value of this ID.
```

This *future* can then be passed to `wgpuInstanceWaitAny` to mean "wait until this asynchronous operation completes"! Here is the signature of `wgpuInstanceWaitAny`:

```C++
WGPUWaitStatus wgpuInstanceWaitAny(WGPUInstance instance, size_t futureCount, WGPUFutureWaitInfo * futures, uint64_t timeoutNS);
```

Note that the futures are not passed directly: each one is wrapped in a `WGPUFutureWaitInfo` entry, which pairs the `WGPUFuture` with a `completed` flag. Also mind the unit of the timeout, which is expressed in nanoseconds:

```C++
WGPUFutureWaitInfo waitInfo = {};
waitInfo.future = adapterRequest;
uint64_t timeoutNS = 200 * 1000 * 1000; // 200 ms
WGPUWaitStatus status = wgpuInstanceWaitAny(instance, 1, &waitInfo, timeoutNS);
```

next/getting-started/the-command-queue.md

Lines changed: 101 additions & 18 deletions
@@ -26,54 +26,137 @@ One important thing to keep in mind when doing graphics programming: we have **t
1. **The code we write runs on the CPU**, and some of it triggers operations on the GPU. The only exception is *shaders*, which actually run on the GPU.
2. Processors are "**far away**", meaning that communicating between them **takes time**.

They are not too far, but for high-performance applications like real-time graphics, or when manipulating large amounts of data like in machine learning, this matters. For two reasons:

### Bandwidth

Since the GPU is meant for **massive parallel data processing**, its performance can easily be **bound by the memory transfers** rather than the actual computation.

As it turns out, the **memory bandwidth limits** are more often hit within the GPU itself, **between its storage and its compute units**, but the CPU-GPU bandwidth is also limited, which one feels when trying to transfer large textures too often, for instance.

```{note}
The connection between the **CPU memory** (RAM) and the **GPU memory** (VRAM) depends on the type of GPU. Some GPUs are **integrated** within the same chip as the CPU, so they share the same memory. A **discrete** GPU is typically connected through a PCIe wire. And an **external** GPU would be connected with a Thunderbolt wire, for instance. Each has a different bandwidth.
```

### Latency

**Even the smallest bit of information** needs some time for the round trip to and back from the GPU. As a consequence, functions that send instructions to the GPU return almost immediately: they **do not wait for the instruction to have actually been executed**, because that would require waiting for the GPU to transfer back the "I'm done" information.

Instead, the commands intended for the GPU are **batched** and fired through a **command queue**. The GPU consumes this queue **whenever it is ready**. This is what we detail in this chapter.

### Timelines

The CPU side of our program, i.e., the C++ code that we write, lives in the **Content timeline**. The other side of the command queue is in the **Queue timeline**, running on the GPU.

```{note}
There is also a **Device timeline** defined in [WebGPU's documentation](https://www.w3.org/TR/webgpu/#programming-model-timelines). It corresponds to the GPU operations for which our code actually waits for an immediate answer (called "synchronous" calls), but unlike in the JavaScript API, it is roughly the same as the content timeline in our C++ case.
```

In the remainder of this chapter:

- We see **how to manipulate the queue**.
- We refine our control of **asynchronous operations**.

Manipulating the Queue
----------------------

### Queue operations

Our WebGPU device has **a single queue**, which is used to send both **commands** and **data**. We can get it with `wgpuDeviceGetQueue`.

```{lit} C++, Get Queue
WGPUQueue queue = wgpuDeviceGetQueue(device);
```

Naturally, we must also release the queue once we no longer use it, at the end of the program:

```{lit} C++, Release things (prepend)
// At the end
wgpuQueueRelease(queue);
```

```{note}
**Other graphics APIs** allow one to build **multiple queues** per device, and future versions of WebGPU might as well. But for now, one queue is already more than enough for us to play with!
```

Looking at `webgpu.h`, we find mainly **3 different means** to submit work to this queue (their signatures are sketched right after this list):

- `wgpuQueueSubmit` sends **commands**, i.e., instructions of what to execute on the GPU.
- `wgpuQueueWriteBuffer` sends **data** from a CPU-side buffer to a **GPU-side buffer**.
- `wgpuQueueWriteTexture` sends **data** from a CPU-side buffer to a **GPU-side texture**.
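
For reference, here is what two of these signatures look like in `webgpu.h` (a sketch; exact pointer/const qualifiers may vary slightly across header versions):

```C++
// Submit an array of command buffers to the queue.
void wgpuQueueSubmit(WGPUQueue queue, size_t commandCount, WGPUCommandBuffer const * commands);

// Copy `size` bytes from the CPU-side memory `data` into `buffer`, starting at `bufferOffset`.
void wgpuQueueWriteBuffer(WGPUQueue queue, WGPUBuffer buffer, uint64_t bufferOffset, void const * data, size_t size);
```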

We can note that all these functions have a `void` return type: they send instructions/data to the GPU and return immediately, **without waiting for an answer from the GPU**.

The only way to **get information back** is through `wgpuQueueOnSubmittedWorkDone`, which is an **asynchronous operation** whose callback gets invoked once the GPU confirms that it has (tried to) execute the commands. We show an example below.
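
Registering this callback could look as follows. This is only a sketch, following the same callback-info pattern as the adapter request; the exact callback signature depends on the version of `webgpu.h` in use (requires `#include <iostream>`):

```C++
// Sketch: get notified once the work submitted so far has been executed.
auto onQueueWorkDone = [](WGPUQueueWorkDoneStatus status, void* /* userdata1 */, void* /* userdata2 */) {
    std::cout << "Queued work finished with status: " << status << std::endl;
};

WGPUQueueWorkDoneCallbackInfo callbackInfo = {};
callbackInfo.mode = WGPUCallbackMode_AllowProcessEvents;
callbackInfo.callback = onQueueWorkDone;
wgpuQueueOnSubmittedWorkDone(queue, callbackInfo);
```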

### Submitting commands

We submit commands using the following procedure:

```C++
wgpuQueueSubmit(queue, /* number of commands */, /* pointer to the command array */);
```

We recognize here the typical way of sending arrays (briefly mentioned in chapter [The Adapter](adapter-and-device/the-adapter.md)). WebGPU is a C API, so whenever it needs to receive an array of things, we first provide **the array size** and then **a pointer to the first element**.

#### Array argument

If we have a **single element**, it is simply done like so:

```C++
// With a single command:
WGPUCommandBuffer command = /* [...] */;
wgpuQueueSubmit(queue, 1, &command);
wgpuCommandBufferRelease(command); // release command buffer once submitted
```

If we know at **compile time** ("statically") the number of commands, we may use a C array, or a `std::array` (which is safer):

```C++
// With a statically known number of commands:
WGPUCommandBuffer commands[3];
commands[0] = /* [...] */;
commands[1] = /* [...] */;
commands[2] = /* [...] */;
wgpuQueueSubmit(queue, 3, commands);
```

```C++
// Or, safer and with no need to repeat the array size:
// (requires #include <array>)
std::array<WGPUCommandBuffer, 3> commands;
commands[0] = /* [...] */;
commands[1] = /* [...] */;
commands[2] = /* [...] */;
wgpuQueueSubmit(queue, commands.size(), commands.data());
```

Or, if command buffers are **dynamically accumulated**, we use a `std::vector`:

```C++
// With a dynamic number of commands:
// (requires #include <vector>)
std::vector<WGPUCommandBuffer> commands;
commands.push_back(/* [...] */);
if (someRuntimeCondition) {
    commands.push_back(/* [...] */);
}
wgpuQueueSubmit(queue, commands.size(), commands.data());
```

#### Command buffers

In any case, do not forget to **release** the command buffers once they have been submitted:

```C++
// Release:
for (auto cmd : commands) {
    wgpuCommandBufferRelease(cmd);
}
```

> 🤔 Hey but what about **creating these buffers**, to begin with?

A command buffer, which has type `WGPUCommandBuffer`, is not a buffer that we directly create! This buffer uses a special format that is left to the discretion of your driver/hardware. To build this buffer, we use a **command encoder**.

### Command encoder

**WIP line**
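
In a nutshell, the typical encoder workflow looks like this. This is a minimal sketch with default (zero-initialized) descriptors, and no actual commands recorded yet:

```C++
// Create a command encoder from the device.
WGPUCommandEncoderDescriptor encoderDesc = {};
WGPUCommandEncoder encoder = wgpuDeviceCreateCommandEncoder(device, &encoderDesc);

// [...] record commands through the encoder here

// Finalize the encoding into a command buffer, then release the encoder.
WGPUCommandBufferDescriptor cmdBufferDesc = {};
WGPUCommandBuffer command = wgpuCommandEncoderFinish(encoder, &cmdBufferDesc);
wgpuCommandEncoderRelease(encoder);

// The command buffer can now be submitted, and released right after submission:
wgpuQueueSubmit(queue, 1, &command);
wgpuCommandBufferRelease(command);
```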
