Commit f53fe05

taj-p and raphlinus authored
[Vello Hybrid]: Clipping (Spatiotemporal Allocation) (#957)
## Context

Hooks up and addresses all the TODOs in Raph's sketch (#934). See this [thread](https://xi.zulipchat.com/#narrow/channel/197075-vello/topic/Spatiotemporal.20allocation.20.28hybrid.29/near/513442829).

## Notes

- Made `GpuResources` non-optional. Keeping it optional was more trouble than it was worth.
- Adds a new `clear_slots` pipeline to support fine-grained clearing of slots in slot textures (needed for spatiotemporal allocation).
- Regenerates test snapshots.
- Enables the clipping test suite for `vello_hybrid`.
- There are many performance wins to be had. This PR is pretty big already, so I think these are worth following up separately:
  - Not re-creating buffers for each render pass (re-using the allocations between calls).
  - Using a staging belt (to prevent allocating an extra staging buffer per `write_buffer`).
  - Perhaps allowing more than 1 column of slots per slot texture.

I've copied the documentation from `schedule.rs` below for information about how spatiotemporal allocation works:

# Scheduling

- Draw commands are either issued to the final target or to slots in a clip texture.
- Rounds represent a draw in up to 3 render targets (two clip textures and a final target).
- The clip texture stores slots for many clip depths. Once our clip textures are full, we flush rounds (i.e. execute render passes) to free up space. Note that a slot refers to 1 wide tile's worth of pixels in the clip texture.
- The `free` vector contains the indices of the slots that are available for use in the two clip textures.

## Example

Consider the following scene, which draws a single wide tile with three overlapping rectangles whose clipping regions decrease in width.
```rs
const WIDTH: f64 = 100.0;
const HEIGHT: f64 = Tile::HEIGHT as f64;
const OFFSET: f64 = WIDTH / 3.0;

let colors = [RED, GREEN, BLUE];

for i in 0..3 {
    let clip_rect = Rect::new((i as f64) * OFFSET, 0.0, WIDTH, HEIGHT);
    ctx.push_clip_layer(&clip_rect.to_path(0.1));
    ctx.set_paint(colors[i]);
    ctx.fill_rect(&Rect::new(0.0, 0.0, WIDTH, HEIGHT));
}
for _ in 0..3 {
    ctx.pop_layer();
}
```

This single wide tile scene should produce the rendering below:

```
┌────────────────────────────┌────────────────────────────┌────────────────────────────┐
│      ──        ───         │      /      /      /      /│          ──────────────    │
│  ────        ────          │     /      /      /      / │────────                    │
│──          ─────           │    /      /      /      /  │                            │
│          ───Red            │   /      /Green /      /   │            Blue            │
│    ────             ──     │  /      /      /      /    │                     ───────│
│       ───        ────      │ /      /      /      /     │        ──────────────      │
│             ──             │/      /      /      /      │───────                     │
└────────────────────────────└────────────────────────────└────────────────────────────┘
```

How the scene is scheduled into rounds and draw calls is shown below:

### Round 0

In this round, we don't have any preserved slots or slots that we need to sample from. We simply draw the unclipped primitives.

### Draw to texture 0:

In Slot N - 1 of texture 0, draw the unclipped green rectangle.

Slot N - 1:

```
┌──────────────────────────────────────────────────────────────────────────────────────┐
│       /      /      /      /      /      /      /      /      /      /      /      / │
│      /      /      /      /      /      /      /      /      /      /      /      /  │
│     /      /      /      /      /      /      /      /      /      /      /      /   │
│    /      /      /      /      /      / Green /      /      /      /      /      /   │
│   /      /      /      /      /      /      /      /      /      /      /      /     │
│  /      /      /      /      /      /      /      /      /      /      /      /      │
│ /      /      /      /      /      /      /      /      /      /      /      /       │
└──────────────────────────────────────────────────────────────────────────────────────┘
```

### Draw to texture 1:

In Slot N - 2 of texture 1, draw the unclipped red rectangle and, in slot N - 1, draw the unclipped blue rectangle.

Slot N - 2:

```
┌──────────────────────────────────────────────────────────────────────────────────────┐
│      ──        ───                        ──        ───                              │
│  ────        ────          ──            ────        ────                          ──│
│──          ─────        ────          ──          ─────        ────                  │
│          ─────        ────        Red        ─────        ────                       │
│    ────             ────                    ────             ────                    │
│       ───        ────                      ───        ────                           │
│             ───                              ───                                     │
└──────────────────────────────────────────────────────────────────────────────────────┘
```

Slot N - 1:

```
┌──────────────────────────────────────────────────────────────────────────────────────┐
│                          ──────────────────────────────────────────                  │
│──────────────────────────────────────────                                            │
│                                                                                      │
│                                    Blue                               ───────────────│
│                              ────────────────────────────                            │
│                      ────────────────────────────                                    │
│───────────────                                                                       │
└──────────────────────────────────────────────────────────────────────────────────────┘
```

### Round 1

At this point, we have three slots that contain our unclipped rectangles. In this round, we start to sample those pixels to apply clipping (texture 1 samples from texture 0 and the render target view samples from texture 1).

### Draw to texture 0:

Slot N - 1 of texture 0 contains our unclipped green rectangle. In this draw, we sample the pixels from slot N - 1 of texture 1 to draw the blue rectangle into this slot.

Slot N - 1:

```
┌─────────────────────────────────────────────────────────┌────────────────────────────┐
│       /      /      /      /      /      /      /      /│          ──────────────    │
│      /      /      /      /      /      /      /      / │────────                    │
│     /      /      /      /      /      /      /      /  │                            │
│    /      /      /      Green  /      /      /      /   │            Blue            │
│   /      /      /      /      /      /      /      /    │                     ───────│
│  /      /      /      /      /      /      /      /     │        ──────────────      │
│ /      /      /      /      /      /      /      /      │───────                     │
└─────────────────────────────────────────────────────────└────────────────────────────┘
```

### Draw to texture 1:

Then, into Slot N - 2 of texture 1, which contains our red rectangle, we sample the pixels from slot N - 1 of texture 0, which contains our green and blue rectangles.

```
┌────────────────────────────┌────────────────────────────┌────────────────────────────┐
│      ──        ───         │      /      /      /      /│          ──────────────    │
│  ────        ────          │     /      /      /      / │────────                    │
│──          ─────           │    /      /      /      /  │                            │
│          ───Red            │   /      /Green /      /   │            Blue            │
│    ────             ──     │  /      /      /      /    │                     ───────│
│       ───        ────      │ /      /      /      /     │        ──────────────      │
│             ──             │/      /      /      /      │───────                     │
└────────────────────────────└────────────────────────────└────────────────────────────┘
```

### Draw to render target

At this point, we can sample the pixels from slot N - 2 of texture 1 to draw the final rendition.

## Nuances

- When there are no clip/blend regions, we can render directly to the final target.
- The above example provides an intuitive explanation for how rounds after 3 clip depths are scheduled. At clip depths 1 and 2, we can draw directly to the final target within a single round.
- Before drawing into any slot, we need to clear it. If all slots can be cleared or are free, we can use a `LoadOp::Clear` operation. Otherwise, we need to clear the dirty slots using a fine-grained render pass.

For more information about this algorithm, see this [Zulip thread].

[Zulip thread]: https://xi.zulipchat.com/#narrow/channel/197075-vello/topic/Spatiotemporal.20allocation.20.28hybrid.29/near/513442829

---------

Co-authored-by: Raph Levien
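The `free` vector and the clearing rule described in the commit message can be illustrated with a small sketch. This is not the code in `schedule.rs`: the names `SlotAllocator`, `ClearPlan`, and `plan_clear` are hypothetical, and the sketch assumes one free list per clip texture with slots handed out from the highest index down (matching the "Slot N - 1" then "Slot N - 2" order in the example).

```rust
/// Hypothetical free-list allocator for slots in the two clip textures.
#[derive(Debug)]
struct SlotAllocator {
    /// `free[t]` holds the indices of unused slots in clip texture `t`.
    free: [Vec<usize>; 2],
}

impl SlotAllocator {
    fn new(slots_per_texture: usize) -> Self {
        // Ascending order means `pop` hands out slot N - 1 first, then N - 2.
        let all: Vec<usize> = (0..slots_per_texture).collect();
        Self { free: [all.clone(), all] }
    }

    /// Claims a slot in `texture`, or returns `None` when that texture is
    /// full (the point at which a scheduler would need to flush rounds).
    fn alloc(&mut self, texture: usize) -> Option<usize> {
        self.free[texture].pop()
    }

    /// Returns a slot to the free list once its contents are consumed.
    fn release(&mut self, texture: usize, slot: usize) {
        self.free[texture].push(slot);
    }
}

/// How a clip texture is cleared before a draw (assumed naming).
#[derive(Debug, PartialEq)]
enum ClearPlan {
    /// Every slot is free or due to be cleared: a `LoadOp::Clear` suffices.
    WholeTexture,
    /// Some slot must be preserved: clear only the listed slots.
    Slots(Vec<usize>),
}

/// Picks a clear strategy. All indices must be below `total_slots`.
fn plan_clear(total_slots: usize, free: &[usize], to_clear: &[usize]) -> ClearPlan {
    let mut covered = vec![false; total_slots];
    for &slot in free.iter().chain(to_clear) {
        covered[slot] = true;
    }
    if covered.iter().all(|&c| c) {
        ClearPlan::WholeTexture
    } else {
        ClearPlan::Slots(to_clear.to_vec())
    }
}
```

With four slots per texture, two consecutive `alloc(0)` calls yield slots 3 and 2. `plan_clear(4, &[0], &[2, 3])` falls back to per-slot clearing because slot 1 still holds live pixels.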
1 parent 576819e commit f53fe05

16 files changed: +1389 -450 lines changed


Cargo.lock

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default.

sparse_strips/vello_dev_macros/src/lib.rs

Lines changed: 1 addition & 2 deletions
```diff
@@ -150,8 +150,7 @@ pub fn vello_test(attr: TokenStream, item: TokenStream) -> TokenStream {

     // These tests currently don't work with `vello_hybrid`.
     skip_hybrid |= {
-        input_fn_name_str.contains("clip")
-            || input_fn_name_str.contains("compose")
+        input_fn_name_str.contains("compose")
             || input_fn_name_str.contains("gradient")
             || input_fn_name_str.contains("image")
             || input_fn_name_str.contains("layer")
```

sparse_strips/vello_hybrid/Cargo.toml

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,8 +15,9 @@ publish = false
1515
workspace = true
1616

1717
[dependencies]
18-
vello_common = { workspace = true }
1918
bytemuck = { workspace = true, features = ["derive"] }
19+
thiserror = { workspace = true }
20+
vello_common = { workspace = true }
2021
wgpu = { workspace = true }
2122

2223
[dev-dependencies]

sparse_strips/vello_hybrid/examples/render_to_file.rs

Lines changed: 10 additions & 19 deletions
```diff
@@ -11,7 +11,6 @@ use vello_common::kurbo::{Affine, Stroke};
 use vello_common::pico_svg::{Item, PicoSvg};
 use vello_common::pixmap::Pixmap;
 use vello_hybrid::{DimensionConstraints, Scene};
-use wgpu::RenderPassDescriptor;

 /// Main entry point for the headless rendering example.
 /// Takes two command line arguments:
@@ -91,28 +90,20 @@ async fn run() {
         width: width.into(),
         height: height.into(),
     };
-    renderer.prepare(&device, &queue, &scene, &render_size);
     // Copy texture to buffer
     let mut encoder = device.create_command_encoder(&wgpu::CommandEncoderDescriptor {
         label: Some("Vello Render To Buffer"),
     });
-    {
-        let mut pass = encoder.begin_render_pass(&RenderPassDescriptor {
-            label: Some("Render Pass"),
-            color_attachments: &[Some(wgpu::RenderPassColorAttachment {
-                view: &texture_view,
-                resolve_target: None,
-                ops: wgpu::Operations {
-                    load: wgpu::LoadOp::Clear(wgpu::Color::TRANSPARENT),
-                    store: wgpu::StoreOp::Store,
-                },
-            })],
-            depth_stencil_attachment: None,
-            occlusion_query_set: None,
-            timestamp_writes: None,
-        });
-        renderer.render(&scene, &mut pass);
-    }
+    renderer
+        .render(
+            &scene,
+            &device,
+            &queue,
+            &mut encoder,
+            &render_size,
+            &texture_view,
+        )
+        .unwrap();

     // Create a buffer to copy the texture data
     let bytes_per_row = (u32::from(width) * 4).next_multiple_of(256);
```
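The `bytes_per_row` expression in the last context line rounds each row up to a multiple of 256 bytes, which is the alignment wgpu requires for texture-to-buffer copies (`wgpu::COPY_BYTES_PER_ROW_ALIGNMENT`). A minimal sketch of that arithmetic follows, assuming 4 bytes per pixel. The helper names `padded_bytes_per_row` and `strip_row_padding` are illustrative and not part of the example file.

```rust
/// Row alignment for texture-to-buffer copies; this mirrors the value of
/// `wgpu::COPY_BYTES_PER_ROW_ALIGNMENT` without depending on wgpu.
const ROW_ALIGNMENT: u32 = 256;
/// Assumed pixel size: 4 bytes per pixel, one byte per RGBA channel.
const BYTES_PER_PIXEL: u32 = 4;

/// Bytes occupied by one row in the readback buffer, including padding.
fn padded_bytes_per_row(width: u32) -> u32 {
    (width * BYTES_PER_PIXEL).next_multiple_of(ROW_ALIGNMENT)
}

/// Copies the tightly packed pixel rows out of a padded readback buffer.
/// Assumes `padded` holds at least `height` full rows.
fn strip_row_padding(padded: &[u8], width: u32, height: u32) -> Vec<u8> {
    let stride = padded_bytes_per_row(width) as usize;
    let row_len = (width * BYTES_PER_PIXEL) as usize;
    if stride == 0 {
        // A zero-width image has no pixel data to copy.
        return Vec::new();
    }
    let mut out = Vec::with_capacity(row_len * height as usize);
    for row in padded.chunks(stride).take(height as usize) {
        out.extend_from_slice(&row[..row_len]);
    }
    out
}
```

For a 100-pixel-wide target, each row carries 400 bytes of pixel data but occupies 512 bytes in the buffer, so the padding has to be dropped before the pixels are written to an image file.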

sparse_strips/vello_hybrid/examples/webgl/src/lib.rs

Lines changed: 23 additions & 46 deletions
```diff
@@ -170,13 +170,6 @@ impl AppState {
             height: self.height,
         };

-        self.renderer_wrapper.renderer.prepare(
-            &self.renderer_wrapper.device,
-            &self.renderer_wrapper.queue,
-            &self.scene,
-            &render_size,
-        );
-
         let surface_texture = self.renderer_wrapper.surface.get_current_texture().unwrap();
         let surface_texture_view = surface_texture
             .texture
@@ -186,26 +179,18 @@ impl AppState {
             .renderer_wrapper
             .device
             .create_command_encoder(&wgpu::CommandEncoderDescriptor { label: None });
-        {
-            let mut pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
-                label: None,
-                color_attachments: &[Some(wgpu::RenderPassColorAttachment {
-                    view: &surface_texture_view,
-                    resolve_target: None,
-                    ops: wgpu::Operations {
-                        load: wgpu::LoadOp::Clear(wgpu::Color::BLACK),
-                        store: wgpu::StoreOp::Store,
-                    },
-                })],
-                depth_stencil_attachment: None,
-                occlusion_query_set: None,
-                timestamp_writes: None,
-            });
-
-            self.renderer_wrapper
-                .renderer
-                .render(&self.scene, &mut pass);
-        }
+
+        self.renderer_wrapper
+            .renderer
+            .render(
+                &self.scene,
+                &self.renderer_wrapper.device,
+                &self.renderer_wrapper.queue,
+                &mut encoder,
+                &render_size,
+                &surface_texture_view,
+            )
+            .unwrap();

         self.renderer_wrapper.queue.submit([encoder.finish()]);
         surface_texture.present();
@@ -504,32 +489,24 @@ pub async fn render_scene(scene: vello_hybrid::Scene, width: u16, height: u16) {
         width: width as u32,
         height: height as u32,
     };
-    renderer.prepare(&device, &queue, &scene, &render_size);
-
     let surface_texture = surface.get_current_texture().unwrap();
     let surface_texture_view = surface_texture
         .texture
         .create_view(&wgpu::TextureViewDescriptor::default());

     let mut encoder =
         device.create_command_encoder(&wgpu::CommandEncoderDescriptor { label: None });
-    {
-        let mut pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
-            label: None,
-            color_attachments: &[Some(wgpu::RenderPassColorAttachment {
-                view: &surface_texture_view,
-                resolve_target: None,
-                ops: wgpu::Operations {
-                    load: wgpu::LoadOp::Clear(wgpu::Color::BLACK),
-                    store: wgpu::StoreOp::Store,
-                },
-            })],
-            depth_stencil_attachment: None,
-            occlusion_query_set: None,
-            timestamp_writes: None,
-        });
-        renderer.render(&scene, &mut pass);
-    }
+
+    renderer
+        .render(
+            &scene,
+            &device,
+            &queue,
+            &mut encoder,
+            &render_size,
+            &surface_texture_view,
+        )
+        .unwrap();

     queue.submit([encoder.finish()]);
     surface_texture.present();
```

sparse_strips/vello_hybrid/examples/winit/src/main.rs

Lines changed: 12 additions & 27 deletions
```diff
@@ -13,7 +13,6 @@ use vello_common::color::{AlphaColor, Srgb};
 use vello_common::kurbo::{Affine, Vec2};
 use vello_hybrid::{RenderSize, Renderer, Scene};
 use vello_hybrid_scenes::{AnyScene, get_example_scenes};
-use wgpu::RenderPassDescriptor;
 use winit::{
     application::ApplicationHandler,
     event::{ElementState, KeyEvent, MouseButton, MouseScrollDelta, WindowEvent},
@@ -271,12 +270,6 @@ impl ApplicationHandler for App<'_> {
                     width: surface.config.width,
                     height: surface.config.height,
                 };
-                self.renderers[surface.dev_id].as_mut().unwrap().prepare(
-                    &device_handle.device,
-                    &device_handle.queue,
-                    &self.scene,
-                    &render_size,
-                );

                 let surface_texture = surface
                     .surface
@@ -293,26 +286,18 @@ impl ApplicationHandler for App<'_> {
                     .create_command_encoder(&wgpu::CommandEncoderDescriptor {
                         label: Some("Vello Render to Surface pass"),
                     });
-                {
-                    let mut pass = encoder.begin_render_pass(&RenderPassDescriptor {
-                        label: Some("Render to Texture Pass"),
-                        color_attachments: &[Some(wgpu::RenderPassColorAttachment {
-                            view: &texture_view,
-                            resolve_target: None,
-                            ops: wgpu::Operations {
-                                load: wgpu::LoadOp::Clear(wgpu::Color::BLACK),
-                                store: wgpu::StoreOp::Store,
-                            },
-                        })],
-                        depth_stencil_attachment: None,
-                        occlusion_query_set: None,
-                        timestamp_writes: None,
-                    });
-                    self.renderers[surface.dev_id]
-                        .as_mut()
-                        .unwrap()
-                        .render(&self.scene, &mut pass);
-                }
+                self.renderers[surface.dev_id]
+                    .as_mut()
+                    .unwrap()
+                    .render(
+                        &self.scene,
+                        &device_handle.device,
+                        &device_handle.queue,
+                        &mut encoder,
+                        &render_size,
+                        &texture_view,
+                    )
+                    .unwrap();

                 device_handle.queue.submit([encoder.finish()]);
                 surface_texture.present();
```
Lines changed: 51 additions & 0 deletions
```diff
@@ -0,0 +1,51 @@
+// Copyright 2025 the Vello Authors
+// SPDX-License-Identifier: Apache-2.0 OR MIT
+
+// This shader clears specific slots in slot textures to transparent pixels.
+
+// Assumes this texture consists of a single column of slots of `config.slot_height`,
+// numbering from 0 to `texture_height / slot_height - 1` from top to bottom.
+
+struct Config {
+    // Width of a slot (matching `WideTile::WIDTH` and the width of a slot texture).
+    slot_width: u32,
+    // Height of a slot (matching `Tile::HEIGHT`)
+    slot_height: u32,
+    // Total height of the texture (slot_height * number_of_slots)
+    texture_height: u32,
+    // Padding for 16-byte alignment
+    _padding: u32,
+}
+
+@group(0) @binding(0)
+var<uniform> config: Config;
+
+@vertex
+fn vs_main(
+    @builtin(vertex_index) vertex_index: u32,
+    @location(0) index: u32,
+) -> @builtin(position) vec4<f32> {
+    // Map vertex_index (0-3) to quad corners:
+    // 0 → (0,0), 1 → (1,0), 2 → (0,1), 3 → (1,1)
+    let x = f32(vertex_index & 1u);
+    let y = f32(vertex_index >> 1u);
+
+    // Calculate the y-position based on the slot index
+    let slot_y_offset = f32(index * config.slot_height);
+
+    // Scale to match slot dimensions
+    let pix_x = x * f32(config.slot_width);
+    let pix_y = slot_y_offset + y * f32(config.slot_height);
+
+    // Convert to NDC
+    let ndc_x = pix_x * 2.0 / f32(config.slot_width) - 1.0;
+    let ndc_y = 1.0 - pix_y * 2.0 / f32(config.texture_height);
+
+    return vec4<f32>(ndc_x, ndc_y, 0.0, 1.0);
+}
+
+@fragment
+fn fs_main(@builtin(position) position: vec4<f32>) -> @location(0) vec4<f32> {
+    // Clear with transparent pixels
+    return vec4<f32>(0.0, 0.0, 0.0, 0.0);
+}
```
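To check the vertex shader's arithmetic by hand, the same mapping can be written on the CPU. The sketch below transcribes the `vs_main` math into Rust. The `Config` struct drops the padding field, and the function name `slot_corner_ndc` is made up for illustration.

```rust
/// CPU-side mirror of the shader's `Config` uniform (padding omitted).
#[derive(Clone, Copy)]
struct Config {
    slot_width: u32,
    slot_height: u32,
    texture_height: u32,
}

/// Returns the NDC position of one corner of the quad covering `slot_index`,
/// following the same steps as the shader's `vs_main`.
fn slot_corner_ndc(config: Config, slot_index: u32, vertex_index: u32) -> (f32, f32) {
    // Map vertex_index (0-3) to quad corners:
    // 0 -> (0,0), 1 -> (1,0), 2 -> (0,1), 3 -> (1,1)
    let x = (vertex_index & 1) as f32;
    let y = (vertex_index >> 1) as f32;

    // Slots are stacked top to bottom in a single column.
    let slot_y_offset = (slot_index * config.slot_height) as f32;

    // Pixel coordinates of this corner within the slot texture.
    let pix_x = x * config.slot_width as f32;
    let pix_y = slot_y_offset + y * config.slot_height as f32;

    // Convert to NDC: x spans [-1, 1] left to right, y spans [1, -1] top to bottom.
    let ndc_x = pix_x * 2.0 / config.slot_width as f32 - 1.0;
    let ndc_y = 1.0 - pix_y * 2.0 / config.texture_height as f32;
    (ndc_x, ndc_y)
}
```

For a texture of four slots that are each 4 pixels tall, slot 0 starts at the top edge (`ndc_y = 1.0`) and slot 3 ends at the bottom edge (`ndc_y = -1.0`), so each instance covers exactly one slot's rows across the full texture width.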

sparse_strips/vello_hybrid/shaders/sparse_strip_renderer.wgsl renamed to sparse_strips/vello_hybrid/shaders/render_strips.wgsl

Lines changed: 32 additions & 14 deletions
```diff
@@ -9,13 +9,19 @@
 //
 // The alpha values are stored in a texture and sampled during fragment shading.
 // This approach optimizes memory usage by only storing alpha data where needed.
+//
+// The `StripInstance`'s `rgba_or_slot` field can either encode a color or a slot index.
+// If the alpha value is non-zero, the fragment shader samples the alpha texture.
+// Otherwise, the fragment shader samples the source clip texture using the given slot index.

 struct Config {
-    // Width of the rendering target
+    // Width of the rendering target
     width: u32,
     // Height of the rendering target
     height: u32,
     // Height of a strip in the rendering
+    // CAUTION: When changing this value, you must also update the fragment shader's
+    // logic to handle the new strip height.
     strip_height: u32,
     // Number of trailing zeros in alphas_tex_width (log2 of width).
     // Pre-calculated on CPU since WebGL2 doesn't support `firstTrailingBit`.
@@ -29,17 +35,17 @@ struct StripInstance {
     @location(1) widths: u32,
     // Alpha texture column index where this strip's alpha values begin
     @location(2) col: u32,
-    // [r, g, b, a] packed as u8's
-    @location(3) rgba: u32,
+    // [r, g, b, a] packed as u8's or a slot index when alpha is 0
+    @location(3) rgba_or_slot: u32,
 }

 struct VertexOutput {
-    // Texture coordinates for the current fragment
+    // Texture coordinates for the current fragment
     @location(0) tex_coord: vec2<f32>,
     // Ending x-position of the dense (alpha) region
     @location(1) @interpolate(flat) dense_end: u32,
-    // RGBA color value
-    @location(2) @interpolate(flat) color: u32,
+    // Color value or slot index when alpha is 0
+    @location(2) @interpolate(flat) rgba_or_slot: u32,
     // Normalized device coordinates (NDC) for the current vertex
     @builtin(position) position: vec4<f32>,
 };
@@ -77,21 +83,22 @@ fn vs_main(

     out.position = vec4<f32>(ndc_x, ndc_y, 0.0, 1.0);
     out.tex_coord = vec2<f32>(f32(instance.col) + x * f32(width), y * f32(config.strip_height));
-    out.color = instance.rgba;
+    out.rgba_or_slot = instance.rgba_or_slot;
     return out;
 }

 @group(0) @binding(0)
 var alphas_texture: texture_2d<u32>;

+@group(0) @binding(2)
+var clip_input_texture: texture_2d<f32>;
+
 @fragment
 fn fs_main(in: VertexOutput) -> @location(0) vec4<f32> {
     let x = u32(floor(in.tex_coord.x));
     var alpha = 1.0;
     // Determine if the current fragment is within the dense (alpha) region
     // If so, sample the alpha value from the texture; otherwise, alpha remains fully opaque (1.0)
-    // TODO: This is a branch, but we can make it branchless by using a select
-    // would it be faster to do a texture lookup for every pixel?
     if x < in.dense_end {
         let y = u32(floor(in.tex_coord.y));
         // Retrieve alpha value from the texture. We store 16 1-byte alpha
@@ -108,18 +115,28 @@ fn fs_main(in: VertexOutput) -> @location(0) vec4<f32> {
         let channel_index = alphas_index % 4u;
         // Calculate texel coordinates
         let tex_x = texel_index & (alphas_tex_width - 1u);
-        let tex_y = texel_index >> config.alphas_tex_width_bits;
-
+        let tex_y = texel_index >> config.alphas_tex_width_bits;
+
         // Load all 4 channels from the texture
         let rgba_values = textureLoad(alphas_texture, vec2<u32>(tex_x, tex_y), 0);
-
+
         // Get the column's alphas from the appropriate RGBA channel based on the index
         let alphas_u32 = unpack_alphas_from_channel(rgba_values, channel_index);
         // Extract the alpha value for the current y-position from the packed u32 data
         alpha = f32((alphas_u32 >> (y * 8u)) & 0xffu) * (1.0 / 255.0);
     }
-    // Apply the alpha value to the unpacked RGBA color
-    return alpha * unpack4x8unorm(in.color);
+    // Apply the alpha value to the unpacked RGBA color or slot index
+    let alpha_byte = in.rgba_or_slot >> 24u;
+    if alpha_byte != 0 {
+        // in.rgba_or_slot encodes a color
+        return alpha * unpack4x8unorm(in.rgba_or_slot);
+    } else {
+        // in.rgba_or_slot encodes a slot in the source clip texture
+        let clip_x = u32(in.position.x) & 0xFFu;
+        let clip_y = (u32(in.position.y) & 3) + in.rgba_or_slot * config.strip_height;
+        let clip_in_color = textureLoad(clip_input_texture, vec2(clip_x, clip_y), 0);
+        return alpha * clip_in_color;
+    }
 }

 fn unpack_alphas_from_channel(rgba: vec4<u32>, channel_index: u32) -> u32 {
@@ -136,6 +153,7 @@ fn unpack_alphas_from_channel(rgba: vec4<u32>, channel_index: u32) -> u32 {
 // Polyfills `unpack4x8unorm`.
 //
 // Downlevel targets do not support native WGSL `unpack4x8unorm`.
+// TODO: Remove once we upgrade to WGPU 25.
 fn unpack4x8unorm(rgba_packed: u32) -> vec4<f32> {
     // Extract each byte and convert to float in range [0,1]
     return vec4<f32>(
```
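The fragment shader's branch on `rgba_or_slot` can be restated on the CPU to make the encoding concrete. The sketch below follows the shader's tests: a non-zero top byte means a packed color, otherwise the whole word is a slot index. The `Paint` enum and both function names are hypothetical. The byte order assumes red sits in the lowest byte, as WGSL's `unpack4x8unorm` defines it.

```rust
/// What a strip's `rgba_or_slot` word decodes to (hypothetical type).
#[derive(Debug, PartialEq)]
enum Paint {
    /// `[r, g, b, a]` bytes of a packed color.
    Color([u8; 4]),
    /// Index of a slot in the source clip texture.
    Slot(u32),
}

/// Mirrors the shader's test: the top byte is the alpha channel, and a
/// zero alpha marks the word as a slot index rather than a color.
fn decode_rgba_or_slot(rgba_or_slot: u32) -> Paint {
    let alpha_byte = rgba_or_slot >> 24;
    if alpha_byte != 0 {
        // Little-endian byte order puts r first and a last.
        Paint::Color(rgba_or_slot.to_le_bytes())
    } else {
        Paint::Slot(rgba_or_slot)
    }
}

/// Mirrors the shader's clip lookup: x wraps within 256 pixels, and y is
/// the row within the strip (low two bits) offset to the slot's first row.
fn clip_texel(frag_x: u32, frag_y: u32, slot: u32, strip_height: u32) -> (u32, u32) {
    (frag_x & 0xFF, (frag_y & 3) + slot * strip_height)
}
```

One consequence of this encoding, at least as sketched here, is that a color with zero alpha cannot be told apart from a slot index. The `& 3` mask also only matches a strip height of 4, which is presumably what the `CAUTION` comment on `strip_height` guards against.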
