[Vello Hybrid]: Clipping (Spatiotemporal Allocation) #957

taj-p · 2025-05-05T04:06:46Z

Context

Hooks up and addresses all the TODOs in Raph's sketch (#934). See this thread #vello > Spatiotemporal allocation (hybrid) @ 💬.

Notes

Made GpuResources non-optional - I thought keeping it optional was more trouble than it was worth.
Adds a new clear_slots pipeline to support fine-grained clearing of slots in slot textures (needed for spatiotemporal allocation)
Regenerates test snapshots
Enables the clipping test suite for vello_hybrid
There are many performance wins to be had. This PR is pretty big already, so I think these are worth following up separately.
- Not re-creating buffers for each render pass (re-using the allocations between calls)
- Using a staging belt (to prevent allocating an extra staging buffer per write_buffer)
- Perhaps allowing more than 1 column of slots per slot texture.

I've copied the documentation from schedule.rs below for information about how spatiotemporal allocation works:

Scheduling

Draw commands are either issued to the final target or slots in a clip texture.
Rounds represent a draw in up to 3 render targets (two clip textures and a final target).
The clip texture stores slots for many clip depths. Once our clip textures are full,
we flush rounds (i.e. execute render passes) to free up space. Note that a slot refers
to 1 wide tile's worth of pixels in the clip texture.
The free vector contains the indices of the slots that are available for use in the two clip textures.
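The free-slot bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, assuming each of the two clip textures keeps its own list of free slot indices; `SlotAllocator`, `claim`, and `release` are hypothetical names, not the types in schedule.rs.

```rust
/// Hypothetical error, mirroring the idea of "no slots left".
#[derive(Debug, PartialEq)]
struct SlotsExhausted;

/// Illustrative free-list bookkeeping for two clip textures.
struct SlotAllocator {
    /// `free[t]` holds the slot indices currently unused in texture `t`.
    free: [Vec<usize>; 2],
}

impl SlotAllocator {
    fn new(slots_per_texture: usize) -> Self {
        let all: Vec<usize> = (0..slots_per_texture).collect();
        Self { free: [all.clone(), all] }
    }

    /// Take a free slot from `texture`, highest index first.
    fn claim(&mut self, texture: usize) -> Result<usize, SlotsExhausted> {
        self.free[texture].pop().ok_or(SlotsExhausted)
    }

    /// Return a slot once a flushed round no longer needs it.
    fn release(&mut self, texture: usize, slot: usize) {
        self.free[texture].push(slot);
    }
}

fn main() {
    let mut alloc = SlotAllocator::new(2);
    let a = alloc.claim(1).unwrap();
    let b = alloc.claim(1).unwrap();
    println!("claimed slots {a} and {b} in texture 1");
    // Texture 1 is now full; a scheduler would flush a round at this point.
    assert_eq!(alloc.claim(1), Err(SlotsExhausted));
    alloc.release(1, a);
    assert_eq!(alloc.claim(1), Ok(a));
}
```

Popping from the end of the list hands out slot N - 1 first, which matches the slot numbering used in the example below.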

Example

Consider the following scene of drawing a single wide tile with three overlapping rectangles with
decreasing width clipping regions.

const WIDTH: f64 = 100.0;
const HEIGHT: f64 = Tile::HEIGHT as f64;
const OFFSET: f64 = WIDTH / 3.0;

let colors = [RED, GREEN, BLUE];

for i in 0..3 {
    let clip_rect = Rect::new((i as f64) * OFFSET, 0.0, WIDTH, HEIGHT);
    ctx.push_clip_layer(&clip_rect.to_path(0.1));
    ctx.set_paint(colors[i]);
    ctx.fill_rect(&Rect::new(0.0, 0.0, WIDTH, HEIGHT));
}
for _ in 0..3 {
    ctx.pop_layer();
}

This single wide tile scene should produce the below rendering:

┌────────────────────────────┌────────────────────────────┌─────────────────────────────
│      ──              ───   │       /     /       /     /│        ──────────────      │
│  ────            ────      │      /     /       /     / │────────                    │
│──           ─────          │     /     /       /     /  │                            │
│        ───Red              │    /     /Green  /     /   │           Blue             │
│    ────                ──  │   /     /       /     /    │                     ───────│
│ ───                ────    │  /     /       /     /     │       ──────────────       │
│                  ──        │ /     /       /     /      │───────                     │
└────────────────────────────└────────────────────────────└────────────────────────────┘

How the scene is scheduled into rounds and draw calls is shown below:

Round 0

In this round, we don't have any preserved slots or slots that we need to sample from. We simply
draw the unclipped primitives.

Draw to texture 0:

In Slot N - 1 of texture 0, draw the unclipped green rectangle.

Slot N - 1:

┌──────────────────────────────────────────────────────────────────────────────────────┐
│       /     /       /     /        /     /       /     /       /     /       /     / │
│      /     /       /     /        /     /       /     /       /     /       /     /  │
│     /     /       /     /        /     /       /     /       /     /       /     /   │
│    /     /       /     /        /     / Green /     /       /     /       /     /    │
│   /     /       /     /        /     /       /     /       /     /       /     /     │
│  /     /       /     /        /     /       /     /       /     /       /     /      │
│ /     /       /     /        /     /       /     /       /     /       /     /       │
└──────────────────────────────────────────────────────────────────────────────────────┘

Draw to texture 1:

In Slot N - 2 of texture 1, draw unclipped red rectangle and, in slot N - 1, draw the unclipped
blue rectangle.

Slot N - 2:

┌──────────────────────────────────────────────────────────────────────────────────────┐
│      ──              ───                            ──              ───              │
│  ────            ────               ──          ────            ────               ──│
│──           ─────               ────          ──           ─────               ────  │
│        ─────                ────        Red           ─────                ────      │
│    ────                 ────                      ────                 ────          │
│ ───                 ────                       ───                 ────              │
│                  ───                                            ───                  │
└──────────────────────────────────────────────────────────────────────────────────────┘

Slot N - 1:

┌──────────────────────────────────────────────────────────────────────────────────────┐
│                                           ────────────────────────────────────────── │
│───────────────────────────────────────────                                           │
│                                                                                      │
│                                         Blue                          ───────────────│
│                                           ────────────────────────────               │
│               ────────────────────────────                                           │
│───────────────                                                                       │
└──────────────────────────────────────────────────────────────────────────────────────┘

Round 1

At this point, we have three slots that contain our unclipped rectangles. In this round,
we start to sample those pixels to apply clipping (texture 1 samples from texture 0 and
the render target view samples from texture 1).

Draw to texture 0:

Slot N - 1 of texture 0 contains our unclipped green rectangle. In this draw, we sample
the pixels from slot N - 1 of texture 1 to draw the blue rectangle into this slot.

Slot N - 1:

┌─────────────────────────────────────────────────────────┌─────────────────────────────
│        /     /       /     /       /     /       /     /│        ──────────────      │
│       /     /       /     /       /     /       /     / │────────                    │
│      /     /       /     /       /     /       /     /  │                            │
│     /     /       /  Green      /     /       /     /   │           Blue             │
│    /     /       /     /       /     /       /     /    │                     ───────│
│   /     /       /     /       /     /       /     /     │       ──────────────       │
│  /     /       /     /       /     /       /     /      │───────                     │
└─────────────────────────────────────────────────────────└────────────────────────────┘

Draw to texture 1:

Then, into Slot N - 2 of texture 1, which contains our red rectangle, we sample the pixels
from slot N - 1 of texture 0 which contain our green and blue rectangles.

Slot N - 2:


┌────────────────────────────┌────────────────────────────┌─────────────────────────────
│      ──              ───   │       /     /       /     /│        ──────────────      │
│  ────            ────      │      /     /       /     / │────────                    │
│──           ─────          │     /     /       /     /  │                            │
│        ───Red              │    /     /Green  /     /   │           Blue             │
│    ────                ──  │   /     /       /     /    │                     ───────│
│ ───                ────    │  /     /       /     /     │       ──────────────       │
│                  ──        │ /     /       /     /      │───────                     │
└────────────────────────────└────────────────────────────└────────────────────────────┘

Draw to render target

At this point, we can sample the pixels from slot N - 2 of texture 1 to draw the final
rendition.

Nuances

When there are no clip/blend regions, we can render directly to the final target.
The above example provides an intuitive explanation for how rounds after 3 clip depths
are scheduled. At clip depths 1 and 2, we can draw directly to the final target within a
single round.
Before drawing into any slot, we need to clear it. If all slots can be cleared or are free,
we can use a LoadOp::Clear operation. Otherwise, we need to clear the dirty slots using
a fine-grained render pass.
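The clearing rule in the last note can be sketched as a small decision function. This is an illustration only: `SlotState`, `ClearPlan`, and `plan_clear` are hypothetical names, and the real scheduler may track slot state differently.

```rust
/// Illustrative per-slot state.
#[derive(Clone, Copy, PartialEq, Debug)]
enum SlotState {
    /// Not in use.
    Free,
    /// Holds stale pixels that must be cleared before the next draw.
    Dirty,
    /// Holds pixels that a later round still needs to sample.
    Preserved,
}

/// How to clear a slot texture before drawing into it.
#[derive(PartialEq, Debug)]
enum ClearPlan {
    /// Nothing must survive: clear the whole texture on load
    /// (the `LoadOp::Clear` case).
    WholeTexture,
    /// Some slots must be preserved: clear only these slot indices
    /// with a separate fine-grained pass.
    Slots(Vec<u32>),
}

fn plan_clear(slots: &[SlotState]) -> ClearPlan {
    if slots.iter().all(|s| *s != SlotState::Preserved) {
        ClearPlan::WholeTexture
    } else {
        let dirty = slots
            .iter()
            .enumerate()
            .filter(|(_, s)| **s == SlotState::Dirty)
            .map(|(i, _)| i as u32)
            .collect();
        ClearPlan::Slots(dirty)
    }
}

fn main() {
    use SlotState::*;
    println!("{:?}", plan_clear(&[Free, Dirty, Dirty]));
    println!("{:?}", plan_clear(&[Preserved, Dirty, Free]));
}
```

The list of dirty slot indices is the kind of data a per-slot clear pass such as clear_slots would consume as instance input.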

For more information about this algorithm, see this Zulip thread.

This commit has a sketch of spatio-temporal allocation for clipping, but it is not fully wired up yet. Scenes without clipping should work, but there is a fair amount of TODO remaining for clipping. There's a fair amount of refactoring here. The biggest change is that draw calls and render passes can be issued from inside the scheduler, as opposed to separate "prepare" and "render" calls. The number of render passes needed will vary by the scene.

sparse_strips/vello_sparse_tests/snapshots/image_with_transform_rotate_1.png

taj-p · 2025-05-05T04:10:20Z

sparse_strips/vello_sparse_tests/tests/clip.rs

@@ -67,6 +91,50 @@ fn clip_rectangle_with_star_evenodd(ctx: &mut impl Renderer) {
    ctx.pop_layer();
 }

+#[vello_test]
+fn clip_deeply_nested_circles(ctx: &mut impl Renderer) {


I would like this test to exercise creating many rounds in the scheduler.

taj-p · 2025-05-05T04:12:29Z

sparse_strips/vello_hybrid/src/render.rs

+        // TODO: We currently allocate a new strips buffer for each render pass. A more efficient
+        // approach would be to re-use buffers or slices of a larger buffer.


This is such a large refactor of vello hybrid that I wanted to keep optimisations like this out of this PR to limit added complexity.

taj-p · 2025-05-05T05:41:17Z

cc @ajakubowicz-canva

sparse_strips/vello_sparse_tests/snapshots/fill_command_respects_clip_bounds.png

ajakubowicz-canva · 2025-05-05T22:45:07Z

sparse_strips/vello_hybrid/src/render.rs

-        let alphas_texture =
-            Self::make_alphas_texture(device, max_texture_dimension_2d, alpha_texture_height);
-        let alpha_data = vec![0; (max_texture_dimension_2d * alpha_texture_height * 16) as usize];
+        const INITIAL_ALPHA_TEXTURE_HEIGHT: u32 = 1;


Non-actionable – for my own understanding, why is it safe to change INITIAL_ALPHA_TEXTURE_HEIGHT from 2 to 1? What does it impact?

This was motivated by wanting to keep initial allocations small. If the scene requires more alphas, that can be done in prepare (where we re-allocate GPU resources to contain the scene we want to render).
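To make the "grow in prepare" idea concrete, here is a rough sketch of how a required texture height could be derived from the amount of alpha data. It assumes 16 bytes of alpha data per texel, as suggested by the `* 16` in the removed line; `required_alpha_texture_height` is a hypothetical helper, not the function used in render.rs.

```rust
/// Bytes of alpha data per texel, matching the `* 16` in the removed line
/// (assumed here to be four 32-bit channels).
const BYTES_PER_TEXEL: u64 = 16;

/// Smallest texture height (at least 1) whose rows can hold `alpha_bytes`.
/// `texture_width` must be non-zero.
fn required_alpha_texture_height(alpha_bytes: u64, texture_width: u32) -> u32 {
    let bytes_per_row = texture_width as u64 * BYTES_PER_TEXEL;
    let rows = alpha_bytes.div_ceil(bytes_per_row);
    rows.max(1) as u32
}

fn main() {
    // With an 8192-wide texture, one row holds 8192 * 16 bytes of alphas.
    for bytes in [0u64, 100_000, 131_073] {
        println!("{bytes} bytes -> height {}", required_alpha_texture_height(bytes, 8192));
    }
}
```

Starting at a height of 1 only changes the initial allocation; a scene that needs more rows would trigger a re-allocation to the computed height.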

sparse_strips/vello_api/src/mask.rs

sparse_strips/vello_api/src/paint.rs

ajakubowicz-canva · 2025-05-07T03:59:02Z

sparse_strips/vello_hybrid/src/render.rs

+            });
+
+        let strip_shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
+            label: Some("Strip Shader"),


Nit: may be nice if label matches file - e.g. Render Strips Shader. Similarly strip_shader -> render_strips_shader

I didn't like the render_* prefix because render_ could arguably be put in front of so many classes of things. I also didn't like renaming the shaders to strips. I'll sleep on it, but if anyone has a better name, please let me know 🙏 . Maybe "fine" for fine rasterisation (but arguably that includes "clear_slots" too).

ajakubowicz-canva · 2025-05-07T04:06:15Z

sparse_strips/vello_hybrid/src/lib.rs

+/// Errors that can occur during rendering.
+#[derive(Error, Debug)]
+pub enum RenderError {
+    /// No slots available for rendering.
+    ///
+    /// This error is likely to occur if a scene has an extreme number of nested layers
+    /// (clipping, blending, masks, or opacity layers).
+    ///
+    /// TODO: Consider supporting more than a single column of slots in slot textures.
+    #[error("No slots available for rendering")]
+    SlotsExhausted,
+}


Are there other errors we will want to add?

Should unwrap and expect methods actually return a RenderError?

I'm wary of making even more changes to vello_hybrid in this single PR, so I've left a TODO and will leave this as a separate exercise.

Added in 41f78ba

Sorry I wasn't clear in the initial comment. I meant this as a philosophical question about the future. Do you expect any fallible behaviors to be moved into this enum in the future?

Yes. For example, I think when there are too many alphas to fit into the device's alpha texture, we should return an appropriate render error.

The question of returning a RenderError for unwraps and expects which we believe are safe is another question entirely. Historically, if I believe I'm smarter than the compiler, I lean towards panicking, but I think this decision should be made by Linebender as a group. I'll ask about it in Utrecht.

I've unresolved this so that it gets greater visibility from other reviewers.
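As a sketch of where this could go, the enum might grow a variant for the alpha-texture case mentioned above. The `AlphasExceedTexture` variant and `check_alpha_rows` are hypothetical and written with std only (no `thiserror`) so the example stands alone.

```rust
use std::fmt;

/// Sketch of the error enum with one additional, hypothetical variant.
#[derive(Debug, PartialEq)]
enum RenderError {
    /// No slots available for rendering (the variant from this PR).
    SlotsExhausted,
    /// Hypothetical: the scene needs more alpha rows than the device allows.
    AlphasExceedTexture { required_rows: u32, max_rows: u32 },
}

impl fmt::Display for RenderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::SlotsExhausted => write!(f, "No slots available for rendering"),
            Self::AlphasExceedTexture { required_rows, max_rows } => write!(
                f,
                "alpha data needs {required_rows} texture rows, device limit is {max_rows}"
            ),
        }
    }
}

impl std::error::Error for RenderError {}

/// Return an error instead of panicking when the limit is exceeded.
fn check_alpha_rows(required_rows: u32, max_rows: u32) -> Result<(), RenderError> {
    if required_rows > max_rows {
        return Err(RenderError::AlphasExceedTexture { required_rows, max_rows });
    }
    Ok(())
}

fn main() {
    println!("{}", RenderError::SlotsExhausted);
    if let Err(e) = check_alpha_rows(9000, 8192) {
        println!("{e}");
    }
}
```

Whether internal invariants (the current unwrap and expect calls) should also surface through this enum is the separate question raised above.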

sparse_strips/vello_hybrid/src/render.rs

ajakubowicz-canva · 2025-05-07T04:27:15Z

sparse_strips/vello_hybrid/src/render.rs

+                    sample_count: 1,
+                    dimension: wgpu::TextureDimension::D2,
+                    // TODO: Is this correct or need it be RGBA8Unorm?
+                    format: render_target_config.format,


Who sets the rendering target? It looks like it's set by the caller.

In our render_to_file example, we appear to pass Rgba8Unorm as the render target config format. Whilst for the winit example we pass Bgra8Unorm as the surface format.

I guess another question is related to the texture definition in the shader. In wgsl clip_input_texture is typed as texture_2d<f32>.

Is it expected that the vello_hybrid renderer works with all wgsl texture formats? (sorry for the naivety).

I don't think this is a naive question. We only support Rgba8Unorm and Bgra8Unorm currently I believe. I'll confirm this.

ajakubowicz-canva · 2025-05-07T04:38:01Z

sparse_strips/vello_hybrid/src/render.rs

+        });
+        let clear_slot_indices_buffer = Self::make_clear_slot_indices_buffer(
+            device,
+            slot_count as u64 * size_of::<u32>() as u64,


The u32 attribute used for the slot index that's passed to the clear slots wgsl shader seems brittle in its definition when configuring the descriptors / pipeline.

A parallel is the GpuStrip which encapsulates the attribute passed to the render_strips shader. Is it worth using the New Type Idiom to wrap this u32 such that it's traceable to all locations that need to use the accurate size?

For example this line of code. But also the array_stride in the clear_pipeline?
It could also provide the attributes?

Maybe this isn't worth doing for 2 code locations.
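For reference, the newtype suggestion could look something like the sketch below. `ClearSlotIndex` and its helpers are hypothetical names; the point is only that the buffer size and the vertex layout stride would both derive from one type.

```rust
use std::mem::size_of;

/// Hypothetical newtype for the per-instance attribute of the clear pass.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct ClearSlotIndex(u32);

impl ClearSlotIndex {
    /// Byte stride of one instance; intended as the single source of truth
    /// for both the buffer size and the vertex layout's `array_stride`.
    const STRIDE: u64 = size_of::<Self>() as u64;

    /// Bytes needed for a buffer that can hold one index per slot.
    fn buffer_size(slot_count: u64) -> u64 {
        slot_count * Self::STRIDE
    }
}

fn main() {
    let indices = [ClearSlotIndex(0), ClearSlotIndex(5)];
    println!(
        "{} bytes for {:?}",
        ClearSlotIndex::buffer_size(indices.len() as u64),
        indices
    );
}
```

With only two call sites the gain is small, which matches the hesitation in the comment above.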

sparse_strips/vello_hybrid/src/render.rs

ajakubowicz-canva · 2025-05-07T06:26:17Z

sparse_strips/vello_hybrid/src/scene.rs

+        if opacity.is_some() {
+            unimplemented!()
+        }


I may figure this out as I keep reviewing - but I am naively confused how opacity is not implemented or even how it's expected to be implemented, when there are tests like clip_with_opacity.

Is this assumption accurate? In the test clip_with_opacity, a clip layer is pushed, and then a rect is filled with alpha 0.5. Would the equivalent in the future be: push a clip layer with opacity set to 0.5, and then paint a fully opaque rect. Would the opacity in the clip layer transfer to the child drawn within it and the two cases would render the same?

Also, I just realized I am a little confused about using the name alpha vs opacity. Are they the same thing, e.g. an alpha of 1 and an opacity of 1 are both opaque. Should terms be consolidated?

Edit: yep, I confirmed opacity here behaves as I expect.

ajakubowicz-canva · 2025-05-07T06:31:02Z

sparse_strips/vello_hybrid/src/scene.rs

+            clip,
+            BlendMode::new(Mix::Normal, Compose::SrcOver),
+            None,


I realize I don't actually fully understand the difference between clipping and masking.
Clarifying: A mask is essentially a pixmap (alpha or luminance), whilst a clip_path lets you clip via a bezier path. It's not immediately intuitive that a clip_path is much more challenging than a mask that's "straightforward".

Clipping is basically a special case of an alpha mask, where everything inside of the clip shape is fully opaque in the alpha mask, and everything outside is fully transparent. So you could in theory emulate it with an alpha mask, but since clipping is such a common case we have custom logic that is more complex, but faster.
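A tiny sketch of that equivalence, assuming premultiplied RGBA pixels stored as `[f32; 4]`. Both function names are made up for illustration.

```rust
/// Scale a premultiplied RGBA pixel by a mask coverage value in `0.0..=1.0`.
fn apply_mask(pixel: [f32; 4], coverage: f32) -> [f32; 4] {
    pixel.map(|c| c * coverage)
}

/// A hard-edged clip seen as a mask: 1.0 inside the shape, 0.0 outside.
/// (Anti-aliased edges would produce fractional coverage.)
fn clip_coverage(inside: bool) -> f32 {
    if inside { 1.0 } else { 0.0 }
}

fn main() {
    let pixel = [0.5, 0.25, 0.0, 1.0];
    println!("inside:  {:?}", apply_mask(pixel, clip_coverage(true)));
    println!("outside: {:?}", apply_mask(pixel, clip_coverage(false)));
}
```

A general mask would supply arbitrary coverage per pixel from a pixmap, whereas a clip derives it from path geometry.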

ajakubowicz-canva · 2025-05-07T06:41:17Z

sparse_strips/vello_sparse_tests/tests/clip.rs

+
+    for (i, color) in colors.iter().enumerate() {
+        let clip_rect = Rect::new((i as f64) * OFFSET, 0.0, WIDTH, HEIGHT);
+        ctx.push_clip_layer(&clip_rect.to_path(0.1));


Nit: Worth defining 0.1 as DEFAULT_TOLERANCE (similar to vello_cpu)?

Yes, definitely. But since that touches every file in tests/*, let's leave it for a separate PR.

sparse_strips/vello_sparse_tests/tests/clip.rs

sparse_strips/vello_sparse_tests/tests/renderer.rs

ajakubowicz-canva

I have now technically read over all the code and it looks great! I will need to give the scheduler a second pass as there are still some mysteries in there.

Unfortunately as someone relatively new to this repo, my comments have been focused on nits and syntax, and less on the overall design. However, overall design seems to be working per tests.

Great work!

sparse_strips/vello_hybrid/src/schedule.rs

ajakubowicz-canva · 2025-05-07T17:16:05Z

sparse_strips/vello_hybrid/src/schedule.rs

+//! At this point, we can sample the pixels from slot N - 1 of texture 1 to draw the final
+//! result.
+//!
+//! ## Nuances


I think this section is missing how slot depth maps to the slot textures. E.g. odd vs even slot depths.

Added notes in c1e9097

sparse_strips/vello_hybrid/src/schedule.rs

ajakubowicz-canva · 2025-05-07T17:26:59Z

sparse_strips/vello_hybrid/src/schedule.rs

+        junk: &mut RendererJunk<'_>,
+        scene: &Scene,
+    ) -> Result<(), RenderError> {
+        let mut tile_state = mem::take(&mut self.tile_state);


I'm not quite following why we don't just create a brand new tile_state vec allocation here, vs storing it on self?

Naively, mem::take replaces tile_state with an empty vec, does that incur an allocation? Then at the end we re-set tile_state back on self.

This comment seems perf related so can be ignored.

Empty vectors in Rust do not allocate, so taking the vector out and putting it back later lets us reuse its allocation instead of allocating again on every call.
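A small stand-alone sketch of the pattern. The `Scheduler` struct and `do_scene` method here are simplified stand-ins, not the real types.

```rust
use std::mem;

/// Hypothetical stand-in for the scheduler, holding a reusable scratch vector.
struct Scheduler {
    tile_state: Vec<u32>,
}

impl Scheduler {
    fn do_scene(&mut self, tiles: usize) {
        // Move the vector out; `self.tile_state` becomes an empty Vec,
        // which does not allocate.
        let mut tile_state = mem::take(&mut self.tile_state);
        tile_state.clear();
        tile_state.resize(tiles, 0);
        // ... work that also borrows `self` mutably could go here ...
        // Hand the allocation back so the next call can reuse it.
        self.tile_state = tile_state;
    }
}

fn main() {
    let mut s = Scheduler { tile_state: Vec::new() };
    s.do_scene(64);
    let cap = s.tile_state.capacity();
    s.do_scene(16);
    // The second call fits in the existing allocation.
    assert_eq!(s.tile_state.capacity(), cap);
    assert_eq!(Vec::<u32>::new().capacity(), 0);
}
```

Taking the vector out also lets the body call other `&mut self` methods while holding it, which a direct borrow of the field would not allow.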

sparse_strips/vello_hybrid/src/schedule.rs

ajakubowicz-canva · 2025-05-07T17:50:24Z

sparse_strips/vello_hybrid/shaders/render_strips.wgsl

+        let clip_x = u32(in.position.x) & 0xFFu;
+        let clip_y = (u32(in.position.y) & 3) + in.rgba_or_slot * config.strip_height;


It is not clear to me how in.position.y has been modified in this PR. From the scheduler, it looks like it also contains the slot_ix * Tile::HEIGHT, or slot y position. So why the & 3?

I think the answer to this question should also potentially result in a code comment in the wgsl shader.

in.position represents the position builtin. In a vertex shader, it ranges from -1 to 1 for X and Y. In the context of a fragment shader, it represents the pixel coordinate of where we are drawing (see this article). The & 3 constrains the pixel's y coordinate to the range 0 to 3, since our tile is 4 pixels high. I added a CAUTION: note to config.strip_height about the danger of changing its value without updating this logic.

In time, we will want to make this configurable, but I'm not sure how that will present. We could make the & 3 configurable, but then we should also make the & 0xFFu configurable to wide tile width. I think at this stage we should untangle those concerns when we get to them.

Oh whoops, I got my x and y's confused! Thanks for the great answer :D
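For readers following along, the masking in the two shader lines above can be mirrored on the CPU like this. It is a sketch assuming a strip height of 4 and a wide tile width of 256, which is what the `& 3` and `& 0xFF` masks imply; `clip_texel` is a made-up name.

```rust
/// Assumed tile height in pixels (matches the `& 3` mask).
const STRIP_HEIGHT: u32 = 4;

/// Map a fragment's pixel position to a texel in the clip texture.
/// `slot` is the slot index read from `rgba_or_slot`.
fn clip_texel(frag_x: u32, frag_y: u32, slot: u32) -> (u32, u32) {
    // Column within the wide tile; `& 0xFF` equals `% 256`.
    let clip_x = frag_x & 0xFF;
    // Row within the tile (`& 3` equals `% 4`), offset to the slot's rows.
    let clip_y = (frag_y & 3) + slot * STRIP_HEIGHT;
    (clip_x, clip_y)
}

fn main() {
    // A fragment at pixel (300, 10) sampling slot 5.
    println!("{:?}", clip_texel(300, 10, 5));
}
```

The masks only equal the modulo because 4 and 256 are powers of two, which is why changing the strip height or wide tile width would require updating them.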

ajakubowicz-canva · 2025-05-07T17:57:27Z

sparse_strips/vello_hybrid/src/schedule.rs

+                    });
+                }
+                Cmd::PushBuf => {
+                    let ix = clip_depth % 2;


There are 3 draws in a round, representing slot 0, slot 1, and the final texture. In draw_mut the Draw slot corresponds to 1 - clip_depth % 2.
Should these be aligned such that this is also 1 - clip_depth % 2? I think all that should do is ensure the slot 0 free vec and slot 0 draw are on the same index, and it shouldn't change logic.

Edit: I think it's all a bit more complex. The choice between the two is intentional and changes in multiple places.

ajakubowicz-canva

Great work!

LaurenzV

Just rubber stamping since we discussed in office hours that this is fine to merge, and Andrew doesn't seem to have write permissions yet.

ajakubowicz-canva · 2025-05-09T02:00:09Z

sparse_strips/vello_hybrid/src/schedule.rs

+                    debug_assert!(
+                        has_non_zero_alpha(rgba),
+                        "Color fields with 0 alpha are reserved for clipping"
+                    );


Raised in Zulip #vello > Vello Hybrid Crashing when alpha = 0 @ 💬.

What is the recommendation if someone wants to pass a fully transparent fill? Additionally, is there a use-case of painting something fully transparent?
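To illustrate why the assertion exists, here is a sketch of how a packed `rgba_or_slot` value can be interpreted, based on the render_strips shader treating a zero top byte as "this is a slot index". The `Paint` enum, `decode`, and the body of `has_non_zero_alpha` are assumptions for illustration.

```rust
/// Decoded meaning of the packed `rgba_or_slot` field.
#[derive(Debug, PartialEq)]
enum Paint {
    /// A packed RGBA color with non-zero alpha in the top byte.
    Color(u32),
    /// Top byte is zero: the value is a clip-texture slot index.
    ClipSlot(u32),
}

/// Hypothetical body for the check named in the assertion above.
fn has_non_zero_alpha(rgba: u32) -> bool {
    rgba >> 24 != 0
}

fn decode(rgba_or_slot: u32) -> Paint {
    if has_non_zero_alpha(rgba_or_slot) {
        Paint::Color(rgba_or_slot)
    } else {
        Paint::ClipSlot(rgba_or_slot)
    }
}

fn main() {
    // Opaque color: alpha byte is 0xFF.
    println!("{:?}", decode(0xFF00_00FF));
    // A fully transparent color is indistinguishable from a slot index.
    println!("{:?}", decode(0x00FF_FFFF));
}
```

Under this encoding a fully transparent fill would be misread as a slot reference, so one option a caller might consider is skipping such fills before they reach the scheduler.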

### Context

This PR follows the conversation had about #947. I made this PR separately as it also incorporates the clipping changes #957.

In short, this PR adds a native WebGL backend when targeting `wasm32` and if using the `"webgl"` feature on `vello_hybrid`.

The **primary motivation** of using a custom webgl renderer is binary size, allowing 3mb to be removed when targeting WebGL2 natively. This is achieved by omitting `wgpu` from the binary when the architecture is `wasm32` and the `"webgl"` feature flag is set on `vello_hybrid`.

### Changes

#### vello_hybrid examples

- The `webgl` example has been renamed to `wgpu_webgl`. Now it's more clear that it leverages `wgpu`'s WebGL backend.
- A `native_webgl` example has been added which uses the new WebGL renderer backend.
- `ci.yml` tests both the `wgpu_webgl` example and the `native_webgl` example - smoke testing both webgl techniques.
- A new `ClipScene` has been added for manually viewing and testing deeply nested clipping. ([file](https://github.com/linebender/vello/pull/1011/files#diff-ef57b226886dac928b079c4743d6ed1c86ced27637edca1b60c496c95f03479b))

The PR can be manually tested by locally pulling the branch and running the two examples:

- `cargo run_wasm -p wgpu_webgl --release`: Test original example
- `cargo run_wasm -p native_webgl --release`: Test new backend

#### New `vello_sparse_shaders` package added

This new package contains the WGSL shaders as a source of truth. `vello_hybrid` optionally depends on this library which triggers a build step generating a compiled module. The module contains GLSL shader source code, as well as mappings from the WGSL identifiers to the naga-mangled identifiers in the GLSL.

<details><summary>The generated code:</summary>

```rs
// Generated code by `vello_sparse_shaders` - DO NOT EDIT
/// Build time GLSL shaders derived from wgsl shaders.

/// Compiled glsl for `clear_slots.wgsl`
pub mod clear_slots {
    #![allow(missing_docs, reason="No metadata to generate precise documentation forgenerated code.")]

    pub const VERTEX_SOURCE: &str = r###"#version 300 es

precision highp float;
precision highp int;

struct Config {
    uint slot_width;
    uint slot_height;
    uint texture_height;
    uint _padding;
};
uniform Config_block_0Vertex { Config _group_0_binding_0_vs; };

layout(location = 0) in uint _p2vs_location0;

void main() {
    uint vertex_index = uint(gl_VertexID);
    uint index = _p2vs_location0;
    float x = float((vertex_index & 1u));
    float y = float((vertex_index >> 1u));
    uint _e10 = _group_0_binding_0_vs.slot_height;
    float slot_y_offset = float((index * _e10));
    uint _e15 = _group_0_binding_0_vs.slot_width;
    float pix_x = (x * float(_e15));
    uint _e20 = _group_0_binding_0_vs.slot_height;
    float pix_y = (slot_y_offset + (y * float(_e20)));
    uint _e28 = _group_0_binding_0_vs.slot_width;
    float ndc_x = (((pix_x * 2.0) / float(_e28)) - 1.0);
    uint _e37 = _group_0_binding_0_vs.texture_height;
    float ndc_y = (1.0 - ((pix_y * 2.0) / float(_e37)));
    gl_Position = vec4(ndc_x, ndc_y, 0.0, 1.0);
    gl_Position.yz = vec2(-gl_Position.y, gl_Position.z * 2.0 - gl_Position.w);
    return;
}

"###;

    pub mod vertex {
        pub const CONFIG: &str = "Config_block_0Vertex";
    }

    pub const FRAGMENT_SOURCE: &str = r###"#version 300 es

precision highp float;
precision highp int;

struct Config {
    uint slot_width;
    uint slot_height;
    uint texture_height;
    uint _padding;
};
layout(location = 0) out vec4 _fs2p_location0;

void main() {
    vec4 position = gl_FragCoord;
    _fs2p_location0 = vec4(0.0, 0.0, 0.0, 0.0);
    return;
}

"###;
}

/// Compiled glsl for `render_strips.wgsl`
pub mod render_strips {
    #![allow(missing_docs, reason="No metadata to generate precise documentation forgenerated code.")]

    pub const VERTEX_SOURCE: &str = r###"#version 300 es

precision highp float;
precision highp int;

struct Config {
    uint width;
    uint height;
    uint strip_height;
    uint alphas_tex_width_bits;
};
struct StripInstance {
    uint xy;
    uint widths;
    uint col;
    uint rgba_or_slot;
};
struct VertexOutput {
    vec2 tex_coord;
    uint dense_end;
    uint rgba_or_slot;
    vec4 position;
};
uniform Config_block_0Vertex { Config _group_0_binding_1_vs; };

layout(location = 0) in uint _p2vs_location0;
layout(location = 1) in uint _p2vs_location1;
layout(location = 2) in uint _p2vs_location2;
layout(location = 3) in uint _p2vs_location3;
smooth out vec2 _vs2fs_location0;
flat out uint _vs2fs_location1;
flat out uint _vs2fs_location2;

uint unpack_alphas_from_channel(uvec4 rgba, uint channel_index) {
    switch(channel_index) {
        case 0u: {
            return rgba.x;
        }
        case 1u: {
            return rgba.y;
        }
        case 2u: {
            return rgba.z;
        }
        case 3u: {
            return rgba.w;
        }
        default: {
            return rgba.x;
        }
    }
}

vec4 unpack4x8unorm(uint rgba_packed) {
    return vec4((float(((rgba_packed >> 0u) & 255u)) / 255.0), (float(((rgba_packed >> 8u) & 255u)) / 255.0), (float(((rgba_packed >> 16u) & 255u)) / 255.0), (float(((rgba_packed >> 24u) & 255u)) / 255.0));
}

void main() {
    uint in_vertex_index = uint(gl_VertexID);
    StripInstance instance = StripInstance(_p2vs_location0, _p2vs_location1, _p2vs_location2, _p2vs_location3);
    VertexOutput out_ = VertexOutput(vec2(0.0), 0u, 0u, vec4(0.0));
    float x = float((in_vertex_index & 1u));
    float y = float((in_vertex_index >> 1u));
    uint x0_ = (instance.xy & 65535u);
    uint y0_ = (instance.xy >> 16u);
    uint width = (instance.widths & 65535u);
    uint dense_width = (instance.widths >> 16u);
    out_.dense_end = (instance.col + dense_width);
    float pix_x = (float(x0_) + (float(width) * x));
    uint _e31 = _group_0_binding_1_vs.strip_height;
    float pix_y = (float(y0_) + (y * float(_e31)));
    uint _e39 = _group_0_binding_1_vs.width;
    float ndc_x = (((pix_x * 2.0) / float(_e39)) - 1.0);
    uint _e48 = _group_0_binding_1_vs.height;
    float ndc_y = (1.0 - ((pix_y * 2.0) / float(_e48)));
    out_.position = vec4(ndc_x, ndc_y, 0.0, 1.0);
    uint _e65 = _group_0_binding_1_vs.strip_height;
    out_.tex_coord = vec2((float(instance.col) + (x * float(width))), (y * float(_e65)));
    out_.rgba_or_slot = instance.rgba_or_slot;
    VertexOutput _e71 = out_;
    _vs2fs_location0 = _e71.tex_coord;
    _vs2fs_location1 = _e71.dense_end;
    _vs2fs_location2 = _e71.rgba_or_slot;
    gl_Position = _e71.position;
    gl_Position.yz = vec2(-gl_Position.y, gl_Position.z * 2.0 - gl_Position.w);
    return;
}

"###;

    pub mod vertex {
        pub const CONFIG: &str = "Config_block_0Vertex";
    }

    pub const FRAGMENT_SOURCE: &str = r###"#version 300 es

precision highp float;
precision highp int;

struct Config {
    uint width;
    uint height;
    uint strip_height;
    uint alphas_tex_width_bits;
};
struct StripInstance {
    uint xy;
    uint widths;
    uint col;
    uint rgba_or_slot;
};
struct VertexOutput {
    vec2 tex_coord;
    uint dense_end;
    uint rgba_or_slot;
    vec4 position;
};
uniform Config_block_0Fragment { Config _group_0_binding_1_fs; };

uniform highp usampler2D _group_0_binding_0_fs;

uniform highp sampler2D _group_0_binding_2_fs;

smooth in vec2 _vs2fs_location0;
flat in uint _vs2fs_location1;
flat in uint _vs2fs_location2;
layout(location = 0) out vec4 _fs2p_location0;

uint unpack_alphas_from_channel(uvec4 rgba, uint channel_index) {
    switch(channel_index) {
        case 0u: {
            return rgba.x;
        }
        case 1u: {
            return rgba.y;
        }
        case 2u: {
            return rgba.z;
        }
        case 3u: {
            return rgba.w;
        }
        default: {
            return rgba.x;
        }
    }
}

vec4 unpack4x8unorm(uint rgba_packed) {
    return vec4((float(((rgba_packed >> 0u) & 255u)) / 255.0), (float(((rgba_packed >> 8u) & 255u)) / 255.0), (float(((rgba_packed >> 16u) & 255u)) / 255.0), (float(((rgba_packed >> 24u) & 255u)) / 255.0));
}

void main() {
    VertexOutput in_ = VertexOutput(_vs2fs_location0, _vs2fs_location1, _vs2fs_location2, gl_FragCoord);
    float alpha = 1.0;
    uint alphas_index = uint(floor(in_.tex_coord.x));
    if ((alphas_index < in_.dense_end)) {
        uint y = uint(floor(in_.tex_coord.y));
        uvec2 tex_dimensions = uvec2(textureSize(_group_0_binding_0_fs, 0).xy);
        uint alphas_tex_width = tex_dimensions.x;
        uint texel_index = (alphas_index / 4u);
        uint channel_index_1 = (alphas_index % 4u);
        uint tex_x = (texel_index & (alphas_tex_width - 1u));
        uint _e25 = _group_0_binding_1_fs.alphas_tex_width_bits;
        uint tex_y = (texel_index >> _e25);
        uvec4 rgba_values = texelFetch(_group_0_binding_0_fs, ivec2(uvec2(tex_x, tex_y)), 0);
        uint _e31 = unpack_alphas_from_channel(rgba_values, channel_index_1);
        alpha = (float(((_e31 >> (y * 8u)) & 255u)) * 0.003921569);
    }
    uint alpha_byte = (in_.rgba_or_slot >> 24u);
    if ((alpha_byte != 0u)) {
        float _e45 = alpha;
        vec4 _e47 = unpack4x8unorm(in_.rgba_or_slot);
        _fs2p_location0 = (_e45 * _e47);
        return;
    } else {
        uint clip_x = (uint(in_.position.x) & 255u);
        uint _e62 = _group_0_binding_1_fs.strip_height;
        uint clip_y = ((uint(in_.position.y) & 3u) + (in_.rgba_or_slot * _e62));
        vec4 clip_in_color = texelFetch(_group_0_binding_2_fs, ivec2(uvec2(clip_x, clip_y)), 0);
        float _e69 = alpha;
        _fs2p_location0 = (_e69 * clip_in_color);
        return;
    }
}

"###;

    pub mod fragment {
        pub const CONFIG: &str = "Config_block_0Fragment";
        pub const ALPHAS_TEXTURE: &str = "_group_0_binding_0_fs";
        pub const CLIP_INPUT_TEXTURE: &str = "_group_0_binding_2_fs";
    }
}
```

</details>

The generated code can then be imported with: `use vello_sparse_shaders::{clear_slots, render_strips};`

#### `vello_hybrid` changes

- A new `render` subdirectory has been added that contains:
  - `common.rs`: All the shared render logic.
  - `wgpu.rs`: The original renderer leveraging `wgpu`.
  - `webgl.rs`: The new WebGL native backend renderer.
- The `Scheduler` has been made backend-agnostic by operating on a new `RendererBackend` trait. Both the `wgpu` and `webgl` renderer backends implement `RendererBackend`.

#### Feature flag changes

Feature flags in `vello_hybrid` are additive. By default the `wgpu` feature is enabled. If the compile target is `wasm32` and the `webgl` feature is enabled on `vello_hybrid`, then the native WebGL renderer will be enabled.

#### Warnings

A runtime warning has been added that will trigger once on either renderer being instantiated, if both:

- `wgpu` with its WebGL backend is active.
- The `WebGlRenderer` is also active.

The warning is:

```
Both WebGL and wgpu with the "webgl" feature are enabled.
For optimal performance and binary size on web targets, use only the dedicated WebGL renderer.
```

### Screen recording

> [!NOTE]
> The screen recording below is slightly stale – I've since changed the background to be dark so the white text scene can be read.

![webgl_native](https://github.com/user-attachments/assets/c94fffe9-8249-4a0c-ab46-13cb16097dd2)

Left side is `native_webgl` example (using native WebGL2)
Right side is the existing `webgl` example which uses `wgpu` with the `webgl` feature flag.

### Test plan

To scope down this PR, there are no automated tests for the renderer except for the single browser test introduced in the example. The shader compilation has some unit tests.

This PR was manually tested via the new native webgl example: `cargo run_wasm -p native_webgl`. This example can be tested against the original `cargo run_wasm -p wgpu_webgl`.

### Risks

The only risk I'm uncertain about is the addition of the `wgpu` feature flag, that is used as a default feature. Could this be a breaking change for users that specify "no default features". They'd have to add the `wgpu` feature explicitly. This seems minor.

### Followup work

This PR is huge, because it implements all the existing vello_hybrid features in the WebGL backend. Similarly it also includes build-time shader compilation. Instead of making this change completely impenetrable, I'm splitting test infrastructure into a separate change. This PR must be manually tested in the interim. The example has been added to CI so that it must compile and run.

raphlinus and others added 4 commits April 22, 2025 21:36

Clipping functionality

c220a65

Merge remote-tracking branch 'upstream/main' into tajp/hybrid/clip

4b60027

Fix bad merge

24bd54e

taj-p commented May 5, 2025

sparse_strips/vello_sparse_tests/snapshots/image_with_transform_rotate_1.png (outdated, resolved)

taj-p commented May 5, 2025

.

dc61c48

taj-p changed the title ~~[Vello Hybrid]: Spatiotemporal Allocation (clipping)~~ [Vello Hybrid]: Clipping (Spatiotemporal Allocation) May 5, 2025

taj-p marked this pull request as ready for review May 5, 2025 05:41

taj-p mentioned this pull request May 5, 2025

Tajp/hybrid/clipping (WIP) taj-p/vello#8

Closed

LaurenzV reviewed May 5, 2025

sparse_strips/vello_sparse_tests/snapshots/fill_command_respects_clip_bounds.png (outdated, resolved)

taj-p added 2 commits May 5, 2025 15:59

Revert snapshot changes

2ab36ec

Set initial texture height to 1

9523ded

ajakubowicz-canva reviewed May 5, 2025

ajakubowicz-canva reviewed May 7, 2025

sparse_strips/vello_hybrid/src/render.rs (resolved)

sparse_strips/vello_hybrid/src/render.rs (outdated, resolved)

sparse_strips/vello_hybrid/src/render.rs (outdated, resolved)

taj-p added 5 commits May 7, 2025 15:46

Comments 1

41f78ba

Don't multiply and use bit shifts for * 16

b7dbcec

Comments 2

e137620

Skip render pass

ccb0ea6

clippy

7585b31

ajakubowicz-canva reviewed May 7, 2025

taj-p added 2 commits May 7, 2025 16:34

Fix bad merge

4711939

Pass all args

cbf6ba3

ajakubowicz-canva reviewed May 7, 2025

DJMcNab mentioned this pull request May 7, 2025

vello_hybrid: WebGL2 renderer backend that skips wgpu #947

Closed

Comments 3

007ae05

ajakubowicz-canva approved these changes May 7, 2025

Formatting

46c7a10

LaurenzV approved these changes May 7, 2025

Add notes on clip depths, textures, and rendering

c1e9097

taj-p enabled auto-merge May 7, 2025 22:13

taj-p added this pull request to the merge queue May 7, 2025

Merged via the queue into linebender:main with commit f53fe05 May 7, 2025
17 checks passed

taj-p deleted the tajp/hybrid/clip branch May 7, 2025 22:20

ajakubowicz-canva reviewed May 9, 2025

ajakubowicz-canva mentioned this pull request May 16, 2025

vello_hybrid: add native WebGL backend #1011

Merged

		// TODO: We currently allocate a new strips buffer for each render pass. A more efficient
		// approach would be to re-use buffers or slices of a larger buffer.
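
To illustrate the kind of reuse this TODO hints at, here is a minimal CPU-side sketch. It is an analogy only: `StripScratch` is a hypothetical type, and it uses a plain `Vec<u8>` rather than a GPU buffer, so it says nothing about how the renderer would actually manage `wgpu` allocations.

```rust
/// Hypothetical scratch storage kept alive across render passes, so each pass
/// refills existing memory instead of allocating a fresh buffer.
struct StripScratch {
    bytes: Vec<u8>,
}

impl StripScratch {
    fn new() -> Self {
        Self { bytes: Vec::new() }
    }

    /// Refill the scratch storage for the next pass.
    /// `Vec::clear` drops the contents but keeps the allocated capacity.
    fn fill(&mut self, strips: &[u8]) -> &[u8] {
        self.bytes.clear();
        self.bytes.extend_from_slice(strips);
        &self.bytes
    }

    /// Currently allocated capacity in bytes.
    fn capacity(&self) -> usize {
        self.bytes.capacity()
    }
}
```

In this sketch, a second pass that needs no more bytes than an earlier one performs no new allocation.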

		let clip_x = u32(in.position.x) & 0xFFu;
		let clip_y = (u32(in.position.y) & 3) + in.rgba_or_slot * config.strip_height;
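
The two shader lines above map a fragment position and a slot index to a texel in the clip texture. A rough Rust equivalent is sketched below for illustration. The constant `WIDE_TILE_WIDTH` and the function `clip_texel` are hypothetical names. The value 256 is inferred from the `0xFF` mask, and the sketch takes integer coordinates directly rather than casting from floats as the shader does.

```rust
/// Assumed wide tile width in pixels, inferred from the `& 0xFF` mask.
const WIDE_TILE_WIDTH: u32 = 256;

/// Hypothetical helper mirroring the shader arithmetic:
/// x wraps within one wide tile, y is the row within the strip plus the
/// vertical offset of the slot (slots are stacked in a single column).
fn clip_texel(frag_x: u32, frag_y: u32, slot: u32, strip_height: u32) -> (u32, u32) {
    let clip_x = frag_x & (WIDE_TILE_WIDTH - 1);
    let clip_y = (frag_y & 3) + slot * strip_height;
    (clip_x, clip_y)
}
```

For example, with a strip height of 4, a fragment at `(300, 10)` reading slot 5 samples texel `(44, 22)`: `300 & 255 = 44`, and `(10 & 3) + 5 * 4 = 22`.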
