[Vello Hybrid]: Clipping (Spatiotemporal Allocation) #957

Merged
merged 17 commits into from
May 7, 2025

Conversation

taj-p
Contributor

@taj-p taj-p commented May 5, 2025

Context

Hooks up and addresses all the TODOs in Raph's sketch (#934). See the Zulip thread "#vello > Spatiotemporal allocation (hybrid)".

Notes

  • Made GpuResources non-optional. Keeping it optional was more trouble than it was worth.
  • Adds a new clear_slots pipeline to support fine-grained clearing of slots in slot textures (needed for spatiotemporal allocation)
  • Regenerates test snapshots
  • Enables the clipping test suite for vello_hybrid
  • There are many performance wins to be had. This PR is pretty big already, so I think these are worth following up separately.
    • Not re-creating buffers for each render pass (re-using the allocations between calls)
    • Using a staging belt (to prevent allocating an extra staging buffer per write_buffer)
    • Perhaps allowing more than 1 column of slots per slot texture.

I've copied the documentation from schedule.rs below for information about how spatiotemporal allocation works:

Scheduling

  • Draw commands are either issued to the final target or slots in a clip texture.
  • Rounds represent a draw in up to 3 render targets (two clip textures and a final target).
  • The clip texture stores slots for many clip depths. Once our clip textures are full,
    we flush rounds (i.e. execute render passes) to free up space. Note that a slot refers
    to 1 wide tile's worth of pixels in the clip texture.
  • The free vector contains the indices of the slots that are available for use in the two clip textures.
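The free-slot bookkeeping described in the list above can be sketched roughly as follows. This is a minimal illustration rather than the code in schedule.rs: `SlotAllocator`, `alloc`, and `release` are hypothetical names, and a real scheduler would flush rounds when a texture is full instead of simply returning `None`.

```rust
/// Hypothetical sketch: one free list per clip texture.
/// Each entry is the index of a slot (one wide tile's worth of pixels).
struct SlotAllocator {
    free: [Vec<usize>; 2],
}

impl SlotAllocator {
    /// Start with every slot of both clip textures free.
    fn new(slots_per_texture: usize) -> Self {
        Self {
            free: [
                (0..slots_per_texture).collect(),
                (0..slots_per_texture).collect(),
            ],
        }
    }

    /// Take a free slot from the given texture, or `None` when the texture is
    /// full (the point at which a real scheduler would flush rounds).
    fn alloc(&mut self, texture: usize) -> Option<usize> {
        self.free[texture].pop()
    }

    /// Return a slot to the free list once its contents have been consumed.
    fn release(&mut self, texture: usize, slot: usize) {
        self.free[texture].push(slot);
    }
}
```

In this sketch slots are handed out from the end of each free list, so the highest index is used first.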

Example

Consider the following scene of drawing a single wide tile with three overlapping rectangles with
decreasing width clipping regions.

const WIDTH: f64 = 100.0;
const HEIGHT: f64 = Tile::HEIGHT as f64;
const OFFSET: f64 = WIDTH / 3.0;

let colors = [RED, GREEN, BLUE];

for i in 0..3 {
    let clip_rect = Rect::new((i as f64) * OFFSET, 0.0, WIDTH, HEIGHT);
    ctx.push_clip_layer(&clip_rect.to_path(0.1));
    ctx.set_paint(colors[i]);
    ctx.fill_rect(&Rect::new(0.0, 0.0, WIDTH, HEIGHT));
}
for _ in 0..3 {
    ctx.pop_layer();
}

This single wide tile scene should produce the below rendering:

┌────────────────────────────┌────────────────────────────┌─────────────────────────────
│      ──              ───   │       /     /       /     /│        ──────────────      │
│  ────            ────      │      /     /       /     / │────────                    │
│──           ─────          │     /     /       /     /  │                            │
│        ───Red              │    /     /Green  /     /   │           Blue             │
│    ────                ──  │   /     /       /     /    │                     ───────│
│ ───                ────    │  /     /       /     /     │       ──────────────       │
│                  ──        │ /     /       /     /      │───────                     │
└────────────────────────────└────────────────────────────└────────────────────────────┘
                                                                                        

How the scene is scheduled into rounds and draw calls is shown below:

Round 0

In this round, we don't have any preserved slots or slots that we need to sample from. We simply
draw the unclipped primitives.

Draw to texture 0:

In Slot N - 1 of texture 0, draw the unclipped green rectangle.

Slot N - 1:

┌──────────────────────────────────────────────────────────────────────────────────────┐
│       /     /       /     /        /     /       /     /       /     /       /     / │
│      /     /       /     /        /     /       /     /       /     /       /     /  │
│     /     /       /     /        /     /       /     /       /     /       /     /   │
│    /     /       /     /        /     / Green /     /       /     /       /     /    │
│   /     /       /     /        /     /       /     /       /     /       /     /     │
│  /     /       /     /        /     /       /     /       /     /       /     /      │
│ /     /       /     /        /     /       /     /       /     /       /     /       │
└──────────────────────────────────────────────────────────────────────────────────────┘

Draw to texture 1:

In Slot N - 2 of texture 1, draw the unclipped red rectangle and, in slot N - 1, draw the unclipped
blue rectangle.

Slot N - 2:

┌──────────────────────────────────────────────────────────────────────────────────────┐
│      ──              ───                            ──              ───              │
│  ────            ────               ──          ────            ────               ──│
│──           ─────               ────          ──           ─────               ────  │
│        ─────                ────        Red           ─────                ────      │
│    ────                 ────                      ────                 ────          │
│ ───                 ────                       ───                 ────              │
│                  ───                                            ───                  │
└──────────────────────────────────────────────────────────────────────────────────────┘

Slot N - 1:

┌──────────────────────────────────────────────────────────────────────────────────────┐
│                                           ────────────────────────────────────────── │
│───────────────────────────────────────────                                           │
│                                                                                      │
│                                         Blue                          ───────────────│
│                                           ────────────────────────────               │
│               ────────────────────────────                                           │
│───────────────                                                                       │
└──────────────────────────────────────────────────────────────────────────────────────┘

Round 1

At this point, we have three slots that contain our unclipped rectangles. In this round,
we start to sample those pixels to apply clipping (texture 1 samples from texture 0 and
the render target view samples from texture 1).

Draw to texture 0:

Slot N - 1 of texture 0 contains our unclipped green rectangle. In this draw, we sample
the pixels from slot N - 2 from texture 1 to draw the blue rectangle into this slot.

Slot N - 1:

┌─────────────────────────────────────────────────────────┌─────────────────────────────
│        /     /       /     /       /     /       /     /│        ──────────────      │
│       /     /       /     /       /     /       /     / │────────                    │
│      /     /       /     /       /     /       /     /  │                            │
│     /     /       /  Green      /     /       /     /   │           Blue             │
│    /     /       /     /       /     /       /     /    │                     ───────│
│   /     /       /     /       /     /       /     /     │       ──────────────       │
│  /     /       /     /       /     /       /     /      │───────                     │
└─────────────────────────────────────────────────────────└────────────────────────────┘

Draw to texture 1:

Then, into Slot N - 2 of texture 1, which contains our red rectangle, we sample the pixels
from slot N - 1 of texture 0, which contains our green and blue rectangles.


┌────────────────────────────┌────────────────────────────┌─────────────────────────────
│      ──              ───   │       /     /       /     /│        ──────────────      │
│  ────            ────      │      /     /       /     / │────────                    │
│──           ─────          │     /     /       /     /  │                            │
│        ───Red              │    /     /Green  /     /   │           Blue             │
│    ────                ──  │   /     /       /     /    │                     ───────│
│ ───                ────    │  /     /       /     /     │       ──────────────       │
│                  ──        │ /     /       /     /      │───────                     │
└────────────────────────────└────────────────────────────└────────────────────────────┘

Draw to render target

At this point, we can sample the pixels from slot N - 1 of texture 1 to draw the final
rendition.

Nuances

  • When there are no clip/blend regions, we can render directly to the final target.
  • The above example provides an intuitive explanation for how rounds after 3 clip depths
    are scheduled. At clip depths 1 and 2, we can draw directly to the final target within a
    single round.
  • Before drawing into any slot, we need to clear it. If all slots can be cleared or are free,
    we can use a LoadOp::Clear operation. Otherwise, we need to clear the dirty slots using
    a fine-grained render pass.
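The clearing rule in the last bullet could be sketched like this. The types `SlotState` and `ClearPlan` and the function `plan_clear` are hypothetical and exist only for this illustration; the real scheduler tracks slot state differently and issues wgpu commands directly.

```rust
/// Hypothetical per-slot state used only for this illustration.
#[derive(Clone, Copy, PartialEq, Debug)]
enum SlotState {
    /// Unused, so it is safe to clear.
    Free,
    /// Holds stale pixels that must be cleared before reuse.
    Dirty,
    /// Holds pixels that a later draw still needs to sample.
    Preserved,
}

#[derive(PartialEq, Debug)]
enum ClearPlan {
    /// Clear the whole texture when the render pass begins (a load-op clear).
    WholeTexture,
    /// Clear only these slot indices with a separate fine-grained pass.
    Slots(Vec<u32>),
}

/// Decide how to clear a slot texture before drawing into it.
fn plan_clear(slots: &[SlotState]) -> ClearPlan {
    if slots.iter().all(|s| *s != SlotState::Preserved) {
        // Nothing needs to survive, so one cheap full clear is enough.
        ClearPlan::WholeTexture
    } else {
        // Some slot must be kept, so clear the dirty slots one by one.
        let dirty = slots
            .iter()
            .enumerate()
            .filter(|(_, s)| **s == SlotState::Dirty)
            .map(|(i, _)| i as u32)
            .collect();
        ClearPlan::Slots(dirty)
    }
}
```

The list of indices returned in the second case is the kind of data a per-slot clear pass would consume as instance input.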

For more information about this algorithm, see this Zulip thread.

raphlinus and others added 4 commits April 22, 2025 21:36
This commit has a sketch of spatio-temporal allocation for clipping, but it is not fully wired up yet. Scenes without clipping should work, but there is a fair amount of TODO remaining for clipping.

There's a fair amount of refactoring here. The biggest change is that draw calls and render passes can be issued from inside the scheduler, as opposed to separate "prepare" and "render" calls. The number of render passes needed will vary by the scene.
@@ -67,6 +91,50 @@ fn clip_rectangle_with_star_evenodd(ctx: &mut impl Renderer) {
ctx.pop_layer();
}

#[vello_test]
fn clip_deeply_nested_circles(ctx: &mut impl Renderer) {
Contributor Author

I would like this test to exercise creating many rounds in the scheduler.

Comment on lines +709 to +710
// TODO: We currently allocate a new strips buffer for each render pass. A more efficient
// approach would be to re-use buffers or slices of a larger buffer.
Contributor Author

This is such a large refactor of vello hybrid that I wanted to keep optimisations like this out of this PR to limit added complexity.

@taj-p taj-p changed the title [Vello Hybrid]: Spatiotemporal Allocation (clipping) [Vello Hybrid]: Clipping (Spatiotemporal Allocation) May 5, 2025
@taj-p
Contributor Author

taj-p commented May 5, 2025

cc @ajakubowicz-canva

@taj-p taj-p marked this pull request as ready for review May 5, 2025 05:41
let alphas_texture =
Self::make_alphas_texture(device, max_texture_dimension_2d, alpha_texture_height);
let alpha_data = vec![0; (max_texture_dimension_2d * alpha_texture_height * 16) as usize];
const INITIAL_ALPHA_TEXTURE_HEIGHT: u32 = 1;
Contributor

Non actionable – for my own understanding, why is it a safe change to change INITIAL_ALPHA_TEXTURE_HEIGHT from 2 to 1? What does it impact?

Contributor Author

@taj-p taj-p May 6, 2025

This was motivated by wanting to keep initial allocations small. If the scene requires more alphas, that can be done in prepare (where we re-allocate GPU resources to contain the scene we want to render).
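As a rough illustration of growing the alpha texture on demand, the sketch below computes the smallest height that fits a given amount of alpha data. It assumes 16 bytes per texel, taken from the `* 16` in the allocation quoted in the diff; the function name `required_alpha_texture_height` is hypothetical, and the actual re-allocation policy in `prepare` may differ.

```rust
/// Bytes stored per texel; assumed from the `* 16` in the quoted allocation.
const BYTES_PER_TEXEL: u64 = 16;

/// Smallest texture height (in rows) that holds `alpha_bytes` bytes of alpha
/// data, never going below `initial_height`.
fn required_alpha_texture_height(
    alpha_bytes: u64,
    texture_width: u32,
    initial_height: u32,
) -> u32 {
    let bytes_per_row = u64::from(texture_width) * BYTES_PER_TEXEL;
    // Round up so that a partially filled last row still gets allocated.
    let rows = alpha_bytes.div_ceil(bytes_per_row);
    (rows as u32).max(initial_height)
}
```

With an initial height of 1, a scene with few alphas keeps the small allocation, and only larger scenes pay for more rows.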

});

let strip_shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
label: Some("Strip Shader"),
Contributor

Nit: may be nice if label matches file - e.g. Render Strips Shader. Similarly strip_shader -> render_strips_shader

Contributor Author

I didn't like the render_* prefix because render_ could arguably be put in front of so many classes of things. I also didn't like renaming the shaders to strips. I'll sleep on it, but if anyone has a better name, please let me know 🙏 . Maybe "fine" for fine rasterisation (but arguably that includes "clear_slots" too).

Comment on lines 53 to 64
/// Errors that can occur during rendering.
#[derive(Error, Debug)]
pub enum RenderError {
/// No slots available for rendering.
///
/// This error is likely to occur if a scene has an extreme number of nested layers
/// (clipping, blending, masks, or opacity layers).
///
/// TODO: Consider supporting more than a single column of slots in slot textures.
#[error("No slots available for rendering")]
SlotsExhausted,
}
Contributor

Are there other errors we will want to add?

Should unwrap and expect methods actually return a RenderError?

Contributor Author

I'm wary of making even more changes to vello_hybrid in this single PR, so I've left a TODO and will leave this as a separate exercise.

Contributor Author

Added in 41f78ba

Contributor

Sorry I wasn't clear in the initial comment. I meant this as a philosophical question about the future. Do you expect any fallible behaviors to be moved into this enum in the future?

Contributor Author

Yes. For example, I think when there are too many alphas to fit into the device's alpha texture, we should return an appropriate render error.

The question of returning a RenderError for unwraps and expects which we believe are safe is another question entirely. Historically, if I believe I'm smarter than the compiler, I lean towards panicking, but I think this decision should be made by Linebender as a group. I'll ask about it in Utrecht.

Contributor Author

I've unresolved this so that it gets greater visibility from other reviewers.
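To make the discussion concrete, here is one way a further fallible case could be expressed. This is only a sketch: the `TooManyAlphas` variant and the `check_alpha_capacity` helper are hypothetical and not part of this PR, and `Display` is implemented by hand so the example needs nothing beyond the standard library (the PR itself derives the error with `#[derive(Error)]`).

```rust
use std::fmt;

#[derive(Debug, PartialEq)]
enum RenderError {
    /// No slots available for rendering (the variant added in this PR).
    SlotsExhausted,
    /// Hypothetical variant: the scene's alphas do not fit the alpha texture.
    TooManyAlphas { required: usize, capacity: usize },
}

impl fmt::Display for RenderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::SlotsExhausted => write!(f, "No slots available for rendering"),
            Self::TooManyAlphas { required, capacity } => write!(
                f,
                "Scene needs {required} alpha values but the texture holds {capacity}"
            ),
        }
    }
}

impl std::error::Error for RenderError {}

/// Hypothetical check that returns an error instead of panicking.
fn check_alpha_capacity(required: usize, capacity: usize) -> Result<(), RenderError> {
    if required > capacity {
        Err(RenderError::TooManyAlphas { required, capacity })
    } else {
        Ok(())
    }
}
```

Whether invariants believed to be unbreakable should also be surfaced this way, rather than through `unwrap` or `expect`, is the open question in this thread.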

sample_count: 1,
dimension: wgpu::TextureDimension::D2,
// TODO: Is this correct or need it be RGBA8Unorm?
format: render_target_config.format,
Contributor

Who sets the rendering target? It looks like it's set by the caller.

In our render_to_file example, we appear to pass Rgba8Unorm as the render target config format. Whilst for the winit example we pass Bgra8Unorm as the surface format.

I guess another question is related to the texture definition in the shader. In wgsl clip_input_texture is typed as texture_2d<f32>.

Is it expected that the vello_hybrid renderer works with all wgsl texture formats? (sorry for the naivety).

Contributor Author

@taj-p taj-p May 7, 2025

I don't think this is a naive question. We only support Rgba8Unorm and Bgra8Unorm currently I believe. I'll confirm this.

});
let clear_slot_indices_buffer = Self::make_clear_slot_indices_buffer(
device,
slot_count as u64 * size_of::<u32>() as u64,
Contributor

The u32 attribute used for the slot index that's passed to the clear slots wgsl shader seems brittle in its definition when configuring the descriptors / pipeline.

A parallel is the GpuStrip which encapsulates the attribute passed to the render_strips shader. Is it worth using the New Type Idiom to wrap this u32 such that it's traceable to all locations that need to use the accurate size?

For example this line of code. But also the array_stride in the clear_pipeline?
It could also provide the attributes?

Maybe this isn't worth doing for 2 code locations.
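A minimal sketch of what the newtype suggestion might look like is below. `ClearSlotIndex`, `STRIDE`, and `buffer_size` are hypothetical names; the PR itself keeps the plain `u32`.

```rust
use std::mem::size_of;

/// Hypothetical newtype for the per-instance slot index sent to the
/// clear-slots shader. `repr(transparent)` keeps the layout identical to `u32`.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct ClearSlotIndex(u32);

impl ClearSlotIndex {
    /// Byte stride of one instance, usable wherever an `array_stride` is needed.
    const STRIDE: u64 = size_of::<Self>() as u64;

    /// Buffer size in bytes needed to hold `slot_count` indices.
    fn buffer_size(slot_count: u32) -> u64 {
        u64::from(slot_count) * Self::STRIDE
    }
}
```

With this shape, the buffer size and the pipeline stride both derive from one type, so changing the index width would only touch one definition.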

Comment on lines +146 to +148
if opacity.is_some() {
unimplemented!()
}
Contributor

I may figure this out as I keep reviewing - but I am naively confused how opacity is not implemented or even how it's expected to be implemented, when there are tests like clip_with_opacity.

Is this assumption accurate? In the test clip_with_opacity, a clip layer is pushed, and then a rect is filled with alpha 0.5. Would the equivalent in the future be: push a clip layer with opacity set to 0.5, and then paint a fully opaque rect. Would the opacity in the clip layer transfer to the child drawn within it and the two cases would render the same?

Also, I just realized I am a little confused about using the name alpha vs opacity. Are they the same thing, e.g. an alpha of 1 and an opacity of 1 are both opaque. Should terms be consolidated?

Edit: yep, I confirmed opacity here behaves as I expect.

Comment on lines +154 to +156
clip,
BlendMode::new(Mix::Normal, Compose::SrcOver),
None,
Contributor

I realize I don't actually fully understand the difference between clipping and masking.
Clarifying: A mask is essentially a pixmap (alpha or luminance), whilst a clip_path lets you clip via a bezier path. It's not immediately intuitive that a clip_path is much more challenging than a mask that's "straightforward".

Contributor

Clipping is basically a special case of an alpha mask, where everything inside of the clip shape is fully opaque in the alpha mask, and everything outside is fully transparent. So you could in theory emulate it with an alpha mask, but since clipping is such a common case we have custom logic that is more complex, but faster.
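The relationship can be illustrated with a tiny sketch. It ignores anti-aliased clip edges and assumes premultiplied RGBA pixels; `apply_mask` and `clip_coverage` are hypothetical helpers, not functions from the renderer.

```rust
/// Scale a premultiplied RGBA pixel by a mask coverage value in [0, 1].
fn apply_mask(pixel: [f32; 4], coverage: f32) -> [f32; 4] {
    pixel.map(|c| c * coverage)
}

/// A clip acts like a mask whose coverage is 1.0 inside the shape and 0.0
/// outside (anti-aliasing along the edge is ignored in this sketch).
fn clip_coverage(inside: bool) -> f32 {
    if inside {
        1.0
    } else {
        0.0
    }
}
```

A general mask would supply arbitrary coverage values per pixel, which is why clipping can take a faster dedicated path.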


for (i, color) in colors.iter().enumerate() {
let clip_rect = Rect::new((i as f64) * OFFSET, 0.0, WIDTH, HEIGHT);
ctx.push_clip_layer(&clip_rect.to_path(0.1));
Contributor

Nit: Worth defining 0.1 as DEFAULT_TOLERANCE (similar to vello_cpu)?

Contributor Author

Yes, definitely. But since that touches every file in tests/*, let's leave it for a separate PR.

Contributor

@ajakubowicz-canva ajakubowicz-canva left a comment

I have now technically read over all the code and it looks great! I will need to give the scheduler a second pass as there are still some mysteries in there.

Unfortunately as someone relatively new to this repo, my comments have been focused on nits and syntax, and less on the overall design. However, overall design seems to be working per tests.

Great work!

//! At this point, we can sample the pixels from slot N - 1 of texture 1 to draw the final
//! result.
//!
//! ## Nuances
Contributor

I think this section is missing how slot depth maps to the slot textures. E.g. odd vs even slot depths.

Contributor Author

Added notes in c1e9097

junk: &mut RendererJunk<'_>,
scene: &Scene,
) -> Result<(), RenderError> {
let mut tile_state = mem::take(&mut self.tile_state);
Contributor

I'm not quite following why we don't just create a brand new tile_state vec allocation here, vs storing it on self?

Naively, mem::take replaces tile_state with an empty vec, does that incur an allocation? Then at the end we re-set tile_state back on self.

This comment seems perf related so can be ignored.

Contributor Author

@taj-p taj-p May 7, 2025

Empty vectors in Rust do not allocate, so taking the vector out of self and returning it later lets us reuse its allocation without creating new ones.
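A small standalone sketch of the take-and-restore pattern is shown below. The `Scheduler` struct and its `Vec<u32>` field are simplified stand-ins for illustration, not the real types.

```rust
use std::mem;

/// Simplified stand-in for the real scheduler.
struct Scheduler {
    tile_state: Vec<u32>,
}

impl Scheduler {
    fn run(&mut self) -> usize {
        // Move the Vec out, leaving an empty Vec behind. An empty Vec has
        // capacity 0 and owns no heap memory.
        let mut tile_state = mem::take(&mut self.tile_state);
        debug_assert_eq!(self.tile_state.capacity(), 0);

        // Work with the owned Vec while `self` stays freely borrowable.
        tile_state.clear();
        tile_state.extend([1, 2, 3]);
        let used = tile_state.len();

        // Put it back so the next call reuses the same allocation.
        self.tile_state = tile_state;
        used
    }
}
```

Owning the vector locally also avoids holding a mutable borrow of the field while other methods on `self` are called.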

Comment on lines +133 to +134
let clip_x = u32(in.position.x) & 0xFFu;
let clip_y = (u32(in.position.y) & 3) + in.rgba_or_slot * config.strip_height;
Contributor

It is not clear to me how in.position.y has been modified in this PR. From the scheduler, it looks like it also contains the slot_ix * Tile::HEIGHT, or slot y position. So why the & 3?

Contributor

I think the answer to this question should also potentially result in a code comment in the wgsl shader.

Contributor Author

in.position represents the position builtin. In a vertex shader, it ranges from -1 to 1 for X and Y. In the context of a fragment shader, it represents the pixel coordinate of where we are drawing (see this article). The & 3 keeps only the low two bits of the pixel's y coordinate, constraining it to the range 0 to 3, since our tile is 4 pixels tall. I added a CAUTION: note to the config.strip_height about the danger in changing its value without updating this logic.

In time, we will want to make this configurable, but I'm not sure how that will present. We could make the & 3 configurable, but then we should also make the & 0xFFu configurable to wide tile width. I think at this stage we should untangle those concerns when we get to them.

Contributor

Oh whoops, I got my x and y's confused! Thanks for the great answer :D
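The masking arithmetic from the shader lines under discussion can be mirrored in Rust as a quick check. The constants are assumptions read off the masks (`& 0xFF` implies a 256-pixel wide tile, `& 3` implies a 4-pixel strip height), and `clip_coords` is a hypothetical helper.

```rust
/// Assumed wide tile width in pixels, implied by the `& 0xFF` mask.
const WIDE_TILE_WIDTH: u32 = 256;
/// Assumed strip height in pixels, implied by the `& 3` mask.
const STRIP_HEIGHT: u32 = 4;

/// Map a fragment's pixel coordinate and a slot index to the texel to sample
/// in the clip texture.
fn clip_coords(frag_x: u32, frag_y: u32, slot: u32) -> (u32, u32) {
    // For power-of-two sizes, `v & (size - 1)` equals `v % size`.
    let clip_x = frag_x & (WIDE_TILE_WIDTH - 1); // same as `& 0xFF`
    let row_in_tile = frag_y & (STRIP_HEIGHT - 1); // same as `& 3`
    let clip_y = row_in_tile + slot * STRIP_HEIGHT;
    (clip_x, clip_y)
}
```

The masks only work while both sizes are powers of two, which is one reason changing the strip height would require updating this logic.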

});
}
Cmd::PushBuf => {
let ix = clip_depth % 2;
Contributor

There are 3 draws in a round, representing slot 0, slot 1, and the final texture. In draw_mut the Draw slot corresponds to 1 - clip_depth % 2.
Should these be aligned such that this is also 1 - clip_depth % 2? I think all that should do is ensure the slot 0 free vec and slot 0 draw are on the same index, and it shouldn't change the logic.

Edit: I think it's all a bit more complex. The choice between the two is intentional and changes in multiple places.
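The parity rule can be shown in isolation. This sketch only demonstrates that `clip_depth % 2` and `1 - clip_depth % 2` always pick opposite indices, so adjacent clip depths never share a texture. It assumes clip depths count from 1, as in the red, green, blue example in the PR description, and both function names are hypothetical.

```rust
/// Index chosen by `clip_depth % 2`, as in the quoted `Cmd::PushBuf` arm.
fn parity_index(clip_depth: usize) -> usize {
    clip_depth % 2
}

/// The opposite index, `1 - clip_depth % 2`, as mentioned for `draw_mut`.
fn opposite_index(clip_depth: usize) -> usize {
    1 - clip_depth % 2
}
```

Under that assumption, depths 1 and 3 map to index 1 and depth 2 maps to index 0, matching the example where red and blue share texture 1 and green sits in texture 0.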

Contributor

@ajakubowicz-canva ajakubowicz-canva left a comment

Great work!

Contributor

@LaurenzV LaurenzV left a comment

Just rubber stamping since we discussed in office hours that this is fine to merge, and Andrew doesn't seem to have write permissions yet.

@taj-p taj-p enabled auto-merge May 7, 2025 22:13
@taj-p taj-p added this pull request to the merge queue May 7, 2025
Merged via the queue into linebender:main with commit f53fe05 May 7, 2025
17 checks passed
@taj-p taj-p deleted the tajp/hybrid/clip branch May 7, 2025 22:20
Comment on lines +375 to +378
debug_assert!(
has_non_zero_alpha(rgba),
"Color fields with 0 alpha are reserved for clipping"
);
Contributor

Raised in the Zulip thread "#vello > Vello Hybrid Crashing when alpha = 0".

What is the recommendation if someone wants to pass a fully transparent fill? Additionally, is there a use-case of painting something fully transparent?
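For context, a check like the one asserted in the diff might look as follows. This is a guess at the shape of `has_non_zero_alpha`, assuming the packed colour keeps alpha in the top 8 bits (consistent with how `unpack4x8unorm` reads the fourth component from bits 24 to 31); the real helper may differ.

```rust
/// Returns true when the packed RGBA value has a non-zero alpha byte.
/// Assumption: alpha occupies bits 24..32 of the packed `u32`.
fn has_non_zero_alpha(rgba: u32) -> bool {
    (rgba >> 24) != 0
}
```

Because the same field doubles as a slot index when alpha is zero, a fully transparent colour is indistinguishable from a small slot index under this encoding, which is what the assertion guards against.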

github-merge-queue bot pushed a commit that referenced this pull request May 20, 2025
### Context

This PR follows the conversation had about
#947. I made this PR separately
as it also incorporates the clipping changes
#957.

In short, this PR adds a native WebGL backend when targeting `wasm32`
and if using the `"webgl"` feature on `vello_hybrid`.
The **primary motivation** for using a custom WebGL renderer is binary
size, allowing 3 MB to be removed when targeting WebGL2 natively. This is
achieved by omitting `wgpu` from the binary when the architecture is
`wasm32` and the `"webgl"` feature flag is set on `vello_hybrid`.

### Changes

#### vello_hybrid examples

- The `webgl` example has been renamed to `wgpu_webgl`. Now it's more
clear that it leverages `wgpu`'s WebGL backend.
- A `native_webgl` example has been added which uses the new WebGL
renderer backend.
- `ci.yml` tests both the `wgpu_webgl` example and the `native_webgl`
example - smoke testing both webgl techniques.
- A new `ClipScene` has been added for manually viewing and testing
deeply nested clipping.
([file](https://github.com/linebender/vello/pull/1011/files#diff-ef57b226886dac928b079c4743d6ed1c86ced27637edca1b60c496c95f03479b))

The PR can be manually tested by locally pulling the branch and running
the two examples:

- `cargo run_wasm -p wgpu_webgl --release`: Test original example
- `cargo run_wasm -p native_webgl --release`: Test new backend


#### New `vello_sparse_shaders` package added

This new package contains the WGSL shaders as a source of truth.
`vello_hybrid` optionally depends on this library which triggers a build
step generating a compiled module. The module contains GLSL shader
source code, as well as mappings from the WGSL identifiers to the
naga-mangled identifiers in the GLSL.

<details><summary>The generated code:</summary>

```rs
// Generated code by `vello_sparse_shaders` - DO NOT EDIT
/// Build time GLSL shaders derived from wgsl shaders.
/// Compiled glsl for `clear_slots.wgsl`
pub mod clear_slots {
    #![allow(missing_docs, reason="No metadata to generate precise documentation forgenerated code.")]

    pub const VERTEX_SOURCE: &str = r###"#version 300 es

precision highp float;
precision highp int;

struct Config {
    uint slot_width;
    uint slot_height;
    uint texture_height;
    uint _padding;
};
uniform Config_block_0Vertex { Config _group_0_binding_0_vs; };

layout(location = 0) in uint _p2vs_location0;

void main() {
    uint vertex_index = uint(gl_VertexID);
    uint index = _p2vs_location0;
    float x = float((vertex_index & 1u));
    float y = float((vertex_index >> 1u));
    uint _e10 = _group_0_binding_0_vs.slot_height;
    float slot_y_offset = float((index * _e10));
    uint _e15 = _group_0_binding_0_vs.slot_width;
    float pix_x = (x * float(_e15));
    uint _e20 = _group_0_binding_0_vs.slot_height;
    float pix_y = (slot_y_offset + (y * float(_e20)));
    uint _e28 = _group_0_binding_0_vs.slot_width;
    float ndc_x = (((pix_x * 2.0) / float(_e28)) - 1.0);
    uint _e37 = _group_0_binding_0_vs.texture_height;
    float ndc_y = (1.0 - ((pix_y * 2.0) / float(_e37)));
    gl_Position = vec4(ndc_x, ndc_y, 0.0, 1.0);
    gl_Position.yz = vec2(-gl_Position.y, gl_Position.z * 2.0 - gl_Position.w);
    return;
}

"###;

    pub mod vertex {
        pub const CONFIG: &str = "Config_block_0Vertex";
    }
    pub const FRAGMENT_SOURCE: &str = r###"#version 300 es

precision highp float;
precision highp int;

struct Config {
    uint slot_width;
    uint slot_height;
    uint texture_height;
    uint _padding;
};
layout(location = 0) out vec4 _fs2p_location0;

void main() {
    vec4 position = gl_FragCoord;
    _fs2p_location0 = vec4(0.0, 0.0, 0.0, 0.0);
    return;
}

"###;
}
/// Compiled glsl for `render_strips.wgsl`
pub mod render_strips {
    #![allow(missing_docs, reason="No metadata to generate precise documentation forgenerated code.")]

    pub const VERTEX_SOURCE: &str = r###"#version 300 es

precision highp float;
precision highp int;

struct Config {
    uint width;
    uint height;
    uint strip_height;
    uint alphas_tex_width_bits;
};
struct StripInstance {
    uint xy;
    uint widths;
    uint col;
    uint rgba_or_slot;
};
struct VertexOutput {
    vec2 tex_coord;
    uint dense_end;
    uint rgba_or_slot;
    vec4 position;
};
uniform Config_block_0Vertex { Config _group_0_binding_1_vs; };

layout(location = 0) in uint _p2vs_location0;
layout(location = 1) in uint _p2vs_location1;
layout(location = 2) in uint _p2vs_location2;
layout(location = 3) in uint _p2vs_location3;
smooth out vec2 _vs2fs_location0;
flat out uint _vs2fs_location1;
flat out uint _vs2fs_location2;

uint unpack_alphas_from_channel(uvec4 rgba, uint channel_index) {
    switch(channel_index) {
        case 0u: {
            return rgba.x;
        }
        case 1u: {
            return rgba.y;
        }
        case 2u: {
            return rgba.z;
        }
        case 3u: {
            return rgba.w;
        }
        default: {
            return rgba.x;
        }
    }
}

vec4 unpack4x8unorm(uint rgba_packed) {
    return vec4((float(((rgba_packed >> 0u) & 255u)) / 255.0), (float(((rgba_packed >> 8u) & 255u)) / 255.0), (float(((rgba_packed >> 16u) & 255u)) / 255.0), (float(((rgba_packed >> 24u) & 255u)) / 255.0));
}

void main() {
    uint in_vertex_index = uint(gl_VertexID);
    StripInstance instance = StripInstance(_p2vs_location0, _p2vs_location1, _p2vs_location2, _p2vs_location3);
    VertexOutput out_ = VertexOutput(vec2(0.0), 0u, 0u, vec4(0.0));
    float x = float((in_vertex_index & 1u));
    float y = float((in_vertex_index >> 1u));
    uint x0_ = (instance.xy & 65535u);
    uint y0_ = (instance.xy >> 16u);
    uint width = (instance.widths & 65535u);
    uint dense_width = (instance.widths >> 16u);
    out_.dense_end = (instance.col + dense_width);
    float pix_x = (float(x0_) + (float(width) * x));
    uint _e31 = _group_0_binding_1_vs.strip_height;
    float pix_y = (float(y0_) + (y * float(_e31)));
    uint _e39 = _group_0_binding_1_vs.width;
    float ndc_x = (((pix_x * 2.0) / float(_e39)) - 1.0);
    uint _e48 = _group_0_binding_1_vs.height;
    float ndc_y = (1.0 - ((pix_y * 2.0) / float(_e48)));
    out_.position = vec4(ndc_x, ndc_y, 0.0, 1.0);
    uint _e65 = _group_0_binding_1_vs.strip_height;
    out_.tex_coord = vec2((float(instance.col) + (x * float(width))), (y * float(_e65)));
    out_.rgba_or_slot = instance.rgba_or_slot;
    VertexOutput _e71 = out_;
    _vs2fs_location0 = _e71.tex_coord;
    _vs2fs_location1 = _e71.dense_end;
    _vs2fs_location2 = _e71.rgba_or_slot;
    gl_Position = _e71.position;
    gl_Position.yz = vec2(-gl_Position.y, gl_Position.z * 2.0 - gl_Position.w);
    return;
}

"###;

    pub mod vertex {
        pub const CONFIG: &str = "Config_block_0Vertex";
    }
    pub const FRAGMENT_SOURCE: &str = r###"#version 300 es

precision highp float;
precision highp int;

struct Config {
    uint width;
    uint height;
    uint strip_height;
    uint alphas_tex_width_bits;
};
struct StripInstance {
    uint xy;
    uint widths;
    uint col;
    uint rgba_or_slot;
};
struct VertexOutput {
    vec2 tex_coord;
    uint dense_end;
    uint rgba_or_slot;
    vec4 position;
};
uniform Config_block_0Fragment { Config _group_0_binding_1_fs; };

uniform highp usampler2D _group_0_binding_0_fs;

uniform highp sampler2D _group_0_binding_2_fs;

smooth in vec2 _vs2fs_location0;
flat in uint _vs2fs_location1;
flat in uint _vs2fs_location2;
layout(location = 0) out vec4 _fs2p_location0;

uint unpack_alphas_from_channel(uvec4 rgba, uint channel_index) {
    switch(channel_index) {
        case 0u: {
            return rgba.x;
        }
        case 1u: {
            return rgba.y;
        }
        case 2u: {
            return rgba.z;
        }
        case 3u: {
            return rgba.w;
        }
        default: {
            return rgba.x;
        }
    }
}

vec4 unpack4x8unorm(uint rgba_packed) {
    return vec4((float(((rgba_packed >> 0u) & 255u)) / 255.0), (float(((rgba_packed >> 8u) & 255u)) / 255.0), (float(((rgba_packed >> 16u) & 255u)) / 255.0), (float(((rgba_packed >> 24u) & 255u)) / 255.0));
}

void main() {
    VertexOutput in_ = VertexOutput(_vs2fs_location0, _vs2fs_location1, _vs2fs_location2, gl_FragCoord);
    float alpha = 1.0;
    uint alphas_index = uint(floor(in_.tex_coord.x));
    if ((alphas_index < in_.dense_end)) {
        uint y = uint(floor(in_.tex_coord.y));
        uvec2 tex_dimensions = uvec2(textureSize(_group_0_binding_0_fs, 0).xy);
        uint alphas_tex_width = tex_dimensions.x;
        uint texel_index = (alphas_index / 4u);
        uint channel_index_1 = (alphas_index % 4u);
        uint tex_x = (texel_index & (alphas_tex_width - 1u));
        uint _e25 = _group_0_binding_1_fs.alphas_tex_width_bits;
        uint tex_y = (texel_index >> _e25);
        uvec4 rgba_values = texelFetch(_group_0_binding_0_fs, ivec2(uvec2(tex_x, tex_y)), 0);
        uint _e31 = unpack_alphas_from_channel(rgba_values, channel_index_1);
        alpha = (float(((_e31 >> (y * 8u)) & 255u)) * 0.003921569);
    }
    uint alpha_byte = (in_.rgba_or_slot >> 24u);
    if ((alpha_byte != 0u)) {
        float _e45 = alpha;
        vec4 _e47 = unpack4x8unorm(in_.rgba_or_slot);
        _fs2p_location0 = (_e45 * _e47);
        return;
    } else {
        uint clip_x = (uint(in_.position.x) & 255u);
        uint _e62 = _group_0_binding_1_fs.strip_height;
        uint clip_y = ((uint(in_.position.y) & 3u) + (in_.rgba_or_slot * _e62));
        vec4 clip_in_color = texelFetch(_group_0_binding_2_fs, ivec2(uvec2(clip_x, clip_y)), 0);
        float _e69 = alpha;
        _fs2p_location0 = (_e69 * clip_in_color);
        return;
    }
}

"###;
    pub mod fragment {
        pub const CONFIG: &str = "Config_block_0Fragment";
        pub const ALPHAS_TEXTURE: &str = "_group_0_binding_0_fs";
        pub const CLIP_INPUT_TEXTURE: &str = "_group_0_binding_2_fs";
    }
}
```

</details>

The generated code can then be imported with:
`use vello_sparse_shaders::{clear_slots, render_strips};`
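
To make the bit packing in the generated shaders easier to follow, here is a CPU-side sketch in Rust that mirrors `unpack4x8unorm` and the 16-bit unpacking of `xy` and `widths` from the GLSL above. The helpers `unpack_u16_pair` and `is_solid_color` are hypothetical names used only for illustration; they are not part of `vello_sparse_shaders`.

```rust
/// Mirrors the GLSL `unpack4x8unorm`: four 8-bit channels, lowest byte first,
/// each normalized to the range [0, 1].
fn unpack4x8unorm(rgba_packed: u32) -> [f32; 4] {
    [
        (rgba_packed & 0xff) as f32 / 255.0,
        ((rgba_packed >> 8) & 0xff) as f32 / 255.0,
        ((rgba_packed >> 16) & 0xff) as f32 / 255.0,
        ((rgba_packed >> 24) & 0xff) as f32 / 255.0,
    ]
}

/// Hypothetical helper: splits a packed `u32` into (low 16 bits, high 16 bits).
/// The vertex shader does this for `xy` (x0, y0) and `widths` (width, dense_width).
fn unpack_u16_pair(packed: u32) -> (u32, u32) {
    (packed & 0xffff, packed >> 16)
}

/// Hypothetical helper mirroring the fragment shader's branch: a non-zero
/// top byte means `rgba_or_slot` holds a color; zero means it holds a
/// slot index into the clip input texture.
fn is_solid_color(rgba_or_slot: u32) -> bool {
    (rgba_or_slot >> 24) != 0
}
```

One consequence visible in the shader: because the top byte doubles as the discriminator, a value whose alpha byte is zero is always read as a slot index rather than as a color.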

#### `vello_hybrid` changes

- A new `render` subdirectory has been added that contains:
  - `common.rs`: All the shared render logic.
  - `wgpu.rs`: The original renderer leveraging `wgpu`.
  - `webgl.rs`: The new WebGL native backend renderer.

- The `Scheduler` has been made backend-agnostic by operating on a new
`RendererBackend` trait. Both the `wgpu` and `webgl` renderer backends
implement `RendererBackend`.
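
As a rough sketch of what "backend-agnostic" means here, the snippet below shows a scheduler that only talks to a trait. Everything in it is an assumption made for illustration: the `render_strips` method, the `StripBatch` type, and the `RecordingBackend` are hypothetical, and the real `RendererBackend` trait in `vello_hybrid` may look quite different.

```rust
/// Hypothetical stand-in for a batch of strips destined for one render target.
struct StripBatch {
    target: usize,
    strip_count: usize,
}

/// Illustrative trait only; the real `RendererBackend` methods may differ.
trait RendererBackend {
    fn render_strips(&mut self, batch: &StripBatch);
}

/// Hypothetical backend that records what it was asked to draw.
struct RecordingBackend {
    name: &'static str,
    log: Vec<String>,
}

impl RendererBackend for RecordingBackend {
    fn render_strips(&mut self, batch: &StripBatch) {
        self.log.push(format!(
            "{}: {} strips -> target {}",
            self.name, batch.strip_count, batch.target
        ));
    }
}

/// The scheduler holds pending work and never names a concrete backend.
struct Scheduler {
    pending: Vec<StripBatch>,
}

impl Scheduler {
    /// Hands every pending batch to whichever backend the caller supplies.
    fn flush<B: RendererBackend>(&mut self, backend: &mut B) {
        for batch in self.pending.drain(..) {
            backend.render_strips(&batch);
        }
    }
}
```

With this shape, a `wgpu` renderer and a WebGL renderer can share the same scheduling logic by each implementing the trait.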

#### Feature flag changes

Feature flags in `vello_hybrid` are additive. By default the `wgpu`
feature is enabled. If the compile target is `wasm32` and the `webgl`
feature is enabled on `vello_hybrid`, then the native WebGL renderer
will be enabled.
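
The rule above can be modelled as a small truth table. This is a sketch only: `EnabledRenderers` and `enabled_renderers` are hypothetical names, and in the crate itself the same condition would presumably be expressed with `cfg` attributes rather than runtime booleans.

```rust
/// Hypothetical summary of which renderers end up compiled in.
#[derive(Debug, PartialEq)]
struct EnabledRenderers {
    wgpu: bool,
    native_webgl: bool,
}

/// Models the additive feature rules described in the text:
/// - the `wgpu` renderer follows the `wgpu` feature (on by default);
/// - the native WebGL renderer needs both a `wasm32` target and the
///   `webgl` feature.
/// Assumption: in real code this would be a compile-time check such as
/// `#[cfg(all(target_arch = "wasm32", feature = "webgl"))]`.
fn enabled_renderers(
    target_is_wasm32: bool,
    wgpu_feature: bool,
    webgl_feature: bool,
) -> EnabledRenderers {
    EnabledRenderers {
        wgpu: wgpu_feature,
        native_webgl: target_is_wasm32 && webgl_feature,
    }
}
```

Because the flags are additive, enabling `webgl` on a `wasm32` target does not turn `wgpu` off; both renderers can be present in the same build.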

#### Warnings

A runtime warning has been added. It triggers once, when either renderer
is instantiated, if both of the following are true:
 - `wgpu` with its WebGL backend is active.
 - The `WebGlRenderer` is also active.

The warning is:

```
Both WebGL and wgpu with the "webgl" feature are enabled.
For optimal performance and binary size on web targets, use only the dedicated WebGL renderer.
```
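
One common way to get "warn once" behaviour is a process-wide atomic flag. The sketch below shows that pattern under stated assumptions: `warn_if_both_active` and its two boolean parameters are hypothetical, and this is not necessarily how the PR implements the check.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Set to `true` the first time the warning is emitted.
static WARNED: AtomicBool = AtomicBool::new(false);

/// Hypothetical helper: emits the warning at most once per process, and only
/// when both renderers are active. Returns `true` if it printed.
fn warn_if_both_active(wgpu_webgl_active: bool, native_webgl_active: bool) -> bool {
    if !(wgpu_webgl_active && native_webgl_active) {
        return false;
    }
    // `swap` returns the previous value, so only the first caller sees `false`.
    if WARNED.swap(true, Ordering::Relaxed) {
        return false;
    }
    eprintln!(
        "Both WebGL and wgpu with the \"webgl\" feature are enabled.\n\
         For optimal performance and binary size on web targets, \
         use only the dedicated WebGL renderer."
    );
    true
}
```

Calling this from both renderers' constructors would give the "once, on either renderer being instantiated" behaviour described above.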

### Screen recording

> [!NOTE]
> The screen recording below is slightly stale – I've since changed the
> background to be dark so the white text scene can be read.


![webgl_native](https://github.com/user-attachments/assets/c94fffe9-8249-4a0c-ab46-13cb16097dd2)

- Left: the `native_webgl` example (using native WebGL2).
- Right: the existing `webgl` example, which uses `wgpu` with the
  `webgl` feature flag.

### Test plan

To scope down this PR, there are no automated tests for the renderer
except for the single browser test introduced in the example. The shader
compilation has some unit tests.

This PR was manually tested via the new native webgl example: `cargo
run_wasm -p native_webgl`. This example can be tested against the
original `cargo run_wasm -p wgpu_webgl`.

### Risks

The only risk I'm uncertain about is the addition of the `wgpu` feature
flag, which is enabled by default. This could be a breaking change for
users who specify "no default features", since they would have to add the
`wgpu` feature explicitly. This seems minor.

### Followup work

This PR is huge because it implements all the existing vello_hybrid
features in the WebGL backend and also includes build-time shader
compilation. To keep this change from becoming completely impenetrable,
I'm splitting the test infrastructure into a separate change. This PR
must be manually tested in the interim.

The example has been added to CI so that it must compile and run.