[Vello Hybrid]: Clipping (Spatiotemporal Allocation) #957
Conversation
This commit has a sketch of spatio-temporal allocation for clipping, but it is not fully wired up yet. Scenes without clipping should work, but there is a fair amount of TODO remaining for clipping. There's a fair amount of refactoring here. The biggest change is that draw calls and render passes can be issued from inside the scheduler, as opposed to separate "prepare" and "render" calls. The number of render passes needed will vary by the scene.
@@ -67,6 +91,50 @@ fn clip_rectangle_with_star_evenodd(ctx: &mut impl Renderer) {
    ctx.pop_layer();
}

#[vello_test]
fn clip_deeply_nested_circles(ctx: &mut impl Renderer) {
I would like this test to exercise creating many rounds in the scheduler.
// TODO: We currently allocate a new strips buffer for each render pass. A more efficient
// approach would be to re-use buffers or slices of a larger buffer.
This is such a large refactor of vello hybrid that I wanted to keep optimisations like this out of this PR to limit added complexity.
let alphas_texture =
    Self::make_alphas_texture(device, max_texture_dimension_2d, alpha_texture_height);
let alpha_data = vec![0; (max_texture_dimension_2d * alpha_texture_height * 16) as usize];
const INITIAL_ALPHA_TEXTURE_HEIGHT: u32 = 1;
Non-actionable – for my own understanding, why is it safe to change `INITIAL_ALPHA_TEXTURE_HEIGHT` from 2 to 1? What does it impact?
This was motivated by wanting to keep initial allocations small. If the scene requires more alphas, the texture can be grown in `prepare` (where we re-allocate GPU resources to contain the scene we want to render).
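As a sketch of the sizing involved (the helper name is hypothetical; the factor of 16 mirrors the `* 16` in the buffer allocation quoted above, i.e. bytes of alpha data per texel):

```rust
/// Hypothetical helper: the smallest texture height whose rows can hold
/// `alpha_len` bytes of alpha data, given 16 bytes per texel and
/// `max_texture_dimension_2d` texels per row.
fn required_alpha_texture_height(alpha_len: usize, max_texture_dimension_2d: u32) -> u32 {
    let bytes_per_row = max_texture_dimension_2d as usize * 16;
    alpha_len.div_ceil(bytes_per_row).max(1) as u32
}

fn main() {
    // With a 1-texel initial height, a scene that needs more alpha data simply
    // grows the texture when GPU resources are re-allocated.
    assert_eq!(required_alpha_texture_height(0, 8192), 1);
    assert_eq!(required_alpha_texture_height(200_000, 8192), 2);
}
```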
});

let strip_shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
    label: Some("Strip Shader"),
Nit: may be nice if the label matches the file - e.g. `Render Strips Shader`. Similarly `strip_shader` -> `render_strips_shader`.
I didn't like the `render_*` prefix because `render_` could arguably be put in front of so many classes of things. I also didn't like renaming the shaders to `strips`. I'll sleep on it, but if anyone has a better name, please let me know 🙏. Maybe "fine" for fine rasterisation (but arguably that includes "clear_slots" too).
/// Errors that can occur during rendering.
#[derive(Error, Debug)]
pub enum RenderError {
    /// No slots available for rendering.
    ///
    /// This error is likely to occur if a scene has an extreme number of nested layers
    /// (clipping, blending, masks, or opacity layers).
    ///
    /// TODO: Consider supporting more than a single column of slots in slot textures.
    #[error("No slots available for rendering")]
    SlotsExhausted,
}
Are there other errors we will want to add? Should `unwrap` and `expect` calls actually return a `RenderError`?
I'm wary of making even more changes to `vello_hybrid` in this single PR, so I've left a TODO and will leave this as a separate exercise.
Added in 41f78ba
Sorry, I wasn't clear in the initial comment. I meant this as a philosophical question about the future: do you expect any fallible behaviors to be moved into this enum?
Yes. For example, I think when there are too many alphas to fit into the device's alpha texture, we should return an appropriate render error.
The question of returning a `RenderError` for unwraps and expects which we believe are safe is another matter entirely. Historically, if I believe I'm smarter than the compiler, I lean towards panicking, but I think this decision should be made by Linebender as a group. I'll ask about it in Utrecht.
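For illustration, the error surface might eventually grow along these lines; the `AlphaTextureOverflow` variant and the `report` function below are hypothetical, not part of this PR (this assumes `thiserror`, which the existing enum already derives from):

```rust
use thiserror::Error;

/// Sketch only: `SlotsExhausted` is the variant from this PR;
/// `AlphaTextureOverflow` is a hypothetical future addition.
#[derive(Error, Debug)]
pub enum RenderError {
    /// No slots available for rendering.
    #[error("No slots available for rendering")]
    SlotsExhausted,
    /// The scene needs more alpha values than the device's alpha texture can hold.
    #[error("Alpha texture capacity exceeded")]
    AlphaTextureOverflow,
}

/// How a caller might branch on the error instead of panicking.
pub fn report(result: Result<(), RenderError>) {
    match result {
        Ok(()) => {}
        Err(RenderError::SlotsExhausted) => eprintln!("too many nested layers"),
        Err(RenderError::AlphaTextureOverflow) => eprintln!("scene exceeds alpha texture capacity"),
    }
}
```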
I've unresolved this so that it gets greater visibility from other reviewers.
sample_count: 1,
dimension: wgpu::TextureDimension::D2,
// TODO: Is this correct or need it be RGBA8Unorm?
format: render_target_config.format,
Who sets the rendering target? It looks like it's set by the caller.
In our `render_to_file` example, we appear to pass `Rgba8Unorm` as the render target config format, whilst for the `winit` example we pass `Bgra8Unorm` as the surface format.
I guess another question is related to the texture definition in the shader. In WGSL, `clip_input_texture` is typed as `texture_2d<f32>`. Is it expected that the vello_hybrid renderer works with all WGSL texture formats? (Sorry for the naivety.)
I don't think this is a naive question. We only support `Rgba8Unorm` and `Bgra8Unorm` currently, I believe. I'll confirm this.
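If it helps make that assumption explicit, a guard along these lines could be added at renderer creation; the function name is hypothetical and the two formats are simply the ones mentioned above:

```rust
/// Hypothetical guard, not part of this PR: encodes the assumption that the
/// render target is an 8-bit unorm RGBA or BGRA format.
fn assert_supported_target_format(format: wgpu::TextureFormat) {
    assert!(
        matches!(
            format,
            wgpu::TextureFormat::Rgba8Unorm | wgpu::TextureFormat::Bgra8Unorm
        ),
        "unsupported render target format: {format:?}",
    );
}
```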
});
let clear_slot_indices_buffer = Self::make_clear_slot_indices_buffer(
    device,
    slot_count as u64 * size_of::<u32>() as u64,
The `u32` attribute used for the slot index that's passed to the clear-slots WGSL shader seems brittle in its definition when configuring the descriptors/pipeline.
A parallel is `GpuStrip`, which encapsulates the attribute passed to the `render_strips` shader. Is it worth using the newtype idiom to wrap this `u32` so that it's traceable to all locations that need to use the accurate size? For example this line of code, but also the `array_stride` in the `clear_pipeline`? It could also provide the attributes?
Maybe this isn't worth doing for two code locations.
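A minimal sketch of that newtype, with hypothetical names; the point is just that the buffer size, `array_stride`, and vertex attribute all derive from one definition:

```rust
/// Hypothetical newtype for the slot index attribute fed to the clear-slots shader.
#[repr(transparent)]
#[derive(Clone, Copy, Debug)]
pub struct ClearSlotIndex(pub u32);

impl ClearSlotIndex {
    /// Used both for sizing the buffer and as the vertex `array_stride`.
    pub const SIZE: u64 = std::mem::size_of::<u32>() as u64;
    /// The single `u32` vertex attribute consumed by the shader.
    pub const ATTRIBUTES: [wgpu::VertexAttribute; 1] = wgpu::vertex_attr_array![0 => Uint32];
}
```

The buffer sizing above would then read `slot_count as u64 * ClearSlotIndex::SIZE`, and the pipeline's `array_stride` could reference the same constant.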
if opacity.is_some() {
    unimplemented!()
}
I may figure this out as I keep reviewing - but I am naively confused how opacity is not implemented, or even how it's expected to be implemented, when there are tests like `clip_with_opacity`.
Is this assumption accurate? In the test `clip_with_opacity`, a clip layer is pushed, and then a rect is filled with alpha 0.5. Would the equivalent in the future be: push a clip layer with opacity set to 0.5, and then paint a fully opaque rect? Would the opacity in the clip layer transfer to the child drawn within it, so the two cases would render the same?
Also, I just realized I am a little confused about using the names alpha vs opacity. Are they the same thing, e.g. an alpha of 1 and an opacity of 1 are both opaque? Should the terms be consolidated?
Edit: yep, I confirmed opacity here behaves as I expect.
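For what it's worth, here is a tiny numeric check of that equivalence (src-over with premultiplied alpha; the values are illustrative and this only holds for a single, non-overlapping child inside the layer): filling with alpha 0.5 matches drawing an opaque color inside a layer whose 0.5 opacity is applied when the layer is composited.

```rust
/// Premultiplied source-over: out = src + dst * (1 - src.a)
fn src_over(dst: [f32; 4], src: [f32; 4]) -> [f32; 4] {
    let inv_a = 1.0 - src[3];
    [
        src[0] + dst[0] * inv_a,
        src[1] + dst[1] * inv_a,
        src[2] + dst[2] * inv_a,
        src[3] + dst[3] * inv_a,
    ]
}

fn main() {
    let dst = [0.0, 0.0, 0.0, 1.0]; // opaque black background

    // Case A: fill directly with alpha 0.5 (premultiplied blue).
    let half_alpha_blue = [0.0, 0.0, 0.5, 0.5];
    let a = src_over(dst, half_alpha_blue);

    // Case B: draw opaque blue into a layer, then apply layer opacity 0.5
    // when compositing the layer onto the background.
    let opaque_blue = [0.0_f32, 0.0, 1.0, 1.0];
    let layer_at_half_opacity = opaque_blue.map(|c| c * 0.5);
    let b = src_over(dst, layer_at_half_opacity);

    assert_eq!(a, b); // both yield [0.0, 0.0, 0.5, 1.0]
}
```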
clip,
BlendMode::new(Mix::Normal, Compose::SrcOver),
None,
I realize I don't actually fully understand the difference between clipping and masking.
Clarifying: a mask is essentially a pixmap (alpha or luminance), whilst a `clip_path` lets you clip via a Bézier path. It's not immediately intuitive that a `clip_path` is much more challenging than a mask that's "straightforward".
Clipping is basically a special case of an alpha mask, where everything inside of the clip shape is fully opaque in the alpha mask, and everything outside is fully transparent. So you could in theory emulate it with an alpha mask, but since clipping is such a common case we have custom logic that is more complex, but faster.
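Restated in code form (purely illustrative, not renderer API):

```rust
/// A general alpha mask multiplies per-pixel coverage by an arbitrary mask value.
fn apply_alpha_mask(coverage: f32, mask_alpha: f32) -> f32 {
    coverage * mask_alpha
}

/// A clip is the special case where the mask value is 1.0 inside the clip path
/// and 0.0 outside (with fractional values only along the anti-aliased edge).
fn clip_mask(inside_clip: bool) -> f32 {
    if inside_clip { 1.0 } else { 0.0 }
}

fn main() {
    assert_eq!(apply_alpha_mask(0.8, clip_mask(true)), 0.8); // kept
    assert_eq!(apply_alpha_mask(0.8, clip_mask(false)), 0.0); // clipped away
}
```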
for (i, color) in colors.iter().enumerate() {
    let clip_rect = Rect::new((i as f64) * OFFSET, 0.0, WIDTH, HEIGHT);
    ctx.push_clip_layer(&clip_rect.to_path(0.1));
Nit: Worth defining `0.1` as `DEFAULT_TOLERANCE` (similar to vello_cpu)?
Yes, definitely. But since that touches every file in `tests/*`, let's leave it for a separate PR.
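As a sketch of what that follow-up could look like (the constant name follows the suggestion above; `kurbo` is the geometry crate the tests already use):

```rust
use kurbo::{BezPath, Rect, Shape};

/// Suggested shared flattening tolerance for the sparse-strips tests.
const DEFAULT_TOLERANCE: f64 = 0.1;

fn clip_path_for(rect: Rect) -> BezPath {
    rect.to_path(DEFAULT_TOLERANCE)
}
```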
I have now technically read over all the code and it looks great! I will need to give the scheduler a second pass as there are still some mysteries in there.
Unfortunately, as someone relatively new to this repo, my comments have been focused on nits and syntax, and less on the overall design. However, the overall design seems to be working, per the tests.
Great work!
//! At this point, we can sample the pixels from slot N - 1 of texture 1 to draw the final
//! result.
//!
//! ## Nuances
I think this section is missing how slot depth maps to the slot textures. E.g. odd vs even slot depths.
Added notes in c1e9097
junk: &mut RendererJunk<'_>,
scene: &Scene,
) -> Result<(), RenderError> {
    let mut tile_state = mem::take(&mut self.tile_state);
I'm not quite following why we don't just create a brand new `tile_state` vec allocation here, vs storing it on `self`?
Naively, `mem::take` replaces `tile_state` with an empty vec; does that incur an allocation? Then at the end we re-set `tile_state` back on `self`.
This comment seems perf related so can be ignored.
Empty vectors in Rust do not allocate, so taking the memory and returning it later allows us to reuse the allocation without creating new ones.
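A minimal sketch of that reuse pattern, with illustrative names:

```rust
use std::mem;

struct SceneRenderer {
    // Persisted across frames purely to keep its capacity.
    tile_state: Vec<u32>,
}

impl SceneRenderer {
    fn render(&mut self) {
        // `mem::take` swaps in `Vec::new()`, which does not allocate.
        let mut tile_state = mem::take(&mut self.tile_state);
        tile_state.clear();
        tile_state.extend(0..16); // stand-in for the real per-frame work

        // Putting the vector back preserves its capacity, so steady-state
        // frames perform no new allocations here.
        self.tile_state = tile_state;
    }
}
```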
let clip_x = u32(in.position.x) & 0xFFu;
let clip_y = (u32(in.position.y) & 3) + in.rgba_or_slot * config.strip_height;
It is not clear to me how `in.position.y` has been modified in this PR. From the scheduler, it looks like it also contains the `slot_ix * Tile::HEIGHT`, or slot y position. So why the `& 3`?
I think the answer to this question should also potentially result in a code comment in the wgsl shader.
`in.position` represents the position builtin. In a vertex shader, it ranges from -1 to 1 for X and Y. In the context of a fragment shader, it represents the pixel coordinate of where we are drawing (see this article). The `& 3` is used to constrain the pixel coordinate to 4, since that's the height of our tile. I added a `CAUTION:` note to `config.strip_height` about the danger of changing its value without updating this logic.
In time, we will want to make this configurable, but I'm not sure how that will present. We could make the `& 3` configurable, but then we should also make the `& 0xFFu` configurable to the wide tile width. I think at this stage we should untangle those concerns when we get to them.
Oh whoops, I got my x and y's confused! Thanks for the great answer :D
    });
}
Cmd::PushBuf => {
    let ix = clip_depth % 2;
There are 3 draws in a round representing slot 0, slot 1, and the final texture. In `draw_mut`, the draw slot corresponds to `1 - clip_depth % 2`.
Should these be aligned such that this is `1 - clip_depth % 2`? I think all that should do is ensure the slot 0 free vec and slot 0 draw are on the same index, and it shouldn't change logic.
Edit: I think it's all a bit more complex. The choice between the two is intentional and changes in multiple places.
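For reference, a tiny standalone illustration of the two parities mentioned here (`clip_depth % 2` in `Cmd::PushBuf` versus `1 - clip_depth % 2` in `draw_mut`). What each parity maps to is the scheduler's convention; this only shows that the two expressions always select opposite indices, and that each flips as the clip depth increases by one.

```rust
fn push_buf_index(clip_depth: usize) -> usize {
    clip_depth % 2
}

fn draw_index(clip_depth: usize) -> usize {
    1 - clip_depth % 2
}

fn main() {
    for clip_depth in 0..6 {
        // The two expressions are always complementary...
        assert_eq!(push_buf_index(clip_depth) + draw_index(clip_depth), 1);
        // ...and each alternates between 0 and 1 as the depth increases.
        assert_ne!(push_buf_index(clip_depth), push_buf_index(clip_depth + 1));
    }
}
```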
Great work!
Just rubber stamping since we discussed in office hours that this is fine to merge, and Andrew doesn't seem to have write permissions yet.
debug_assert!(
    has_non_zero_alpha(rgba),
    "Color fields with 0 alpha are reserved for clipping"
);
Raised in Zulip: #vello > Vello Hybrid crashing when alpha = 0.
What is the recommendation if someone wants to pass a fully transparent fill? Additionally, is there a use case for painting something fully transparent?
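For context on why a zero alpha byte is reserved: the packed `rgba_or_slot` field is disambiguated by its alpha byte in the strip shader, roughly as below (a Rust restatement; the enum and function names are illustrative, not actual renderer types), so a zero alpha byte cannot also mean a transparent color.

```rust
/// Illustrative decoding of the packed per-strip field.
enum StripPaint {
    /// Packed RGBA color; the alpha byte (bits 24..32) must be non-zero.
    Color(u32),
    /// Index of the slot in the clip texture to sample from.
    Slot(u32),
}

fn decode(rgba_or_slot: u32) -> StripPaint {
    if rgba_or_slot >> 24 != 0 {
        StripPaint::Color(rgba_or_slot)
    } else {
        StripPaint::Slot(rgba_or_slot)
    }
}

fn main() {
    assert!(matches!(decode(0xFF00_00FF), StripPaint::Color(_))); // opaque color
    assert!(matches!(decode(0x0000_0005), StripPaint::Slot(5))); // slot index 5
}
```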
Referenced by #1011, which adds a native WebGL backend to `vello_hybrid` (for `wasm32` with the `"webgl"` feature) and incorporates the clipping changes from this PR.
Context
Hooks up and addresses all the TODOs in Raph's sketch (#934). See the Zulip thread: #vello > Spatiotemporal allocation (hybrid).
Notes
- `GpuResources` is now non-optional - I thought this was more trouble than it was worth.
- A `clear_slots` pipeline to support fine grained clearing of slots in slot textures (needed for spatiotemporal allocation).
- `vello_hybrid` … (`write_buffer`)

I've copied the documentation from `schedule.rs` below for information about how spatiotemporal allocation works:

Scheduling
When we run out of free slots, we flush rounds (i.e. execute render passes) to free up space. Note that a slot refers to one wide tile's worth of pixels in the clip texture. The `free` vector contains the indices of the slots that are available for use in the two clip textures.
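A minimal sketch of that bookkeeping, with illustrative names: two slot textures, each with a free list of slot indices, where an empty free list means a round must be flushed before another slot can be claimed.

```rust
struct SlotAllocator {
    /// Free slot indices for slot textures 0 and 1.
    free: [Vec<usize>; 2],
}

impl SlotAllocator {
    fn new(slots_per_texture: usize) -> Self {
        Self {
            free: [
                (0..slots_per_texture).collect(),
                (0..slots_per_texture).collect(),
            ],
        }
    }

    /// Claim a slot in the given texture, or `None` if a flush is needed first.
    fn claim(&mut self, texture_ix: usize) -> Option<usize> {
        self.free[texture_ix].pop()
    }

    /// Once a round's render pass has consumed a slot, it can be reused.
    fn release(&mut self, texture_ix: usize, slot: usize) {
        self.free[texture_ix].push(slot);
    }
}

fn main() {
    let mut alloc = SlotAllocator::new(4);
    let slot = alloc.claim(0).expect("a slot is free");
    // ... record draws targeting `slot` of texture 0 ...
    alloc.release(0, slot);
}
```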
Example

Consider the following scene of drawing a single wide tile with three overlapping rectangles with
decreasing width clipping regions.
This single wide tile scene should produce the below rendering:
How the scene is scheduled into rounds and draw calls are shown below:
Round 0
In this round, we don't have any preserved slots or slots that we need to sample from. Simply,
draw unclipped primitives.
Draw to texture 0:
In Slot N - 1 of texture 0, draw the unclipped green rectangle.
Slot N - 1:
Draw to texture 1:
In Slot N - 2 of texture 1, draw the unclipped red rectangle and, in slot N - 1, draw the unclipped blue rectangle.
Slot N - 2:
Slot N - 1:
Round 1
At this point, we have three slots that contain our unclipped rectangles. In this round,
we start to sample those pixels to apply clipping (texture 1 samples from texture 0 and
the render target view samples from texture 1).
Draw to texture 0:
Slot N - 1 of texture 0 contains our unclipped green rectangle. In this draw, we sample
the pixels from slot N - 2 of texture 1 to draw the blue rectangle into this slot.
Slot N - 1:
Draw to texture 1:
Then, into Slot N - 2 of texture 1, which contains our red rectangle, we sample the pixels
from slot N - 1 of texture 0 which contain our green and blue rectangles.
Draw to render target
At this point, we can sample the pixels from slot N - 1 of texture 1 to draw the final
rendition.
Nuances

- The example above does not cover how draws at shallower clip depths are scheduled. At clip depths 1 and 2, we can draw directly to the final target within a single round.
- When clearing slots, if the whole slot texture can be cleared we can use a `LoadOp::Clear` operation. Otherwise, we need to clear the dirty slots using a fine grained render pass.
For more information about this algorithm, see this Zulip thread.