[RFC] sub/sd_sbr: use instanced rendering for significant performance improvements #17187

afishhh · 2025-12-27T02:23:38Z

This API is still a work-in-progress and requires afishhh/subrandr#125.
I can move the "split packer out from ass_mp" commit into a separate PR if desired.

What the API looks like and why

The API I settled on here is not the one I brought to IRC last time. It is more similar to what libass does, since I have accepted that blending on the CPU is hopeless (way too slow, as kasper93 pointed out on IRC; I still tried to make it faster, but CPUs are just not meant for this).

Thus this API assumes the user has access to a faster way of compositing bitmaps with bilinear interpolation.
Bilinear interpolation is assumed because GPUs have it for free in hardware and it allows the implementation to do tricks like drawing axis-and-pixel-aligned rectangles by drawing an interpolated single-pixel bitmap. This saves significant atlas space and CPU work if there are many backgrounds; see the Japanese subtitles of https://www.youtube.com/watch?v=ksdvNgqOToQ for a case where storing all backgrounds as bitmaps actually significantly impacts atlas size.

Bitmaps are also de-duplicated and output as "instances" of "images", with every instance referencing a single image. This again saves a lot of atlas space and CPU work in the (not uncommon) case where there are many instances of the same glyph in the frame (accounting for subpixel positioning and the like, of course). Correct me if I'm wrong, but it doesn't seem like libass does this? Maybe ASS subbers just don't repeat the same things 100 times on the same frame.

Also, since subrandr wants to be able to draw real bitmaps like emojis, it has to use the BGRA8 output format instead of the A8 that libass uses. This means that each color/alpha variant of a bitmap has to be stored separately, although this does not appear to be a huge problem in practice.
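To make the cost concrete: an A8 coverage mask is color-independent, so one mask can be tinted at composite time, while a BGRA8 bitmap has its color baked in. The sketch below bakes a color into a mask. The `bgra8` struct, its channel order, and the use of premultiplied alpha are all assumptions made for illustration; the real `sbr_bgra8` layout and alpha convention are not shown in this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pixel type for illustration; not the real sbr_bgra8. */
typedef struct { uint8_t b, g, r, a; } bgra8;

/* Multiply two 8-bit values treated as fractions of 255, with rounding. */
static uint8_t mul_u8(uint8_t x, uint8_t y) {
    return (uint8_t)((x * y + 127) / 255);
}

/* Bake `color` into an A8 coverage mask, producing premultiplied BGRA8
 * (premultiplication is an assumption of this sketch). Every distinct
 * `color` yields a distinct output bitmap, which is why each color/alpha
 * variant needs its own atlas entry. */
static void bake_a8_to_bgra8(const uint8_t *mask, size_t count,
                             bgra8 color, bgra8 *out) {
    assert(mask != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        uint8_t a = mul_u8(mask[i], color.a);
        out[i].b = mul_u8(color.b, a);
        out[i].g = mul_u8(color.g, a);
        out[i].r = mul_u8(color.r, a);
        out[i].a = a;
    }
}
```

Baking the same mask with two colors gives two bitmaps that differ in every covered pixel, so they cannot share an image the way two placements of one A8 glyph could.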

Currently the (simplified) API looks like this:

// size is not part of public API yada yada
typedef struct sbr_output_image {
  uint32_t width, height;
  // This field is always NULL when returned by subrandr and isn't
  // read or modified by the library.
  // Can be used to associate custom data with images to simplify packing.
  void *user_data;
} sbr_output_image;

// size is not part of public API yada yada
typedef struct sbr_output_instance {
  int32_t x, y;
  uint32_t width, height;
  struct sbr_output_image *base;
  struct sbr_output_instance *next;
} sbr_output_instance;

// TODO: This name is a bit long. If anyone has better ideas, please share.
typedef struct sbr_instanced_raster_pass sbr_instanced_raster_pass;

sbr_instanced_raster_pass *
sbr_renderer_render_instanced(sbr_renderer *, sbr_subtitle_context const *,
                              uint32_t t, uint64_t flags);

int sbr_output_image_draw_to(sbr_output_image const *, sbr_instanced_raster_pass *,
                             int32_t off_x, int32_t off_y, sbr_bgra8 *buffer,
                             uint32_t width, uint32_t height,
                             uint32_t stride);

To explain some potentially non-obvious design decisions:

Even though they are called "images", sbr_output_images may not hold a complete output image internally, which is why they are only exposed through a "draw into this buffer" function. This is used for non-pixel-aligned rectangles like underlines or strike-throughs, which are drawn anti-aliased on the CPU because instances need integer output dimensions.
sbr_output_images hold an additional "user data" pointer so that users like mpv can associate data with images in O(1) time without the complexity of their own hash map. In this PR it is used to associate a sub_bitmap (by index) with each sbr_output_image; this sub_bitmap is then accessed when constructing the real instanced output after packing.
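The user data pattern can be sketched roughly as follows. This is a standalone illustration, not the code in this PR: the two structs are copied from the simplified API above so that the snippet compiles on its own, `assign_image_slots` and `image_slot` are hypothetical helpers, and the way the instance list head is obtained from the pass is not shown here, so the sketch simply takes it as a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Struct layouts copied from the simplified API above, for illustration. */
typedef struct sbr_output_image {
    uint32_t width, height;
    void *user_data;
} sbr_output_image;

typedef struct sbr_output_instance {
    int32_t x, y;
    uint32_t width, height;
    struct sbr_output_image *base;
    struct sbr_output_instance *next;
} sbr_output_instance;

/* Hypothetical helper: walk the instance list and give every distinct
 * image a 0-based slot. The slot is stored in user_data as slot + 1 so
 * that the initial NULL still means "not seen yet". Returns the number
 * of distinct images, i.e. how many bitmaps need to be packed. */
static size_t assign_image_slots(sbr_output_instance *head) {
    size_t count = 0;
    for (sbr_output_instance *it = head; it != NULL; it = it->next) {
        if (it->base->user_data == NULL)
            it->base->user_data = (void *)(uintptr_t)(++count);
    }
    return count;
}

/* Recover the slot assigned by assign_image_slots in O(1). */
static size_t image_slot(const sbr_output_image *img) {
    assert(img->user_data != NULL);
    return (size_t)(uintptr_t)img->user_data - 1;
}
```

After packing, a second walk over the instances can look up each instance's atlas entry with `image_slot(it->base)` and emit one output rectangle per instance, without any hash map on the caller's side.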

Benchmark (singular)

Before	After	Video
		VS5SaZAxH7A

Rasterization is no longer a 100ms bottleneck, and my terrible font matching code in layout is probably more noticeable here.
Sometimes the rasterization stage still takes 30ms for no apparent reason, though I am tempted to just blame this on scheduling and live happily.

TODO

Remove subrandr includes I accidentally left in the 1st commit (initially it was after the 2nd and I missed this while rebasing)
Write description for 2nd commit
Do more testing
Finalize API design and release v1.1

afishhh added 2 commits December 27, 2025 01:52

sub/ass_mp: split out packer into a separate file

cbd3d1b

sub/sd_sbr: use instanced rendering to avoid blending in software

02bbbd3

afishhh mentioned this pull request Dec 27, 2025

RGBA blending (software fallback?) seems to produce wrong results sometimes #17188

Closed

6 tasks
