@appgurueu
Contributor

@appgurueu appgurueu commented Dec 6, 2025

This moves skinning (transformation of vertices by bone transforms) from the CPU ("software") to the GPU ("hardware") on the OpenGL 3 driver in order to improve performance. This is done by moving weights to vertex buffers and uploading them as vec4 vertex attributes (weights + joint IDs). Matrices are uploaded as a UBO.
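For reference, the per-vertex work that moves to the vertex shader can be sketched on the CPU as follows. This is a minimal sketch of linear blend skinning, assuming four influences per vertex; `Vec3`, `Mat4`, `SkinAttribs` and `skinPosition` are hypothetical names for illustration, not the engine's actual types or functions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal types; the engine's real vertex and matrix classes differ.
struct Vec3 { float x, y, z; };
using Mat4 = std::array<float, 16>; // column-major, as OpenGL expects

// Per-vertex skinning data, mirroring two vec4 attributes:
// four joint IDs and the four matching weights.
struct SkinAttribs {
    std::array<unsigned, 4> joints;
    std::array<float, 4> weights;
};

// Transform a point (w = 1) by a column-major 4x4 matrix.
inline Vec3 transformPoint(const Mat4 &m, const Vec3 &p) {
    return {
        m[0] * p.x + m[4] * p.y + m[8] * p.z + m[12],
        m[1] * p.x + m[5] * p.y + m[9] * p.z + m[13],
        m[2] * p.x + m[6] * p.y + m[10] * p.z + m[14],
    };
}

// Linear blend skinning: weighted sum of the joint-transformed positions.
// jointMatrices stands in for the matrix array uploaded as a UBO.
inline Vec3 skinPosition(const Vec3 &pos, const SkinAttribs &a,
        const std::vector<Mat4> &jointMatrices) {
    Vec3 out{0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < 4; ++i) {
        if (a.weights[i] == 0.0f)
            continue; // unused influence slot
        assert(a.joints[i] < jointMatrices.size());
        const Vec3 t = transformPoint(jointMatrices[a.joints[i]], pos);
        out.x += a.weights[i] * t.x;
        out.y += a.weights[i] * t.y;
        out.z += a.weights[i] * t.z;
    }
    return out;
}
```

With hardware skinning, the loop body runs once per vertex on the GPU, and the CPU only has to produce the joint matrices.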

There is some refactoring here that could maybe be split off into a separate PR.

The gains are unfortunately limited a bit by the computation of the joint transforms that still takes place on the CPU. This can be optimized further (and is already optimized a bit here, e.g. by eliminating some useless conversions and optimizing the Transform -> matrix conversion), but that should probably be done in a follow-up PR. (In particular, this can in principle be parallelized well, and we might not want to spread our bones on the heap. Matrix multiplication can maybe also be optimized further using SIMD.)
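To illustrate the kind of Transform -> matrix conversion mentioned above, here is a minimal sketch that assumes a transform is stored as translation, unit-quaternion rotation and scale. The `Transform`, `Quat`, `Vec3` and `toMatrix` names are hypothetical; the conversion in this PR may be laid out or optimized differently.

```cpp
#include <array>
#include <cassert>

// Hypothetical types; assumed layout, not the engine's real definitions.
struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; }; // assumed to be normalized
struct Transform {
    Vec3 translation;
    Quat rotation;
    Vec3 scale;
};
using Mat4 = std::array<float, 16>; // column-major

// Build M = T * R * S directly, without forming three matrices
// and multiplying them.
inline Mat4 toMatrix(const Transform &t) {
    const Quat &q = t.rotation;
    const Vec3 &s = t.scale;
    const float xx = q.x * q.x, yy = q.y * q.y, zz = q.z * q.z;
    const float xy = q.x * q.y, xz = q.x * q.z, yz = q.y * q.z;
    const float wx = q.w * q.x, wy = q.w * q.y, wz = q.w * q.z;
    return Mat4{
        // column 0: rotated X axis, scaled by s.x
        (1.0f - 2.0f * (yy + zz)) * s.x, 2.0f * (xy + wz) * s.x,
        2.0f * (xz - wy) * s.x, 0.0f,
        // column 1: rotated Y axis, scaled by s.y
        2.0f * (xy - wz) * s.y, (1.0f - 2.0f * (xx + zz)) * s.y,
        2.0f * (yz + wx) * s.y, 0.0f,
        // column 2: rotated Z axis, scaled by s.z
        2.0f * (xz + wy) * s.z, 2.0f * (yz - wx) * s.z,
        (1.0f - 2.0f * (xx + yy)) * s.z, 0.0f,
        // column 3: translation
        t.translation.x, t.translation.y, t.translation.z, 1.0f,
    };
}
```

Writing the twelve non-trivial entries out directly avoids two general 4x4 multiplications per joint, which is one plausible source of the savings described.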

Closes #9218.

To do

This PR is Ready for Review.

How to test

Use the /spider_army command (included in the devtest gltf mod) to spawn 10³ spiders in the air. Observe them and note the FPS. You should see something like a ~2x increase vs master on the OGL 3 driver.

Also make sure that the fallback to SW skinning works as expected: try the opengl driver, and also edit getMaxJointTransforms() in the OGL 3 driver to return 0.

@LizzyFleckenstein03
Contributor

performance improvement on my machine:
using AMD Radeon 780M integrated gpu: ~20fps before this change, ~50fps after
using AMD Radeon RX 7700S dedicated gpu: ~20fps before this change, ~80fps after
(cpu used: AMD Ryzen 9 7940HS (16) @ 5.263GHz)

great work! 👍

@lhofhansl
Contributor

Seems to work. Some indication in the profiler would be nice - number of vertex attributes uploaded (or whatever makes sense).

} else {
assert(b->IndexBuffer);
if (b->ChangedID != b->IndexBuffer->getChangedID() || !b->vbo_ID) {
} else if (auto *ib = dynamic_cast<const scene::IIndexBuffer *>(b->Buffer)) {
Member

I'm not so happy about dynamic_cast in hot code paths. Is this a good idea?

Contributor Author

maybe not. it does seem like this might have some unwanted cost if there are very many small buffers.

i've replaced it with a virtual function call to get the buffer type and a switch.
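Roughly, the pattern looks like the sketch below. All names here (`BufferType`, `IBuffer`, `bindingTarget`, `indexCountOf`) are hypothetical and only illustrate the idea of a virtual type tag plus a switch; they are not the classes or functions from this PR.

```cpp
#include <cassert>

// Hypothetical names throughout; the engine's real buffer classes differ.
enum class BufferType { Vertex, Index };

// Numeric values match GL_ARRAY_BUFFER and GL_ELEMENT_ARRAY_BUFFER.
constexpr unsigned kArrayBuffer = 0x8892;
constexpr unsigned kElementArrayBuffer = 0x8893;

struct IBuffer {
    virtual ~IBuffer() = default;
    // A single virtual call replaces a chain of dynamic_cast attempts.
    virtual BufferType getBufferType() const = 0;
};

struct VertexBuffer : IBuffer {
    BufferType getBufferType() const override { return BufferType::Vertex; }
};

struct IndexBuffer : IBuffer {
    unsigned indexCount = 0;
    BufferType getBufferType() const override { return BufferType::Index; }
};

// Pick the binding target by switching on the type tag.
inline unsigned bindingTarget(const IBuffer &buf) {
    switch (buf.getBufferType()) {
    case BufferType::Vertex:
        return kArrayBuffer;
    case BufferType::Index:
        return kElementArrayBuffer;
    }
    assert(false && "unhandled buffer type");
    return 0;
}

// Once the tag has been checked, static_cast is enough to reach
// type-specific members; no RTTI lookup is involved.
inline unsigned indexCountOf(const IBuffer &buf) {
    if (buf.getBufferType() != BufferType::Index)
        return 0;
    return static_cast<const IndexBuffer &>(buf).indexCount;
}
```

The trade-off is that every new buffer type needs a new enum value and a new case, but the per-buffer cost is one virtual call regardless of how many types exist.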

@appgurueu appgurueu added the Action / change needed Code still needs changes (PR) / more information requested (Issues) label Dec 7, 2025
@appgurueu appgurueu removed the Action / change needed Code still needs changes (PR) / more information requested (Issues) label Dec 10, 2025
@appgurueu appgurueu mentioned this pull request Dec 13, 2025
18 tasks
@lhofhansl
Contributor

I have had this applied for the past few days... I have not observed anything weird or unexpected.

@lhofhansl
Contributor

What's in the way of this?
I'll admit a detailed review is hard to do. I skimmed the code and nothing stood out as bad/wrong. I have not seen any issues with this.

@sfan5
Member

sfan5 commented Dec 24, 2025

@appgurueu did you check if this works with dynamic shadows?
I believe the "let's just replace the shader material" hack in the code will break this.

@appgurueu
Contributor Author

Good point. I tested it.

Shadows do break with this, as the shader used for rendering won't do the skinning, so shadows are cast for the static poses.

It's relatively easy to do a simple fix by also applying the skinning logic in the appropriate shadow shader. That's what I've done for now.

There is a good chance that accessing disabled vertex attributes and running a simple check on them is cheap enough that the overhead doesn't matter, but maybe I should make sure that scene nodes that don't need skinning get a simple shader with zero overhead?

@kromka-chleba
Contributor

What is the max number of bone influences per vertex with this PR?
Modern game engines support either 4 or 8 bones per vertex. 4 is usually for mobile devices while 8 can allow more complex rigs with corrective bones. Godot 4 supports up to 8 bones per vertex.

@appgurueu
Contributor Author

The limit is at 4 per vertex, introduced by #16655
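For illustration, one common way to fit a vertex with more influences into four slots is to keep the four strongest and renormalize. The sketch below assumes that approach; the thread does not say how #16655 handles this, and `Influence`, `PackedInfluences` and `packInfluences` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only.
struct Influence {
    unsigned joint;
    float weight;
};

// Four joint IDs plus four weights, i.e. what fits into two vec4 attributes.
struct PackedInfluences {
    std::array<unsigned, 4> joints{};
    std::array<float, 4> weights{};
};

// Keep the four strongest influences and rescale them to sum to 1.
// Unused slots stay at joint 0 with weight 0.
inline PackedInfluences packInfluences(std::vector<Influence> in) {
    std::sort(in.begin(), in.end(),
            [](const Influence &a, const Influence &b) {
                return a.weight > b.weight; // strongest first
            });
    const std::size_t n = std::min<std::size_t>(in.size(), 4);

    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        sum += in[i].weight;

    PackedInfluences out;
    if (sum <= 0.0f)
        return out; // no usable influences: vertex stays unskinned

    for (std::size_t i = 0; i < n; ++i) {
        out.joints[i] = in[i].joint;
        out.weights[i] = in[i].weight / sum;
    }
    return out;
}
```

Dropping the weakest influences changes the deformation slightly, which is why rigs that rely on many corrective bones may look different under a four-influence limit.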

Development

Successfully merging this pull request may close these issues.

Use hardware skinning esp. for animated meshes

5 participants