Optimizations to OpenPBR graph #2459
This changelist implements two optimizations to the graph for OpenPBR Surface, replacing `mix` operations on leaf BSDFs with pre-multiplied `add` operations. Pre-multiplied `add` operations take better advantage of dynamic branching in hardware shading languages, and should have a neutral or positive impact on software shading languages.

Performance tests were conducted on an NVIDIA RTX A6000 at 4K resolution, and the following timing improvements were seen:

OpenPBR Carpaint: 16ms -> 7ms
OpenPBR Glass: 27ms -> 11ms
OpenPBR Pearl: 16ms -> 12ms
OpenPBR Aluminum: 14ms -> 5ms
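For a relative view of the timings quoted above, the short sketch below converts each before/after pair into a speedup factor and a percentage of frame time saved. The frame times are copied from the description; the `speedup` helper and the dictionary layout are illustrative names only.

```python
# Frame times in milliseconds as (before, after), copied from the PR description.
timings_ms = {
    "OpenPBR Carpaint": (16, 7),
    "OpenPBR Glass": (27, 11),
    "OpenPBR Pearl": (16, 12),
    "OpenPBR Aluminum": (14, 5),
}


def speedup(before_ms, after_ms):
    """Return how many times faster the 'after' frame time is."""
    return before_ms / after_ms


for name, (before, after) in timings_ms.items():
    saved_percent = 100.0 * (before - after) / before
    print(f"{name}: {speedup(before, after):.2f}x faster, "
          f"{saved_percent:.0f}% less frame time")
```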
|
This looks pretty great! Would you know if this performance gain is consistently positive (even if the numbers vary) across the other backends? |
|
@dgovil I had a chance to test performance in the MaterialX Web Viewer, which uses a slightly different hardware shading language, and the improvements were consistent there. My expectation is that any hardware shading language with support for dynamic branches should see a similar performance improvement, and I believe MSL is in that category, though it's certainly worthwhile to confirm this. |
kwokcb
left a comment
This looks like a good optimization especially as it's very wasteful to compute both sides of the mix() when not required due to pre-conditional branching.
The GPU should be highly optimized to perform a multiply-add (`*` followed by `+`) combination. I'm not sure, but it may be faster to have a single node that performs this scale+offset function; that's maybe something to think about afterwards. It depends on how code generation optimizes this, but I'm guessing it's all inlined into single-line output.
Merged commit 9754e6a into AcademySoftwareFoundation:main
|
I just got a chance to test this, and can confirm that MSL also sees a similar level of performance improvement. We certainly want to find the right way to leverage this significant discovery.

I do think that we should not forget that this also changes the MDL and OSL implementations. Has someone profiled the changes in a non-HW shading language?

I would also note that I think this change makes the intention in the material design of the nodegraph less clear. I think it would be interesting to explore ideas where the HW shading languages achieve this significant performance improvement through an optimization at shader generation time, or perhaps through an implementation that is specific to those languages, instead of obfuscating the nodegraph that is used by all languages.

Was this change reviewed by the OpenPBR stakeholders? The MaterialX nodegraph implementation is often referred to by that group as the reference implementation, and I feel that this change makes the design of OpenPBR harder to understand if referring to this nodegraph.
|
Thanks for pointing this out @ld-kerley. It got merged quickly (2 days!) so I didn't notice it in time.

I had a brief look, and actually I don't understand how this works (most likely due to not studying it for long enough). It replaces the mix operation between the thin-film and no-thin-film cases with the addition of two BSDFs with altered weights. The inverted thin-film weight is multiplied into one of these weights. But this doesn't seem to compute the same thing; for example, if the thin-film weight is zero, it seems it would produce a NaN. Maybe there's an error, or possibly I'm not understanding the logic (because the XML is a pain to reverse engineer.. 🤷♂️).

I also agree with Lee that this further obfuscates the logic, since it's no longer clear that the thin-film weight is functioning as a mix weight (assuming the math is correct or can be fixed up). Though I think the graph was already pretty hard to understand, so it doesn't really impact it much. I don't feel that the XML, or the big cloud of nodes it generates in a node editor, functions very well as a reference implementation currently, since it's hard to read and also doesn't give the full details, as much of the logic is hidden in the details of MaterialX code generation anyway. It's also not really as faithful an implementation as one could write from scratch, since the nodes don't give full control over the internal computation.

So anyway, this change, assuming it is correct, does not make it much worse than it was in terms of being a reference. But yeah, agreed that ideally we would not be compromising the clarity of the graph logic for a low-level optimization in one particular back-end.
|
@portsmouth This is certainly a shading model definition that would be more easily authored in ShadingLanguageX, and we're very interested in making that functionality available within MaterialX in the future. In terms of the math, though, I don't see any cases where a NaN can be produced, and indeed most of the material examples in the before-and-after renders above have a thin-film weight of zero. I've verified that no visual changes are produced in both GLSL and OSL, and we're working with Kai at NVIDIA to verify the MDL case. Can you clarify the situations in which you have concerns about the math differing and producing artifacts? |
|
Apologies, so I misunderstood that the

Then I see that your changes just amount to reimplementing the mix using

    w_tf = thin_film_weight
    w_s = specular_weight

    dielectric_reflection_blend = (1 - w_tf) * dielectric_bsdf(no_film) + w_tf * dielectric_bsdf(film)
                                = mix(dielectric_bsdf(no_film), dielectric_bsdf(film), w_tf)

    metal_bsdf_tf_blend = metal_bsdf + metal_bsdf_tf
                        = generalized_schlick_bsdf(w_s * (1 - w_tf), no_film) + generalized_schlick_bsdf(w_s * w_tf, film)
                        = w_s * mix(generalized_schlick_bsdf(no_film), generalized_schlick_bsdf(film), w_tf)

This is a bit harder to understand, but as noted it was already hard anyway. It seems somewhat surprising that the generated GLSL code performs better, but the timings show that it does. This seems like something that possibly could be handled in the code generation stage though (and if so, that would be better, as it would apply similar optimizations elsewhere).
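The identity above can be spot-checked numerically. The sketch below is not the MaterialX implementation: it assumes a toy scalar BSDF whose output scales linearly with its weight input, and `toy_bsdf`, `mix_form`, and `add_form` are hypothetical names chosen for illustration. Under that assumption the `mix` form and the pre-multiplied `add` form agree to rounding error, and a thin-film weight of zero yields the no-film result with no NaN.

```python
import random


def lerp(a, b, t):
    """GLSL-style mix(a, b, t) = a * (1 - t) + b * t."""
    return a * (1.0 - t) + b * t


def toy_bsdf(weight, response):
    """Stand-in BSDF (assumption): output scales linearly with its weight."""
    return weight * response


def mix_form(w_s, w_tf, no_film, film):
    """Original graph: blend two BSDFs that share the specular weight."""
    return lerp(toy_bsdf(w_s, no_film), toy_bsdf(w_s, film), w_tf)


def add_form(w_s, w_tf, no_film, film):
    """Optimized graph: fold the mix weight into each BSDF weight, then add."""
    return toy_bsdf(w_s * (1.0 - w_tf), no_film) + toy_bsdf(w_s * w_tf, film)


def max_difference(samples=1000, seed=0):
    """Largest absolute gap between the two forms over random inputs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        w_s, w_tf, no_film, film = (rng.random() for _ in range(4))
        gap = abs(mix_form(w_s, w_tf, no_film, film)
                  - add_form(w_s, w_tf, no_film, film))
        worst = max(worst, gap)
    return worst
```

The check says nothing about BSDFs whose output is not linear in the weight input; for those the two forms would need to be compared directly.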
|
@portsmouth The key reason that this is a performance win in hardware shading languages (and indeed in one CPU/GPU path tracer that I've worked on) is that a I really like the idea of implementing this optimization as a "refactor step" in shader generators, where it could apply to shading models beyond just OpenPBR Surface, but this actually turns out to be very challenging in practice. I went quite far down this road before implementing the current graph optimization, and it's challenging for the shader generator itself to detect how a specific Implementing this optimization manually has the advantage of allowing us to rigorously test the output, as I've done here, making sure that no edge cases will lead to a different look being generated. If you or @ld-kerley have ideas on implementing this optimization in the shader generator, I strongly encourage you to give this a try, and I'd love to be shown a way that this can be done robustly and efficiently. |
It does seem surprising if GLSL does not completely optimize away one of the terms in a genuine mix if the weight evaluates to zero (or one). I can see it might be a problem if the BSDFs are evaluated first, then the mix applied to the results though, which maybe is what the code generation does? |
|
@portsmouth In nearly all hardware shading languages, the logic for a complex function such as a BSDF can be skipped at runtime when a dynamic branch is present in the shading code, selecting between two code paths based on the result of a per-pixel or per-warp computation. In many public implementations of the MaterialX BSDFs, including those in our GLSL/MSL/ESSL library, the dynamic branch tests whether the value of The graph optimization in this PR just builds upon the common presence of these dynamic branches, making it easier for a renderer to skip code in cases that it could not previously detect. |
|
Right, I see: the BSDFs have an early out even for non-zero weights, as long as the weight is sufficiently small. So moving the mix weight into those weights allows that early out to happen. Perhaps instead, if the mix function itself was made to early out as well, that would work without the refactor? i.e. wrap every different

    vec3 metal_tf_mix(in float w, ...) // ... is metal BSDF params
    {
        if (w < eps) return metal_bsdf_nofilm(...);
        if (w > 1.0 - eps) return metal_bsdf_withfilm(...);
        // GLSL mix(x, y, a) takes the blend weight as its last argument.
        return mix(metal_bsdf_nofilm(...), metal_bsdf_withfilm(...), w);
    }
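To make the intent of this early-out wrapper concrete, here is a CPU-side sketch that counts how often each side of the mix is evaluated. The names (`mix_with_early_out`, `CountingBsdf`, `EPS`) and the threshold value are hypothetical, and the BSDFs are reduced to constant scalars. It only illustrates the intended semantics; whether a GPU shader compiler turns such branches into skipped work is a separate question that this sketch cannot answer.

```python
EPS = 1e-5  # hypothetical early-out threshold


class CountingBsdf:
    """Toy BSDF that returns a fixed value and counts its evaluations."""

    def __init__(self, value):
        self.value = value
        self.calls = 0

    def __call__(self):
        self.calls += 1
        return self.value


def mix_with_early_out(w, eval_no_film, eval_film, eps=EPS):
    """Blend two lazily evaluated BSDFs, skipping a side with ~zero weight."""
    if w < eps:
        return eval_no_film()      # film side is never evaluated
    if w > 1.0 - eps:
        return eval_film()         # no-film side is never evaluated
    a = eval_no_film()
    b = eval_film()
    return a * (1.0 - w) + b * w   # same as GLSL mix(a, b, w)


no_film = CountingBsdf(0.2)
film = CountingBsdf(0.8)
result_zero = mix_with_early_out(0.0, no_film, film)
print(result_zero, no_film.calls, film.calls)
```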
|
We're thinking along similar lines, @portsmouth, and I tested that exact approach early in the process of implementing this optimization, but unfortunately the GLSL shader compiler doesn't view these as valid dynamic branches when it generates the final GPU instructions, so it did not produce any performance improvement over the original shading code. |
|
The issue may be that the logic that the shader compiler would need to skip is external to the function with the dynamic branch when written in the way described above, while the existing |
|
Another approach might be to just apply your transformation (from mix to addition) in the generated code:

    vec3 metal_tf_mix(in float w, ...) // ... is metal BSDF params
    {
        return metal_bsdf_nofilm(1.0 - w, ...) + metal_bsdf_withfilm(w, ...);
    }

Then the early-out will happen inside the BSDFs if
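A minimal sketch of this add-based variant follows, assuming each BSDF performs its own early out when its weight falls below a small threshold, as described earlier in the thread. `ToyBsdf`, `WEIGHT_EPS`, and the scalar responses are hypothetical stand-ins, not code from the MaterialX libraries.

```python
WEIGHT_EPS = 1e-5  # hypothetical threshold for the BSDF's internal early out


class ToyBsdf:
    """Toy BSDF with an internal early out on its weight input."""

    def __init__(self, response):
        self.response = response
        self.expensive_evals = 0

    def evaluate(self, weight):
        if weight < WEIGHT_EPS:
            return 0.0                 # early out: skip the expensive path
        self.expensive_evals += 1      # stands in for the costly BSDF math
        return weight * self.response


def metal_tf_add(w, no_film, film):
    """Add form of the mix: the blend weight is folded into each BSDF."""
    return no_film.evaluate(1.0 - w) + film.evaluate(w)


no_film = ToyBsdf(0.2)
film = ToyBsdf(0.8)
without_film = metal_tf_add(0.0, no_film, film)  # only no_film does real work
with_film = metal_tf_add(1.0, no_film, film)     # only film does real work
print(without_film, with_film, no_film.expensive_evals, film.expensive_evals)
```

In this form the skipped work sits inside the function that contains the branch, which is the structural difference from a wrapper that branches around external BSDF calls.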
|
@portsmouth For reference, here is the concrete code that performs a I'd encourage other developers to continue experimenting with this idea, though, as if we could make it work robustly, it would allow a much wider set of shading models to benefit from this optimization. |
|
One key difference between the real-world implementation and your examples is that the input to a BSDF It's that complexity that we'd need to handle if we implemented this as a refactoring step in our shader generator, and the space of cases we encounter in MaterialX shading models is relatively large. As mentioned above, though, I think it's worthwhile for developers to experiment further along this path, as a working generator solution would provide a very powerful optimization, fully independent of the shading model that the artist is working with. |
|
If I'm understanding things correctly, I think a shader generation time optimization might actually be pretty easy to perform, but let me echo back what I'm understanding so far of why things are faster. The case we're trying to catch here is the

The benefit of taking the shader generation approach would be that, as well as the clarity arguments above, we would also be able to provide a shader generation option to disable the optimization if we found a render context where it wasn't optimal. It's possible that other render backends are able to optimize the

I'm a bit slammed right now, but I'd be happy to take a look at what a code generation stage optimization might look like. So perhaps it's a good idea to pause on the aggressive over-optimization of the node graphs, so we don't have too much work to roll back later.
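As a rough illustration of what a generation-time rewrite could look like, the sketch below transforms a toy expression tree, replacing `mix` nodes over BSDF leaves with `add` nodes whose leaf weights absorb the mix amount. Everything here is hypothetical: the `Bsdf`, `Mix`, and `Add` classes are not MaterialX API, weights are plain expression strings, and the rewrite is only valid under the assumption that each leaf BSDF scales linearly with its weight. Real graphs, where BSDF inputs are themselves computed by other nodes, are considerably harder, as noted earlier in the thread.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Bsdf:
    name: str
    weight: str = "1.0"   # weight expression, kept as a string


@dataclass(frozen=True)
class Mix:
    bg: "Node"
    fg: "Node"
    amount: str           # result = bg * (1 - amount) + fg * amount


@dataclass(frozen=True)
class Add:
    left: "Node"
    right: "Node"


Node = Union[Bsdf, Mix, Add]


def scale(node, factor):
    """Multiply every leaf BSDF weight under `node` by `factor`."""
    if isinstance(node, Bsdf):
        return Bsdf(node.name, f"({node.weight}) * ({factor})")
    if isinstance(node, Add):
        return Add(scale(node.left, factor), scale(node.right, factor))
    return scale(mix_to_add(node), factor)   # a Mix: rewrite it first


def mix_to_add(node):
    """Rewrite mix(bg, fg, amount) into add(bg * (1 - amount), fg * amount)."""
    if isinstance(node, Mix):
        return Add(scale(mix_to_add(node.bg), f"1.0 - ({node.amount})"),
                   scale(mix_to_add(node.fg), node.amount))
    if isinstance(node, Add):
        return Add(mix_to_add(node.left), mix_to_add(node.right))
    return node


graph = Mix(Bsdf("no_film", "w_s"), Bsdf("film", "w_s"), "w_tf")
rewritten = mix_to_add(graph)
print(rewritten)
```

Keeping the rewrite as a separate pass is what would allow a generator option to switch it off for backends where it does not help.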
|
We'd really appreciate the research and development work, if you have the time to look into this, and I've written up an initial call to action here: Note that, in some cases, the BSDF graph first needs to be refactored so that simple I wouldn't be overly concerned about the existing manual graph optimizations, which can trivially be converted back to |
* Integrate OpenPBR updates from MaterialX project This changelist integrates two post-1.1 updates to OpenPBR Surface from the MaterialX project: - Optimizations to OpenPBR graph (AcademySoftwareFoundation/MaterialX#2459) - Add code generation hints support (AcademySoftwareFoundation/MaterialX#1954) The more substantial update is the graph optimization, and I've copied the performance measurements from the original change for reference: Performance tests were conducted on an NVIDIA RTX A6000 at 4K resolution, and the following timing improvements were seen: OpenPBR Carpaint: 16ms -> 7ms OpenPBR Glass: 27ms -> 11ms OpenPBR Pearl: 16ms -> 12ms OpenPBR Aluminum: 14ms -> 5ms * Omit hardware shading optimizations
* Update OpenPBR default example (#216) This changelist updates the OpenPBR default example, matching its values to the latest default values of the shading model. * Change thin film IOR default (#211) From 1.5 to 1.4. As this won't make much difference to the look in implementations that ignore the adjacent IORs of the film. But for those that take it into account, this will make the film visible rather than invisible by default (since `specular_ior` is 1.5 by default, and `coat_ior` 1.6). * Add note about dark fuzz (#207) Addressing #176 * Enable Zeltner sheen (#217) This changelist enables Zeltner sheen in the reference implementation of OpenPBR, leveraging the new functionality in MaterialX 1.39. Additionally, the open_pbr_velvet.mtlx example has been updated to account for the visual differences between Conty-Kulla and Zeltner sheen. * Add a "resources" section to the front page (#215) With links to - MaterialX web viewer running OpenPBR default material - OpenPBR-viewer project and web app * Clarify formula for emission color (#209) Following the discussion of #85. * Update subsurface color types (#220) This changelist updates the types associated with physical color values for subsurface scattering in OpenPBR, aligning with the conclusions of recent threads on ASWF Slack channels. - Change `subsurface_radius_scale` from a `vector3` to a `color3` in the specification, aligning with the MaterialX implementation of OpenPBR. - Change the `radius` input of `subsurface_bsdf` from a `vector3` to a `color3` in the MaterialX implementation, aligning with the current definition of the `subsurface_bsdf` node in MaterialX 1.39. 
* Update specification and reference to v1.1 (#221) * Add "Flexibility of implementation" section (#248) * Add page to propose real-time approximations * Mention layering and mixing approximation * Mention specular reflection approximation * Mention anisotropic reflection approximations * Fix typos * Wording * Reword the section, move it to the main document, remove the annex * Subsurface in thin-walled mode, small clarification (#258) * Merge v1.1 development to main (#222) This changelist merges v1.1 development from dev_1.1 to main, in preparation for marking the release of OpenPBR v1.1. * Subsurface in thin-walled mode, small clarification * Subsurface in thin-walled mode, small clarification --------- Co-authored-by: Jonathan Stone <jstone@lucasfilm.com> * Allow emission_color components to exceed 1 (#260) * Add more material examples (#257) * Merge v1.1 development to main (#222) This changelist merges v1.1 development from dev_1.1 to main, in preparation for marking the release of OpenPBR v1.1.
* Adding more material examples - Bumped MaterialX version from 1.38 to 1.39 on existing examples - Added new examples * - Added MIT Black * Color updates: - Updated all colors to ACEScg - All metals now have F82 as specular_color - Added a few more metals from Portsmouth's chart * Material updates: - Added SSS to Sclera and made it less red - Added LCD Display material - Added two variations of Light Bulb with different CCT * Material updates: - Added base_diffuse_roughness to Brick, Charcoal, and Sand - Made Velvet purple so it's more convincing * Updated roughness values * Removed a few materials that were less useful as examples * Renamed Polyurethane * Updated Blood material * Added Abbe value to Blood material * Updated IOR of Blood * - Updated coffee material * Split Honey into two materials, liquid and crystallized * - Updated Honey (Crystallized) roughness value --------- Signed-off-by: Adrien Herubel <AdrienHerubel@users.noreply.github.com> Co-authored-by: Jonathan Stone <jstone@lucasfilm.com> Co-authored-by: Adrien Herubel <AdrienHerubel@users.noreply.github.com> * Integrate OpenPBR update from MaterialX project (#265) * Integrate OpenPBR updates from MaterialX project This changelist integrates two post-1.1 updates to OpenPBR Surface from the MaterialX project: - Optimizations to OpenPBR graph (AcademySoftwareFoundation/MaterialX#2459) - Add code generation hints support (AcademySoftwareFoundation/MaterialX#1954) The more substantial update is the graph optimization, and I've copied the performance measurements from the original change for reference: Performance tests were conducted on an NVIDIA RTX A6000 at 4K resolution, and the following timing improvements were seen: OpenPBR Carpaint: 16ms -> 7ms OpenPBR Glass: 27ms -> 11ms OpenPBR Pearl: 16ms -> 12ms OpenPBR Aluminum: 14ms -> 5ms * Omit hardware shading optimizations * Move anisotropy figure before Multiple Scattering section * Revert CHANGELOG.md to upstream/dev_1.2 (belongs in separate PR #295) 
--------- Signed-off-by: Adrien Herubel <AdrienHerubel@users.noreply.github.com> Co-authored-by: Jonathan Stone <jstone@lucasfilm.com> Co-authored-by: Julien Guertault <9511025+virtualzavie@users.noreply.github.com> Co-authored-by: Anton Palmqvist <13031779+AntonPalmqvist@users.noreply.github.com> Co-authored-by: Adrien Herubel <AdrienHerubel@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>







