Optimizations to OpenPBR graph #2459

Merged
jstone-lucasfilm merged 3 commits into AcademySoftwareFoundation:main from jstone-lucasfilm:main
Jun 26, 2025

@jstone-lucasfilm

Member

This changelist implements two optimizations to the graph for OpenPBR Surface, replacing mix operations on leaf BSDFs with pre-multiplied add operations. Pre-multiplied add operations take better advantage of dynamic branching in hardware shading languages, and should have a neutral or positive impact on software shading languages.

Performance tests were conducted on an NVIDIA RTX A6000 at 4K resolution, and the following timing improvements were seen:

OpenPBR Carpaint: 16ms -> 7ms
OpenPBR Glass: 27ms -> 11ms
OpenPBR Pearl: 16ms -> 12ms
OpenPBR Aluminum: 14ms -> 5ms
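
For a sense of scale, the timings above can be converted into speedup factors. The following is a small sketch that uses only the four measurements quoted in this description; the dictionary and variable names are illustrative and are not part of the change itself.

```python
# Frame times in milliseconds (before, after), copied from the list above.
timings_ms = {
    "OpenPBR Carpaint": (16, 7),
    "OpenPBR Glass": (27, 11),
    "OpenPBR Pearl": (16, 12),
    "OpenPBR Aluminum": (14, 5),
}

# Speedup factor = time before / time after.
speedups = {name: before / after for name, (before, after) in timings_ms.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x faster")
```

On these numbers the speedup ranges from roughly 1.3x (Pearl) to 2.8x (Aluminum).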

@jstone-lucasfilm

Member Author

OpenPBR Carpaint: 16ms -> 7ms
[Images: CarpaintBefore, CarpaintAfter]

OpenPBR Glass: 27ms -> 11ms
[Images: GlassBefore, GlassAfter]

OpenPBR Pearl: 16ms -> 12ms
[Images: PearlBefore, PearlAfter]

OpenPBR Aluminum: 14ms -> 5ms
[Images: AluminumBefore, AluminumAfter]

@dgovil

dgovil commented Jun 25, 2025

Contributor

This looks pretty great! Would you know if this performance gain is consistently positive (even if the numbers vary) across the other backends?
Lee is back next week so I could ask him to check then on Metal.

@jstone-lucasfilm

Member Author

@dgovil I had a chance to test performance in the MaterialX Web Viewer, which uses a slightly different hardware shading language, and the improvements were consistent there.

My expectation is that any hardware shading language with support for dynamic branches should see a similar performance improvement, and I believe MSL is in that category, though it's certainly worthwhile to confirm this.

@kwokcb kwokcb left a comment

Contributor

This looks like a good optimization, especially as it's very wasteful to compute both sides of the mix() when a conditional branch makes one of them unnecessary.

The GPU should be highly optimized to perform a multiply-add (*+) combination. I'm not sure, but it may be faster to have a single node that performs this scale+offset function; maybe something to think about afterwards. It depends on how code generation optimizes this, but I'm guessing it's all inlined into a single line of output.

@jstone-lucasfilm jstone-lucasfilm merged commit 9754e6a into AcademySoftwareFoundation:main Jun 26, 2025
32 checks passed
@ld-kerley

Contributor

I just got a chance to test this, and can confirm that MSL also sees a similar level of improvement in performance. We certainly want to find the right way to leverage this significant discovery.

I do think that we should not forget that this also changes the MDL and OSL implementations. Has someone profiled the changes in a non-HW shading language?

I would also note that I think this change makes the intention in the material design of the nodegraph less clear. I think it would be interesting to explore ideas where the HW shading languages achieve this significant performance improvement through an optimization at shader generation time, or perhaps through an implementation that is specific to those languages, instead of obfuscating the nodegraph that is used by all languages.

Was this change reviewed by the OpenPBR stakeholders? The MaterialX nodegraph implementation is often referred to by that group as the reference implementation, and I feel that this change makes the design of OpenPBR harder to understand if referring to this nodegraph.

@portsmouth

portsmouth commented Jul 3, 2025

Thanks for pointing this out @ld-kerley. It got merged quickly (2 days!) so I didn’t notice it in time.

I had a brief look, and actually I don’t understand how this works (most likely due to not studying it for long enough..). It replaces the mix operation between thin-film and no-thin-film cases, with addition of two BSDFs with altered weights. The thin film weight inverted is multiplied into one of these weights. But this doesn’t seem to compute the same thing, and for example if the thin film weight is zero it seems it would produce a NaN. Maybe there’s an error, or also possibly I’m not understanding the logic (because the XML is a pain to reverse engineer.. 🤷‍♂️).

I also agree with Lee that this does further obfuscate the logic, since it’s no longer clear that the thin film weight is functioning as a mix weight (assuming the math is correct or can be fixed up).

Though I think the graph was already pretty hard to understand, so it doesn’t really impact it much. I don’t feel that the XML, or the big cloud of nodes it generates in a node editor, functions very well as a reference implementation currently, since it’s both hard to read, as well as not giving the full details since much of the logic is hidden in the details of MaterialX code generation anyway.

It’s also not really as faithful an implementation as one could do if writing the code from scratch, since the nodes don’t give full control over the internal computation. So anyway, this change, assuming it is correct, does not make it much worse than it was in terms of being a reference. But yeah, agreed that ideally we would not be compromising the clarity of the graph logic for a low level optimization in one particular back-end.

@jstone-lucasfilm

jstone-lucasfilm commented Jul 3, 2025

Member Author

@portsmouth This is certainly a shading model definition that would be more easily authored in ShadingLanguageX, and we're very interested in making that functionality available within MaterialX in the future.

In terms of the math, though, I don't see any cases where a NaN can be produced, and indeed most of the material examples in the before-and-after renders above have a thin-film weight of zero. I've verified that no visual changes are produced in both GLSL and OSL, and we're working with Kai at NVIDIA to verify the MDL case.

Can you clarify the situations in which you have concerns about the math differing and producing artifacts?

@portsmouth

portsmouth commented Jul 3, 2025

Apologies, I mistakenly assumed that the invert(X) node computes the reciprocal 1/X, while presumably it actually computes 1 - X. (That's inversion for a shader writer I suppose..).

Then I see that your changes just amount to reimplementing the mix using 1 - weight terms, i.e. in pseudo-code (which I need to mentally or actually translate the XML into in order to understand it):

w_tf = thin_film_weight
w_s  = specular_weight

dielectric_reflection_blend = (1 - w_tf) * dielectric_bsdf(no_film) + w_tf * dielectric_bsdf(film)
                            = mix(dielectric_bsdf(no_film), dielectric_bsdf(film), w_tf)

metal_bsdf_tf_blend = metal_bsdf + metal_bsdf_tf
                    = generalized_schlick_bsdf(w_s * (1 - w_tf), no_film) + generalized_schlick_bsdf(w_s * w_tf , film)
                    = w_s * mix(generalized_schlick_bsdf(no_film), generalized_schlick_bsdf(film), w_tf)

This is a bit harder to understand, but as noted it was already hard anyway.
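
The identity in the pseudo-code above can be checked numerically. The sketch below is not the MaterialX implementation: `toy_bsdf` is a hypothetical scalar stand-in whose output scales linearly with its weight input (the convention the rewrite relies on), and the response values 0.5 and 0.8 are arbitrary. It also shows that a thin-film weight of zero yields an ordinary number rather than a NaN, since invert is 1 - X.

```python
import math

def toy_bsdf(weight, film):
    """Hypothetical leaf BSDF: a fixed response scaled linearly by its weight."""
    response = 0.8 if film else 0.5  # arbitrary illustrative values
    return weight * response

def mix(bg, fg, w):
    """Linear blend in the MaterialX argument order: bg * (1 - w) + fg * w."""
    return bg * (1.0 - w) + fg * w

def mixed(w_s, w_tf):
    # Original form: evaluate both BSDFs at the specular weight, then blend.
    return mix(toy_bsdf(w_s, film=False), toy_bsdf(w_s, film=True), w_tf)

def premultiplied_add(w_s, w_tf):
    # Rewritten form: fold the blend weight into each BSDF weight, then add.
    return toy_bsdf(w_s * (1.0 - w_tf), film=False) + toy_bsdf(w_s * w_tf, film=True)

# Compare the two forms over a small grid, including the 0 and 1 edge cases.
for w_s in (0.0, 0.3, 1.0):
    for w_tf in (0.0, 0.25, 1.0):
        assert math.isclose(mixed(w_s, w_tf), premultiplied_add(w_s, w_tf), abs_tol=1e-12)
```

The equality only holds because the stand-in is linear in its weight; a BSDF whose weight input behaved differently would not satisfy it.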

It seems somewhat surprising that the generated GLSL code performs better, but the timings show that it does. This seems like something that could possibly be handled in the code generation stage though (and if so, that would be better, as it would apply similar optimizations elsewhere).

@jstone-lucasfilm

Member Author

@portsmouth The key reason that this is a performance win in hardware shading languages (and indeed in one CPU/GPU path tracer that I've worked on) is that a weight of zero allows the renderer to skip all of the work for the associated BSDF node, while performing a mix between the outputs of two BSDF nodes doesn't guarantee that the renderer can detect and implement this optimization.

I really like the idea of implementing this optimization as a "refactor step" in shader generators, where it could apply to shading models beyond just OpenPBR Surface, but this actually turns out to be very challenging in practice. I went quite far down this road before implementing the current graph optimization, and it's challenging for the shader generator itself to detect how a specific mix operation should be refactored, as there are so many different cases it would need to be aware of.

Implementing this optimization manually has the advantage of allowing us to rigorously test the output, as I've done here, making sure that no edge cases will lead to a different look being generated.

If you or @ld-kerley have ideas on implementing this optimization in the shader generator, I strongly encourage you to give this a try, and I'd love to be shown a way that this can be done robustly and efficiently.

@portsmouth

> performing a mix between the outputs of two BSDF nodes doesn't guarantee that the renderer can detect and implement this optimization.

It does seem surprising if GLSL does not completely optimize away one of the terms in a genuine mix if the weight evaluates to zero (or one). I can see it might be a problem if the BSDFs are evaluated first, then the mix applied to the results though, which maybe is what the code generation does?

@jstone-lucasfilm

Member Author

@portsmouth In nearly all hardware shading languages, the logic for a complex function such as a BSDF can be skipped at runtime when a dynamic branch is present in the shading code, selecting between two code paths based on the result of a per-pixel or per-warp computation.

In many public implementations of the MaterialX BSDFs, including those in our GLSL/MSL/ESSL library, the dynamic branch tests whether the value of weight is less than epsilon, as this optimization catches a vast number of cases where a BSDF is available for artistic use but not used by a specific material asset.

The graph optimization in this PR just builds upon the common presence of these dynamic branches, making it easier for a renderer to skip code in cases that it could not previously detect.
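
As a rough illustration of why folding the mix weight into the BSDF weights helps, the sketch below counts how often a costly code path runs in each form. It is a CPU-side toy written for this discussion, not the MaterialX library code: `toy_bsdf`, `expensive_lobe`, and the `EPSILON` value are all assumptions, and it says nothing about how a real GPU compiler schedules branches.

```python
EPSILON = 1e-8  # assumed threshold for illustration; not taken from the MaterialX library

calls = {"expensive_lobe": 0}  # counts how often the costly path runs

def expensive_lobe(film):
    """Stand-in for the costly part of a BSDF evaluation (hypothetical values)."""
    calls["expensive_lobe"] += 1
    return 0.8 if film else 0.5

def toy_bsdf(weight, film):
    # Early-out: skip the costly path entirely when the weight is negligible.
    if weight < EPSILON:
        return 0.0
    return weight * expensive_lobe(film)

def mixed(w_s, w_tf):
    # Both BSDFs see the full specular weight, so neither can early-out.
    no_film = toy_bsdf(w_s, film=False)
    with_film = toy_bsdf(w_s, film=True)
    return no_film * (1.0 - w_tf) + with_film * w_tf

def premultiplied_add(w_s, w_tf):
    # The blend weight is folded into each BSDF weight before evaluation.
    return toy_bsdf(w_s * (1.0 - w_tf), film=False) + toy_bsdf(w_s * w_tf, film=True)

def count_expensive_calls(fn, w_s, w_tf):
    calls["expensive_lobe"] = 0
    fn(w_s, w_tf)
    return calls["expensive_lobe"]

# A material with no thin film: specular weight 1, thin-film weight 0.
calls_mix = count_expensive_calls(mixed, 1.0, 0.0)
calls_add = count_expensive_calls(premultiplied_add, 1.0, 0.0)
print(calls_mix, calls_add)
```

With a thin-film weight of zero, the mix form runs the costly path twice while the pre-multiplied form runs it once, which is the kind of skipped work described above.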

@portsmouth

portsmouth commented Jul 4, 2025

Right I see, the BSDFs have an early out even for non-zero weights, as long as the weight is sufficiently small. So moving the mix weight into those weights, allows that early out to happen.

Perhaps instead, if the mix function itself was made to early out as well, that would work without the refactor? i.e. wrap every different mix in the generated shader explicitly like

vec3 metal_tf_mix(in float w, ...)  // ... is metal BSDF params
{
   if (w < eps)     return metal_bsdf_nofilm (...);
   if (w > 1.0-eps) return metal_bsdf_withfilm(...);
   return mix(metal_bsdf_nofilm (...), metal_bsdf_withfilm(...), w);  // GLSL mix(x, y, a) takes the weight last
}

@jstone-lucasfilm

Member Author

We're thinking along similar lines, @portsmouth, and I tested that exact approach early in the process of implementing this optimization, but unfortunately the GLSL shader compiler doesn't view these as valid dynamic branches when it generates the final GPU instructions, so it did not produce any performance improvement over the original shading code.

@jstone-lucasfilm

Member Author

The issue may be that the logic that the shader compiler would need to skip is external to the function with the dynamic branch when written in the way described above, while the existing weight tests in our GLSL/MSL/ESSL library are requesting that code be skipped within the same function.

@portsmouth

portsmouth commented Jul 4, 2025

Another approach might be to just apply your transformation (from mix to addition), in the generated code:

vec3 metal_tf_mix(in float w, ...)  // ... is metal BSDF params
{
   return metal_bsdf_nofilm (1-w, ...) + metal_bsdf_withfilm(w, ...);
}

Then the early-out will happen inside the BSDFs if w is close to 0 or 1. (Surely this works..).

@jstone-lucasfilm

Member Author

@portsmouth For reference, here is the concrete code that performs a mix operation in GLSL, and I believe I tried nearly every combination of if tests that might theoretically generate a dynamic branch on the GPU, but without any luck:

https://github.com/AcademySoftwareFoundation/MaterialX/blob/main/libraries/pbrlib/genglsl/mx_mix_bsdf.glsl

I'd encourage other developers to continue experimenting with this idea, though, as if we could make it work robustly, it would allow a much wider set of shading models to benefit from this optimization.

@jstone-lucasfilm

Member Author

One key difference between the real-world implementation and your examples is that the input to a BSDF mix can be any arbitrary combination of BSDFs, not simply two leaf BSDFs such as generalized_schlick_bsdf with and without thin-film.

It's that complexity that we'd need to handle if we implemented this as a refactoring step in our shader generator, and the space of cases we encounter in MaterialX shading models is relatively large.

As mentioned above, though, I think it's worthwhile for developers to experiment further along this path, as a working generator solution would provide a very powerful optimization, fully independent of the shading model that the artist is working with.

@ld-kerley

Contributor

If I'm understanding things correctly, I think a shader generation time optimization might actually be pretty easy to perform, but let me echo back what I'm understanding so far of why things are faster.

The case we're trying to catch here is the mix input on the BSDF mix being 0 or 1? If that's the case, then I believe we could just add an optimization step on the graph: if it can detect that mix is either of these values, we directly sever the connection to the opposing input (we actually did this very optimization in the Imageworks internal shader graph generator and saw massive wins). This works if the mix is a knowable value. In the case that it's a driven value and we can't know its value at code generation time, then we can programmatically construct the same <add> plus <invert> graph that Jonathan created by hand, rewiring the mix to the weight of the two respective BSDF nodes upstream. Obviously we have to be careful that a weight exists on the upstream node etc., and we are baking in a convention around the behavior of the BSDF nodes, but this PR is already taking that stance here, so I don't see that as a problem.
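
The two cases described above can be sketched on a toy graph representation. This is a hedged illustration only: the `Leaf`, `Mix`, and `Add` classes, the string-based symbolic weights, and the `optimize` function are all hypothetical and unrelated to the MaterialX shader generator API. The sketch handles only a mix between two leaf BSDFs and leaves anything more complex untouched.

```python
from dataclasses import dataclass
from typing import Union

# A weight is either a constant known at generation time (float) or a
# symbolic expression over runtime inputs (str).
Weight = Union[float, str]

@dataclass
class Leaf:
    name: str
    weight: Weight

@dataclass
class Mix:
    bg: "Node"
    fg: "Node"
    mix: Weight

@dataclass
class Add:
    in1: "Node"
    in2: "Node"

Node = Union[Leaf, Mix, Add]

def multiply(a: Weight, b: Weight) -> Weight:
    if isinstance(a, float) and isinstance(b, float):
        return a * b
    return f"({a} * {b})"

def invert(w: Weight) -> Weight:
    # MaterialX-style invert: 1 - w, not the reciprocal.
    return 1.0 - w if isinstance(w, float) else f"(1 - {w})"

def optimize(node: Node) -> Node:
    if not isinstance(node, Mix):
        return node
    # Case 1: constant mix of 0 or 1, so sever the unused input.
    if node.mix == 0.0:
        return optimize(node.bg)
    if node.mix == 1.0:
        return optimize(node.fg)
    # Case 2: driven mix between two leaves, so rewrite to a pre-multiplied add.
    if isinstance(node.bg, Leaf) and isinstance(node.fg, Leaf):
        return Add(
            Leaf(node.bg.name, multiply(node.bg.weight, invert(node.mix))),
            Leaf(node.fg.name, multiply(node.fg.weight, node.mix)),
        )
    return node  # arbitrary BSDF combinations are left as they are

graph = Mix(
    Leaf("metal_bsdf", "specular_weight"),
    Leaf("metal_bsdf_tf", "specular_weight"),
    "thin_film_weight",
)
optimized = optimize(graph)
print(optimized)
```

A real generator pass would also need to confirm that each upstream node actually exposes a weight input and that its output scales linearly with that weight, which is the convention mentioned above.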

The benefit of taking the shader generation approach would be that, as well as the clarity arguments above, we would also be able to provide a shader generation option to disable the optimization if we found a render context where it wasn't optimal. It's possible that other render backends are able to optimize the mix in a more native, or more aggressive way, thus this refactor might actually be a backwards step for performance in some cases.

I'm a bit slammed right now, but I'd be happy to take a look at what a code generation stage optimization might look like. So perhaps it's a good idea to pause on the aggressive over optimization of the node graphs, so we don't have too much work to roll back later.

@jstone-lucasfilm

Member Author

We'd really appreciate the research and development work, if you have the time to look into this, and I've written up an initial call to action here:

#2480

Note that, in some cases, the BSDF graph first needs to be refactored so that simple mix operations between leaf BSDFs can be isolated and optimized, as was the case in #2467.

I wouldn't be overly concerned about the existing manual graph optimizations, which can trivially be converted back to mix operations if an automated solution turns out to be robust and effective.

AdrienHerubel pushed a commit to AcademySoftwareFoundation/OpenPBR that referenced this pull request Oct 2, 2025
* Integrate OpenPBR updates from MaterialX project

This changelist integrates two post-1.1 updates to OpenPBR Surface from the MaterialX project:

- Optimizations to OpenPBR graph (AcademySoftwareFoundation/MaterialX#2459)
- Add code generation hints support (AcademySoftwareFoundation/MaterialX#1954)

The more substantial update is the graph optimization, and I've copied the performance measurements from the original change for reference:

Performance tests were conducted on an NVIDIA RTX A6000 at 4K resolution, and the following timing improvements were seen:

OpenPBR Carpaint: 16ms -> 7ms
OpenPBR Glass: 27ms -> 11ms
OpenPBR Pearl: 16ms -> 12ms
OpenPBR Aluminum: 14ms -> 5ms

* Omit hardware shading optimizations
AdrienHerubel added a commit to AcademySoftwareFoundation/OpenPBR that referenced this pull request Mar 3, 2026
* Update OpenPBR default example (#216)

This changelist updates the OpenPBR default example, matching its values to the latest default values of the shading model.

* Change thin film IOR default (#211)

From 1.5 to 1.4.  

As this won't make much difference to the look in implementations that ignore the adjacent IORs of the film.

But for those that take it into account, this will make the film visible rather than invisible by default (since `specular_ior` is 1.5 by default, and `coat_ior` 1.6).

* Add note about dark fuzz (#207)

Addressing #176

* Enable Zeltner sheen (#217)

This changelist enables Zeltner sheen in the reference implementation of OpenPBR, leveraging the new functionality in MaterialX 1.39.

Additionally, the open_pbr_velvet.mtlx example has been updated to account for the visual differences between Conty-Kulla and Zeltner sheen.

* Add a "resources" section to the front page (#215)

With links to
  - MaterialX web viewer running OpenPBR default material
  - OpenPBR-viewer project and web app

* Clarify formula for emission color (#209)

Following the discussion of #85.

* Update subsurface color types (#220)

This changelist updates the types associated with physical color values for subsurface scattering in OpenPBR, aligning with the conclusions of recent threads on ASWF Slack channels.

- Change `subsurface_radius_scale` from a `vector3` to a `color3` in the specification, aligning with the MaterialX implementation of OpenPBR.
- Change the `radius` input of `subsurface_bsdf` from a `vector3` to a `color3` in the MaterialX implementation, aligning with the current definition of the `subsurface_bsdf` node in MaterialX 1.39.

* Update specification and reference to v1.1 (#221)

* Add "Flexibility of implementation" section (#248)

* Add page to propose real-time approximations

* Mention layering and mixing approximation

* Mention specular reflection approximation

* Mention anisotropic reflection approximations

* Fix typos

* Wording

* Reword the section, move it to the main document, remove the annex

* Subsurface in thin-walled mode, small clarification (#258)

* Merge v1.1 development to main (#222)

This changelist merges v1.1 development from dev_1.1 to main, in preparation for marking the release of OpenPBR v1.1.

* Subsurface in thin-walled mode, small clarification

* Subsurface in thin-walled mode, small clarification

---------

Co-authored-by: Jonathan Stone <jstone@lucasfilm.com>

* Allow emission_color components to exceed 1 (#260)

* Add more material examples (#257)

* Merge v1.1 development to main (#222)

This changelist merges v1.1 development from dev_1.1 to main, in preparation for marking the release of OpenPBR v1.1.

* Adding more material examples
- Bumped MaterialX version from 1.38 to 1.39 on existing examples
- Added new examples

* - Added MIT Black

* Color updates:
- Updated all colors to ACEScg
- All metals now have F82 as specular_color
- Added a few more metals from Portsmouth's chart

* Material updates:
- Added SSS to Sclera and made it less red
- Added LCD Display material
- Added two variations of Light Bulb with different CCT

* Material updates:
- Added base_diffuse_roughness to Brick, Charcoal, and Sand
- Made Velvet purple so it's more convincing

* Updated roughness values

* Removed a few materials that were less useful as examples

* Renamed Polyurethane

* Updated Blood material

* Added Abbe value to Blood material

* Updated IOR of Blood

* - Updated coffee material

* Split Honey into two materials, liquid and crystallized

* - Updated Honey (Crystallized) roughness value

---------

Signed-off-by: Adrien Herubel <AdrienHerubel@users.noreply.github.com>
Co-authored-by: Jonathan Stone <jstone@lucasfilm.com>
Co-authored-by: Adrien Herubel <AdrienHerubel@users.noreply.github.com>

* Integrate OpenPBR update from MaterialX project (#265)

* Move anisotropy figure before Multiple Scattering section

* Revert CHANGELOG.md to upstream/dev_1.2 (belongs in separate PR #295)

---------

Signed-off-by: Adrien Herubel <AdrienHerubel@users.noreply.github.com>
Co-authored-by: Jonathan Stone <jstone@lucasfilm.com>
Co-authored-by: Julien Guertault <9511025+virtualzavie@users.noreply.github.com>
Co-authored-by: Anton Palmqvist <13031779+AntonPalmqvist@users.noreply.github.com>
Co-authored-by: Adrien Herubel <AdrienHerubel@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>