Skip to content

Improved performance of LeafNode ValueIters#2026

Merged
Idclip merged 3 commits intoAcademySoftwareFoundation:masterfrom
Idclip:leaf_val_iters
Jul 5, 2025
Merged

Improved performance of LeafNode ValueIters#2026
Idclip merged 3 commits intoAcademySoftwareFoundation:masterfrom
Idclip:leaf_val_iters

Conversation

@Idclip
Copy link
Copy Markdown
Contributor

@Idclip Idclip commented Apr 6, 2025

No description provided.

Signed-off-by: Nick Avramoussis <4256455+Idclip@users.noreply.github.com>
…es vectorization misses with clang

Signed-off-by: Nick Avramoussis <4256455+Idclip@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@danrbailey danrbailey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Could this possibly affect performance negatively when the iterator does not evaluate any values? For example ValueOnIterator with a leaf node where all values are inactive or ValueOffIterator with a leaf node where all values are active. I'm wondering if you pay an additional cost loading the buffer to memory in those cases where you would not otherwise.

}
else {
OPENVDB_ASSERT(pos < SIZE);
OPENVDB_ASSUME(pos < SIZE);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interested to know more about this. Is this related to the fact that pos is unsigned and thus can wrap around?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, one reason I dislike unsigned numbers is you an do these one-sided comparisons as a negative will be very large.... SO this is the equivalent of
pos >= 0 && pos < SIZE
but just waiting for errors if it ever changes to signed :> Naturally you can't add the >= 0 case as the compiler will likely whine about a useless comparison then :>

#elif defined(__GNUC__)
#if __GNUC__ >= 13
#define OPENVDB_ASSUME(...) __attribute__((__assume__(__VA_ARGS__)))
#endif
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How substantial is this assume attribute for the performance improvement of ValueAccessors? GCC13 is still not even in the VFX reference platform. Would something like this possibly be a viable alternative until then:

#define OPENVDB_ASSUME(...) if (!(__VA_ARGS__)) { __builtin_unreachable(); }

Comment thread openvdb/openvdb/tree/LeafNode.h Outdated
// Unlike other value iterators, cache the buffer data as part of
// the iterators members to avoid the cost of going through the
// leaf buffer atomic/checking API
, mData(parent->buffer().data()) {}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I'm a little cautious of is that there's an uncommon workflow where you can call LeafBuffer::deallocate() and it deletes the data then calling any of the access methods on the LeafBuffer for example will still check the data pointer before attempting to dereference it. I think this version does not do that?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can see, this provides identical semantics to the old style iterator. There is no way to blat the underlying leaf buffer and leave the iterator in an invalid state, unless you invoke the buffer destructor - that is, so long as the LeafBuffer class remains in memory, the access will continue to be valid.

For example the below should behave identically with the old and new behaviour:

    auto iter = [&]()
    {
        openvdb::FloatGrid::TreeType::LeafNodeType a;
        a.fill(1);
        auto v = a.beginValueAll();
        a = openvdb::FloatGrid::TreeType::LeafNodeType();
        std::cerr << v.getValue() << std::endl; // prints 0
        openvdb::FloatGrid::TreeType::LeafNodeType::Buffer b;
        b.fill(1);
        a.swap(b);
        std::cerr << v.getValue() << std::endl; // still points to a's buffer, prints 0
        return v;
    }();

    std::cerr << iter.getValue() << std::endl; //UB, underlying buffer destroyed

Copy link
Copy Markdown
Contributor Author

@Idclip Idclip Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I've just realised this does behave differently with swap:

        openvdb::FloatGrid::TreeType::LeafNodeType a;
        std::cerr << v.getValue() << std::endl; // prints 0
        openvdb::FloatGrid::TreeType::LeafNodeType::Buffer b;
        b.fill(1);
        a.swap(b);
        //  prints 1 with old impl, prints 0 with new impl
        std::cerr << v.getValue() << std::endl;

tbh I don't see this is an issue - I also don't think there's a way to solve this and keep the exact same performance benefits as this PR unless we add friend methods to the leaf buffer or change how swap works. And FWIW, the value accessors will most likely have the same problem.

@danrbailey
Copy link
Copy Markdown
Contributor

I've thoroughly tested this with GCC 11 and GCC 13 today. I found substantial improvements in performance of around 2x using both compilers, this is fantastic! However, I couldn't detect a difference in performance when using the assume optimization on either compiler, possibly I have some compiler flags that conflict or something about the environment I am testing in. It may be more beneficial on clang perhaps.

I also confirmed what you said last time that the potential deallocation issue is the same as the current implementation, so no need to discuss that any further.

I did notice one thing that I wanted to highlight. The iterators are currently default initialized or initialized with a leaf node pointer. If you default initialize or initialize with a nullptr instead of a leaf node pointer, it now attempts to deference the nullptr whereas in the previous implementation, it would call parent() which threw an exception if the parent is a nullptr. I'm not sure what the correct solution is here, but at least asserting the parent pointer before dereferencing would help track down a potential issue here? (I do think we should change the logic here to expect that the parent node is not a nullptr on initialization). To reproduce:

openvdb::FloatTree::LeafNodeType::ValueOnIter iter(leaf.valueMask().beginOn(), nullptr);
iter.setValue(1.0f);

I also wanted to mention that I found a bit more performance on both compilers by breaking the leaf node iterator logic out into a separate header (tree/LeafIterator.h) and collapsing the layers of hierarchy (iteratorbase->sparseiteratorbase->valueiter) down into a single iterator class. Presumably the compiler has more opportunity to optimize with a bit less complexity in the design. It might be worth us considering this.

@Idclip
Copy link
Copy Markdown
Contributor Author

Idclip commented Apr 23, 2025

However, I couldn't detect a difference in performance when using the assume optimization on either compiler, possibly I have some compiler flags that conflict or something about the environment I am testing in. It may be more beneficial on clang perhaps.

To get the optimizer to trigger in the way I wanted was very temperamental. I also experimented splitting out the LeafBuffer into two, ABI identical implementations; one for delay loading, one without. This also triggered the optimization I wanted but was obviously a larger change set. I am not super happy about the assume, it is a bit awkward, but I don't think its a bad solution in lieu of more refactoring.

I also confirmed what you said last time that the potential deallocation issue is the same as the current implementation, so no need to discuss that any further.

Thoughts on the impl of swap()? I could deprecate this and introduce an unsafeSwap

I'm not sure what the correct solution is here, but at least asserting the parent pointer before de-referencing would help track down a potential issue here? (I do think we should change the logic here to expect that the parent node is not a nullptr on initialization).

Can definitely add an assert. I can also deprecate that constructor and introduce once that takes a reference. Ideally want to remove the default constructor as well, but not 100% sure of the consequences of that yet.

I also wanted to mention that I found a bit more performance on both compilers by breaking the leaf node iterator logic out into a separate header (tree/LeafIterator.h) and collapsing the layers of hierarchy (iteratorbase->sparseiteratorbase->valueiter) down into a single iterator class.

Did you try adding final to the the ValueIter? I bet LTO/IPO would help heaps here too. But yeah, this is similar to the code reduction and performance improvements I observed with splitting out delay loading code from the LeafBuffer - a lot is ifdef'ed out but some isn't and it is having an impact.

@danrbailey
Copy link
Copy Markdown
Contributor

Thoughts on the impl of swap()? I could deprecate this and introduce an unsafeSwap

I don't think that's necessary. It's only different in a very niche case when you have a value iterator pointing at a leaf and you do a swap on the leaf and then try and access the value iterator. It's very common to expect the iterator itself to be left in an invalid state if you change the underlying data in any way, so I don't think we need to do anything special to handle this change in behavior.

Can definitely add an assert. I can also deprecate that constructor and introduce once that takes a reference. Ideally want to remove the default constructor as well, but not 100% sure of the consequences of that yet.

Yeah, the assert is better than nothing, would be good to add that in.

Did you try adding final to the the ValueIter? I bet LTO/IPO would help heaps here too. But yeah, this is similar to the code reduction and performance improvements I observed with splitting out delay loading code from the LeafBuffer - a lot is ifdef'ed out but some isn't and it is having an impact.

No, I didn't try that, but we should definitely put final anywhere that we can.

I'm going to approve regardless, it would be nice to add that assert, but that's a minor comment, it looks good to me! Thanks for doing this. I have some other performance improvements I wanted to push up, I suppose I should re-run my benchmarks to see if they are still faster after this change. :)

}
else {
OPENVDB_ASSERT(pos < SIZE);
OPENVDB_ASSUME(pos < SIZE);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, one reason I dislike unsigned numbers is you an do these one-sided comparisons as a negative will be very large.... SO this is the equivalent of
pos >= 0 && pos < SIZE
but just waiting for errors if it ever changes to signed :> Naturally you can't add the >= 0 case as the compiler will likely whine about a useless comparison then :>

… is valid

Signed-off-by: Nick Avramoussis <4256455+Idclip@users.noreply.github.com>
@Idclip Idclip merged commit 5e1658e into AcademySoftwareFoundation:master Jul 5, 2025
34 checks passed
@Idclip Idclip deleted the leaf_val_iters branch July 5, 2025 05:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants