Improved performance of LeafNode ValueIters #2026
Idclip merged 3 commits into AcademySoftwareFoundation:master
Signed-off-by: Nick Avramoussis <4256455+Idclip@users.noreply.github.com>
…es vectorization misses with clang Signed-off-by: Nick Avramoussis <4256455+Idclip@users.noreply.github.com>
danrbailey left a comment
This looks great! Could this possibly affect performance negatively when the iterator does not evaluate any values? For example ValueOnIterator with a leaf node where all values are inactive or ValueOffIterator with a leaf node where all values are active. I'm wondering if you pay an additional cost loading the buffer to memory in those cases where you would not otherwise.
}
else {
    OPENVDB_ASSERT(pos < SIZE);
    OPENVDB_ASSUME(pos < SIZE);
Interested to know more about this. Is this related to the fact that pos is unsigned and thus can wrap around?
Yeah, one reason I dislike unsigned numbers is that you can do these one-sided comparisons, since a negative value wraps around to a very large one. So this is the equivalent of
pos >= 0 && pos < SIZE
but it's just waiting for errors if pos ever changes to signed :> Naturally you can't add the >= 0 case, as the compiler will likely whine about a useless comparison :>
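To make the wrap-around point concrete, here is a minimal standalone sketch. The SIZE value and the function names are assumptions for illustration and are not taken from the OpenVDB source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical leaf size, for illustration only.
constexpr std::uint32_t SIZE = 512;

// Unsigned index: one comparison covers both bounds, because a value that
// was "negative" before conversion has wrapped to a very large number.
constexpr bool inRangeUnsigned(std::uint32_t pos) { return pos < SIZE; }

// Signed index: both bounds have to be checked explicitly.
constexpr bool inRangeSigned(std::int32_t pos)
{
    return pos >= 0 && pos < static_cast<std::int32_t>(SIZE);
}

// Signed index checked with the one-sided form by casting to unsigned first.
constexpr bool inRangeViaCast(std::int32_t pos)
{
    return static_cast<std::uint32_t>(pos) < SIZE;
}

// Small self-check showing that the three forms agree at the boundaries.
inline void checkRangeExamples()
{
    assert(inRangeUnsigned(0u));
    assert(!inRangeUnsigned(static_cast<std::uint32_t>(-1))); // wrapped value
    assert(!inRangeSigned(-1));
    assert(!inRangeViaCast(-1));
    assert(inRangeViaCast(511));
    assert(!inRangeViaCast(512));
}
```

If pos were ever changed to a signed type without the cast, the one-sided check alone would accept negative values, which is the risk mentioned above.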
#elif defined(__GNUC__)
    #if __GNUC__ >= 13
        #define OPENVDB_ASSUME(...) __attribute__((__assume__(__VA_ARGS__)))
    #endif
How substantial is this assume attribute for the performance improvement of ValueAccessors? GCC13 is still not even in the VFX reference platform. Would something like this possibly be a viable alternative until then:
#define OPENVDB_ASSUME(...) if (!(__VA_ARGS__)) { __builtin_unreachable(); }
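A rough sketch of how such a fallback chain could look is below. SKETCH_ASSUME is a hypothetical name and this is not OpenVDB's actual definition. The pre-GCC 13 branch uses the __builtin_unreachable() idea from the comment above, wrapped in do/while so the macro behaves like a single statement. Whether any branch changes the generated code depends on the compiler and flags.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro, for illustration only. clang is tested first because
// it also defines __GNUC__.
#if defined(__clang__)
    #define SKETCH_ASSUME(...) __builtin_assume(__VA_ARGS__)
#elif defined(__GNUC__) && __GNUC__ >= 13
    // GCC 13+: assume attribute applied to a null statement.
    #define SKETCH_ASSUME(...) __attribute__((__assume__(__VA_ARGS__)))
#elif defined(__GNUC__)
    // Older GCC: mark the failing branch as unreachable.
    #define SKETCH_ASSUME(...) \
        do { if (!(__VA_ARGS__)) __builtin_unreachable(); } while (0)
#elif defined(_MSC_VER)
    #define SKETCH_ASSUME(...) __assume(__VA_ARGS__)
#else
    #define SKETCH_ASSUME(...) ((void)0)
#endif

// Assumed buffer length for the example.
constexpr std::size_t SIZE = 512;

// Debug builds check the bound; optimized builds only receive the hint.
inline float loadValue(const float* data, std::size_t pos)
{
    assert(pos < SIZE);
    SKETCH_ASSUME(pos < SIZE);
    return data[pos];
}
```

Note that the unreachable form makes a violated assumption undefined behaviour, just like the attribute form, so it is only safe where the condition is guaranteed.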
// Unlike other value iterators, cache the buffer data as part of
// the iterators members to avoid the cost of going through the
// leaf buffer atomic/checking API
, mData(parent->buffer().data()) {}
One thing I'm a little cautious of: there's an uncommon workflow where you call LeafBuffer::deallocate() and it deletes the data. Calling any of the access methods on the LeafBuffer afterwards will still check the data pointer before attempting to dereference it. I think this version does not do that?
From what I can see, this provides identical semantics to the old style iterator. There is no way to blat the underlying leaf buffer and leave the iterator in an invalid state, unless you invoke the buffer destructor - that is, so long as the LeafBuffer class remains in memory, the access will continue to be valid.
For example the below should behave identically with the old and new behaviour:
auto iter = [&]()
{
    openvdb::FloatGrid::TreeType::LeafNodeType a;
    a.fill(1);
    auto v = a.beginValueAll();
    a = openvdb::FloatGrid::TreeType::LeafNodeType();
    std::cerr << v.getValue() << std::endl; // prints 0
    openvdb::FloatGrid::TreeType::LeafNodeType::Buffer b;
    b.fill(1);
    a.swap(b);
    std::cerr << v.getValue() << std::endl; // still points to a's buffer, prints 0
    return v;
}();
std::cerr << iter.getValue() << std::endl; // UB, underlying buffer destroyed
Actually I've just realised this does behave differently with swap:
openvdb::FloatGrid::TreeType::LeafNodeType a;
auto v = a.beginValueAll(); // iterator created before the swap, as in the example above
std::cerr << v.getValue() << std::endl; // prints 0
openvdb::FloatGrid::TreeType::LeafNodeType::Buffer b;
b.fill(1);
a.swap(b);
// prints 1 with old impl, prints 0 with new impl
std::cerr << v.getValue() << std::endl;
tbh I don't see this as an issue. I also don't think there's a way to solve this and keep the exact same performance benefits as this PR, unless we add friend methods to the leaf buffer or change how swap works. And FWIW, the value accessors will most likely have the same problem.
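The difference can be reproduced without OpenVDB. The sketch below uses toy stand-in types (ToyBuffer, IndirectIter and CachedIter are all hypothetical names) and assumes only that swap exchanges the underlying data pointers, which is what the behaviour described above suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Toy stand-in for a leaf buffer: owns a heap array, swaps by exchanging pointers.
struct ToyBuffer {
    static constexpr std::size_t SIZE = 8;
    std::unique_ptr<float[]> mData{new float[SIZE]()}; // zero-initialized
    void fill(float v) { for (std::size_t i = 0; i < SIZE; ++i) mData[i] = v; }
    void swap(ToyBuffer& other) { std::swap(mData, other.mData); }
    const float* data() const { return mData.get(); }
};

// "Old style": goes through the buffer on every access.
struct IndirectIter {
    const ToyBuffer* mParent;
    explicit IndirectIter(const ToyBuffer& b) : mParent(&b) {}
    float getValue(std::size_t pos = 0) const
    {
        assert(pos < ToyBuffer::SIZE);
        return mParent->data()[pos];
    }
};

// "New style": caches the raw data pointer once, at construction.
struct CachedIter {
    const float* mData;
    explicit CachedIter(const ToyBuffer& b) : mData(b.data()) {}
    float getValue(std::size_t pos = 0) const
    {
        assert(pos < ToyBuffer::SIZE);
        return mData[pos];
    }
};

struct SwapResult { float indirect; float cached; };

// Creates both iterators on a zero-filled buffer, swaps in a buffer of ones,
// then reads through each iterator while both buffers are still alive.
inline SwapResult readAfterSwap()
{
    ToyBuffer a;
    IndirectIter oldIter(a);
    CachedIter newIter(a);
    ToyBuffer b;
    b.fill(1.0f);
    a.swap(b);
    return { oldIter.getValue(), newIter.getValue() };
}
```

After the swap the indirect iterator sees the array that used to belong to b, while the cached pointer still refers to a's original array, which is now owned by b.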
I've thoroughly tested this with GCC 11 and GCC 13 today. I found substantial performance improvements of around 2x with both compilers, which is fantastic! However, I couldn't detect a difference in performance when using the assume optimization on either compiler. Possibly I have some compiler flags that conflict, or it's something about the environment I am testing in. It may be more beneficial on clang.
I also confirmed what you said last time: the potential deallocation issue is the same as in the current implementation, so there's no need to discuss that any further.
I did notice one thing that I wanted to highlight. The iterators are currently either default initialized or initialized with a leaf node pointer. If you default initialize, or initialize with a nullptr instead of a leaf node pointer, it now attempts to dereference the nullptr, whereas in the previous implementation it would call …
I also wanted to mention that I found a bit more performance on both compilers by breaking the leaf node iterator logic out into a separate header ( …
Getting the optimizer to trigger in the way I wanted was very temperamental. I also experimented with splitting out the …
Thoughts on the impl of swap()? I could deprecate this and introduce an unsafeSwap.
Can definitely add an assert. I can also deprecate that constructor and introduce one that takes a reference. Ideally I want to remove the default constructor as well, but I'm not 100% sure of the consequences of that yet.
Did you try adding …
I don't think that's necessary. It's only different in a very niche case when you have a value iterator pointing at a leaf and you do a swap on the leaf and then try and access the value iterator. It's very common to expect the iterator itself to be left in an invalid state if you change the underlying data in any way, so I don't think we need to do anything special to handle this change in behavior.
Yeah, the assert is better than nothing, would be good to add that in.
No, I didn't try that, but we should definitely put final anywhere that we can. I'm going to approve regardless. It would be nice to add that assert, but that's a minor comment, it looks good to me! Thanks for doing this. I have some other performance improvements I wanted to push up, so I suppose I should re-run my benchmarks to see if they are still faster after this change. :)
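As a rough illustration of the two options discussed in this thread, an assert in the pointer constructor and a constructor that takes a reference, here is a sketch with toy types. ToyLeaf and ToyValueIter are hypothetical and do not mirror the real iterator classes.

```cpp
#include <cassert>
#include <cstddef>

// Toy leaf with a fixed-size value array, for illustration only.
struct ToyLeaf {
    static constexpr std::size_t SIZE = 8;
    float values[SIZE] = {};
    const float* data() const { return values; }
};

class ToyValueIter {
public:
    // Pointer constructor: assert before the cached pointer is taken, so a
    // null parent fails loudly in debug builds instead of being dereferenced.
    explicit ToyValueIter(const ToyLeaf* parent)
        : mParent(parent), mData(checkedData(parent)) {}

    // Reference constructor: a null parent cannot be expressed at all.
    explicit ToyValueIter(const ToyLeaf& parent)
        : mParent(&parent), mData(parent.data()) {}

    float getValue(std::size_t pos) const
    {
        assert(pos < ToyLeaf::SIZE);
        return mData[pos];
    }

private:
    static const float* checkedData(const ToyLeaf* parent)
    {
        assert(parent != nullptr);
        return parent->data();
    }

    const ToyLeaf* mParent; // kept to mirror an iterator that remembers its node
    const float* mData;     // cached buffer pointer
};
```

The assert costs nothing in release builds, while the reference overload moves the check to the call site.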
… is valid Signed-off-by: Nick Avramoussis <4256455+Idclip@users.noreply.github.com>