[ bugfix ] Fix is_valid algorithm #3001
Conversation
I have some questions.
```cpp
uint16x8_t zero = vdupq_n_f16(0);
const uint16_t inf_bits = 0x7C00;
const uint16_t n_inf_bits = 0xFC00;
const uint16x8_t inf_v = vdupq_n_u16(inf_bits);
```
what does vdupq_n_u16 do?
It makes a SIMD register vector from a scalar.
For example,

```cpp
const uint16x8_t my_vec = vdupq_n_u16(1);
for (int i = 0; i < 8; ++i) {
  std::cout << my_vec[i] << "\t";
}
```

Running this will print:

```
1 1 1 1 1 1 1 1
```

```cpp
__fp16 inf_s = std::numeric_limits<float>::infinity();
float16x8_t inf = vdupq_n_f16(inf_s);
uint16x8_t zero = vdupq_n_f16(0);
const uint16_t inf_bits = 0x7C00;
```
What does this bit stand for?
0x7C00 and 0xFC00 represent positive and negative infinity, respectively, when interpreted as IEEE 754 half-precision floating-point (__fp16).
To be specific, the IEEE 754 half-precision (__fp16) format consists of:
1 bit for the sign (S)
5 bits for the exponent (E)
10 bits for the fraction (F)
Except for special cases, the value of a floating-point number is determined by:
value = (-1)^S × 2^(E - 15) × 1.F
In case of 0x7C00 (Positive Infinity)
Binary representation: 0111 1100 0000 0000
Sign bit (S): 0 (positive)
Exponent (E): 11111 (all 1s)
Fraction (F): 0000000000 (all 0s)
According to IEEE 754 rules:
When E = 11111 and F = 0, the value represents infinity.
Since S = 0, it is positive infinity (+∞).
In case of 0xFC00 (Negative Infinity)
Binary representation: 1111 1100 0000 0000
Sign bit (S): 1 (negative)
Exponent (E): 11111 (all 1s)
Fraction (F): 0000000000 (all 0s)
Since E = 11111 and F = 0, the value represents infinity.
Meanwhile, since S = 1, it is negative infinity (-∞).
Please let me know if I got this wrong.
- Previous is_valid() for half-precision inspected with f32 inf values.
- Fix this to check with f16 inf and -inf values (reinterpreted from u16)
- Fix missing return bug

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
cbc3426 to 5fc6413

FYI) Current android
LGTM
LGTM
```cpp
if (val != val) {
  return false;
}
uint16_t val_bits = reinterpret_cast<uint16_t &>(val);
```
Such a reinterpret_cast<> is Undefined Behaviour.
For the C++20 dialect, the best way would be to use std::bit_cast.
For the C++17 dialect, the standard-blessed way to do that is std::memcpy:

```cpp
uint16_t val_bits;
std::memcpy((char *)&val_bits, (const char *)&val, sizeof(__fp16));
```

A good recent read is the std::bit_cast wg21 proposal.
It doesn't give a detailed explanation of why using a union to cast, or reinterpret_cast, is a bad idea, but it gives a pointer to the solution and enough insight to notice that C++ memory is typed.
The std::launder proposal gives much more insight in the context of unions: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0532r0.pdf
@skykongkong8 I looked at recent changes related to fp16 due to two CI crashes (#3012 and #3010) in integration_test.cpp when fp16 is enabled. Most likely not the culprit, but I noticed such UB nonetheless.
We probably need a std::bit_cast<> polyfill for C++17 built on top of std::memcpy, and an audit of our code for this pattern. It will get worse with compiler advancement, due to optimizers getting smarter.
char, unsigned char, and C++20 std::byte are pretty special types: any memory can be accessed via pointers to those types (the C++ standard explicitly blesses them).
Please do note this list doesn't contain signed char, which is a distinct type from char!
Pointers/references to other types don't have such behavior.
From construction until destruction, that part of memory has a type. Accessing this memory through a pointer or reference to any other type is illegal (Undefined Behavior).
The std::memcpy method of getting __fp16 bits works since the access is by char* and does a copy (it won't be slower); it just lets the optimizer know it must really fetch those bits and can't break such code, per the standard's wording.
Hope it makes sense: C++ is type-safe and memory is typed.
Woah, thanks for pointing this out @piotrrak !
If I understood correctly, using reinterpret_cast or union (cuz I also considered using union before implementing this function like this... haha) won't make the data behave as I intended.
And I also tested on my local RPi5 and desktop that when I stored differently-typed data, something like:

```cpp
// DON'T:
const uint16_t inf_bits = 0x7C00;
const __fp16 f16_inf = reinterpret_cast<__fp16>(inf_bits);
nntrainer::Tensor t(..., datatype::_FP16);
...
t.setValue(0, 0, 0, 0, f16_inf);
```

it returns different bit values when reinterpreted from the original one.
Then, how about vreinterpretq_u16_f16? Would some fixes like:

```cpp
// DO:
uint16_t *new_arr = new uint16_t[N];
std::memcpy((char *)new_arr, (const char *)input, N * sizeof(__fp16));
// proceed to compare without reinterpret...
```

sound more proper?
+) And maybe come back here sometime when we are ready to introduce C++20 to apply std::bit_cast<>?
Furthermore, I guess you visited here from your concerns raised in #3012.
Again, thanks for sharing your expertise and active participation! Really appreciate it!
> Woah, thanks for pointing this out @piotrrak ! If I understood correctly, using reinterpret_cast or union (cuz I also considered using union before implementing this function like this... haha) won't make the data behave as I intended. And I also tested on my local RPi5 and desktop that when I stored differently-typed data, something like: `const uint16_t inf_bits = 0x7C00; const __fp16 f16_inf = reinterpret_cast<__fp16>(inf_bits); ... t.setValue(0,0,0,0, f16_inf)`
Yea, it needs memcpy; the compiler will optimize it but will know what it can't do.
> returns different bit values when reinterpreted from the original one. Then, how about vreinterpretq_u16_f16? Would some fixes like: `uint16_t *new_arr = new uint16_t[N]; std::memcpy((char *)new_arr, (const char *)input, N * sizeof(__fp16)); // proceed to compare without reinterpret...`
vreinterpretq_u16_f16 is a different beast: it has the proper semantics of a cast, as documented. It is a compiler builtin that someone spent effort on to generate proper code that the optimizer won't touch.
So the "reinterpret" in that function name has nothing to do with reinterpret_cast.
Please do note that this is what the CPU does: instructions have types, vector registers don't; you get all types of lanes on the same register.
(Not exactly true in hardware, but that's the model.) Here the problem with such a reinterpret_cast is that the language prohibits it.
> sounds more proper? +) And maybe come back here sometime when we are ready to introduce C++20 to apply std::bit_cast<>? Furthermore, guess you visited here from your concerns suggested on #3012.. Again, thanks for sharing your expertise and active participation! Really appreciate it!
We really shouldn't wait for std::bit_cast; in practice many projects, including our internal ones, had it years before (even before C++17) in some utility implemented via std::memcpy, because this wording (not std::byte) provision predates C++11. I really think we should provide a proper solution as a helper, since you won't be the only one to hit this problem, and most programmers go with reinterpret_cast or union.
I can write such a helper template tomorrow, but it will be something like what https://en.cppreference.com/w/cpp/numeric/bit_cast suggests as a possible implementation.
I really learned a lot! Thanks for everything! I will refer to links you attached here and there :)
C++ is tricky at times, but reading the ISO committee wg21 mailings helps a lot: https://www.open-std.org/jtc1/sc22/wg21/
The individual paper links are publicly accessible; unfortunately we at @Samsung don't participate much in those efforts.
- Followed by discussion made in nnstreamer#3001, a naive reinterpret cast is undefined behavior.
- Instead, using std::memcpy will do.
- Additionally, the SIMD instruction for reinterpret cast will work the way it is expected to.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>