[ bugfix ] Fix is_valid algorithm #3001

skykongkong8 · 2025-03-11T01:06:38Z

Previous is_valid() for half-precision inspected with f32 inf values.
Fix this to check with f16 inf and -inf values (reinterpreted from u16)
Fix missing return bug

Self evaluation:

Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped

dkjung

I have some questions.

nntrainer/tensor/cpu_backend/arm/arm_compute_backend_fp16.cpp

dkjung · 2025-03-11T06:56:04Z

nntrainer/tensor/cpu_backend/arm/neon_impl_fp16.cpp

-  uint16x8_t zero = vdupq_n_f16(0);
+  const uint16_t inf_bits = 0x7C00;
+  const uint16_t n_inf_bits = 0xFC00;
+  const uint16x8_t inf_v = vdupq_n_u16(inf_bits);


what does vdupq_n_u16 do?

It makes a SIMD register vector from a scalar.
For example,

const uint16x8_t my_vec = vdupq_n_u16(1); for ( int i = 0; i < 8; ++i) { std::cout << my_vec[i] << "\t"; }

Running this will print:

1 1 1 1 1 1 1 1

dkjung · 2025-03-11T07:17:58Z

nntrainer/tensor/cpu_backend/arm/neon_impl_fp16.cpp

-  __fp16 inf_s = std::numeric_limits<float>::infinity();
-  float16x8_t inf = vdupq_n_f16(inf_s);
-  uint16x8_t zero = vdupq_n_f16(0);
+  const uint16_t inf_bits = 0x7C00;


What does this bit stand for?

0x7C00 and 0xFC00 represent positive and negative infinity, respectively, when interpreted as IEEE 754 half-precision floating-point (__fp16).

To be specific,
IEEE 754 Half-Precision (__fp16) Format is determined like:
1 bit for the sign (S)
5 bits for the exponent (E)
10 bits for the fraction (F)

The value of a floating-point number is determined by:
$$\text{Value} = (-1)^S \times 2^{(E - 15)} \times (1.F)$$
except for special cases.

In case of 0x7C00 (Positive Infinity)

Binary representation: 0111 1100 0000 0000
Sign bit (S): 0 (positive)
Exponent (E): 11111 (all 1s)
Fraction (F): 0000000000 (all 0s)

According to IEEE 754 rules:
When E = 11111 and F = 0, the value represents infinity.
Since S = 0, it is positive infinity (+∞).

In case of 0xFC00 (Negative Infinity)

Binary representation: 1111 1100 0000 0000
Sign bit (S): 1 (negative)
Exponent (E): 11111 (all 1s)
Fraction (F): 0000000000 (all 0s)

Since E = 11111 and F = 0, the value represents infinity.
Meanwhile, S = 1, it is negative infinity (-∞).

Please let me know if I got this wrong.

- Previous is_valid() for half-precision inspected with f32 inf values. - Fix this to check with f16 inf and -inf values (reinterpreted from u16) - Fix missing return bug **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: skykongkong8 <[email protected]>

skykongkong8 · 2025-03-11T08:25:19Z

FYI) Current android unittest_tensor_fp16 is throwing segmentation fault. This patch will fix the issue.

baek2sm

LGTM

jijoongmoon

LGTM

piotrrak · 2025-03-12T20:42:29Z

nntrainer/tensor/cpu_backend/arm/neon_impl_fp16.cpp

+    if (val != val) {
+      return false;
+    }
+    uint16_t val_bits = reinterpret_cast<uint16_t &>(val);


Such reinterpret_cast<> is Undefined Behaviour.
For c++20 dialect the best way would be to use std::bit_cast

For c++17 dialect the standard blessed way to do that is to use std::memcpy

uint16_t val_bits; std::memcpy((char*)&val_bits, (const char*)&val, sizeof(__fp16));

Good recent read is std::bit_cast wg21 proposal
It doesn't give detailed explaination why using union to cast or reinterpret_cast is bad idea but gives pointer to the solution and enough insight to notice that c++ memory is typed.
std::launder proposal gives much more insight https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0532r0.pdf in context of union's

@skykongkong8 looked at recent changes related to fp16 due to two CI crashes (#3012 and #3010) in integration_test.cpp when fp16 is enabled. Most likely not the culprit, but noticed such UB nonetheless.

We probably need std::bit_cast<> polyfill for c++17 built on top of std::memcpy and audit our code for such pattern. It will get worse with compiler advancement, due to optimizes getting smarter.

char, unsigned char, c++20 std::byte are pretty special types and any memory can be accessed via pointers to those types (c++ standard explicitly blesses those types).
Please do note this list doesn't contain signed char which is distinct type from char!

Other type pointers/references don't have such behavior.
Since construction, the part of the memory gains type, up until destruction. Accessing this memory by any pointer reference to other type is illegal (Undefined Behavior).
std::memcpy method of getting __fp16 bits works since access is by char* and doing copy (it won't be slower); it'll just let optimizer know it must really do get those bits and can't break such code as per wording.
Hope it makes sense, c++ is type-safe and memory is typed.

Woah, thanks for pointing this out @piotrrak !
If I understood correctly, using reinterpret_cast, or union (cuz I also considered using union before implementing this function like this... haha) won't lead the data work like I intended.
And I also tested on my local RPi5 and desktop that when I stored a different-typed data, let's say,
Something like:

// DON'T: const uint16_t inf_bits = 0x7C00; const __fp16 f16_inf = reinterpret_cast<__fp16>(inf_bits); nntrainer::Tensor t(..., datatype::_FP16); ... t.setValue(0,0,0,0, f16_inf)

returns different bit values when reinterpreted from the original one.

Then, how about vreinterpretq_u16_f16?
Would some fixes like:

// DO: uint16_t* new_arr = new uint16_t[N]; std::memcpy((char*)new_arr , (const char*)input, N * sizeof(__fp16)); // proceed to compare without reinterpret...

sounds more proper?
+)And maybe come back here sometime when we are ready to introduce c++20 to apply std::bitcast<>?

Furthermore, guess you visited here from your concerns suggested on #3012..
Again, thanks for sharing your expertise and active participation! Really appreciate it!

Woah, thanks for pointing this out @piotrrak ! If I understood correctly, using reinterpret_cast, or union (cuz I also considered using union before implementing this function like this... haha) won't lead the data work like I intended. And I also tested on my local RPi5 and desktop that when I stored a different-typed data, let's say, Something like:

// DON'T: const uint16_t inf_bits = 0x7C00; const __fp16 f16_inf = reinterpret_cast<__fp16>(inf_bits); nntrainer::Tensor t(..., datatype::_FP16); ... t.setValue(0,0,0,0, f16_inf)

Yea, it neads memcpy, compiler will optimize but will know what it can't do.

returns different bit values when reinterpreted from the original one.

Then, how about vreinterpretq_u16_f16? Would some fixes like:

// DO: uint16_t* new_arr = new uint16_t[N]; std::memcpy((char*)new_arr , (const char*)input, N * sizeof(__fp16)); // proceed to compare without reinterpret...

vreinterpretq_u16_f16 is different beast, it has proper semantics of cast as documented, it is compiler builtin that someone spend an effort to generate proper code that optimizer won't touch;
So reinterpret in that function name doesn't have nothing to do with casting.

Please do note that is what CPU does, instructions have type, vector registers don't;
You got all types of lanes on same register;
Not true in hardware but what it is; Here the problem with such reintepret_cast is problem that language prohibits it.

sounds more proper? +)And maybe come back here sometime when we are ready to introduce c++20 to apply std::bitcast<>?

Furthermore, guess you visited here from your concerns suggested on #3012.. Again, thanks for sharing your expertise and active participation! Really appreciate it!

We really shouldn't wait for std::bit_cast; in practice many projects including our internal had it years before (even before c++17) in some utility implemented as std::memcpy because this wording (not std::byte) provision predates c++11. I do really think we should provide proper solution as helper, since you won't be only one to hit such problem and most programmers got with reinterpret_castor union

I can write such helper template tomorrow; But it will be something as https://en.cppreference.com/w/cpp/numeric/bit_cast suggests for possible implementation.

I really learned a lot! Thanks for everything! I will refer to links you attached here and there :)

C++ is tricky at times, but reading ISO committee wg21 mailing helps a lot https://www.open-std.org/jtc1/sc22/wg21/
individual papers link is publicly accessible, unfortunately we don't participate much @Samsung in those efforts.

- Followed by discussion made from nnstreamer#3001, naive reinterpret cast is an undefined behavior. - Instead, using std::memcpy will do. - Additionally, SIMD instruction for reinterpret cast will work as they way it is expected. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: skykongkong8 <[email protected]>

- Followed by discussion made from #3001, naive reinterpret cast is an undefined behavior. - Instead, using std::memcpy will do. - Additionally, SIMD instruction for reinterpret cast will work as they way it is expected. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: skykongkong8 <[email protected]>

skykongkong8 requested review from DonghakPark, EunjuYang, SeoHyungjun, again4you, anyj0527, baek2sm, djeong20, dkjung, gichan-jang, haehun, jaeyun-jung, jihochu, jijoongmoon, leemgs, lhs8928, myungjoo, songgot and wooksong as code owners March 11, 2025 01:06

github-actions bot added the Need Review label Mar 11, 2025

DonghakPark approved these changes Mar 11, 2025

View reviewed changes

dkjung reviewed Mar 11, 2025

View reviewed changes

skykongkong8 force-pushed the pr/arm/missing_return_bug branch from cbc3426 to 5fc6413 Compare March 11, 2025 08:12

baek2sm approved these changes Mar 12, 2025

View reviewed changes

jijoongmoon approved these changes Mar 12, 2025

View reviewed changes

github-actions bot added PR/READY2MERGE and removed Need Review labels Mar 12, 2025

jijoongmoon merged commit efdeb10 into nnstreamer:main Mar 12, 2025
18 of 19 checks passed

skykongkong8 deleted the pr/arm/missing_return_bug branch March 12, 2025 07:28

piotrrak reviewed Mar 12, 2025

View reviewed changes

skykongkong8 mentioned this pull request Mar 21, 2025

[ bugfix ] Fix fp16 is_valid algorithm bug #3046

Merged

[ bugfix ] Fix is_valid algorithm #3001

[ bugfix ] Fix is_valid algorithm #3001

Uh oh!

Conversation

skykongkong8 commented Mar 11, 2025

Uh oh!

dkjung left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dkjung Mar 11, 2025

Choose a reason for hiding this comment

Uh oh!

skykongkong8 Mar 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dkjung Mar 11, 2025

Choose a reason for hiding this comment

Uh oh!

skykongkong8 Mar 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

skykongkong8 commented Mar 11, 2025

Uh oh!

baek2sm left a comment

Choose a reason for hiding this comment

Uh oh!

jijoongmoon left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

piotrrak Mar 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

piotrrak Mar 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

skykongkong8 Mar 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

piotrrak Mar 13, 2025

Choose a reason for hiding this comment

Uh oh!

skykongkong8 Mar 13, 2025

Choose a reason for hiding this comment

Uh oh!

piotrrak Mar 13, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

skykongkong8 Mar 11, 2025 •

edited

Loading

skykongkong8 Mar 11, 2025 •

edited

Loading

piotrrak Mar 12, 2025 •

edited

Loading

piotrrak Mar 12, 2025 •

edited

Loading

skykongkong8 Mar 13, 2025 •

edited

Loading