Skip to content

Conversation

@skykongkong8
Copy link
Member

  • Previous is_valid() for half-precision inspected with f32 inf values.
  • Fix this to check with f16 inf and -inf values (reinterpreted from u16)
  • Fix missing return bug

Self evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped

Copy link
Collaborator

@dkjung dkjung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions.

uint16x8_t zero = vdupq_n_f16(0);
const uint16_t inf_bits = 0x7C00;
const uint16_t n_inf_bits = 0xFC00;
const uint16x8_t inf_v = vdupq_n_u16(inf_bits);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does vdupq_n_u16 do?

Copy link
Member Author

@skykongkong8 skykongkong8 Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes a SIMD register vector from a scalar.
For example,

const uint16x8_t my_vec = vdupq_n_u16(1);
for ( int i = 0; i < 8; ++i) {
  std::cout << my_vec[i] << "\t";
}

Running this will print:

1 1 1 1 1 1 1 1 

__fp16 inf_s = std::numeric_limits<float>::infinity();
float16x8_t inf = vdupq_n_f16(inf_s);
uint16x8_t zero = vdupq_n_f16(0);
const uint16_t inf_bits = 0x7C00;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this bit stand for?

Copy link
Member Author

@skykongkong8 skykongkong8 Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0x7C00 and 0xFC00 represent positive and negative infinity, respectively, when interpreted as IEEE 754 half-precision floating-point (__fp16).

To be specific,
IEEE 754 Half-Precision (__fp16) Format is determined like:
1 bit for the sign (S)
5 bits for the exponent (E)
10 bits for the fraction (F)

The value of a floating-point number is determined by:

$$\text{Value} = (-1)^S \times 2^{(E - 15)} \times (1.F)$$

except for special cases.

In case of 0x7C00 (Positive Infinity)

Binary representation: 0111 1100 0000 0000
Sign bit (S): 0 (positive)
Exponent (E): 11111 (all 1s)
Fraction (F): 0000000000 (all 0s)

According to IEEE 754 rules:
When E = 11111 and F = 0, the value represents infinity.
Since S = 0, it is positive infinity (+∞).

In case of 0xFC00 (Negative Infinity)

Binary representation: 1111 1100 0000 0000
Sign bit (S): 1 (negative)
Exponent (E): 11111 (all 1s)
Fraction (F): 0000000000 (all 0s)

Since E = 11111 and F = 0, the value represents infinity.
Meanwhile, S = 1, it is negative infinity (-∞).

Please let me know if I got this wrong.

- Previous is_valid() for half-precision inspected with f32 inf values.
- Fix this to check with f16 inf and -inf values (reinterpreted from u16)
- Fix missing return bug

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
@skykongkong8 skykongkong8 force-pushed the pr/arm/missing_return_bug branch from cbc3426 to 5fc6413 Compare March 11, 2025 08:12
@skykongkong8
Copy link
Member Author

FYI) Current android unittest_tensor_fp16 is throwing segmentation fault. This patch will fix the issue.

Copy link
Contributor

@baek2sm baek2sm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@jijoongmoon jijoongmoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jijoongmoon jijoongmoon merged commit efdeb10 into nnstreamer:main Mar 12, 2025
18 of 19 checks passed
@skykongkong8 skykongkong8 deleted the pr/arm/missing_return_bug branch March 12, 2025 07:28
if (val != val) {
return false;
}
uint16_t val_bits = reinterpret_cast<uint16_t &>(val);
Copy link

@piotrrak piotrrak Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such reinterpret_cast<> is Undefined Behaviour.
For c++20 dialect the best way would be to use std::bit_cast

For c++17 dialect the standard blessed way to do that is to use std::memcpy

uint16_t val_bits;
std::memcpy((char*)&val_bits, (const char*)&val, sizeof(__fp16));

Good recent read is std::bit_cast wg21 proposal
It doesn't give detailed explaination why using union to cast or reinterpret_cast is bad idea but gives pointer to the solution and enough insight to notice that c++ memory is typed.
std::launder proposal gives much more insight https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0532r0.pdf in context of union's

Copy link

@piotrrak piotrrak Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skykongkong8 looked at recent changes related to fp16 due to two CI crashes (#3012 and #3010) in integration_test.cpp when fp16 is enabled. Most likely not the culprit, but noticed such UB nonetheless.

We probably need std::bit_cast<> polyfill for c++17 built on top of std::memcpy and audit our code for such pattern. It will get worse with compiler advancement, due to optimizes getting smarter.

char, unsigned char, c++20 std::byte are pretty special types and any memory can be accessed via pointers to those types (c++ standard explicitly blesses those types).
Please do note this list doesn't contain signed char which is distinct type from char!

Other type pointers/references don't have such behavior.
Since construction, the part of the memory gains type, up until destruction. Accessing this memory by any pointer reference to other type is illegal (Undefined Behavior).
std::memcpy method of getting __fp16 bits works since access is by char* and doing copy (it won't be slower); it'll just let optimizer know it must really do get those bits and can't break such code as per wording.
Hope it makes sense, c++ is type-safe and memory is typed.

Copy link
Member Author

@skykongkong8 skykongkong8 Mar 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah, thanks for pointing this out @piotrrak !
If I understood correctly, using reinterpret_cast, or union (cuz I also considered using union before implementing this function like this... haha) won't lead the data work like I intended.
And I also tested on my local RPi5 and desktop that when I stored a different-typed data, let's say,
Something like:

// DON'T:
const uint16_t inf_bits = 0x7C00;
const __fp16 f16_inf = reinterpret_cast<__fp16>(inf_bits);
nntrainer::Tensor t(..., datatype::_FP16);
...
t.setValue(0,0,0,0, f16_inf)

returns different bit values when reinterpreted from the original one.

Then, how about vreinterpretq_u16_f16?
Would some fixes like:

// DO:
uint16_t* new_arr = new uint16_t[N];
std::memcpy((char*)new_arr , (const char*)input, N * sizeof(__fp16));
// proceed to compare without reinterpret...

sounds more proper?
+)And maybe come back here sometime when we are ready to introduce c++20 to apply std::bitcast<>?

Furthermore, guess you visited here from your concerns suggested on #3012..
Again, thanks for sharing your expertise and active participation! Really appreciate it!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah, thanks for pointing this out @piotrrak ! If I understood correctly, using reinterpret_cast, or union (cuz I also considered using union before implementing this function like this... haha) won't lead the data work like I intended. And I also tested on my local RPi5 and desktop that when I stored a different-typed data, let's say, Something like:

// DON'T:
const uint16_t inf_bits = 0x7C00;
const __fp16 f16_inf = reinterpret_cast<__fp16>(inf_bits);
nntrainer::Tensor t(..., datatype::_FP16);
...
t.setValue(0,0,0,0, f16_inf)

Yea, it neads memcpy, compiler will optimize but will know what it can't do.

returns different bit values when reinterpreted from the original one.

Then, how about vreinterpretq_u16_f16? Would some fixes like:

// DO:
uint16_t* new_arr = new uint16_t[N];
std::memcpy((char*)new_arr , (const char*)input, N * sizeof(__fp16));
// proceed to compare without reinterpret...

vreinterpretq_u16_f16 is different beast, it has proper semantics of cast as documented, it is compiler builtin that someone spend an effort to generate proper code that optimizer won't touch;
So reinterpret in that function name doesn't have nothing to do with casting.

Please do note that is what CPU does, instructions have type, vector registers don't;
You got all types of lanes on same register;
Not true in hardware but what it is; Here the problem with such reintepret_cast is problem that language prohibits it.

sounds more proper? +)And maybe come back here sometime when we are ready to introduce c++20 to apply std::bitcast<>?

Furthermore, guess you visited here from your concerns suggested on #3012.. Again, thanks for sharing your expertise and active participation! Really appreciate it!

We really shouldn't wait for std::bit_cast; in practice many projects including our internal had it years before (even before c++17) in some utility implemented as std::memcpy because this wording (not std::byte) provision predates c++11. I do really think we should provide proper solution as helper, since you won't be only one to hit such problem and most programmers got with reinterpret_castor union

I can write such helper template tomorrow; But it will be something as https://en.cppreference.com/w/cpp/numeric/bit_cast suggests for possible implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really learned a lot! Thanks for everything! I will refer to links you attached here and there :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ is tricky at times, but reading ISO committee wg21 mailing helps a lot https://www.open-std.org/jtc1/sc22/wg21/
individual papers link is publicly accessible, unfortunately we don't participate much @Samsung in those efforts.

skykongkong8 added a commit to skykongkong8/nntrainer that referenced this pull request Mar 21, 2025
- Followed by discussion made from nnstreamer#3001, naive reinterpret cast is an undefined behavior.
- Instead, using std::memcpy will do.
- Additionally, SIMD instruction for reinterpret cast will work as they way it is expected.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
skykongkong8 added a commit to skykongkong8/nntrainer that referenced this pull request Mar 21, 2025
- Followed by discussion made from nnstreamer#3001, naive reinterpret cast is an undefined behavior.
- Instead, using std::memcpy will do.
- Additionally, SIMD instruction for reinterpret cast will work as they way it is expected.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
jijoongmoon pushed a commit that referenced this pull request Mar 24, 2025
- Followed by discussion made from #3001, naive reinterpret cast is an undefined behavior.
- Instead, using std::memcpy will do.
- Additionally, SIMD instruction for reinterpret cast will work as they way it is expected.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants