Refactor and enhance floating-point numbers article
Updated formatting and structure of the article on floating-point numbers, including sections on IEEE 754, infinity, and NaN. Improved clarity and organization of content.
src/content/posts/The-Magic-World-of-Numbers-in-Computers.md (35 additions & 20 deletions)
description: The Magic World of Float Numbers
tags: []
---
# Intro
Floats are exciting. We often assume that numerical results from computers are extremely accurate, but that’s frequently not the case. This happens because there are accurate and non-accurate ways to represent numbers. Accurate representations consume more storage, memory, and processing power compared to non-accurate ones. As a result, non-accurate forms - like floats - are often used in places you might not expect.
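A quick Python sketch of that inaccuracy, using the classic `0.1 + 0.2` example:

```python
# 0.1 and 0.2 have no exact binary representation,
# so their float sum is not exactly 0.3.
a = 0.1 + 0.2
print(a)             # 0.30000000000000004
print(a == 0.3)      # False

# Printing more digits reveals the approximation actually stored for 0.1.
print(f"{0.1:.20f}")  # 0.10000000000000000555
```

This is why equality comparisons on floats are usually replaced by tolerance checks (e.g. `math.isclose`).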
The latter is much simpler, but floats add a layer of complexity. First, there is...
# IEEE 754
As we said before, there isn't only one standard for storing floats, but many. IEEE 754 is by far the most common. It breaks every floating-point number into three distinct parts, packed into either 32 bits (single-precision) or 64 bits (double-precision):
- **Sign Bit (1 bit)**: A single bit that decides whether the number is positive (0) or negative (1).
- **Biased Exponent** (8 bits for 32-bit, 11 bits for 64-bit): This isn't just any exponent. It's biased, meaning it's offset by a fixed value to allow for both positive and negative powers of 2. For 32-bit floats, the bias is 127. So, an exponent of 0 is stored as 127, -1 as 126, and 127 as 254. This trick lets the hardware handle negative exponents without needing a sign bit for the exponent itself. The base is 2 because computers work with binary, so the scientific notation example we saw before would have been 1\*2^-43 (2^-43 ≈ 10^-13).
- **Mantissa** (23 bits for 32-bit, 52 bits for 64-bit): This stores the significant digits of the number, but the leading 1 is implied and not stored. For example, the binary number 1.0110 is stored as just 0110, saving space but adding complexity.
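These three fields are easy to inspect in practice. A minimal Python sketch (the helper name `float32_parts` is mine) that uses the standard `struct` module to reinterpret a 32-bit float's bytes:

```python
import struct

def float32_parts(x: float):
    """Split a number's 32-bit IEEE 754 encoding into its three fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # the raw 32 bits
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF       # 23 bits, implicit leading 1 not stored
    return sign, exponent, mantissa

# 1.0 is 1.0 * 2^0: exponent 0 is stored as 0 + 127 = 127,
# and the mantissa field is empty because the leading 1 is implicit.
print(float32_parts(1.0))   # (0, 127, 0)
print(float32_parts(-1.0))  # (1, 127, 0)
```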
## IEEE 754 Example
A nice example is calculating 3.14... in IEEE 754.
### I. Binary
- 3 in binary is `11`
- 0.14... in binary is `0.0010001111010111000...`
- 3.14 in binary is `11.0010001111010111000...`
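The fractional part above can be produced with the usual multiply-by-2 method. A small Python sketch (the helper `frac_to_bits` is mine, not from the post):

```python
def frac_to_bits(x: float, n: int) -> str:
    """Expand the fractional part of x into its first n binary digits."""
    bits = []
    for _ in range(n):
        x *= 2                           # shift one binary digit left of the point
        bits.append("1" if x >= 1 else "0")
        x -= int(x)                      # keep only the fractional part
    return "".join(bits)

print(bin(3))                  # 0b11
print(frac_to_bits(0.14, 16))  # 0010001111010111
```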
### II. Biased Exponent
#### II.I. Exponent
First, we need to normalize `11.0010001111010111000...` to the form `1.xxx * 2^exponent`. Much like converting to scientific notation.
In this case, we will do this simply by doing a binary shift, but this is just a...
By converting to `1.xxx * 2^exponent` form we see that our exponent is 1.
#### II.II. Biased Exponent
Our exponent is 1. But don't forget, as we said before, that in IEEE 754 we don't store the exponent itself, but a biased version of it ([...] This isn't just any exponent. It's biased, meaning it's offset by a fixed value to allow for both positive and negative powers of 2. [...]).
So the number we will store is: the exponent + 127 => 1 + 127 = 128.
But we want this in binary. 128 in binary is `10000000`. That's our "Biased Exponent".
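Both steps can be checked in Python; a small sketch of the exponent and its biased form for 3.14:

```python
import math

exponent = math.floor(math.log2(3.14))  # position of the leading 1 bit
print(exponent)                         # 1

biased = exponent + 127                 # apply the single-precision bias of 127
print(biased)                           # 128
print(format(biased, "08b"))            # 10000000
```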
### III. Split into IEEE 754 Parts
- Sign: `0` (positive)
- Biased Exponent: `10000000`
- Mantissa: Take the first 23 bits after the leading `1.`: `10010001111010111000011` (rounded to fit)
### IV. Result
Indeed, `0 10000000 10010001111010111000011` is the IEEE 754 representation of the number 3.14. You can verify this by using a handy IEEE 754 calculator [1].
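You can also double-check the result programmatically; a Python sketch that packs 3.14 as a 32-bit float and reads back the raw bits:

```python
import struct

# Pack 3.14 as a big-endian 32-bit float, then reinterpret the same
# 4 bytes as an unsigned integer to get at the raw bit pattern.
bits = struct.unpack(">I", struct.pack(">f", 3.14))[0]
pattern = format(bits, "032b")

# Split into sign (1 bit), biased exponent (8 bits), mantissa (23 bits).
print(pattern[0], pattern[1:9], pattern[9:])
# 0 10000000 10010001111010111000011
```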
That's an interesting thing to know for the next time you will do mathematical operations...
---
# Infinity and "Not a Number" (NaN) (IEEE 754)
Two very interesting mathematical concepts that are impossible to represent in any of the... accurate forms of numbers are infinity and "not a number". A number can be infinite, and a value can be "not a number" temporarily, until it becomes a number, or simply because it's the result of a calculation like division by 0; instead of an error, it's sometimes better to get "NaN".
So IEEE 754 float numbers come to save the day if you need to represent these two very important mathematical concepts.
## Infinity
If you set all the bits of the exponent and none of the mantissa, you get infinity! (`0 11111111 00000000000000000000000`; with the sign bit set to 1 you get negative infinity.)
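A Python sketch confirming this bit pattern via `math.inf`:

```python
import math
import struct

# +infinity: exponent all ones, mantissa all zeros -> 0x7F800000.
bits = struct.unpack(">I", struct.pack(">f", math.inf))[0]
print(format(bits, "032b"))  # 01111111100000000000000000000000

# Going the other way, those raw bytes decode back to infinity.
print(struct.unpack(">f", struct.pack(">I", 0x7F800000))[0])  # inf
```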
## "Not a Number" (NaN)
"Not a Number" is a special case of number meant as a placeholder value for a numerical value that it's nit set yet or it was the result of an error, like for example from a division with 0 which is impossible.
(Talking about NaN, it's actually very interesting that JSON, which represents values the way JS works with them, doesn't support NaN. JS, like many other languages, treats numbers as floats by default, so it's interesting that JSON doesn't allow a number to be NaN. Instead, NaNs are usually converted to nulls when working with JSON.)
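Python's standard `json` module shows this awkwardness directly:

```python
import json

nan = float("nan")

# Python's json module emits the bare token NaN by default --
# which is NOT valid JSON, and other parsers may reject it.
print(json.dumps({"x": nan}))  # {"x": NaN}

# Asking for strict JSON turns it into an error instead.
try:
    json.dumps({"x": nan}, allow_nan=False)
except ValueError as e:
    print("strict mode:", e)

# Hence the common workaround: replace NaN with null by hand.
print(json.dumps({"x": None}))  # {"x": null}
```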
### "Quiet NaN"
If you set all the bits of the exponent and the "most significant bit" of the mantissa, you get a "Quiet NaN". This NaN propagates silently through calculations and doesn't trigger an error. (`0 11111111 10000000000000000000000`.) (Which mantissa bit acts as the "quiet" flag is a platform convention; on virtually all modern architectures it's the most significant one.)
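A Python sketch that builds this exact quiet-NaN bit pattern and shows it propagating without errors:

```python
import math
import struct

# Exponent all ones + mantissa MSB set: the canonical quiet NaN (0x7FC00000).
qnan = struct.unpack(">f", struct.pack(">I", 0x7FC00000))[0]

print(math.isnan(qnan))  # True
print(qnan == qnan)      # False -- NaN compares unequal even to itself
print(qnan + 1.0)        # nan   -- it propagates quietly, no error raised
```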
### "Signaling NaN"
If you set all the bits of the exponent and some mantissa bit other than the "most significant bit" (like just one in the middle, to be safe), you get a "Signaling NaN". This NaN triggers an error (a floating-point exception) when it's used. (`0 11111111 00000000000000000000010`.)
---
# Outro
Floats may seem inaccurate, but this inaccuracy allows us to handle bigger problems, such as huge numbers, or to solve the same problems with less hardware.
Extreme accuracy is not always needed, let alone that it may not even exist in reality.
Even if someone is interested only in extreme accuracy, floats are worth studying. By studying floats, one can reflect on and understand the problem they solve: working with very big ranges and very fuzzy sets of numbers. From this, the opposite easily follows: a way to achieve extreme accuracy is actually to acknowledge that it's very subjective, so you eventually have to limit the sets you are working with to something fixed.
This post was meant to unlock the secret world of float numbers in order for us to use them better.
---
# Links
- [1] "IEEE-754 Floating Point Converter": [https://www.h-schmidt.net/FloatConverter/IEEE754.html](https://www.h-schmidt.net/FloatConverter/IEEE754.html)
- [2] "A calculator app? Anyone could make that.": [https://chadnauseam.com/coding/random/calculator-app](https://chadnauseam.com/coding/random/calculator-app)