Commit 0fa7e30

Refactor and enhance floating-point numbers article
Updated formatting and structure of the article on floating-point numbers, including sections on IEEE 754, infinity, and NaN. Improved clarity and organization of content.
1 parent 673cddf commit 0fa7e30

1 file changed: src/content/posts/The-Magic-World-of-Numbers-in-Computers.md

Lines changed: 35 additions & 20 deletions
# Intro

Floats are exciting. We often assume that numerical results from computers are extremely accurate, but that’s frequently not the case. This happens because there are accurate and non-accurate ways to represent numbers. Accurate representations consume more storage, memory, and processing power compared to non-accurate ones. As a result, non-accurate forms, like floats, are often used in places you might not expect.

# IEEE 754

As we said before, there isn't only one standard for storing floats, but many. IEEE 754 is by far the most common. It breaks every floating-point number into three distinct parts, packed into either 32 bits (single-precision) or 64 bits (double-precision):

- **Sign Bit** (1 bit): A single bit that decides whether the number is positive (0) or negative (1).

- **Biased Exponent** (8 bits for 32-bit, 11 bits for 64-bit): This isn't just any exponent. It's biased, meaning it's offset by a fixed value to allow for both positive and negative powers of 2. For 32-bit floats, the bias is 127. So, an exponent of 0 is stored as 127, -1 as 126, and 127 as 254. This trick lets the hardware handle negative exponents without needing a sign bit for the exponent itself. The base is 2 because computers work with binary, so the scientific notation example we saw before would have been 1*2^-43 (2^-43 ≈ 10^-13).

- **Mantissa** (23 bits for 32-bit, 52 bits for 64-bit): This stores the significant digits of the number, but the leading 1 is implied and not stored. For example, the binary number 1.0110 is stored as just 0110, saving space but adding complexity.
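The post itself has no code, but as a quick sketch (in Python, an illustrative choice, not part of the original article) the three fields can be peeled out of a 32-bit float with the standard `struct` module:

```python
import struct

def float32_fields(x: float) -> tuple[str, str, str]:
    # Reinterpret the 4 bytes of a single-precision float as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    b = f"{bits:032b}"
    # 1 sign bit, 8 biased-exponent bits, 23 mantissa bits.
    return b[0], b[1:9], b[9:]

print(float32_fields(-1.0))  # ('1', '01111111', '00000000000000000000000')
```

For -1.0 the sign bit is 1, the biased exponent is 0 + 127 = 127 (`01111111`), and the mantissa is all zeros, since the significant digits are just the implied leading 1.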
## IEEE 754 Example

A nice example would be calculating 3.14... in IEEE 754.

### I. Binary

- 3 in binary is `11`
- 0.14... in binary is `0.001000111101011100001...`
- 3.14 in binary is `11.001000111101011100001...`

### II. Biased Exponent

#### II.I. Exponent

First, we need to normalize `11.0010101111010111000...` to the form `1.xxx * 2^exponent`. Much like converting to scientific notation.
By converting to `1.xxx * 2^exponent` form we see that our exponent is 1.
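This normalization step can be double-checked with a couple of lines of Python (again an illustrative choice, not part of the original post):

```python
import math

x = 3.14
# Normalizing to 1.xxx * 2^exponent: for positive x, the exponent is floor(log2(x)).
exponent = math.floor(math.log2(x))
significand = x / 2 ** exponent  # lands in the range [1, 2)
print(exponent, significand)  # 1 1.57
```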

#### II.II. Biased Exponent

Our exponent is 1. But don't forget, as we said before, that in IEEE 754 we don't store the exponent itself but a biased version of it ([...] This isn't just any exponent. It's biased, meaning it's offset by a fixed value to allow for both positive and negative powers of 2. [...]).

So the number we will store is: the exponent + 127 => 1 + 127 = 128.

But we want this in binary. 128 in binary is `10000000`. That's our "Biased Exponent".
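The bias arithmetic is easy to reproduce in Python (illustrative):

```python
exponent = 1
bias = 127  # bias for 32-bit (single-precision) floats
print(f"{exponent + bias:08b}")  # 10000000
```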

### III. Split into IEEE 754 Parts

- Sign: `0` (positive)
- Biased Exponent: `10000000`
- Mantissa: Take the first 23 bits after the `1.`: `10010001111010111000011` (rounded to nearest to fit 23 bits)

### IV. Result

Indeed, `0 10000000 10010001111010111000011` is the IEEE 754 representation of 3.14... You can verify this by using a handy IEEE 754 calculator [1].
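You can also verify it programmatically; here is a small Python sketch (illustrative) that prints the raw bit pattern of 3.14 as a 32-bit float:

```python
import struct

(bits,) = struct.unpack(">I", struct.pack(">f", 3.14))
pattern = f"{bits:032b}"
# Print as sign | biased exponent | mantissa.
print(pattern[0], pattern[1:9], pattern[9:])  # 0 10000000 10010001111010111000011
```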

---

# Infinity and "Not a Number" (NaN) (IEEE 754)

Two very interesting mathematical concepts that are impossible to represent in any other, accurate, form of numbers are infinity and "not a number". A number can be infinite, and a value can be "not a number", either temporarily until it becomes a number, or because it's the result of a calculation like division by 0, where returning NaN is sometimes better than raising an error.

So IEEE 754 floats come to the rescue if you need to represent these two very important mathematical concepts.

## Infinity

If you set all the bits of the exponent and none of the mantissa, you get infinity! (`0 11111111 00000000000000000000000`)
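In Python (illustrative), you can build +infinity from exactly that bit pattern:

```python
import struct

# All exponent bits set, mantissa all zero -> +infinity.
(inf,) = struct.unpack(">f", struct.pack(">I", 0b0_11111111_00000000000000000000000))
print(inf)                  # inf
print(inf == float("inf"))  # True
```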

## "Not a Number" (NaN)

"Not a Number" is a special value meant as a placeholder for a numerical value that is not set yet, or that is the result of an error, like for example a division by 0, which is impossible.

(Talking about NaN, it's actually very interesting that JSON, which represents values the way JS works with them, doesn't support NaN. JS, like many other languages, treats numbers as floats by default, so it's interesting that JSON doesn't allow a number to be NaN. Instead, NaNs are usually converted to NULLs most of the time when working with JSON.)
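A quick Python illustration of both points (NaN's odd behavior, and strict JSON refusing it):

```python
import json
import math

nan = float("nan")
print(nan == nan)       # False: NaN is not equal even to itself
print(math.isnan(nan))  # True

# Strict JSON has no NaN literal; Python's json module raises unless you
# allow its non-standard 'NaN' extension (allow_nan=True is the default).
try:
    json.dumps(nan, allow_nan=False)
except ValueError:
    print("NaN is not representable in strict JSON")
```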

### "Quiet NaN"

If you set all the bits of the exponent and the most significant bit of the mantissa, you get a "Quiet NaN". This NaN propagates silently through calculations and doesn't trigger an error. (`0 11111111 10000000000000000000000`)

### "Signaling NaN"

If you set all the bits of the exponent and a mantissa bit other than the most significant one (like just one in the middle, just to be sure), you get a "Signaling NaN". This NaN triggers an error (a floating-point exception) when used. (`0 11111111 00000000000000000000010`)
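Both patterns can be materialized in Python (illustrative; note that CPython exposes no floating-point traps, so even the signaling pattern behaves quietly here):

```python
import math
import struct

def f32(pattern: str) -> float:
    # Build a float from a "sign exponent mantissa" bit string.
    return struct.unpack(">f", struct.pack(">I", int(pattern.replace(" ", ""), 2)))[0]

quiet = f32("0 11111111 10000000000000000000000")      # mantissa MSB set
signaling = f32("0 11111111 00000000000000000000010")  # another mantissa bit set
print(math.isnan(quiet), math.isnan(signaling))  # True True
```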

---

# Outro

Floats may seem inaccurate, but this inaccuracy allows us to handle bigger problems, like huge numbers, or to get by with less hardware.

Extreme accuracy is not always needed, and it may not even exist in reality.

Even if someone is interested only in extreme accuracy, floats are worth studying. By studying floats, one can reflect on and understand the problem they try to solve: working with very big ranges and very unclear sets of numbers. From this, the opposite easily follows: a way to achieve extreme accuracy is actually to acknowledge that it's very subjective, so you eventually have to limit the sets you are working with to something fixed. This was eventually done in the Android calculator. [2]

This post was meant to unlock the secret world of float numbers so that we can use them better.

---

# Links

- [1] "IEEE-754 Floating Point Converter" [https://www.h-schmidt.net/FloatConverter/IEEE754.html](https://www.h-schmidt.net/FloatConverter/IEEE754.html)
- [2] "A calculator app? Anyone could make that." [https://chadnauseam.com/coding/random/calculator-app](https://chadnauseam.com/coding/random/calculator-app)
