|
3 | 3 | <feed xmlns="http://www.w3.org/2005/Atom"> |
4 | 4 | <id>https://blog.simplecode.gr</id> |
5 | 5 | <title>Simplecode Blog</title> |
6 | | - <updated>2025-10-16T06:17:40.504Z</updated> |
| 6 | + <updated>2025-10-16T06:37:10.621Z</updated> |
7 | 7 | <generator>Astro-Theme-Retypeset with Feed for Node.js</generator> |
8 | 8 | <author> |
9 | 9 | <name>Simplecode</name> |
|
28 | 28 | <h2>IEEE 754</h2> |
29 | 29 | <p>As we said before, there isn't only one standard for storing floats, but many. IEEE 754 is by far the most common. It breaks every floating-point number into three distinct parts, packed into either 32 bits (single-precision) or 64 bits (double-precision):</p> |
30 | 30 | <ul> |
31 | | -<li><strong>Sign Bit (1 bit)</strong>: A single bit that decides whether the number is positive (0) or negative (1).</li> |
32 | | -<li><strong>Biased Exponent</strong> (8 bits for 32-bit, 11 bits for 64-bit): This isn't just any exponent. It's biased, meaning it's offset by a fixed value to allow for both positive and negative powers of 2. For 32-bit floats, the bias is 127. So, an exponent of 0 is stored as 127, -1 as 126, and 127 as 254. This trick lets the hardware handle negative exponents without needing a sign bit for the exponent itself. The base is 2 because computers works with binary, so in the scientific notation example we saw before would had been 1*2^-43 (2^-43 = 10^-13).</li> |
33 | | -<li><strong>Mantissa</strong> (23 bits for 32-bit, 52 bits for 64-bit): This stores the significant digits of the number, but the leading 1 is implied and not stored. For example, the binary number 1.0110 is stored as just 0110, saving space but adding complexity.</li> |
| 31 | +<li><strong>Sign Bit (1 bit)</strong>: A single bit that decides whether the number is <em><strong>positive</strong></em> (<code>0</code>) or <em><strong>negative</strong></em> (<code>1</code>).</li> |
| 32 | +<li><strong>Biased Exponent</strong> (<em>8 bits for 32-bit, 11 bits for 64-bit</em>): This isn't just any exponent. It's biased, meaning it's offset by a fixed value to allow for both positive and negative powers of 2. <em><strong>For 32-bit floats, the bias is 127</strong></em>. So, an exponent of 0 is stored as 127, -1 as 126, and 127 as 254. <em><strong>For 64-bit floats, the bias is 1023</strong></em>. This trick lets the hardware handle negative exponents without needing a sign bit for the exponent itself. The base is 2 because computers work in binary, so we need something in the form of <code>x*2^y</code>.</li> |
| 33 | +<li><strong>Mantissa</strong> (<em>23 bits for 32-bit, 52 bits for 64-bit</em>): This stores the significant digits of the number, but the leading 1 is implied and not stored. For example, the binary number <code>1.0110</code> is stored as just <code>0110</code>, saving space but adding complexity. <em><strong>Notice how the mantissa length increases significantly from 23 bits in a 32-bit float to 52 bits in a 64-bit float.</strong></em></li> |
34 | 34 | </ul> |
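<p>The three fields described above can be inspected directly. The following is a minimal sketch using Python's standard <code>struct</code> module; the helper name <code>float32_fields</code> is an assumption made for illustration and is not part of the original article.</p>

```python
import struct


def float32_fields(x: float) -> tuple:
    """Split x (rounded to a 32-bit float) into (sign, biased_exponent, mantissa)."""
    # Pack as a big-endian float32, then reinterpret the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                      # top bit
    biased_exponent = (bits >> 23) & 0xFF  # next 8 bits
    mantissa = bits & 0x7FFFFF             # low 23 bits (leading 1 is implied)
    return sign, biased_exponent, mantissa


sign, biased_exponent, mantissa = float32_fields(-0.15625)
# Subtract the 32-bit bias (127) to recover the real exponent.
print(sign, biased_exponent - 127, format(mantissa, "023b"))
```

<p>Here <code>-0.15625</code> is <code>-1.01</code> in binary times <code>2^-3</code>, so the sign is 1, the stored exponent is 124, and the mantissa begins with <code>01</code>.</p>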
35 | 35 | <h3>IEEE 754 Example</h3> |
36 | | -<p>A nice example would beeing calculating 3.14... in IEEE 754.</p> |
| 36 | +<p>A nice example is to calculate <code>3.14...</code> in IEEE 754.</p> |
| 37 | +<p>We will assume that we are going to calculate a 32-bit IEEE 754 float. This is important, as based on the above, it changes the bias of the "Biased Exponent" to 127 and the length of the "Mantissa" to 23 bits.</p> |
37 | 38 | <h4>I. Binary</h4> |
38 | 39 | <ul> |
39 | | -<li> |
40 | | -<p>3 in binary is <code>11</code></p> |
41 | | -</li> |
42 | | -<li> |
43 | | -<p>0.14... in binary is <code>0.0010101111010111000...</code></p> |
44 | | -</li> |
45 | | -<li> |
46 | | -<p>3.14 in binary is <code>11.0010101111010111000...</code></p> |
47 | | -</li> |
| 40 | +<li>3 in binary is <code>11</code></li> |
| 41 | +<li>0.14... in binary is <code>0.0010001111010111000...</code></li> |
| 42 | +</ul> |
| 43 | +<p>=></p> |
| 44 | +<ul> |
| 45 | +<li>3.14 in binary is <code>11.0010001111010111000...</code></li> |
48 | 46 | </ul> |
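<p>The fractional part can be converted to binary by repeated doubling: multiply by 2, take the integer part as the next bit, and continue with the remainder. The sketch below is one way to do this exactly; the function name <code>fraction_bits</code> is hypothetical, and <code>Fraction</code> is used so that the decimal input is not rounded before conversion.</p>

```python
from fractions import Fraction


def fraction_bits(value: str, count: int) -> str:
    """Return the first `count` binary digits after the point of a decimal string."""
    frac = Fraction(value) % 1  # keep only the fractional part, exactly
    bits = []
    for _ in range(count):
        frac *= 2
        if frac >= 1:
            bits.append("1")
            frac -= 1
        else:
            bits.append("0")
    return "".join(bits)


# Integer part via bin(), fractional part via repeated doubling.
print(bin(3)[2:] + "." + fraction_bits("0.14", 19) + "...")
```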
49 | 47 | <h4>II. Biased Exponent</h4> |
50 | 48 | <h5>II.I. Exponent</h5> |
|
56 | 54 | <p>By converting to <code>1.xxx * 2^exponent</code> form we see that our exponent is 1.</p> |
57 | 55 | <h5>II.II. Biased Exponent</h5> |
58 | 56 | <p>Our exponent is 1. But don't forget, as we said before, that in IEEE 754 we don't store the exponent itself, but a biased version of it ([...]This isn't just any exponent. It's biased, meaning it's offset by a fixed value to allow for both positive and negative powers of 2.[...]).</p> |
59 | | -<p>We will assume that we are talking about a 32-bit float, so our bias, based on what is written above, is 127.</p> |
| 57 | +<p><em>We assumed that we are talking about a 32-bit float</em>, so our bias, based on what is written above, is 127.</p> |
60 | 58 | <p>So the number we will store is the exponent plus the bias: 1 + 127 = 128.</p> |
61 | 59 | <p>But we want this in binary. 128 in binary is <code>10000000</code>. That's our "Biased Exponent".</p> |
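<p>The exponent and bias arithmetic above can be sketched in a few lines. This assumes a normal, non-zero 32-bit float; the names <code>BIAS_32</code> and <code>biased_exponent_bits</code> are chosen for illustration only.</p>

```python
import math

BIAS_32 = 127  # bias for 32-bit floats


def biased_exponent_bits(x: float) -> str:
    """Return the 8-bit biased exponent of a normal, non-zero float32 value."""
    # math.frexp returns (m, e) with x == m * 2**e and 0.5 <= |m| < 1,
    # so the exponent of the 1.xxx * 2^exponent form is e - 1.
    _, e = math.frexp(x)
    exponent = e - 1
    return format(exponent + BIAS_32, "08b")


print(biased_exponent_bits(3.14))  # 10000000
```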
62 | | -<h4>III. Split into IEEE 754 Parts</h4> |
| 60 | +<h4>III. Mantissa</h4> |
| 61 | +<p>Take the first 23 bits (<em>because we assumed we are talking about a 32-bit float</em>) after the <code>1.</code> of the normalized binary <code>1.10010001111010111000010100...</code>, rounding to nearest: <code>10010001111010111000011</code>. Plain truncation would end in <code>...010</code>, but the dropped bits (<code>1000111...</code>) are more than half a unit in the last place, so the last kept bit rounds up. This is our "Mantissa".</p> |
| 62 | +<h4>IV. Split into IEEE 754 Parts</h4> |
63 | 63 | <ul> |
64 | | -<li>Sign: <code>0</code> (positive)</li> |
65 | | -<li>Biased Exponent: <code>10000000</code></li> |
66 | | -<li>Mantissa: Take the first 23 bits after the 1.: <code>10010101111010111000101</code> (truncated to fit)</li> |
| 64 | +<li><strong>Sign</strong>: <code>0</code> (positive)</li> |
| 65 | +<li><strong>Biased Exponent</strong>: <code>10000000</code></li> |
| 66 | +<li><strong>Mantissa</strong>: <code>10010001111010111000011</code></li> |
67 | 67 | </ul> |
68 | | -<h4>IV. Result</h4> |
| 68 | +<h4>V. Result</h4> |
69 | 69 | <p>Indeed, <code>0 10000000 10010001111010111000011</code> is the 32-bit IEEE 754 representation of the number 3.14... You can verify this by using a handy IEEE 754 calculator [1].</p> |
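<p>As an alternative to an online calculator, a sign/exponent/mantissa triple can be checked locally by packing the bits back into a float. This is a minimal sketch; <code>float32_from_fields</code> is a hypothetical helper, and the mantissa string used here is the one a round-to-nearest 32-bit encoding of 3.14 produces (bit pattern <code>0x4048F5C3</code>).</p>

```python
import struct


def float32_from_fields(sign: str, exponent: str, mantissa: str) -> float:
    """Assemble a float32 from its three bit-string fields and decode it."""
    if (len(sign), len(exponent), len(mantissa)) != (1, 8, 23):
        raise ValueError("expected 1 sign bit, 8 exponent bits, 23 mantissa bits")
    bits = int(sign + exponent + mantissa, 2)
    # Reinterpret the 32-bit integer as a big-endian float32.
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value


value = float32_from_fields("0", "10000000", "10010001111010111000011")
print(value)  # close to 3.14, but not exactly 3.14
```

<p>The decoded value differs from 3.14 by a tiny amount, because 3.14 has no finite binary expansion and only 23 mantissa bits are kept.</p>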
70 | 70 | <hr /> |
71 | 71 | <h2>The (10^100) + 1 − (10^100) Problem</h2> |
|