Commit f8a7555

Merge pull request #5 from whirlicote/patch-1
minor fixes
2 parents d3c5e6b + 8a98c5e commit f8a7555

1 file changed: +19 -14 lines changed

proposals/rounding-mode-control/Overview.md

Lines changed: 19 additions & 14 deletions
@@ -114,7 +114,7 @@ This proposal proposes to extend the matrix of floating point instructions by co
114114
| 0xFC | 0x7A | 0b01111010 | f64.convert_i64_u_trunc | convert_i64_u_trunc |
115115
| 0xFC | 0x7B | 0b01111011 | f64.promote_f32_trunc | promote_f32_trunc |
116116

117-
## semantics
117+
## Semantics
118118

119119
The semantics are specified by the IEEE standard and are as follows: For an instruction `O.f_I_round` with
120120

@@ -130,25 +130,30 @@ it is that:
130130

131131
| | | |
132132
|----------------|-----|------------------------------------------------------|
133-
| O.f_I_ceil(x) | `=` | `min { y \| y ∈ O, F(x) <= y }` |
134-
| O.f_I_floor(x) | `=` | `max { y \| y ∈ O, y <= F(x) }` |
135-
| O.f_I_trunc(x) | `=` | `if 0 < F(x) then O.f_I_floor(x) else O.f_I_ceil(x)` |
133+
| `O.f_I_ceil(x)` | `=` | `min { y \| y ∈ O, F(x) <= y }` |
134+
| `O.f_I_floor(x)` | `=` | `max { y \| y ∈ O, y <= F(x) }` |
135+
| `O.f_I_trunc(x)` | `=` | `if 0 < F(x) then O.f_I_floor(x) else O.f_I_ceil(x)` |
136136

137-
This definition is incomplete: the result might be zero, in which case its sign has to be determined. In that case IEEE defines the result as `+0.0` with the following exception: should `f` be `+` or `-` and the result be zero, then the sign is `-`. In Formal notation this gives us (higher-listed rules take precedence):
137+
This definition is incomplete: the result might be zero, in which case its sign has to be determined. In that case IEEE defines the sign by the usual sign rules (`+0.0` for `±`, `copysign(1,a*b) = copysign(1,a)*copysign(1,b)`) with the following exception: should `f` be `+` or `-` and the result be zero, then the sign is `-`. In formal notation this gives us (higher-listed rules take precedence):
138138

139139

140140
| | | |
141141
| :------------- | :---: | :----------------------------------------------------------- |
142-
| O.±_I_floor(x) | `=` | `min (maximal { y \| y ∈ O, y <= F(x) })` |
143-
| O.f_I_ceil(x) | `=` | `max { y \| y ∈ O\{-0.0}, F(x) <= y }` |
144-
| O.f_I_floor(x) | `=` | `min { y \| y ∈ O\{-0.0}, y <= F(x) })` |
145-
| O.f_I_trunc(x) | `=` | `if 0 < F(x) then O.f_I_floor(x) else O.f_I_ceil(x)` |
142+
| `O.±_I_floor(x)` | `=` | `min (maximal { y \| y ∈ O, y <= F(x) })` |
143+
| `O.±_I_ceil(x)` | `=` | `max (minimal { y \| y ∈ O, F(x) <= y })` |
144+
| `O.f_I_ceil(x)` | `=` | `adapt_zero_signₓ { y \| y ∈ O, F(x) <= y }` |
145+
| `O.f_I_floor(x)` | `=` | `adapt_zero_signₓ { y \| y ∈ O, y <= F(x) }` |
146+
| `O.f_I_trunc(x)` | `=` | `if 0 < F(x) then O.f_I_floor(x) else O.f_I_ceil(x)` |
147+
| `adapt_zero_signₓ( {r} )` | `=` | `r` |
148+
| `adapt_zero_signₓ( {-0.0,+0.0} )` | `=` | `copysign(0.0, x)` |
149+
| `adapt_zero_sign₍ₗ,ᵣ₎( {-0.0,+0.0} )` | `=` | `copysign(0.0, copysign(1.0, l) * copysign(1.0, r))` |
146150

147-
Here means `minimal` a functions that gives the set of minimal elements of the input set. So the result set is either a singleton containing a nonzero number or the set `{-0.0, 0.0}`. Here `max` and `min` are parameterized on the relation `<`, extended with `-0.0 < 0.0` to make the relation total.
148151

149-
## redundants
152+
Here `maximal` means a function that gives the set of maximal elements of the input set. So the result set is either a singleton containing a nonzero number or the set `{-0.0, 0.0}`. Here `max` and `min` are parameterized on the relation `<`, extended with `-0.0 < 0.0` to make the relation total.
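
As a rough illustration of the ceil/floor/trunc rules above for an integer input, the following Python sketch emulates what `f64.convert_i64_s_ceil`, `f64.convert_i64_s_floor` and `f64.convert_i64_s_trunc` would return. It is not part of the proposal: the helper names are hypothetical, Python's `float` is assumed to be an IEEE binary64, and `math.nextafter` requires Python 3.9 or later. The zero-sign rules do not come into play here, because an integer zero converts to `+0.0`.

```python
import math


def convert_i64_ceil(x: int) -> float:
    """Smallest f64 value y with x <= y (hypothetical helper)."""
    y = float(x)  # default conversion rounds to nearest
    if y < x:  # Python compares int and float exactly
        y = math.nextafter(y, math.inf)  # step up to the next f64
    return y


def convert_i64_floor(x: int) -> float:
    """Largest f64 value y with y <= x (hypothetical helper)."""
    y = float(x)
    if y > x:
        y = math.nextafter(y, -math.inf)  # step down to the previous f64
    return y


def convert_i64_trunc(x: int) -> float:
    """Round toward zero: floor for positive inputs, ceil otherwise."""
    return convert_i64_floor(x) if 0 < x else convert_i64_ceil(x)
```

For example, `2**53 + 1` lies strictly between the adjacent f64 values `2**53` and `2**53 + 2`, so the ceil variant yields the latter and the floor and trunc variants yield the former.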
150153

151-
The following functions are redundant and simple to implement:
154+
## Redundants
155+
156+
The following functions are redundant and simple to implement. For instance:
152157
```
153158
f32.convert_i32_s_ceil
154159
f32.convert_i32_s_floor
@@ -181,7 +186,7 @@ f64.convert_i32_u_ceil
181186
f64.convert_i32_u_floor
182187
f64.convert_i32_u_trunc
183188
```
184-
It turns out that the functions above are respectively equivalent to `f64.convert_i32_s` and `f64.convert_i32_u`. The reason is that an `f64` float contains a 53-bit mantissa, which is sufficient to accommodate an entire 'i32'.
189+
It turns out that the functions above are respectively equivalent to `f64.convert_i32_s` and `f64.convert_i32_u`. The reason is that an `f64` float contains a 53-bit mantissa, which is sufficient to accommodate an entire `i32`.
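
The exactness argument can be spot-checked with a small Python sketch. It assumes that Python's `float` is an IEEE binary64 (the same format as `f64`); the helper name `is_exact_in_f64` is hypothetical and only serves this illustration. When the conversion is exact, every rounding mode must return the same value, which is why the rounded `i32` variants are redundant, while `i64` inputs above `2**53` can lose information.

```python
I32_MIN, I32_MAX, U32_MAX = -2**31, 2**31 - 1, 2**32 - 1


def is_exact_in_f64(x: int) -> bool:
    """True if x survives a round trip through f64 unchanged."""
    return int(float(x)) == x


# Boundary values of the signed and unsigned 32-bit integer ranges.
samples = [I32_MIN, -1, 0, 1, I32_MAX, U32_MAX]
all_i32_exact = all(is_exact_in_f64(x) for x in samples)

# An i64 value that needs 54 significant bits is not representable.
i64_counterexample_exact = is_exact_in_f64(2**53 + 1)
```

This only samples the range boundaries; the general claim follows from 32 < 53 significant bits.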
185190

186191

187192

@@ -199,7 +204,7 @@ Technically the redundant functions do not add much normative value. But they ar
199204

200205
The operation and conversion tensor does not get arbitrary holes. This makes it easier to reason about the operations. The mathematical definition of the semantics of `f64.promote_f32_ceil` is still different from that of `f64.promote_f32`. It is easier to express intent that way.
201206

202-
Having the full conversion tensor may improve portability of WebAssembly: With rounded instructions it is possible to write algorithms that are independent of and equivalent over different number formats. For example a user of the `wasm2c` tool could purposefully relax the requirement of `f32` to be IEEE floating point. The C standard does not require `float` to be IEEE. Let's say `f32` gets implemented by the platform as `posit32`. `posit32` is a number format with more precision around `1.0` than IEEE. That way there might be numbers in `float` that are not representable in `double`. Now you need `promote` with a rounding variation so that your iterating enclosing loop still converges.
207+
Having the full conversion tensor may improve portability of WebAssembly: With rounded instructions it is possible to write algorithms that are independent of and equivalent over different number formats. For example a user of the `wasm2c` tool could purposefully relax the requirement of `f32` to be IEEE floating point. The C standard does not require `float` to be IEEE. Let's say `f32` gets implemented by the platform as `posit32`. `posit32` is a number format with more precision around `1.0` than IEEE. That way there might be numbers in `float` that are not representable in `double`. Now you need a rounding variant of `promote` so that your iterating enclosing loop still converges.
203208

204209
## performance
205210
