Commit 1576766

committed
Update paper
1 parent 26ae269 commit 1576766

2 files changed

+171
-86
lines changed

Diff for: papers/p3505.bs

+71-25
@@ -3,12 +3,12 @@ Title: Fix the default floating-point representation in std::format
33
Shortname: P3505
44
Revision: 0
55
Audience: LEWG
6-
Status: D
6+
Status: P
77
Group: WG21
88
URL:
99
Editor: Victor Zverovich, [email protected]
1010
No abstract: true
11-
Date: 2025-02-01
11+
Date: 2025-03-15
1212
Markup Shorthands: markdown yes
1313
</pre>
1414

@@ -23,9 +23,9 @@ Introduction {#intro}
2323
When `std::format` was proposed for standardization, floating-point formatting
2424
was defined in terms of `std::to_chars` to simplify specification. While being
2525
a positive change overall, this introduced a small but undesirable change
26-
compared to the reference implementation in [[FMT]], resulting in surprising
27-
behavior to users, performance regression and an inconsistency with other
28-
mainstream programming languages that have similar facilities. This paper
26+
compared to the design and reference implementation in [[FMT]], resulting in
27+
surprising behavior to users, performance regression and an inconsistency with
28+
other mainstream programming languages that have similar facilities. This paper
2929
proposes fixing this issue, bringing the floating-point formatting on par with
3030
other languages and in line with the original design intent.
3131

@@ -112,7 +112,7 @@ described above. It was great for explicit format specifiers such as `e` but,
112112
as it turned out recently, it introduced an undesirable change to the default
113113
format. The problem is that `std::to_chars` defines "shortness" in terms of the
114114
number of characters in the output which is different from the "shortness" of
115-
decimal significand normally used both in the literature and in the reference.
115+
decimal significand normally used both in the literature and in the industry.
116116

117117
The exponent range is much easier to reason about. For example, in this model
118118
`100000.0` and `120000.0` are printed in the same format:
@@ -134,9 +134,9 @@ auto s2 = std::format("{}", 120000.0); // s2 == "120000"
134134

135135
It seems surprising and undesirable.
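The character-count rule behind this inconsistency can be reproduced with a small sketch. It is not the `std::to_chars` implementation: the helper names are hypothetical, it relies on `printf`-style formatting, and it assumes integer-valued inputs so that the fixed form needs no fractional digits. It picks whichever of the two forms has fewer characters and prefers fixed on a tie.

```c++
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: shortest scientific form that parses back to the
// same double (tries 1 to 17 significant digits).
std::string shortest_scientific(double value) {
  char buf[64];
  for (int precision = 0; precision <= 16; ++precision) {
    std::snprintf(buf, sizeof(buf), "%.*e", precision, value);
    if (std::strtod(buf, nullptr) == value) break;
  }
  return buf;
}

// Hypothetical helper: fixed form without a fractional part.
// Assumption of this sketch: value is integer-valued.
std::string fixed_integer(double value) {
  char buf[512];
  std::snprintf(buf, sizeof(buf), "%.0f", value);
  return buf;
}

// Choose the form with fewer characters; a tie goes to the fixed form.
std::string shortest_by_characters(double value) {
  std::string sci = shortest_scientific(value);
  std::string fix = fixed_integer(value);
  return sci.size() < fix.size() ? sci : fix;
}
```

With this rule `shortest_by_characters(100000.0)` yields `"1e+05"` (5 characters against 6 for `"100000"`), while `shortest_by_characters(120000.0)` yields `"120000"` (6 characters against 7 for `"1.2e+05"`), so two values of the same magnitude end up in different formats.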
136136

137-
If the shortness of the output was indeed the main criteria then it is unclear
138-
why the output format requires redundant `+` and leading zero in the exponent.
139-
Those are included in the output because, according to the specification of
137+
Note that the output `1e+05` does not actually use the fewest possible
138+
characters, because the `+` and the leading zero in the exponent are redundant.
139+
They are in fact required: according to the specification of
140140
`to_chars` ([[charconv.to.chars](https://eel.is/c++draft/charconv.to.chars)]),
141141

142142
> `value` is converted to a string in the style of `printf` in the `"C"` locale.
@@ -146,6 +146,9 @@ and the exponential format is defined as follows by the C standard ([[N3220]]):
146146
> A `double` argument representing a floating-point number is converted in the
147147
> style *[−]d.ddd e±dd* ...
148148

149+
Nevertheless, users interpreting the shortness condition too literally may find
150+
this surprising.
151+
149152
Even more importantly, the current representation violates the original
150153
shortness requirement from [[STEELE-WHITE]]:
151154

@@ -175,19 +178,34 @@ rounding condition
175178
> ([[round.style](https://eel.is/c++draft/round.style)]).
176179

177180
Apart from giving a false sense of accuracy to users it also has negative
178-
performance implications. Producing "garbage digits" means that you
179-
may no longer be able to use the optimized float-to-string algorithm such as
180-
Dragonbox ([[DRAGONBOX]]) and Ryū ([[RYU]]) in some cases. It also introduces
181-
complicated logic to handle those cases. If the fallback algorithm does
182-
multiprecision arithmetic this may even require additional allocation(s).
181+
performance implications. Many of the optimized float-to-string algorithms
182+
based on Steele and White's criteria, such as Dragonbox ([[DRAGONBOX]]) and
183+
Ryū ([[RYU]]), only focus on those criteria, especially the shortness of
184+
decimal significand rather than the number of characters. As a result, an
185+
implementation of the default floating-point handling of `std::format`
186+
(and `std::to_chars`) cannot just directly rely on these otherwise perfectly
187+
appropriate algorithms. Instead, it has to introduce non-trivial logic
188+
dedicated to computing these "garbage digits". Furthermore, the need for
189+
dedicated logic is likely not just due to a lack of progress
190+
in algorithm research: in this case we do need to compute more
191+
digits than the actual precision implied by the data type, so it is natural
192+
to expect that we may need more precision than in the case without garbage digits.
193+
(In other words, even if a new algorithm that correctly deals with the
194+
garbage digits case according to the current C++ standard is invented, it is
195+
likely that it will still include some special handling of that case, in one form
196+
or another.)
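To make the "garbage digits" concrete, the following sketch prints the benchmark's second input in two ways. It uses `std::snprintf` rather than `std::format`, the helper names are hypothetical, and it assumes an IEEE 754 binary64 `double` and a C library that prints the exact binary value for `%.0f` (as glibc does).

```c++
#include <cstdio>
#include <string>

// Hypothetical helper: every decimal digit of the exact binary value.
std::string all_digits(double value) {
  char buf[512];
  std::snprintf(buf, sizeof(buf), "%.0f", value);
  return buf;
}

// Hypothetical helper: 17 significant digits in scientific form, which is
// always enough to round-trip a binary64 double.
std::string seventeen_digits(double value) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.16e", value);
  return buf;
}
```

For `1234567890123456700000.0`, `all_digits` returns `"1234567890123456774144"` and `seventeen_digits` returns `"1.2345678901234568e+21"`. The trailing `74144` in the first result does not come from the literal in the source code. It is an artifact of the nearest representable binary value, and producing it requires more decimal digits than the type's precision implies.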
183197

184198
The performance issue can be illustrated on the following simple benchmark:
185199

186200
```c++
187201
#include <format>
188202
#include <benchmark/benchmark.h>
189203

204+
// Output: "1.2345678901234568e+22"
190205
double normal_input = 12345678901234567000000.0;
206+
207+
// Output (current): "1234567890123456774144"
208+
// Output (desired): "1.2345678901234568e+21"
191209
double garbage_input = 1234567890123456700000.0;
192210

193211
void normal(benchmark::State& state) {
@@ -211,7 +229,7 @@ BENCHMARK_MAIN();
211229

212230
Results on macOS with Apple clang version 16.0.0 (clang-1600.0.26.6) and libc++:
213231

214-
```
232+
```text
215233
% ./double-benchmark
216234
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
217235
This does not affect benchmark measurements, only the metadata output.
@@ -234,10 +252,10 @@ garbage 91.4 ns 91.4 ns 7675186
234252
Results on GNU/Linux with gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 and
235253
libstdc++:
236254

237-
```
238-
$ ./int-benchmark
255+
```text
256+
$ ./double-benchmark
239257
2025-02-02T17:22:25+00:00
240-
Running ./int-benchmark
258+
Running ./double-benchmark
241259
Run on (2 X 48 MHz CPU s)
242260
CPU Caches:
243261
L1 Data 128 KiB (x2)
@@ -254,10 +272,10 @@ garbage 90.6 ns 90.6 ns 7360351
254272
Results on Windows with Microsoft (R) C/C++ Optimizing Compiler Version
255273
19.40.33811 for ARM64 and Microsoft STL:
256274

257-
```
258-
>int-benchmark.exe
275+
```text
276+
>double-benchmark.exe
259277
2025-02-02T08:10:39-08:00
260-
Running int-benchmark.exe
278+
Running double-benchmark.exe
261279
Run on (2 X 2000 MHz CPU s)
262280
CPU Caches:
263281
L1 Instruction 192 KiB (x2)
@@ -283,6 +301,33 @@ normal(benchmark::State&):
283301
159.00 ms ... std::__1::to_chars_result std::__1::_Floating_to_chars[abi:ne180100]<...>(char*, char*, double, std::__1::chars_format, int)
284302
```
285303

304+
For comparison, here are the results of running the same benchmark with
305+
`std::format` replaced with `fmt::format`, which does not produce "garbage
306+
digits":
307+
308+
```text
309+
$ ./double-benchmark
310+
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
311+
This does not affect benchmark measurements, only the metadata output.
312+
***WARNING*** Failed to set thread affinity. Estimated CPU frequency may be incorrect.
313+
2025-03-15T08:18:56-07:00
314+
Running ./double-benchmark
315+
Run on (8 X 24 MHz CPU s)
316+
CPU Caches:
317+
L1 Data 64 KiB
318+
L1 Instruction 128 KiB
319+
L2 Unified 4096 KiB (x8)
320+
Load Average: 3.00, 3.91, 4.85
321+
------------------------------------------------------
322+
Benchmark Time CPU Iterations
323+
------------------------------------------------------
324+
fmt_normal 53.0 ns 53.0 ns 13428484
325+
fmt_garbage 53.4 ns 53.4 ns 13032712
326+
```
327+
328+
As expected, the time is nearly identical between the two cases. It demonstrates
329+
that the performance gap can be eliminated if this paper is accepted.
330+
286331
Locale makes the situation even more confusing to users. Consider the following
287332
example:
288333

@@ -463,8 +508,9 @@ Table 105 — Meaning of type options for floating-point types
463508
<td>
464509
<ins>Let <code>fmt</code> be `chars_format::fixed` if <code>value</code>
465510
is in the range [10<sup>-4</sup>, 10<sup><i>n</i></sup>), where
466-
10<sup><i>n</i></sup> is 2<sup><code>std::numeric_limits&lt;decltype(value)&gt;::digits</code></sup>
467-
rounded to the nearest power of 10, `chars_format::scientific` otherwise.
511+
10<sup><i>n</i></sup> is
512+
2<sup><code>std::numeric_limits&lt;decltype(value)&gt;::digits</code> + 1</sup>
513+
rounded down to the nearest power of 10, `chars_format::scientific` otherwise.
468514
</ins>
469515

470516
If *precision* is specified, equivalent to
@@ -484,8 +530,8 @@ Implementation and usage experience {#impl}
484530

485531
The current proposal is based on the existing implementation in [[FMT]] which
486532
has been available and widely used for over 6 years. Similar logic based on the
487-
value range rather than output size is implemented in Python, Java, JavaScript,
488-
Rust and Swift.
533+
value range rather than the output size is implemented in Python, Java,
534+
JavaScript, Rust and Swift.
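The value-range rule from the proposed Table 105 wording can be sketched as follows. This is an illustration under stated assumptions, not the {fmt} implementation: the function names are hypothetical, the magnitude of the value is used for the range check, and zero, infinities and NaN are not given any special treatment.

```c++
#include <charconv>
#include <cmath>
#include <limits>

// Hypothetical helper: the exponent n such that 10^n is
// 2^(digits + 1) rounded down to the nearest power of 10.
template <typename T>
int fixed_upper_exponent() {
  int bits = std::numeric_limits<T>::digits + 1;
  return static_cast<int>(std::floor(bits * std::log10(2.0)));
}

// Hypothetical helper: fixed if the magnitude is in [10^-4, 10^n),
// scientific otherwise. Assumption: the check applies to the absolute value.
template <typename T>
std::chars_format default_format(T value) {
  T upper = 1;
  for (int i = 0, n = fixed_upper_exponent<T>(); i < n; ++i) upper *= 10;
  T magnitude = std::fabs(value);
  bool in_range = magnitude >= static_cast<T>(1e-4) && magnitude < upper;
  return in_range ? std::chars_format::fixed : std::chars_format::scientific;
}
```

For `double`, `digits` is 53, 2<sup>54</sup> is about 1.8 × 10<sup>16</sup>, and the upper bound becomes 10<sup>16</sup>; for `float`, `digits` is 24 and the bound is 10<sup>7</sup>. Under this rule `100000.0` and `120000.0` both select the fixed format, while `1234567890123456700000.0` selects the scientific format, so no digits beyond the type's precision need to be produced.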
489535

490536
<!-- Grisu in {fmt}: https://github.com/fmtlib/fmt/issues/147#issuecomment-461118641 -->
491537
