@@ -3,12 +3,12 @@ Title: Fix the default floating-point representation in std::format
Shortname: P3505
Revision: 0
Audience: LEWG
-Status: D
+Status: P
Group: WG21
URL:
Editor: Victor Zverovich, [email protected]
No abstract: true
-Date: 2025-02-01
+Date: 2025-03-15
Markup Shorthands: markdown yes
</pre>
@@ -23,9 +23,9 @@ Introduction {#intro}
When `std::format` was proposed for standardization, floating-point formatting
was defined in terms of `std::to_chars` to simplify specification. While being
a positive change overall, this introduced a small but undesirable change
-compared to the reference implementation in [[FMT]], resulting in surprising
-behavior to users, performance regression and an inconsistency with other
-mainstream programming languages that have similar facilities. This paper
+compared to the design and reference implementation in [[FMT]], resulting in
+surprising behavior to users, a performance regression and an inconsistency with
+other mainstream programming languages that have similar facilities. This paper
proposes fixing this issue, bringing the floating-point formatting on par with
other languages and in line with the original design intent.
@@ -112,7 +112,7 @@ described above. It was great for explicit format specifiers such as `e` but,
as it turned out recently, it introduced an undesirable change to the default
format. The problem is that `std::to_chars` defines "shortness" in terms of the
number of characters in the output, which is different from the "shortness" of
-the decimal significand normally used both in the literature and in the reference.
+the decimal significand normally used both in the literature and in the industry.

The exponent range is much easier to reason about. For example, in this model
`100000.0` and `120000.0` are printed in the same format:
@@ -134,9 +134,9 @@ auto s2 = std::format("{}", 120000.0); // s2 == "120000"

It seems surprising and undesirable.

-If the shortness of the output was indeed the main criteria then it is unclear
-why the output format requires redundant `+` and leading zero in the exponent.
-Those are included in the output because, according to the specification of
+Note that the output `1e+05` does not really have the shortest possible number
+of characters, because the `+` and the leading zero in the exponent are
+redundant. In fact, those are required, according to the specification of
`to_chars` ([[charconv.to.chars](https://eel.is/c++draft/charconv.to.chars)]),

> `value` is converted to a string in the style of `printf` in the `"C"` locale.
@@ -146,6 +146,9 @@ and the exponential format is defined as follows by the C standard ([[N3220]]):
> A `double` argument representing a floating-point number is converted in the
> style *[−]d.ddd e±dd* ...

+Nevertheless, users interpreting the shortness condition too literally may find
+this surprising.
+
Even more importantly, the current representation violates the original
shortness requirement from [[STEELE-WHITE]]:
@@ -175,19 +178,34 @@ rounding condition
> ([[round.style](https://eel.is/c++draft/round.style)]).

Apart from giving a false sense of accuracy to users it also has negative
-performance implications. Producing "garbage digits" means that you
-may no longer be able to use the optimized float-to-string algorithm such as
-Dragonbox ([[DRAGONBOX]]) and Ryū ([[RYU]]) in some cases. It also introduces
-complicated logic to handle those cases. If the fallback algorithm does
-multiprecision arithmetic this may even require additional allocation(s).
+performance implications. Many of the optimized float-to-string algorithms
+based on Steele and White's criteria, such as Dragonbox ([[DRAGONBOX]]) and
+Ryū ([[RYU]]), focus only on those criteria, in particular on the shortness of
+the decimal significand rather than the number of characters. As a result, an
+implementation of the default floating-point handling of `std::format`
+(and `std::to_chars`) cannot directly rely on these otherwise perfectly
+appropriate algorithms. Instead, it has to introduce non-trivial logic
+dedicated to computing these "garbage digits". Furthermore, the need for such
+dedicated logic is unlikely to disappear with further algorithmic research:
+in this case we need to compute more digits than the precision of the data
+type implies, so it is natural to expect that more internal precision is
+required than in the case without garbage digits. (In other words, even if a
+new algorithm that correctly handles this garbage-digits case according to the
+current C++ standard were invented, it would likely still include some special
+handling of that case, in one form or another.)
The performance issue can be illustrated on the following simple benchmark:

```c++
#include <format>
#include <benchmark/benchmark.h>

+// Output: "1.2345678901234568e+22"
double normal_input = 12345678901234567000000.0;
+
+// Output (current): "1234567890123456774144"
+// Output (desired): "1.2345678901234568e+21"
double garbage_input = 1234567890123456700000.0;

void normal(benchmark::State& state) {
@@ -211,7 +229,7 @@ BENCHMARK_MAIN();

Results on macOS with Apple clang version 16.0.0 (clang-1600.0.26.6) and libc++:

-```
+```text
% ./double-benchmark
Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
This does not affect benchmark measurements, only the metadata output.
@@ -234,10 +252,10 @@ garbage 91.4 ns 91.4 ns 7675186

Results on GNU/Linux with gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 and
libstdc++:

-```
-$ ./int-benchmark
+```text
+$ ./double-benchmark
2025-02-02T17:22:25+00:00
-Running ./int-benchmark
+Running ./double-benchmark
Run on (2 X 48 MHz CPU s)
CPU Caches:
  L1 Data 128 KiB (x2)
@@ -254,10 +272,10 @@ garbage 90.6 ns 90.6 ns 7360351

Results on Windows with Microsoft (R) C/C++ Optimizing Compiler Version
19.40.33811 for ARM64 and Microsoft STL:

-```
->int-benchmark.exe
+```text
+>double-benchmark.exe
2025-02-02T08:10:39-08:00
-Running int-benchmark.exe
+Running double-benchmark.exe
Run on (2 X 2000 MHz CPU s)
CPU Caches:
  L1 Instruction 192 KiB (x2)
@@ -283,6 +301,33 @@ normal(benchmark::State&):
 159.00 ms ... std::__1::to_chars_result std::__1::_Floating_to_chars[abi:ne180100]<...>(char*, char*, double, std::__1::chars_format, int)
```

+For comparison, here are the results of running the same benchmark with
+`std::format` replaced with `fmt::format`, which doesn't produce "garbage
+digits":
+
+```text
+$ ./double-benchmark
+Unable to determine clock rate from sysctl: hw.cpufrequency: No such file or directory
+This does not affect benchmark measurements, only the metadata output.
+***WARNING*** Failed to set thread affinity. Estimated CPU frequency may be incorrect.
+2025-03-15T08:18:56-07:00
+Running ./double-benchmark
+Run on (8 X 24 MHz CPU s)
+CPU Caches:
+  L1 Data 64 KiB
+  L1 Instruction 128 KiB
+  L2 Unified 4096 KiB (x8)
+Load Average: 3.00, 3.91, 4.85
+------------------------------------------------------
+Benchmark          Time             CPU   Iterations
+------------------------------------------------------
+fmt_normal      53.0 ns         53.0 ns     13428484
+fmt_garbage     53.4 ns         53.4 ns     13032712
+```
+
+As expected, the time is nearly identical between the two cases. This
+demonstrates that the performance gap can be eliminated if this paper is
+accepted.
+

Locale makes the situation even more confusing to users. Consider the following
example:
@@ -463,8 +508,9 @@ Table 105 — Meaning of type options for floating-point types
<td>
<ins>Let <code>fmt</code> be `chars_format::fixed` if <code>value</code>
is in the range [10<sup>-4</sup>, 10<sup><i>n</i></sup>), where
-10<sup><i>n</i></sup> is 2<sup><code>std::numeric_limits<decltype(value)>::digits</code></sup>
-rounded to the nearest power of 10, `chars_format::scientific` otherwise.
+10<sup><i>n</i></sup> is
+2<sup><code>std::numeric_limits<decltype(value)>::digits</code> + 1</sup>
+rounded down to the nearest power of 10, `chars_format::scientific` otherwise.
</ins>

If *precision* is specified, equivalent to
@@ -484,8 +530,8 @@ Implementation and usage experience {#impl}

The current proposal is based on the existing implementation in [[FMT]] which
has been available and widely used for over 6 years. Similar logic based on the
-value range rather than output size is implemented in Python, Java, JavaScript,
-Rust and Swift.
+value range rather than the output size is implemented in Python, Java,
+JavaScript, Rust and Swift.

<!-- Grisu in {fmt}: https://github.com/fmtlib/fmt/issues/147#issuecomment-461118641 -->