Default to printing floats with decimal instead of scientific notation #22971

Open. MasonRemaley wants to merge 2 commits into master.

MasonRemaley
Contributor

Currently, std.fmt defaults to formatting floats with scientific notation. IIUC this was originally due to some limitations in the float formatting code that have since been resolved.

This results in some silly output, such as @as(f32, 1) being formatted as 1e0. Outside of being a bit odd, it makes it easy to misinterpret output if you miss the e at the end of a number. You can just pass d to the formatter when formatting a single number, but this doesn't work if you're formatting e.g. a struct that contains fields with numbers. As such it's worth it to have a good default here.

This PR changes the default to decimal, and updates the corresponding tests.

@castholm
Contributor

Obviously formatting 1 as 1e0 looks dumb, but have you considered the impact this change will have on very small or very large values? One upside of scientific notation is that there's a reasonable limit to the maximum length of the formatted string. Formatting std.math.floatTrueMin(f64) as a decimal requires 326 characters. Even if most values don't get close to the lower or upper limits, you don't need to get that far away from 0 before the output from printing larger structs/arrays starts to get overwhelmingly noisy and difficult for humans to parse.
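
As a rough check of the 326-character figure, here is a minimal Python sketch. Python is used purely for illustration and this is not Zig's std.fmt. It assumes CPython's `repr` yields the shortest round-trip digits, and `positional` is a hypothetical helper name that expands those digits without an exponent.

```python
from decimal import Decimal
import sys


def positional(x: float) -> str:
    """Expand the shortest round-trip digits of x without an exponent."""
    # repr(x) supplies the shortest round-trip digits; Decimal's "f" format
    # writes them out positionally, padding with zeros where needed.
    return format(Decimal(repr(x)), "f")


true_min = 5e-324  # smallest positive subnormal 64-bit float

print(len(repr(true_min)))                  # scientific form: 6 characters
print(len(positional(true_min)))            # positional form: 326 characters
print(len(positional(sys.float_info.max)))  # positional form: 309 characters
```

The digits are the same in both forms; the extra length in positional notation is entirely zero padding.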

Many other programming languages and shells seem to default to using decimal notation when the number is within a relatively small "human friendly" range and scientific notation otherwise. E.g. Python seems to prefer decimal for values >= 1e-4 and < 1e+16, and for JavaScript it appears to be >= 1e-6 and < 1e+21. Perhaps something like that would be a better default?
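
A hybrid default along these lines could be sketched as follows, again in Python for illustration only. The function name `format_float` and its `lo`/`hi` cutoffs are hypothetical; the defaults are modeled on where CPython's `repr` appears to switch notation (`repr(0.0001)` is `'0.0001'`, `repr(0.00001)` is `'1e-05'`), not on anything decided in this PR.

```python
from decimal import Decimal
import math


def format_float(x: float, lo: float = 1e-4, hi: float = 1e16) -> str:
    """Decimal notation for lo <= |x| < hi (and zero), scientific otherwise.

    The cutoffs are illustrative assumptions, not values chosen by the PR.
    """
    if math.isnan(x) or math.isinf(x):
        return repr(x)
    digits = Decimal(repr(x))  # shortest round-trip digits via repr
    if x == 0 or lo <= abs(x) < hi:
        return format(digits, "f")          # positional, no exponent
    return format(digits.normalize(), "e")  # scientific, no trailing zeros


print(format_float(1.0))             # 1.0
print(format_float(0.001))           # 0.001
print(format_float(2e-5, lo=1e-5))   # 0.00002
print(format_float(5e-324))          # 5e-324
print(format_float(1e300))           # 1e+300
```

Keeping the cutoffs as parameters makes it easy to compare candidate ranges (for example the wider JavaScript one) against real output before settling on a default.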

@mrjbq7
Contributor

mrjbq7 commented Feb 23, 2025

It shouldn't require 326 characters, I think, if it uses a smart algorithm like Dragonbox.

https://github.com/jk-jeon/dragonbox

@tiehuis
Member

tiehuis commented Feb 23, 2025

No, the worst case decimal outputs are going to require a lot of characters regardless of algorithm, as they are generally aiming to have round-trippable output.

@mrjbq7
Contributor

mrjbq7 commented Feb 23, 2025

Dragonbox has these three properties:

https://github.com/jk-jeon/dragonbox?tab=readme-ov-file#introduction

The algorithm guarantees three things:

1. It has the roundtrip guarantee; that is, a correct parser interprets the generated output string as the original input floating-point number. (See here for some explanation on this.)
2. The output is of the shortest length; that is, no other output strings that are interpreted as the input number can contain less number of significand digits than the output of Dragonbox.
3. The output is correctly rounded: the number generated by Dragonbox is the closest to the actual value of the input number among possible outputs of minimum number of digits.
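
The first two guarantees can be observed with any shortest round-trip formatter. The sketch below uses CPython's `repr` as a stand-in (an assumption for illustration; it is neither Dragonbox nor Zig's Ryu implementation), and `round_trips` is a hypothetical helper. It does not attempt to demonstrate the third guarantee.

```python
from decimal import Decimal


def round_trips(text: str, x: float) -> bool:
    """True if parsing text yields exactly the float x."""
    return float(text) == x


x = 0.1 + 0.2
shortest = repr(x)       # '0.30000000000000004'
exact = str(Decimal(x))  # exact decimal expansion of the binary value

print(round_trips(shortest, x))       # True: the shortest digits round-trip
print(round_trips(shortest[:-1], x))  # False: one digit fewer parses as 0.3
print(round_trips(exact, x))          # True, but uses many more digits
print(len(shortest), len(exact))
```

Note that these guarantees are about how many significant digits are produced, not about how many characters the number occupies once it is written without an exponent.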

It is quite worth implementing here instead of decimal output.

@tiehuis
Member

tiehuis commented Feb 23, 2025

Please read more about the algorithm you are suggesting. You are failing to understand what it is providing and what this MR is attempting to improve. Dragonbox is similar to Ryu (which Zig implements) in that they are based on generating a significand and exponent in shortest-form. The first sentence in your link explicitly specifies its purpose.

Happy to talk about this elsewhere but this is not related to this MR.
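
To make the distinction concrete: a shortest-digits algorithm yields a significand and an exponent, and choosing between scientific and positional notation is a separate rendering step. The Python sketch below illustrates that split under stated assumptions: `repr` stands in for Ryu or Dragonbox, all three function names are hypothetical, and only finite, non-negative inputs are handled.

```python
from decimal import Decimal


def shortest_digits(x: float) -> tuple[str, int]:
    """Return (digits, exponent) such that x == int(digits) * 10**exponent."""
    _sign, digits, exponent = Decimal(repr(x)).normalize().as_tuple()
    return "".join(map(str, digits)), exponent


def to_scientific(digits: str, exponent: int) -> str:
    """Render as d.ddd followed by e and a power of ten."""
    fraction = "." + digits[1:] if len(digits) > 1 else ""
    return f"{digits[0]}{fraction}e{exponent + len(digits) - 1}"


def to_positional(digits: str, exponent: int) -> str:
    """Render without an exponent, padding with zeros as required."""
    if exponent >= 0:
        return digits + "0" * exponent
    if len(digits) > -exponent:
        return digits[:exponent] + "." + digits[exponent:]
    return "0." + "0" * (-exponent - len(digits)) + digits


for value in (1.0, 123.456, 1e-300):
    d, e = shortest_digits(value)
    print(to_scientific(d, e), len(to_positional(d, e)))
```

For `1.0` the scientific rendering is `1e0`, the output the PR description objects to, while for `1e-300` the positional rendering is 302 characters long even though the significand is a single digit.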

@mrjbq7
Contributor

mrjbq7 commented Feb 24, 2025

> You are failing to understand what it is providing and what this MR is attempting to improve.

I re-read the PR, and I see what you mean. Separately, Dragonbox is objectively better than Ryu. We switched to it and have been quite happy in the Factor programming language.

@andrewrk
Member

It's extremely easy to evaluate these things objectively. There's no reason to assert such things without evidence. Please don't assert performance claims without pointing to some reproducible benchmark. It's just noise on the issue tracker. Data points or gtfo

@mrjbq7
Contributor

mrjbq7 commented Feb 24, 2025

> Please don't assert performance claims without pointing to some reproducible benchmark.

Sorry, I'm not asserting performance claims. I'm making an assertion around the satisfying human-readability and correctness of the algorithm. It is also pretty fast.

@MasonRemaley
Contributor Author

@tiehuis let me know if you have any thoughts for or against merging this--I'm interested in your take since you provided the ryu implementation.

@andrewrk
Member

you might have missed #22971 (comment)

@MasonRemaley
Contributor Author

Ah that's a good point--I'll look into how other languages decide what the cutoff is here.
