feat: Add RoundMode (half even; half away from zero) and fix inconsistent rounding between float and decimal #21883

Open
wants to merge 3 commits into
base: main

Julian-J-S
Contributor

@Julian-J-S Julian-J-S commented Mar 21, 2025

fix #21800 (though as @orlp mentioned there are theoretically more round modes to add)

Current Problem ⚠️

  • the current round expression has no mode parameter and behaves inconsistently across types
    • Decimal: uses "half to even"
    • Float: uses "half away from zero"

PR Solution 🍀

  • implement missing round modes
    • Decimal: implement "half away from zero"
    • Float: implement "half to even"
  • set "half to even" as default for float & decimal

Example

import polars as pl

df = pl.DataFrame(
    {
        "f64": [-3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5],
        "d": ["-3.5", "-2.5", "-1.5", "-0.5", "0.5", "1.5", "2.5", "3.5"],
    },
    schema={
        "f64": pl.Float64,
        "d": pl.Decimal(scale=1),
    },
)

df.with_columns(
    pl.all().round().name.suffix("_default"),
    pl.all().round(mode="half_away_from_zero").name.suffix("_away_from_zero"),
    pl.all().round(mode="half_to_even").name.suffix("_to_even"),
)

shape: (8, 8)
┌──────┬──────────────┬─────────────┬──────────────┬────────────────────┬──────────────────┬─────────────┬──────────────┐
│ f64  ┆ d            ┆ f64_default ┆ d_default    ┆ f64_away_from_zero ┆ d_away_from_zero ┆ f64_to_even ┆ d_to_even    │
│ ---  ┆ ---          ┆ ---         ┆ ---          ┆ ---                ┆ ---              ┆ ---         ┆ ---          │
│ f64  ┆ decimal[*,1] ┆ f64         ┆ decimal[*,1] ┆ f64                ┆ decimal[*,1]     ┆ f64         ┆ decimal[*,1] │
╞══════╪══════════════╪═════════════╪══════════════╪════════════════════╪══════════════════╪═════════════╪══════════════╡
│ -3.5 ┆ -3.5         ┆ -4.0        ┆ -4.0         ┆ -4.0               ┆ -4.0             ┆ -4.0        ┆ -4.0         │
│ -2.5 ┆ -2.5         ┆ -2.0        ┆ -2.0         ┆ -3.0               ┆ -3.0             ┆ -2.0        ┆ -2.0         │
│ -1.5 ┆ -1.5         ┆ -2.0        ┆ -2.0         ┆ -2.0               ┆ -2.0             ┆ -2.0        ┆ -2.0         │
│ -0.5 ┆ -0.5         ┆ -0.0        ┆ 0.0          ┆ -1.0               ┆ -1.0             ┆ -0.0        ┆ 0.0          │
│ 0.5  ┆ 0.5          ┆ 0.0         ┆ 0.0          ┆ 1.0                ┆ 1.0              ┆ 0.0         ┆ 0.0          │
│ 1.5  ┆ 1.5          ┆ 2.0         ┆ 2.0          ┆ 2.0                ┆ 2.0              ┆ 2.0         ┆ 2.0          │
│ 2.5  ┆ 2.5          ┆ 2.0         ┆ 2.0          ┆ 3.0                ┆ 3.0              ┆ 2.0         ┆ 2.0          │
│ 3.5  ┆ 3.5          ┆ 4.0         ┆ 4.0          ┆ 4.0                ┆ 4.0              ┆ 4.0         ┆ 4.0          │
└──────┴──────────────┴─────────────┴──────────────┴────────────────────┴──────────────────┴─────────────┴──────────────┘
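
As a minimal sketch of the two tie-breaking rules compared in the f64 columns above: the Rust standard library already exposes both. `f64::round` rounds ties away from zero, and `f64::round_ties_even` (stable since Rust 1.77) rounds ties to the nearest even integer. The wrapper names `half_away_from_zero` and `half_to_even` are hypothetical and are not taken from this PR.

```rust
/// Round to the nearest integer, breaking ties away from zero.
/// This is the behaviour of the standard `f64::round`.
pub fn half_away_from_zero(v: f64) -> f64 {
    v.round()
}

/// Round to the nearest integer, breaking ties toward the even neighbour.
/// Uses `f64::round_ties_even` from the standard library.
pub fn half_to_even(v: f64) -> f64 {
    v.round_ties_even()
}
```

For example, `half_to_even(2.5)` gives `2.0`, while `half_away_from_zero(2.5)` gives `3.0`; the two only differ on exact ties.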

Open

  • I will just quote @orlp:

Thus I'd suggest the following names (and their half_ cousins):

  • to_even: to nearest even (default)
  • to_zero: to 0
  • away_from_zero: away from 0 (to +inf for positive, -inf for negative)
  • ceil: to +inf
  • floor: to -inf
  • stochastic: as explained earlier

IMO having consistent rounding across types and the two most used round modes will already be a great benefit and cover 99% of use-cases.
We can implement the other modes in the future if there is time and demand.

@github-actions github-actions bot added the enhancement, python, and rust labels Mar 21, 2025

codecov bot commented Mar 21, 2025

Codecov Report

Attention: Patch coverage is 54.83871% with 42 lines in your changes missing coverage. Please review.

Project coverage is 80.91%. Comparing base (0f53539) to head (bd039bb).
Report is 1 commits behind head on main.

Files with missing lines | Patch % | Lines
crates/polars-ops/src/series/ops/round.rs | 53.70% | 25 Missing ⚠️
.../polars-python/src/lazyframe/visitor/expr_nodes.rs | 0.00% | 12 Missing ⚠️
crates/polars-python/src/conversion/mod.rs | 50.00% | 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #21883      +/-   ##
==========================================
- Coverage   80.92%   80.91%   -0.02%     
==========================================
  Files        1624     1624              
  Lines      234643   234694      +51     
  Branches     2693     2693              
==========================================
+ Hits       189878   189893      +15     
- Misses      44133    44169      +36     
  Partials      632      632              


@orlp
Collaborator

orlp commented Mar 21, 2025

I think we need to do another pass over the names. I think half_away_from_zero is a bit too wordy, and I also just realized my usage of 'up' and 'down' conflicts with both Python and Java.

Can we discuss that in the issue? Also, please don't make me catch API changes in the review, please just stick with what the issue suggests or discuss changes in the issue first.

@Julian-J-S
Contributor Author

@orlp as discussed I adjusted the names and implemented

  • half_to_even (was missing for f32/f64)
  • half_away_from_zero (was missing for Decimal)
  • set half_to_even as default for both

please have a look 🤓

Comment on lines +1292 to +1295
match mode {
    RoundMode::HalfToEven => "half_to_even",
    RoundMode::HalfAwayFromZero => "half_away_from_zero",
},
Collaborator

Since the enum is annotated with strum, we can just use:

Into::<&str>::into(mode)

here, I think.
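
To illustrate the suggestion, here is a self-contained sketch that does not depend on strum. It hand-writes the `From<RoundMode> for &'static str` impl that a strum derive such as `IntoStaticStr` with snake_case serialization is assumed to generate. The enum definition and the `mode_name` helper are hypothetical stand-ins, not the PR's code.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RoundMode {
    HalfToEven,
    HalfAwayFromZero,
}

// Hand-written stand-in for the conversion a strum derive is assumed to provide.
impl From<RoundMode> for &'static str {
    fn from(mode: RoundMode) -> Self {
        match mode {
            RoundMode::HalfToEven => "half_to_even",
            RoundMode::HalfAwayFromZero => "half_away_from_zero",
        }
    }
}

/// With the conversion in place, call sites no longer need their own `match`.
pub fn mode_name(mode: RoundMode) -> &'static str {
    Into::<&str>::into(mode)
}
```

The point of the design is that the string names live in one place, so adding a mode later cannot leave a call site out of sync.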

Collaborator

Could you please bump the minor IR version in visit.rs (in the impl of NodeTraverser).

Comment on lines +406 to +418
#[pyclass(name = "RoundMode")]
pub struct PyRoundMode {
    inner: RoundMode,
}

#[pymethods]
impl PyRoundMode {
    #[getter]
    fn kind(&self) -> &str {
        self.inner.into()
    }
}

Collaborator

Then there is no need for these.

} else {
// Note we do the computation on f64 floats to not lose precision
// when the computation is done, we cast to f32
let multiplier = 10.0.pow(decimals as f64);
Collaborator

@orlp orlp Mar 31, 2025

You can use powi without casting the exponent. This applies to the other spots too.
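
A rough sketch of the `powi` suggestion, assuming the number of decimals is available as an `i32` (the exponent type `powi` expects). The function name `round_half_to_even` and its signature are illustrative only, and it works directly on `f64` rather than on the f32-via-f64 path in the snippet above.

```rust
/// Round `v` to `decimals` fractional digits, ties to even.
/// Hypothetical helper; not the PR's implementation.
pub fn round_half_to_even(v: f64, decimals: i32) -> f64 {
    if decimals == 0 {
        return v.round_ties_even();
    }
    // `powi` takes an integer exponent, so no float cast of `decimals` is needed.
    let multiplier = 10.0_f64.powi(decimals);
    (v * multiplier).round_ties_even() / multiplier
}
```

For instance, `round_half_to_even(1.25, 1)` scales to `12.5`, rounds the tie to `12.0`, and returns `1.2`.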

.apply_values(|v| {
// We use rounding=ROUND_HALF_EVEN
let res = match mode {
RoundMode::HalfToEven => ca.apply_values(|v| {
let rem = v % multiplier;
let is_v_floor_even = ((v - rem) / multiplier) % 2 == 0;
Collaborator

@orlp orlp Mar 31, 2025

Can you please avoid the extra division by doing a modulo with 2 * multiplier and some extra logic? i128 division/modulo is very very expensive.
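
One possible shape for this, as a sketch only: take a single Euclidean remainder modulo `2 * multiplier`, so that `v - r` is always an even multiple of `multiplier`, then pick among the three neighbouring multiples by comparing `2 * r` against `multiplier`. The function name is hypothetical, `v` is assumed to be the raw scaled `i128` of a decimal, `multiplier` is assumed to be a power of ten of at least 10, and overflow near the `i128` limits is ignored.

```rust
/// Round the scaled integer `v` to the nearest multiple of `multiplier`,
/// breaking ties toward the even multiple. Hypothetical helper.
pub fn round_half_to_even_scaled(v: i128, multiplier: i128) -> i128 {
    let two_m = 2 * multiplier;
    // The only i128 remainder operation: 0 <= r < 2 * multiplier.
    let r = v.rem_euclid(two_m);
    // `v - r` is an even multiple of `multiplier`.
    let even_below = v - r;
    if 2 * r <= multiplier {
        // At most half a step above the even multiple (a tie goes to even).
        even_below
    } else if 2 * r < 3 * multiplier {
        // Strictly closest to the odd multiple in the middle.
        even_below + multiplier
    } else {
        // At least halfway toward the next even multiple (a tie goes to even).
        even_below + two_m
    }
}
```

With scale 1 and `multiplier = 10`, the raw values `25` and `35` (that is, 2.5 and 3.5) map to `20` and `40`, matching the `d_to_even` column in the PR description.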

Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Development

Successfully merging this pull request may close these issues.

Add RoundMode to round and fix rounding inconsistency (Decimal uses "half_even", float uses "up")
3 participants