-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Open
Labels
A-sqlArea: Polars SQL functionalityArea: Polars SQL functionalitybugSomething isn't workingSomething isn't workingneeds triageAwaiting prioritization by a maintainerAwaiting prioritization by a maintainerrustRelated to Rust PolarsRelated to Rust Polars
Description
Checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of Polars.
Reproducible example
#[test]
fn iss_avg_over_order_by_cumulative() {
// Test that AVG() OVER (ORDER BY ... ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
// produces a running/cumulative average.
//
// Currently broken: AVG uses visit_unary(Expr::mean) which doesn't have a cumulative
// equivalent like SUM/MIN/MAX do (cum_sum, cum_min, cum_max).
//
// The manual workaround using SUM()/ROW_NUMBER() works correctly.
let df = df! {
"foo" => [1, 2, 3, 4, 5],
"bar" => [10.0, 20.0, 30.0, 40.0, 50.0]
}
.unwrap()
.lazy();
let mut ctx = SQLContext::new();
ctx.register("df", df);
// This should compute cumulative average: [10, 15, 20, 25, 30]
let query_avg = r#"
SELECT
foo,
bar,
AVG(bar) OVER (
ORDER BY foo
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS mean_avg
FROM df
ORDER BY foo
"#;
// This workaround computes cumulative average correctly: [10, 15, 20, 25, 30]
let query_manual = r#"
SELECT
foo,
bar,
SUM(bar) OVER (
ORDER BY foo
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) / ROW_NUMBER() OVER (
ORDER BY foo
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS mean_manual
FROM df
ORDER BY foo
"#;
let df_avg = ctx.execute(query_avg).unwrap().collect().unwrap();
let df_manual = ctx.execute(query_manual).unwrap().collect().unwrap();
// Expected cumulative averages: 10/1=10, 30/2=15, 60/3=20, 100/4=25, 150/5=30
let expected_means = vec![10.0, 15.0, 20.0, 25.0, 30.0];
// Check that manual calculation produces correct results
let manual_means: Vec<f64> = df_manual
.column("mean_manual")
.unwrap()
.f64()
.unwrap()
.into_no_null_iter()
.collect();
assert_eq!(
manual_means, expected_means,
"Manual SUM/ROW_NUMBER calculation should produce cumulative average"
);
// Check that AVG OVER produces the same results (currently fails)
let avg_means: Vec<f64> = df_avg
.column("mean_avg")
.unwrap()
.f64()
.unwrap()
.into_no_null_iter()
.collect();
assert_eq!(
avg_means, expected_means,
"AVG() OVER (ORDER BY ... ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) should produce cumulative average"
);
}Log output
Issue description
AVG function in SQL engine does not have a cumulative option and therefore does not work with windows (returning the average of all rows instead). There's a workaround with sum and row_number over the same window which works, but I believe fixing avg is worth too.
Expected behavior
Queries in the example return the same data.
Installed versions
Details
working with a master version of polars directly in polars-sql crateReactions are currently unavailable
Metadata
Metadata
Assignees
Labels
A-sqlArea: Polars SQL functionalityArea: Polars SQL functionalitybugSomething isn't workingSomething isn't workingneeds triageAwaiting prioritization by a maintainerAwaiting prioritization by a maintainerrustRelated to Rust PolarsRelated to Rust Polars