[Back-end]: Replace pandas with polars #1402

jorgenherje · 2025-12-18T13:09:29Z

Remove usage of Pandas and replace it with Polars.

Note: this PR does not replace Pandas in the flow network access and service layer. That is handled in the following PR: #1395

Replace Pandas with Polars in:

- Summary vector statistics calculation
- Parameter utils
- PVT converter

TODO:

- Verify the statistics calculations/aggregations in Polars against the old Pandas+numpy version.
- Remove Pandas from lib/services/pyproject.toml if #1395 ([Fix]: Create correct flow network per tree types, and make tree type selectable in front-end) is merged to main first.

Closes: #1401

- Summary vector statistics calc
- Parameter utils
- PVT converter

HansKallekleiv

👍
Approved, with a few comments after reading up on Polars.
Probably not important as everything is fairly fast, but might be worth thinking about at some point.
I think we should try to avoid things like loops and lambdas when working with Polars, as they force data to be serialized between Rust and Python.

backend_py/libs/services/src/webviz_services/sumo_access/parameter_access.py

backend_py/libs/services/src/webviz_services/summary_vector_statistics.py

backend_py/primary/primary/routers/pvt/converters.py

jorgenherje · 2025-12-19T13:30:14Z

During testing of the statistics calculations I found deviations between the results of the old and new implementations.

Have to verify statistics calculations/aggregation for Polars compared to old version with Pandas+numpy. Old verison uses numpy.nanpercentile, numpy.nanmean, etc for calculating statistics for the summary vectors. The new Polars version has pl.col().percentile(), pl.col().mean() etc. These has numerical difference, and in some cases the mean calc seem to vary a lot. Testing states that this is not due to downcast from f64 to f32, but points towards difference in algorithms for mean and percentile?

@sigurdp recalls that numpy was used instead of Pandas' own statistics functions because of how the Pandas algorithms worked. Perhaps the same applies to Polars, e.g. whether percentiles are calculated from the entire array or estimated from parts of the data?

jorgenherje · 2026-01-06T10:01:54Z

Further testing shows that with numpy, the aggregations are performed in the same precision as the input data, i.e. if the input is float32, the mean and percentiles are computed using float32. This causes numerical inaccuracy compared to Polars, which seems to cast the data to float64 internally and cast the result back to the input format.

Both Polars and numpy aggregations were tested with float32 and float64 input tables. Polars gives the same numerical result for both precisions, whereas the numpy results differ between float32 and float64 input. This is attributed to Polars performing the actual mean calculation in float64 even when the input is float32.
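The numpy side of this can be illustrated in isolation. A minimal sketch with arbitrary values: float32 has a 24-bit significand, so it cannot represent every integer above 2**24, and for floating-point input numpy's `mean`/`nanmean` use the input dtype unless a `dtype` is given, so float32 data produces a float32 result.

```python
import numpy as np

# float32 has a 24-bit significand: 2**24 + 1 is not representable, so the
# float32 sum rounds back to 2**24 (round-half-to-even).
big32 = np.float32(2**24)
one32 = np.float32(1.0)
sum32 = big32 + one32

# The same addition carried out in float64 is exact.
sum64 = np.float64(big32) + np.float64(one32)

# For floating-point input, numpy computes the mean in the input dtype by
# default, so a float32 array yields a float32 result.
values32 = np.array([1.5, 2.5, np.nan, 4.0], dtype=np.float32)
mean32 = np.nanmean(values32)

print(sum32, sum64, mean32, mean32.dtype)
```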

During testing, I got the same results with Polars and Pandas+numpy when I cast the array to float64 before aggregating the statistics in the Pandas-based algorithm.
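In numpy the cast can be expressed in two ways. A minimal sketch, assuming a realizations-by-dates matrix as an illustrative layout: `nanmean` accepts a `dtype` argument for the computation, but `nanpercentile` has no such argument, so casting once with `astype(np.float64)` is the simpler way to get float64 for every statistic.

```python
import numpy as np

# Hypothetical float32 matrix: rows are realizations, columns are dates.
values32 = np.array(
    [[1.0, 10.0], [2.0, 20.0], [np.nan, 40.0], [4.0, 80.0]], dtype=np.float32
)

# Option 1: cast once, then aggregate; every statistic is computed in float64.
values64 = values32.astype(np.float64)
mean_cast = np.nanmean(values64, axis=0)
p90_cast = np.nanpercentile(values64, 90, axis=0)

# Option 2: request float64 from the reduction itself. nanmean supports
# dtype=, but nanpercentile does not, so percentiles still need the cast.
mean_dtype = np.nanmean(values32, axis=0, dtype=np.float64)

print(mean_cast, mean_dtype, p90_cast)
```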

Conclusion:
Use Polars, and do not cast to float64, as the Polars aggregation methods seem to handle the precision internally.

Replace pandas with polars

1da0c2c

- Summary vector statistics calc
- Parameter utils
- PVT converter

jorgenherje self-assigned this Dec 18, 2025

jorgenherje requested review from HansKallekleiv, rubenthoms and sigurdp December 18, 2025 13:09

jorgenherje added the "2025 EOY release" and "enhancement" (New feature or request) labels Dec 18, 2025

Fix linting/formatting

dae6869

jorgenherje marked this pull request as ready for review December 18, 2025 13:37

jorgenherje mentioned this pull request Dec 18, 2025

[Fix]: Create correct flow network per tree types, and make tree type selectable in front-end #1395 (Open, 1 task)

Add downcast of float64

e0260ab

HansKallekleiv approved these changes Dec 19, 2025


jorgenherje marked this pull request as draft December 19, 2025 08:14

Adjust according to review

86f311d

jorgenherje added 2 commits January 6, 2026 11:07

Minor adjustment of statistics after testing and verification

0558f39

Adjust doc for readability

2aa2322

jorgenherje marked this pull request as ready for review January 6, 2026 12:04
