@berland berland commented Dec 23, 2025

Issue
Resolves Ruff pandas violations

Merge #12545 first!

Approach
Fix or ignore in the case of false positives (ruff bugs)

  • PR title captures the intent of the changes, and is fitting for release notes.
  • Added appropriate release note label
  • Commit history is consistent and clean, in line with the contribution guidelines.
  • Make sure unit tests pass locally after every commit (git rebase -i main --exec 'just rapid-tests')

When applicable

  • When there are user facing changes: Updated documentation
  • New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
  • Large PR: Prepare changes in small commits for more convenient review
  • Bug fix: Add regression test for the bug
  • Bug fix: Add backport label to latest release (format: 'backport release-branch-name')

@berland berland self-assigned this Dec 23, 2025
@berland berland added the release-notes:maintenance Automatically categorise as maintenance change in release notes label Dec 23, 2025
@berland berland added this to SCOUT Dec 23, 2025
@berland berland moved this to In Progress in SCOUT Dec 23, 2025

codecov-commenter commented Dec 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.61%. Comparing base (dc78922) to head (e01a714).

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #12546      +/-   ##
==========================================
- Coverage   90.62%   90.61%   -0.02%     
==========================================
  Files         432      432              
  Lines       29738    29739       +1     
==========================================
- Hits        26951    26948       -3     
- Misses       2787     2791       +4     
Flag Coverage Δ
cli-tests 37.58% <8.33%> (-0.01%) ⬇️
gui-tests 68.69% <50.00%> (+<0.01%) ⬆️
performance-and-unit-tests 74.12% <75.00%> (-0.01%) ⬇️
test 38.32% <0.00%> (-0.02%) ⬇️



codspeed-hq bot commented Dec 23, 2025

CodSpeed Performance Report

Merging #12546 will not alter performance

Comparing berland:ruff_pd (e01a714) with main (dc78922)

Summary

✅ 22 untouched

* What it does
Checks for uses of `.values` on Pandas Series and Index objects.

* Why is this bad?
The `.values` attribute is ambiguous as its return type is unclear. As
such, it is no longer recommended by the Pandas documentation.

Instead, use `.to_numpy()` to return a NumPy array, or `.array` to return a
Pandas `ExtensionArray`.
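
To illustrate the ambiguity described above, here is a minimal sketch (the Series contents and variable names are made up for illustration, not taken from this PR). It contrasts `.values` with the two explicit alternatives on a categorical Series, where the difference is visible.

```python
import numpy as np
import pandas as pd

series = pd.Series([1, 2, 3], name="x")

# Explicit alternatives: the return type is clear from the call itself.
as_numpy = series.to_numpy()  # a NumPy ndarray
as_array = series.array       # a pandas ExtensionArray

# With an extension dtype, `.values` does not return a NumPy array.
categorical = pd.Series(["a", "b", "a"], dtype="category")
ambiguous = categorical.values      # a pandas Categorical
explicit = categorical.to_numpy()   # a NumPy ndarray of the labels

print(type(ambiguous).__name__, type(explicit).__name__)
```

Which replacement fits depends on whether the calling code needs NumPy semantics or wants to keep the pandas extension type.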
* What it does
Checks for `inplace=True` usages in `pandas` function and method
calls.

* Why is this bad?
Using `inplace=True` encourages mutation rather than immutable data,
which is harder to reason about and may cause bugs. It also removes the
ability to use the method chaining style for `pandas` operations.

Further, in many cases, `inplace=True` does not provide a performance
benefit, as `pandas` will often copy `DataFrames` in the background.
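
As a rough sketch of the chaining argument above (the frame and column names are hypothetical, not from this codebase): an `inplace=True` call returns `None`, so nothing can be chained onto it, while the reassigning form composes and leaves the input untouched.

```python
import pandas as pd

raw = pd.DataFrame(
    {"city": ["Oslo", "Bergen", None], "population": [700, 290, 5]}
)

# Flagged style: the call mutates its target and returns None.
scratch = raw.copy()
result_of_inplace = scratch.dropna(subset=["city"], inplace=True)

# Preferred style: each step returns a new frame, so steps can be chained.
cleaned = (
    raw.dropna(subset=["city"])
    .rename(columns={"population": "population_thousands"})
    .reset_index(drop=True)
)

print(result_of_inplace)  # None
print(cleaned)
```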
* What it does
Checks for uses of `pd.merge` on Pandas objects.

* Why is this bad?
In Pandas, the `.merge` method (exposed on, e.g., `DataFrame` objects) and
the `pd.merge` function (exposed on the Pandas module) are equivalent.

For consistency, prefer calling `.merge` on an object over calling
`pd.merge` on the Pandas module, as the former is more idiomatic.

Further, `pd.merge` is not a method, but a function, which prohibits it
from being used in method chains, a common pattern in Pandas code.
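
A small sketch of the difference, using made-up frames (`cities` and `countries` are illustrative only): both forms give the same result, but only the method form slots into a chain.

```python
import pandas as pd

cities = pd.DataFrame(
    {"city": ["Oslo", "Bergen"], "country_code": ["NO", "NO"]}
)
countries = pd.DataFrame({"country_code": ["NO"], "country": ["Norway"]})

# Function form, which the rule flags.
via_function = pd.merge(cities, countries, on="country_code", how="left")

# Method form: reads left to right and continues as a method chain.
merged = (
    cities.merge(countries, on="country_code", how="left")
    .sort_values("city")
    .reset_index(drop=True)
)

print(merged)
```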
* What it does
Checks for uses of `.nunique()` to check if a Pandas Series is constant
(i.e., contains only one unique value).

* Why is this bad?
`.nunique()` is computationally inefficient for checking if a Series is
constant.

Consider, for example, a Series of length `n` that consists of increasing
integer values (e.g., 1, 2, 3, 4). The `.nunique()` method will iterate
over the entire Series to count the number of unique values. But in this
case, we can detect that the Series is non-constant after visiting the
first two values, which are non-equal.

In general, `.nunique()` requires iterating over the entire Series, while a
more efficient approach allows short-circuiting the operation as soon as a
non-equal value is found.

Instead of calling `.nunique()`, convert the Series to a NumPy array, and
check if all values in the array are equal to the first observed value.

```python
import pandas as pd

data = pd.Series(range(1000))
if data.nunique() <= 1:
    print("Series is constant")
```

Use instead:
```python
import pandas as pd

data = pd.Series(range(1000))
array = data.to_numpy()
if array.shape[0] == 0 or (array[0] == array).all():
    print("Series is constant")
```

- [Pandas Cookbook: "Constant Series"](https://pandas.pydata.org/docs/user_guide/cookbook.html#constant-series)
- [Pandas documentation: `nunique`](https://pandas.pydata.org/docs/reference/api/pandas.Series.nunique.html)
Unfortunately, there is a bug in ruff where it mistakes a pl.DataFrame for a pd.DataFrame.

# pandas-use-of-dot-pivot-or-unstack (PD010)

Derived from the **pandas-vet** linter.

## What it does
Checks for uses of `.pivot` or `.unstack` on Pandas objects.

## Why is this bad?
Prefer `.pivot_table` to `.pivot` or `.unstack`. `.pivot_table` is more general
and can be used to implement the same behavior as `.pivot` and `.unstack`.

## Example
```python
import pandas as pd

df = pd.read_csv("cities.csv")
df.pivot(index="city", columns="year", values="population")
```

Use instead:
```python
import pandas as pd

df = pd.read_csv("cities.csv")
df.pivot_table(index="city", columns="year", values="population")
```

## References
- [Pandas documentation: Reshaping and pivot tables](https://pandas.pydata.org/docs/user_guide/reshaping.html)
- [Pandas documentation: `pivot_table`](https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html#pandas.pivot_table)
Recommended by ruff rule PD010

This change has been verified to give exactly the same dataframe in all
invocations of this test function in the test suite.
Unfortunately, bugs in ruff (e.g. 2143) require noqa statements where
they really should not be needed: ruff mistakes polars objects for
pandas objects.
@berland berland moved this from In Progress to Ready for Review in SCOUT Dec 29, 2025

@andreas-el andreas-el left a comment

💯

@github-project-automation github-project-automation bot moved this from Ready for Review to Reviewed in SCOUT Jan 6, 2026
@berland berland merged commit 2a0015b into equinor:main Jan 6, 2026
35 checks passed
@github-project-automation github-project-automation bot moved this from Reviewed to Done in SCOUT Jan 6, 2026