Enable ruff pandas ruleset #12546
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #12546      +/-   ##
==========================================
- Coverage   90.62%   90.61%   -0.02%
==========================================
  Files         432      432
  Lines       29738    29739       +1
==========================================
- Hits        26951    26948       -3
- Misses       2787     2791       +4
Flags with carried forward coverage won't be shown.
CodSpeed Performance Report
Merging #12546 will not alter performance
* What it does
Checks for uses of `.values` on Pandas Series and Index objects.
* Why is this bad?
The `.values` attribute is ambiguous as its return type is unclear. As such,
it is no longer recommended by the Pandas documentation.
Instead, use `.to_numpy()` to return a NumPy array, or `.array` to return a
Pandas `ExtensionArray`.
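The ambiguity can be seen in a small sketch. The Series below are made-up illustration data, not taken from this PR: `.values` returns a different kind of object depending on the dtype, while `.to_numpy()` and `.array` each have a fixed kind of result.

```python
import numpy as np
import pandas as pd

# Hypothetical example data (not from this PR).
categorical = pd.Series(["a", "b", "a"], dtype="category")
integers = pd.Series([1, 2, 3])

# `.values` depends on the dtype: an ExtensionArray for categorical data,
# a plain NumPy array for integer data.
legacy_categorical = categorical.values
legacy_integers = integers.values

# The explicit accessors state what the caller wants.
as_numpy = categorical.to_numpy()  # a NumPy array
as_extension = categorical.array   # a pandas ExtensionArray
```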
* What it does
Checks for `inplace=True` usages in `pandas` function and method calls.
* Why is this bad?
Using `inplace=True` encourages mutation rather than immutable data, which is
harder to reason about and may cause bugs. It also removes the ability to use
the method chaining style for `pandas` operations.
Further, in many cases, `inplace=True` does not provide a performance benefit,
as `pandas` will often copy `DataFrames` in the background.
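As a rough illustration of the chaining style this rule favours, the sketch below uses a made-up DataFrame (the column names and values are assumptions). Each step returns a new object, so the steps compose into a single expression and the input is left untouched.

```python
import pandas as pd

# Hypothetical example data (not from this PR).
df = pd.DataFrame(
    {
        "city": ["Oslo", "Bergen", None],
        "population": [700_000.0, 290_000.0, 1.0],
    }
)

# Each call returns a new DataFrame, so the calls can be chained.
cleaned = (
    df.dropna(subset=["city"])
    .rename(columns={"population": "pop"})
    .reset_index(drop=True)
)
```

The original `df` still has three rows and its original column names after this runs, which makes the transformation easier to reason about than a sequence of `inplace=True` calls.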
* What it does
Checks for uses of `pd.merge` on Pandas objects.
* Why is this bad?
In Pandas, the `.merge` method (exposed on, e.g., `DataFrame` objects) and the
`pd.merge` function (exposed on the Pandas module) are equivalent.
For consistency, prefer calling `.merge` on an object over calling `pd.merge`
on the Pandas module, as the former is more idiomatic.
Further, `pd.merge` is not a method, but a function, which prohibits it from
being used in method chains, a common pattern in Pandas code.
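A minimal sketch of the equivalence, using two made-up frames (names and values are illustrative only):

```python
import pandas as pd

# Hypothetical example data (not from this PR).
left = pd.DataFrame({"key": [1, 2], "a": ["x", "y"]})
right = pd.DataFrame({"key": [1, 2], "b": [10, 20]})

# Function form, flagged by the rule.
via_function = pd.merge(left, right, on="key")

# Method form, which can continue into a method chain.
via_method = left.merge(right, on="key").sort_values("b", ascending=False)
```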
* What it does
Checks for uses of `.nunique()` to check if a Pandas Series is constant
(i.e., contains only one unique value).
* Why is this bad?
`.nunique()` is computationally inefficient for checking if a Series is
constant.
Consider, for example, a Series of length `n` that consists of increasing
integer values (e.g., 1, 2, 3, 4). The `.nunique()` method will iterate
over the entire Series to count the number of unique values. But in this
case, we can detect that the Series is non-constant after visiting the
first two values, which are non-equal.
In general, `.nunique()` requires iterating over the entire Series, while a
more efficient approach allows short-circuiting the operation as soon as a
non-equal value is found.
Instead of calling `.nunique()`, convert the Series to a NumPy array, and
check if all values in the array are equal to the first observed value.
```python
import pandas as pd

data = pd.Series(range(1000))
if data.nunique() <= 1:
    print("Series is constant")
```
Use instead:
```python
import pandas as pd

data = pd.Series(range(1000))
array = data.to_numpy()
# An empty Series counts as constant; otherwise compare every value to the first.
if array.shape[0] == 0 or (array[0] == array).all():
    print("Series is constant")
```
- [Pandas Cookbook: "Constant Series"](https://pandas.pydata.org/docs/user_guide/cookbook.html#constant-series)
- [Pandas documentation: `nunique`](https://pandas.pydata.org/docs/reference/api/pandas.Series.nunique.html)
Unfortunately, there is a bug in ruff where it thinks a `pl.DataFrame` is a `pd.DataFrame`.
# pandas-use-of-dot-pivot-or-unstack (PD010)
Derived from the **pandas-vet** linter.
## What it does
Checks for uses of `.pivot` or `.unstack` on Pandas objects.
## Why is this bad?
Prefer `.pivot_table` to `.pivot` or `.unstack`. `.pivot_table` is more general
and can be used to implement the same behavior as `.pivot` and `.unstack`.
## Example
```python
import pandas as pd
df = pd.read_csv("cities.csv")
df.pivot(index="city", columns="year", values="population")
```
Use instead:
```python
import pandas as pd
df = pd.read_csv("cities.csv")
df.pivot_table(index="city", columns="year", values="population")
```
## References
- [Pandas documentation: Reshaping and pivot tables](https://pandas.pydata.org/docs/user_guide/reshaping.html)
- [Pandas documentation: `pivot_table`](https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html#pandas.pivot_table)
Recommended by ruff rule PD010. This change has been verified to give exactly the same dataframe in all invocations of this test function in the test suite.
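For context on why such a verification is worthwhile, here is a small sketch with made-up data standing in for `cities.csv`. The two calls agree when every (index, columns) pair is unique, but they diverge on duplicates: `pivot` raises, while `pivot_table` aggregates (using the mean by default).

```python
import pandas as pd

# Hypothetical data; each (city, year) pair occurs exactly once.
df = pd.DataFrame(
    {
        "city": ["Bergen", "Bergen", "Oslo", "Oslo"],
        "year": [2020, 2021, 2020, 2021],
        "population": [285.9, 287.0, 697.0, 699.8],
    }
)

wide_pivot = df.pivot(index="city", columns="year", values="population")
wide_table = df.pivot_table(index="city", columns="year", values="population")
same_result = wide_pivot.equals(wide_table)

# Add a second Bergen/2020 row so that one pair is duplicated.
dup = pd.concat([df, df.iloc[[0]].assign(population=300.0)], ignore_index=True)

try:
    dup.pivot(index="city", columns="year", values="population")
    pivot_raised = False
except ValueError:
    # `pivot` cannot reshape when an (index, columns) pair repeats.
    pivot_raised = True

# `pivot_table` silently averages the duplicated pair instead.
averaged = dup.pivot_table(index="city", columns="year", values="population")
```

So swapping `pivot` for `pivot_table` is only behaviour-preserving when the input has no duplicate pairs, which is the kind of property a check against the test suite can confirm.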
Unfortunately, bugs in ruff (e.g. 2143) require noqa statements where they really should not be needed: ruff mistakes polars objects for pandas objects.
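A sketch of what such a suppression looks like. `LongTable` is a hypothetical stand-in for a non-pandas table type (such as a polars DataFrame), used only to keep the example self-contained. The assumption here is that the linter cannot tell from the call site that the object is not a pandas one, so the call is silenced on that line only.

```python
class LongTable:
    """Hypothetical stand-in for a non-pandas table type."""

    def __init__(self, rows):
        self.rows = rows

    def pivot(self, index, columns, values):
        # Build a nested dict: {index value: {column value: cell value}}.
        wide = {}
        for row in self.rows:
            wide.setdefault(row[index], {})[row[columns]] = row[values]
        return wide


table = LongTable(
    [
        {"city": "Oslo", "year": 2020, "population": 1.0},
        {"city": "Oslo", "year": 2021, "population": 2.0},
    ]
)

# Not a pandas object, so PD010 does not apply; suppress it for this line only.
wide = table.pivot(index="city", columns="year", values="population")  # noqa: PD010
```

A per-line `# noqa: PD010` keeps the rule active everywhere else in the file.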
andreas-el left a comment
💯
Issue
Resolves Ruff pandas violations
Merge #12545 first!
Approach
Fix or ignore in the case of false positives (ruff bugs)
git rebase -i main --exec 'just rapid-tests')
When applicable