feat(python/sedonadb): add DataFrame.drop by jiayuasu · Pull Request #871 · apache/sedona-db

jiayuasu · 2026-05-23T07:41:49Z

Continues Phase P2 of #791 with DataFrame.drop, the smallest of the remaining P2 ops.

API

df.drop("a")
df.drop("a", "b", "c")

- Varargs of column names (no `columns=` kwarg). This is the same pattern as `sort()` from #859 (feat(python/sedonadb): add DataFrame.sort with composable SortExpr): it matches DataFusion-python, Ibis, and Polars, and avoids the pandas-style keyword.
- Strings only. Expr arguments are rejected at the Python boundary, since "drop a computed expression" has no meaning at the schema level.
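A minimal sketch of these boundary checks, written as a hypothetical free function `check_drop_args` rather than the PR's actual `DataFrame.drop` method. The error messages are placeholders. Because the signature is `*cols`, no extra code is needed to reject `columns=`: Python itself raises `TypeError` for the unknown keyword.

```python
from typing import Tuple


def check_drop_args(*cols: object) -> Tuple[str, ...]:
    """Hypothetical sketch of the argument rules described above.

    - at least one name is required (ValueError otherwise)
    - every name must be a str; an Expr or any other object is a TypeError
    """
    if not cols:
        raise ValueError("drop() requires at least one column name")
    for c in cols:
        if not isinstance(c, str):
            raise TypeError(
                f"drop() accepts column names as str, got {type(c).__name__}"
            )
    # Every element was checked above, so this is a tuple of str.
    return tuple(cols)  # type: ignore[return-value]
```

For example, `check_drop_args("a", "b")` returns `("a", "b")`, while `check_drop_args(columns=["a"])` fails with Python's standard "unexpected keyword argument" `TypeError` before the function body runs.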

Unknown-column behavior

Worth calling out: DataFusion's `DataFrame::drop_columns` is permissive; it silently no-ops on names that aren't in the schema, which hides typos. To match user expectations (and pandas' `KeyError` behavior), this PR validates the column names Python-side and raises a `KeyError` listing the available columns when any name is missing. The exact message format is pinned by `test_drop_unknown_column_raises_keyerror`.

This differs from select/filter, where DataFusion itself validates at plan-build time and includes "Valid fields are X, Y" in the error. The asymmetry is forced by DataFusion's silent-no-op default on `drop_columns`; the workaround costs one Python-side schema lookup per call, which is cheap.
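The difference between the permissive default and the validated behavior can be modeled on a plain list of column names. Both functions are hypothetical illustrations, not sedonadb or DataFusion code, and the `KeyError` wording is a placeholder rather than the pinned message.

```python
from typing import List, Sequence


def permissive_drop(schema: Sequence[str], cols: Sequence[str]) -> List[str]:
    """Models the permissive default: names not in the schema are ignored."""
    to_drop = set(cols)
    # Surviving columns keep their original schema order.
    return [name for name in schema if name not in to_drop]


def validated_drop(schema: Sequence[str], cols: Sequence[str]) -> List[str]:
    """Checks every requested name against the schema before dropping."""
    known = set(schema)  # a single pass over the schema per call
    missing = [c for c in cols if c not in known]
    if missing:
        # Placeholder wording; the PR's real message is pinned by
        # test_drop_unknown_column_raises_keyerror and may differ.
        raise KeyError(
            f"Column(s) not found: {missing}. Available columns: {list(schema)}"
        )
    return permissive_drop(schema, cols)
```

With a typo such as `"gemo"`, `permissive_drop(["id", "geom"], ["gemo"])` returns `["id", "geom"]` unchanged, while `validated_drop` raises a `KeyError` that names the typo and lists both available columns.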

Implementation

| File | Change |
|---|---|
| `python/sedonadb/src/dataframe.rs` | New `InternalDataFrame::drop_columns(Vec<String>)`. Materializes a `Vec<&str>` and calls DataFusion's `DataFrame::drop_columns`. Step-by-step comments. |
| `python/sedonadb/python/sedonadb/dataframe.py` | `DataFrame.drop(*cols: str)`. Validates non-empty, str-only, and known columns. |
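The Python/Rust split in the table can be sketched in pure Python. `FakeInternalDataFrame` and `SketchDataFrame` are hypothetical stand-ins: the real `InternalDataFrame` is a Rust type that materializes a `Vec<&str>` and calls DataFusion's `DataFrame::drop_columns`, and the real Python class is `DataFrame`. The sketch only shows the layering: the wrapper validates, hands a plain list of strings across, and wraps the returned plan in a new object.

```python
from typing import List


class FakeInternalDataFrame:
    """Hypothetical stand-in for the Rust-backed InternalDataFrame.

    Here a "plan" is just a list of column names, so the sketch runs
    without sedonadb or DataFusion installed.
    """

    def __init__(self, columns: List[str]) -> None:
        self.columns = list(columns)

    def drop_columns(self, cols: List[str]) -> "FakeInternalDataFrame":
        # Mirrors the described DataFusion behavior: unknown names are
        # ignored and the remaining columns keep their order.
        to_drop = set(cols)
        return FakeInternalDataFrame([c for c in self.columns if c not in to_drop])


class SketchDataFrame:
    """Hypothetical Python wrapper: validate first, then delegate."""

    def __init__(self, impl: FakeInternalDataFrame) -> None:
        self._impl = impl

    @property
    def columns(self) -> List[str]:
        return list(self._impl.columns)

    def drop(self, *cols: str) -> "SketchDataFrame":
        if not cols:
            raise ValueError("drop() requires at least one column name")
        if not all(isinstance(c, str) for c in cols):
            raise TypeError("drop() accepts only str column names")
        missing = [c for c in cols if c not in self._impl.columns]
        if missing:
            raise KeyError(f"{missing} not found; available: {self._impl.columns}")
        # Owned Python strings cross the boundary as a list. In the real
        # binding the result is a new lazy plan; here it is a new name list.
        return SketchDataFrame(self._impl.drop_columns(list(cols)))
```

Returning a fresh wrapper instead of mutating `self` keeps the original frame usable, which is the shape the "lazy return" test checks for with `isinstance`.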

Test plan

9 tests in tests/expr/test_dataframe_drop.py:

- Positive: single-column, multi-column, column-order preservation.
- Lazy return: isinstance(out, DataFrame).
- Errors: empty args → ValueError; non-str arg → TypeError; Expr arg → TypeError; columns= kwarg → Python's unexpected-keyword TypeError; unknown column → KeyError with exact pinned message listing available columns.
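One way to pin an exact `KeyError` message and check column order in plain tests, sketched against a hypothetical toy `drop_names` function (not sedonadb; the message text is a placeholder). Note that `str()` of a `KeyError` wraps its message in quotes, so comparing `exc.args[0]` pins the text exactly.

```python
from typing import List, Sequence


def drop_names(schema: Sequence[str], *cols: str) -> List[str]:
    """Hypothetical toy drop over a list of column names (not sedonadb)."""
    missing = [c for c in cols if c not in schema]
    if missing:
        raise KeyError(f"Columns not found: {missing}. Available: {list(schema)}")
    return [name for name in schema if name not in cols]


def test_drop_preserves_column_order() -> None:
    # Dropping in a different order than the schema must not reorder survivors.
    assert drop_names(["a", "b", "c", "d"], "c", "a") == ["b", "d"]


def test_drop_unknown_column_message() -> None:
    try:
        drop_names(["a", "b"], "z")
    except KeyError as exc:
        # str(KeyError) is the repr of the message (quoted), so pin args[0].
        assert exc.args[0] == "Columns not found: ['z']. Available: ['a', 'b']"
        assert str(exc) == repr(exc.args[0])
    else:
        raise AssertionError("expected KeyError")


def test_drop_rejects_columns_kwarg() -> None:
    try:
        drop_names(["a"], columns=["a"])  # type: ignore[call-arg]
    except TypeError as exc:
        assert "unexpected keyword argument" in str(exc)
    else:
        raise AssertionError("expected TypeError")


if __name__ == "__main__":
    test_drop_preserves_column_order()
    test_drop_unknown_column_message()
    test_drop_rejects_columns_kwarg()
```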

Locally: 9 unit tests, 19 doctests, ruff format, and ruff check all pass.

Pandas-style column drop on the lazy DataFrame, matching the varargs/non-pandas-keyword pattern locked in the sort PR.

API:

df.drop("a")
df.drop("a", "b", "c")

- Varargs of column names. No `columns=` kwarg; Python's standard unexpected-keyword TypeError covers misuse.
- Strings only. Expr arguments are rejected at the Python boundary: drop is a schema op, not an expression op, and `df.drop(col("x") + col("y"))` has no meaning.
- Empty args raise ValueError; non-str args raise TypeError.

Unknown-column behavior: DataFusion's `drop_columns` is permissive and silently no-ops on names that aren't in the schema, which hides typos. We validate Python-side and raise a `KeyError` listing the available columns instead, matching pandas. The exact KeyError message is locked by `test_drop_unknown_column_raises_keyerror`.

Rust side: `InternalDataFrame::drop_columns` is a thin wrapper that materializes a `Vec<&str>` and calls DataFusion's `DataFrame::drop_columns`. Step-by-step comments explain why we accept owned strings from Python and borrow at the call boundary.

Tests cover single-column, multi-column, column-order preservation, lazy return, both error paths (empty / non-str), Expr-arg rejection, the kwarg rejection, and the typo-protecting KeyError.

github-actions Bot requested a review from prantogg May 23, 2026 07:42


jiayuasu wants to merge 1 commit into apache:main from jiayuasu:feature/df-drop
