Skip to content

feat(DRAFT): Add Expr.sort_by #2547

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 29 commits into
base: main
Choose a base branch
from
Draft

feat(DRAFT): Add Expr.sort_by #2547

wants to merge 29 commits into from

Conversation

dangotbanned
Copy link
Member

Closes #2534

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Important

Very early version, eager only, but solves the issue in (#2528 (comment))

  • Pushing this mostly as a proof-of-concept.
  • Still need to work through how this integrates with the lazy .over(order_by=...) rule

- Towards #2534
- `Series.sort_by` still doesn't exist
- Skipping 3 examples that we can't do yet
- Probably best to trim this down later
- Just repurposed `test_sort_expr`
- Wonder if we want to allow sorting by overlapping columns?
Need to flesh out the cases more, but the structure is good for now
Prrety sure `test_sort_by_self` could work, but the simpler case is already handled
Redefined `ArrowExpr.sort_by` in terms that could be used in `EagerExpr`
Reusing other high-level apis has the nice benefit of error handling for free 😏
Exactly the behavior needed for `first()` #2528 (comment)
@dangotbanned dangotbanned added the enhancement New feature or request label May 13, 2025
@dangotbanned
Copy link
Member Author

dangotbanned commented May 15, 2025

If anyone comes across this, I'm putting on a pause on it to first explore:
(#2483 (comment)), (#2483 (comment)), (#2483 (comment)), (#2534 (comment))

Some very early notes:

  • Things are coming together quite quickly, since we have the rust impl of polars to draw from
  • A lot of the complexity of ExprMetadata, _expression_parsing.py is just capturing rules that would only appear on specific nodes/groups
    • We might only need a small subset of the options in (#polars/22573) to appear on every node/function

Hoping to have something more concrete to show off later this week πŸ™

For now, here's some fluffier guidance I'm trying to work towards

Brainstorming an Expr internal represention.

Notes

  • Each Expr method should be representable by a single node
    • But the node does not need to be unique to the method
  • A chain of Expr methods should form a plan of operations
  • We must be able to enforce rules on what plans are permitted:
    • Must be flexible to both eager/lazy and individual backends
    • Must be flexible to a given context (select, with_columns, filter, group_by)
  • Nodes & plans are:
    • Immutable, but
      • Can be extended/re-written at both the Narwhals & Compliant levels
    • Introspectable, but
      • Store as little-as-needed for the common case
      • Provide properties/methods for computing the less frequent metadata

References

Related

WIP branch

(https://github.com/narwhals-dev/narwhals/tree/oh-nodes)

dangotbanned added a commit that referenced this pull request May 15, 2025
dangotbanned added a commit that referenced this pull request May 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Enh]: Add Expr.sort_by
1 participant