Skip to content

fix: Treat series literals as list type #4370

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 30, 2025
Merged

Conversation

colin-ho
Copy link
Contributor

@colin-ho colin-ho commented May 17, 2025

Changes Made

For series literals created from public api, wrap them in a list array. This is so that we don't get any broadcasting issues.
Additionally adds the ability to create List literals, i.e. daft.lit([1,2,3])

Related Issues

Fixes #3287 and #1992

Checklist

  • Documented in API Docs (if applicable)
  • Documented in User Guide (if applicable)
  • If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
  • Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)

@github-actions github-actions bot added the fix label May 17, 2025
@colin-ho colin-ho force-pushed the colin/hide-series-lit branch from 25990e6 to b55f7a1 Compare May 17, 2025 01:43
Copy link

codecov bot commented May 17, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (main@b606d6c). Learn more about missing BASE report.
Report is 22 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #4370   +/-   ##
=======================================
  Coverage        ?   77.50%           
=======================================
  Files           ?      846           
  Lines           ?   114644           
  Branches        ?        0           
=======================================
  Hits            ?    88856           
  Misses          ?    25788           
  Partials        ?        0           
Files with missing lines Coverage Δ
daft/expressions/expressions.py 96.85% <100.00%> (ø)
daft/series.py 91.92% <100.00%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@colin-ho colin-ho requested a review from kevinzwang May 19, 2025 16:46
Copy link
Member

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that series to literal conversion is problematic right now, however I am worried that disabling/hiding it would be a breaking change.

Do you know where series literals are actually being used? Should we just treat them as list type?

@colin-ho
Copy link
Contributor Author

colin-ho commented May 27, 2025

Agreed that series to literal conversion is problematic right now, however I am worried that disabling/hiding it would be a breaking change.

Do you know where series literals are actually being used? Should we just treat them as list type?

The series lists are used kinda everywhere.

On the expressions side, they are used in is_in and count_matches, ofc as well as just being able to do daft.lit(series).

They are also used internally, such as for adding generated / partition fields on reads (i.e. adding a column of partition values from catalog / adding a column for file path) and in the SQL parser for tuples.

To me, the main benefit of this is that we can wrap series in expressions, such that when we eval the expression (in functions that expect expressions) it just returns the series.

Tbh the main problem i just wanna solve is i don't want user to do with_column("foo", daft.lit(series)) and then it blows up cuz of length mismatch.

If they are tryna broadcast a list, something like df.with_column("foo", daft.lit([1,2,3])) can be done with our Expr::List instead of series lit, which is possible today.

@colin-ho
Copy link
Contributor Author

Just adding on, i think the current behavior of doing with_column("foo", daft.lit(series)), and treating that series as a new column must be removed. It's very jank, especially in a distributed setting.

Like u said, instead of removing it, we can keep it the ability to do daft.lit(series), and instead wrap it in a list / fixed size list, which is ur suggestion: #3287

Either way it will be breaking change tho because we are changing the behavior (i guess this can be considered a bug fix too? lets just err on the safe side and put this in 0.5)

@colin-ho colin-ho force-pushed the colin/hide-series-lit branch from c08c846 to f5128e5 Compare May 27, 2025 02:10
@colin-ho colin-ho changed the title fix: Hide series literals fix: Treat series literals as list type May 28, 2025
@colin-ho colin-ho requested a review from kevinzwang May 29, 2025 20:29
@colin-ho colin-ho merged commit 25ee6fb into main May 30, 2025
47 checks passed
@colin-ho colin-ho deleted the colin/hide-series-lit branch May 30, 2025 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

daft.lit(val) where val is of type daft.Series has incorrect behavior
2 participants