Narwhals support for CLV aggregation #1809

Draft
williambdean wants to merge 6 commits into main from narwhals-support

@williambdean
Contributor

@williambdean williambdean commented Jul 3, 2025

Description

Still a work in progress.

The LazyFrame like libraries will require a provided observation_end_date. However, that can be found outside of the

Still building out the functionality for the:

  • remove first observation - mean with nans might differ in the backends
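
One way a mean over NaNs can differ between libraries, sketched here with pandas and NumPy as stand-ins (an assumption made to keep the example small; the backends this PR targets are not exercised): pandas skips NaN by default, while NumPy's plain mean propagates it.

```python
import numpy as np
import pandas as pd

# Illustrative values only; not taken from the PR's data.
values = [10.0, float("nan"), 30.0]

# pandas: Series.mean() uses skipna=True by default, so the NaN is ignored.
pandas_mean = pd.Series(values).mean()

# NumPy: np.mean propagates NaN, so the result is NaN.
numpy_mean = np.mean(np.array(values))

# NumPy: np.nanmean ignores NaN explicitly, matching the pandas default.
numpy_nanmean = np.nanmean(np.array(values))

print(pandas_mean, numpy_mean, numpy_nanmean)
```

If the backends disagree in this way, normalizing missing values before aggregating is one possible way to align the results.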

Related Issue

Checklist


📚 Documentation preview 📚: https://pymc-marketing--1809.org.readthedocs.build/en/1809/

@codecov

codecov bot commented Jul 3, 2025

Codecov Report

❌ Patch coverage is 26.31579% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 28.71%. Comparing base (d4a90fc) to head (4991fc8).

Files with missing lines | Patch % | Lines
pymc_marketing/clv/utils.py | 26.31% | 14 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (d4a90fc) and HEAD (4991fc8).

HEAD has 16 fewer uploads than BASE.

Flag | BASE (d4a90fc) | HEAD (4991fc8)
 | 23 | 7
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1809       +/-   ##
===========================================
- Coverage   92.51%   28.71%   -63.81%     
===========================================
  Files          71       71               
  Lines       10279    10297       +18     
===========================================
- Hits         9510     2957     -6553     
- Misses        769     7340     +6571     

@juanitorduz
Collaborator

OMG! 100% yes! 🥳

@ColtAllen
Collaborator

ColtAllen commented Jul 15, 2025

> OMG! 100% yes! 🥳

Indeed, thanks for starting this!

Do you think the current pandas functions should still be retained for a time even after this is merged? Also, it seems like the PR description in the original message requires more details.

@williambdean
Contributor Author

> Do you think the current pandas functions should still be retained for a time even after this is merged? Also, it seems like the PR description in the original message requires more details.

I was just doing some comparisons of the two at the moment. However, I think that the new one should just take its place.

@williambdean
Contributor Author

Maybe @ColtAllen is interested in taking this over?

@FBruzzesi FBruzzesi left a comment

Hey @williambdean 👋🏼 I just found out about this PR 🔥 I left a couple of comments that might help 😇

Comment on lines +306 to +307
if observation_period_end is None:
    observation_period_end = transactions[datetime_col].cast(nw.Datetime).max()

I am not sure if this would work/is supported, but you might try to do:

Suggested change
- if observation_period_end is None:
-     observation_period_end = transactions[datetime_col].cast(nw.Datetime).max()
+ if observation_period_end is None:
+     observation_period_end = pl.col("max").max()

to get the global max datetime value.

This might also help to avoid this requirement:

> The LazyFrame like libraries will require a provided observation_end_date. However, that can be found outside of the

) -> IntoFrameT:
    transactions = nw.from_native(transactions)

    date = nw.col(datetime_col).cast(nw.Datetime)

This is very tempting, but consider creating a new column between operations - I would be afraid that for pandas the casting happens multiple times instead of once
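
The idea of converting once and reusing the result can be sketched in plain pandas (used here only to keep the example dependency-light; the column names and data are hypothetical). In narwhals the analogous step would presumably be a `with_columns` call that stores the cast column before the later operations.

```python
import pandas as pd

# Hypothetical transactions with dates stored as strings.
transactions = pd.DataFrame(
    {
        "customer_id": [1, 1, 2],
        "date": ["2025-01-05", "2025-02-10", "2025-01-20"],
    }
)

# Parse the dates a single time and keep the result as a real column,
# instead of repeating the conversion inside each later aggregation.
prepared = transactions.assign(date_parsed=pd.to_datetime(transactions["date"]))

# Both aggregations reuse the already-parsed column.
summary = (
    prepared.groupby("customer_id")["date_parsed"]
    .agg(first_purchase="min", last_purchase="max")
    .reset_index()
)

print(summary)
```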


customers = (
    nw.from_native(repeated_transactions)
    .group_by(customer_id_col)

For some time now, it should be possible to pass an expression so that you can avoid the renaming down in the pipeline, but it's definitely more of a personal preference 😇

Suggested change
- .group_by(customer_id_col)
+ .group_by(nw.col(customer_id_col).alias("customer_id"))
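
The trade-off between renaming later in the pipeline and naming the key at grouping time can be illustrated with a plain pandas analogue. This is an assumption-laden sketch: the data and column names are hypothetical, and the narwhals expression form in the suggestion above is not exercised here.

```python
import pandas as pd

# Hypothetical data; "cust" stands in for a user-supplied customer_id_col.
transactions = pd.DataFrame(
    {"cust": ["a", "a", "b"], "amount": [10.0, 5.0, 7.5]}
)

# Option 1: group on the original name, then rename further down the pipeline.
renamed_later = (
    transactions.groupby("cust")["amount"]
    .sum()
    .reset_index()
    .rename(columns={"cust": "customer_id"})
)

# Option 2: give the grouping key its final name at grouping time.
renamed_at_grouping = (
    transactions.groupby(transactions["cust"].rename("customer_id"))["amount"]
    .sum()
    .reset_index()
)

print(renamed_later.equals(renamed_at_grouping))
```

Both options produce the same frame, so the choice is mostly about readability.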

@ColtAllen
Collaborator

ColtAllen commented Dec 26, 2025

> Maybe @ColtAllen is interested in taking this over?

Yes - I wrote all the CLV agg utilities, and will reach out to @FBruzzesi as needed to drive this one home. We're still roadmapping what will be included in version 1.0, but I think this could be a great addition to it!
