-
Hey team, so I tried to fit the data using the BG/NBD and Pareto/NBD models, and it looks like they're not fitting well for reasons I don't quite understand. I thought the issue might be an extremely heavy-tailed distribution of frequencies, but that doesn't seem to be the case. The posterior predictive check also looks good. What might be the issue, and how would you approach it? Thanks in advance!
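For reference, here is a minimal sketch of the kind of fit being described, using pymc-marketing's clv module (file and column names here are placeholders, not the original code):

```python
import pandas as pd
from pymc_marketing import clv

# Placeholder path; the model expects an RFM summary frame with
# customer_id, frequency, recency, and T columns.
data = pd.read_csv("rfm_summary.csv")

# Fit the BG/NBD model; clv.ParetoNBDModel can be swapped in the same way.
model = clv.BetaGeoModel(data=data)
model.fit()
print(model.fit_summary())
```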
-
Thanks for sharing.
-
Hey @lixgl, looks like your model was only fit to the past year of data, but I see transactions going back more than three years in the CSV you provided. Excluding those earlier years will bias the results. I also see some strong weekly trends in your graph, so you may want to aggregate by weeks rather than days. On that note, is there a reason why you didn't use …
This is correct. The transaction models assume a lot of non-repeat customers, so excluding them isn't recommended.
-
@lixgl Since you're using the whole data set for training, are you validating your forecast against a holdout set that isn't included in the data you uploaded? @ColtAllen Thanks for this clarification.
The tutorials for pymc-marketing are consistent but don't necessarily call this out explicitly. This may merit a separate conversation, but I've encountered situations where fitting a model fails with a divide-by-zero error, and they are always fixed by removing customer records with 0 frequency or 0 monetary_value. Is there guidance on how to avoid the divide-by-zero error otherwise?
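For concreteness, the workaround looks something like this (a sketch with assumed column names; per the note above, the unfiltered frame should still be used for the transaction models):

```python
import pandas as pd

# Hypothetical path; columns follow the usual RFM summary layout.
rfm = pd.read_csv("rfm_summary.csv")

# Keep the full frame (including non-repeat customers) for BG/NBD or
# Pareto/NBD. Only the spend model gets the filtered rows: records with
# frequency == 0 or monetary_value == 0 are what trigger the
# divide-by-zero.
repeat_only = rfm[(rfm["frequency"] > 0) & (rfm["monetary_value"] > 0)]
```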
-
So the first plot is just for the first year of results, then?
-
Aggregating by week would only rescale the T and recency variables (i.e., T = 52 instead of T = 365); it would not change the size of the dataset. To reduce the amount of effort required, I would suggest using clv.rfm_summary for this.
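Something along these lines (a sketch; column names are placeholders and keyword arguments may vary by version):

```python
import pandas as pd
from pymc_marketing import clv

# Raw transaction log with one row per purchase (placeholder path/columns).
transactions = pd.read_csv("transactions.csv")

# time_unit="W" expresses recency and T in weeks (T = 52 rather than
# T = 365) while building the RFM summary in one step.
rfm = clv.rfm_summary(
    transactions,
    customer_id_col="customer_id",
    datetime_col="date",
    monetary_value_col="amount",
    time_unit="W",
)
```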