How many data points needed for the model? #1391
Replies: 3 comments
**Subject: Practical Guidance on Data Points, Effects, and Overfitting Diagnostics**

Hi @jayjoshi33,

Great questions. While Meridian leverages priors to handle data sparsity better than frequentist methods, a sufficient signal-to-noise ratio is crucial for the model to learn from the data rather than simply reproduce your priors.

**1. Recommended data points per effect**

Your interpretation of the ratio is directionally correct, but it helps to be precise about what constitutes "data points" (the number of time periods) versus "effects" (the number of free parameters). The formula:

data points per effect ≈ #time periods ÷ (#media channels + #controls + #AKS knots)

**2. What counts as an effect?**

Yes, every independent variable that consumes a degree of freedom counts against your "budget."

**3. Overfitting with AKS and diagnostics**

Diagnostics to detect overfitting should go beyond R²/MAPE.

Hope this helps!

Reference: https://developers.google.com/meridian/docs/pre-modeling/amount-data-needed
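The ratio described above can be sketched in a few lines. This is an illustrative calculation only; the function name and the example counts (156 weekly periods, 6 channels, 4 controls, 5 knots) are assumptions for the sketch, not values from Meridian or from a real model.

```python
# Hypothetical sketch of the data-points-per-effect ratio discussed above.
# All names and numbers here are illustrative assumptions.

def data_points_per_effect(n_time_periods: int,
                           n_media_channels: int,
                           n_controls: int,
                           n_knots: int) -> float:
    """Ratio of time periods to free parameters ("effects")."""
    n_effects = n_media_channels + n_controls + n_knots
    return n_time_periods / n_effects

# Example: 156 weekly data points (3 years), 6 media channels,
# 4 controls, 5 AKS knots -> 156 / 15 = 10.4 points per effect.
ratio = data_points_per_effect(156, 6, 4, 5)
print(round(ratio, 1))  # 10.4
```

Adding a channel, control, or knot shrinks this ratio, which is why AKS flexibility directly increases data requirements.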
Hi @jayjoshi33,

Thank you for contacting us! To answer your questions: you may check our documentation on assessing the model fit results and the baseline for more information. Feel free to reach out if you have any further questions or suggestions.

Thank you,
Google Meridian Support Team
Your intuition is directionally correct, but a better rule is:

effective data per effect ≈ effective observations ÷ free parameters

Free parameters include:

- Media channels
- Control variables
- Seasonality / holidays / trend
- AKS knots

Rules of thumb:

- National model: 20–30 time points per effect
- Geo / hierarchical model: 30–50 observations per effect
- AKS always increases data requirements
Yes, they all count:

- Non-media factors (seasonality, holidays, trend)
- Reach / frequency formulations
- Organic variables (only if carefully modeled; otherwise they often absorb media signal)

If organic variables are included, data needs roughly double.
Watch for:

- Unstable response curves across refits
- Implausible saturation or elasticities
- ROAS flipping with small data changes
- Posteriors collapsing to priors

Safe defaults:

- ≤ 3–4 AKS knots per channel
- ≥ 15–20 time points per knot
- Keep effective obs : parameters ≥ 10:1

Final note: most MMMs don't fail at modeling; they fail earlier at data quality, structure, and labeling. At SOIAA, we currently support teams with free data annotation and structuring for MMM inputs (geo splits, campaign metadata, controls), so models don't overfit before they even start. Happy to collaborate if data is your current bottleneck.
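The 10:1 rule of thumb above is easy to automate as a pre-modeling sanity check. This is a minimal sketch under assumed inputs; the function and the example counts (10 geos, 104 weeks, 30 free parameters) are hypothetical and not part of any Meridian API.

```python
# Illustrative check of the "effective obs : parameters >= 10:1" rule of thumb.
# Function name and example numbers are assumptions, not Meridian API calls.

def passes_ten_to_one(n_geos: int, n_time_periods: int, n_free_params: int) -> bool:
    """For a geo model, effective observations ~= geos x time periods."""
    effective_obs = n_geos * n_time_periods
    return effective_obs / n_free_params >= 10

# 10 geos x 104 weeks = 1040 effective observations vs. 30 free parameters
print(passes_ten_to_one(10, 104, 30))  # True (ratio is about 34.7)
```

Running a check like this before fitting makes the data-sufficiency tradeoff explicit: if it fails, either extend the data window, add geos, or prune channels, controls, and knots.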
Looking for practical numeric guidance on model complexity vs data sufficiency in Meridian.
Recommended data points per effect

Is this interpretation correct?

data per effect ≈ #time periods ÷ (#media channels + #controls + #AKS knots)
What counts as an effect?
Overfitting with AKS
AKS improves fit metrics, but it also increases model flexibility. What are the recommended signals or diagnostics for detecting overfitting beyond a better fit (e.g., unstable ROIs, implausible response curves, posterior issues)?
Any rule-of-thumb numbers (e.g., 5–10–15 data points per effect, knot caps) would be very helpful for production use.
Thanks!