How many data points needed for the model? #1391
Replies: 3 comments
**Subject: Practical Guidance on Data Points, Effects, and Overfitting Diagnostics**

Hi @jayjoshi33,

Great questions. While Meridian leverages priors to handle data sparsity better than frequentist methods, a sufficient signal-to-noise ratio is crucial for the model to learn from the data rather than simply reproduce your priors.

**1. Recommended data points per effect**

Your interpretation of the ratio is directionally correct, but it helps to be precise about what constitutes "data points" (the number of time periods) versus "effects" (the number of free parameters). The formula:

data points per effect ≈ #time periods ÷ (#media channels + #controls + #AKS knots)

**2. What counts as an effect?**

Yes, every independent variable that consumes a degree of freedom counts against your "budget."

**3. Overfitting with AKS and diagnostics**

Diagnostics to detect overfitting should go beyond R²/MAPE.

Hope this helps!

Reference: https://developers.google.com/meridian/docs/pre-modeling/amount-data-needed
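The ratio described above can be sketched in a few lines. This is an illustrative calculation only; the function name and the example counts (156 weekly periods, 6 channels, 4 controls, 5 knots) are assumptions for the sketch, not values from Meridian or from a real model.

```python
# Hypothetical sketch of the data-points-per-effect ratio discussed above.
# All names and numbers here are illustrative assumptions.

def data_points_per_effect(n_time_periods: int,
                           n_media_channels: int,
                           n_controls: int,
                           n_knots: int) -> float:
    """Ratio of time periods to free parameters ("effects")."""
    n_effects = n_media_channels + n_controls + n_knots
    return n_time_periods / n_effects

# Example: 156 weekly data points (3 years), 6 media channels,
# 4 controls, 5 AKS knots -> 156 / 15 = 10.4 points per effect.
ratio = data_points_per_effect(156, 6, 4, 5)
print(round(ratio, 1))  # 10.4
```

Adding a channel, control, or knot shrinks this ratio, which is why AKS flexibility directly increases data requirements.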
Hi @jayjoshi33,

Thank you for contacting us! To answer your questions: you may check our documentation on assessing the model fit results and the baseline for more information. Feel free to reach out if you have any further questions or suggestions.

Thank you,
Google Meridian Support Team
Your intuition is directionally correct, but a better rule is:

effective data per effect ≈ effective observations ÷ free parameters

Free parameters include:

- Media channels
- Control variables
- Seasonality / holidays / trend
- AKS knots

Rules of thumb:

- National model: 20–30 time points per effect
- Geo / hierarchical model: 30–50 observations per effect
- AKS always increases data requirements
Yes, they all count:

- Non-media factors (seasonality, holidays, trend)
- Reach / frequency formulations
- Organic variables (only if carefully modeled; otherwise they often absorb media signal)

If organic variables are included, data needs roughly double.
Watch for:

- Unstable response curves across refits
- Implausible saturation or elasticities
- ROAS flipping with small data changes
- Posteriors collapsing to priors

Safe defaults:

- ≤ 3–4 AKS knots per channel
- ≥ 15–20 time points per knot
- Keep effective obs : parameters ≥ 10:1

Final note: most MMMs don't fail at modeling; they fail earlier at data quality, structure, and labeling. At SOIAA, we currently support teams with free data annotation and structuring for MMM inputs (geo splits, campaign metadata, controls), so models don't overfit before they even start. Happy to collaborate if data is your current bottleneck.
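The 10:1 rule of thumb above is easy to automate as a pre-modeling sanity check. This is a minimal sketch under assumed inputs; the function and the example counts (10 geos, 104 weeks, 30 free parameters) are hypothetical and not part of any Meridian API.

```python
# Illustrative check of the "effective obs : parameters >= 10:1" rule of thumb.
# Function name and example numbers are assumptions, not Meridian API calls.

def passes_ten_to_one(n_geos: int, n_time_periods: int, n_free_params: int) -> bool:
    """For a geo model, effective observations ~= geos x time periods."""
    effective_obs = n_geos * n_time_periods
    return effective_obs / n_free_params >= 10

# 10 geos x 104 weeks = 1040 effective observations vs. 30 free parameters
print(passes_ten_to_one(10, 104, 30))  # True (ratio is about 34.7)
```

Running a check like this before fitting makes the data-sufficiency tradeoff explicit: if it fails, either extend the data window, add geos, or prune channels, controls, and knots.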
Looking for practical numeric guidance on model complexity vs data sufficiency in Meridian.
Recommended data points per effect

Is this interpretation correct?

data per effect ≈ #time periods ÷ (#media channels + #controls + #AKS knots)
What counts as an effect?
Overfitting with AKS
AKS improves fit metrics, but it also increases model flexibility. What are the recommended signals or diagnostics for detecting overfitting beyond a better fit (e.g., unstable ROIs, implausible response curves, posterior issues)?
Any rule-of-thumb numbers (e.g., 5–10–15 data points per effect, knot caps) would be very helpful for production use.
Thanks!