
Commit d838554 (parent: ad01931)

feat: new article

4 files changed (+350, -10 lines)

_posts/-_ideas/2030-01-01-future_articles_time_series.md

Lines changed: 0 additions & 10 deletions
```diff
@@ -25,16 +25,6 @@ Here are several article ideas that would complement the ARIMAX time series mode
 - Provide examples and code implementation in Python. -->
 
 
-### 7. **"Prophet: A Modern Approach to Time Series Forecasting Developed by Facebook"**
-- Introduce the Prophet model developed by Facebook, which is designed to handle seasonality and holidays with ease.
-- Discuss how it differs from ARIMA, its ease of use for non-experts, and how it handles missing data and seasonality.
-- Provide code examples and case studies of its use in industry.
-
-### 8. **"Evaluating Time Series Forecasting Models: Metrics and Best Practices"**
-- Discuss how to evaluate the performance of time series models using metrics like RMSE, MAE, AIC, BIC, and MAPE.
-- Explain model validation techniques such as cross-validation and out-of-sample testing.
-- Provide a guide to interpreting results and improving model performance.
-
 ### 9. **"Causality and Granger Causality in Time Series Data"**
 - Explore the concept of Granger Causality and how it is used to determine if one time series can predict another.
 - Provide mathematical explanations, real-world examples (e.g., economic indicators), and implementation in R or Python.
```
Lines changed: 334 additions & 0 deletions
@@ -0,0 +1,334 @@
---
title: 'Evaluating Time Series Forecasting Models: Metrics and Best Practices'
categories:
  - Time Series
  - Model Evaluation
  - Forecasting
tags:
  - RMSE
  - MAE
  - MAPE
  - model validation
  - rolling-origin evaluation
  - forecasting metrics
author_profile: false
seo_title: Evaluating Time Series Forecasting Models
seo_description: >-
  A comprehensive guide to evaluating time series forecasts using metrics like
  RMSE, MAE, AIC, and BIC. Learn validation strategies, practical coding
  examples, and best practices.
excerpt: >-
  Effective model evaluation is essential for reliable time series forecasting.
  Learn the most important metrics, validation methods, and strategies for
  interpreting and improving forecasts.
summary: >-
  This article explores the metrics and methods used to evaluate time series
  forecasting models. Covering RMSE, MAE, AIC/BIC, cross-validation techniques,
  and residual analysis, it helps practitioners ensure their models are robust,
  accurate, and actionable.
keywords:
  - forecast accuracy
  - time series model evaluation
  - validation strategies
  - MAE and RMSE
  - forecasting best practices
classes: wide
date: '2025-08-03'
header:
  image: /assets/images/data_science_9.jpg
  og_image: /assets/images/data_science_9.jpg
  overlay_image: /assets/images/data_science_9.jpg
  show_overlay_excerpt: false
  teaser: /assets/images/data_science_9.jpg
  twitter_image: /assets/images/data_science_9.jpg
---

# Introduction

Time series forecasting is one of the most challenging yet essential tasks in data science, econometrics, engineering, and applied research. Accurate forecasts can drive business decisions, inform policy, optimize resources, and improve operational efficiency. However, building a forecasting model is only half the battle. The real challenge lies in **evaluating** the model's performance to ensure it is both reliable and actionable.

Evaluation of time series models requires a careful selection of performance metrics and validation techniques that account for the unique structure of time-dependent data. Unlike cross-sectional problems, time series data are ordered and often autocorrelated, meaning standard evaluation methods like random shuffling for cross-validation are not directly applicable.

This article provides a comprehensive overview of how to evaluate time series forecasting models. We discuss the most commonly used performance metrics such as RMSE, MAE, MAPE, AIC, and BIC, highlight best practices for model validation (including rolling-origin evaluation and time series cross-validation), and provide guidance on interpreting results for model improvement. Practical examples in Python and R are included for hands-on understanding.

# 1\. Importance of Evaluation in Time Series Forecasting

Evaluation ensures that forecasts are accurate, reliable, and robust under real-world conditions. Without proper evaluation:

- Models may appear accurate in-sample but fail out-of-sample.
- Forecasts may be biased, systematically over- or under-predicting.
- Models may overfit, capturing noise instead of true signals.

Therefore, evaluation is not just a technical step but a critical component of the forecasting process.

# 2\. Forecast Accuracy Metrics

Performance metrics quantify how well a model's predictions match the observed values. Below are the most widely used metrics for time series forecasting.

## 2.1 Mean Absolute Error (MAE)

The **MAE** measures the average absolute difference between forecasted and actual values:

$$ MAE = \frac{1}{n} \sum_{t=1}^n |y_t - \hat{y}_t| $$

- Easy to interpret (same units as the data).
- More robust to outliers than squared-error metrics.

**Python Example:**

```python
from sklearn.metrics import mean_absolute_error

# Average absolute error between observed and forecasted values
mae = mean_absolute_error(y_true, y_pred)
```

**R Example:**

```r
mae <- mean(abs(y_true - y_pred))
```

## 2.2 Root Mean Squared Error (RMSE)

The **RMSE** is the square root of the average squared differences:

$$ RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^n (y_t - \hat{y}_t)^2} $$

- Penalizes large errors more heavily than MAE.
- Commonly used in competitions and benchmarks.

**Python Example:**

```python
import numpy as np
from sklearn.metrics import mean_squared_error

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
```

**R Example:**

```r
rmse <- sqrt(mean((y_true - y_pred)^2))
```

## 2.3 Mean Absolute Percentage Error (MAPE)

The **MAPE** expresses errors as percentages:

$$ MAPE = \frac{100}{n} \sum_{t=1}^n \left| \frac{y_t - \hat{y}_t}{y_t} \right| $$

- Intuitive since it reports error in percentage terms.
- Unstable when actual values are close to zero, and undefined when any actual value equals zero.

**Python Example:**

```python
import numpy as np

mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
```

**R Example:**

```r
mape <- mean(abs((y_true - y_pred) / y_true)) * 100
```

## 2.4 Symmetric Mean Absolute Percentage Error (sMAPE)

The **sMAPE** adjusts for asymmetry by dividing each error by the average magnitude of the forecast and the actual value:

$$ sMAPE = \frac{100}{n} \sum_{t=1}^n \frac{|y_t - \hat{y}_t|}{(|y_t| + |\hat{y}_t|)/2} $$

- Avoids extreme errors when values are small.
- Widely used in competitions like M3 and M4.

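For parity with the metrics above, a minimal NumPy sketch of sMAPE (there is no direct sklearn equivalent; `y_true` and `y_pred` are the same illustrative arrays used in the earlier examples):

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric MAPE in percent; assumes y_true and y_pred are NumPy arrays."""
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2
    return np.mean(np.abs(y_true - y_pred) / denom) * 100
```
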
## 2.5 Mean Absolute Scaled Error (MASE)

Proposed by Hyndman & Koehler (2006), the **MASE** compares forecast errors against a naive benchmark:

$$ MASE = \frac{MAE}{MAE_{naive}} $$

- Scale-independent and comparable across series.
- Values greater than 1 indicate worse performance than the naive forecast; values below 1 indicate better performance.

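A minimal sketch of MASE for the non-seasonal case, assuming `train`, `test`, and `forecast` are NumPy arrays and the benchmark is the one-step naive (random walk) forecast evaluated on the training data:

```python
import numpy as np

def mase(train, test, forecast):
    """MASE with a one-step naive benchmark computed in-sample on the training series."""
    naive_mae = np.mean(np.abs(np.diff(train)))       # MAE of the naive forecast on training data
    return np.mean(np.abs(test - forecast)) / naive_mae
```
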
## 2.6 Information Criteria: AIC and BIC

While MAE and RMSE measure predictive accuracy, **AIC** (Akaike Information Criterion) and **BIC** (Bayesian Information Criterion) assess model quality by balancing goodness of fit and complexity:

$$ AIC = 2k - 2\ln(L) $$

$$ BIC = k\ln(n) - 2\ln(L) $$

where $k$ is the number of parameters, $L$ the likelihood, and $n$ the number of observations.

- Lower AIC/BIC values indicate better models.
- BIC penalizes complexity more strongly than AIC.

**Python Example:**

```python
import statsmodels.api as sm

model = sm.tsa.ARIMA(y, order=(1, 1, 1)).fit()
print(model.aic, model.bic)
```

**R Example:**

```r
model <- arima(y, order=c(1,1,1))
AIC(model); BIC(model)
```

# 3\. Model Validation Techniques

Metrics alone are insufficient; how we validate the model matters equally. Standard random cross-validation is inappropriate for time series due to temporal dependencies.

## 3.1 Train-Test Split

Divide the series into training (first portion) and testing (last portion). Fit the model on training and evaluate forecasts on testing.

- Simple and intuitive.
- Risk: results may depend heavily on the split point.

## 3.2 Rolling-Origin Evaluation (Walk-Forward Validation)

Iteratively expand the training set forward in time, testing on the next observation or small batch. Repeat until the end of the series.

- Mimics real-world forecasting where new data arrives sequentially.
- Provides multiple evaluation points.

**Python Example:**

```python
from sklearn.model_selection import TimeSeriesSplit

ts_cv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in ts_cv.split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # fit and evaluate the model here
```

**R Example:**

```r
library(caret)
ts_cv <- createTimeSlices(1:length(y), initialWindow=100, horizon=10)
```

## 3.3 Time Series Cross-Validation

Generalizes rolling-origin evaluation by systematically evaluating across multiple folds while preserving temporal order.

- Provides more robust performance estimates.
- Can be computationally expensive.

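Putting the pieces together, a minimal walk-forward sketch that refits an ARIMA(1,1,1) on each expanding training window and averages the fold-level MAE; the random-walk series `y` here is simulated purely for illustration:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_absolute_error

np.random.seed(0)
y = np.cumsum(np.random.normal(0, 1, 200))   # illustrative random-walk series

fold_mae = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(y):
    fit = sm.tsa.ARIMA(y[train_idx], order=(1, 1, 1)).fit()   # refit on the expanding window
    forecast = fit.forecast(steps=len(test_idx))              # forecast the held-out fold
    fold_mae.append(mean_absolute_error(y[test_idx], forecast))

print(np.mean(fold_mae))   # average out-of-sample MAE across folds
```

Averaging across folds reduces the dependence on any single split point noted in Section 3.1.
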
## 3.4 Out-of-Sample Testing

Keep the last part of the data entirely untouched until the final evaluation. This prevents information from the holdout period from leaking into model selection and guards against overfitting to the test set.

# 4\. Interpreting Results and Improving Performance

## 4.1 Comparing Metrics

No single metric is universally best. Always evaluate multiple metrics:

- Use RMSE for penalizing large errors.
- Use MAE for a robust absolute error measure.
- Use MAPE/sMAPE for percentage errors (careful near zeros).
- Use MASE for scale-independent comparisons.
- Use AIC/BIC for model parsimony.

## 4.2 Bias and Residual Analysis

Inspect the residuals:

- Residuals should resemble white noise (uncorrelated, zero mean, constant variance).
- Autocorrelation in the residuals suggests underfitting.
- Non-constant variance suggests the need for variance-stabilizing transformations or GARCH-type models.

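As a quick diagnostic sketch (assuming the fitted results object `model` from the ARIMA example in Section 2.6), the Ljung-Box test checks for leftover autocorrelation and an ACF plot makes it visible:

```python
import matplotlib.pyplot as plt
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.graphics.tsaplots import plot_acf

resid = model.resid                       # residuals of the fitted ARIMA model
print(acorr_ljungbox(resid, lags=[10]))   # small p-value signals remaining autocorrelation
plot_acf(resid, lags=20)                  # spikes outside the bands indicate underfitting
plt.show()
```
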
## 4.3 Model Improvement Strategies

- **Hyperparameter tuning:** Explore ARIMA orders, neural network architectures, etc.
- **Feature engineering:** Add external regressors (ARIMAX, VARX).
- **Transformation:** Log or Box-Cox to stabilize variance.
- **Ensemble methods:** Combine forecasts for improved accuracy.
- **Hybrid approaches:** Blend statistical and machine learning models.

## 4.4 Avoiding Overfitting

- Keep models simple unless complexity is justified.
- Use out-of-sample validation rigorously.
- Monitor performance drift as new data arrives.

# 5\. Case Study: Forecasting and Evaluation in Practice

## 5.1 Data Simulation

```python
import numpy as np

np.random.seed(0)
t = np.arange(100)
# Linear trend + seasonality with period 12 + Gaussian noise
y = 0.5*t + 10*np.sin(2*np.pi*t/12) + np.random.normal(0, 3, 100)
```

## 5.2 Train-Test Split

```python
train, test = y[:80], y[80:]
```

## 5.3 Fit Model

```python
import statsmodels.api as sm

model = sm.tsa.ARIMA(train, order=(2, 1, 2)).fit()
forecast = model.forecast(steps=len(test))   # returns the point forecasts directly
```

## 5.4 Evaluate

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

mae = mean_absolute_error(test, forecast)
rmse = np.sqrt(mean_squared_error(test, forecast))
```

The resulting metrics quantify out-of-sample performance and guide further adjustments to the model.

# 6\. Advanced Considerations

## 6.1 Forecast Horizons

Errors often grow with horizon length. Evaluate short-term and long-term forecasts separately.

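As a minimal sketch of horizon-wise evaluation, the loop below rolls the forecast origin forward, records the absolute error at each horizon step, and reports a per-horizon MAE; the simulated series and ARIMA order are illustrative only:

```python
import numpy as np
import statsmodels.api as sm

np.random.seed(1)
y = np.cumsum(np.random.normal(0, 1, 150))   # illustrative series
horizon = 5
errors = {h: [] for h in range(1, horizon + 1)}

# Roll the forecast origin forward and record the error at each horizon step
for origin in range(100, len(y) - horizon, 10):
    fit = sm.tsa.ARIMA(y[:origin], order=(1, 1, 1)).fit()
    fc = fit.forecast(steps=horizon)
    for h in range(1, horizon + 1):
        errors[h].append(abs(y[origin + h - 1] - fc[h - 1]))

for h, errs in errors.items():
    print(f"h={h}: MAE={np.mean(errs):.2f}")   # MAE typically grows with h
```
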
## 6.2 Probabilistic Forecasts

Instead of point forecasts, models can generate prediction intervals or full predictive distributions. Evaluate with metrics like:

- **Coverage Probability:** Proportion of true values within prediction intervals.
- **CRPS (Continuous Ranked Probability Score):** Measures accuracy of full distributions.

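As a minimal sketch of empirical interval coverage, reusing the ARIMA fit and `train`/`test` split from Section 5 (statsmodels exposes interval bounds through `get_forecast` and `conf_int`):

```python
# Empirical coverage of a nominal 95% prediction interval
pred = model.get_forecast(steps=len(test))
ci = np.asarray(pred.conf_int(alpha=0.05))    # columns: lower bound, upper bound
lower, upper = ci[:, 0], ci[:, 1]
coverage = np.mean((test >= lower) & (test <= upper))
print(f"Empirical coverage: {coverage:.2%}")  # ideally close to 95%
```

Coverage well below the nominal level suggests intervals that are too narrow (overconfident); coverage well above suggests they are unnecessarily wide.
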
## 6.3 Multivariate and Hierarchical Forecasting

- **Multivariate:** Use metrics like multivariate RMSE or trace of error covariance.
- **Hierarchical:** Ensure coherence across aggregation levels (e.g., bottom-up vs top-down forecasts).

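For the multivariate case, a small sketch of per-series and pooled RMSE for forecasts stored as `(n_steps, n_series)` arrays; the array names and simulated values are illustrative only:

```python
import numpy as np

# Y_true and Y_pred: illustrative arrays of shape (n_steps, n_series)
rng = np.random.default_rng(0)
Y_true = rng.normal(size=(20, 3))
Y_pred = Y_true + rng.normal(scale=0.5, size=(20, 3))

per_series_rmse = np.sqrt(np.mean((Y_true - Y_pred) ** 2, axis=0))   # one RMSE per series
overall_rmse = np.sqrt(np.mean((Y_true - Y_pred) ** 2))              # pooled across series
print(per_series_rmse, overall_rmse)
```
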
## 6.4 Real-Time Constraints

In applied settings, evaluation must balance accuracy with computational efficiency and interpretability.

# 7\. Summary and Best Practices

- **Use multiple metrics.** RMSE, MAE, MAPE, MASE, and AIC/BIC each provide unique insights.
- **Validate properly.** Employ rolling-origin or time series cross-validation; avoid random shuffling.
- **Analyze residuals.** Residual diagnostics reveal systematic issues.
- **Prevent overfitting.** Simplicity often outperforms over-complex models.
- **Match evaluation to context.** Select metrics aligned with application needs (e.g., absolute errors vs percentage errors).

# Conclusion

Evaluating time series forecasting models is a nuanced process that requires both statistical rigor and practical judgment. By carefully choosing appropriate accuracy metrics, implementing robust validation strategies, and thoroughly analyzing residuals, practitioners can ensure that their forecasts are not only accurate on historical data but also reliable in predicting the future.

As the field evolves, with deep learning, probabilistic forecasts, and hybrid models gaining traction, the principles of evaluation remain central. Strong evaluation practices form the foundation for trustworthy and actionable time series forecasting.

_posts/2025-08-07-smarter_tree_splits.md

Lines changed: 8 additions & 0 deletions
```diff
@@ -27,6 +27,14 @@ keywords:
 - "XGBoost"
 - "scikit-learn"
 classes: wide
+date: '2025-08-07'
+header:
+  image: /assets/images/data_science_6.jpg
+  og_image: /assets/images/data_science_6.jpg
+  overlay_image: /assets/images/data_science_6.jpg
+  show_overlay_excerpt: false
+  teaser: /assets/images/data_science_6.jpg
+  twitter_image: /assets/images/data_science_6.jpg
 ---
 
 When building regression trees, whether in standalone models or ensembles like Random Forests and Gradient Boosted Trees, the key objective is to decide the best way to split nodes for optimal predictive performance. Traditionally, this has been done using **Mean Squared Error (MSE)** as a split criterion. However, many modern implementations — such as those in **LightGBM**, **XGBoost**, and **scikit-learn’s HistGradientBoostingRegressor** — use a mathematically equivalent but computationally superior alternative: **Friedman MSE**.
```

_posts/2025-08-13-preregistering_structural_equation_modeling .md

Lines changed: 8 additions & 0 deletions
```diff
@@ -32,6 +32,14 @@ keywords:
 - research reproducibility
 - confirmatory analysis
 classes: wide
+date: '2025-08-13'
+header:
+  image: /assets/images/data_science_10.jpg
+  og_image: /assets/images/data_science_10.jpg
+  overlay_image: /assets/images/data_science_10.jpg
+  show_overlay_excerpt: false
+  teaser: /assets/images/data_science_10.jpg
+  twitter_image: /assets/images/data_science_10.jpg
 ---
 
 Structural Equation Modeling (SEM) is a powerful analytical tool, capable of modeling complex latent structures and causal relationships between variables. From psychology to marketing, SEM is used in diverse fields to test theoretical models with observed data. Yet, the same flexibility that makes SEM attractive also opens the door to excessive researcher degrees of freedom. Without constraints, analysts can tweak specifications post hoc--knowingly or unknowingly--to produce more favorable results.
```
