The full course (including data, code, and slides) is available for USD 19.99 on Udemy.
We explore statistical models and start playing with Stata. I show you how to load data into Stata from Excel or csv files.
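As a quick reference (the file names below are placeholders, not the course data), the two standard import commands are:

```stata
* Placeholder file names; point these at your own Excel or CSV files
import excel "mydata.xlsx", firstrow clear    // firstrow: variable names are in row 1
import delimited "mydata.csv", clear          // reads comma-separated files
```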
I cover data types, sampling issues, outliers, and missing values.
I discuss whether you should transform variables (e.g., log transformation), how transformation affects linear relationships, and whether variables should be normally distributed for regression analysis. I show you how to AVOID COMMON MISTAKES when transforming variables.
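As a hedged illustration of the kind of transformation discussed (the variable name is hypothetical), the typical pattern in Stata is:

```stata
* Hypothetical variable name; a common mistake is taking ln() of zero or negative values
count if income <= 0              // check for problematic observations first
gen ln_income = ln(income)        // log transformation
histogram ln_income               // inspect the transformed distribution
```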
I introduce the estpost and esttab commands, which enable you to export tables from Stata to Word, Excel, or other applications. I show you how to modify formats and optimise the layout. This produces production-ready tables for your dissertation project, consulting report or academic paper. NO NEED TO ADJUST TABLES BY HAND - LET STATA TAKE CARE OF IT!
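A minimal sketch of the export workflow (variable names and the output file are placeholders; estpost and esttab are part of the user-written estout package):

```stata
* ssc install estout              // provides estpost and esttab
estpost summarize income age education
esttab using "descriptives.rtf", cells("mean sd min max") replace
```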
Now it is your turn! Download the data and try to answer the questions for Workshop 1 (see slides). This video will walk you through a Descriptive Data Analysis step-by-step. We generate new variables, display descriptive statistics, and explore large survey data.
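If you want a starting point before watching the walkthrough, a generic sketch (hypothetical variable names) looks like this:

```stata
* Hypothetical survey variables
gen age2 = age^2                  // generate a new variable
summarize income age, detail      // descriptive statistics
tabulate region, missing          // frequency table including missing values
```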
This video explains Regression Analysis without using theory. We will conduct a regression analysis in Stata and interpret the output. In particular, we explore correlations, scatter plots, linear models, OLS, dummies, and predictions.
Chapters
- 0:00 Welcome & Overview
- 1:19 Correlations & Scatter Plots
- 6:23 Distributions & Transformations
- 7:04 Linear Model
- 9:48 Ordinary Least Squares (OLS)
- 16:59 Application using Stata
- 38:12 Regression Output & Interpretation
- 46:53 Dummy Variables
- 52:52 Fitted Values
- 55:55 Model Assumptions
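A compact sketch of the commands behind these steps (variable names are hypothetical, not the course data):

```stata
* Hypothetical variables: outcome y, regressor x, categorical region
correlate y x                     // correlation
scatter y x                       // scatter plot
regress y x i.region              // OLS with dummy variables for each region
predict yhat, xb                  // fitted values / predictions
```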
This video explains the concept of degrees of freedom. Using artificial data, we illustrate the minimum number of observations needed to determine a regression line in two dimensions (or higher dimensions). We show the impact on R-squared and demonstrate the adjusted R-squared. Using examples, we highlight the impact of additional observations and explanatory variables on degrees of freedom and R-squared.
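For reference, the standard definition behind this discussion (stated from the usual textbook formula, not quoted from the video):

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1},$$

where n is the number of observations and k the number of explanatory variables (excluding the constant). Adding regressors always raises R-squared, but it lowers the degrees of freedom n - k - 1, which is why the adjusted version can fall.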
This video explains multicollinearity and its consequences for regression analysis. We discuss how to detect multicollinearity and how to address the problem. Finally, we demonstrate multicollinearity using data on commodity prices.
Chapters
- 0:00 Multicollinearity
- 0:24 What is the problem?
- 2:43 How to fix it?
- 4:50 How to detect multicollinearity?
- 7:30 Example in Stata
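A minimal sketch of the detection step (the commodity variable names are hypothetical):

```stata
* Hypothetical commodity price variables
regress oil gold copper
estat vif                         // variance inflation factors after the regression
correlate gold copper             // pairwise correlation between the regressors
```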
This video explains heteroskedasticity and its consequences for regression analysis. We discuss how to detect heteroskedasticity and how to address the problem. Finally, we demonstrate heteroskedasticity using data on yields in farming.
Chapters
- 0:00 Welcome
- 0:44 Impact on p-values
- 2:07 Detecting the problem
- 4:56 How to fix it?
- 5:42 Worked example in Stata
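A minimal sketch, assuming hypothetical farming variables rather than the course dataset:

```stata
regress yield rainfall
estat hettest                         // Breusch-Pagan test for heteroskedasticity
regress yield rainfall, vce(robust)   // heteroskedasticity-robust standard errors
```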
This video explains how to detect and fix an omitted variable bias. If you forget to include an important explanatory variable in your regression model, an omitted variable bias can occur. I explain how you can detect this problem using the Ramsey RESET test. This test also indicates non-linear relationships. We will explore how we can distinguish between non-linear effects and omitted variables using fitted values.
Chapters
- 0:00 Omitted Variable Bias
- 1:34 Worked Example in Stata
- 3:55 Log Transformation
- 5:08 Regression Model
- 6:50 Ramsey RESET Test
- 9:10 Higher Orders
- 15:36 Collapse Command
- 17:01 Visualisation
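A minimal sketch of the test (hypothetical variable names):

```stata
regress y x1 x2
estat ovtest                      // Ramsey RESET test using powers of the fitted values
predict yhat, xb                  // fitted values, useful for spotting non-linear patterns
scatter y yhat
```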
This video explains how to detect endogeneity. Endogeneity is a common problem in regression analysis. I explain how you can detect this problem using an auxiliary regression approach. We discuss strategies to address endogeneity.
Chapters
- 0:00 Welcome
- 0:15 What is Endogeneity?
- 1:42 Detecting Endogeneity
- 3:08 Worked Example in Stata
- 11:33 How to fix Endogeneity?
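One common residual-based (control function) version of an auxiliary regression check is sketched below; it assumes an external instrument z is available and may differ in detail from the approach shown in the video:

```stata
* Hypothetical variables: outcome y, suspect regressor x, instrument z
regress x z                       // auxiliary (first-stage) regression
predict vhat, residuals           // store the residuals
regress y x vhat                  // a significant vhat coefficient points to endogeneity
```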
This video explains how to work with panel data. We discuss the benefits of using panel data, including Granger causality and the assessment of policy changes. We introduce fixed and random effects models, which we implement in Stata. The regression outputs are explained and compared.
Chapters
- 0:00 Introduction to Panel Data
- 0:26 Benefits of Panel Data
- 1:23 Analysing Policy Changes
- 1:58 Causality
- 3:01 Time Lags
- 3:19 Panel Data Models
- 3:59 SOLS or POLS
- 4:20 Fixed & Random Effects
- 7:11 Worked Example in Stata
- 8:40 Panel Regressions in Stata
- 9:36 The tsset Command
- 11:07 Interpretation of Output
- 13:46 Model Comparison
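A minimal sketch of the estimation commands (panel identifiers and variables are hypothetical):

```stata
xtset firm year                   // declare the panel structure (tsset firm year also works)
xtreg y x, fe                     // fixed effects
xtreg y x, re                     // random effects
```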
This video discusses whether you should use fixed or random effects for your panel data analysis. We explain how the Hausman test works and - most importantly - when the Hausman test fails! We cover biased estimators, the efficiency of estimators, and the implementation in Stata. Again, I focus on an intuitive understanding of the methods - no theory - just data fun!
Chapters
- 0:00 Fixed or Random Effects
- 0:26 Worked Example
- 0:53 How does the Hausman Test work?
- 1:12 Bias
- 1:45 Efficiency
- 3:21 Implementation in Stata
- 4:42 Interpretation of Output
- 6:23 Warning: Hausman Test fails!
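The standard Stata sequence for the test, with hypothetical variable names:

```stata
xtreg y x, fe
estimates store fe
xtreg y x, re
estimates store re
hausman fe re                     // compares the stored fixed and random effects estimates
```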
This video explains the impact of serial correlation in panel data analysis. We discuss the underlying reasons for serial correlation. Then we introduce a test based on Wooldridge (2002). To fix serial correlation, we explore the Newey-West Estimator (robust estimation) and Dynamic Panel Data Estimation. Finally, we have some fun in Stata.
Chapters
- 0:00 Serial Correlation in Panel Data
- 0:40 Reasons for Serial Correlation
- 1:19 Testing for Serial Correlation
- 2:46 Newey-West Estimator
- 3:47 Dynamic Panel Data Estimation
- 4:12 Worked Example in Stata
- 5:26 Interpretation of Output
- 6:13 Solutions in Stata
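A minimal sketch of the test and one common remedy (xtserial is the user-written implementation of the Wooldridge test; variable names are hypothetical, and the video additionally covers the Newey-West and dynamic panel estimators):

```stata
* ssc install xtserial            // Wooldridge (2002) test for serial correlation
xtset firm year
xtserial y x                      // H0: no first-order serial correlation
xtreg y x, fe vce(cluster firm)   // cluster-robust standard errors as one remedy
```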
This video explains interaction effects in panel data. It is common that certain groups of observations (e.g., companies, countries) exhibit differences in behaviour. These differences can be modelled using interaction effects. We explore shifts in the intercept and slope coefficient. In addition, I demonstrate how these models can be implemented in Stata.
Chapters
- 0:00 Interaction Effects
- 1:13 Shift in Intercept
- 2:21 Illustration of Shift
- 2:40 Interaction Term
- 4:08 Illustration of Interaction Effect
- 4:31 Implementation in Stata
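Using Stata's factor-variable notation, an interaction model can be written in one line (hypothetical variable names):

```stata
* Continuous x interacted with a categorical group variable
regress y c.x##i.group            // expands to x, group dummies, and their interactions
```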
This video comes with a TRIGGER WARNING! It contains mathematics, which some viewers might find distressing. I explain how the serial correlation test developed by Wooldridge (2002) can be derived. We cover the null hypothesis and related assumptions, i.i.d. error terms, and covariance and variance formulas. We also highlight linear operators and their properties. There is a little surprise at the end of the video!
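For reference, the key step of the derivation is a standard result (stated here under the usual assumptions rather than quoted from the video): if the level errors $\varepsilon_{it}$ are i.i.d. with variance $\sigma^2$, then

$$\operatorname{Cov}(\Delta\varepsilon_{it}, \Delta\varepsilon_{i,t-1}) = \operatorname{Cov}(\varepsilon_{it} - \varepsilon_{i,t-1},\ \varepsilon_{i,t-1} - \varepsilon_{i,t-2}) = -\operatorname{Var}(\varepsilon_{i,t-1}) = -\sigma^2,$$

and since $\operatorname{Var}(\Delta\varepsilon_{it}) = 2\sigma^2$, the correlation of the first-differenced errors is $-0.5$. The test therefore regresses the first-differenced residuals on their own lag and checks whether the coefficient equals $-0.5$.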
This video introduces logistic regressions. We discuss binary choice models, where the dependent variable is either a positive or negative outcome (e.g., a decision). The problem is illustrated graphically - how to map a linear model to an interval suitable for modelling a probability. Most decision processes remain unobserved; hence, we briefly discuss latent variables. Finally, I demonstrate how these models can be implemented in Stata. Predicted probabilities are plotted to visualise the model, and we explore classifications.
Chapters
- 0:00 Binary Choice
- 1:34 Illustration of Problem
- 5:38 Latent Variable
- 9:48 Implementation in Stata
- 14:43 Plot Predicted Probabilities
- 17:51 Classification
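A minimal sketch of the implementation (the binary outcome and regressors are hypothetical):

```stata
logit decision x1 x2
predict phat, pr                  // predicted probabilities
scatter phat x1                   // visualise the predicted probabilities
estat classification              // classification table at the 0.5 cutoff
```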
This video explores a dataset of mergers (companies buying other companies). It is often interesting to predict whether a merger occurs, as share prices tend to move. First, we explore the data, select variables, and visualise the trend of mergers in the US. You will learn new Stata commands for summarising data, including collapse. Second, we run several logit models and derive predicted probabilities. Finally, we compare predictions based on firm-level data and macro data (merger wave). If you want to know more about mergers, have a look at our paper on "Endogenous mergers: bidder momentum and market reaction."
Chapters
- 0:00 Predicting Mergers
- 1:14 Exploring Data
- 2:44 Sum Command
- 4:12 Density Plot
- 4:30 Tabstat Command
- 5:46 Collapse Command
- 9:01 Sorting and By Command
- 11:07 Logit Models
- 17:05 Compare Predictions
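As a hedged illustration of the collapse step (the deal-level variable names are hypothetical):

```stata
* One row per deal with a merger dummy; aggregate to a yearly merger-wave series
collapse (sum) deals=merger, by(year)
line deals year                   // visualise the merger wave
```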
This video explains the process of model specification, which is often overlooked in textbooks and many online courses. However, it is essential to understand how you actually derive the 'best model' for your data. We start by exploring different aims of studies, including forecasting and identification. The two main approaches, General-to-Specific and Specific-to-General, are introduced, and we discuss the pros and cons of each. We explain the use of information criteria (AIC, BIC). Finally, we apply our knowledge to predicting stock market returns using a set of macroeconomic shock variables.
Chapters
- 0:00 Model Specification
- 0:31 Aims of Video
- 1:59 What is the 'Best Model'?
- 3:36 How to start?
- 5:03 Specification Methods
- 7:47 Information Criteria
- 9:10 Predicting Stock Market Returns
- 11:23 General-to-Specific Approach
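A minimal sketch of how the information criteria are obtained (return and shock variables are hypothetical):

```stata
regress ret shock1 shock2 shock3
estat ic                          // reports AIC and BIC for model comparison
```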
This video goes deeper into Stata programming. We illustrate time-varying coefficients in regressions, a common issue in time-series analysis aimed at forecasting. How can you forecast if your model exhibits parameter instability? We illustrate the problem and our approach using overlapping periods. The implementation in Stata highlights the differences between the matrix and variable environments. We move between the two using the svmat command. Time-varying coefficients are plotted, and a structural break is highlighted.
Chapters
- 0:00 Parameter Stability
- 0:32 Illustration of Problem
- 2:52 Worked Example in Stata
- 4:06 Obtain Coefficients
- 5:03 Variable or Matrix in Stata
- 7:38 The svmat Command
- 9:52 The egen max() Trick
- 10:55 Forvalues Loop
- 14:55 Plotting Rolling Regression
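A rough sketch of the manual rolling-regression approach, assuming hypothetical variables y, x, a time index t, 60-observation windows and 200 windows in total:

```stata
sort t
matrix B = J(200, 1, .)                       // matrix environment: storage for the slopes
forvalues i = 1/200 {
    quietly regress y x in `i'/`=`i' + 59'    // regression on one rolling window
    matrix B[`i', 1] = _b[x]
}
svmat B, names(beta)                          // move the matrix into the variable environment
gen window = _n
line beta1 window if !missing(beta1)          // plot the time-varying coefficient
```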
This is our first live event dedicated to data analysis using Stata. We explore a cross-country dataset of macroeconomic variables. We try to model the impact of inflation on economic growth and explore non-linear effects.
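A hedged sketch of one way to capture such a non-linear effect with a quadratic term (variable names are hypothetical):

```stata
regress growth c.inflation##c.inflation   // includes inflation and its square
```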
This video provides a brief introduction to Confirmatory Factor Analysis (CFA). We discuss social constructs that cannot be easily measured. In practice, many concepts (e.g., overconfidence) cannot be observed directly (latent variables). These latent variables can be measured indirectly based on a set of factors that can be observed. We show that index construction, which is common, can be misleading. We discuss various ways to reduce dimensions, which is nowadays part of the machine learning (ML) literature. The methods include principal component analysis (PCA) and confirmatory factor analysis (CFA). Examples refer to our paper "Defining and measuring financial inclusion: A systematic review and confirmatory factor analysis".
Chapters
- 0:00 Introduction to CFA
- 0:11 Example: Financial Inclusion
- 0:41 Measure Latent Variable
- 1:59 Factors
- 2:57 Index Construction
- 3:59 Reduce Dimensions
- 4:46 PCA
- 5:50 CFA
- 6:13 Measurement Model
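A minimal PCA sketch (the indicator names are hypothetical placeholders, not the variables from the paper):

```stata
pca access usage quality          // principal component analysis of the indicators
predict pc1, score                // first principal component as a simple index
```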
This video provides a step-by-step guide to conducting a Confirmatory Factor Analysis (CFA) in Stata. We introduce the sem command and explain the syntax for a measurement model. The models are estimated, and post-estimation analysis based on goodness-of-fit measures is conducted. If the RMSEA is larger than 0.05 and the CFI is below 0.95, adding covariances between error terms can be beneficial. To identify the most promising covariances to add, we calculate the Modification Index (MI). Examples refer to our paper "Defining and measuring financial inclusion: A systematic review and confirmatory factor analysis."
Chapters
- 0:00 How to estimate a CFA in Stata?
- 0:29 Illustration of Model
- 1:03 Model Fit
- 2:05 Modification Index
- 2:39 Advanced Topics in SEM
- 3:06 Data on Financial Inclusion
- 4:06 The sem Command
- 6:49 Post Estimation Analysis
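A minimal sketch of the workflow (the latent construct and indicator names are hypothetical placeholders):

```stata
sem (Inclusion -> access usage quality)       // measurement model for one latent variable
estat gof, stats(all)                         // goodness of fit, including RMSEA and CFI
estat mindices                                // modification indices for candidate covariances
* if needed, re-estimate with an added error covariance, e.g.:
* sem (Inclusion -> access usage quality), cov(e.access*e.usage)
```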