
This project explores the socio-economic and health-related factors influencing household tobacco consumption in the context of a national budget survey.


abrahim-k/tobacco_budget

Repository files navigation

📊 Tobacco Consumption and Household Budget Analysis

🧠 Overview

This project explores the socio-economic and health-related factors influencing household tobacco consumption in the context of a national budget survey. Through a series of descriptive statistics, regression analyses, and diagnostic tests, it identifies key relationships and evaluates model performance while accounting for potential specification errors.


📁 Data Sources

  • data1.dta: Primary dataset containing household budget data
  • health.dta: Health expenditure records, used as a proxy for tobacco-related harm

📚 Project Goals

  • Analyze tobacco consumption across households using economic and demographic indicators
  • Determine the statistical significance and magnitude of those indicators using regression modeling
  • Address modeling limitations such as non-normality, heteroscedasticity, and omitted variable bias
  • Include health-related proxies to enrich the model

🔍 Analysis Workflow

1️⃣ Data Preprocessing

  • Loaded the Stata .dta datasets with the R package haven
  • Identified non-binary variables for statistical analysis
  • Merged health data with primary household dataset using hhid
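The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the two data frames are invented stand-ins for data1.dta and health.dta, the column name health_exp is hypothetical, and the choice of a left join (keeping every household) is an assumption, since the README only says the merge uses hhid.

```r
# Toy stand-ins for data1.dta and health.dta; all values are invented.
# With the real files one would presumably call haven::read_dta("data1.dta").
households <- data.frame(
  hhid   = c(1, 2, 3, 4),
  income = c(120, 340, 210, 95),
  female = c(0, 1, 0, 1)
)
health <- data.frame(
  hhid       = c(1, 2, 4),
  health_exp = c(12.5, 30.0, 4.2)  # hypothetical column name
)

# Left join on hhid: every household is kept, with NA where no health record exists.
merged <- merge(households, health, by = "hhid", all.x = TRUE)

# A column counts as binary when its non-missing values are all 0 or 1.
is_binary <- function(x) all(na.omit(x) %in% c(0, 1))
candidates <- setdiff(names(merged), "hhid")
non_binary <- candidates[!vapply(merged[candidates], is_binary, logical(1))]

print(merged)
print(non_binary)
```

Using all.x = TRUE avoids silently dropping households that have no health record; an inner join (the default for merge) would shrink the sample instead.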

2️⃣ Descriptive Statistics & Correlations

  • Summarized income, age, education, unit price, and tobacco consumption
  • Notable correlations:
    • Age ↔️ Education: −0.3486
    • Education ↔️ Income: 0.4200
    • Education ↔️ Unit Price: 0.5353
    • Unit Price ↔️ Income: 0.5028
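A correlation table like the one above can be produced with cor(). The sketch below runs on simulated data, so its numbers will not match the reported values; the variable names and the use of Pearson correlation are assumptions, as the README does not state which method was used.

```r
# Simulated stand-in data; the coefficients below are arbitrary.
set.seed(42)
n <- 200
education <- rnorm(n, mean = 10, sd = 3)
age       <- 45 - 1.2 * education + rnorm(n, sd = 10)
income    <- 100 + 20 * education + rnorm(n, sd = 120)
unitvalue <- 2 + 0.3 * education + 0.004 * income + rnorm(n, sd = 1.5)
sim <- data.frame(age, education, income, unitvalue)

# Pairwise Pearson correlations, rounded to four decimals as in the list above.
corr <- round(cor(sim, use = "pairwise.complete.obs", method = "pearson"), 4)
print(corr)
```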

3️⃣ Visual Explorations

  • Household Income vs Tobacco Consumption: No strong linear pattern; higher variance for low-income households
  • Unit Price vs Tobacco Consumption: Clear inverse relationship, indicating price sensitivity

4️⃣ Normality Checks

Used Shapiro-Wilk and histograms for:

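  • income: Strongly non-normal (reported value ≈ 0.444, which reads more plausibly as the Shapiro-Wilk W statistic than as a p-value, since p ≈ 0.444 would not reject normality)
  • unitvalue: Also non-normal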
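As a rough illustration of the check, shapiro.test() returns both the W statistic and the p-value, and it is the p-value that decides whether normality is rejected. The data here are simulated from a log-normal distribution as an assumed stand-in for a skewed income variable, not drawn from the survey.

```r
# Simulated right-skewed "income"; the parameters are arbitrary.
set.seed(1)
income <- rlnorm(500, meanlog = 6, sdlog = 1)

# Shapiro-Wilk test: W close to 1 suggests normality, a small p-value rejects it.
res    <- shapiro.test(income)
w_stat <- unname(res$statistic)
p_val  <- res$p.value
cat(sprintf("W = %.3f, p = %.3g\n", w_stat, p_val))

# Histogram bin counts; drop plot = FALSE to draw the histogram instead.
h <- hist(income, breaks = 30, plot = FALSE)
print(h$counts)
```

Note that shapiro.test() accepts between 3 and 5000 observations, so a larger survey sample would need to be subsampled or checked by other means.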

5️⃣ Baseline Regression

weight ~ income + unitvalue + age + female + leduc + own + child_less_14
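The formula above can be fitted with lm(). The sketch below uses simulated data so that it runs without the survey files; the distributions, the coefficient values, and the interpretation of each variable (for example treating female and own as 0/1 indicators and leduc as a numeric education measure) are assumptions made only for illustration.

```r
# Simulated stand-in data with the same column names as the formula.
set.seed(123)
n <- 300
sim <- data.frame(
  income        = rlnorm(n, meanlog = 6, sdlog = 0.5),
  unitvalue     = runif(n, min = 1, max = 10),
  age           = round(runif(n, min = 20, max = 80)),
  female        = rbinom(n, size = 1, prob = 0.3),
  leduc         = rnorm(n, mean = 2, sd = 0.5),
  own           = rbinom(n, size = 1, prob = 0.6),
  child_less_14 = rpois(n, lambda = 1)
)
# The outcome depends only on income and unit price here, by construction.
sim$weight <- 50 + 0.005 * sim$income - 2 * sim$unitvalue + rnorm(n, sd = 2)

# Baseline OLS specification from the README.
fit <- lm(weight ~ income + unitvalue + age + female + leduc + own + child_less_14,
          data = sim)
print(summary(fit)$coefficients)
```

The coefficient table reports estimates, standard errors, t-values, and p-values, which is the information needed to judge the significance and magnitude of each indicator.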
