ehti-90/wine-alcohol-analysis


🍷 White Wine Quality Analysis: Confidence, Tolerance Intervals & Hypothesis Testing

This project applies statistical analysis and data visualization techniques to a real-world white wine dataset using Python. The focus is on analyzing the alcohol content of wines by computing descriptive statistics, constructing frequency distributions, calculating confidence and tolerance intervals, and conducting hypothesis testing.


📂 Dataset Overview

The dataset, sourced from Kaggle, contains ~5000 samples of white wine with the following attributes:

  • fixed acidity
  • volatile acidity
  • citric acid
  • residual sugar
  • chlorides
  • free sulfur dioxide
  • total sulfur dioxide
  • density
  • pH
  • sulphates
  • alcohol
  • quality

🔍 For this project, we focused only on the alcohol column to perform all statistical analysis and visualizations.


📊 Objectives

We aimed to:

  • Calculate mean and variance of alcohol percentage (both manually and using Pandas)
  • Create histograms and pie charts to visualize alcohol distribution
  • Build a frequency distribution table using grouped bins
  • Estimate a 95% confidence interval and a 95% tolerance interval for alcohol content
  • Validate the tolerance interval using a 20% test split
  • Perform a one-sample t-test to test if the mean alcohol percentage is significantly different from a given value
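
The first and third objectives can be sketched as follows. This is a minimal illustration, not the project's actual notebook: the `alcohol` values are synthetic stand-ins generated with NumPy, and the bin edges are an assumed choice. It computes the mean and sample variance by hand, compares them with pandas, and builds a grouped frequency table.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the dataset's `alcohol` column (assumption: NOT the real data).
rng = np.random.default_rng(0)
values = rng.normal(loc=10.5, scale=1.2, size=500).clip(8.0, 14.0)
alcohol = pd.Series(values, name="alcohol")

# Manual mean and sample variance (n - 1 in the denominator).
n = len(alcohol)
manual_mean = sum(alcohol) / n
manual_var = sum((x - manual_mean) ** 2 for x in alcohol) / (n - 1)

# pandas equivalents; Series.var() uses ddof=1 by default, matching the manual formula.
pandas_mean = alcohol.mean()
pandas_var = alcohol.var()

# Frequency distribution over grouped bins (bin edges are an illustrative assumption).
bin_edges = [8, 9, 10, 11, 12, 13, 14]
binned = pd.cut(alcohol, bins=bin_edges, include_lowest=True)
freq_table = binned.value_counts(sort=False).rename("count").to_frame()
freq_table["relative"] = freq_table["count"] / n

print(f"manual: mean={manual_mean:.4f}, var={manual_var:.4f}")
print(f"pandas: mean={pandas_mean:.4f}, var={pandas_var:.4f}")
print(freq_table)
```

The manual and pandas results should agree up to floating-point rounding. Note that `numpy.var` defaults to `ddof=0` (population variance), so it would give a slightly different value unless `ddof=1` is passed.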

Tools Used

  • Python 3
  • pandas – data manipulation
  • numpy – numerical operations
  • matplotlib – visualizations
  • scipy.stats – statistical testing and interval calculations



Summary of Tasks Performed

| Task | Description |
| --- | --- |
| Data Cleaning | Removed missing values from dataset |
| Descriptive Stats | Computed mean & variance of alcohol |
| Visualization | Histogram of alcohol % and pie chart for ranges |
| Frequency Dist. | Created bins and counted samples in each |
| Confidence Interval | 95% CI of mean alcohol using t-distribution |
| Tolerance Interval | 95% range expected to cover future samples |
| Hypothesis Test | One-sample t-test to check if μ ≠ 10.5% |
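
The two interval rows can be illustrated with a short sketch. The README does not say how the tolerance interval was computed, so the version below is an assumption: it treats alcohol content as roughly normal and uses Howe's approximation for the two-sided tolerance factor, with 95% coverage and 95% confidence. The data are synthetic, and names such as `coverage` and `confidence` are introduced only for this example.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the `alcohol` column (assumption: NOT the real data).
rng = np.random.default_rng(1)
alcohol = rng.normal(loc=10.5, scale=1.2, size=500)

n = alcohol.size
mean = alcohol.mean()
sd = alcohol.std(ddof=1)  # sample standard deviation

# 95% confidence interval for the MEAN, using the t-distribution.
t_crit = stats.t.ppf(0.975, df=n - 1)
half_width = t_crit * sd / np.sqrt(n)
ci = (mean - half_width, mean + half_width)

# Two-sided normal tolerance interval for INDIVIDUAL values.
# Howe's approximation for the tolerance factor k (an assumed method).
coverage, confidence = 0.95, 0.95
z = stats.norm.ppf((1 + coverage) / 2)
chi2_low = stats.chi2.ppf(1 - confidence, df=n - 1)
k = z * np.sqrt((n - 1) * (1 + 1 / n) / chi2_low)
ti = (mean - k * sd, mean + k * sd)

print(f"95% CI for the mean:      ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"95%/95% tolerance range:  ({ti[0]:.3f}, {ti[1]:.3f})")
```

The confidence interval narrows as the sample grows because it describes the mean, while the tolerance interval stays wide because it describes where individual wines are expected to fall.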


Statistical Hypotheses

We tested the hypothesis:

  • Null Hypothesis (H₀): Mean alcohol percentage = 10.5%
  • Alternative Hypothesis (H₁): Mean alcohol percentage ≠ 10.5%

Based on the computed p-value and t-statistic, we determined whether to reject H₀.
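
A minimal sketch of this test with `scipy.stats.ttest_1samp` is shown below. The data are synthetic, and the significance level `alpha = 0.05` is an assumption, since the README does not state the threshold used. The statistic is also recomputed by hand to show what the library call does.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the `alcohol` column (assumption: NOT the real data).
rng = np.random.default_rng(2)
alcohol = rng.normal(loc=10.5, scale=1.2, size=500)

mu0 = 10.5    # hypothesized mean from H0
alpha = 0.05  # assumed significance level

# Two-sided one-sample t-test (two-sided is the SciPy default).
t_stat, p_value = stats.ttest_1samp(alcohol, popmean=mu0)

# The same statistic by hand: t = (x_bar - mu0) / (s / sqrt(n)).
n = alcohol.size
t_manual = (alcohol.mean() - mu0) / (alcohol.std(ddof=1) / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

reject_h0 = p_value < alpha
print(f"t = {t_stat:.4f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```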

Result Highlights

  • Alcohol mean and variance were successfully calculated manually and with Python.
  • Visualizations provided clear insight into alcohol distribution.
  • Both 95% confidence and tolerance intervals were constructed.
  • Hypothesis test showed whether alcohol % significantly differed from 10.5%.
  • Over 90% of test data fell within the computed tolerance interval, validating its effectiveness.
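
The validation step in the last highlight can be sketched like this. It is an illustration under stated assumptions: synthetic data, a random 80/20 split done with NumPy, and a hypothetical helper `tolerance_interval` that uses Howe's approximation for a normal tolerance interval. The project's own split and interval method may differ.

```python
import numpy as np
from scipy import stats


def tolerance_interval(sample, coverage=0.95, confidence=0.95):
    """Hypothetical helper: two-sided normal tolerance interval (Howe's approximation)."""
    n = sample.size
    z = stats.norm.ppf((1 + coverage) / 2)
    chi2_low = stats.chi2.ppf(1 - confidence, df=n - 1)
    k = z * np.sqrt((n - 1) * (1 + 1 / n) / chi2_low)
    mean, sd = sample.mean(), sample.std(ddof=1)
    return mean - k * sd, mean + k * sd


# Synthetic stand-in for the `alcohol` column (assumption: NOT the real data).
rng = np.random.default_rng(3)
alcohol = rng.normal(loc=10.5, scale=1.2, size=1000)

# Random 80/20 split: the interval is built on the 80% and checked on the 20%.
shuffled = rng.permutation(alcohol)
n_test = int(0.2 * shuffled.size)
test, train = shuffled[:n_test], shuffled[n_test:]

lower, upper = tolerance_interval(train)

# Share of held-out values that land inside the interval.
inside = (test >= lower) & (test <= upper)
share_inside = inside.mean()
print(f"interval = ({lower:.3f}, {upper:.3f}), share of test data inside = {share_inside:.1%}")
```

Holding out the test split matters here: checking coverage on the same data used to build the interval would overstate how well it generalizes.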
