🍷 White Wine Quality Analysis: Confidence, Tolerance Intervals & Hypothesis Testing

This project applies statistical analysis and data visualization techniques to a real-world white wine dataset using Python. The focus is on analyzing the alcohol content of wines by computing descriptive statistics, constructing frequency distributions, calculating confidence and tolerance intervals, and conducting hypothesis testing.

📂 Dataset Overview

The dataset, sourced from Kaggle, contains ~5000 samples of white wine with the following attributes:

fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality

🔍 For this project, we focused only on the alcohol column to perform all statistical analysis and visualizations.

📊 Objectives

We aimed to:

Calculate mean and variance of alcohol percentage (both manually and using Pandas)
Create histograms and pie charts to visualize alcohol distribution
Build a frequency distribution table using grouped bins
Estimate a 95% confidence interval and a 95% tolerance interval for alcohol content
Validate the tolerance interval using a 20% test split
Perform a one-sample t-test to check whether the mean alcohol percentage differs significantly from a hypothesized value (10.5%)
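The first and third objectives can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the `alcohol` series is synthetic stand-in data, and the bin width of 1 percentage point is an assumption chosen for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the dataset's alcohol column (NOT the real data).
rng = np.random.default_rng(0)
values = rng.normal(loc=10.5, scale=1.2, size=500).clip(8.0, 14.2)
alcohol = pd.Series(values, name="alcohol")

# Manual mean and sample variance (n - 1 in the denominator).
n = len(alcohol)
manual_mean = sum(alcohol) / n
manual_var = sum((x - manual_mean) ** 2 for x in alcohol) / (n - 1)

# pandas equivalents; Series.var() also uses ddof=1 by default.
pandas_mean = alcohol.mean()
pandas_var = alcohol.var()

# Frequency distribution over grouped bins [8, 9), [9, 10), ..., [14, 15).
bins = np.arange(8, 16, 1)
freq_table = (
    pd.cut(alcohol, bins=bins, right=False)
    .value_counts(sort=False)
    .rename("frequency")
    .to_frame()
)
freq_table["relative_frequency"] = freq_table["frequency"] / n

print(f"mean: manual={manual_mean:.4f}, pandas={pandas_mean:.4f}")
print(f"variance: manual={manual_var:.4f}, pandas={pandas_var:.4f}")
print(freq_table)
```

The manual and pandas results should agree up to floating-point error, because both use the sample variance with an n - 1 denominator.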

Tools Used

Python 3
pandas – data manipulation
numpy – numerical operations
matplotlib – visualizations
scipy.stats – statistical testing and interval calculations

Summary of Tasks Performed

Statistical Hypotheses

We tested the hypothesis:

Null Hypothesis (H₀): Mean alcohol percentage = 10.5%
Alternative Hypothesis (H₁): Mean alcohol percentage ≠ 10.5%

Based on the computed p-value and t-statistic, we determined whether to reject H₀.
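The test above can be sketched with `scipy.stats.ttest_1samp`. The sample below is synthetic, and the significance level `alpha = 0.05` is an assumption, since the README does not state the level that was used. The manual computation is included only to show what the library call calculates.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the alcohol column (NOT the real data).
rng = np.random.default_rng(1)
alcohol = rng.normal(loc=10.6, scale=1.2, size=400)

mu0 = 10.5    # hypothesized mean from H0
alpha = 0.05  # assumed significance level

# Two-sided one-sample t-test.
t_stat, p_value = stats.ttest_1samp(alcohol, popmean=mu0)

# The same statistic by hand: t = (mean - mu0) / (s / sqrt(n)).
n = alcohol.size
t_manual = (alcohol.mean() - mu0) / (alcohol.std(ddof=1) / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

reject_h0 = p_value < alpha
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

H₀ is rejected when the p-value falls below `alpha`; otherwise the data do not provide enough evidence that the mean differs from 10.5%.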

Result Highlights

Alcohol mean and variance were successfully calculated manually and with Python.
Visualizations provided clear insight into alcohol distribution.
Both 95% confidence and tolerance intervals were constructed.
The hypothesis test showed whether the mean alcohol percentage differed significantly from 10.5%.
Over 90% of the held-out test data (the 20% split) fell within the computed tolerance interval, supporting its validity.
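The interval results above can be illustrated with the sketch below. Several points are assumptions rather than the project's confirmed method: the data are synthetic, the tolerance interval targets 95% coverage with 95% confidence, it assumes approximately normal data, and the factor `k` uses Howe's approximation for a two-sided normal tolerance interval. The notebook may compute these differently.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the alcohol column (NOT the real data).
rng = np.random.default_rng(2)
alcohol = rng.normal(loc=10.5, scale=1.2, size=1000)

# 80/20 split: intervals are built on the 80%, validated on the 20%.
shuffled = rng.permutation(alcohol)
split = int(0.8 * shuffled.size)
train, test = shuffled[:split], shuffled[split:]

n = train.size
mean = train.mean()
sd = train.std(ddof=1)

# 95% confidence interval for the mean (t distribution, n - 1 df).
ci_low, ci_high = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sd / np.sqrt(n))

# Two-sided normal tolerance interval via Howe's approximation (assumed method).
coverage, confidence = 0.95, 0.95
z = stats.norm.ppf((1 + coverage) / 2)
chi2_low = stats.chi2.ppf(1 - confidence, df=n - 1)
k = z * np.sqrt((n - 1) * (1 + 1 / n) / chi2_low)
ti_low, ti_high = mean - k * sd, mean + k * sd

# Fraction of held-out values that fall inside the tolerance interval.
inside = np.mean((test >= ti_low) & (test <= ti_high))

print(f"95% CI for the mean: ({ci_low:.3f}, {ci_high:.3f})")
print(f"Tolerance interval:  ({ti_low:.3f}, {ti_high:.3f})")
print(f"Share of test data inside the tolerance interval: {inside:.1%}")
```

The confidence interval bounds the mean, so it is much narrower than the tolerance interval, which is meant to contain a stated proportion of individual observations. That difference is why only the tolerance interval is checked against the test split.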

📚 References

Wine Quality Dataset – Kaggle
McKinney, W. (2010). Data Analysis with Python and Pandas. O'Reilly Media.
