Discover the hidden patterns behind what drives house prices.
This project performs an in-depth exploratory data analysis (EDA) on the
Kaggle House Prices - Advanced Regression Techniques dataset using Python with pandas, NumPy, Matplotlib, and seaborn.
- Rows: 1,460 | Columns: 81
- Source: Kaggle competition (House Prices - Advanced Regression Techniques)
- Target variable: SalePrice
Getting started:
- Clone the repository:
  git clone https://github.com/rivu-intel45/house-prices-regression.git
  cd house-prices-regression
- Install dependencies:
  pip install -r requirements.txt
- Run the notebook: start Jupyter with jupyter notebook, then open house-regression.ipynb and run all cells.
- Add data (if needed): if train.csv is not present, download it from Kaggle and place it in an input/ folder (or update the path in the notebook).
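If you want to load the data outside the notebook, the sketch below reads train.csv from the input/ folder and fails with a clear message when the file or the target column is missing. The helper name load_train and the default path input/train.csv are assumptions for illustration, not names taken from the notebook.

```python
from pathlib import Path

import pandas as pd

# Assumed default location, matching the input/ folder described above.
DEFAULT_PATH = Path("input") / "train.csv"


def load_train(path=DEFAULT_PATH) -> pd.DataFrame:
    """Load the Kaggle training data with basic sanity checks (hypothetical helper)."""
    path = Path(path)
    if not path.exists():
        # Fail early with a hint instead of a bare pandas error.
        raise FileNotFoundError(
            f"{path} not found: download train.csv from Kaggle "
            "into the input/ folder or pass the correct path."
        )
    df = pd.read_csv(path)
    # The EDA depends on the target column being present.
    if "SalePrice" not in df.columns:
        raise ValueError(f"{path} has no SalePrice column")
    return df
```

With the Kaggle training file in place, train = load_train() followed by train.shape should report (1460, 81), matching the dataset summary.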
Analysis overview:
- Target distribution: distribution and spread of SalePrice, including skewness and outliers.
- Missing value analysis: bar plots of the columns with the highest proportion of missing data.
- Key feature relationships:
  - GarageArea vs. SalePrice
  - OverallQual vs. SalePrice
  - Impact of SaleType on prices
- Visual insights:
  - Histograms, KDE plots, and boxplots for numeric features
  - Scatter plots for important feature-price relationships
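The numbers behind these plots can also be computed directly with pandas. This is a minimal sketch assuming the training data is already in a DataFrame with the Kaggle column names; the helper names are hypothetical, and the log1p skewness check is a common extra step rather than something the notebook is documented to do.

```python
import numpy as np
import pandas as pd


def target_summary(df: pd.DataFrame, target: str = "SalePrice") -> dict:
    """Center, spread, and skewness of the target variable."""
    y = df[target]
    return {
        "mean": y.mean(),
        "median": y.median(),
        "std": y.std(),
        "skew": y.skew(),
        # Skewness after log1p, a common transform for right-skewed prices.
        "log1p_skew": np.log1p(y).skew(),
    }


def missing_fraction(df: pd.DataFrame) -> pd.Series:
    """Share of missing values per column, highest first; complete columns are dropped."""
    frac = df.isna().mean()
    return frac[frac > 0].sort_values(ascending=False)


def correlations_with_target(
    df: pd.DataFrame, features: list, target: str = "SalePrice", method: str = "pearson"
) -> pd.Series:
    """Correlation of each selected feature with the target, strongest positive first."""
    return df[features].corrwith(df[target], method=method).sort_values(ascending=False)


def median_price_by(df: pd.DataFrame, column: str, target: str = "SalePrice") -> pd.Series:
    """Median target per category (e.g. per SaleType), highest first."""
    return df.groupby(column)[target].median().sort_values(ascending=False)
```

On the real data, correlations_with_target(train, ["GarageArea", "OverallQual"]) and median_price_by(train, "SaleType") give the figures behind the relationship plots, and missing_fraction(train).head(20).plot.bar() draws a missing-value bar plot. Because OverallQual is an ordinal rating, passing method="spearman" may be a reasonable alternative to Pearson correlation.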
Major findings:
- Higher OverallQual and GarageArea are strongly associated with higher SalePrice.
- Some SaleType categories correspond to notable price outliers.
- Several categorical features contain substantial missing values that need careful handling.
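The last two findings can be checked with a short sketch: an IQR rule applied within each SaleType to flag price outliers, and a fill for categorical columns whose NA means the feature is absent. The column list is an assumption based on the Kaggle data description (for example, NA in PoolQC means no pool), so verify it against data_description.txt before relying on it. The IQR rule with k=1.5 is one common convention, not the notebook's documented method, and the helper names are hypothetical.

```python
import pandas as pd

# Assumed subset of columns where the Kaggle data description documents NA as
# "feature absent" (no alley access, no pool, ...). Not exhaustive; verify first.
ABSENT_MEANS_NONE = ("Alley", "PoolQC", "Fence", "MiscFeature", "FireplaceQu")


def fill_absent_categories(
    df: pd.DataFrame, columns=ABSENT_MEANS_NONE, fill: str = "None"
) -> pd.DataFrame:
    """Return a copy where NA in 'absence' columns becomes an explicit category."""
    out = df.copy()
    for col in columns:
        if col in out.columns:  # skip columns not present in this frame
            out[col] = out[col].fillna(fill)
    return out


def iqr_outliers_by_group(
    df: pd.DataFrame, group: str, target: str = "SalePrice", k: float = 1.5
) -> pd.DataFrame:
    """Rows whose target falls outside [Q1 - k*IQR, Q3 + k*IQR] within their own group."""
    grouped = df.groupby(group)[target]
    # transform broadcasts each group's quartile back onto that group's rows.
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    outside = (df[target] < q1 - k * iqr) | (df[target] > q3 + k * iqr)
    return df[outside]
```

Computing the bounds per group rather than over all sales avoids flagging whole categories (such as new construction) just because their typical price differs from the overall median.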
Requirements:
- Python 3.8+
- Libraries: pandas, numpy, matplotlib, seaborn
Pull requests are welcome.
For larger changes, please open an issue first to discuss what you would like to add or modify.
Happy analyzing!