This project predicts house prices with machine learning models trained on features such as square footage, number of bedrooms and bathrooms, year built, and neighborhood. The resulting models are intended to help real estate stakeholders make data-driven decisions.
- Records: 50,000 houses
- Features: Square footage, bedrooms, bathrooms, neighborhood, year built
- Target: House sale price
- Python (Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn)
- Jupyter Notebook for analysis and visualization
Data Preprocessing:
- Removed outliers using IQR method
- Encoded categorical features (Neighborhood) using Label Encoding
- Scaled numerical features using Min-Max scaling
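The three preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the column names (`SquareFeet`, `Bedrooms`, `Bathrooms`, `YearBuilt`, `Neighborhood`, `Price`), the synthetic data, the choice of `Price` as the column filtered for outliers, and the conventional 1.5 × IQR cutoff are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Synthetic stand-in data; column names are hypothetical and the real schema may differ.
rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({
    "SquareFeet": rng.integers(800, 4000, size=n).astype(float),
    "Bedrooms": rng.integers(1, 6, size=n).astype(float),
    "Bathrooms": rng.integers(1, 4, size=n).astype(float),
    "YearBuilt": rng.integers(1950, 2022, size=n).astype(float),
    "Neighborhood": rng.choice(["Rural", "Suburb", "Urban"], size=n),
})
df["Price"] = 100 * df["SquareFeet"] + rng.normal(0, 50_000, size=n)


def remove_iqr_outliers(frame, column, k=1.5):
    """Keep rows whose `column` value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = frame[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = frame[column].between(q1 - k * iqr, q3 + k * iqr)
    return frame[mask].copy()


# Step 1: drop outliers (assumed here to be applied to the target column).
clean = remove_iqr_outliers(df, "Price")

# Step 2: label-encode the categorical column (each category becomes an integer code).
clean["Neighborhood"] = LabelEncoder().fit_transform(clean["Neighborhood"])

# Step 3: rescale numerical features to the [0, 1] range.
numeric_cols = ["SquareFeet", "Bedrooms", "Bathrooms", "YearBuilt"]
clean[numeric_cols] = MinMaxScaler().fit_transform(clean[numeric_cols])

print(clean.head())
```

Note that label encoding assigns integer codes in an arbitrary order, which linear models will treat as a numeric scale; one-hot encoding is a common alternative when that matters. In a train/test workflow, the scaler would normally be fitted on the training split only.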
Exploratory Data Analysis (EDA):
- Visualized price distributions and feature correlations
- Analyzed trends like square footage vs. price and year built vs. price
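As a rough illustration of the correlation and trend analysis described above, the sketch below computes feature-to-price correlations and a per-decade price summary with Pandas. The data is synthetic and the column names (`SquareFeet`, `YearBuilt`, `Price`) are assumptions, so the printed values say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data with hypothetical column names.
rng = np.random.default_rng(1)
n = 500
houses = pd.DataFrame({
    "SquareFeet": rng.uniform(800, 4000, size=n),
    "YearBuilt": rng.integers(1950, 2022, size=n),
})
houses["Price"] = 100 * houses["SquareFeet"] + rng.normal(0, 50_000, size=n)

# Pearson correlation of each feature with the target, strongest first.
price_corr = houses.corr()["Price"].drop("Price").sort_values(ascending=False)
print(price_corr)

# Median price per construction decade, to inspect the year-built trend.
decade = (houses["YearBuilt"] // 10) * 10
median_by_decade = houses.groupby(decade)["Price"].median()
print(median_by_decade)
```

For the visual side, the same DataFrame could be passed to Seaborn, for example `sns.heatmap(houses.corr(), annot=True)` for the correlation matrix or `sns.scatterplot(data=houses, x="SquareFeet", y="Price")` for the square footage trend.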
Model Training & Evaluation:
- Models Implemented:
✔️ Linear Regression
✔️ Ridge Regression
✔️ Decision Tree Regressor
✔️ Gradient Boosting Regressor
- Evaluation Metrics: R² Score, MAE, MSE, RMSE
| Model | R² Score | MAE | RMSE |
|---|---|---|---|
| Linear Regression | 0.57 | $39,866 | $49,681 |
| Ridge Regression | 0.57 | $39,866 | $49,680 |
| Decision Tree | 0.56 | $40,026 | $49,886 |
| Gradient Boosting | 0.57 | $39,890 | $49,730 |
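A comparison like the one in the table can be produced with a loop over Scikit-learn estimators. The sketch below is an assumption-laden outline rather than the project's own code: it uses synthetic data, an 80/20 train/test split, and arbitrary hyperparameters (`alpha=1.0`, `max_depth=5`), none of which are stated in the original, so its numbers will not match the table.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic features and target standing in for the preprocessed dataset.
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(2_000, 5))
y = 200_000 + 300_000 * X[:, 0] + rng.normal(0, 50_000, size=2_000)

# Assumed 80/20 split; the original does not state the split ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hyperparameters here are illustrative defaults, not the project's settings.
models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(alpha=1.0),
    "Decision Tree": DecisionTreeRegressor(max_depth=5, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    results[name] = {
        "R2": r2_score(y_test, pred),
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),  # RMSE is the square root of MSE
    }

for name, m in results.items():
    print(f"{name:18s} R2={m['R2']:.2f}  MAE={m['MAE']:,.0f}  RMSE={m['RMSE']:,.0f}")
```

Because RMSE is the square root of MSE, it is expressed in the same units as the price, which is why the table reports MAE and RMSE in dollars.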
- Square footage is the most significant predictor of house prices
- Neighborhood has a strong impact on pricing trends
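- Gradient Boosting is the stronger of the two non-linear models (R² 0.57 vs. 0.56 for the Decision Tree), but it does not outperform Linear or Ridge Regression, which have slightly lower MAE and RMSE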
- LinkedIn: Adithya Vardhan Reddy
- GitHub: Project Repository
© 2025 Adithya Vardhan Reddy