AI-Powered Real Estate Price Forecasting
A machine learning application that predicts future home prices based on property type, location, historical trends, and market indicators. Built with Random Forest regression models and an interactive React frontend.
The Home Price Predictor applies statistical modeling and machine learning to forecast real estate prices across property types and locations. It analyzes historical market data, property indices, and year-over-year trends to produce price estimates that help buyers, sellers, and investors make informed decisions.
- Multiple Random Forest Regressors trained for different property types
- Predictions evaluated by R² score on a held-out test set
- Feature engineering including logarithmic transformations and encoding
- Robust preprocessing with StandardScaler and OrdinalEncoder
- Composite (Comp): Overall market benchmark
- Single Family Detached (SFDetach): Standalone homes
- Single Family Attached (SFAttach): Townhomes and semi-detached
- Townhouse (THouse): Multi-level attached homes
- Apartment (Apart): Condos and apartment units
- Location-based predictions with one-hot encoding
- Temporal analysis using date encoding
- Market indicators: Property indices, benchmarks, and YoY changes
- Fallback mechanisms for missing data
- React-based UI for seamless user experience
- Real-time predictions via Flask API
- Responsive design with modern styling
- Visual data representation and trend analysis
Raw Data (CSV)
↓
Data Cleaning & Preprocessing
↓
Feature Engineering
• Logarithmic transformations
• One-hot encoding (Location)
• Ordinal encoding (Date)
• Standard scaling
↓
Train/Test Split (80/20)
↓
Random Forest Models (5 types)
↓
Model Evaluation & Optimization
↓
Model Serialization (joblib)
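
The last three stages of the training pipeline (split, fit, serialize) can be sketched as follows. This is a minimal illustration, not the project's training script: the data is synthetic, the feature names `CompIndex` and `Location_Downtown` are placeholders, and the column file name `x_train_comp_columns.pkl` is an assumed expansion of the `x_train_*_columns.pkl` pattern shown in the project structure.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an already preprocessed feature table (hypothetical values).
rng = np.random.default_rng(0)
n_rows = 200
features = pd.DataFrame({
    "Date": rng.integers(0, 60, n_rows),             # ordinal-encoded date
    "CompIndex": rng.normal(0.0, 1.0, n_rows),       # scaled market index
    "Location_Downtown": rng.integers(0, 2, n_rows)  # one-hot location flag
})
# Hypothetical target tied to the features so the model has signal to learn.
target = 13.0 + 0.01 * features["Date"] + 0.2 * features["CompIndex"]

# 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42
)

# Default configuration: n_estimators is 100 unless overridden.
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # R² on the held-out 20%

# Persist the model together with the training column order,
# so the API can rebuild inputs in the same layout.
out_dir = Path(tempfile.mkdtemp())
joblib.dump(model, out_dir / "forest_comp_model.pkl")
joblib.dump(list(X_train.columns), out_dir / "x_train_comp_columns.pkl")

reloaded = joblib.load(out_dir / "forest_comp_model.pkl")
```

Repeating the same steps once per target column would yield the five model files listed later in this README.
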
User Input (Frontend)
↓
POST Request → Flask API
↓
Load Pre-trained Models
↓
Data Preprocessing
↓
Model Prediction
↓
JSON Response → Frontend
↓
Display Results
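
The request-handling steps above might look like the sketch below, which a Flask route could delegate to. It is an assumption-laden illustration rather than the code in `app.py`: the helper names (`build_feature_row`, `predict_price`, `months_since_2000`), the `Location_<name>` column convention, and the `ConstantModel` stub are all hypothetical, and the real service loads its models and date encoder with joblib.

```python
from datetime import datetime

SUPPORTED_HOME_TYPES = {"Comp", "SFDetach", "SFAttach", "THouse", "Apart"}


def build_feature_row(payload, columns, date_to_ordinal):
    """Validate a request payload and align it to the training column order."""
    hometype = payload.get("hometype")
    if hometype not in SUPPORTED_HOME_TYPES:
        raise ValueError(f"unsupported hometype: {hometype!r}")
    # strptime raises ValueError when the date is not YYYY-MM-DD.
    date = datetime.strptime(payload["date"], "%Y-%m-%d").date()

    row = {name: 0.0 for name in columns}  # unknown locations stay all-zero
    row["Date"] = float(date_to_ordinal(date))
    location_column = f"Location_{payload['location']}"
    if location_column in row:
        row[location_column] = 1.0
    return [row[name] for name in columns]  # same order as during training


def predict_price(payload, model, columns, date_to_ordinal):
    """Run one prediction and wrap it in the JSON-ready response shape."""
    features = build_feature_row(payload, columns, date_to_ordinal)
    prediction = model.predict([features])[0]
    return {"predictions": float(prediction)}


class ConstantModel:
    """Hypothetical stand-in for a model loaded from forest_*_model.pkl."""

    def predict(self, rows):
        return [1425000.0 for _ in rows]


def months_since_2000(date):
    """Hypothetical date encoding used only for this demo."""
    return (date.year - 2000) * 12 + date.month - 1


columns = ["Date", "Location_Downtown", "Location_Uptown"]
payload = {"location": "Downtown", "date": "2024-08-15", "hometype": "SFDetach"}
row = build_feature_row(payload, columns, months_since_2000)
response = predict_price(payload, ConstantModel(), columns, months_since_2000)
```

Rebuilding the row from the saved column list keeps the input layout identical to training, even when a requested location was never one-hot encoded.
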
Backend:
- Python 3.x
- Flask (REST API)
- Flask-CORS (Cross-origin support)
Machine Learning:
- Scikit-Learn (Random Forest, preprocessing)
- Pandas (data manipulation)
- NumPy (numerical operations)
- Joblib (model serialization)
Frontend:
- React
- HTML5
- CSS3
- JavaScript (ES6+)
Data Visualization:
- Matplotlib
- Seaborn
Each model uses a tailored feature set:
Common Features:
- Date (temporal component)
- Location (one-hot encoded)
- Property-specific Index
- Property-specific Benchmark
- Property-specific YoY Change
Target Variables:
- CompBenchmark
- SFDetachBenchmark
- SFAttachBenchmark
- THouseBenchmark
- ApartBenchmark
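
The per-type column names appear to follow a `{Type}{Suffix}` pattern: the target names above confirm the `Benchmark` suffix, and the `Index` and `YoYChange` suffixes are taken from the required-column list in the troubleshooting notes. Under that assumption, a small helper (the names `columns_for` and `missing_columns` are illustrative, not part of the project) can list the expected columns and check a CSV header against them:

```python
HOME_TYPES = ("Comp", "SFDetach", "SFAttach", "THouse", "Apart")
SUFFIXES = ("Index", "Benchmark", "YoYChange")


def columns_for(home_type):
    """Return the per-type column names under the assumed {Type}{Suffix} pattern."""
    if home_type not in HOME_TYPES:
        raise ValueError(f"unknown home type: {home_type!r}")
    return [f"{home_type}{suffix}" for suffix in SUFFIXES]


def missing_columns(header):
    """List expected per-type columns that are absent from a CSV header."""
    expected = [name for home_type in HOME_TYPES for name in columns_for(home_type)]
    return [name for name in expected if name not in header]


# Example header that only carries the Composite columns.
header = ["Date", "Location", "CompIndex", "CompBenchmark", "CompYoYChange"]
absent = missing_columns(header)
```
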
- Missing Value Handling: Dropna on raw data
- Logarithmic Transformation: Applied to skewed features as log(value + 1), which stays defined at zero and handles near-zero values
- Categorical Encoding: One-hot encoding for locations
- Temporal Encoding: Ordinal encoding for dates
- Feature Scaling: StandardScaler for normalization
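
The five preprocessing steps can be sketched on a toy table as follows. The values are invented, and which columns receive the log transform or scaling is an assumption made for illustration; the training script may choose differently.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Toy rows with hypothetical values; the last row has a missing location.
raw = pd.DataFrame({
    "Date": ["2023-01", "2023-02", "2023-03", "2023-03", "2023-04"],
    "Location": ["Downtown", "Uptown", "Downtown", "Uptown", None],
    "CompIndex": [310.5, 312.0, 315.2, 298.7, 300.1],
    "CompYoYChange": [0.02, 0.03, 0.00, 0.05, 0.01],
})

# 1. Missing value handling: drop incomplete rows.
clean = raw.dropna().reset_index(drop=True)

# 2. Logarithmic transformation: log1p computes log(value + 1).
clean["CompIndex"] = np.log1p(clean["CompIndex"])

# 3. Categorical encoding: one indicator column per location.
clean = pd.get_dummies(clean, columns=["Location"], dtype=float)

# 4. Temporal encoding: map each distinct date to an integer rank.
date_encoder = OrdinalEncoder()
clean["Date"] = date_encoder.fit_transform(clean[["Date"]]).ravel()

# 5. Feature scaling: zero mean and unit variance per numeric column.
scaler = StandardScaler()
numeric_columns = ["CompIndex", "CompYoYChange"]
clean[numeric_columns] = scaler.fit_transform(clean[numeric_columns])
```

Two points are worth keeping in mind with this approach. `OrdinalEncoder` raises an error by default for categories it did not see during `fit`, which may be one source of date encoding errors at prediction time. Fitting the scaler on the training split only, then reusing it on the test split, avoids leaking test statistics into training.
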
RandomForestRegressor()
- n_estimators: default (100 trees)
- Trained on 80% of data
- Evaluated on 20% test set
- Saved using joblib for production use

- Python 3.7 or higher
- Node.js and npm
- pip (Python package manager)
- Clone the repository:
    git clone https://github.com/yourusername/home-price-predictor.git
    cd home-price-predictor/backend
- Create a virtual environment (recommended):
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install Python dependencies:
    pip install flask flask-cors pandas numpy scikit-learn matplotlib seaborn joblib
- Prepare the data:
  - Place your MLS.csv file in the Backend_Models directory
  - Update the file path in the training script if necessary
- Train the models:
    python train_models.py
  This will generate:
  - forest_comp_model.pkl
  - forest_SFDetach_model.pkl
  - forest_SFAttach_model.pkl
  - forest_THouse_model.pkl
  - forest_Apart_model.pkl
  - date_encoder.pkl
  - Column files for each model
- Start the Flask API:
    python app.py
  The API will run on http://localhost:5000
- Navigate to frontend directory:
    cd ../frontend
- Install Node.js dependencies:
    npm install
- Start the development server:
    npm start
  The app will open at http://localhost:3000
- Select Property Type:
  - Choose from Composite, SF Detached, SF Attached, Townhouse, or Apartment
- Enter Location:
  - Input the geographic area or neighborhood
- Select Date:
  - Choose the date for which you want the price prediction
- Get Prediction:
  - Click "Predict" to receive the estimated home price
POST /predict_price
Request Body:
    {
      "location": "Downtown",
      "date": "2024-08-15",
      "hometype": "SFDetach"
    }
Response:
    {
      "predictions": 1425000
    }
Supported Home Types:
- Comp: Composite
- SFDetach: Single Family Detached
- SFAttach: Single Family Attached
- THouse: Townhouse
- Apart: Apartment
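
A client call to this endpoint can be sketched with the Python standard library alone. The helper names `build_predict_request` and `fetch_prediction` are illustrative, and the URL assumes the API is running locally on the default port mentioned above.

```python
import json
import urllib.request

API_URL = "http://localhost:5000/predict_price"


def build_predict_request(location, date, hometype, url=API_URL):
    """Build the POST request with the JSON body the endpoint expects."""
    body = json.dumps({"location": location, "date": date, "hometype": hometype})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_prediction(location, date, hometype):
    """Send the request and return the value under the "predictions" key."""
    request = build_predict_request(location, date, hometype)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))["predictions"]


# Building the request needs no running server; fetch_prediction does.
request = build_predict_request("Downtown", "2024-08-15", "SFDetach")
```

With the Flask API running, `fetch_prediction("Downtown", "2024-08-15", "SFDetach")` would return the predicted price as a number.
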
home-price-predictor/
├── backend/
│ ├── Backend_Models/
│ │ ├── MLS.csv
│ │ ├── train_models.py
│ │ └── data.csv
│ ├── app.py
│ ├── forest_comp_model.pkl
│ ├── forest_SFDetach_model.pkl
│ ├── forest_SFAttach_model.pkl
│ ├── forest_THouse_model.pkl
│ ├── forest_Apart_model.pkl
│ ├── date_encoder.pkl
│ ├── x_train_*_columns.pkl
│ └── requirements.txt
├── frontend/
│ ├── src/
│ │ ├── components/
│ │ ├── App.js
│ │ └── index.js
│ ├── public/
│ └── package.json
└── README.md
Each Random Forest model is scored by R² on the 20% test split:
Model Performance Metrics:
- Comp Model: R² = [Your Score]
- SFDetach Model: R² = [Your Score]
- SFAttach Model: R² = [Your Score]
- THouse Model: R² = [Your Score]
- Apart Model: R² = [Your Score]
(Run the training script to see actual scores)
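
For reference, R² compares the model's squared error on the test set with the error of always predicting the mean of the test targets:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

Here $y_i$ is an observed benchmark price, $\hat{y}_i$ is the model's prediction, and $\bar{y}$ is the mean of the observed values. A score of 1 means a perfect fit, 0 means the model does no better than predicting the mean, and negative values mean it does worse.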
- Random Forest Regression: Ensemble learning for robust predictions
- Feature Engineering: Log transformations to handle skewed distributions
- Hold-out Validation: 80/20 train/test split to evaluate on unseen data
- Feature Selection: Selective feature dropping to reduce multicollinearity
- Encoding Strategies: One-hot encoding for locations and ordinal encoding for dates
- Regularization: Implicit, since averaging over many trees reduces variance
- Exploratory Data Analysis (EDA) with Pandas
- Distribution Analysis using Matplotlib and Seaborn
- Trend Extraction from historical price data
Model Improvements:
- Implement GridSearchCV for hyperparameter tuning
- Test alternative algorithms (XGBoost, LightGBM, Neural Networks)
- Add cross-validation with k-fold splits
- Ensemble multiple models for improved accuracy
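
As one possible starting point for the first and third items, hyperparameter tuning with k-fold cross-validation could look like the sketch below. The data is synthetic and the parameter grid is an arbitrary example, not a recommendation for this dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic regression data standing in for the preprocessed features.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=120)

# Small example grid: 2 x 2 = 4 candidate configurations.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=42),  # 5-fold CV
    scoring="r2",
)
search.fit(X, y)

best_params = search.best_params_  # best configuration by mean R² across folds
best_score = search.best_score_
```

Because every candidate is scored on five different validation folds, the selected configuration depends less on one particular train/test split.
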
Feature Expansion:
- Property size (square footage)
- Number of bedrooms/bathrooms
- Property age and condition
- Proximity to amenities (schools, transit)
- Economic indicators (interest rates, unemployment)
Application Features:
- Historical price trend visualization
- Comparative market analysis
- Price range predictions with confidence intervals
- Save and compare multiple predictions
- User authentication and prediction history
- Mobile app version
- Real-time market data integration
Deployment:
- Deploy backend on AWS/Heroku
- Host frontend on Vercel/Netlify
- Set up CI/CD pipeline
- Implement caching for faster predictions
- Add monitoring and logging
Model files not found:
    # Ensure you've run the training script first
    python train_models.py
CORS errors:
    # Verify Flask-CORS is installed and configured
    pip install flask-cors
Date encoding errors:
    # Check that date format matches training data
    # Format should be: YYYY-MM-DD
Missing data in CSV:
    # Ensure data.csv contains all required columns:
    # Location, HomeType, {Type}Index, {Type}Benchmark, {Type}YoYChange

flask==2.3.0
flask-cors==4.0.0
pandas==2.0.0
numpy==1.24.0
scikit-learn==1.3.0
matplotlib==3.7.0
seaborn==0.12.0
joblib==1.3.0

{
"dependencies": {
"react": "^18.2.0",
"react-dom": "^18.2.0",
"axios": "^1.4.0"
}
}

This project is open source and available under the MIT License.
For questions or feedback, please open an issue or reach out through GitHub.