The objective of this project is to predict the monthly rental prices of real estate properties across the USA. The dataset, *Apartment for Rent Classified* (UCI Machine Learning Repository, 2019, https://doi.org/10.24432/C5X623), consists of approximately 100,000 entries with detailed information about each property, such as its size (square footage) and number of bathrooms, as well as additional features like amenities (e.g., air conditioning, garage, pool). This project aims to leverage these features to develop accurate predictive models for rental prices using Linear Regression, an XGBoost regressor (XGBRegressor), and a Neural Network.
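Amenities are categorical flags rather than numbers, so they have to be converted before any of the three models can use them. The following is a minimal sketch of one common approach (one 0/1 indicator column per amenity), assuming that each listing's amenities arrive as a single comma-separated string. The field names and sample rows are hypothetical and are not taken from the dataset.

```python
# Hypothetical raw records. ASSUMPTION: "amenities" is a comma-separated
# string, or None when a listing has no amenities.
records = [
    {"square_feet": 850, "bathrooms": 1.0, "amenities": "AC,Pool"},
    {"square_feet": 1200, "bathrooms": 2.0, "amenities": "Garage"},
    {"square_feet": 600, "bathrooms": 1.0, "amenities": None},
]


def parse_amenities(value):
    """Split the raw amenity string into a set of clean names."""
    if not value:
        return set()
    return {name.strip() for name in value.split(",")}


def amenity_vocabulary(rows):
    """Collect every distinct amenity, sorted so the column order is stable."""
    names = set()
    for row in rows:
        names |= parse_amenities(row["amenities"])
    return sorted(names)


def to_feature_vector(row, vocab):
    """Numeric features followed by one 0/1 indicator per known amenity."""
    present = parse_amenities(row["amenities"])
    indicators = [1 if name in present else 0 for name in vocab]
    return [row["square_feet"], row["bathrooms"], *indicators]


vocab = amenity_vocabulary(records)
features = [to_feature_vector(r, vocab) for r in records]
print(vocab)     # ['AC', 'Garage', 'Pool']
print(features)  # [[850, 1.0, 1, 0, 1], [1200, 2.0, 0, 1, 0], [600, 1.0, 0, 0, 0]]
```

The vocabulary should be built from the training rows only, so that amenities seen only at test time are simply ignored.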
| Model | Split | Encoding | MSE | RMSE ($) | MAE ($) | STD of rent ($) | R² |
|---|---|---|---|---|---|---|---|
| LinearRegression | (60/20/20) | target encoding | 232525.45 | 482.21 | 259.83 | 860.21 | 0.75 |
| LinearRegression | (80/20) | target encoding | 232844.02 | 482.54 | 259.77 | 860.21 | 0.75 |
| LinearRegression | (60/20/20) | label encoding | 556765.25 | 746.17 | 453.85 | 860.21 | 0.28 |
| LinearRegression | (80/20) | label encoding | 556278.57 | 745.84 | 454.03 | 860.21 | 0.28 |
| NeuralNet | (60/20/20) | target encoding | 235137.10 | 484.91 | 268.31 | 860.21 | NaN |
| NeuralNet | (80/20) | target encoding | 245613.92 | 495.59 | 263.95 | 860.21 | NaN |
| NeuralNet | (60/20/20) | label encoding | 595048.13 | 771.39 | 548.00 | 860.21 | NaN |
| NeuralNet | (80/20) | label encoding | 563128.43 | 750.42 | 514.33 | 860.21 | NaN |
| XGBRegressor | (60/20/20) | target encoding | 171560.15 | 414.20 | 210.52 | 860.21 | 0.83 |
| XGBRegressor | (80/20) | target encoding | 149103.65 | 386.14 | 207.05 | 860.21 | 0.83 |
| XGBRegressor | (60/20/20) | label encoding | 228073.42 | 477.57 | 244.76 | 860.21 | 0.78 |
| XGBRegressor | (80/20) | label encoding | 213766.41 | 462.35 | 244.42 | 860.21 | 0.78 |
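The splits in the table read as train/validation/test (60/20/20) and train/test (80/20), and the error columns follow the standard regression definitions. The STD column, identical in every row, appears to describe the rent itself rather than any model. The sketch below shows one way to produce both the splits and the metrics in plain Python. The function names, the shuffling seed, and the choice of the population standard deviation (dividing by n) are assumptions for illustration, not necessarily what this project used.

```python
import math
import random


def split_indices(n, fractions, seed=0):
    """Shuffle row indices and cut them into consecutive parts.

    fractions=(0.6, 0.2, 0.2) -> train / validation / test
    fractions=(0.8, 0.2)      -> train / test
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    parts, start = [], 0
    for i, frac in enumerate(fractions):
        # The last part takes whatever remains, so no row is lost to rounding.
        end = n if i == len(fractions) - 1 else start + round(frac * n)
        parts.append(idx[start:end])
        start = end
    return parts


def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, standard deviation of the target, and R^2."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    var_y = sum((t - mean_y) ** 2 for t in y_true) / n  # population variance
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "STD": math.sqrt(var_y),
        "R2": 1 - mse / var_y,
    }


train, val, test = split_indices(100, (0.6, 0.2, 0.2))
print(len(train), len(val), len(test))  # 60 20 20

m = regression_metrics([1000, 1500, 2000, 2500], [1100, 1400, 2100, 2300])
print(m["MSE"], m["MAE"], round(m["R2"], 3))  # 17500.0 125.0 0.944
```

In the 60/20/20 setting the validation part is typically used for tuning (e.g., early stopping), and only the test part feeds the reported metrics.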
- XGBRegressor (80/20) with target encoding outperforms all other models, achieving the lowest MSE, RMSE, and MAE and the highest R² (0.83, shared with the 60/20/20 variant), which makes it the best-performing model.
- Linear Regression models with target encoding perform better than those with label encoding, but they still lag behind XGBRegressor models.
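- Neural Network models do not improve on Linear Regression: with target encoding their errors are similar (RMSE 484.91 to 495.59 vs. 482.21 to 482.54), and with label encoding they have the highest errors of all models.
- Target encoding outperforms label encoding for every model type and split.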
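One plausible reason target encoding wins, especially for Linear Regression, is that label encoding assigns arbitrary integer codes whose order says nothing about price, whereas target encoding replaces each category with an average rent that a linear model can use directly. The sketch below contrasts the two on a hypothetical `state` column with made-up rents. The smoothing formula and its `smoothing` parameter are assumptions for illustration, since the exact encoder used in this project is not specified.

```python
from collections import defaultdict

# Hypothetical training rows: (state, monthly_rent). Values are illustrative only.
train = [("CA", 2500), ("CA", 2300), ("TX", 1200), ("TX", 1000), ("NY", 2800)]


def label_encode(categories):
    """Map each category to an arbitrary integer code (here: alphabetical order)."""
    return {c: i for i, c in enumerate(sorted(set(categories)))}


def target_encode(rows, smoothing=1.0):
    """Map each category to a smoothed mean of the target.

    encoded(c) = (n_c * mean_c + smoothing * global_mean) / (n_c + smoothing)
    Fit on training rows only, to avoid leaking test targets into the features.
    """
    global_mean = sum(y for _, y in rows) / len(rows)
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in rows:
        sums[c] += y  # sums[c] == n_c * mean_c
        counts[c] += 1
    return {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in counts
    }


labels = label_encode(c for c, _ in train)
targets = target_encode(train)
print(labels)  # {'CA': 0, 'NY': 1, 'TX': 2}
print({c: round(v, 1) for c, v in targets.items()})  # {'CA': 2253.3, 'TX': 1386.7, 'NY': 2380.0}
```

The label codes put NY between CA and TX even though NY has the highest rent, while the target-encoded values follow the rent order. Tree-based models such as XGBRegressor can still split on arbitrary codes, which may explain why the encoding gap is smaller for them than for Linear Regression. Categories unseen during training are usually mapped to the global mean.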
Taking the standard deviation of the rent ($860.21) as the reference point, the performance of the XGBRegressor (80/20) with target encoding is deemed sufficient. The standard deviation is approximately the RMSE that a naive model would achieve by always predicting the mean rent, so errors well below it indicate that the model has learned something useful. The model's RMSE ($386.14) and MAE ($207.05) are well below this baseline, indicating relatively good performance. Nevertheless, an average absolute error of about $207 per month is still a noticeable deviation, which can likely be attributed to the inherent complexity and variability of the rental market, as well as to limitations in the available data and the assumptions made by the model.
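The comparison with the standard deviation can be made concrete: a model that always predicts the mean rent has an RMSE equal to the (population) standard deviation of the rents it is evaluated on. The sketch below checks this on a few hypothetical rent values and then expresses the best table row's RMSE and MAE as fractions of the reported standard deviation.

```python
import math
import statistics

rents = [1000, 1500, 2000, 2500]  # hypothetical test-set rents
mean_rent = statistics.fmean(rents)

# RMSE of a naive model that always predicts the mean rent.
baseline_rmse = math.sqrt(statistics.fmean([(r - mean_rent) ** 2 for r in rents]))
print(baseline_rmse, statistics.pstdev(rents))  # both about 559.017

# Best model from the table (XGBRegressor, 80/20, target encoding).
std, rmse, mae = 860.21, 386.14, 207.05
print(round(rmse / std, 3), round(mae / std, 3))  # 0.449 0.241
```

Read this way, the best model's RMSE is roughly 45% of the error of the mean-only baseline.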
The Neural Network did not yield significant improvements, possibly because roughly 100,000 rows of tabular data are not enough for it to outperform simpler approaches. This suggests that Linear Regression and XGBRegressor are better suited to datasets of this size.