Skip to content

This project aims to develop and evaluate regression models for predicting diamond prices based on various features

Notifications You must be signed in to change notification settings

AhmadSadik1/DiamondPricePredication

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Diamond Price Detection

Overview:

This project aims to develop and evaluate regression models for predicting diamond prices based on various features. The process involves data preprocessing, exploratory data analysis, feature engineering, model training, evaluation, and hyperparameter tuning.

Table of Contents:

Imports and Configurations

Data Preprocessing

Exploratory Data Analysis (EDA)

Handling Categorical Variables

Outlier Detection and Removal

Model Training and Evaluation

Model Comparison

Hyperparameter Tuning

Imports and Configurations:

The project starts by importing necessary libraries for data handling, modeling, preprocessing, and visualization. Key libraries include numpy, pandas, sklearn, matplotlib, and seaborn. Visualization settings are configured to improve the readability of plots.

Data Preprocessing:

-Dropping Duplicates: Duplicate rows are identified and removed.

-Dropping Faulty Values.

-Feature Engineering: The Id column is dropped, and a new volume feature is created by multiplying x, y, and z. The original x, y, and z columns are then dropped.

Exploratory Data Analysis (EDA)

-Correlation Matrix: A heatmap of the correlation matrix is plotted to understand relationships between numerical features and the target variable price.

-Histograms and Scatterplots: Various histograms and scatterplots are used to visualize the distribution of features and their relationships with the target variable.

-Categorical Variables: Unique values and distributions of categorical variables (cut, color, clarity) are examined through bar plots, box plots, and pie charts.

Handling Categorical Variables

Categorical variables are converted to dummy/indicator variables using pd.get_dummies.

Outlier Detection and Removal

A custom function is defined to remove outliers based on the Interquartile Range (IQR) method.

Model Training and Evaluation

Various regression models are trained and evaluated:

Linear Regression

Lasso Regression

Ridge Regression

Decision Tree Regression

Random Forest Regression

XGBoost

Adaboost

For each model:

The training and testing times are recorded.

Performance is evaluated using Root Mean Squared Error (RMSE) and R-squared (R2) metrics.

Model Comparison:

Training and testing R2 scores of all models are compared through a line plot. The Random Forest Regressor is selected for further tuning due to its high performance.

Hyperparameter Tuning:

Random Forest Hyperparameter Tuning: A grid search with cross-validation is used to tune the hyperparameters of the Random Forest Regressor.

XGBoost Model: An XGBoost Regressor is also trained and evaluated.

Observations:

Data Quality: Initial data exploration and preprocessing steps ensure that the dataset is clean and ready for modeling.

Feature Engineering: The creation of the volume feature adds a new dimension for the model to learn from.

Model Performance: The Random Forest Regressor outperforms other models in terms of R2 score, making it a suitable choice for this regression task.

Hyperparameter Tuning: Grid search with cross-validation further improves the model performance, highlighting the importance of fine-tuning.

Recommendations:

Feature Engineering: Additional feature engineering, such as polynomial features or interaction terms, could potentially improve model performance.

Advanced Models: Consider using advanced models like Gradient Boosting or Neural Networks for potentially better performance.

Regularization: Regularization techniques like L1 and L2 should be explored more thoroughly to handle multicollinearity and overfitting.

Automated EDA Tools: Utilize automated EDA tools to expedite the data exploration process.

About

This project aims to develop and evaluate regression models for predicting diamond prices based on various features

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 100.0%