This project is a machine learning toolset for linear regression analysis of car mileage and price data. Linear regression is a foundational technique in artificial intelligence and machine learning, used to predict continuous values. The project covers the full pipeline: data scraping, cleaning, model training, prediction, and evaluation. The codebase is modular, so users can generate datasets, train a model, make predictions, and evaluate model precision independently.
- 🤖 ML Powered: Utilizes linear regression, a core machine learning algorithm, to model and predict car prices.
- 🕸️ Dataset Generation: Scrape car mileage and price data from the web, clean and save as CSV.
- 🤖 Model Training: Train a linear regression model to predict car prices based on mileage.
- 🔮 Prediction: Predict car prices for given mileage using trained model parameters.
- 📊 Precision Calculation: Evaluate model performance with MSE, RMSE, and R-squared metrics.
- 📉 Data Visualization: Plot data points and regression line.
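
The three precision metrics named in the feature list can be computed as in the sketch below. It is written in plain Python so it runs without dependencies; the function name `precision_metrics` and the sample values are illustrative assumptions, not taken from `precision_calculator.py`.

```python
import math


def precision_metrics(prices, predictions):
    """Return (mse, rmse, r2) for observed prices and predicted prices.

    Hypothetical helper for illustration; the project's own implementation may differ.
    """
    n = len(prices)
    residuals = [p - q for p, q in zip(prices, predictions)]
    ss_res = sum(r ** 2 for r in residuals)        # residual sum of squares
    mean_price = sum(prices) / n
    ss_tot = sum((p - mean_price) ** 2 for p in prices)  # total sum of squares
    mse = ss_res / n
    rmse = math.sqrt(mse)
    r2 = 1.0 - ss_res / ss_tot  # undefined when every price is identical
    return mse, rmse, r2


# Illustrative values only (not real scraped data).
mse, rmse, r2 = precision_metrics([8000, 6000, 4000], [7500, 6000, 4500])
```

RMSE is expressed in the same unit as the price, which makes it easier to interpret than MSE, while R-squared reports the share of price variance explained by the model.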
42_ft_linear_regression/
├── data/                        # Datasets and model parameters (CSV files)
├── requirements.txt             # Python dependencies
└── src/
    ├── dataset_generator.py     # Scrape and clean data
    ├── precision_calculator.py  # Evaluate model precision
    ├── predictor.py             # Predict price from mileage
    ├── trainer.py               # Train linear regression model
    └── utils/
        ├── data_plotter.py      # Plotting utilities
        ├── errors.py            # Error handling
        ├── file_utils.py        # Data/model I/O
        ├── linear_regression.py # Core regression logic
        ├── scraper.py           # Web scraping logic
        ├── url.py               # URL handling
        ├── validators.py        # CSV/data validation
        └── __init__.py
- Clone the repository:
git clone <repo-url>
cd 42_ft_linear_regression
- Install dependencies:
pip install -r requirements.txt
Scrape and clean car data, then save as CSV:
python src/dataset_generator.py -p data/data_original.csv -n 1000
Train a linear regression model on your dataset:
python src/trainer.py -d data/data_original.csv --save_thetas -t data/thetas.csv --show_data
Predict the price for a given mileage:
python src/predictor.py -p data/thetas.csv
Calculate MSE, RMSE, and R-squared for your model:
python src/precision_calculator.py -d data/data_original.csv -t data/thetas.csv
- pandas
- numpy
- matplotlib
- tqdm
- requests
- beautifulsoup4
- lxml
Install all dependencies with:
pip install -r requirements.txt
- Data Acquisition: 🕸️ Scrape car data → Clean and optimize dataset → Save as CSV.
- Model Training: 🤖 Load dataset → Train linear regression model → Save thetas.
- Prediction: 🔮 Load thetas → Input mileage → Predict price.
- Evaluation: 📊 Load dataset and thetas → Calculate precision metrics.
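
The training and prediction steps of the workflow above can be sketched as follows. This is a minimal illustration, not the code in `trainer.py` or `predictor.py`: it assumes the thetas are fitted by batch gradient descent on min-max normalized data, and the function names, learning rate, iteration count, and sample values are all hypothetical.

```python
def normalize(values):
    """Min-max scale values to [0, 1]; also return the bounds used."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def train(mileages, prices, learning_rate=0.1, iterations=5000):
    """Fit price = theta0 + theta1 * mileage with batch gradient descent.

    Assumed approach for illustration; the project's trainer may differ.
    """
    x, x_lo, x_hi = normalize(mileages)
    y, y_lo, y_hi = normalize(prices)
    theta0 = theta1 = 0.0
    m = len(x)
    for _ in range(iterations):
        errors = [theta0 + theta1 * xi - yi for xi, yi in zip(x, y)]
        grad0 = sum(errors) / m
        grad1 = sum(e * xi for e, xi in zip(errors, x)) / m
        # Update both parameters simultaneously from the same gradients.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    # Map the parameters back to the original mileage and price scales.
    slope = theta1 * (y_hi - y_lo) / (x_hi - x_lo)
    intercept = y_lo + (y_hi - y_lo) * theta0 - slope * x_lo
    return intercept, slope


def predict(mileage, theta0, theta1):
    """Estimate a price from a mileage using trained thetas."""
    return theta0 + theta1 * mileage


# Illustrative, perfectly linear sample (not real scraped data).
sample_mileages = [50000, 100000, 150000, 200000]
sample_prices = [8000, 6500, 5000, 3500]
theta0, theta1 = train(sample_mileages, sample_prices)
estimate = predict(120000, theta0, theta1)
```

Normalizing before training keeps the gradient steps well scaled when mileage values are in the tens of thousands; the thetas are then converted back so predictions take raw mileage as input.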