This project predicts house prices for Ames, Iowa, using advanced machine learning techniques. It covers the full data science workflow: from data exploration and cleaning, through feature engineering and model selection, to deployment as a production-ready API. The solution is modular, reproducible, and ready for both experimentation and real-world use.
- notebooks/: Step-by-step Jupyter notebooks for data exploration, cleaning, feature engineering, model selection, and tuning.
- demo/: A simplified end-to-end demo pipeline, including model training, evaluation, and export, using a reduced set of features for clarity.
- docker/: All files needed to build and run the API in a Docker container, including the trained model and preprocessing pipeline.
The docker/ folder contains the necessary files to containerize and serve the trained model as a REST API. It includes:
- 📄 A `Dockerfile` that defines the environment and dependencies for the API.
- 🗃️ An `app/` directory with the FastAPI application, preprocessing blocks, and scripts to generate the pipeline and model.
- 🏷️ Pre-trained model and pipeline files ready for production use.
- 📚 Documentation for building, running, and testing the containerized API locally or in the cloud.
This setup ensures the solution is portable, reproducible, and easy to deploy in any environment that supports Docker.
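For reference, the serving layer can be as small as a single FastAPI module. Below is a minimal sketch, assuming the model and preprocessing pipeline are stored as joblib artifacts inside the image; the file paths and the input schema shown here are illustrative, not the exact ones in `docker/app/`:

```python
import joblib
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Housing Price Predictor")

# Artifacts baked into the image at build time (illustrative paths).
pipeline = joblib.load("app/pipeline.joblib")
model = joblib.load("app/model.joblib")

class HouseFeatures(BaseModel):
    # Illustrative subset of the Ames features; the real schema lives in app/.
    OverallQual: int
    GrLivArea: float
    GarageCars: int
    YearBuilt: int

@app.post("/predict")
def predict(features: HouseFeatures):
    df = pd.DataFrame([features.dict()])
    X = pipeline.transform(df)          # apply the saved preprocessing
    price = float(model.predict(X)[0])  # single prediction
    return {"predicted_price": price}
```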
The demo/ folder provides a minimal, reproducible pipeline for demonstration. It loads processed data, applies preprocessing (column selection, encoding, feature engineering), splits the data, trains and compares several models, visualizes results, and saves the best model. Hyperparameter tuning is also included.
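The flow looks roughly like the following sketch, built with scikit-learn and assuming the processed data is a CSV with a `SalePrice` target (the file name, encoding step, and model choices are illustrative, and visualization is omitted for brevity):

```python
import joblib
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Load processed data and apply a simple preprocessing step.
df = pd.read_csv("data/processed.csv")
X = pd.get_dummies(df.drop(columns=["SalePrice"]))  # one-hot encode categoricals
y = df["SalePrice"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train and compare several candidate models.
models = {
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(random_state=42),
    "gbm": GradientBoostingRegressor(random_state=42),
}

best_name, best_model, best_mae = None, None, float("inf")
for name, model in models.items():
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: MAE = {mae:,.0f}")
    if mae < best_mae:
        best_name, best_model, best_mae = name, model, mae

print(f"best: {best_name} (MAE {best_mae:,.0f})")
joblib.dump(best_model, "best_model.joblib")  # export the winner
```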
The notebooks/ directory documents the full workflow in detail, from initial data exploration to advanced model tuning. Each notebook focuses on a specific stage, making the process transparent and easy to follow.
🛠️ Step 1: Authenticate and select your Google Cloud project
Log in and set your project:
```bash
gcloud auth login
gcloud config set project [PROJECT-ID]
```

📦 Step 2: Create an Artifact Registry repository
Create a Docker repository in your desired location:
```bash
gcloud artifacts repositories create housing-price-predictor \
  --repository-format=docker \
  --location=[YOUR-LOCATION] \
  --description="Docker repository for ML model"
```

⚙️ Step 3: Configure Docker authentication
Allow Docker to use your GCP credentials for pushing images:
```bash
gcloud auth configure-docker [YOUR-LOCATION]-docker.pkg.dev
```

🐳 Step 4: Build the Docker image
Make sure your Dockerfile and dependencies are ready. Build the image locally:
```bash
docker build -t housing-price-predictor-api .
```

🧱 Step 5: Tag the image
Tag the image for uploading to your repository:
```bash
docker tag housing-price-predictor-api \
  [YOUR-LOCATION]-docker.pkg.dev/[PROJECT-ID]/housing-price-predictor/housing-price-predictor-api
```

⬆️ Step 6: Push the image to Artifact Registry
Push the tagged image to your GCP repository:
```bash
docker push [YOUR-LOCATION]-docker.pkg.dev/[PROJECT-ID]/housing-price-predictor/housing-price-predictor-api
```

☁️ Step 7: Deploy to Google Cloud Run
Deploy the service to Cloud Run using the uploaded image:
```bash
gcloud run deploy ml-api \
  --image [YOUR-LOCATION]-docker.pkg.dev/[PROJECT-ID]/housing-price-predictor/housing-price-predictor-api \
  --platform managed \
  --region [YOUR-LOCATION] \
  --allow-unauthenticated
```

🔎 Final notes:
- The `ml-api` service will be publicly accessible via a URL generated by Cloud Run.
- You can view logs and manage the service from the Google Cloud Console.
- Replace `[PROJECT-ID]` and `[YOUR-LOCATION]` with your project and region values.
- To update the image, repeat steps 4 to 7.
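Once deployed, you can smoke-test the service from Python. A minimal sketch using `requests`, assuming the API exposes the `/predict` endpoint described above; the service URL and payload fields are placeholders you should replace with your own values and the schema your pipeline actually expects:

```python
import requests

# URL printed by `gcloud run deploy` on success; replace with your own.
SERVICE_URL = "https://ml-api-xxxxxxxx-uc.a.run.app"

# Hypothetical payload: field names must match the features the
# preprocessing pipeline expects (see docker/app/ for the real schema).
payload = {
    "OverallQual": 7,
    "GrLivArea": 1710.0,
    "GarageCars": 2,
    "YearBuilt": 2003,
}

resp = requests.post(f"{SERVICE_URL}/predict", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())  # e.g. {"predicted_price": 208500.0}
```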