
AC215 - Milestone 5 (MSMBAllstars)

Team Members

  • Itamar Belson
  • Kenny Chen
  • Sam Crowder
  • Clay Coleman

Group Name

MSMBAllstars

Project Overview

Our project develops a machine learning application that predicts tennis match outcomes using historical ATP match data. The system combines an LSTM-based prediction model with an LLM-powered chat interface for user interaction.

Milestone 5 - Kubernetes Deployment, GPU Acceleration & ML Pipeline

For this milestone, we've implemented a robust Kubernetes deployment on Google Cloud Platform (GCP) with the following key features:

  1. Kubernetes Cluster Architecture

    • Multi-node GKE cluster with both CPU and GPU nodes
    • GPU node pool using NVIDIA L4 GPUs for LLM acceleration
    • Load balancing and auto-scaling capabilities
    • Resource optimization across nodes
  2. Service Components

    • API Service (FastAPI)
    • Probability Model Service (Tennis prediction model)
    • LLM Service (Chat interface)
    • Ollama Service (GPU-accelerated LLM model)
  3. Infrastructure as Code

    • Ansible-based deployment automation
    • Kubernetes manifests for all services
    • GPU resource management and scheduling
    • Container orchestration and scaling
  4. GPU Acceleration

    • NVIDIA device plugin integration (see the sketch after this list)
    • GPU-optimized Ollama container
    • Efficient resource allocation for ML workloads
  5. ML Pipeline

    • A single preprocessing pipeline (see run_pipeline.sh in the repository root)
    • Training on GCP Vertex AI with sweep optimization on Weights & Biases
    • Deployment of the model only if it passes the validation-metric threshold
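
As a sketch of the device-plugin step in item 4: on GKE the NVIDIA device plugin ships with the node image, but the GPU drivers must still be installed on the node. One documented option for Container-Optimized OS nodes is Google's driver-installer DaemonSet (newer node pools can instead install drivers through the gpu-driver-version accelerator option):

# Install NVIDIA drivers on GKE COS nodes (GKE manages the device plugin itself)
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml

# Confirm the GPU is advertised as an allocatable resource
kubectl describe node <gpu-node-name> | grep nvidia.com/gpu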

System Architecture

(Figure: system overview diagram)

Deployment Architecture

The system is deployed on GKE with the following node configuration:

  • 3 CPU nodes (e2-medium) for general workloads
  • 1 GPU node (g2-standard-4) with NVIDIA L4 for LLM acceleration

Node Pool Configuration

# CPU Node Pool
gcloud container node-pools create default-pool \
    --cluster=tennis-predictor-cluster \
    --zone=us-central1-a \
    --machine-type=e2-medium \
    --num-nodes=3

# GPU Node Pool
gcloud container node-pools create l4-gpu-pool \
    --cluster=tennis-predictor-cluster \
    --zone=us-central1-a \
    --machine-type=g2-standard-4 \
    --accelerator=type=nvidia-l4,count=1 \
    --num-nodes=1
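
Once both pools are up, a quick sanity check is the accelerator label that GKE attaches to GPU nodes:

# List nodes with their accelerator type (blank for CPU nodes)
kubectl get nodes -L cloud.google.com/gke-accelerator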

Deployment Process

  1. Set up GCP Project
# Set project ID
export PROJECT_ID="tennis-match-predictor"
gcloud config set project $PROJECT_ID
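
If the project is fresh, the required APIs must be enabled first. A minimal sketch; the exact list depends on which GCP services are used (GKE, Artifact Registry for images, Vertex AI for training):

gcloud services enable \
    container.googleapis.com \
    artifactregistry.googleapis.com \
    aiplatform.googleapis.com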
  2. Create GKE Cluster
gcloud container clusters create tennis-predictor-cluster \
    --zone us-central1-a \
    --machine-type g2-standard-4
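
Before kubectl can talk to the new cluster, pull its credentials into the local kubeconfig:

gcloud container clusters get-credentials tennis-predictor-cluster \
    --zone us-central1-a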
  3. Deploy Services with Ansible

There are two ways to deploy:

a. Using the deployment script, which builds and pushes the Docker images for all services and then deploys them to Kubernetes with Ansible:

cd src/deploy
./deploy.zsh

b. Using GitHub Actions:

  • Push to main branch, or
  • Manually trigger the "Deploy to GKE" workflow
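
With the GitHub CLI installed, the manual trigger can also be fired from a terminal (this assumes the workflow is named "Deploy to GKE", as above):

gh workflow run "Deploy to GKE"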

  4. Verify Deployment
kubectl get pods -o wide
kubectl get services
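
To block until a rollout completes (useful in scripts; shown for the api deployment referenced in the monitoring section below):

kubectl rollout status deployment/api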

Service Endpoints

The application exposes the following endpoints:

  • API Service: http://<external-ip>:8000

    • /predict - Match prediction endpoint
    • /chat - WebSocket chat endpoint
  • Probability Model: Internal service on port 8001

  • LLM Service: Internal service on port 8002

  • Ollama Service: Internal service on port 11434
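
The <external-ip> in the API Service address is the LoadBalancer IP assigned by GKE. Assuming the Kubernetes service is named api, it can be looked up with:

# Prints the external IP once the LoadBalancer is provisioned
kubectl get service api -o jsonpath='{.status.loadBalancer.ingress[0].ip}'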

Monitoring and Maintenance

  1. Check GPU Status
kubectl describe node <gpu-node-name> | grep nvidia
  2. View Service Logs
kubectl logs -f deployment/api
kubectl logs -f deployment/ollama
  3. Monitor Resources
kubectl top nodes
kubectl top pods

Testing

To test the deployed services:

  1. Prediction API
curl -X POST "http://<external-ip>:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "player_a_id": "Novak Djokovic",
    "player_b_id": "Roger Federer",
    "lookback": 10
  }'
  2. Chat API
curl -X POST "http://<external-ip>:8000/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "player_a_id": "Novak Djokovic",
    "player_b_id": "Roger Federer",
    "query": "Who is more likely to win between Federer and Novak?",
    "history": []
  }'
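
Since /chat is also listed as a WebSocket endpoint above, an interactive session can be opened with a WebSocket client such as wscat (an npm tool; this assumes the endpoint accepts plain WebSocket connections):

npx wscat -c "ws://<external-ip>:8000/chat"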

Project Organization

├── README.md
└── src
    ├── ansible/                    # Ansible deployment configuration
    │   ├── inventory/
    │   ├── roles/
    │   └── deploy-k8s.yml
    ├── api/                        # FastAPI application
    ├── llm/                        # LLM service
    ├── probability_model/          # Tennis prediction model
    └── ollama/                     # GPU-accelerated LLM container

Future Improvements

  1. Implement horizontal pod autoscaling (HPA); a starting sketch follows this list
  2. Add monitoring with Prometheus and Grafana
  3. Implement CI/CD pipeline for automated deployments
  4. Add backup and disaster recovery procedures
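
For item 1, a minimal starting point is the imperative autoscale command (thresholds here are illustrative; note that HPA, like kubectl top, depends on metrics-server):

kubectl autoscale deployment api --cpu-percent=70 --min=1 --max=5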
