## Team Members

- Itamar Belson
- Kenny Chen
- Sam Crowder
- Clay Coleman

## Group Name

MSMBAllstars
## Project Overview

Our project develops a machine learning application that predicts tennis match outcomes from historical ATP match data. The system combines an LSTM-based prediction model with an LLM-powered chat interface for user interaction.
For this milestone, we've implemented a robust Kubernetes deployment on Google Cloud Platform (GCP) with the following key features:
- **Kubernetes Cluster Architecture**
  - Multi-node GKE cluster with both CPU and GPU nodes
  - GPU node pool using NVIDIA L4 GPUs for LLM acceleration
  - Load balancing and auto-scaling capabilities
  - Resource optimization across nodes
- **Service Components**
  - API Service (FastAPI)
  - Probability Model Service (tennis prediction model)
  - LLM Service (chat interface)
  - Ollama Service (GPU-accelerated LLM model)
- **Infrastructure as Code**
  - Ansible-based deployment automation
  - Kubernetes manifests for all services
  - GPU resource management and scheduling
  - Container orchestration and scaling
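As a sketch of how Ansible can apply the Kubernetes manifests, a task using the `kubernetes.core.k8s` module might look like the following (the task name and manifest path are illustrative assumptions, not the project's actual playbook):

```yaml
# Illustrative Ansible task: apply a service's Kubernetes manifest.
# The manifest path below is an assumption for demonstration.
- name: Deploy API service to GKE
  kubernetes.core.k8s:
    state: present
    src: manifests/api-deployment.yaml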
- **GPU Acceleration**
  - NVIDIA device plugin integration
  - GPU-optimized Ollama container
  - Efficient resource allocation for ML workloads
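On GKE, GPU scheduling for a pod typically combines a node selector on the accelerator label with an `nvidia.com/gpu` resource limit. A minimal sketch for the Ollama deployment (names, labels, and image are assumptions, not taken from the project's manifests):

```yaml
# Sketch: schedule the Ollama container onto the L4 GPU node pool.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4
      containers:
        - name: ollama
          image: ollama/ollama   # assumed image
          ports:
            - containerPort: 11434
          resources:
            limits:
              nvidia.com/gpu: 1  # claims the L4 via the NVIDIA device plugin
```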
- **ML Pipeline**
  - Single preprocessing pipeline (see `run_pipeline.sh` in the repository root)
  - Training on GCP Vertex AI with sweep optimization on Weights & Biases
  - Model deployed only if it passes the validation metric threshold
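The validation gate in the last step can be sketched as a small shell check (the file name, metric key, and threshold here are assumptions for illustration; the real pipeline's values may differ):

```shell
# Illustrative validation gate: deploy only if the metric clears a threshold.
# metrics.json, val_accuracy, and 0.65 are assumed names/values.
echo '{"val_accuracy": 0.72}' > metrics.json   # stand-in for real pipeline output

THRESHOLD=0.65
VAL_ACC=$(python3 -c 'import json; print(json.load(open("metrics.json"))["val_accuracy"])')

if python3 -c "import sys; sys.exit(0 if $VAL_ACC >= $THRESHOLD else 1)"; then
    echo "Validation passed: deploying model"
else
    echo "Validation failed: skipping deployment"
fi
```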
## Cluster Configuration

The system is deployed on GKE with the following node configuration:
- 3 CPU nodes (e2-medium) for general workloads
- 1 GPU node (g2-standard-4) with NVIDIA L4 for LLM acceleration
```bash
# CPU node pool
gcloud container node-pools create default-pool \
  --cluster=tennis-predictor-cluster \
  --zone=us-central1-a \
  --machine-type=e2-medium \
  --num-nodes=3

# GPU node pool
gcloud container node-pools create l4-gpu-pool \
  --cluster=tennis-predictor-cluster \
  --zone=us-central1-a \
  --machine-type=g2-standard-4 \
  --accelerator=type=nvidia-l4,count=1 \
  --num-nodes=1
```
## Deployment Steps

- **Set up the GCP project**

  ```bash
  # Set project ID
  export PROJECT_ID="tennis-match-predictor"
  gcloud config set project $PROJECT_ID
  ```

- **Create the GKE cluster**

  ```bash
  gcloud container clusters create tennis-predictor-cluster \
    --zone us-central1-a \
    --machine-type g2-standard-4
  ```
- **Deploy services with Ansible**

  There are two ways to deploy:

  a. Using the deployment script, which builds and pushes the Docker images for all services, then deploys them to Kubernetes with Ansible:

  ```bash
  cd src/deploy
  ./deploy.zsh
  ```

  b. Using GitHub Actions:
  - Push to the main branch, or
  - Manually trigger the "Deploy to GKE" workflow
- **Verify the deployment**

  ```bash
  kubectl get pods -o wide
  kubectl get services
  ```
## Service Endpoints

The application exposes the following endpoints:

- **API Service**: `http://<external-ip>:8000`
  - `/predict` - match prediction endpoint
  - `/chat` - WebSocket chat endpoint
- **Probability Model**: internal service on port 8001
- **LLM Service**: internal service on port 8002
- **Ollama Service**: internal service on port 11434
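A minimal sketch of how one of the internal services can be exposed inside the cluster as a ClusterIP service (the name and pod label are assumptions; the project's actual manifests may differ):

```yaml
# Illustrative ClusterIP service for the probability model on port 8001.
apiVersion: v1
kind: Service
metadata:
  name: probability-model   # assumed service name
spec:
  type: ClusterIP           # internal-only, no external IP
  selector:
    app: probability-model  # assumed pod label
  ports:
    - port: 8001
      targetPort: 8001
```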
## Monitoring

- **Check GPU status**

  ```bash
  kubectl describe node <gpu-node-name> | grep nvidia
  ```

- **View service logs**

  ```bash
  kubectl logs -f deployment/api
  kubectl logs -f deployment/ollama
  ```

- **Monitor resource usage**

  ```bash
  kubectl top nodes
  kubectl top pods
  ```
## Testing

To test the deployed services:

- **Prediction API**

  ```bash
  curl -X POST "http://<external-ip>:8000/predict" \
    -H "Content-Type: application/json" \
    -d '{
      "player_a_id": "Novak Djokovic",
      "player_b_id": "Roger Federer",
      "lookback": 10
    }'
  ```

- **Chat API**

  ```bash
  curl -X POST "http://<external-ip>:8000/chat" \
    -H "Content-Type: application/json" \
    -d '{
      "player_a_id": "Novak Djokovic",
      "player_b_id": "Roger Federer",
      "query": "Who is more likely to win between Federer and Novak?",
      "history": []
    }'
  ```
## Project Structure

```
├── README.md
└── src
    ├── ansible/              # Ansible deployment configuration
    │   ├── inventory/
    │   ├── roles/
    │   └── deploy-k8s.yml
    ├── api/                  # FastAPI application
    ├── llm/                  # LLM service
    ├── probability_model/    # Tennis prediction model
    └── ollama/               # GPU-accelerated LLM container
```
## Future Work

- Implement horizontal pod autoscaling (HPA)
- Add monitoring with Prometheus and Grafana
- Implement CI/CD pipeline for automated deployments
- Add backup and disaster recovery procedures
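As a sketch of the first item, a CPU-based HorizontalPodAutoscaler for the API deployment could look like the following (the HPA name, replica bounds, and utilization target are assumptions for illustration):

```yaml
# Illustrative HPA: scale the API deployment on CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api            # matches the existing `deployment/api`
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```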