Skip to content

Latest commit

 

History

History
11 lines (9 loc) · 1001 Bytes

File metadata and controls

11 lines (9 loc) · 1001 Bytes

Model Serving

This directory provides components to build and deploy Large Language Model (LLM) serving endpoints.

  • k8s/: Kubernetes manifests for model serving components.
  • images/: Dockerfiles for building model serving container images.
  • dev/tasks: Development-related scripts for model serving.
    • download-model: fetch the required model weights (e.g., Gemma 3 12B IT).
    • build-images: runs download-model, and then build the Docker image using the provided Dockerfile in images/.
    • deploy-to-gke or dev/tasks/deploy-to-kind: runs build-images, and then deploy the model serving Kubernetes manifests to Google Kubernetes Engine (GKE) or a local KinD cluster. Once deployed, the model server will be accessible via a Kubernetes Service defined in the manifest. You can use kubectl get svc to find the service details and access its endpoint.
    • run-local: run the model server locally for testing purposes, bypassing Kubernetes.