This project sets up LiteLLM on AWS ECS using Fargate.
Prerequisites:
- AWS CLI configured with appropriate permissions
- Terraform v1.0+
- Docker with buildx support
- AWS account with ECS, ECR, IAM, and VPC permissions
- Clone and Setup
  git clone <your-repo-url>
  cd litellm_ecs-deployment
- Configure AWS
  - Install AWS CLI and configure your credentials
  - Ensure you have permissions for ECS, ECR, IAM roles, VPC, etc.
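One quick way to confirm that the CLI is installed and that your credentials resolve is to ask AWS which identity you are calling as. This is a minimal sketch; the function name `check_aws_identity` is illustrative and not part of this repo.

```shell
# Hypothetical helper (not part of this repo): confirm the AWS CLI is
# installed and that the active credentials resolve to an identity.
check_aws_identity() {
  if ! command -v aws >/dev/null 2>&1; then
    echo "aws CLI not found on PATH" >&2
    return 1
  fi
  # Prints the account ID and caller ARN for the active credentials/profile.
  aws sts get-caller-identity --query '[Account, Arn]' --output text
}
```

Run `check_aws_identity` after configuring your credentials. It only proves that the credentials are valid; it does not check the individual ECS, ECR, IAM, or VPC permissions.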
Your cluster should run the task definition, which you can verify in the AWS console.
Simply put, ECS runs LiteLLM this way:
Cluster -> Service -> Task
You can have multiple task definitions and services under one cluster (for example, prod, staging, and dev environments).
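The Cluster -> Service -> Task hierarchy can also be inspected from the command line. The sketch below walks it with the AWS CLI; the function name `ecs_tree` is illustrative, and the cluster name is whatever your Terraform configuration created.

```shell
# Hypothetical helper: print the Cluster -> Service -> Task tree for one cluster.
# Usage: ecs_tree <cluster-name>
ecs_tree() {
  cluster="${1:-}"
  if [ -z "$cluster" ]; then
    echo "usage: ecs_tree <cluster-name>" >&2
    return 2
  fi
  echo "Cluster: $cluster"
  # list-services returns service ARNs; text output separates them with whitespace.
  for service_arn in $(aws ecs list-services --cluster "$cluster" \
      --query 'serviceArns[]' --output text); do
    service="${service_arn##*/}"   # keep only the service name after the last "/"
    echo "  Service: $service"
    for task_arn in $(aws ecs list-tasks --cluster "$cluster" \
        --service-name "$service" --query 'taskArns[]' --output text); do
      echo "    Task: $task_arn"
    done
  done
}
```

A cluster that hosts several environments would show one service per environment, each with its own running tasks.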
- Use ECS Fargate for serverless container execution (use EC2 if you prefer manual control)
- Build your own Docker image and push it to ECR (Elastic Container Registry)
- Store API keys in AWS Secrets Manager
- Host on a VPC across different subnets (public / private IP)
- Add an Application Load Balancer (optional)
- Use CloudWatch for monitoring container logs
Edit config.yaml to configure your LLM providers and settings.
Required environment variables in the task definition:
DATABASE_URL=<postgres://>
LITELLM_MASTER_KEY="sk-1234" # should start with sk
LITELLM_SALT_KEY="secure-hash-key" # used to encrypt the credentials stored in your db
Store API keys in AWS Secrets Manager (secrets.tf) and also reference them in the task definition.
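The placeholder values above should be replaced with random ones before deployment. The following is a sketch of one way to generate them and store them in Secrets Manager; the helper names `random_hex` and `put_secret`, and any secret name you pass in, are assumptions and must match whatever secrets.tf expects.

```shell
# Hex-encode 32 random bytes from /dev/urandom (64 hex characters).
random_hex() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

LITELLM_MASTER_KEY="sk-$(random_hex)"   # keeps the required "sk" prefix
LITELLM_SALT_KEY="$(random_hex)"

# Hypothetical helper: create a secret holding one value.
# Usage: put_secret <secret-name> <value>
put_secret() {
  aws secretsmanager create-secret --name "$1" --secret-string "$2"
}
```

For example, `put_secret <your-secret-name> "$LITELLM_MASTER_KEY"`. If secrets.tf already creates the secret resource, `aws secretsmanager put-secret-value --secret-id <your-secret-name> --secret-string ...` sets its value instead of creating a second secret.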
Modify provider.tf if using a different region/profile.
- Initialize Terraform
  terraform init
- Plan Deployment
  terraform plan
- Apply Infrastructure
  terraform apply
- Build & Deploy Application
  ./build.sh
This builds your Docker image for linux/amd64, pushes it to ECR, and triggers an ECS deployment. You can also rerun this command to force an update to the ECS service.
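The three steps described above (build, push, redeploy) can be sketched with standard Docker and AWS CLI commands. This is not the contents of this repo's build.sh; the function names and all argument values are placeholders, and the `latest` tag is an assumption.

```shell
# Compose an ECR image URI from its parts.
# Usage: ecr_image_uri <account-id> <region> <repo> <tag>
ecr_image_uri() {
  echo "$1.dkr.ecr.$2.amazonaws.com/$3:$4"
}

# Sketch only: build for linux/amd64, push to ECR, then force a new deployment.
# Usage: build_and_deploy <account-id> <region> <repo> <cluster> <service>
build_and_deploy() {
  if [ "$#" -ne 5 ]; then
    echo "usage: build_and_deploy <account-id> <region> <repo> <cluster> <service>" >&2
    return 2
  fi
  registry="$1.dkr.ecr.$2.amazonaws.com"
  image="$(ecr_image_uri "$1" "$2" "$3" latest)"

  # Authenticate Docker against the ECR registry.
  aws ecr get-login-password --region "$2" \
    | docker login --username AWS --password-stdin "$registry" || return 1

  # Build the image for linux/amd64 and push it in one step.
  docker buildx build --platform linux/amd64 -t "$image" --push . || return 1

  # Start new tasks so the service pulls the freshly pushed image.
  aws ecs update-service --region "$2" --cluster "$4" --service "$5" \
    --force-new-deployment
}
```

The `--force-new-deployment` flag is what makes a rerun useful even when the task definition has not changed: ECS replaces the running tasks, and the new tasks pull the current image for the tag.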
Create a new log group in AWS CloudWatch and add it to taskdefinition.tf
Example: "awslogs-group": "/ecs/litellm",
View logs in CloudWatch.
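Both steps can also be done from the AWS CLI rather than the console. A minimal sketch, assuming the `/ecs/litellm` name from the example above; the function names are illustrative.

```shell
LOG_GROUP="/ecs/litellm"   # must match "awslogs-group" in taskdefinition.tf

# Create the log group (the call fails if the group already exists).
create_log_group() {
  aws logs create-log-group --log-group-name "$LOG_GROUP"
}

# Stream the last 10 minutes of container logs and keep following.
# Note: `aws logs tail` is available in AWS CLI v2.
tail_logs() {
  aws logs tail "$LOG_GROUP" --since 10m --follow
}
```

Call `create_log_group` once before the first deployment, then `tail_logs` whenever you want to watch the container output.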
- Service Status: Check ECS console or use AWS CLI
- Load Balancer: Monitor ALB metrics in CloudWatch
- Run terraform validate to check syntax
- Use terraform state list to inspect resources
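For the service status check mentioned in the list above, the AWS CLI can summarize a service in one call. A sketch, with an illustrative function name; the cluster and service names are the ones your Terraform configuration created.

```shell
# Hypothetical helper: show status and task counts for one ECS service.
# Usage: service_status <cluster> <service>
service_status() {
  if [ "$#" -ne 2 ]; then
    echo "usage: service_status <cluster> <service>" >&2
    return 2
  fi
  aws ecs describe-services --cluster "$1" --services "$2" \
    --query 'services[0].{status:status,desired:desiredCount,running:runningCount,pending:pendingCount}' \
    --output table
}
```

A healthy service typically shows `running` equal to `desired` and `pending` at zero. To block until a deployment settles, `aws ecs wait services-stable --cluster <cluster> --services <service>` can be used.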
- API keys are stored in AWS Secrets Manager
- For production: Add SSL certificate, restrict IP ranges, use WAF
- Scaling:
  - Adjust desired_count in service.tf
  - Set cpu = num_workers
  - Don't use static memory limits when you configure CPUs to scale
- Resources: Modify CPU/memory in taskdefinition.tf
- Networking: Update VPC/subnets in vpc.tf
- Health Checks: Configure in ALB target group
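For the scaling note above (cpu = num_workers), keep in mind that ECS task definitions express CPU in units, where 1024 units equal one vCPU, and Fargate only accepts a fixed set of sizes. The sketch below reads the note as "one vCPU per LiteLLM worker", which is an assumption; the function name is illustrative.

```shell
# Convert a worker count to Fargate CPU units, assuming one vCPU per worker.
# Usage: fargate_cpu_units <num_workers>
fargate_cpu_units() {
  case "${1:-}" in
    ''|*[!0-9]*)
      echo "usage: fargate_cpu_units <num_workers>" >&2
      return 2
      ;;
  esac
  units=$(( $1 * 1024 ))   # 1024 CPU units = 1 vCPU
  case "$units" in
    # Whole-vCPU task sizes that Fargate supports.
    1024|2048|4096|8192|16384) echo "$units" ;;
    *)
      echo "no Fargate CPU size equals $1 vCPU" >&2
      return 1
      ;;
  esac
}
```

For example, `fargate_cpu_units 4` prints `4096`, which is the value to use for the task CPU in taskdefinition.tf when running four workers. Each CPU size also restricts the memory values Fargate allows, so check the memory setting whenever you change CPU.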