Commit 5fcd1f6

Set Ray Serve min_replicas to 1 to avoid cold starts
Change the Ray Serve deployment's min_replicas from 0 to 1 so that one replica always stays loaded with the model. This eliminates cold-start latency when querying the service. With min_replicas: 0, the first request had to wait for the 20 GB model to download and load (~3-5 minutes) and would time out. Keeping one replica active lets the service respond immediately.
1 parent b1b353d commit 5fcd1f6

1 file changed: +1 -1 lines changed

clusters/k3s-stpetersburg/apps/ai/deepseek-ocr/rayservice.yaml

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ spec:
 deployments:
 - name: deepseek-ocr
   autoscaling_config:
-    min_replicas: 0
+    min_replicas: 1
     max_replicas: 2
     target_ongoing_requests: 1
     upscale_delay_s: 30
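
As a rough illustration of why the lower bound matters, the sketch below models the replica count an autoscaler driven by target_ongoing_requests would settle on. It is a simplified, hypothetical model: the function name desired_replicas and the ceiling-and-clamp rule are assumptions made for illustration, and Ray Serve's real autoscaler also applies delays such as upscale_delay_s and other smoothing that are ignored here.

```python
import math


def desired_replicas(ongoing_requests, target_ongoing_requests,
                     min_replicas, max_replicas):
    """Hypothetical, simplified autoscaling rule (not Ray Serve's actual code).

    Computes how many replicas are needed so that each handles at most
    target_ongoing_requests, then clamps the result to the configured bounds.
    """
    if target_ongoing_requests <= 0:
        raise ValueError("target_ongoing_requests must be positive")
    needed = math.ceil(ongoing_requests / target_ongoing_requests)
    return max(min_replicas, min(max_replicas, needed))


# Idle service, old config: scales to zero, so the next request hits a cold start.
print(desired_replicas(0, 1, min_replicas=0, max_replicas=2))  # 0

# Idle service, new config: one replica stays warm with the model loaded.
print(desired_replicas(0, 1, min_replicas=1, max_replicas=2))  # 1

# Under load, max_replicas caps the replica count in both configs.
print(desired_replicas(5, 1, min_replicas=1, max_replicas=2))  # 2
```

In this model the only difference between the two configurations shows up when the service is idle, which is exactly when the cold start described in the commit message would occur.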

0 commit comments
