Description
Describe the bug
When running examples/gsm8k/gsm8.pdl with the full 1319 iterations, PDL tries to submit all 1319 completions at nearly the same time.
Sometimes Ollama logs 503 ("Service Unavailable"):
[GIN] 2025/03/06 - 11:42:21 | 503 | 42.044125ms | 127.0.0.1 | POST "/api/generate"
[GIN] 2025/03/06 - 11:42:21 | 503 | 43.173125ms | 127.0.0.1 | POST "/api/generate"
[GIN] 2025/03/06 - 11:42:21 | 503 | 44.790209ms | 127.0.0.1 | POST "/api/generate"
[GIN] 2025/03/06 - 11:42:21 | 503 | 45.941833ms | 127.0.0.1 | POST "/api/generate"
Also, PDL logs the following message:
gsm8.pdl:26 - Error during 'ollama/granite3.2:8b' model call: litellm.APIConnectionError: OllamaException - litellm.Timeout: Connection timed out after 600.0 seconds.
Failure generating the trace: Error during 'ollama/granite3.2:8b' model call: litellm.APIConnectionError: OllamaException - litellm.Timeout: Connection timed out after 600.0 seconds.
This suggests that LiteLLM or Ollama allows only 10 minutes for a response, even for the 1319th entry, whose response won't be ready until the other 1318 entries have been processed, which takes over an hour.
Also, when running with 256 iterations, Ollama logs the following messages:
[GIN] 2025/03/06 - 12:10:19 | 500 | 9m59s | 127.0.0.1 | POST "/api/generate"
time=2025-03-06T12:10:20.053-05:00 level=INFO source=server.go:727 msg="aborting completion request due to client closing the connection"
This suggests that LiteLLM or PDL gives up after 10 minutes and does not accept the response that Ollama finally generates.
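Both symptoms seem to come from every request being in flight at once, so each request's 10-minute clock starts long before Ollama can work on it. One possible client-side mitigation is to cap the number of in-flight requests. The following is only a sketch of that idea using Python's `asyncio.Semaphore`; `run_bounded`, `worker`, and `max_in_flight` are hypothetical names for illustration, not PDL or LiteLLM APIs, and `worker` stands in for whatever coroutine performs the model call.

```python
import asyncio


async def run_bounded(items, worker, max_in_flight=4):
    """Apply the async `worker` to each item, at most `max_in_flight` at a time.

    Hypothetical helper for illustration; not part of PDL or LiteLLM.
    """
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(item):
        # The worker (and so any per-request timeout inside it) starts only
        # once a slot is free, not when the task is created.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))


if __name__ == "__main__":
    async def fake_completion(prompt):
        await asyncio.sleep(0.01)  # stands in for a real model call
        return prompt.upper()

    print(asyncio.run(run_bounded(["a", "b", "c"], fake_completion, max_in_flight=2)))
```

With a cap like this, a request queued behind hundreds of others would not start its timeout until it is actually submitted, at the cost of the client doing the queueing instead of the server.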
To Reproduce
Edit gsm8.pdl to set MAX_ITERATIONS: 1319, then run gsm8.pdl.
Expected behavior
Perhaps PDL or LiteLLM should retry 503s after some delay?
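To make the suggestion concrete, here is a minimal sketch of retrying 503s with exponential backoff and jitter. It uses only the Python standard library; `ServiceUnavailable` and `call_with_retry` are hypothetical names introduced for illustration, and how a 503 would actually surface from LiteLLM or PDL is an assumption, not something confirmed here.

```python
import random
import time


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for an HTTP 503 from the model server."""


def call_with_retry(call, max_retries=5, base_delay=1.0, max_delay=60.0,
                    sleep=time.sleep):
    """Run call(); on ServiceUnavailable, wait and try again.

    The wait doubles on each attempt (capped at max_delay). `sleep` is
    injectable so the function can be exercised without real waiting.
    """
    for attempt in range(max_retries + 1):
        try:
            return call()
        except ServiceUnavailable:
            if attempt == max_retries:
                raise  # out of retries: surface the 503 to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out clients that were all rejected together,
            # so they do not all retry at the same instant.
            sleep(delay * random.uniform(0.5, 1.0))
```

Retrying alone would not help with the 10-minute timeout above, since a retried request still waits behind everything already queued; it would only address the 503 rejections.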
Desktop (please complete the following information):
- OS: Mac M3
- Ollama version: 0.5.13