I have the following in my docker compose file:
```yaml
environment:
  - CHAT_COMPLETION_BASE_URL=http://192.168.11.27:11434/v1
  - CHAT_COMPLETION_API_KEY=xxx
  # Keep models in memory forever to stop Ollama from hogging all of the VRAM
  - STT_MODEL_TTL=-1
  - TTS_MODEL_TTL=-1
```
Yet when I make a request to the API, GPU VRAM usage drops back to zero a few minutes later, so the models still appear to be unloaded despite the `-1` TTL settings.
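One thing worth noting: the chat backend here is Ollama itself (port 11434), and Ollama unloads models after about five minutes by default, independently of the STT/TTS TTLs above. If the VRAM drop is coming from the Ollama side, a sketch of pinning its models too might look like this (assuming Ollama runs as its own Compose service; `OLLAMA_KEEP_ALIVE` is Ollama's documented keep-alive setting, and `-1` means keep loaded indefinitely):

```yaml
services:
  ollama:
    image: ollama/ollama
    environment:
      # Ollama's own unload timer; -1 keeps loaded models in VRAM indefinitely
      - OLLAMA_KEEP_ALIVE=-1
    ports:
      - "11434:11434"
```

Watching `nvidia-smi` (or Ollama's loaded-model list) while only one of the two services has a request in flight should show which side is releasing the memory.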