Open
### Description

### Before submitting your bug report
- I believe this is a bug. I'll try to join the Continue Discord for questions
- I'm not able to find an open issue that reports the same bug
- I've seen the troubleshooting guide on the Continue Docs
### Relevant environment info

- OS: Ubuntu 25.04
- Continue version: 1.1.35 (also tested on 1.1.10)
- IDE version: 1.99.3
- Model: qwen2.5-coder:3b (Ollama)
- config:

```yaml
completionOptions:
  stream: false
  keepAlive: -1
context:
  - provider: code
  - provider: docs
  - provider: problems
  - provider: repo-map
  - provider: codebase
  - provider: tree
    params:
      nRetrieve: 100
      nFinal: 5
      useReranking: true
  - provider: os
  # - provider: http
  #   params:
  #     url: http://0.0.0.0:35568/continue-search
models:
  - name: olderman-coder
    provider: ollama
    model: qwen2.5-coder:3b
    temperature: 0.1
    apiBase: http://10.28.33.120:11434
    contextLength: 32768
    numCtx: 32768
    top_k: 10
    top_p: 0.95
    seed: 13
    CompletionOptions:
      stream: false
      keepAlive: -1
    roles:
      - autocomplete
      - chat
      - edit
  - name: Nomic Embed Text
    provider: ollama
    model: nomic-embed-text
    apiBase: http://10.28.33.120:11434
    CompletionOptions:
      stream: false
      keepAlive: -1
    roles:
      - embed
  # - name: LLM Reranker
  #   provider: openai
  #   model: gpt-4o
  #   roles:
  #     - rerank
rules:
  - You are an expert programmer that writes concise code and smaller explanations.
```
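For context on what the `keepAlive: -1` setting above is meant to control: Ollama's `/api/chat` endpoint accepts a `keep_alive` field in the request body, where `-1` keeps the model loaded indefinitely, `0` unloads it immediately, and a number is seconds of residency after the request. A rough sketch of the payload a client would have to send for this config to take effect — the `buildChatPayload` helper and the mapping are illustrative assumptions; only the `keep_alive` semantics come from the Ollama API:

```javascript
// Hypothetical helper: map a Continue-style completionOptions object onto
// an Ollama /api/chat request body. The mapping is an assumption for
// illustration; the keep_alive semantics are from the Ollama API docs.
function buildChatPayload(model, messages, completionOptions = {}) {
  return {
    model,
    messages,
    stream: completionOptions.stream ?? true,
    // -1 = keep the model loaded forever, 0 = unload right away,
    // a number = seconds to keep it resident after the request.
    keep_alive: completionOptions.keepAlive ?? 30 * 60,
  };
}

const payload = buildChatPayload(
  "qwen2.5-coder:3b",
  [{ role: "user", content: "hello" }],
  { stream: false, keepAlive: -1 }
);
console.log(payload.keep_alive); // -1
```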
### Description

When I use Continue.dev with Ollama, Ollama does not free the model from GPU memory. When I use Open WebUI, for example, the resources are freed. My Ollama configuration is:
```yaml
environment:
  - OLLAMA_KEEP_ALIVE=-1
  - OLLAMA_NUM_PARALLEL=2
  - OLLAMA_MAX_LOADED_MODELS=1
  - OLLAMA_CONTEXT_LENGTH=10240
  - OLLAMA_GPU_OVERHEAD=0
  - OLLAMA_MAX_QUEUE=5
  - OLLAMA_NUM_THREADS=1
```
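Note that `OLLAMA_KEEP_ALIVE=-1` only sets the server-side default: per the Ollama FAQ, a `keep_alive` field sent with an individual API request takes precedence over it. That is why the environment setting above cannot help if the client hard-codes its own value. The precedence rule, sketched (the function name is mine, not Ollama code; the rule itself is documented):

```javascript
// Precedence per the Ollama FAQ: the request's keep_alive, when present,
// overrides the server's OLLAMA_KEEP_ALIVE default.
// (effectiveKeepAlive is an illustrative name, not actual Ollama code.)
function effectiveKeepAlive(requestKeepAlive, serverDefault) {
  return requestKeepAlive ?? serverDefault;
}

console.log(effectiveKeepAlive(30 * 60, -1));   // 1800: the client's value wins
console.log(effectiveKeepAlive(undefined, -1)); // -1: server default applies
```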
Call from Continue.dev:

```
[GIN] 2025/05/18 - 15:49:39 | 200 | 5.643570627s | 172.19.0.1 | POST "/api/chat"
time=2025-05-18T15:49:39.227Z level=DEBUG source=sched.go:472 msg="context for request finished"
time=2025-05-18T15:49:39.227Z level=DEBUG source=sched.go:342 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/qwen2.5-coder:3b runner.inference=cuda runner.devices=1 runner.size="3.4 GiB" runner.vram="3.4 GiB" runner.num_ctx=16384 runner.parallel=2 runner.pid=28 runner.model=/root/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba duration=30m0s
```
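The `duration=30m0s` in the scheduler log matches the hard-coded fallback `keep_alive: options.keepAlive ?? 30 * 60` found in `out/extension.js`: if the configured `keepAlive` never reaches that options object (for example, if it is dropped while the YAML config is parsed), the nullish-coalescing operator silently substitutes 1800 seconds. A small sketch of that assumed failure mode:

```javascript
// If the configured keepAlive is lost on its way from config.yaml, the
// ?? fallback in extension.js quietly produces the 30-minute idle timer
// seen in the Ollama scheduler log (duration=30m0s).
const configuredKeepAlive = undefined;          // value never threaded through
const keepAlive = configuredKeepAlive ?? 30 * 60;
console.log(keepAlive); // 1800 seconds = 30 minutes
```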
I tried changing both config.json and config.yaml; neither worked.
### Solution (workaround; not the proper way to fix it)

Since no config change worked, I patched the VS Code extension directly. I'm on Ubuntu 25.04, so I edited it under `~/.vscode/extensions/continue.continue-1.1.35-linux-x64`.

Steps to change:

- Open the extension folder in VS Code.
- Open the file `out/extension.js`.
- Locate the uses of the variable `options.keepAlive`.
- Change all occurrences of `keep_alive: options.keepAlive ?? 30 * 60` to `keep_alive: 1`.
- Save the changes.

After making these changes, restart VS Code and try running your code again. This resolved the keep-alive timeout issue for me.
### To reproduce

- Call an Ollama model using the Continue.dev chat.

### Log output