Continue.dev overwriting OLLAMA Keep Alive model #5721

Open
@kleyton67

Relevant environment info

- OS: Ubuntu 25.04
- Continue version: 1.1.35 (also tested on 1.1.10)
- IDE version: 1.99.3
- Model:
- config:
  
completionOptions:
  stream: false
  keepAlive: -1
context:
  - provider: code
  - provider: docs
  - provider: problems
  - provider: repo-map
  - provider: codebase
  - provider: tree
    params:
      nRetrieve: 100
      nFinal: 5
      useReranking: true
  - provider: os
  # - provider: http
  #   params:
  #     url: http://0.0.0.0:35568/continue-search
models:
  - name: olderman-coder
    provider: ollama
    model: qwen2.5-coder:3b
    temperature: 0.1
    apiBase: http://10.28.33.120:11434
    contextLength: 32768
    numCtx: 32768
    top_k: 10
    top_p: 0.95
    seed: 13
    CompletionOptions:
      stream: false
      keepAlive: -1
    roles:
      - autocomplete
      - chat
      - edit
  - name: Nomic Embed Text
    provider: ollama
    model: nomic-embed-text
    apiBase: http://10.28.33.120:11434
    CompletionOptions:
      stream: false
      keepAlive: -1
    roles:
      - embed
  # - name: LLM Reranker
  #     provider: openai
  #     model: gpt-4o
  #     roles:
  #       - rerank
rules:
  - You are an expert programmer that writes concise code and
    smaller explanations.
  

Description

When I use Continue.dev with Ollama, Ollama does not free the model from the GPU. When I use Open WebUI, for example, the resources are freed. My Ollama config is:

environment:
    - OLLAMA_KEEP_ALIVE=-1
    - OLLAMA_NUM_PARALLEL=2
    - OLLAMA_MAX_LOADED_MODELS=1
    - OLLAMA_CONTEXT_LENGTH=10240
    - OLLAMA_GPU_OVERHEAD=0
    - OLLAMA_MAX_QUEUE=5
    - OLLAMA_NUM_THREADS=1
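For context: Ollama's API accepts a per-request `keep_alive` field that takes precedence over the server-wide `OLLAMA_KEEP_ALIVE`, which is presumably how the client ends up overriding this setting. A minimal sketch of such a request body, using the host and model from the config above (the actual network call is commented out so the sketch runs standalone):

```shell
# A per-request keep_alive in the body POSTed to /api/chat takes precedence
# over the server's OLLAMA_KEEP_ALIVE (-1 = keep loaded forever,
# 0 = unload immediately, "30m" = thirty minutes).
payload='{"model": "qwen2.5-coder:3b", "messages": [{"role": "user", "content": "hello"}], "keep_alive": -1}'
# Against the Ollama host from the config above:
# curl -s http://10.28.33.120:11434/api/chat -d "$payload"
printf '%s\n' "$payload"
```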

Call from continue.dev:

[GIN] 2025/05/18 - 15:49:39 | 200 |  5.643570627s |      172.19.0.1 | POST     "/api/chat"
time=2025-05-18T15:49:39.227Z level=DEBUG source=sched.go:472 msg="context for request finished"
time=2025-05-18T15:49:39.227Z level=DEBUG source=sched.go:342 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/qwen2.5-coder:3b runner.inference=cuda runner.devices=1 runner.size="3.4 GiB" runner.vram="3.4 GiB" runner.num_ctx=16384 runner.parallel=2 runner.pid=28 runner.model=/root/.ollama/models/blobs/sha256-4a188102020e9c9530b687fd6400f775c45e90a0d7baafe65bd0a36963fbb7ba duration=30m0s
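Note that the `duration=30m0s` in the runner log is exactly the extension's fallback of `30 * 60` seconds (the expression patched in the workaround steps further down):

```shell
# The extension's fallback keep-alive expression is in seconds:
echo "$((30 * 60)) seconds"   # 1800 seconds, i.e. the duration=30m0s in the log
```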

I tried changing both config.json and config.yaml; nothing worked, and the runner log above still shows the default 30-minute duration (duration=30m0s).
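One client-independent way to confirm the model is still resident is Ollama's /api/ps endpoint. A sketch, assuming the host above; the JSON here is a hypothetical sample of the response shape so the parsing step is runnable offline:

```shell
# In practice, query the server directly:
#   curl -s http://10.28.33.120:11434/api/ps
# Stand-in sample response (hypothetical values) with the same shape:
resp='{"models": [{"name": "qwen2.5-coder:3b", "expires_at": "2025-05-18T16:19:39Z"}]}'
printf '%s' "$resp" | python3 -c '
import json, sys
# Print each loaded model and when the scheduler will unload it:
for m in json.load(sys.stdin)["models"]:
    print(m["name"], "expires at", m["expires_at"])
'
```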

Solution (a bad way: patching the extension directly)

So I changed it directly in the VS Code extension. I am using Ubuntu 25.04, so I edited the extension at ~/.vscode/extensions/continue.continue-1.1.35-linux-x64.

Steps to change

  1. Open the extension folder in VS Code.
  2. Navigate to the out/extension.js file within the extension.
  3. Locate the variable options.keepAlive.
  4. Change all occurrences of `keep_alive: options.keepAlive ?? 30 * 60` to `keep_alive: 1`.
  5. Save the changes.

After making these changes, restart VS Code and try running your code again. In my case this resolved the keep-alive timeout issue.
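Step 4 above can be scripted with sed; a hedged sketch, run here on a stand-in copy so it is safe to execute (the real target is the version-specific extension.js path mentioned above, and it should always be backed up first):

```shell
# Stand-in for ~/.vscode/extensions/continue.continue-1.1.35-linux-x64/out/extension.js
EXT=$(mktemp)
echo 'keep_alive: options.keepAlive ?? 30 * 60,' > "$EXT"
cp "$EXT" "$EXT.bak"                                 # back up before patching
# Replace the fallback expression with a 1-second keep-alive:
sed -i 's/options\.keepAlive ?? 30 \* 60/1/g' "$EXT"
cat "$EXT"   # -> keep_alive: 1,
```

Note that a one-second keep-alive effectively unloads the model after every request; this mirrors the hack described above rather than a supported configuration.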

To reproduce

  1. Call an Ollama model using the chat of Continue.dev.

Metadata

Labels

area:configuration (Relates to configuration options), kind:bug (Indicates an unexpected problem or unintended behavior)
