Running career-ops locally with Ollama on 16GB RAM #385

mahindra89 · 2026-04-20T08:05:52Z

Hey everyone! I wanted to share my experience trying to run career-ops fully locally using Ollama, in case it helps others with similar hardware.

My Setup

Windows 11
16GB RAM
Ollama with local models
Claude Code v2.1.114

What I Tried

llama3.1:8b — Got past the login screen but the model couldn't follow the skill files at all. /career-ops just gave generic chatbot responses instead of the proper command menu.

qwen2.5-coder:7b — Same issue. The model was actively hallucinating file contents instead of actually reading them. When I asked it to read modes/_shared.md, it made up fake content instead of reading the real file.

qwen2.5-coder:14b with 32k context (custom Modelfile) — Got further, but took nearly 3 minutes just to respond to a simple file check. Way too slow to be practical for multi-step agentic tasks like scan or auto-pipeline.

Key Findings for Local Setup

The slash command routing doesn't work reliably with smaller models. /career-ops scan triggers a generic response instead of loading the skill files. A workaround is to explicitly say "Read modes/_shared.md and modes/scan.md, then scan for offers" but even then smaller models hallucinate the file contents.
16GB RAM is the hard limit. After Windows overhead you have ~10-12GB free, which means 14b models run but are extremely slow for agentic multi-step tasks. 32b+ models won't fit at all.
The scan mode specifically needs Playwright browser automation which requires very reliable tool use — something 7b and 14b models can't handle consistently.
The Ollama environment variable setup works fine — the login prompt is bypassed correctly with:

ANTHROPIC_AUTH_TOKEN=ollama
ANTHROPIC_API_KEY=""
ANTHROPIC_BASE_URL=http://localhost:11434

but even with that in place, it still can't use the career-ops functionality. Has anyone figured out a way to use career-ops with Ollama while keeping everything 100% local?

kenshln47 · 2026-04-23T18:33:48Z

You’re 100% hitting a memory swap wall. A 14b Q4 model is already eating ~8.5GB, and that 32k context window is bloating your KV cache by another few gigs. Once you factor in Windows 11 overhead and Playwright spinning up a headless browser for the scan task, your 16GB is toast. That 3-minute delay is just your OS paging to disk to keep up.
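
To put rough numbers on that, here is a back-of-the-envelope sketch of the KV cache size. The architecture figures (48 layers, 8 KV heads, head dimension 128) are my assumption for a Qwen2.5-14B-class model, and I'm assuming 16-bit cache values; check your model's card before trusting them. The 8.5 figure for the weights is just the estimate quoted above.

```python
# Rough memory budget for a local model. All architecture numbers are assumptions.

def kv_cache_gib(num_ctx, n_layers=48, n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Approximate KV cache size in GiB: keys plus values for every layer and token."""
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return num_ctx * bytes_per_token / 2**30

WEIGHTS_GIB = 8.5  # estimate quoted above for a 14b Q4 model

for num_ctx in (32768, 8192):
    cache = kv_cache_gib(num_ctx)
    print(f"num_ctx={num_ctx}: KV cache ~{cache:.1f} GiB, "
          f"weights + cache ~{WEIGHTS_GIB + cache:.1f} GiB")
```

Under those assumptions the cache alone is about 6 GiB at 32k and about 1.5 GiB at 8k, before counting Windows or the browser.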

The hallucinations on 7b/8b models happen because Claude Code’s engine is hardcoded for Anthropic’s XML tool-calling schema. Base Llama/Qwen models don’t naturally parse those triggers well, so they just default to "chatbot mode" and make up file contents instead of actually calling the read tool.

To make this work locally on your current setup:

Drop your num_ctx to 8192 in the Modelfile. That should stop most of the paging/lag, though you'll lose some long-context reasoning for big job descriptions (JDs).

Use LiteLLM as a proxy between Claude Code and Ollama. It maps Anthropic’s tool-calling requests into standard OpenAI formats that models like qwen2.5-coder actually understand. This usually fixes the hallucination issues where the model "fakes" reading a file.
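
For the num_ctx change, a minimal sketch of generating the Modelfile looks something like this. `FROM` and `PARAMETER num_ctx` are standard Modelfile instructions; the helper name and the output tag in the comment are hypothetical, and I'm assuming you're still on qwen2.5-coder:14b.

```python
from pathlib import Path


def render_modelfile(base_model: str, num_ctx: int) -> str:
    """Return Modelfile text that pins the context window for a base model."""
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"


if __name__ == "__main__":
    text = render_modelfile("qwen2.5-coder:14b", 8192)
    Path("Modelfile").write_text(text)  # writes into the current directory
    print(text)
    # Then build the variant with the Ollama CLI (the tag name is hypothetical):
    #   ollama create qwen2.5-coder-14b-8k -f Modelfile
```

After `ollama create`, point Claude Code at the new tag instead of the 32k one.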

Honestly, 16GB is barely enough to keep the OS and a browser happy, let alone an agentic loop with Playwright. If LiteLLM doesn't cut it, you're probably better off using something like Aider that's built for local LLMs from the ground up.
