Running career-ops locally with Ollama on 16GB RAM #385
Replies: 1 comment
You’re 100% hitting a memory swap wall. A 14b Q4 model is already eating ~8.5GB, and that 32k context window is bloating your KV cache by another few gigs. Once you factor in Windows 11 overhead and Playwright spinning up a headless browser for the scan task, your 16GB is toast. That 3-minute delay is just your OS paging to disk to keep up.

The hallucinations on 7b/8b models happen because Claude Code’s engine is hardcoded for Anthropic’s XML tool-calling schema. Base Llama/Qwen models don’t naturally parse those triggers well, so they just default to "chatbot mode" and make up file contents instead of actually calling the read tool.

To make this work locally on your current setup:

- Drop your num_ctx to 8192 in the Modelfile. It’ll stop the paging/lag, though you'll lose some long-context reasoning for big JDs.
- Use LiteLLM as a proxy between Claude Code and Ollama. It maps Anthropic’s tool-calling requests into standard OpenAI formats that models like qwen2.5-coder actually understand. This usually fixes the hallucination issues where the model "fakes" reading a file.

Honestly, 16GB is barely enough to keep the OS and a browser happy, let alone an agentic loop with Playwright. If LiteLLM doesn't cut it, you're probably better off using something like Aider that's built for local LLMs from the ground up.
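To make the proxy idea concrete, here is a minimal sketch of the kind of shape translation involved when an Anthropic-style tool definition is turned into an OpenAI-style function definition. This is not LiteLLM's actual code. The function name `anthropic_tool_to_openai` and the `read_file` tool are hypothetical, made up for illustration. Only the two payload shapes (`input_schema` on the Anthropic side, `function.parameters` on the OpenAI side) come from the public API formats.

```python
import json


def anthropic_tool_to_openai(tool: dict) -> dict:
    """Convert one Anthropic-style tool definition to the OpenAI function format.

    Illustrative only: a real proxy also has to translate messages,
    tool results, and streaming events, which this sketch ignores.
    """
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # Anthropic calls the JSON Schema "input_schema";
            # OpenAI-style APIs call it "parameters".
            "parameters": tool.get(
                "input_schema", {"type": "object", "properties": {}}
            ),
        },
    }


# Hypothetical tool definition, used only to show the mapping.
read_tool = {
    "name": "read_file",
    "description": "Read a file from the workspace.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

converted = anthropic_tool_to_openai(read_tool)
print(json.dumps(converted, indent=2))
```

The JSON Schema itself passes through unchanged. Only the envelope around it differs, which is why a thin proxy can do this without knowing anything about the individual tools.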
Hey everyone! I wanted to share my experience trying to run career-ops fully locally using Ollama, in case it helps others with similar hardware.
My Setup
What I Tried
- llama3.1:8b: Got past the login screen, but the model couldn't follow the skill files at all. /career-ops just gave generic chatbot responses instead of the proper command menu.
- qwen2.5-coder:7b: Same issue. The model was actively hallucinating file contents instead of actually reading them. When I asked it to read modes/_shared.md, it made up fake content instead of reading the real file.
- qwen2.5-coder:14b with 32k context (custom Modelfile): Got further, but took nearly 3 minutes just to respond to a simple file check. Way too slow to be practical for multi-step agentic tasks like scan or auto-pipeline.
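For the 14b attempt with a 32k context, a rough back-of-the-envelope estimate of KV cache size shows why 16GB gets tight. This is a sketch under assumptions: the layer count, KV head count, and head dimension below are assumed values for a Qwen2.5-14B-class model, and the cache is assumed to be stored in 16-bit precision. Actual usage depends on the runtime and its cache settings.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len,
                   bytes_per_value=2):
    """Estimate KV cache size in bytes.

    One key vector and one value vector (hence the factor 2) are stored
    per layer, per KV head, per token in the context window.
    """
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_value


# Assumed architecture for a 14B-class model (not taken from the post).
ASSUMED_14B = {"num_layers": 48, "num_kv_heads": 8, "head_dim": 128}


def to_gib(num_bytes):
    """Convert bytes to GiB."""
    return num_bytes / 2**30


for ctx in (32768, 8192):
    size = kv_cache_bytes(context_len=ctx, **ASSUMED_14B)
    print(f"num_ctx={ctx}: ~{to_gib(size):.1f} GiB KV cache")
# num_ctx=32768: ~6.0 GiB KV cache
# num_ctx=8192: ~1.5 GiB KV cache
```

Under these assumptions, a full 32k context adds roughly 6 GiB on top of the model weights, while 8k adds about 1.5 GiB. That lines up with the gap between fitting in the ~10-12GB left after Windows overhead and spilling into swap.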
Key Findings for Local Setup
- The slash command routing doesn't work reliably with smaller models. /career-ops scan triggers a generic response instead of loading the skill files. A workaround is to explicitly say "Read modes/_shared.md and modes/scan.md, then scan for offers", but even then smaller models hallucinate the file contents.
- 16GB RAM is the hard limit. After Windows overhead you have ~10-12GB free, which means 14b models run but are extremely slow for agentic multi-step tasks. 32b+ models won't fit at all.
- The scan mode specifically needs Playwright browser automation, which requires very reliable tool use, something 7b and 14b models can't handle consistently.
- The Ollama environment variable setup works fine; the login prompt is bypassed correctly with:
But it still isn't able to use the career-ops functionality. Has anyone figured out a way to use career-ops with Ollama while keeping everything 100% local?