Local and arbitrary model support #9619

zachlloyd · 2026-04-30T15:44:40Z

zachlloyd
Apr 30, 2026
Maintainer

We are trying to figure out the best way to implement local model support and I wanted to start a discussion on our different potential approaches to see what resonates most with the community.

The reason local model support is not trivial for us to implement is that our harness is split between our client (rust, open-source) and server (golang, not currently open). Moving the harness to be entirely on the client is a fair amount of work.

The options we are considering here (not mutually exclusive):

Port the entire harness to the client and open-source it (most work).
Implement Warp as an ACP client and allow folks to use other harnesses within our rich terminal UI.
Implement a new rust-based "lite" local harness that speaks the same protocol that our client understands but supports local models and arbitrary endpoints.
Route local model requests through our server and back to the client via something like ngrok (hacky, but quick).

Questions on my mind:

How important to users is it to use our real harness as opposed to a harness that works as an ACP server?
How important is it that local model requests are truly local, with no server interaction?
Which aspects of our UI are most important for folks wanting local model support?

djdanielsson · 2026-04-30T16:36:10Z

djdanielsson
Apr 30, 2026

Answers to the Questions part: I am not sure I have a good answer for number 1 and 3 but 2 it's extremely important to me that local model is truly local that is likely the point of why I am using a local model to start with for that task. I haven't been using warp for a long time because it wasn't open source so I do not have many thoughts on what parts of the UI I want at this time for local models, the little I have done with like Claude cli and the information that the UI provides for that is really nice and to have something like that for local models would be cool but it might depend on what harness people are using idk.

2 replies

zachlloyd Apr 30, 2026
Maintainer Author

super helpful and tracks with what i expected

djdanielsson Apr 30, 2026

going to the options stuff I am kinda interested in number 2 the most I think, I personally want to control my harness and the context I am feeding my agents vs using just another harness.

FFatTiger · 2026-04-30T17:00:29Z

FFatTiger
Apr 30, 2026

No server transit for any requests

1 reply

harry-xm May 1, 2026

This. At my company it's a policy violation to use something like ngrok.

officiallymarky · 2026-04-30T17:04:52Z

officiallymarky
Apr 30, 2026

Warp looks really cool, but the fact it only worked with cloud models always was a deal breaker for me. I would love to have full local AI support for not only for coding, but for the terminal agent when reacting to commands.

0 replies

phidauex · 2026-04-30T17:08:27Z

phidauex
Apr 30, 2026

Thanks for opening up some discussion!

I think for your options, #1 is most appealing, but yes, more work. I'd like to think it could also help your architecture long term by relying less on scaling server side components along with client components. #2 is a bit hacky but could be quick. I connect to my Hermes agent using OpenWebUI because it is nicer than the raw terminal client - I'd connect to it through Warp if it were an option, but that doesn't really support Warp being a standalone tool. #3 could be most practical because it would be fully local, not require a second tool, and for most local models, not being feature-complete would be OK. The smaller context windows mean fewer turns, fewer tools available, etc. But my use case in Warp would mostly be "uh help me remember how this command is used" not "build out an entire ansible deployment for a lab". #4 is probably not worth doing - if someone wants to use a local model, its because they want it local.

I'd rank them - 1, 3, 2, 4

For the questions:

I'm split because I use another agent tool already. However for new users, only having it work if you already have another harness running feels duplicative.
Quite important - I either have no/patchy internet at the time I'm working, or I'm doing something that I need to keep local for confidentiality reasons, and in either case, server routing breaks the whole point.
I don't use the most advanced tools in Warp now, so for me, the "advanced command completion" and inline planning directly in the terminal UI is what I use, and would love to use with a local model that would be fully competent at that.

1 reply

zachlloyd Apr 30, 2026
Maintainer Author

extremely helpful as we think this through

mastertyko · 2026-04-30T17:45:42Z

mastertyko
Apr 30, 2026

My vote is strongly for option 1.

If Warp supports local or arbitrary models, I think it should mean truly local execution with no server transit. I understand that porting the harness to the client and open sourcing it is the most work, but it seems like the right long term architecture for privacy, offline use, trust, and extensibility.

Thanks!

0 replies

crazygamerZ783 · 2026-04-30T18:02:59Z

crazygamerZ783
Apr 30, 2026

hey got a warp fork of my own ,trying to get the ollama support work repo:https://github.com/crazygamerZ783/warp-ollama
i would appreciate a bit of help

3 replies

rozsazoltan May 1, 2026

The repo feels a bit strange because the original Git history is missing, and the first commit doesn't show what you actually changed.

crazygamerZ783 May 1, 2026

it was probably because of github glitches

regismesquita May 1, 2026

ollama supports openai-compatible as far as I remember, and there are already some implementations out there supporting that.

VicZhang6 · 2026-04-30T19:45:17Z

VicZhang6
Apr 30, 2026

Honestly, I want to use DeepSeek V4 Flash inside Warp — it’s cheap, and it allows me to interact with the terminal using natural language.

1 reply

jensenojs May 2, 2026

same here, Although I can understand from a business logic perspective why supporting an open method of simply providing a URL + API key is not allowed, from a user demand standpoint, I think it would be more natural.

FunkyFresh67 · 2026-04-30T20:07:06Z

FunkyFresh67
Apr 30, 2026

Local models in Ollama or similar should be configurable as sources within Warp. Once available, you should be able to select a model during a session, either manually or by directing Warp to use it automatically based on the task or preference.

0 replies

FelixZoe · 2026-04-30T22:13:37Z

FelixZoe
Apr 30, 2026

6666

0 replies

regismesquita · 2026-05-01T13:23:00Z

regismesquita
May 1, 2026

There are already a handful of "local warp server" implementations on your PR list , and on the wild forking from this repo.

People just want to be able to use a software that they really like (warp) without going through something that they don't need (your servers). we might end up with some opensource spin-off leading this if you don't just release a minimalist opensource server that simply allows people to use warp with a openai-compatible upstream.

In the future you can add something feature-rich and supporting a bunch of stuff... but for now people just want to use warp and remote models without touching someone else servers.

0 replies

apetti1920 · 2026-05-01T15:33:28Z

apetti1920
May 1, 2026

Model selection should be allowed to be

local (lm studio, ollama, etc) 2. also allowed to be configurable and tiered, small model for cmd suggestions (with bash history injection) as well as large for agent interactions

0 replies

zachlloyd · 2026-05-01T15:46:10Z

zachlloyd
May 1, 2026
Maintainer Author

All this feedback makes sense. We will have a proposed solution here shortly.

0 replies

bernardodsanderson · 2026-05-01T18:47:14Z

bernardodsanderson
May 1, 2026

I am mostly interested as I want to use one source of models (openrouter/GLM Coding Plan) for it.

0 replies

chukwunonsomichael189-boop · 2026-05-02T00:52:14Z

chukwunonsomichael189-boop
May 2, 2026

Warp looks really cool, but the fact it only worked with cloud models always was a deal breaker for me. I would love to have full local AI support for not only for coding, but for the terminal agent when reacting to commands.

0 replies

chukwunonsomichael189-boop · 2026-05-02T00:53:03Z

chukwunonsomichael189-boop
May 2, 2026

There are already a handful of "local warp server" implementations on your PR list , and on the wild forking from this repo.

People just want to be able to use a software that they really like (warp) without going through something that they don't need (your servers). we might end up with some opensource spin-off leading this if you don't just release a minimalist opensource server that simply allows people to use warp with a openai-compatible upstream.

In the future you can add something feature-rich and supporting a bunch of stuff... but for now people just want to use warp and remote models without touching someone else servers.

0 replies

AkikoOrenji · 2026-05-11T00:15:46Z

AkikoOrenji
May 11, 2026

Came here as heard Warp now supporting Windows. Installed and then immediately uninstalled after realising the product is effectively useless without sign up and sending data to yet another provider. With most workstation level laptops now coming with a dedicated NPU or GPU with a few gig of VRAM (even shared RAM is OK for lower end qwen models) people may as well make use of to make terminal life easier. Personally just want an AI shell for system management and basic automation:

Handling of remote sessions including auto switching from Windows powershell to whatever shell the remote system has. Can't install anything on the remote end so the ability for the harness to recognise the change in shell and keep operating.
Ability for AI to log into remote system and carry out tasks at direction with strong granular guardrails (regex).
Password hand-off back to user on logon for password input (either via SSH key password or remote user password)
Automatic detection of password prompts in remote session for hand-off e.g. sudo, keystores, secondary SSH connections.
The diff tool would be good for other tasks such as comparing configurations before and after changes.
Basically any use case where remote desktop is not available but complex operations need to be carried out e.g. modifying registry keys
generation of quick single shell one liners to manipulate and write data. what’s that sed or awk flag i needed.
generate complex powershell without consulting reams of documentation.

Not interested in coding capabilities as use other tools for that.

1 and 3 are the better options. Given there are already forks in the wild why not encourage them to PR (if they haven't already)

0 replies

smthpickboy · 2026-05-12T03:30:23Z

smthpickboy
May 12, 2026

It's frustrating how big companies constantly try to seize control of your computer and data. LLMs and their harnesses should act as assistants to the terminal, not as supervisors. If Warp continues with its closed mindset, open alternatives like OpenWarp or other terminal+LLM apps will thrive and take its place.

1 reply

mz135135 May 19, 2026

+1

Mikewhodat · 2026-05-15T02:25:03Z

Mikewhodat
May 15, 2026

If I can contribute in any way whatsoever whether it's bug bounty for your project or contributing code. Please let me know I'd be highly interested. I have a personal vendetta against warp. I was just thinking along the lines of instead of starting. My own repository in starting this whole task, out from scratch, I would join the community.

This is something that I did not do with deep seek

0 replies

compgeniuses · 2026-05-18T17:33:41Z

compgeniuses
May 18, 2026

now that's music to the ears,Also add ability to fetch model, for /models endpoint, as well as context windows, and whether model has vision or not.

0 replies

petradonka · 2026-05-22T17:09:43Z

petradonka
May 22, 2026
Collaborator

Quick update here: we’ve shipped two related pieces of this work.

BYOK is now available on the Free plan for individual users, and Warp now supports custom inference endpoints compatible with the OpenAI Chat Completions API.

That means you can use your own OpenAI, Anthropic, or Google API key, or connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, a gateway, or a similar setup.

Docs:

BYOK: https://docs.warp.dev/agent-platform/inference/bring-your-own-api-key/
Custom inference endpoints: https://docs.warp.dev/agent-platform/inference/custom-inference-endpoint/

Fully client-side local model support is still a separate direction. We’re planning a lightweight local client harness so Warp can connect directly to local models without routing through Warp’s servers, and we’re also planning support for Agent Client Protocol so developers can bring other harnesses into Warp’s terminal UI.

If you try the new flow and hit a specific issue with a provider, endpoint, or model, please open a focused GitHub issue with the details so we can track it directly.

9 replies

petradonka May 22, 2026
Collaborator

Thanks for the report - could you please open a GitHub issue with anything you may be running into? It'll be easier for us to get these fixed that way!

regismesquita May 22, 2026

So you are saying that it was supposed to accept internal ips and non-https endpoints and that something is wrong?

petradonka May 22, 2026
Collaborator

No, internal IPs wouldn't work — that'll need the fully client-side model support I mentioned. Whether https should be required, I'm not certain off the top of my head, I could see it being a requirement over the public internet.

regismesquita May 22, 2026

got it, I can see you edit the comment now, I will wait for the local client, thanks!

gigberg Jun 10, 2026

So why when I use openrouter endpoint ， /agent mode still not work with error:

who are you
I'm sorry, I couldn't complete that request.

Request failed with error: ErrorStatus(403, "{\"error\":\"Your account has been blocked from using AI features. If you think this is in error, please contact appeals@warp.dev. Otherwise, please upgrade to a paid plan at https://app.warp.dev/upgrade.\"}")

jleivo · 2026-05-24T15:18:13Z

jleivo
May 24, 2026

I was about to test this, with the understanding that our internal LiteLLM proxy works, but as it turns out it needs to be internet accessible - which is understandable if the traffic bounces through your services. I will wait for the version that does not require publicly accessible endpoint.

though,I got to say, this page https://docs.warp.dev/agent-platform/inference/custom-inference-endpoint/ states
Custom inference endpoint | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. | Free and all eligible paid plans
"internal gateway" is not yet true then.

0 replies

bandesz · 2026-05-24T18:48:05Z

bandesz
May 24, 2026

If you want to test your local LLM before Warp adds support, I used the free tier of Cloudflare Zero Trust Connectors (it's a tunnel that you can run in a Docker container) to make my local LM Studio available publicly (with authentication of course). You'll need a domain though.

Cloudflare turns on "Block AI training bots" by default for your domain, I had to disable that, otherwise Warp was getting 403s.

0 replies

alizehctl · 2026-05-26T16:42:06Z

alizehctl
May 26, 2026

This discussion in linked in #8759, but I don't see any mention of support for using ChatGPT Pro/Plus subscriptions. It would be really great to have this on the roadmap. Not everyone can afford paying per-token with API keys.

Echoing @Patrik88:

I want to use Warp as my full-time coding agent, but relying on standard per-token API keys (BYOK) is too expensive for heavy daily use. Implementing a flexible provider layer—similar to the open-source @mariozechner/pi-ai library—would solve this perfectly.

0 replies

Cznorth · 2026-05-27T11:15:20Z

Cznorth
May 27, 2026

For folks who need fully self-hosted local/remote model routing today (not just client-side harness migration), one option outside Warp ecosystem: WinkTerm — open-source AI terminal where AI and user share the same PTY. Bring your own API key, type # at the prompt for in-terminal chat, agent pre-fills commands and you press Enter.

Docker deploy, SSH/SFTP, HTTP Agent API, MIT: https://github.com/Cznorth/winkterm

Different product (web self-hosted vs native terminal), but relevant if local-model support timeline matters for your workflow.

0 replies

trigger2k20 · 2026-06-12T05:48:15Z

trigger2k20
Jun 12, 2026

Looking forward to use, any new status on fully client-side local model support or lightweight local client harness so Warp can connect directly to local models without routing trough external (as we say Kirche ums Dorf bringen - Bringing the church around the village) ?

0 replies

man-brain · 2026-06-17T16:43:18Z

man-brain
Jun 17, 2026

Cool to see such discussion. Looking forward to have this feature in warp! I wish I saw it earlier because I've spent quiet a lot of time implementing local agent in https://github.com/man-brain/warp/tree/local-agent. It works, I use it day to day and rebase regularly but I don't really want to maintain a fork.

How important is it that local model requests are truly local, with no server interaction?

It's the most important thing.

The options we are considering here (not mutually exclusive):
3. Implement a new rust-based "lite" local harness that speaks the same protocol that our client understands but supports local models and arbitrary endpoints.

I finished with implementing option 3 but before that I tried two other approaches: my own implementation of Warp API (it was very tempting not to edit the warp code at all) and replacing the client to Warp API. It didn't work since the API is not a thin proxy, it has a lot of logic including agent loop and custom tools.

0 replies

casey-dement · 2026-06-19T18:17:16Z

casey-dement
Jun 19, 2026

Late to the party, but here's my $0.02:

Regarding the 4 suggested options:

The options we are considering here (not mutually exclusive):

Options 1 and 3 seem essentially equivalent assuming the configuration of the local harness for option 3 is just a setting in Warp vs some additional installation and configuration task. so I'd welcome either with equal satisfaction.
Option 2 is potentially workable, but seems like it would be more complex to manage so I'd rank this below options 1 & 3.
Option 4 is probably as much a deal breaker for most users as the current requirement for public access - it simplifies the implementation but doesn't change the core problem.

officiallymarky
Jun 19, 2026

Is it possible to use local models yet?
Last I tried, I get errors using local IP space behind my SSL domains and it doesn't support non-SSL. So I have no way to point to my local VLLM server w/ SSL without exposing it to the Internet.

0 replies

PeterPro-grammer · 2026-06-22T19:57:26Z

PeterPro-grammer
Jun 22, 2026

I appreciate the work on Custom Inference, but the current architecture makes local setups difficult to use.

From the explanation in this thread, requests are routed through Warp's backend rather than being sent directly from the client. In my case, this prevents Warp from connecting to a local load balancer on a private network (http://192.168.1.x/).

I also don't think using tools like ngrok is a practical solution. One of the main reasons for running local models is to keep everything local, including network traffic.

A direct client-side connection mode that can communicate with localhost and private IPs without going through Warp's infrastructure would make this feature far more useful for users running self-hosted models.

1 reply

officiallymarky Jun 22, 2026

I appreciate the work on Custom Inference, but the current architecture makes local setups difficult to use.

From the explanation in this thread, requests are routed through Warp's backend rather than being sent directly from the client. In my case, this prevents Warp from connecting to a local load balancer on a private network (http://192.168.1.x/).

I also don't think using tools like ngrok is a practical solution. One of the main reasons for running local models is to keep everything local, including network traffic.

A direct client-side connection mode that can communicate with localhost and private IPs without going through Warp's infrastructure would make this feature far more useful for users running self-hosted models.

I am having the same issue, I believe they said this is to be expected but will change when they make a fully local option available. I'm still waiting to hear back on this as well.

GeekLuffy · 2026-06-26T14:44:19Z

GeekLuffy
Jun 26, 2026

Hey everyone! Just wanted to share a quick update on this. I was looking into custom/local endpoints and ended up setting up support for custom OpenAI-compatible base URLs in the BYOK settings.

If anyone is interested in how it works or wants to check out the implementation, I put it together in a PR here: [PR]

This basically lets you route standard OpenAI models to a custom URL (like a local proxy, enterprise gateway, or Copilot). Hopefully, it's a helpful step for anyone looking for a similar setup!

4 replies

officiallymarky Jun 27, 2026

Hey everyone! Just wanted to share a quick update on this. I was looking into custom/local endpoints and ended up setting up support for custom OpenAI-compatible base URLs in the BYOK settings.

If anyone is interested in how it works or wants to check out the implementation, I put it together in a PR here: [PR]

This basically lets you route standard OpenAI models to a custom URL (like a local proxy, enterprise gateway, or Copilot). Hopefully, it's a helpful step for anyone looking for a similar setup!

I have no problems setting up a custom end point, the problem is it is routed through Warp's servers, so private ip space fails and only works if you expose your llm server to the Internet. Unless I am missing something.

GeekLuffy Jun 27, 2026

You're spot on! Yes, the requests are still proxied through Warp's backend, so direct localhost or private subnet IPs will still fail unless you tunnel them (using something like ngrok or Cloudflare Tunnels) to give them a public endpoint.

The main goal of this PR is to make it easy to use custom, publicly accessible OpenAI-compatible endpoints (like Azure OpenAI instances, enterprise gateways, or services like Copilot) without having to manually define all standard OpenAI models as "custom models" and break built-in presets/presets.

For true local-only inference (direct client-to-localhost connection), Warp would need to support a client-side execution path, which is a much larger architectural change

officiallymarky Jun 27, 2026

You're spot on! Yes, the requests are still proxied through Warp's backend, so direct localhost or private subnet IPs will still fail unless you tunnel them (using something like ngrok or Cloudflare Tunnels) to give them a public endpoint.

The main goal of this PR is to make it easy to use custom, publicly accessible OpenAI-compatible endpoints (like Azure OpenAI instances, enterprise gateways, or services like Copilot) without having to manually define all standard OpenAI models as "custom models" and break built-in presets/presets.

For true local-only inference (direct client-to-localhost connection), Warp would need to support a client-side execution path, which is a much larger architectural change

They have said they are working on it, but that will take longer. Setting up a custom endpoint seems very easy though, if I was using a public one I don't think I'd have any issues setting it up as is.

GeekLuffy Jun 27, 2026

Yeah, for a single public endpoint, the custom endpoint modal definitely works! The main convenience here is avoiding having to manually redefine every standard model and tweak your presets if you're pointing standard OpenAI models to a proxy or gateway (like Azure or Copilot)

Uh oh!

Local and arbitrary model support #9619

Uh oh!

zachlloyd Apr 30, 2026 Maintainer

Replies: 41 comments · 45 replies

Uh oh!

Uh oh!

Uh oh!

zachlloyd Apr 30, 2026 Maintainer Author

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zachlloyd Apr 30, 2026 Maintainer Author

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zachlloyd May 1, 2026 Maintainer Author

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

petradonka May 22, 2026 Collaborator

Uh oh!

petradonka May 22, 2026 Collaborator

Uh oh!

zachlloyd
Apr 30, 2026
Maintainer

Replies: 41 comments 45 replies

zachlloyd Apr 30, 2026
Maintainer Author

zachlloyd Apr 30, 2026
Maintainer Author

zachlloyd
May 1, 2026
Maintainer Author

petradonka
May 22, 2026
Collaborator

petradonka May 22, 2026
Collaborator