Chatbox AI app backend support #1047
magikRUKKOLA
started this conversation in Show and tell
-
Is there a way to fold the <think> output produced by the R1 series? My local deployment thinks so much that its <think> block is longer than its answer...
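If the client app cannot fold the thinking block itself, one workaround is to collapse it before display. A minimal sketch (the regex and placeholder text are assumptions, not part of any Chatbox AI or ktransformers API):

```python
import re

# Match a complete <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def fold_think(text: str, placeholder: str = "[thinking folded]") -> str:
    """Replace the model's <think>...</think> block with a short placeholder."""
    return THINK_RE.sub(placeholder, text)

sample = "<think>very long chain of reasoning...</think>The answer is 42."
print(fold_think(sample))  # → [thinking folded]The answer is 42.
```

Note this only works on a fully received response; for streamed output you would need to buffer until the closing </think> tag arrives.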
-
In case someone wants to use ktransformers (multiple instances, running on different machines) as a backend for apps like Chatbox AI that support the Ollama API, they could use the following.
[EDIT] (link to the latest version of nginx config updated)
chatboxai/chatbox#2221 (comment)
The picture below shows LLMs being served from three different backends (Ollama and two ktransformers instances).
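For orientation, the general shape of such an nginx setup is to front each backend with its own path prefix behind a single Ollama-compatible endpoint. A minimal sketch (all hostnames, ports, and location names here are assumptions for illustration, not the linked config):

```nginx
# Hypothetical backend addresses -- adapt to your machines.
upstream ktransformers_a { server 192.168.1.10:10002; }
upstream ktransformers_b { server 192.168.1.11:10002; }
upstream ollama_local    { server 127.0.0.1:11434; }

server {
    listen 11435;

    # Route by path prefix so one endpoint fronts all three backends.
    location /ktA/ { proxy_pass http://ktransformers_a/; }
    location /ktB/ { proxy_pass http://ktransformers_b/; }
    location /     { proxy_pass http://ollama_local; }
}
```

The client app is then pointed at the single proxy port instead of at each machine individually; see the linked comment for the actual configuration used.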