Integration with Hugging Face endpoints #1082
(Premise: I'm a total n00b about AI, and about Cheshire 😸 I'm just onboarding.)

Dear Cheshire hackers, I'm trying to integrate Cheshire with literally any non-proprietary AI (i.e. avoiding ChatGPT), so I tried to connect Cheshire to Hugging Face, since it seems an interesting "neutral/agnostic AI proxy". As far as I understand it's possible; for example, the Hugging Face "Playground" supports multiple models.

My Hugging Face token works with cURL, and the response shows it is authorized to "Make calls to Inference Providers".

**Question:** How do I plug Cheshire into Hugging Face? What is the Endpoint Url?

**What I tried:** I selected "HuggingFace Endpoint" with an Endpoint Url, but in the console I get an error. I'm probably using a wrong endpoint.

Thanks for any small help 🙏 Sorry for this totally stupid question.

P.S. Official documentation (?):

- https://cheshirecat.ai/custom-large-language-model/
- https://huggingface.co/docs/hub/api

I'm a bit surprised to have found nothing about this topic:

- https://github.com/cheshire-cat-ai/core/discussions?discussions_q=HuggingFace
- https://github.com/cheshire-cat-ai/core/discussions?discussions_q=Hugging+Face

Thanks and sorry for being so n00b 🙏 🐱
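For context, Hugging Face's Inference Providers are reachable through an OpenAI-compatible chat-completions route (per the HF docs; the model id below is just an illustrative placeholder). A minimal sketch of the request any client would need to send:

```python
import json

# Placeholder token; in practice this comes from your Hugging Face account settings.
HF_TOKEN = "hf_xxx"

# OpenAI-compatible router for Inference Providers (assumption based on the HF docs).
url = "https://router.huggingface.co/v1/chat/completions"

headers = {
    "Authorization": f"Bearer {HF_TOKEN}",
    "Content-Type": "application/json",
}

# Illustrative model id; use one your enabled providers actually serve.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello from the Cheshire Cat!"}],
}

body = json.dumps(payload)
# To actually send it: requests.post(url, headers=headers, data=body)
print(body)
```

A successful response here confirms the token and the route, but it does not mean Cheshire's "HuggingFace Endpoint" adapter expects the same URL, since that adapter targets a different service.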


Hi @valerio-bozzolan ,
Hugging Face "Endpoints" are a dedicated paid service people use in production.
We had an adapter for the public API, but for some reason (I remember low availability and too much variance in model inputs/outputs) we ditched it.
I'm not sure at the moment what works with HF and what doesn't, and I'm not investing time in it. Most people running local models use Ollama or vLLM, or one of the many other tools you can reach via the OpenAI-compatible adapter.
Still, you can write your own LLM adapter (see the plugins already published for Groq or TogetherAI).
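As a sketch of that OpenAI-compatible route: pointing the Cat at a local Ollama server amounts to filling the OpenAI-compatible LLM settings with values like these (the field names are illustrative, check the actual admin form of your version; the `/v1` path and port 11434 are Ollama's defaults):

```json
{
  "base_url": "http://localhost:11434/v1",
  "model_name": "llama3.1",
  "api_key": "ollama"
}
```

Ollama ignores the API key, but many OpenAI-compatible clients require a non-empty value, hence the dummy `"ollama"` string.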
Peace and thank you for playing with the cat ;)
Welcome