Description
Describe the bug
While trying to use structured output with TGI via InferenceEndpointsLLM, I noticed a few problems.
First, InferenceEndpointsLLM must be instantiated with tokenizer_id. This matters because otherwise the chat messages are routed to _generate_with_chat_completion, which does not pass the grammar param to TGI.
Second, with huggingface_hub we need to use the model param instead of base_url, otherwise an Unauthorized error is raised (see huggingface/huggingface_hub#2804). I changed this locally and it works.
I have not verified whether these changes affect other parts of the codebase.
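For context, what I expect is that the structured-output schema reaches TGI as its grammar parameter. A minimal sketch of the request body TGI's /generate endpoint accepts for grammar-constrained generation (the schema here is hand-written for illustration, not the exact pydantic model_json_schema() output, and this is not distilabel's actual request-building code):

```python
import json

# Hand-written JSON schema roughly equivalent to Capital.model_json_schema()
# (simplified assumption for illustration).
capital_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}

# Shape of the body TGI's /generate endpoint accepts when a grammar
# constraint is supplied.
payload = {
    "inputs": "What is the capital city of Portugal?",
    "parameters": {
        "grammar": {"type": "json", "value": capital_schema},
        "max_new_tokens": 32,
    },
}

body = json.dumps(payload)
```

If the request goes through _generate_with_chat_completion instead, this grammar key is never sent, which is why generation falls back to unconstrained output.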
To reproduce
```python
from distilabel.llms import InferenceEndpointsLLM
from distilabel.typing import OutlinesStructuredOutputType
from pydantic import BaseModel


class Capital(BaseModel):
    name: str


llm = InferenceEndpointsLLM(
    tokenizer_id="Qwen/Qwen2.5-7B-Instruct",
    base_url="https://q44idlf3rlibqq-8080.proxy.runpod.net/",
    api_key="EMPTY",
    structured_output=OutlinesStructuredOutputType(
        schema=Capital.model_json_schema(),
        format="json",
    ),
)
llm.load()
output = llm.generate_outputs(
    inputs=[[{"role": "user", "content": "What is the capital city of Portugal?"}]]
)
# [{'generations': ['{ "name": "Lisbon" }'],
#  'statistics': {'input_tokens': [38], 'output_tokens': [10]}}]
```
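With both changes applied, the generation above does conform to the schema; a quick stdlib-only check of the returned string (using json rather than re-validating with the pydantic model):

```python
import json

# First generation from the output shown above.
generation = '{ "name": "Lisbon" }'
parsed = json.loads(generation)
assert parsed == {"name": "Lisbon"}
```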
Expected behavior
No response
Screenshots
No response
Environment
- Distilabel 1.5.3
- TGI 3.2.1
Additional context
Happy labeling!