[BUG] Structured Output with InferenceEndpointsLLM and TGI #1135

@joaomsimoes

Description

Describe the bug

While trying to use structured output with TGI through InferenceEndpointsLLM, I noticed a few problems.

First, I need to instantiate InferenceEndpointsLLM with tokenizer_id. This matters because, without it, chat messages are routed to _generate_with_chat_completion, which does not support the grammar param for TGI.

Second, with huggingface_hub we need to pass the endpoint URL via the model param instead of base_url, otherwise an Unauthorized error is raised: huggingface/huggingface_hub#2804. I changed this here and it works.

I have not confirmed whether these changes affect other parts of the code.
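For context on why the routing matters: TGI's text-generation route accepts a grammar field that constrains the output to a JSON schema, while the chat-completion route does not take such a field. The sketch below (my own illustration, not code from distilabel) builds the kind of payload TGI's /generate endpoint accepts, assuming a hand-written schema equivalent to Capital.model_json_schema().

```python
import json

# Hand-written JSON schema standing in for Capital.model_json_schema()
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}

# Sketch of a TGI /generate payload: the `grammar` parameter constrains
# generation to the schema. This field only exists on the text-generation
# route, which is why tokenizer_id (and thus that code path) is needed.
payload = {
    "inputs": "What is the capital city of Portugal?",
    "parameters": {
        "grammar": {"type": "json", "value": schema},
        "max_new_tokens": 32,
    },
}

print(json.dumps(payload, indent=2))
```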

To reproduce

from distilabel.llms import InferenceEndpointsLLM
from distilabel.typing import OutlinesStructuredOutputType
from pydantic import BaseModel

class Capital(BaseModel):
    name: str

llm = InferenceEndpointsLLM(
    tokenizer_id="Qwen/Qwen2.5-7B-Instruct",
    base_url="https://q44idlf3rlibqq-8080.proxy.runpod.net/",
    api_key="EMPTY",
    structured_output=OutlinesStructuredOutputType(
        schema=Capital.model_json_schema(),
        format="json"
    )
)
llm.load()
output = llm.generate_outputs(inputs=[[{"role": "user", "content": "What is the capital city of Portugal?"}]])
###[{'generations': ['{ "name": "Lisbon" }'],
### 'statistics': {'input_tokens': [38], 'output_tokens': [10]}}]
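Since the grammar constrains the generation to the schema, the string in generations can be parsed straight back into a dict (or validated with the Capital model). A minimal sketch using only the stdlib, with the generation string copied from the output above:

```python
import json

# Generation string taken from the repro output above
generation = '{ "name": "Lisbon" }'

# Because the output was grammar-constrained, json.loads should not fail
capital = json.loads(generation)
print(capital["name"])
```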

Expected behavior

No response

Screenshots

No response

Environment

  • Distilabel 1.5.3
  • TGI 3.2.1

Additional context

Happy labeling!