Skip to content

Response ignores requested format and always returns Title + Description #675

Description

@Morgoth67

Checklist

  • I'm running the newest version of LLM Vision https://github.com/valentinfrlch/ha-llmvision/releases/latest
  • I have enabled debug logging for the integration.
  • I have filled out the issue template to the best of my ability.
  • This issue only contains 1 issue (if you have multiple issues, open one issue for each issue).
  • This is a bug and not a feature request.
  • I have searched open issues for my problem.

Describe the issue

Version: 1.7.0 (this worked correctly before updating)
Provider: Groq (very similar behavior also observed with Gemini)

Description:

llmvision.image_analyzer injects its own hardcoded system message into every request, forcing the model to respond with
{"title": "...", "description": "..."}. This happens even with generate_title: false and response_format: text, and completely
overrides any custom output format requested in the user prompt.

My user prompt (passed correctly and in full) asks the model to answer in a specific "yes/no + one sentence reasoning" format for an automation. Instead, the model follows the injected system prompt and returns:

{"title": "No activity", "description": "The garden is empty."}

This worked as expected before updating to 1.7.0 — the same automation, unchanged, returned the model's response in the requested format.

This may be related to #598 / #599, which show similar title/description-related breakage, but here it's clearly traceable to this injected system prompt taking priority over the user's instructions.

Expected: With generate_title: false and response_format: text, no system prompt should be injected, and the model should respond directly to the user's message as it did before 1.7.0.

Reproduction steps

  1. Call llmvision.image_analyzer with:

    • generate_title: false
    • response_format: text
    • expose_images: false
    • A custom message prompt that explicitly asks for a different output format, e.g.:
      "Answer strictly in this format:
      Line 1: yes or no
      Line 2: one sentence explaining your reasoning"
    • Any image_entity
    • Provider: Groq (Gemini will behave the same. I haven´t tested with other)
  2. Check the debug logs for the outgoing "Request data" to the provider.

  3. Observe that a system message has been injected before the user's
    message, forcing JSON output with title/description fields,
    completely unrelated to the custom prompt's requested format.

  4. The model (correctly) follows the system prompt instead of the user
    prompt, and response_text returns {"title": "...", "description": "..."} instead of the requested "yes/no + reasoning" text.

Debug logs

DEBUG [custom_components.llmvision] Service call data: 
  {'provider': '<provider_id>', 
   'message': 'This is a security camera image of a private garden with 
   lawn, tiled path, ornamental stones and a green privacy fence with a 
   white side gate.\n\nCONTEXT: People outside the fence on the 
   street/path are not a \nconcern. A person can be present near the 
   gate, fence or path inside the property.\nThe image may be bright or 
   backlit.\nBefore answering, consider:\n- Is there a face, limb or body 
   shape that could be a person inside \n  the property, even partially 
   hidden or near the gate/fence?\n- If you are genuinely unsure whether 
   something is a person or part of the structure, answer YES — safety 
   comes first.\n- Only answer NO if you are confident no person is 
   present inside.\n\nAre there any human beings visible inside the 
   property?\n\nLine 1: yes or no\nLine 2: one sentence explaining your 
   reasoning\n', 
   'image_entity': ['camera.cam_entrance'], 
   'include_filename': True, 'target_width': 1280, 'max_tokens': 3000, 
   'generate_title': False, 'expose_images': False, 
   'response_format': 'text'}

  DEBUG [custom_components.llmvision.providers] Provider initialized: 
  Groq(model=meta-llama/llama-4-scout-17b-16e-instruct, endpoint={...})

  DEBUG [custom_components.llmvision.providers] Request data: 
  {'messages': [
    {'role': 'system', 'content': 'Analyze the security camera image 
     and respond with ONLY a valid JSON object. No markdown, no code 
     blocks, no explanation - just the raw JSON. Output format: 
     {"title": "<2-5 word summary>", "description": "<1-2 factual 
     sentences in present tense>"}. If no people, vehicles, or animals 
     are present: title must be exactly "No activity".'}, 
    {'role': 'user', 'content': [
      {'type': 'text', 'text': '<my full custom prompt, same as above, 
       sent unmodified>'}, 
      {'type': 'image_url', 'image_url': {'url': '<base64_image>'}}
    ]}
  ], 
  'model': 'meta-llama/llama-4-scout-17b-16e-instruct', 
  'max_completion_tokens': 3000, 'temperature': 0.3, 'top_p': 0.8}

  DEBUG [custom_components.llmvision.providers] Posting to 
  https://api.groq.com/openai/v1/chat/completions

  DEBUG [custom_components.llmvision.providers] Response data: 
  {'id': '<id>', 'object': 'chat.completion', 'created': <ts>, 
   'model': 'meta-llama/llama-4-scout-17b-16e-instruct', 
   'choices': [{'index': 0, 'message': {'role': 'assistant', 
   'content': '{"title": "No activity", "description": "The garden is 
   empty."}'}, 'logprobs': None, 'finish_reason': 'stop'}], 
   'usage': {...}}

  DEBUG [custom_components.llmvision.providers] Provider: Groq, Model: 
  meta-llama/llama-4-scout-17b-16e-instruct, Response: 
  {"title": "No activity", "description": "The garden is empty."}

  DEBUG [custom_components.llmvision.providers] Is Glimpse Model: False

  INFO [custom_components.llmvision] Response: 
  {'response_text': '{"title": "No activity", "description": "The 
  garden is empty."}'}

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions