26 changes: 26 additions & 0 deletions models/gemma-3-27b-it.yaml
@@ -0,0 +1,26 @@
---
model_id: gemma-3-27b-it
domain: open_router
description: Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model, successor to [Gemma 2](google/gemma-2-27b-it)
categories:
- openrouter
- cloud
urls:
- https://openrouter.ai/google/gemma-3-27b-it
config_entry_data:
api_key: !secret openrouter_api_key
subentries_data:
- subentry_type: conversation
title: Mock Title
data:
model: google/gemma-3-27b-it
llm_hass_api: assist
- subentry_type: ai_task_data
title: Mock Title
data:
model: google/gemma-3-27b-it
llm_hass_api: assist
rpm: 250
cost:
input_tokens: 0.04
output_tokens: 0.15
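
A rough sketch of how the `rpm` and `cost` fields above could be used when planning a benchmark run. The YAML does not state the units, so this assumes that prices are USD per one million tokens and that `rpm` means requests per minute. The helper names `estimate_cost` and `min_request_interval` are hypothetical and not part of the repository.

```python
# ASSUMPTION: prices in the `cost` block are USD per one million tokens.
TOKENS_PER_PRICE_UNIT = 1_000_000


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float = 0.04, output_price: float = 0.15) -> float:
    """Estimated spend for a run, under the per-million-token assumption."""
    total = input_tokens * input_price + output_tokens * output_price
    return total / TOKENS_PER_PRICE_UNIT


def min_request_interval(rpm: int = 250) -> float:
    """Seconds between requests needed to stay within `rpm` requests per minute."""
    # ASSUMPTION: `rpm` is a requests-per-minute limit.
    return 60.0 / rpm


if __name__ == "__main__":
    print(estimate_cost(input_tokens=2_000_000, output_tokens=100_000))
    print(min_request_interval())
```

If the pricing unit turns out to be different, only `TOKENS_PER_PRICE_UNIT` needs to change.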
11 changes: 11 additions & 0 deletions reports/README.md
@@ -30,6 +30,7 @@
| qwen3-4b-instruct-2507-iq4-nl | $${71.3\\% \space\color{gray}\tiny{\textsf{(CI: 4.1, 2025.12.4)}}}$$ | | | | $${71.3\\% \space\color{gray}\tiny{\textsf{(CI: 4.1, avg)}}}$$ |
| gemini-2.0-flash-lite | $${65.9\\% \space\color{gray}\tiny{\textsf{(CI: 4.3, 2025.4.3)}}}$$ | $${88.3\\% \space\color{gray}\tiny{\textsf{(CI: 4.5, 2025.4.3)}}}$$ | $${63.2\\% \space\color{gray}\tiny{\textsf{(CI: 4.9, 2025.5.0.dev0)}}}$$ | $${53.3\\% \space\color{gray}\tiny{\textsf{(CI: 12.6, 2025.4.3)}}}$$ | $${69.2\\% \space\color{gray}\tiny{\textsf{(CI: 2.8, avg)}}}$$ |
| qwen3-1.7b | $${35.9\\% \space\color{gray}\tiny{\textsf{(CI: 4.4, 2025.7.1)}}}$$ | $${60.2\\% \space\color{gray}\tiny{\textsf{(CI: 6.9, 2025.7.1)}}}$$ | $${59.5\\% \space\color{gray}\tiny{\textsf{(CI: 5.0, 2025.7.1)}}}$$ | $${0.0\\% \space\color{gray}\tiny{\textsf{(CI: 0.0, 2025.7.1)}}}$$ | $${49.0\\% \space\color{gray}\tiny{\textsf{(CI: 3.1, avg)}}}$$ |
| gemma-3-27b-it | $${8.7\\% \space\color{gray}\tiny{\textsf{(CI: 2.6, 2026.2.2)}}}$$ | $${0.0\\% \space\color{gray}\tiny{\textsf{(CI: 0.0, 2026.2.2)}}}$$ | $${2.7\\% \space\color{gray}\tiny{\textsf{(CI: 1.7, 2026.2.2)}}}$$ | $${0.0\\% \space\color{gray}\tiny{\textsf{(CI: 0.0, 2026.2.2)}}}$$ | $${4.9\\% \space\color{gray}\tiny{\textsf{(CI: 1.3, avg)}}}$$ |

Implementation notes:
- CI is large given small number of samples in the datasets.
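
To illustrate why a small sample count produces a wide interval, here is a minimal sketch of a normal-approximation (Wald) 95% half-width for a pass rate. This is an assumption about how such a figure could be computed, not a statement of how the report generates its `CI` column. The function name `wald_half_width` and the sample counts are hypothetical.

```python
import math


def wald_half_width(successes: int, total: int, z: float = 1.96) -> float:
    """Half-width of a Wald interval for a proportion, in percentage points."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    return 100.0 * z * math.sqrt(p * (1.0 - p) / total)


if __name__ == "__main__":
    # Same pass rate, four times the samples: the half-width halves.
    print(wald_half_width(50, 100))
    print(wald_half_width(200, 400))
    # A 0% pass rate gives a zero-width interval under this approximation.
    print(wald_half_width(0, 100))
```

Under this approximation the half-width shrinks with the square root of the sample count, so quadrupling the dataset size halves the interval.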
@@ -396,6 +397,16 @@ More information:
- https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/


### gemma-3-27b-it

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model, successor to [Gemma 2](google/gemma-2-27b-it)



More information:
- https://openrouter.ai/google/gemma-3-27b-it


### glm-4.7-flash

As the strongest model in the 30B class, GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.
19 changes: 19 additions & 0 deletions reports/assist-mini/2026.2.2/gemma-3-27b-it/_scrape_context.yaml
@@ -0,0 +1,19 @@
---
uuid: c7db25c0-6809-4836-9f20-7159a7586d54
timestamp: 2026-02-16 08:07:44.693575
scrape_config:
dataset: assist-mini
dataset_path: datasets/assist-mini
dataset_version: v1
model_id: gemma-3-27b-it
model_output_path: reports/assist-mini/2026.2.2
version: 2026.2.2
context:
user: runner
argv:
- /home/runner/work/openrouter-benchmarks/openrouter-benchmarks/.venv/bin/pytest
- home_assistant_datasets/tool/assist/collect
- --models=gemma-3-27b-it
- --dataset=datasets/assist-mini/
- --model_output_dir=reports/assist-mini/2026.2.2
notes: ''
@@ -0,0 +1,141 @@
---
uuid: bd22dce8-af05-417a-a549-3bbc30b9b9b7
task_id: dom1_pl_lights_lights-dining_room_light_off-0
model_id: gemma-3-27b-it
category: light
task:
input_text: Dining room light off
expect_changes:
light.dining_room_light:
state: 'off'
attributes:
brightness: null
color_mode: null
response: Error talking to API
context:
unexpected_states:
light.dining_room_light:
expected:
brightness: null
state: 'off'
color_mode: null
got:
brightness: 100
state: 'on'
color_mode: brightness
conversation_trace:
- event_type: async_process
data:
text: Dining room light off
context:
id: 01KHJQVN6YMJSSGHMWSCTKXE9H
parent_id: null
user_id: null
conversation_id: null
device_id: null
satellite_id: null
language: en
agent_id: conversation.mock_title
extra_system_prompt: null
timestamp: 2026-02-16 08:07:51.262251+00:00
- event_type: agent_detail
data:
messages:
- role: system
content: |
You are a voice assistant for Home Assistant.
Answer questions about the world truthfully.
Answer in plain text. Keep it simple and to the point.
When controlling Home Assistant always call the intent tools. Use HassTurnOn to lock and HassTurnOff to unlock a lock. When controlling a device, prefer passing just name and domain. When controlling an area, prefer passing just area name and domain.
When a user asks to turn on all devices of a specific type, ask user to specify an area, unless there is only one device of that type.
This device is not able to start timers.
You ARE equipped to answer questions about the current state of
the home using the `GetLiveContext` tool. This is a primary function. Do not state you lack the
functionality if the question requires live data.
If the user asks about device existence/type (e.g., "Do I have lights in the bedroom?"): Answer
from the static context below.
If the user asks about the CURRENT state, value, or mode (e.g., "Is the lock locked?",
"Is the fan on?", "What mode is the thermostat in?", "What is the temperature outside?"):
1. Recognize this requires live data.
2. You MUST call `GetLiveContext`. This tool will provide the needed real-time information (like temperature from the local weather, lock status, etc.).
3. Use the tool's response** to answer the user accurately (e.g., "The temperature outside is [value from tool].").
For general knowledge questions not about the home: Answer truthfully from internal knowledge.

Static Context: An overview of the areas and the devices in this smart home:
- names: Bedroom 1 Light
domain: light
areas: Bedroom 1
- names: Bedroom 2 Light
domain: light
areas: Bedroom 2
- names: Bedroom 3 Light
domain: light
areas: Bedroom 3
- names: Bedroom 4 Light
domain: light
areas: Bedroom 4
- names: Dining Room Light
domain: light
areas: Dining Room
- names: Garden Light
domain: light
areas: Backyard
- names: Kitchen Light
domain: light
areas: Kitchen
- names: Living Room Light
domain: light
areas: Living Room
created: 2026-02-16 08:07:51.263122+00:00
- role: user
content: Dining room light off
attachments: null
created: 2026-02-16 08:07:51.262322+00:00
tools:
- name: HassLightSet
description: Sets the brightness percentage or color of a light
parameters: '{Any(''name'', ''area'', ''floor'', msg=None): <function non_empty_string
at 0x7f5efe5e5fe0>, ''domain'': All(<function ensure_list at 0x7f5f0318f110>,
[<function string at 0x7f5f03185dd0>], msg=None), ''color'': <function color_name_to_rgb
at 0x7f5f00cb1590>, ''temperature'': All(Coerce(int, msg=None), Range(min=0,
max=None, min_included=True, max_included=True, msg=None), msg=None), ''brightness'':
All(Coerce(int, msg=None), Range(min=0, max=100, min_included=True, max_included=True,
msg=None), msg=None)}'
- name: HassTurnOn
description: Turns on/opens/presses a device or entity. For locks, this performs
a 'lock' action. Use for requests like 'turn on', 'activate', 'enable',
or 'lock'.
parameters: '{Any(''name'', ''area'', ''floor'', msg=None): <function non_empty_string
at 0x7f5efe5e5fe0>, ''domain'': All(<function ensure_list at 0x7f5f0318f110>,
[<function string at 0x7f5f03185dd0>], msg=None), ''device_class'': All(<function
ensure_list at 0x7f5f0318f110>, [In([''outlet'', ''switch'', ''awning'',
''blind'', ''curtain'', ''damper'', ''door'', ''garage'', ''gate'', ''shade'',
''shutter'', ''window'', ''identify'', ''restart'', ''update'', ''tv'',
''speaker'', ''receiver'', ''water'', ''gas''])], msg=None)}'
- name: HassTurnOff
description: Turns off/closes a device or entity. For locks, this performs
an 'unlock' action. Use for requests like 'turn off', 'deactivate', 'disable',
or 'unlock'.
parameters: '{Any(''name'', ''area'', ''floor'', msg=None): <function non_empty_string
at 0x7f5efe5e5fe0>, ''domain'': All(<function ensure_list at 0x7f5f0318f110>,
[<function string at 0x7f5f03185dd0>], msg=None), ''device_class'': All(<function
ensure_list at 0x7f5f0318f110>, [In([''outlet'', ''switch'', ''awning'',
''blind'', ''curtain'', ''damper'', ''door'', ''garage'', ''gate'', ''shade'',
''shutter'', ''window'', ''identify'', ''restart'', ''update'', ''tv'',
''speaker'', ''receiver'', ''water'', ''gas''])], msg=None)}'
- name: HassCancelAllTimers
description: Cancels all timers
parameters: '{''area'': <function string at 0x7f5f03185dd0>}'
- name: GetDateTime
description: Provides the current date and time.
parameters: '{}'
- name: GetLiveContext
description: 'Provides real-time information about the CURRENT state, value,
or mode of devices, sensors, entities, or areas. Use this tool for: 1. Answering
questions about current conditions (e.g., ''Is the light on?''). 2. As the
first step in conditional actions (e.g., ''If the weather is rainy, turn
off sprinklers'' requires checking the weather first).'
parameters: '{}'
timestamp: 2026-02-16 08:07:51.263131+00:00
duration_ms: 91.297
tries: 1
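
The `unexpected_states` block in the record above pairs the expected values with what was observed for each entity. A minimal sketch of how such a diff could be produced follows. It assumes both sides are flat dictionaries keyed by entity ID; the function `diff_states` is hypothetical and is not the collector's actual implementation.

```python
from typing import Any

StateMap = dict[str, dict[str, Any]]


def diff_states(expected: StateMap, actual: StateMap) -> StateMap:
    """Return entities whose observed values differ from the expected ones."""
    result: StateMap = {}
    for entity_id, want in expected.items():
        observed = actual.get(entity_id, {})
        # Compare only the keys the task cares about.
        got = {key: observed.get(key) for key in want}
        if got != want:
            result[entity_id] = {"expected": want, "got": got}
    return result


if __name__ == "__main__":
    expected = {
        "light.dining_room_light": {
            "brightness": None, "state": "off", "color_mode": None,
        }
    }
    actual = {
        "light.dining_room_light": {
            "brightness": 100, "state": "on", "color_mode": "brightness",
        }
    }
    print(diff_states(expected, actual))
```

In the record above the model returned `Error talking to API` and no tool was called, so the light stayed on and every expected key shows up as a mismatch.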
@@ -0,0 +1,141 @@
---
uuid: 7ffd7d3f-0143-4243-b4c1-210143af7686
task_id: dom1_pl_lights_lights-dining_room_light_off-1
model_id: gemma-3-27b-it
category: light
task:
input_text: Dining room light off
expect_changes:
light.dining_room_light:
state: 'off'
attributes:
brightness: null
color_mode: null
response: Error talking to API
context:
unexpected_states:
light.dining_room_light:
expected:
brightness: null
state: 'off'
color_mode: null
got:
brightness: 100
state: 'on'
color_mode: brightness
conversation_trace:
- event_type: async_process
data:
text: Dining room light off
context:
id: 01KHJQVNEVDB42XS98N4EDSFT1
parent_id: null
user_id: null
conversation_id: null
device_id: null
satellite_id: null
language: en
agent_id: conversation.mock_title
extra_system_prompt: null
timestamp: 2026-02-16 08:07:51.516027+00:00
- event_type: agent_detail
data:
messages:
- role: system
content: |
You are a voice assistant for Home Assistant.
Answer questions about the world truthfully.
Answer in plain text. Keep it simple and to the point.
When controlling Home Assistant always call the intent tools. Use HassTurnOn to lock and HassTurnOff to unlock a lock. When controlling a device, prefer passing just name and domain. When controlling an area, prefer passing just area name and domain.
When a user asks to turn on all devices of a specific type, ask user to specify an area, unless there is only one device of that type.
This device is not able to start timers.
You ARE equipped to answer questions about the current state of
the home using the `GetLiveContext` tool. This is a primary function. Do not state you lack the
functionality if the question requires live data.
If the user asks about device existence/type (e.g., "Do I have lights in the bedroom?"): Answer
from the static context below.
If the user asks about the CURRENT state, value, or mode (e.g., "Is the lock locked?",
"Is the fan on?", "What mode is the thermostat in?", "What is the temperature outside?"):
1. Recognize this requires live data.
2. You MUST call `GetLiveContext`. This tool will provide the needed real-time information (like temperature from the local weather, lock status, etc.).
3. Use the tool's response** to answer the user accurately (e.g., "The temperature outside is [value from tool].").
For general knowledge questions not about the home: Answer truthfully from internal knowledge.

Static Context: An overview of the areas and the devices in this smart home:
- names: Bedroom 1 Light
domain: light
areas: Bedroom 1
- names: Bedroom 2 Light
domain: light
areas: Bedroom 2
- names: Bedroom 3 Light
domain: light
areas: Bedroom 3
- names: Bedroom 4 Light
domain: light
areas: Bedroom 4
- names: Dining Room Light
domain: light
areas: Dining Room
- names: Garden Light
domain: light
areas: Backyard
- names: Kitchen Light
domain: light
areas: Kitchen
- names: Living Room Light
domain: light
areas: Living Room
created: 2026-02-16 08:07:51.516775+00:00
- role: user
content: Dining room light off
attachments: null
created: 2026-02-16 08:07:51.516121+00:00
tools:
- name: HassLightSet
description: Sets the brightness percentage or color of a light
parameters: '{Any(''name'', ''area'', ''floor'', msg=None): <function non_empty_string
at 0x7f5efe5e5fe0>, ''domain'': All(<function ensure_list at 0x7f5f0318f110>,
[<function string at 0x7f5f03185dd0>], msg=None), ''color'': <function color_name_to_rgb
at 0x7f5f00cb1590>, ''temperature'': All(Coerce(int, msg=None), Range(min=0,
max=None, min_included=True, max_included=True, msg=None), msg=None), ''brightness'':
All(Coerce(int, msg=None), Range(min=0, max=100, min_included=True, max_included=True,
msg=None), msg=None)}'
- name: HassTurnOn
description: Turns on/opens/presses a device or entity. For locks, this performs
a 'lock' action. Use for requests like 'turn on', 'activate', 'enable',
or 'lock'.
parameters: '{Any(''name'', ''area'', ''floor'', msg=None): <function non_empty_string
at 0x7f5efe5e5fe0>, ''domain'': All(<function ensure_list at 0x7f5f0318f110>,
[<function string at 0x7f5f03185dd0>], msg=None), ''device_class'': All(<function
ensure_list at 0x7f5f0318f110>, [In([''outlet'', ''switch'', ''awning'',
''blind'', ''curtain'', ''damper'', ''door'', ''garage'', ''gate'', ''shade'',
''shutter'', ''window'', ''identify'', ''restart'', ''update'', ''tv'',
''speaker'', ''receiver'', ''water'', ''gas''])], msg=None)}'
- name: HassTurnOff
description: Turns off/closes a device or entity. For locks, this performs
an 'unlock' action. Use for requests like 'turn off', 'deactivate', 'disable',
or 'unlock'.
parameters: '{Any(''name'', ''area'', ''floor'', msg=None): <function non_empty_string
at 0x7f5efe5e5fe0>, ''domain'': All(<function ensure_list at 0x7f5f0318f110>,
[<function string at 0x7f5f03185dd0>], msg=None), ''device_class'': All(<function
ensure_list at 0x7f5f0318f110>, [In([''outlet'', ''switch'', ''awning'',
''blind'', ''curtain'', ''damper'', ''door'', ''garage'', ''gate'', ''shade'',
''shutter'', ''window'', ''identify'', ''restart'', ''update'', ''tv'',
''speaker'', ''receiver'', ''water'', ''gas''])], msg=None)}'
- name: HassCancelAllTimers
description: Cancels all timers
parameters: '{''area'': <function string at 0x7f5f03185dd0>}'
- name: GetDateTime
description: Provides the current date and time.
parameters: '{}'
- name: GetLiveContext
description: 'Provides real-time information about the CURRENT state, value,
or mode of devices, sensors, entities, or areas. Use this tool for: 1. Answering
questions about current conditions (e.g., ''Is the light on?''). 2. As the
first step in conditional actions (e.g., ''If the weather is rainy, turn
off sprinklers'' requires checking the weather first).'
parameters: '{}'
timestamp: 2026-02-16 08:07:51.516784+00:00
duration_ms: 81.782
tries: 1