Gemini: Add option to specify config_entry in generate_content service #143776
base: dev
Conversation
This was requested for home-assistant#140769. I marked the new parameter as required in services.yaml (to require it in the UI) but optional in the implementation schema (to not fail on existing calls). This respects the config entry's model parameters, but ignores prompt and tool options, since they require HA's `Conversation` infrastructure to handle correctly (to render templates and add parameters to the system prompt, or to call tools).
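For illustration, a minimal sketch of that required-in-UI / optional-in-schema split. The selector and field names are paraphrased for the example, not copied from the diff:

```python
import voluptuous as vol

# services.yaml marks the field as required so the UI enforces it, e.g.:
#
#   generate_content:
#     fields:
#       config_entry:
#         required: true
#         selector:
#           config_entry:
#             integration: google_generative_ai_conversation
#
# ...while the voluptuous schema keeps the same key optional, so existing
# service calls that omit it continue to validate:
GENERATE_CONTENT_SCHEMA = vol.Schema(
    {
        vol.Required("prompt"): str,
        vol.Optional("config_entry"): str,  # absent in pre-existing calls
    }
)
```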
Hey there @tronikos, @IvanLH, mind taking a look at this pull request as it has been labeled with an integration (google_generative_ai_conversation) you are listed as a code owner for? Thanks!
(Resolved review comment on homeassistant/components/google_generative_ai_conversation/strings.json; marked outdated.)
From what I read in the code, it doesn't seem possible to select different models for the action and for the conversation agent. Not all models have image output, for example. We need not concern ourselves with Veo and Imagen for the GenerateContent action, as those are purely media-targeted and use different SDK methods. However, it is possible for users to want, say, Gemini 2.5 Flash on their conversation agent (since it has the best results on our eval set) and 2.0 Flash on the action, to be able to generate images.

@tronikos, what do you think? I am myself partial to making the model a parameter of the action itself, not reading it from the config entry. Coupling both seems odd even from a UX perspective; they appear to be separate things in the UI.

Edit: https://ai.google.dev/gemini-api/docs/models#model-variations is a source for the different Gemini models and their capabilities.
Upon further reading, I see we ask users to specify the config_entry. What would that look like? Do they need to specify the full JSON structure?
It looks like this (screenshot of the config entry selector omitted): you pick an instance from a dropdown rather than entering any JSON. To use a different model, you would need to add a second instance of the integration.
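For example, a call to the action would then carry the entry ID of whichever instance should serve it. A hypothetical Python sketch (placeholder entry ID, and assuming the action returns a response):

```python
from homeassistant.core import HomeAssistant


async def describe_image(hass: HomeAssistant):
    """Hypothetical helper showing how the action targets one config entry."""
    return await hass.services.async_call(
        "google_generative_ai_conversation",
        "generate_content",
        {
            "prompt": "Describe this image",
            # Entry ID of the (second) integration instance -- placeholder.
            "config_entry": "abc123deadbeef",
        },
        blocking=True,
        return_response=True,
    )
```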
I agree that this is not ideal; I would recommend adding an optional `model` parameter.
I would much prefer a model dropdown on the service call. I think it's not terrible if you have to manually update the model for an action: the models behave subtly differently, and you probably want to test how they behave in the specific action you are editing. This would also allow us to error out if the user selects image output with a model that does not support that capability, as sketched below.

@tronikos WDYT?
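If we went that way, the capability check would presumably need a hardcoded map, since (as noted below) the Google API doesn't list output modalities per model. A hypothetical sketch, not the integration's actual code:

```python
# Hypothetical capability map; the model names here are examples only.
IMAGE_OUTPUT_MODELS = {"gemini-2.0-flash-exp-image-generation"}


def validate_image_output(model: str) -> None:
    """Raise early if the selected model cannot return images."""
    if model not in IMAGE_OUTPUT_MODELS:
        raise ValueError(f"Model {model} does not support image output")
```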
We can do that either way (by looking up the model from the config entry). However, AFAIK the Google API does not expose supported output formats in the list of models. There is one downside to specifying a model instead of a config entry: there is no way to specify safety settings. WDYT?
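To make that downside concrete: safety settings travel with the per-request config, so a bare model name gives them nowhere to live. A rough sketch with the google-genai SDK (an assumption on my part; the category and threshold values are examples, not the integration's defaults):

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Describe this image",
    config=types.GenerateContentConfig(
        # These settings have no obvious home if the action only
        # receives a model name instead of a config entry.
        safety_settings=[
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
                threshold=types.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
            ),
        ],
    ),
)
print(response.text)
```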
I think safety settings should also not be global; they're more of a per-action thing. And while it's true that we can show the error, it would not be shown when the model is selected, since in that flow we have no way of knowing whether there is an action that generates images. The user can be notified, but then they have to navigate to a whole different page to fix it.
I prefer specifying config_entry, mostly for consistency with the OpenAI integration. Ideally we should support media as input and output across all the different LLM integrations using a unified framework. I think there is an architecture discussion about it.
Do we have a consensus here? FYI, Gemini uses an entirely different API for videos, with different parameters, so we probably won't be able to reuse this action for generating videos as well.
There's also Imagen for images; we can discuss how to choose the correct image API in the upcoming PR. I've asked some other folks if they want to chime in. Matching OpenAI would be nice, but I personally think we should explore a way to allow multi-model setups without resorting to extravagant multi-config-entry arrangements stemming from multiple "devices".
WDYT of adding two params, so the user can specify either a config entry (and apply its safety settings) xor an arbitrary model? Beware: if I add a model dropdown, I will need to remove …
I like that: users get to choose whether they want to rely on the global config or specify things themselves. The UI for the model params is going to be tricky, though; could we hide it like in the current settings?
No; services aren't interactive (we can only render one screen). But we can put either one in a collapsible section.
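For the schema side, a minimal sketch of that either/or validation using voluptuous exclusion groups. Key names are illustrative; note that vol.Exclusive only enforces "at most one", so the "at least one" half would need an extra validator if desired:

```python
import voluptuous as vol

SCHEMA = vol.Schema(
    {
        vol.Required("prompt"): str,
        # Keys in the same exclusion group may not both be present.
        vol.Exclusive("config_entry", "target"): str,
        vol.Exclusive("model", "target"): str,
    }
)

SCHEMA({"prompt": "hi", "model": "gemini-2.0-flash"})  # OK
SCHEMA({"prompt": "hi", "config_entry": "abc123"})     # OK
# SCHEMA({"prompt": "hi", "config_entry": "abc123", "model": "x"})
# -> raises vol.MultipleInvalid: two values in exclusion group "target"
```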
Breaking change

Warning: This is a breaking change! All calls to this action will now use the safety settings and model specified in the first instance of the integration, even if the action doesn't pass `config_entry`. This breaking change is easy to avoid if we want (it would make the code a bit messier).

Proposed change

Add an optional `config_entry` parameter to the `generate_content` action; see the opening comment above for details.
Type of change
Additional information
Checklist
- The code has been formatted using Ruff (`ruff format homeassistant tests`)

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

- Updated and included derived files by running: `python3 -m script.hassfest`.
- New or updated dependencies have been added to `requirements_all.txt`. Updated by running `python3 -m script.gen_requirements_all`.

To help with the load of incoming pull requests: