Gemini: Add option to specify config_entry in generate_content service #143776


Open · wants to merge 2 commits into base: dev
Conversation

SLaks
Contributor

SLaks commented Apr 27, 2025

This was requested for #140769.

I marked the new parameter as required in services.yaml (to require it in the UI) but optional in the implementation schema (to not fail on existing calls).

This respects the config entry's model parameters, but ignores prompt and tool options, since they require HA's Conversation infrastructure to handle correctly (to render templates and add parameters to the system prompt, or to call tools).
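A minimal sketch of what that split could look like on the services.yaml side (field names and selector shape are illustrative, not the final PR contents; the runtime voluptuous schema would keep config_entry optional):

```yaml
generate_content:
  fields:
    config_entry:
      required: true          # required in the UI only
      selector:
        config_entry:
          integration: google_generative_ai_conversation
    prompt:
      required: true
      selector:
        text:
```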

Warning: This is a breaking change! All calls to this action will now use the safety settings and model specified in the first instance of the integration, even if the action doesn't pass config_entry.

This breaking change is easy to avoid if we want, though it would make the code a bit messier.
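The fallback behaviour described above amounts to roughly this (a sketch with stand-in dicts; the real implementation would go through HA's config-entry APIs):

```python
def pick_config_entry(call_data: dict, entries: list[dict]) -> dict:
    """Return the entry named in the service call, else the first instance."""
    entry_id = call_data.get("config_entry")
    if entry_id is None:
        # Breaking-change path: calls that omit config_entry silently
        # inherit the model and safety settings of the first instance.
        return entries[0]
    for entry in entries:
        if entry["entry_id"] == entry_id:
            return entry
    raise ValueError(f"Unknown config entry: {entry_id}")
```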

Breaking change

Proposed change

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Ruff (ruff format homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.

To help with the load of incoming pull requests:

@home-assistant

Hey there @tronikos, @IvanLH, mind taking a look at this pull request as it has been labeled with an integration (google_generative_ai_conversation) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of google_generative_ai_conversation can trigger bot actions by commenting:

  • @home-assistant close Closes the pull request.
  • @home-assistant rename Awesome new title Renames the pull request.
  • @home-assistant reopen Reopen the pull request.
  • @home-assistant unassign google_generative_ai_conversation Removes the current integration label and assignees on the pull request, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the pull request.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the pull request.

@IvanLH
Contributor

IvanLH commented Apr 29, 2025

From what I read in the code, it doesn't seem possible to select different models for the action and for the conversation agent. Not all models have image output, for example. We need not concern ourselves with Veo and Imagen for the GenerateContent action, as those are purely media targeted and use different SDK methods (generate_videos and generate_images).

However, it is possible for users to want to use, say, Gemini 2.5 Flash for their conversation agent, since it has the best results on our eval set, and 2.0 Flash for the action to be able to generate images.

@tronikos what do you think? I am myself partial to making the model a parameter of the action itself, not reading it from the config entry. Coupling the two seems odd even from a UX perspective; they appear to be separate things in the UI.

Edit: https://ai.google.dev/gemini-api/docs/models#model-variations is the source for the different Gemini models and their capabilities.

@IvanLH
Contributor

IvanLH commented Apr 29, 2025


Upon further reading, I see we ask users to specify the config_entry. What would that look like? Do they need to specify the full JSON structure?

@SLaks
Contributor Author

SLaks commented Apr 29, 2025

It looks like this:

[screenshot: the config_entry selector shown in the service UI]

You would need to add a second instance of the integration from Devices, copy in the same API key, and select a different model from the Configure dialog there. The full workflow (which I would add to the docs in the follow-up image generation PR) would be:

  1. Find your API key (e.g., from core.config_entries)
  2. Devices, Google Generative AI, Add Service
  3. Paste in your API key
  4. Rename the new entry to something more meaningful than 2x Google Generative AI (e.g., rename it to Images)
  5. Click Configure in the new entry
  6. Uncheck Recommended model settings
  7. Submit
  8. Select Gemini 2.0 Flash (Image Generation) Experimental in Model
  9. Submit a second time

I agree that this is not ideal; I would recommend adding an optional Model dropdown in the service call. OTOH, if you use the service in a number of places, using the config entry would make it easier to change models for all of them at once.

@IvanLH
Contributor

IvanLH commented Apr 29, 2025

I would much prefer a Model dropdown on the service call. I think it's not terrible if you have to manually update the model for an action you have: the models behave subtly differently, and you probably want to test how they behave in the specific action you are editing.

This would also allow us to error out if the user selects image output with a model that does not support such capabilities.

@SLaks
Contributor Author

SLaks commented Apr 29, 2025

@tronikos WDYT?

@SLaks
Contributor Author

SLaks commented Apr 29, 2025

This would also allow us to error out if the user selects image output with a model that does not support such capabilities.

We can do that either way (by looking up the model from the config entry). However, AFAIK, the Google API does not expose supported output formats in the list of models.


There is one downside to specifying a model instead of a config entry: There is no way to specify safety settings. WDYT?

@IvanLH
Contributor

IvanLH commented Apr 30, 2025

I think safety settings should also not be global; they are more of a per-action thing. And while it's true that we can show the error, it would not appear when the model is selected, since in that flow we have no way of knowing whether there is an action with image generation.

The user can be notified, but now they have to navigate to a whole different page to fix it.

@tronikos
Member

I prefer specifying config_entry mostly for consistency with the OpenAI integration. Ideally we need to support media as input and output to all the different LLM integrations using a unified framework. I think there is an architecture discussion about it.

@SLaks
Contributor Author

SLaks commented Apr 30, 2025

Do we have a consensus here?

FYI, Gemini uses an entirely different API for videos, with different parameters, so we probably won't be able to reuse this action for generating videos as well.

@IvanLH
Contributor

IvanLH commented May 1, 2025

There's also Imagen for images; we can discuss how to choose the correct image API in the upcoming PR. I've asked some other folks if they want to chime in. Matching OpenAI would be nice, but I personally think we should explore a way to allow multi-model setups without resorting to extravagant multiple config entries stemming from multiple "devices".

@SLaks
Contributor Author

SLaks commented May 2, 2025

WDYT of adding two params, so the user can specify either a config entry (and apply its safety settings) or an arbitrary model, but not both?

Beware: If I add a model dropdown, I will need to remove services.yaml and register this service in code, so that I can fetch the models from the Gemini API and populate the dropdown options.
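As an illustration of the dynamic-dropdown idea (a hypothetical helper, not the PR's code: `model_names` stands in for whatever the Gemini models listing returns, and the option shape mirrors HA's select-style options):

```python
def build_model_options(model_names: list[str]) -> list[dict[str, str]]:
    """Turn raw model ids into dropdown options, skipping media-only models."""
    return [
        {"value": name, "label": name.removeprefix("models/")}
        for name in model_names
        # Imagen and Veo use different SDK methods, so leave them out.
        if "imagen" not in name and "veo" not in name
    ]
```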

@IvanLH
Contributor

IvanLH commented May 2, 2025

I like that: users get to choose whether they want to rely on the global config or specify things themselves.

UI for the model params is going to be tricky, could we hide it like on the current settings?

@SLaks
Contributor Author

SLaks commented May 2, 2025

UI for the model params is going to be tricky, could we hide it like on the current settings?

No; services aren't interactive (we can only render one screen).

But we can put either one in a collapsible section.
