feat(llm): add generate_video tool for Google Veo #364
Open
williamjameshandley wants to merge 1 commit into
Surface the already-registered Veo models (veo-3.1-generate-preview, veo-2.0-generate-001) via a new generate_video MCP tool, mirroring generate_image's provider routing and adapter dispatch.
- video_generation_adapter (gemini): async generate_videos + operations.get polling, files.download mutation fallback, image-to-video via GenAIImage (aliased to avoid the existing PIL Image import)
- register "video_generation" adapter type; add video_generation capability flag (pricing_type == per_second) to both capability builders
- thread default_duration_seconds from YAML through model_loader for per-second pricing (fail-fast, no magic number)
- generate_video tool: capability-gated dispatch, set-kwarg forwarding, expanduser write, per-second cost, metadata-only return (no inline preview)
- unit + MCP-protocol integration tests (SDK mocked at the client boundary)
generate_audio is intentionally not exposed: the Gemini Developer API rejects the parameter (Enterprise-only) and Veo 3.1 emits native audio by default.
Verified end-to-end against the live API: an 8s 720p clip with an AAC audio track was generated, downloaded, and written successfully.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
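The polling loop and download fallback described in the commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the adapter's actual code: `poll_operation` and `video_bytes_or_download` are hypothetical helpers, `refresh` and `download` stand in for the SDK's `operations.get` and `files.download`, and the `done`, `error`, and `video_bytes` attribute names are assumed to match the google-genai operation and video objects.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable


async def poll_operation(
    operation: Any,
    refresh: Callable[[Any], Awaitable[Any]],
    *,
    interval_s: float = 10.0,
    timeout_s: float = 600.0,
) -> Any:
    """Re-fetch a long-running operation until it is done or the deadline passes.

    `refresh` is a stand-in for the SDK's async operations.get call (assumption).
    """
    deadline = time.monotonic() + timeout_s
    while not operation.done and time.monotonic() < deadline:
        await asyncio.sleep(interval_s)
        operation = await refresh(operation)
    # Post-loop check: the loop also exits when the deadline passes,
    # so completion must be re-tested rather than assumed.
    if not operation.done:
        raise TimeoutError(f"video operation not done after {timeout_s}s")
    # Assumed error field: a finished operation can still carry a failure.
    if getattr(operation, "error", None):
        raise RuntimeError(f"video generation failed: {operation.error}")
    return operation


def video_bytes_or_download(video: Any, download: Callable[[Any], Any]) -> bytes:
    """Return inline bytes, falling back to a download when they are unpopulated."""
    if not video.video_bytes:
        returned = download(video)
        # The PR relies on files.download populating video.video_bytes in place;
        # a returned payload is accepted too, in case it does not.
        if not video.video_bytes and returned:
            video.video_bytes = returned
    if not video.video_bytes:
        raise RuntimeError("no video bytes after download fallback")
    return video.video_bytes
```

Re-testing `done` after the loop is what separates a timeout from a completed operation; without it, a deadline expiry would fall through to reading an empty response.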
Summary
Adds a `generate_video` MCP tool to the `llm` server so Google Veo models become callable. Today `list_models()` advertises `veo-3.1-generate-preview` / `veo-2.0-generate-001` (defined in `gemini/models.yaml`), but the only generation tool exposed is `generate_image`; there is no binding that invokes the video API. This wires that up, mirroring `generate_image`'s provider routing and adapter dispatch.
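A minimal sketch of the capability-gated routing, assuming a simplified config shape: `ModelConfig`, `load_model_config`, `VIDEO_ADAPTERS`, the `price` field, and the three optional parameters are hypothetical stand-ins, not the server's real names. It illustrates the pieces this PR describes: the `per_second` capability rule, a fail-fast duration default, set-only kwarg forwarding, and per-second cost.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical subset of a models.yaml entry."""
    name: str
    provider: str
    pricing_type: str
    price: float  # assumed: price per second when pricing_type == "per_second"
    default_duration_seconds: Optional[int] = None


def load_model_config(name: str, entry: dict[str, Any]) -> ModelConfig:
    pricing_type = entry["pricing_type"]
    default_duration = entry.get("default_duration_seconds")
    if pricing_type == "per_second" and default_duration is None:
        # Fail fast at load time instead of falling back to a hard-coded duration.
        raise ValueError(f"{name}: per_second pricing needs default_duration_seconds")
    return ModelConfig(name, entry["provider"], pricing_type, float(entry["price"]), default_duration)


def capabilities(cfg: ModelConfig) -> dict[str, bool]:
    # Per-second pricing is what marks a model as a video generator.
    return {"video_generation": cfg.pricing_type == "per_second"}


# Hypothetical registry: provider name -> video adapter callable.
VIDEO_ADAPTERS: dict[str, Callable[..., dict[str, Any]]] = {}


def generate_video(
    cfg: ModelConfig,
    prompt: str,
    *,
    duration_seconds: Optional[int] = None,
    aspect_ratio: Optional[str] = None,
    resolution: Optional[str] = None,
) -> dict[str, Any]:
    if not capabilities(cfg)["video_generation"]:
        raise ValueError(f"{cfg.name} does not support video generation")
    adapter = VIDEO_ADAPTERS.get(cfg.provider)
    if adapter is None:
        # A provider without a video adapter yet fails loudly instead of guessing.
        raise NotImplementedError(f"no video adapter for provider {cfg.provider!r}")
    # Forward only the options the caller set, so adapter/SDK defaults still apply.
    optional = {"duration_seconds": duration_seconds, "aspect_ratio": aspect_ratio, "resolution": resolution}
    kwargs = {k: v for k, v in optional.items() if v is not None}
    result = adapter(cfg.name, prompt, **kwargs)
    # Bill what the adapter reports it generated, else the configured default.
    seconds = result.get("seconds_generated")
    if seconds is None:
        seconds = cfg.default_duration_seconds
    result["cost"] = seconds * cfg.price
    return result
```

Keeping unset options out of `kwargs` means the SDK's own defaults apply, rather than the tool overriding them with explicit `None` values.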
What's included
- `video_generation_adapter` (gemini): async `generate_videos` + `operations.get` polling with a post-loop `done` check, a `files.download` fallback when `video_bytes` is unpopulated, and image-to-video via `types.Image` (imported as `GenAIImage` to avoid colliding with the existing `from PIL import Image`).
- `"video_generation"` adapter type; `video_generation` capability flag (`pricing_type == "per_second"`) added to both capability builders.
- `default_duration_seconds` threaded from YAML through `build_model_configs_dict` for per-second pricing (fail-fast; no magic number in code).
- `generate_video` tool: capability-gated dispatch, set-only kwarg forwarding, `expanduser` file write, per-second cost via `seconds_generated`, and a metadata-only return (no inline preview; the video is written to `output_file`).
- … timeout/error/RAI paths, image resolution, duration threading, capability flag) and MCP-protocol integration tests for the tool.
- … 0.31.11b10 → 0.32.0).
Design notes
- `generate_audio` is intentionally not exposed: the Gemini Developer API rejects the parameter (Enterprise-only), and Veo 3.1 emits native audio by default.
- … (`grok-imagine-video*`) are real entries in `grok/models.yaml` and show `video_generation: true`, but use a separate SDK. A Grok video adapter is deferred to a follow-up; calling `generate_video` with a Grok model fails loudly via `get_adapter`.
Testing
- 718 passed; `ruff check` / `ruff format` clean.
- … track was generated, polled, downloaded, and written. The `generate_audio` rejection above was caught by this live run; the mocked tests had passed straight through it.
- … `get_client` boundary by design (a real VCR cassette would require credentials and commit a multi-MB video binary).
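Mocking at the `get_client` boundary keeps the tool's own polling and result handling under test without network access or credentials. A minimal pytest-style sketch, assuming a module-level `get_client()` factory and google-genai-style `client.aio.models.generate_videos` / `client.aio.operations.get` attribute paths; `generate_video_bytes` is a hypothetical simplification of the adapter, with the polling interval omitted.

```python
import asyncio
from types import SimpleNamespace
from unittest import mock


def get_client():
    """Hypothetical client factory; the real one needs API credentials."""
    raise RuntimeError("get_client() must be mocked in tests")


async def generate_video_bytes(prompt: str) -> bytes:
    client = get_client()  # resolved at call time, so tests can swap it out
    op = await client.aio.models.generate_videos(model="veo-3.1-generate-preview", prompt=prompt)
    while not op.done:  # polling interval omitted for brevity
        op = await client.aio.operations.get(op)
    return op.response.generated_videos[0].video.video_bytes


def test_generate_video_bytes_with_mocked_client():
    pending = SimpleNamespace(done=False)
    video = SimpleNamespace(video_bytes=b"fake-mp4")
    finished = SimpleNamespace(
        done=True, response=SimpleNamespace(generated_videos=[SimpleNamespace(video=video)])
    )

    # Stub only the SDK surface the code touches; everything else stays real.
    fake_client = mock.MagicMock()
    fake_client.aio.models.generate_videos = mock.AsyncMock(return_value=pending)
    fake_client.aio.operations.get = mock.AsyncMock(return_value=finished)

    # Swap the module-level get_client only for the duration of the test.
    with mock.patch.dict(globals(), {"get_client": lambda: fake_client}):
        data = asyncio.run(generate_video_bytes("a cat surfing"))

    assert data == b"fake-mp4"
    fake_client.aio.operations.get.assert_awaited_once_with(pending)
```

Because the stub returns canned objects, it will happily accept parameters the live API rejects; that blind spot is exactly why a separate live run remains necessary.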
🤖 Generated with Claude Code