feat(llm): add generate_video tool for Google Veo #364

Open
williamjameshandley wants to merge 1 commit into master from feat/generate-video-veo

@williamjameshandley

Summary

Adds a generate_video MCP tool to the llm server so Google Veo models become callable.
Today list_models() advertises veo-3.1-generate-preview / veo-2.0-generate-001 (defined in
gemini/models.yaml), but the only generation tool exposed is generate_image — there is no
binding that invokes the video API. This wires that up, mirroring generate_image's provider
routing and adapter dispatch.

What's included

  • video_generation_adapter (gemini): async generate_videos + operations.get polling with
    a post-loop done check, files.download fallback when video_bytes is unpopulated, and
    image-to-video via types.Image (imported as GenAIImage to avoid colliding with the existing
    from PIL import Image).
  • Registry: new "video_generation" adapter type; video_generation capability flag
    (pricing_type == "per_second") added to both capability builders.
  • Model loader / YAML: default_duration_seconds threaded from YAML through
    build_model_configs_dict for per-second pricing (fail-fast — no magic number in code).
  • generate_video tool: capability-gated dispatch, set-only kwarg forwarding,
    expanduser file write, per-second cost via seconds_generated, metadata-only return
    (no inline preview — video is written to output_file).
  • Tests: unit tests for the adapter (config construction, polling, download mutation,
    timeout/error/RAI paths, image resolution, duration threading, capability flag) and
    MCP-protocol integration tests for the tool.
  • README + version bump (0.31.11b10 → 0.32.0).
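
The polling behaviour described in the adapter bullet can be sketched roughly as below. This is an illustration under stated assumptions, not the adapter's actual code: `poll_until_done`, `refresh`, `timeout_s`, and `interval_s` are hypothetical names, and `refresh` stands in for whatever call re-fetches the operation (operations.get in the adapter). The part worth showing is the post-loop `done` check: the loop can exit on timeout as well as on completion, so the two cases have to be told apart afterwards.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable


async def poll_until_done(
    operation: Any,
    refresh: Callable[[Any], Awaitable[Any]],
    *,
    timeout_s: float = 600.0,   # hypothetical default, not taken from the PR
    interval_s: float = 10.0,   # hypothetical default, not taken from the PR
) -> Any:
    """Poll a long-running operation until it is done or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while not operation.done and time.monotonic() < deadline:
        await asyncio.sleep(interval_s)
        operation = await refresh(operation)
    # Post-loop check: the loop also exits when the deadline passes,
    # so `done` must be tested again before the result is trusted.
    if not operation.done:
        raise TimeoutError(f"operation still pending after {timeout_s}s")
    return operation
```

A caller would pass the operation returned by the generate call together with a coroutine that re-fetches it, and treat `TimeoutError` as the timeout path exercised by the unit tests.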

Design notes

  • generate_audio is intentionally not exposed: the Gemini Developer API rejects the parameter
    (Enterprise-only), and Veo 3.1 emits native audio by default.
  • Scoped to Gemini Veo. Grok video models (grok-imagine-video*) are real entries in
    grok/models.yaml and show video_generation: true, but use a separate SDK — a Grok video
    adapter is deferred to a follow-up; calling generate_video with a Grok model fails loud via
    get_adapter.
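
To make the Grok behaviour concrete, here is a minimal sketch of capability-gated dispatch that fails loudly when no adapter is registered. The registry layout, `supports_video`, `dispatch_video`, the lambda adapter, and the choice of `ValueError` are all assumptions made for illustration; only the `pricing_type == "per_second"` rule, the `"video_generation"` adapter type, and the `get_adapter` name come from this description.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry keyed by (provider, adapter_type). The real registry's
# structure is not shown in this PR description.
_ADAPTERS: Dict[Tuple[str, str], Callable[[str], str]] = {
    ("gemini", "video_generation"): lambda prompt: f"gemini-video:{prompt}",
}


def supports_video(model_config: dict) -> bool:
    """Capability flag as described above: per-second pricing marks a video model."""
    return model_config.get("pricing_type") == "per_second"


def get_adapter(provider: str, adapter_type: str) -> Callable[[str], str]:
    """Look up an adapter and raise when none is registered (exception type assumed)."""
    try:
        return _ADAPTERS[(provider, adapter_type)]
    except KeyError:
        raise ValueError(
            f"no {adapter_type!r} adapter registered for provider {provider!r}"
        ) from None


def dispatch_video(provider: str, model_config: dict, prompt: str) -> str:
    """Gate on the capability flag first, then route to the provider's adapter."""
    if not supports_video(model_config):
        raise ValueError("model does not advertise video_generation")
    return get_adapter(provider, "video_generation")(prompt)
```

Under these assumptions a Grok model passes the capability gate (its pricing is per-second) but raises at `get_adapter`, which matches the behaviour described above.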

Testing

  • 718 passed, ruff check/ruff format clean.
  • Live end-to-end verified against the real Veo 3.1 API: an 8s 720p clip with an AAC audio
    track was generated, polled, downloaded, and written. The generate_audio-rejection above was
    caught by this live run — the mocked tests had passed straight through it.
  • Integration tests mock the SDK at the get_client boundary by design (a real VCR cassette would
    require credentials and commit a multi-MB video binary).
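
For readers unfamiliar with the approach, mocking at the client boundary might look roughly like this. It is a sketch, not the PR's test code: `make_fake_client` and `generate` are hypothetical, the fake objects carry only the fields this toy adapter reads, and the attribute paths (`client.aio.models.generate_videos`, `client.aio.operations.get`) are assumed to mirror the SDK's async client layout.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock, MagicMock


def make_fake_client(video_bytes: bytes) -> MagicMock:
    """Build a stand-in client whose first poll reports the operation as done."""
    video = SimpleNamespace(video_bytes=video_bytes)
    pending = SimpleNamespace(done=False, response=None)
    finished = SimpleNamespace(
        done=True,
        response=SimpleNamespace(generated_videos=[SimpleNamespace(video=video)]),
    )
    client = MagicMock()
    client.aio.models.generate_videos = AsyncMock(return_value=pending)
    client.aio.operations.get = AsyncMock(return_value=finished)
    return client


async def generate(get_client, model: str, prompt: str) -> bytes:
    """Toy adapter: start a job, poll until done, return the bytes (no timeout)."""
    client = get_client()
    operation = await client.aio.models.generate_videos(model=model, prompt=prompt)
    while not operation.done:
        operation = await client.aio.operations.get(operation)
    return operation.response.generated_videos[0].video.video_bytes


# The fake is injected where the real code would call get_client.
fake = make_fake_client(b"fake-mp4")
data = asyncio.run(generate(lambda: fake, "veo-3.1-generate-preview", "a red kite"))
```

A mock like this accepts any keyword argument, which is how a parameter the live API rejects can pass the mocked tests unnoticed, as happened with generate_audio.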

🤖 Generated with Claude Code

Surface the already-registered Veo models (veo-3.1-generate-preview,
veo-2.0-generate-001) via a new generate_video MCP tool, mirroring
generate_image's provider routing and adapter dispatch.

- video_generation_adapter (gemini): async generate_videos + operations.get
  polling, files.download mutation fallback, image-to-video via GenAIImage
  (aliased to avoid the existing PIL Image import)
- register "video_generation" adapter type; add video_generation capability
  flag (pricing_type == per_second) to both capability builders
- thread default_duration_seconds from YAML through model_loader for
  per-second pricing (fail-fast, no magic number)
- generate_video tool: capability-gated dispatch, set-only kwarg forwarding,
  expanduser write, per-second cost, metadata-only return (no inline preview)
- unit + MCP-protocol integration tests (SDK mocked at the client boundary)

generate_audio is intentionally not exposed: the Gemini Developer API rejects
the parameter (Enterprise-only) and Veo 3.1 emits native audio by default.
Verified end-to-end against the live API: an 8s 720p clip with an AAC audio
track was generated, downloaded, and written successfully.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>