feat: add TwelveLabs Pegasus as a video-understanding provider #678

Open
mohit-twelvelabs wants to merge 2 commits into valentinfrlch:v1.8.0-beta from mohit-twelvelabs:feat/twelvelabs-integration

@mohit-twelvelabs

Hi! I'm Mohit, I work at TwelveLabs (@mohit-twelvelabs).

This PR adds TwelveLabs Pegasus as an opt-in video-understanding provider for camera and event analysis.

What it adds

  • A new TwelveLabs provider that calls the Pegasus /analyze endpoint, wired into the ProviderFactory, config flow (new setup step + dropdown entry), constants, and English strings/translations — following the exact same pattern as the existing providers (it's API-key only, like Anthropic/Mistral).
  • Default model pegasus1.5, endpoint, and error constants.
  • Unit tests for the provider (no network) plus one opt-in live smoke test gated on TWELVELABS_API_KEY.
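
A minimal sketch of how a live test can be gated on TWELVELABS_API_KEY, so it is skipped when no key is set. It uses the standard-library unittest module to stay dependency-free; the helper `live_key` and the class name are hypothetical and are not taken from this PR's test files.

```python
import os
import unittest

LIVE_KEY_ENV = "TWELVELABS_API_KEY"


def live_key(env=None):
    """Return the API key from the environment, or None if unset or blank."""
    env = os.environ if env is None else env
    key = env.get(LIVE_KEY_ENV, "").strip()
    return key or None


@unittest.skipUnless(live_key(), f"{LIVE_KEY_ENV} is not set; skipping live test")
class LiveSmokeTest(unittest.TestCase):
    """Hypothetical placeholder for a test that would call the real API."""

    def test_key_is_present(self):
        # A real smoke test would send a short clip here; this only checks the gate.
        self.assertTrue(live_key())
```

With pytest, the same gate is usually written with `pytest.mark.skipif` on the test function.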

Why it helps this project

Pegasus is purpose-built for video understanding rather than independent stills. LLM Vision already extracts keyframes from videos, camera snapshots, and Frigate events, so this provider re-encodes those frames back into a short MP4 (using the ffmpeg binary the integration already requires) and sends that clip to Pegasus. That lets Pegasus reason about motion across the frames — a good complement to the existing frame-by-frame VLM providers for event summaries.
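
The re-encoding step can be sketched roughly as follows. This is an illustration under assumptions, not the provider's actual code: the function names, the `frame_%04d.jpg` naming scheme, the frame rate, and the choice of libx264 are all hypothetical. It only assumes an `ffmpeg` binary on PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(frame_dir: Path, output: Path, fps: int = 2) -> list[str]:
    """Build an ffmpeg command that turns numbered JPEG frames into an MP4."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return [
        "ffmpeg",
        "-y",  # overwrite the output file if it exists
        "-framerate", str(fps),  # input rate: frames shown per second
        "-i", str(frame_dir / "frame_%04d.jpg"),  # frame_0001.jpg, frame_0002.jpg, ...
        "-vf", "pad=ceil(iw/2)*2:ceil(ih/2)*2",  # pad to even dimensions for yuv420p
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",  # widely compatible pixel format
        str(output),
    ]


def encode_frames(frame_dir: Path, output: Path, fps: int = 2) -> Path:
    """Run ffmpeg and return the path of the encoded clip."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(
        build_ffmpeg_command(frame_dir, output, fps),
        check=True,  # raise CalledProcessError on a non-zero exit code
        capture_output=True,
    )
    return output
```

Keeping the command builder separate from the subprocess call makes the argument list testable without invoking ffmpeg.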

Opt-in / non-breaking

Nothing changes for existing users: no defaults are altered and no existing provider behavior is touched. Pegasus only runs if a user explicitly adds a TwelveLabs entry. No new Python dependency is introduced — the provider uses raw aiohttp exactly like the other HTTP providers.

How it was tested

  • pytest tests/test_providers.py tests/test_config_flow.py — all pass (the live test is skipped without a key).
  • Verified the full path end-to-end against the real API outside the test harness: extracted JPEG frames → ffmpeg MP4 → Pegasus /analyze returns HTTP 200 with a correct motion-aware description. Confirmed both the url and base64_string video-context paths work, and that Pegasus enforces max_tokens >= 512 (the provider clamps to this).
  • Ran black on the changed files.
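
The max_tokens clamp mentioned above amounts to something like the sketch below. The constant and function names are hypothetical; only the 512 floor comes from the observed API behavior.

```python
PEGASUS_MIN_MAX_TOKENS = 512  # floor observed when testing against the API


def clamp_max_tokens(requested: int) -> int:
    """Raise a too-small max_tokens value to the minimum Pegasus accepts."""
    return max(int(requested), PEGASUS_MIN_MAX_TOKENS)


print(clamp_max_tokens(100))   # 512
print(clamp_max_tokens(1024))  # 1024
```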

You can grab a free API key at https://twelvelabs.io — there's a generous free tier.

Notes

Per CONTRIBUTING this targets the latest v1.8.0-beta branch. I used AI assistance for boilerplate and to verify API behavior, but I understand and stand behind every line; happy to adjust anything to fit your conventions.

Adds TwelveLabs Pegasus as an opt-in provider for camera/event analysis.
Pegasus is a video model, so the provider re-encodes the keyframes that
LLM Vision already extracts into a short MP4 (via the bundled ffmpeg) and
sends it to the Pegasus /analyze endpoint, letting it reason over motion
rather than independent stills.

- New TwelveLabs provider + ProviderFactory wiring (raw aiohttp, no new dep)
- Config flow step, dropdown entry, and English strings/translations
- Default model (pegasus1.5), endpoint, and error constants
- README provider list
- Unit tests for the provider (no-network) plus an opt-in live smoke test
  gated on TWELVELABS_API_KEY
@valentinfrlch

Owner

Hi @mohit-twelvelabs, thanks for creating this PR! Sounds very interesting, I'll take a look at it as soon as I have a little more time. Perhaps we can integrate it even more deeply, if that's something that would benefit accuracy.

hassfest's TRANSLATIONS validator rejects URLs in strings.json /
translations/en.json. Removed the https://twelvelabs.io links from the
TwelveLabs step description and api_key data_description, using neutral
'from the TwelveLabs dashboard' wording instead.
@mohit-twelvelabs

Author

Thanks @valentinfrlch! No rush at all — and I'd love to integrate it more deeply if it helps accuracy. Pegasus can do targeted prompted analysis (e.g. structured event descriptions, scene Q&A), and Marengo embeddings could back semantic search over camera events if that's ever useful for the project — happy to follow your lead on the right shape.

In the meantime I fixed the failing validate-hassfest check in 27a9043: hassfest's TRANSLATIONS validator doesn't allow URLs in strings.json/translations/en.json, so I removed the https://twelvelabs.io links from the TwelveLabs step description and api_key description (now neutral "from the TwelveLabs dashboard" wording). CI should be green now.
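
For anyone who wants to catch this locally before CI does, a rough standalone check could look like the sketch below. It is an approximation written for illustration, not hassfest's own validator, and the sample keys are hypothetical.

```python
import json
import re

URL_RE = re.compile(r"https?://", re.IGNORECASE)


def find_urls(node, path=""):
    """Return dotted key paths of every string value that contains a URL."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            hits.extend(find_urls(value, child))
    elif isinstance(node, str) and URL_RE.search(node):
        hits.append(path)
    return hits


# Hypothetical translation fragment; the real strings.json layout may differ.
sample = json.loads(
    """
    {
      "config": {
        "step": {
          "twelvelabs": {
            "title": "TwelveLabs",
            "description": "Get a key at https://twelvelabs.io"
          }
        }
      }
    }
    """
)

print(find_urls(sample))  # ['config.step.twelvelabs.description']
```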
