feat: add TwelveLabs Pegasus as a video-understanding provider #678
mohit-twelvelabs wants to merge 2 commits into
Adds TwelveLabs Pegasus as an opt-in provider for camera/event analysis. Pegasus is a video model, so the provider re-encodes the keyframes that LLM Vision already extracts into a short MP4 (via the bundled ffmpeg) and sends it to the Pegasus /analyze endpoint, letting it reason over motion rather than independent stills.
- New TwelveLabs provider + ProviderFactory wiring (raw aiohttp, no new dep)
- Config flow step, dropdown entry, and English strings/translations
- Default model (pegasus1.5), endpoint, and error constants
- README provider list
- Unit tests for the provider (no-network) plus an opt-in live smoke test gated on TWELVELABS_API_KEY
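For readers who want a concrete picture of the re-encoding step, here is a minimal sketch of how a set of extracted keyframes could be turned into an ffmpeg command line. It is not the code from this PR: the function name `build_ffmpeg_args`, the `frame_%03d.jpg` naming scheme, the default frame rate, and the choice of libx264 are all assumptions made for illustration, and the bundled ffmpeg build may need different options.

```python
from typing import List


def build_ffmpeg_args(frames_dir: str, output: str, fps: int = 2) -> List[str]:
    """Build an ffmpeg argv that encodes numbered JPEG keyframes into an MP4.

    Hypothetical helper: the frame naming scheme and codec choice are
    assumptions, not taken from the PR.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return [
        "ffmpeg",
        "-y",                                     # overwrite the output if it exists
        "-framerate", str(fps),                   # input rate: keyframes per second
        "-i", f"{frames_dir}/frame_%03d.jpg",     # assumed naming: frame_001.jpg, ...
        "-vf", "pad=ceil(iw/2)*2:ceil(ih/2)*2",   # pad to even dimensions for yuv420p
        "-c:v", "libx264",                        # assumes the build includes libx264
        "-pix_fmt", "yuv420p",                    # widely compatible pixel format
        output,
    ]
```

The argv could then be passed to `asyncio.create_subprocess_exec(*args)` so that encoding does not block the event loop; returning a list rather than a shell string avoids quoting problems with unusual paths.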
Hi @mohit-twelvelabs, thanks for creating this PR! Sounds very interesting, I'll take a look at it as soon as I have a little more time. Perhaps we can integrate it even more deeply, if that's something that would benefit accuracy.
hassfest's TRANSLATIONS validator rejects URLs in strings.json / translations/en.json. Removed the https://twelvelabs.io links from the TwelveLabs step description and api_key data_description, using neutral 'from the TwelveLabs dashboard' wording instead.
Thanks @valentinfrlch! No rush at all, and I'd love to integrate it more deeply if it helps accuracy. Pegasus can do targeted prompted analysis (e.g. structured event descriptions, scene Q&A), and Marengo embeddings could back semantic search over camera events if that's ever useful for the project. Happy to follow your lead on the right shape. In the meantime I fixed the failing
Hi! I'm Mohit, I work at TwelveLabs (@mohit-twelvelabs).
This PR adds TwelveLabs Pegasus as an opt-in video-understanding provider for camera and event analysis.
What it adds
- New TwelveLabs provider that calls the Pegasus /analyze endpoint, wired into the ProviderFactory, config flow (new setup step + dropdown entry), constants, and English strings/translations, following the exact same pattern as the existing providers (it's API-key only, like Anthropic/Mistral).
- Default model (pegasus1.5), endpoint, and error constants.
- Unit tests for the provider (no-network) plus an opt-in live smoke test gated on TWELVELABS_API_KEY.
Why it helps this project
Pegasus is purpose-built for video understanding rather than for analyzing independent stills. LLM Vision already extracts keyframes from videos, camera snapshots, and Frigate events, so this provider re-encodes those frames back into a short MP4 (using the ffmpeg binary the integration already requires) and sends that clip to Pegasus. That lets Pegasus reason about motion across the frames, which makes it a good complement to the existing frame-by-frame VLM providers for event summaries.
Opt-in / non-breaking
Nothing changes for existing users: no defaults are altered and no existing provider behavior is touched. Pegasus only runs if a user explicitly adds a TwelveLabs entry. No new Python dependency is introduced: the provider uses raw aiohttp exactly like the other HTTP providers.
How it was tested
- pytest tests/test_providers.py tests/test_config_flow.py: all pass (the live test is skipped without a key).
- Live test with a key: /analyze returns HTTP 200 with a correct motion-aware description. Confirmed that both the url and base64_string video-context paths work, and that Pegasus enforces max_tokens >= 512 (the provider clamps to this).
- black on the changed files.
You can grab a free API key at https://twelvelabs.io, and there's a generous free tier.
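The two behaviours confirmed in testing, the max_tokens floor and the two video-context paths, can be sketched roughly as follows. This is an illustration under stated assumptions, not the provider's actual code: the helper names `clamp_max_tokens` and `build_video_context` are hypothetical, and the dict layout (a `type` key next to a `url` or `base64_string` key) is a guess at the request shape that should be checked against the TwelveLabs API reference.

```python
import base64
from typing import Dict, Optional

MIN_MAX_TOKENS = 512  # floor reported in the PR description


def clamp_max_tokens(requested: int) -> int:
    """Raise any smaller request up to the minimum Pegasus accepts."""
    return max(requested, MIN_MAX_TOKENS)


def build_video_context(
    *, url: Optional[str] = None, data: Optional[bytes] = None
) -> Dict[str, str]:
    """Describe the clip either by URL or as inline base64 (assumed layout)."""
    if (url is None) == (data is None):
        raise ValueError("provide exactly one of url or data")
    if url is not None:
        return {"type": "url", "url": url}
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "base64_string", "base64_string": encoded}
```

Clamping rather than rejecting means a user who configured a small token budget for another provider still gets a response instead of an API error, at the cost of a possibly longer answer than requested.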
Notes
Per CONTRIBUTING, this targets the latest v1.8.0-beta branch. I used AI assistance for boilerplate and to verify API behavior, but I understand and stand behind every line; happy to adjust anything to fit your conventions.