docs: add TwelveLabs Pegasus video understanding example #2355
Draft
mohit-twelvelabs wants to merge 3 commits into
Pull request overview
Adds a new self-contained example under examples/ demonstrating how to combine frame-level Supervision detections (YOLO/Ultralytics) with a whole-video natural-language summary from TwelveLabs Pegasus, and registers it in the examples index.
Changes:
- Introduces `twelvelabs_example.py` with a `jsonargparse` CLI that runs Pegasus analysis + per-frame detection/annotation.
- Adds accompanying example documentation (`README.md`), dependencies (`requirements.txt`), and local ignore rules (`.gitignore`).
- Adds the new example entry to `examples/README.md`.
Assessment (per guidelines):
- Code quality: 4/5
- Testing: 3/5 (example-only; no repo tests expected)
- Documentation: 4/5
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| examples/twelvelabs_video_understanding/twelvelabs_example.py | New runnable example script combining Pegasus video summary with per-frame Supervision detections. |
| examples/twelvelabs_video_understanding/requirements.txt | Declares dependencies needed to run the example. |
| examples/twelvelabs_video_understanding/README.md | Provides install/run instructions and argument documentation for the example. |
| examples/twelvelabs_video_understanding/.gitignore | Prevents committing local data, weights, and generated media for this example. |
| examples/README.md | Registers the new example in the examples list. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Hi! I'm Mohit, I work at TwelveLabs (@mohit-twelvelabs).
Description
This adds a new `examples/twelvelabs_video_understanding/` example that pairs per-frame Supervision detections with a whole-video understanding generated by TwelveLabs Pegasus.

Supervision answers "what object is where, in each frame" with precise, box-level detections. Pegasus answers "what is this video about" with a natural-language summary of the whole clip. Running both on the same video combines fine-grained detection with a high-level narrative, which is useful for captioning, search, moderation, or generating a quick description alongside annotated output.
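One way to picture the pairing described above is a small report object that holds the clip-level summary next to per-frame detection counts. This is only a sketch: `VideoReport` and its methods are hypothetical names that do not come from the PR, the class names are plain strings standing in for real detector output, and the summary string stands in for whatever Pegasus returns.

```python
from __future__ import annotations

import json
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class VideoReport:
    """Hypothetical container: one whole-video summary plus per-frame detections."""

    summary: str
    # One Counter per frame, mapping class name -> number of boxes in that frame.
    frames: list[Counter[str]] = field(default_factory=list)

    def add_frame(self, class_names: list[str]) -> None:
        """Record the class names detected in a single frame."""
        self.frames.append(Counter(class_names))

    def class_totals(self) -> Counter[str]:
        """Sum the per-frame box counts for each class across the whole clip."""
        total: Counter[str] = Counter()
        for frame in self.frames:
            total.update(frame)
        return total

    def to_json(self) -> str:
        """Serialize the summary and the aggregated detections together."""
        payload = {
            "summary": self.summary,
            "frame_count": len(self.frames),
            "detections_per_class": dict(self.class_totals()),
        }
        return json.dumps(payload, indent=2)
```

In a real run, `add_frame` would be fed the class names from each frame's detections and `summary` would be filled from the video-level analysis; the JSON output then carries both views of the same clip.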
Type of Change
`supervision` package or any existing behavior)
Motivation and Context
Supervision is model-agnostic and already ships examples that plug in detectors (Ultralytics, Inference, RF-DETR). This example shows how to enrich those detections with video-level understanding, an angle none of the existing examples cover. It's entirely opt-in and additive: it lives only under `examples/`, imports nothing into the library, and changes no defaults.
Changes Made
- `examples/twelvelabs_video_understanding/` with `twelvelabs_example.py`, `README.md`, `requirements.txt`, and `.gitignore`.
- `jsonargparse` `auto_cli` entrypoint, `from __future__ import annotations`, Google-style docstrings, and the same README layout/sections as the other examples.
- `examples/README.md`.

Testing
- `ruff check` and `ruff format --check` on the new script; both pass with the repo config.
- `analyze_video()` path end-to-end against the live TwelveLabs Pegasus API (`pegasus1.5`) with a short generated clip; it returned a correct natural-language description of the video content.

The example needs a TwelveLabs API key to run the Pegasus call (read from `--api_key` or the `TWELVELABS_API_KEY` env var). You can grab a free API key at https://twelvelabs.io; there's a generous free tier.
Additional Notes
Happy to adjust naming, placement, or the README wording to match maintainer preferences.
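For reviewers, the API key lookup mentioned under Testing could look roughly like the sketch below. The helper name `resolve_api_key` is hypothetical, and the precedence (explicit `--api_key` value first, then `TWELVELABS_API_KEY`) is an assumption; the description only says the key is read from one or the other.

```python
from __future__ import annotations

import os


def resolve_api_key(api_key: str | None = None) -> str:
    """Return the explicit key if given, else fall back to TWELVELABS_API_KEY.

    Assumption: an explicitly passed key wins over the environment variable.
    """
    key = api_key or os.environ.get("TWELVELABS_API_KEY")
    if not key:
        # Fail early with an actionable message instead of a late API error.
        raise ValueError(
            "No TwelveLabs API key found: pass --api_key or set TWELVELABS_API_KEY."
        )
    return key
```

Failing before any frames are processed keeps a missing key from surfacing only after the detection pass has already run.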