Segment anything in images and videos using Meta's SAM3 (Segment Anything with Concepts) within Griptape Nodes. Use natural language text prompts to identify and segment specific objects with state-of-the-art AI segmentation.
- Text-Based Segmentation: Describe what you want to segment using natural language (e.g., "person", "car", "dog")
- Image Segmentation: Segment objects in single images with high precision
- Video Segmentation: Track and segment objects across video frames automatically
- Multi-Object Support: Segment multiple objects of the same type in a single pass
- Colored Mask Overlays: Visualize segmentation results with customizable colored overlays
- Confidence Filtering: Filter results by confidence score threshold
- HuggingFace Integration: Automatic model downloading from HuggingFace Hub
- GPU Acceleration: CUDA support with TF32 optimization for Ampere+ GPUs
- Griptape Nodes installed and running
- Python 3.12 or higher
- CUDA-compatible GPU with sufficient VRAM (8GB+ recommended)
- HuggingFace account with access to SAM3 model
- Windows only: Visual Studio Build Tools with C++ compiler (see below)
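To sanity-check the GPU and Python requirements before installing, a minimal check along these lines works (illustrative only, assuming PyTorch is already available in your environment; the library performs its own checks):

```python
import sys

import torch

# Python 3.12+ is required by the library
assert sys.version_info >= (3, 12), "Python 3.12 or higher is required"

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
    if vram_gb < 8:
        print("Warning: 8 GB+ VRAM is recommended for SAM3")
else:
    print("No CUDA-compatible GPU detected")
```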
SAM3 uses Triton, which requires a C++ compiler on Windows.
- Download Build Tools for Visual Studio 2022
- Run the installer and select "Desktop development with C++"
- Ensure these components are selected (at minimum):
  - MSVC v143 (or latest version)
  - Windows 11 SDK (or Windows 10 SDK)

  We recommend installing all default components for the C++ workload.
- Restart your computer after installation
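If you want to confirm the compiler is discoverable before running the nodes, a quick, non-authoritative check is to look for MSVC's `cl.exe` (Triton may also locate the compiler by other means):

```python
import shutil

# Look for MSVC's cl.exe; if it is missing, re-run the Build Tools installer or
# launch from a "Developer Command Prompt for VS 2022" so it is on PATH.
cl_path = shutil.which("cl")
if cl_path:
    print("MSVC compiler found at:", cl_path)
else:
    print("cl.exe not found on PATH")
```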
- Download the library files to your Griptape Nodes libraries directory:

  ```bash
  # Navigate to your Griptape Nodes libraries directory
  cd `gtn config show workspace_directory`

  # Clone the library with submodules
  git clone --recurse-submodules https://github.com/griptape-ai/griptape-nodes-library-sam3.git
  ```

- Add the library in the Griptape Nodes Editor:
  - Open the Settings menu and navigate to the Libraries settings
  - Click on + Add Library at the bottom of the settings panel
  - Enter the path to the library JSON file: `<your Griptape Nodes Workspace directory>/griptape-nodes-library-sam3/griptape_nodes_sam3_library/griptape-nodes-library.json`
  - You can check your workspace directory with `gtn config show workspace_directory`
  - Close the Settings Panel
  - Click on Refresh Libraries
- Verify installation by checking that the "SAM3 Segment Image" and "SAM3 Segment Video" nodes appear in your Griptape Nodes interface in the "SAM3" category.
SAM3 requires access to the gated model on HuggingFace:
- Request access to the SAM3 model at facebook/sam3
- Get your HuggingFace token from HuggingFace Settings
- Configure the token in Griptape Nodes:
  - Open the Settings menu and navigate to Model Management
  - Set your `HF_TOKEN` in the HuggingFace section
  - Alternatively, set it as an environment variable:

    ```bash
    export HF_TOKEN="your-huggingface-token-here"
    ```
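To confirm the token actually has access to the gated model before running the nodes, an optional sketch using `huggingface_hub` (the nodes read `HF_TOKEN` themselves; this step is not required):

```python
import os

from huggingface_hub import HfApi
from huggingface_hub.utils import HfHubHTTPError

api = HfApi(token=os.environ.get("HF_TOKEN"))
try:
    api.model_info("facebook/sam3")  # fails if the token cannot see the gated repo
    print("Access to facebook/sam3 confirmed")
except HfHubHTTPError as err:
    print("No access yet - request access on the model page and check HF_TOKEN:", err)
```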
- Add the "SAM3 Segment Image" node to your workflow
- Connect an image to the `input_image` input
- Enter a text prompt describing what to segment (e.g., "person", "cat", "bicycle")
- Configure optional settings:
  - `max_masks`: Maximum number of masks to return
  - `score_threshold`: Minimum confidence score (0.0-1.0)
- Run the node to generate segmentation masks

Outputs:

- `output_masks`: List of individual mask images
- `output_composite`: Original image with colored mask overlays
- `num_masks_found`: Number of objects segmented
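For intuition about what `score_threshold`, `max_masks`, and the composite output represent, here is a minimal NumPy sketch of the same idea. It is illustrative only, not the node's implementation; `masks` and `scores` stand in for the per-object boolean masks and confidences a SAM3-style model returns:

```python
import numpy as np


def composite_masks(image, masks, scores, score_threshold=0.5, max_masks=10, opacity=0.4):
    """Blend colored overlays for high-confidence masks onto an HxWx3 uint8 image."""
    rng = np.random.default_rng(seed=0)
    out = image.astype(np.float32)
    # Keep only the highest-scoring masks above the confidence threshold
    ranked = sorted(zip(masks, scores), key=lambda pair: -pair[1])
    kept = [mask for mask, score in ranked if score >= score_threshold][:max_masks]
    for mask in kept:
        color = rng.integers(0, 256, size=3).astype(np.float32)  # one color per object
        region = mask.astype(bool)
        out[region] = (1 - opacity) * out[region] + opacity * color
    return out.astype(np.uint8)
```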
- Add the "SAM3 Segment Video" node to your workflow
- Connect a video to the `input_video` input
- Enter a text prompt describing what to segment
- Configure optional settings:
  - `prompt_frame`: Frame index to apply the initial prompt (default: 0)
  - `mask_opacity`: Opacity of mask overlays (0.0-1.0)
- Run the node to generate a video with segmentation masks

Outputs:

- `output_video`: Video with colored mask overlays
- `num_frames_processed`: Total frames processed
- `num_objects_found`: Number of objects tracked
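For readers curious about the mechanics behind the video node, the sketch below shows a generic read-overlay-write loop with OpenCV. It is illustrative only: `masks_for_frame` is a hypothetical stand-in for the masks SAM3 tracks on each frame, and it writes with OpenCV's `mp4v` codec rather than the H.264 encoding the node produces.

```python
import cv2
import numpy as np

OVERLAY = np.array([0, 255, 0], dtype=np.float32)  # single green overlay color


def overlay_video(in_path, out_path, masks_for_frame, opacity=0.4):
    """Read a video, blend per-frame masks onto each frame, and write the result."""
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)

    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        for mask in masks_for_frame(frame_idx):  # hypothetical per-frame tracked masks
            region = mask.astype(bool)
            frame[region] = ((1 - opacity) * frame[region] + opacity * OVERLAY).astype(np.uint8)
        writer.write(frame)
        frame_idx += 1

    cap.release()
    writer.release()
```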
| Parameter | Type | Description | Default |
|---|---|---|---|
| `input_image` | ImageArtifact | Input image to segment | Required |
| `text_prompt` | String | Description of objects to segment | Required |
| `max_masks` | Integer | Maximum masks to return (1-100) | 10 |
| `score_threshold` | Float | Minimum confidence score (0.0-1.0) | 0.5 |
| Parameter | Type | Description | Default |
|---|---|---|---|
| `input_video` | VideoUrlArtifact | Input video to segment | Required |
| `text_prompt` | String | Description of objects to segment | Required |
| `prompt_frame` | Integer | Frame index for initial prompt | 0 |
| `mask_opacity` | Float | Mask overlay opacity (0.0-1.0) | 0.4 |
Extract specific objects from images for compositing, editing, or analysis.
Track and segment objects across video frames for visual effects or analysis.
Create masked overlays for presentations, thumbnails, or social media content.
Generate segmentation masks for machine learning dataset preparation.
Identify and highlight specific objects in images or video footage.
The library includes the following ML dependencies:
- PyTorch 2.7.0+ with CUDA support
- TorchVision 0.22.0+ for image processing
- SAM3 from Meta's official repository
- OpenCV for video processing
- Triton for GPU kernel optimization
SAM3 (Segment Anything with Concepts) extends the original SAM architecture with:
- Text-based prompt understanding via integrated language model
- Concept-aware segmentation for natural language queries
- Video propagation for temporal consistency across frames
- Multi-GPU support for efficient inference
- TF32 Enabled: Automatic TF32 precision for Ampere+ GPUs
- GPU Memory Management: Efficient VRAM usage with model caching
- Video Processing: Frame extraction and encoding optimized for throughput
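The TF32 setting above maps to standard PyTorch switches; a minimal sketch of what is enabled on Ampere and newer GPUs (the library sets this up for you, so you do not need to run it yourself):

```python
import torch

# Allow TF32 math for matmuls and cuDNN convolutions on Ampere+ GPUs:
# noticeably faster, at slightly reduced precision.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```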
If ML dependencies fail to install: the library auto-installs its dependencies on first load. Check the console logs for installation progress or errors, and ensure you have internet connectivity for downloading dependencies.
If you see a model access or authentication error:
- Request access to the SAM3 model at facebook/sam3
- Verify your HF_TOKEN is set correctly in Model Management settings
If you run into GPU out-of-memory errors:
- Close other GPU-intensive applications
- Reduce input image/video resolution
- Ensure no other models are loaded in memory
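To inspect or free GPU memory between runs, standard PyTorch utilities can help (an optional sketch, separate from the nodes themselves):

```python
import torch

# Report current allocations on the default GPU
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**3:.2f} GB")

# Release cached blocks back to the driver (useful after unloading other models)
torch.cuda.empty_cache()
```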
If the output video won't play: the node encodes output with H.264 for broad compatibility; try opening it with VLC or another media player that supports MP4.
If segmentation runs slowly:
- Ensure CUDA is available (check logs for GPU detection)
- For video, shorter clips process faster
- First run downloads model weights (~2GB), subsequent runs are faster
Check the node's logs output for detailed information including:
- Model loading status
- Segmentation progress
- Number of objects detected
- Processing time per frame
- Issues: GitHub Issues
- Griptape Community: Griptape Discord
- Documentation: Griptape Nodes Docs
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Made with love for the Griptape community