Bootstrap Your Computer Vision Models Without Data or Annotations
Open-source synthetic data generation for object detection and segmentation
Quick Start • Pipelines • Examples • Documentation • Community
Training computer vision models requires thousands of labeled images. Gathering that data and annotating it by hand are expensive and time-consuming, and quickly become a bottleneck for rapid prototyping and experimentation.
OpenFabrik automatically generates unlimited synthetic training data with perfect annotations. No dataset collection, no manual labeling, no waiting weeks for annotators.
Just describe what you want to detect, and OpenFabrik generates fully annotated datasets ready for training YOLOv8, YOLOv10, or any modern detection/segmentation model.
both_pipelines_generation_video.mp4
| Date | Change |
|---|---|
| 2026-03-06 | Integrated SAM3 as the default annotator for the Scene Generation Pipeline: native multi-class support via Promptable Concept Segmentation (PCS), with a sequential per-class strategy for 24 GB VRAM compatibility. Use `--annotator grounded_sam2` to fall back to the original Grounding DINO + SAM2 pipeline. |
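The `--annotator` fallback above slots into the standard Scene Generation invocation. A minimal sketch, reusing the quick-start flags shown later in this README (the flag combination is assumed to be valid; verify against the script's `--help`):

```bash
# Fall back to the Grounding DINO + SAM2 annotator
# (all other flags as in the quick-start example below)
python pipelines/scene_generation_pipeline.py \
  --working_dir ./my_dataset \
  --session new \
  --run_prompts --run_images --run_annotations \
  --annotator grounded_sam2 \
  --cache_dir ./my_cache_dir
```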
Use cases:
- Rapidly prototype new architectures without waiting for dataset collection and annotation
- Bootstrap models for manufacturing QA, defect detection, and inventory management
- Generate multi-perspective datasets for robot manipulation and navigation tasks
| | OpenFabrik | 3D Rendering (Omniverse, BlenderProc, Kubric, Gazebo) | Diffusion Academic (GeoDiffusion, InstaGen, DatasetDM) | Commercial (Synthesis AI, Anyverse, Datagen) |
|---|---|---|---|---|
| Input | Text or 1 photo | 3D assets + scenes | Text + layouts | 3D assets (provided) |
| 3D Assets Required | No | Yes | No | Yes |
| Auto-Annotation | Open-set (any class) | From 3D scene | Partial / custom | Built-in |
| Output Format | YOLO (bbox, seg) | COCO, KITTI, custom | Custom | COCO, custom |
| Multi-View | Yes (6 views) | Yes | No | Yes |
| Runs Locally | Yes | Yes | Yes | No (cloud) |
| End-to-End | Yes | Yes | Partial | Yes |
| Open Source | Yes | Mixed | Yes | No |
Why OpenFabrik? No 3D assets needed, open-set detection for any object class, production-ready YOLO output, and fully local and open source, all in one end-to-end pipeline.
See detailed tool-by-tool comparison
3D Rendering Frameworks - require pre-built 3D assets and scenes: Omniverse Replicator | BlenderProc | Infinigen | Kubric | Unreal / Unity / Gazebo
Diffusion-Based Methods - academic research, typically generation-only or partial pipelines: GeoDiffusion (ICLR 2024) | DiffusionEngine (2024) | InstaGen (CVPR 2024) | DatasetDM (NeurIPS 2023) | X-Paste (ICML 2023)
Commercial Platforms - cloud-based, require enterprise pricing: Synthesis AI | Anyverse | Rendered.ai | Datagen
- 🎨 Zero Data Required - Generate datasets from text descriptions or reference images
- 🏷️ Zero Manual Labeling - Perfect annotations generated automatically
- 🔄 Unlimited Diversity - Generate infinite variations with different backgrounds, lighting, and perspectives
- 🎯 Open-Set Detection - Detect any object class via text prompts (powered by SAM3, with a Grounding-DINO + SAM2 fallback)
- 📐 Multi-View Generation - Create datasets from multiple viewing angles automatically
- ⚡ Production Ready - YOLO-compatible output, ROS2 integration, TensorRT export
- 🔓 Fully Open Source - Built on state-of-the-art open models (FLUX, SAM2, Qwen Multicamera, Zero123++)
Note: a CUDA-capable GPU with at least 24 GB of VRAM is required.
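Before installing, a quick sanity check with standard NVIDIA tooling (not an OpenFabrik command) confirms the GPU is visible and has enough memory:

```bash
# List the visible GPU and its total memory; expect >= 24 GB
nvidia-smi --query-gpu=name,memory.total --format=csv
```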
# Install Ollama (for LLM-based prompt generation)
curl -fsSL https://ollama.com/install.sh | sh
# Pull an LLM model
ollama pull cogito:latest
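# Optional sanity check: confirm the Ollama server responds and the model is present
ollama list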
# Conda env creation [optional]
conda create -n openfabrik python=3.10 -y
conda activate openfabrik
# Download all pipeline models into your cache_dir (up to 45GB)
pip install huggingface_hub
git clone https://github.com/cvar-vision-dl/OpenFabrik
cd OpenFabrik
python utilities/download_models.py --cache_dir ./my_cache_dir --all

# Activate the conda env [optional]
conda activate openfabrik
cd OpenFabrik
# Install dependencies
pip install -r requirements.txt
# Install SAM3
cd ..  # Do not clone SAM3 inside the OpenFabrik folder
git clone https://github.com/facebookresearch/sam3
cd sam3 && pip install -e . && cd ..
# Clone the Grounded-SAM-2 repository
cd OpenFabrik  # Grounded-SAM-2 HAS to be cloned inside the OpenFabrik folder
git clone https://github.com/alejodosr/Grounded-SAM-2
cd Grounded-SAM-2
pip install -e .
pip install --no-build-isolation -e grounding_dino
cd ..
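# Optional: verify the editable installs import cleanly (module names assumed
# from the upstream repos; adjust if they differ)
python -c "import groundingdino, sam2; print('Grounded-SAM-2 OK')"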
# Clone PerSam repository
cd OpenFabrik
git clone https://github.com/alejodosr/Personalize-SAM
cd Personalize-SAM
pip install -e .
cd ..

# Scene Generation Pipeline - multi-object detection dataset
python pipelines/scene_generation_pipeline.py \
--working_dir ./my_dataset \
--session new \
--run_prompts --run_images --run_annotations \
--project_info_file examples/prompts/kitchen_objects.txt \
--predefined_classes "cup,bottle,glass,plate,spoon,knife,fork,bowl" \
--num_prompts_per_execution 10 \
--num_random_imgs 2 \
--cache_dir ./my_cache_dir
# Output: YOLO-format dataset at ./my_dataset/YYYYMMDD/outputs/

✅ Done! Your dataset is ready to train with any YOLO model from Ultralytics:

yolo train data=./my_dataset/YYYYMMDD/outputs/dataset.yaml model=yolov8n.pt epochs=100

OpenFabrik provides two specialized pipelines for different use cases:
Best for: Multi-object detection, general-purpose datasets, rapid prototyping
How it works:
- 📝 LLM generates prompts - Describe your target objects and scenes
- 🎨 FLUX creates synthetic images - State-of-the-art diffusion model generates diverse scenes
- 🏷️ Auto-annotation - SAM3 (default) or Grounding-DINO + SAM2 detects objects and generates precise masks
Output: YOLO segmentation dataset with bounding boxes and masks
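As a sketch of what to expect, the emitted `dataset.yaml` follows the standard Ultralytics format (the layout below is assumed, and the class names come from `--predefined_classes`; verify against your actual output):

```bash
# Hypothetical peek at the generated dataset config (standard Ultralytics
# YOLO format assumed; paths and class ids may differ in your run)
cat ./my_dataset/YYYYMMDD/outputs/dataset.yaml
# train: images/train
# val: images/val
# names:
#   0: cup
#   1: bottle
```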
scene_generation_video.mp4
python pipelines/scene_generation_pipeline.py \
--working_dir ./my_dataset \
--session new \
--run_prompts --run_images --run_annotations \
--project_info_file examples/prompts/kitchen_objects.txt \
--predefined_classes "cup,bottle,glass,plate,spoon,knife,fork,bowl" \
--num_prompts_per_execution 10 \
--num_random_imgs 2 \
--cache_dir ./my_cache_dir

Key Features:
- Supports batch or iterative prompt generation
- Multiple executions with automatic result merging (see the sketch after this list)
- Configurable image variations per prompt
- Session persistence - resume from any step
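A minimal sketch of such a follow-up execution, assuming `--session last` resumes the most recent session in the working directory (as in the resume example later in this README):

```bash
# Second execution in the same working_dir; results are merged into the
# existing session (flags reused from the quick-start example)
python pipelines/scene_generation_pipeline.py \
  --working_dir ./my_dataset \
  --session last \
  --run_prompts --run_images --run_annotations \
  --num_prompts_per_execution 10 \
  --cache_dir ./my_cache_dir
```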
Best for: Custom objects, multi-view datasets, robotics manipulation
How it works:
- 📸 Start with one reference image - Upload a photo of your target object
- 🔄 Generate multiple perspectives - Qwen Multicamera (default) or Zero123++ creates multi-view representations
- 🌍 Augment contexts - Place the object in diverse environments and lighting
- 🏷️ Auto-annotation - PerSAM + SAM2 for reference-based segmentation
Output: YOLO segmentation dataset with multi-perspective annotations
ref_object_generation_video.mp4
This pipeline requires a reference image and a matching reference mask. If you don't have a mask yet, you can generate one with this utility:
# Generate mask for image
python utilities/sam_mask_labeler.py \
--image ./examples/pikachu_bag.jpg \
--cache_dir ./my_cache_dir \
--output_dir ./my_output_folder

If you already have a mask for the reference image, proceed directly to the pipeline:
python pipelines/reference_object_pipeline.py \
--input_image ./examples/pikachu_bag.jpg \
--input_mask ./examples/pikachu_bag_mask.png \
--project_info_file ./examples/prompts/project_pikachu.txt \
--object_name "pikachu bag" \
--num_prompts 10 \
--num_iterations 1 \
--working_dir ./datasets/my_product \
--enable_annotation \
--enable_qwen_augmentation \
--qwen_augmentation_count 2 \
--enable_cv_augmentation \
--cv_augmentation_count 2 \
--cache_dir ./my_cache_dir

Key Features:
- Multi-perspective 3D-aware generation
- Reference-based segmentation (no class labels needed)
- Decoupled augmentation strategies (see the sketch after this list):
- Generative augmentation: Change lighting, context, occlusions
- CV augmentation: Motion blur, compression, B/W, contrast
- Robust retry mechanisms with automatic server restart
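For example, a run with only the classic CV augmentations enabled might look like this. A sketch reusing flags from the example above; flags not shown are assumed to keep their defaults (verify against the script's `--help`):

```bash
# Reference pipeline with generative augmentation off and CV augmentation on
# (hypothetical variant; flag defaults are assumptions)
python pipelines/reference_object_pipeline.py \
  --input_image ./examples/pikachu_bag.jpg \
  --input_mask ./examples/pikachu_bag_mask.png \
  --object_name "pikachu bag" \
  --working_dir ./datasets/my_product \
  --enable_annotation \
  --enable_cv_augmentation \
  --cv_augmentation_count 4 \
  --cache_dir ./my_cache_dir
```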
| Feature | Scene Generation | Reference Object |
|---|---|---|
| Input | Text descriptions | Single reference image |
| Best For | Multi-object detection | Custom single-object |
| Perspectives | Single view | Multi-view (configurable) |
| Generation | FLUX only | FLUX + Qwen Multicamera / Zero123++ |
| Annotation | Grounded-SAM2 | PerSAM + SAM2 |
| Augmentation | None (diversity comes from generation itself) | Generative + CV |
| Use Case | General datasets | Robotics, specialized objects |
# Generate dataset for factory automation
python pipelines/scene_generation_pipeline.py \
--predefined_classes "bolt,nut,washer,screw,gear" \
--num_prompts_per_execution 100 \
--num_random_imgs 5 \
--working_dir ./datasets/industrial_parts \
--cache_dir ./my_cache_dir \
--session new --run_prompts --run_images --run_annotations

# Train model to recognize your specific product from all angles
python pipelines/reference_object_pipeline.py \
--input_image ./my_product_photo.jpg \
--input_mask ./my_product_mask.png \
--object_name "my_product" \
--working_dir ./datasets/product_detection \
--enable_annotation \
--enable_qwen_augmentation \
--enable_cv_augmentation \
--cache_dir ./my_cache_dir

# Resume previous run (annotations only)
python pipelines/scene_generation_pipeline.py \
--working_dir ./datasets/office_objects \
--session last \
--run_annotations \
--cache_dir ./my_cache_dir

OpenFabrik includes comprehensive utilities for the entire ML pipeline:
# Train model
python utilities/yolo_scripts/yolo_training.py \
--dataset ./datasets/my_dataset/dataset.yaml \
--model yolov8n-seg.pt \
--epochs 100
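# Optional: validate the trained weights and run a quick prediction
# (standard Ultralytics CLI; the runs/ path below follows the export
# examples in this section and depends on your actual training output)
yolo val data=./datasets/my_dataset/dataset.yaml model=./runs/train/weights/best.pt
yolo predict model=./runs/train/weights/best.pt source=./some_test_image.jpg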
# Export to ONNX
python utilities/yolo_scripts/yolo_export_onnx.py \
--model ./runs/train/weights/best.pt
# Export to TensorRT (for production deployment)
python utilities/yolo_scripts/yolo_export_tensorrt.py \
--model ./runs/train/weights/best.pt

# Get dataset statistics
python utilities/yolo_scripts/statistics_yolo_dataset.py \
--dataset ./datasets/my_dataset/dataset.yaml
# Split dataset into train/val
python utilities/yolo_scripts/yolo_split_dataset.py \
--dataset ./raw_dataset \
--output ./split_dataset \
--split 0.8

# Publish detections to ROS2 topics
python utilities/ros2_scripts/yolo_segmentation_publisher.py \
--model ./weights/best.pt \
--input-topic /camera/image_raw \
--output-topic /detections/segmentation \
--tensorrt  # Use TensorRT for faster inference

For high-performance C++ ROS2 YOLO inference, see yolo-ros2-inference.
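With the publisher running, the standard ROS2 CLI can confirm detections are flowing (assumes a sourced ROS2 environment and the topic names configured above):

```bash
# Check publish rate and inspect a single message on the output topic
ros2 topic hz /detections/segmentation
ros2 topic echo /detections/segmentation --once
```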
OpenFabrik follows a modular, pipeline-based architecture:
┌───────────────────────────────────────────────────────────┐
│                         PIPELINES                         │
├───────────────────────────────────────────────────────────┤
│      Scene Generation      │      Reference Object        │
│      (multi-object)        │ (single object, multi-view)  │
└───────────────────────────────────────────────────────────┘
                             │
            ┌────────────────┼────────────────┐
            │                │                │
    ┌──────────────┐   ┌─────────────┐   ┌──────────────┐
    │  Generation  │   │ Annotation  │   │ Augmentation │
    │   Modules    │   │   Modules   │   │   Modules    │
    ├──────────────┤   ├─────────────┤   ├──────────────┤
    │ • LLM        │   │ • Grounding │   │ • Generative │
    │   Prompts    │   │   DINO      │   │   (Qwen Edit)│
    │ • FLUX       │   │ • SAM2      │   │ • CV Augment │
    │ • Qwen       │   │ • PerSAM    │   │              │
    │   Multicam   │   │             │   │              │
    │ • Zero123++  │   │             │   │              │
    └──────────────┘   └─────────────┘   └──────────────┘
                             │
                ┌────────────▼────────────┐
                │   YOLO Dataset Output   │
                │  (ready for training)   │
                └─────────────────────────┘
OpenFabrik combines state-of-the-art models into automated pipelines:
- LLM Prompt Generation (Ollama)
  - Generates diverse scene descriptions
  - Supports iterative or batch mode
  - Maintains conversation context for diversity
- Image Synthesis (FLUX Diffusion)
  - State-of-the-art text-to-image generation
  - Configurable resolution and quality
  - Memory-optimized for batch processing
- Auto-Annotation (Grounding-DINO + SAM2)
  - Open-set object detection (any class via text)
  - Precise segmentation masks
  - Global class consistency across dataset
- White Background Generation (Qwen Edit)
  - Clean object isolation
  - Optimized for multi-view generation
- Multi-Perspective Generation (Qwen Multicamera / Zero123++)
  - Qwen Multicamera (default): generates configurable view/elevation/distance combinations using a LoRA adapter
  - Zero123++ (alternative): 6 viewing angles with 3D-aware transformations
  - Consistent object appearance across perspectives
- Context Generation (LLM + FLUX)
  - Places object in diverse environments
  - Multiple iterations per perspective
  - Configurable scene complexity
- Reference-Based Annotation (PerSAM + SAM2)
  - Segments object across all generated images
  - No class labels needed
  - High precision with reference guidance
- Augmentation (Generative + CV)
  - Generative: lighting, occlusions, weather
  - CV: motion blur, compression, B/W, contrast
If you use OpenFabrik in your research, please cite:
@misc{openfabrik2025,
author = {Rodriguez-Ramos, Alejandro and Campoy, Pascual},
title = {OpenFabrik: Bootstrap Your Computer Vision Models Without Data or Annotations},
howpublished = "\url{https://github.com/cvar-vision-dl/OpenFabrik}",
doi = {10.5281/zenodo.18669083},
year = {2025}
}

We welcome contributions! See CONTRIBUTING.md for guidelines.
OpenFabrik builds on these excellent projects:
- FLUX - Text-to-image diffusion
- Qwen Multicamera - Multi-view generation (default)
- Zero123++ - Multi-view generation (alternative)
- Grounding-DINO - Open-set detection
- SAM2 - Segmentation
- PerSAM - Reference segmentation
- Ultralytics YOLO - Object detection framework
- YOLO ROS2 Inference - Real-time ROS2 C++ YOLO inference
Special thanks to:
- The open-source computer vision community
- All contributors and users providing feedback
- @GPatiA2 for recommending Qwen Multicamera, which improved the quality of one of the pipelines
OpenFabrik is released under the MIT License. See LICENSE for details.
If OpenFabrik helps your project, consider giving it a star! ⭐
Author: Alejandro Rodríguez-Ramos [alejandrorodriguezramos.me]
Built with ❤️ by Computer Vision and Aerial Robotics (CVAR) for the open-source community
Report Bug • Request Feature • Discussions