Bootstrap Your Computer Vision Models Without Data or Annotations
Open-source synthetic data generation for object detection and segmentation
Quick Start • Pipelines • Examples • Documentation • Community
Training computer vision models requires thousands of labeled images. Gathering that data and annotating it by hand are expensive and time-consuming, and quickly become a bottleneck for rapid prototyping and experimentation.
OpenFabrik automatically generates unlimited synthetic training data with perfect annotations. No dataset collection, no manual labeling, no waiting weeks for annotators.
Just describe what you want to detect, and OpenFabrik generates fully annotated datasets ready for training YOLOv8, YOLOv10, or any modern detection/segmentation model.
both_pipelines_generation_video.mp4
| Date | Change |
|---|---|
| 2026-03-06 | Integrated SAM3 as the default annotator for the Scene Generation Pipeline: native multi-class support via Promptable Concept Segmentation (PCS), with a sequential per-class strategy for 24 GB VRAM compatibility. Use `--annotator grounded_sam2` to fall back to the original Grounding DINO + SAM2 pipeline. |
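The `--annotator` fallback above slots into the standard Scene Generation invocation. A minimal sketch, reusing the quick-start flags shown later in this README (the flag combination is assumed to be valid; verify against the script's `--help`):

```bash
# Fall back to the Grounding DINO + SAM2 annotator
# (all other flags as in the quick-start example below)
python pipelines/scene_generation_pipeline.py \
  --working_dir ./my_dataset \
  --session new \
  --run_prompts --run_images --run_annotations \
  --annotator grounded_sam2 \
  --cache_dir ./my_cache_dir
```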
Use cases:
- Rapidly prototype new architectures without waiting for dataset collection and annotation
- Bootstrap models for manufacturing QA, defect detection, and inventory management
- Generate multi-perspective datasets for robot manipulation and navigation tasks
| | OpenFabrik | 3D Rendering (Omniverse, BlenderProc, Kubric, Gazebo) | Diffusion Academic (GeoDiffusion, InstaGen, DatasetDM) | Commercial (Synthesis AI, Anyverse, Datagen) |
|---|---|---|---|---|
| Input | Text or 1 photo | 3D assets + scenes | Text + layouts | 3D assets (provided) |
| 3D Assets Required | No | Yes | No | Yes |
| Auto-Annotation | Open-set (any class) | From 3D scene | Partial / custom | Built-in |
| Output Format | YOLO (bbox, seg) | COCO, KITTI, custom | Custom | COCO, custom |
| Multi-View | Yes (6 views) | Yes | No | Yes |
| Runs Locally | Yes | Yes | Yes | No (cloud) |
| End-to-End | Yes | Yes | Partial | Yes |
| Open Source | Yes | Mixed | Yes | No |
Why OpenFabrik? No 3D assets needed, open-set detection for any object class, production-ready YOLO output, and fully local and open source, all in one end-to-end pipeline.
See detailed tool-by-tool comparison
3D Rendering Frameworks - require pre-built 3D assets and scenes: Omniverse Replicator | BlenderProc | Infinigen | Kubric | Unreal / Unity / Gazebo
Diffusion-Based Methods - academic research, typically generation-only or partial pipelines: GeoDiffusion (ICLR 2024) | DiffusionEngine (2024) | InstaGen (CVPR 2024) | DatasetDM (NeurIPS 2023) | X-Paste (ICML 2023)
Commercial Platforms - cloud-based, require enterprise pricing: Synthesis AI | Anyverse | Rendered.ai | Datagen
- 🎨 Zero Data Required - Generate datasets from text descriptions or reference images
- 🏷️ Zero Manual Labeling - Perfect annotations generated automatically
- 🔄 Unlimited Diversity - Generate infinite variations with different backgrounds, lighting, and perspectives
- 🎯 Open-Set Detection - Detect any object class via text prompts (powered by SAM3, with a Grounding-DINO + SAM2 fallback)
- 📐 Multi-View Generation - Create datasets from multiple viewing angles automatically
- ⚡ Production Ready - YOLO-compatible output, ROS2 integration, TensorRT export
- 🔓 Fully Open Source - Built on state-of-the-art open models (FLUX, SAM2, Qwen Multicamera, Zero123++)
Note: a CUDA-capable GPU with at least 24 GB of VRAM is required.
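Before installing, a quick sanity check with standard NVIDIA tooling (not an OpenFabrik command) confirms the GPU is visible and has enough memory:

```bash
# List the visible GPU and its total memory; expect >= 24 GB
nvidia-smi --query-gpu=name,memory.total --format=csv
```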
# Install Ollama (for LLM-based prompt generation)
curl -fsSL https://ollama.com/install.sh | sh
# Pull an LLM model
ollama pull cogito:latest
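# Optional sanity check: confirm the Ollama server responds and the model is present
ollama list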
# Conda env creation [optional]
conda create -n openfabrik python=3.10 -y
conda activate openfabrik
# Download all pipeline models into your cache_dir (up to 45GB)
pip install huggingface_hub
git clone https://github.com/cvar-vision-dl/OpenFabrik
cd OpenFabrik
python utilities/download_models.py --cache_dir ./my_cache_dir --all

# Activate the conda env [optional]
conda activate openfabrik
cd OpenFabrik
# Install dependencies
pip install -r requirements.txt
# Install SAM3
cd ..  # Do not clone SAM3 inside the OpenFabrik folder
git clone https://github.com/facebookresearch/sam3
cd sam3 && pip install -e . && cd ..
# Clone the Grounded-SAM-2 repository
cd OpenFabrik  # Grounded-SAM-2 HAS to be cloned inside the OpenFabrik folder
git clone https://github.com/alejodosr/Grounded-SAM-2
cd Grounded-SAM-2
pip install -e .
pip install --no-build-isolation -e grounding_dino
cd ..
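# Optional: verify the editable installs import cleanly (module names assumed
# from the upstream repos; adjust if they differ)
python -c "import groundingdino, sam2; print('Grounded-SAM-2 OK')"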
# Clone PerSam repository
cd OpenFabrik
git clone https://github.com/alejodosr/Personalize-SAM
cd Personalize-SAM
pip install -e .
cd ..

# Scene Generation Pipeline - multi-object detection dataset
python pipelines/scene_generation_pipeline.py \
--working_dir ./my_dataset \
--session new \
--run_prompts --run_images --run_annotations \
--project_info_file examples/prompts/kitchen_objects.txt \
--predefined_classes "cup,bottle,glass,plate,spoon,knife,fork,bowl" \
--num_prompts_per_execution 10 \
--num_random_imgs 2 \
--cache_dir ./my_cache_dir
# Output: YOLO-format dataset at ./my_dataset/YYYYMMDD/outputs/

✅ Done! Your dataset is ready to train with any YOLO model from Ultralytics:

yolo train data=./my_dataset/YYYYMMDD/outputs/dataset.yaml model=yolov8n.pt epochs=100

OpenFabrik provides two specialized pipelines for different use cases:
Best for: Multi-object detection, general-purpose datasets, rapid prototyping
How it works:
- 📝 LLM generates prompts - Describe your target objects and scenes
- 🎨 FLUX creates synthetic images - State-of-the-art diffusion model generates diverse scenes
- 🏷️ Auto-annotation - SAM3 (default) or Grounding-DINO + SAM2 detects objects and generates precise masks
Output: YOLO segmentation dataset with bounding boxes and masks
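As a sketch of what to expect, the emitted `dataset.yaml` follows the standard Ultralytics format (the layout below is assumed, and the class names come from `--predefined_classes`; verify against your actual output):

```bash
# Hypothetical peek at the generated dataset config (standard Ultralytics
# YOLO format assumed; paths and class ids may differ in your run)
cat ./my_dataset/YYYYMMDD/outputs/dataset.yaml
# train: images/train
# val: images/val
# names:
#   0: cup
#   1: bottle
```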
scene_generation_video.mp4
python pipelines/scene_generation_pipeline.py \
--working_dir ./my_dataset \
--session new \
--run_prompts --run_images --run_annotations \
--project_info_file examples/prompts/kitchen_objects.txt \
--predefined_classes "cup,bottle,glass,plate,spoon,knife,fork,bowl" \
--num_prompts_per_execution 10 \
--num_random_imgs 2 \
--cache_dir ./my_cache_dir

Key Features:
- Supports batch or iterative prompt generation
- Multiple executions with automatic result merging (see the sketch after this list)
- Configurable image variations per prompt
- Session persistence - resume from any step
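A minimal sketch of such a follow-up execution, assuming `--session last` resumes the most recent session in the working directory (as in the resume example later in this README):

```bash
# Second execution in the same working_dir; results are merged into the
# existing session (flags reused from the quick-start example)
python pipelines/scene_generation_pipeline.py \
  --working_dir ./my_dataset \
  --session last \
  --run_prompts --run_images --run_annotations \
  --num_prompts_per_execution 10 \
  --cache_dir ./my_cache_dir
```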
Best for: Custom objects, multi-view datasets, robotics manipulation
How it works:
- 📸 Start with one reference image - Upload a photo of your target object
- 🔄 Generate multiple perspectives - Qwen Multicamera (default) or Zero123++ creates multi-view representations
- 🌍 Augment contexts - Place the object in diverse environments and lighting
- 🏷️ Auto-annotation - PerSAM + SAM2 for reference-based segmentation
Output: YOLO segmentation dataset with multi-perspective annotations
ref_object_generation_video.mp4
This pipeline requires a reference image and a matching reference mask. If you don't have a mask yet, you can generate one with this utility:
# Generate mask for image
python utilities/sam_mask_labeler.py \
--image ./examples/pikachu_bag.jpg \
--cache_dir ./my_cache_dir \
--output_dir ./my_output_folder

If you already have a mask for the reference image, proceed directly to the pipeline:
python pipelines/reference_object_pipeline.py \
--input_image ./examples/pikachu_bag.jpg \
--input_mask ./examples/pikachu_bag_mask.png \
--project_info_file ./examples/prompts/project_pikachu.txt \
--object_name "pikachu bag" \
--num_prompts 10 \
--num_iterations 1 \
--working_dir ./datasets/my_product \
--enable_annotation \
--enable_qwen_augmentation \
--qwen_augmentation_count 2 \
--enable_cv_augmentation \
--cv_augmentation_count 2 \
--cache_dir ./my_cache_dir

Key Features:
- Multi-perspective 3D-aware generation
- Reference-based segmentation (no class labels needed)
- Decoupled augmentation strategies (see the sketch after this list):
- Generative augmentation: Change lighting, context, occlusions
- CV augmentation: Motion blur, compression, B/W, contrast
- Robust retry mechanisms with automatic server restart
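For example, a run with only the classic CV augmentations enabled might look like this. A sketch reusing flags from the example above; flags not shown are assumed to keep their defaults (verify against the script's `--help`):

```bash
# Reference pipeline with generative augmentation off and CV augmentation on
# (hypothetical variant; flag defaults are assumptions)
python pipelines/reference_object_pipeline.py \
  --input_image ./examples/pikachu_bag.jpg \
  --input_mask ./examples/pikachu_bag_mask.png \
  --object_name "pikachu bag" \
  --working_dir ./datasets/my_product \
  --enable_annotation \
  --enable_cv_augmentation \
  --cv_augmentation_count 4 \
  --cache_dir ./my_cache_dir
```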
| Feature | Scene Generation | Reference Object |
|---|---|---|
| Input | Text descriptions | Single reference image |
| Best For | Multi-object detection | Custom single-object |
| Perspectives | Single view | Multi-view (configurable) |
| Generation | FLUX only | FLUX + Qwen Multicamera / Zero123++ |
| Annotation | Grounded-SAM2 | PerSAM + SAM2 |
| Augmentation | None (diversity comes from generation itself) | Generative + CV |
| Use Case | General datasets | Robotics, specialized objects |
# Generate dataset for factory automation
python pipelines/scene_generation_pipeline.py \
--predefined_classes "bolt,nut,washer,screw,gear" \
--num_prompts_per_execution 100 \
--num_random_imgs 5 \
--working_dir ./datasets/industrial_parts \
--cache_dir ./my_cache_dir \
--session new --run_prompts --run_images --run_annotations

# Train model to recognize your specific product from all angles
python pipelines/reference_object_pipeline.py \
--input_image ./my_product_photo.jpg \
--input_mask ./my_product_mask.png \
--object_name "my_product" \
--working_dir ./datasets/product_detection \
--enable_annotation \
--enable_qwen_augmentation \
--enable_cv_augmentation \
--cache_dir ./my_cache_dir

# Resume previous run (annotations only)
python pipelines/scene_generation_pipeline.py \
--working_dir ./datasets/office_objects \
--session last \
--run_annotations \
--cache_dir ./my_cache_dir

OpenFabrik includes comprehensive utilities for the entire ML pipeline:
# Train model
python utilities/yolo_scripts/yolo_training.py \
--dataset ./datasets/my_dataset/dataset.yaml \
--model yolov8n-seg.pt \
--epochs 100
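# Optional: validate the trained weights and run a quick prediction
# (standard Ultralytics CLI; the runs/ path below follows the export
# examples in this section and depends on your actual training output)
yolo val data=./datasets/my_dataset/dataset.yaml model=./runs/train/weights/best.pt
yolo predict model=./runs/train/weights/best.pt source=./some_test_image.jpg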
# Export to ONNX
python utilities/yolo_scripts/yolo_export_onnx.py \
--model ./runs/train/weights/best.pt
# Export to TensorRT (for production deployment)
python utilities/yolo_scripts/yolo_export_tensorrt.py \
--model ./runs/train/weights/best.pt

# Get dataset statistics
python utilities/yolo_scripts/statistics_yolo_dataset.py \
--dataset ./datasets/my_dataset/dataset.yaml
# Split dataset into train/val
python utilities/yolo_scripts/yolo_split_dataset.py \
--dataset ./raw_dataset \
--output ./split_dataset \
--split 0.8

# Publish detections to ROS2 topics
python utilities/ros2_scripts/yolo_segmentation_publisher.py \
--model ./weights/best.pt \
--input-topic /camera/image_raw \
--output-topic /detections/segmentation \
--tensorrt  # Use TensorRT for faster inference

For high-performance C++ ROS2 YOLO inference, see yolo-ros2-inference.
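With the publisher running, the standard ROS2 CLI can confirm detections are flowing (assumes a sourced ROS2 environment and the topic names configured above):

```bash
# Check publish rate and inspect a single message on the output topic
ros2 topic hz /detections/segmentation
ros2 topic echo /detections/segmentation --once
```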
OpenFabrik follows a modular, pipeline-based architecture:
┌───────────────────────────────────────────────────────────┐
│                         PIPELINES                         │
├───────────────────────────────────────────────────────────┤
│      Scene Generation      │      Reference Object        │
│      (multi-object)        │ (single object, multi-view)  │
└───────────────────────────────────────────────────────────┘
                             │
            ┌────────────────┼────────────────┐
            │                │                │
    ┌──────────────┐   ┌─────────────┐   ┌──────────────┐
    │  Generation  │   │ Annotation  │   │ Augmentation │
    │   Modules    │   │   Modules   │   │   Modules    │
    ├──────────────┤   ├─────────────┤   ├──────────────┤
    │ • LLM        │   │ • Grounding │   │ • Generative │
    │   Prompts    │   │   DINO      │   │   (Qwen Edit)│
    │ • FLUX       │   │ • SAM2      │   │ • CV Augment │
    │ • Qwen       │   │ • PerSAM    │   │              │
    │   Multicam   │   │             │   │              │
    │ • Zero123++  │   │             │   │              │
    └──────────────┘   └─────────────┘   └──────────────┘
                             │
                ┌────────────▼────────────┐
                │   YOLO Dataset Output   │
                │  (ready for training)   │
                └─────────────────────────┘
OpenFabrik combines state-of-the-art models into automated pipelines:
- LLM Prompt Generation (Ollama)
  - Generates diverse scene descriptions
  - Supports iterative or batch mode
  - Maintains conversation context for diversity
- Image Synthesis (FLUX Diffusion)
  - State-of-the-art text-to-image generation
  - Configurable resolution and quality
  - Memory-optimized for batch processing
- Auto-Annotation (Grounding-DINO + SAM2)
  - Open-set object detection (any class via text)
  - Precise segmentation masks
  - Global class consistency across dataset
- White Background Generation (Qwen Edit)
  - Clean object isolation
  - Optimized for multi-view generation
- Multi-Perspective Generation (Qwen Multicamera / Zero123++)
  - Qwen Multicamera (default): generates configurable view/elevation/distance combinations using a LoRA adapter
  - Zero123++ (alternative): 6 viewing angles with 3D-aware transformations
  - Consistent object appearance across perspectives
- Context Generation (LLM + FLUX)
  - Places object in diverse environments
  - Multiple iterations per perspective
  - Configurable scene complexity
- Reference-Based Annotation (PerSAM + SAM2)
  - Segments object across all generated images
  - No class labels needed
  - High precision with reference guidance
- Augmentation (Generative + CV)
  - Generative: lighting, occlusions, weather
  - CV: motion blur, compression, B/W, contrast
If you use OpenFabrik in your research, please cite:
@misc{openfabrik2025,
author = {Rodriguez-Ramos, Alejandro and Campoy, Pascual},
title = {OpenFabrik: Bootstrap Your Computer Vision Models Without Data or Annotations},
howpublished = "\url{https://github.com/cvar-vision-dl/OpenFabrik}",
doi = {10.5281/zenodo.18669083},
year = {2025}
}

We welcome contributions! See CONTRIBUTING.md for guidelines.
OpenFabrik builds on these excellent projects:
- FLUX - Text-to-image diffusion
- Qwen Multicamera - Multi-view generation (default)
- Zero123++ - Multi-view generation (alternative)
- Grounding-DINO - Open-set detection
- SAM2 - Segmentation
- PerSAM - Reference segmentation
- Ultralytics YOLO - Object detection framework
- YOLO ROS2 Inference - Real-time ROS2 C++ YOLO inference
Special thanks to:
- The open-source computer vision community
- All contributors and users providing feedback
- @GPatiA2 for recommending Qwen Multicamera, which improved the quality of one of the pipelines
OpenFabrik is released under the MIT License. See LICENSE for details.
If OpenFabrik helps your project, consider giving it a star! ⭐
Author: Alejandro Rodríguez-Ramos [alejandrorodriguezramos.me]
Built with ❤️ by Computer Vision and Aerial Robotics (CVAR) for the open-source community
Report Bug • Request Feature • Discussions