
🏭 OpenFabrik

Bootstrap Your Computer Vision Models Without Data or Annotations

Open-source synthetic data generation for object detection and segmentation

License: MIT Python 3.8+ PRs Welcome

Quick Start • Pipelines • Examples • Documentation • Community


🎯 The Problem

Training computer vision models requires thousands of labeled images. Data gathering and manual annotation are expensive and time-consuming, and they quickly become a bottleneck for rapid prototyping and experimentation.

✨ The Solution

OpenFabrik automatically generates unlimited synthetic training data with perfect annotations. No dataset collection, no manual labeling, no waiting weeks for annotators.

Just describe what you want to detect, and OpenFabrik generates fully annotated datasets ready for training YOLOv8, YOLOv10, or any modern detection/segmentation model.

*(Video: both_pipelines_generation_video.mp4)*

📋 Updates

| Date | Change |
|------|--------|
| 2026-03-06 | Integrated SAM3 as the default annotator for the Scene Generation Pipeline: native multi-class support via Promptable Concept Segmentation (PCS), with a sequential per-class strategy for 24 GB VRAM compatibility. Use `--annotator grounded_sam2` to fall back to the original Grounding DINO + SAM2 pipeline. |

📋 Use Cases

🔬 ML Research

Rapidly prototype new architectures without waiting for dataset collection and annotation

🏭 Industrial Vision

Bootstrap models for manufacturing QA, defect detection, and inventory management

🤖 Robotics

Generate multi-perspective datasets for robot manipulation and navigation tasks

How Does OpenFabrik Compare?

| | OpenFabrik | 3D Rendering<br>(Omniverse, BlenderProc, Kubric, Gazebo) | Diffusion Academic<br>(GeoDiffusion, InstaGen, DatasetDM) | Commercial<br>(Synthesis AI, Anyverse, Datagen) |
|---|---|---|---|---|
| Input | Text or 1 photo | 3D assets + scenes | Text + layouts | 3D assets (provided) |
| 3D Assets Required | No | Yes | No | Yes |
| Auto-Annotation | Open-set (any class) | From 3D scene | Partial / custom | Built-in |
| Output Format | YOLO (bbox, seg) | COCO, KITTI, custom | Custom | COCO, custom |
| Multi-View | Yes (6 views) | Yes | No | Yes |
| Runs Locally | Yes | Yes | Yes | No (cloud) |
| End-to-End | Yes | Yes | Partial | Yes |
| Open Source | Yes | Mixed | Yes | No |

Why OpenFabrik? No 3D assets needed, open-set detection for any object class, production-ready YOLO output, and fully local and open-source, all in one end-to-end pipeline.

See detailed tool-by-tool comparison

3D Rendering Frameworks (require pre-built 3D assets and scenes): Omniverse Replicator | BlenderProc | Infinigen | Kubric | Unreal / Unity / Gazebo

Diffusion-Based Methods (academic research, typically generation-only or partial pipelines): GeoDiffusion (ICLR 2024) | DiffusionEngine (2024) | InstaGen (CVPR 2024) | DatasetDM (NeurIPS 2023) | X-Paste (ICML 2023)

Commercial Platforms (cloud-based, enterprise pricing): Synthesis AI | Anyverse | Rendered.ai | Datagen


🚀 Key Features

  • 🎨 Zero Data Required - Generate datasets from text descriptions or reference images
  • 🏷️ Zero Manual Labeling - Perfect annotations generated automatically
  • 🔄 Unlimited Diversity - Generate infinite variations with different backgrounds, lighting, and perspectives
  • 🎯 Open-Set Detection - Detect any object class via text prompts (powered by Grounding-DINO + SAM2)
  • 📐 Multi-View Generation - Create datasets from multiple viewing angles automatically
  • ⚡ Production Ready - YOLO-compatible output, ROS2 integration, TensorRT export
  • 🌍 Fully Open Source - Built on state-of-the-art open models (FLUX, SAM2, Qwen Multicamera, Zero123++)

⚡ Quick Start

Prerequisites

Note: A GPU with CUDA support and at least 24 GB of VRAM is required.

# Install Ollama (for LLM-based prompt generation)
curl -fsSL https://ollama.com/install.sh | sh

# Pull an LLM model
ollama pull cogito:latest

# Conda env creation [optional]
conda create -n openfabrik python=3.10 -y
conda activate openfabrik

# Download all pipeline models into your cache_dir (up to 45GB)
pip install huggingface_hub
git clone https://github.com/cvar-vision-dl/OpenFabrik
cd OpenFabrik
python utilities/download_models.py --cache_dir ./my_cache_dir --all

Installation

# Activate the conda env [optional]
conda activate openfabrik
cd OpenFabrik

# Install dependencies
pip install -r requirements.txt

# Install SAM3
cd ..  # do not clone SAM3 inside the OpenFabrik folder
git clone https://github.com/facebookresearch/sam3
cd sam3 && pip install -e . && cd ..

# Clone the Grounded-SAM-2 repository (must be cloned inside the OpenFabrik folder)
cd OpenFabrik
git clone https://github.com/alejodosr/Grounded-SAM-2  
cd Grounded-SAM-2
pip install -e .
pip install --no-build-isolation -e grounding_dino
cd ..

# Clone PerSam repository
cd OpenFabrik
git clone https://github.com/alejodosr/Personalize-SAM
cd Personalize-SAM
pip install -e .
cd ..

Generate Your First Dataset (3 minutes)

# Scene Generation Pipeline - multi-object detection dataset
python pipelines/scene_generation_pipeline.py \
  --working_dir ./my_dataset \
  --session new \
  --run_prompts --run_images --run_annotations \
  --project_info_file examples/prompts/kitchen_objects.txt \
  --predefined_classes "cup,bottle,glass,plate,spoon,knife,fork,bowl" \
  --num_prompts_per_execution 10 \
  --num_random_imgs 2 \
  --cache_dir ./my_cache_dir

# Output: YOLO-format dataset at ./my_dataset/YYYYMMDD/outputs/

✅ Done! Your dataset is ready to train with any Ultralytics YOLO model:

yolo train data=./my_dataset/YYYYMMDD/outputs/dataset.yaml model=yolov8n.pt epochs=100

πŸ—οΈ Pipelines

OpenFabrik provides two specialized pipelines for different use cases:

1️⃣ Scene Generation Pipeline

Best for: Multi-object detection, general-purpose datasets, rapid prototyping

How it works:

  1. πŸ“ LLM generates prompts - Describe your target objects and scenes
  2. 🎨 FLUX creates synthetic images - State-of-the-art diffusion model generates diverse scenes
  3. 🏷️ Auto-annotation - Grounding-DINO detects objects, SAM2 generates precise masks

Output: YOLO segmentation dataset with bounding boxes and masks
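
Each label file in that output uses the standard YOLO segmentation format: one line per instance, a class index followed by normalized polygon coordinates. As a minimal illustration (the helper and values below are hypothetical, not code from this repo), such a line can be parsed and reduced to a bounding box like this:

```python
def parse_yolo_seg_line(line):
    """Parse one YOLO-seg label line: 'cls x1 y1 x2 y2 ...' (normalized to [0, 1])."""
    parts = line.split()
    cls = int(parts[0])
    coords = list(map(float, parts[1:]))
    xs, ys = coords[0::2], coords[1::2]
    # Derive an axis-aligned bbox (x_center, y_center, w, h) from the polygon
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    bbox = ((x_min + x_max) / 2, (y_min + y_max) / 2, x_max - x_min, y_max - y_min)
    return cls, list(zip(xs, ys)), bbox

cls, polygon, bbox = parse_yolo_seg_line("3 0.1 0.2 0.5 0.2 0.5 0.6 0.1 0.6")
```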

*(Video: scene_generation_video.mp4)*

python pipelines/scene_generation_pipeline.py \
  --working_dir ./my_dataset \
  --session new \
  --run_prompts --run_images --run_annotations \
  --project_info_file examples/prompts/kitchen_objects.txt \
  --predefined_classes "cup,bottle,glass,plate,spoon,knife,fork,bowl" \
  --num_prompts_per_execution 10 \
  --num_random_imgs 2 \
  --cache_dir ./my_cache_dir

Key Features:

  • Supports batch or iterative prompt generation
  • Multiple executions with automatic result merging
  • Configurable image variations per prompt
  • Session persistence - resume from any step
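
Session persistence boils down to recording which steps have finished so a later run can skip them. A minimal sketch of the idea (the file name and layout here are illustrative, not OpenFabrik's actual session format):

```python
import json
from pathlib import Path

STEPS = ("prompts", "images", "annotations")

def load_session(session_file):
    """Read which steps finished in a previous run (all pending on a fresh run)."""
    path = Path(session_file)
    if path.exists():
        return json.loads(path.read_text())
    return {step: False for step in STEPS}

def mark_done(session_file, step, state):
    """Persist completion of one step so a resumed run can skip it."""
    state[step] = True
    Path(session_file).write_text(json.dumps(state))

def next_step(state):
    """First unfinished step, or None when the session is complete."""
    return next((s for s in STEPS if not state[s]), None)
```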

2️⃣ Reference Object Pipeline

Best for: Custom objects, multi-view datasets, robotics manipulation

How it works:

  1. 📸 Start with one reference image - Upload a photo of your target object
  2. 🔄 Generate multiple perspectives - Qwen Multicamera (default) or Zero123++ creates multi-view representations
  3. 🌍 Augment contexts - Place object in diverse environments and lighting
  4. 🏷️ Auto-annotation - PerSAM + SAM2 for reference-based segmentation

Output: YOLO segmentation dataset with multi-perspective annotations

*(Video: ref_object_generation_video.mp4)*

For this pipeline, you need a reference image and a reference mask for that image. If you don't have a reference mask, you can generate it with this utility:

# Generate a mask for the reference image
python utilities/sam_mask_labeler.py \
  --image ./examples/pikachu_bag.jpg \
  --cache_dir ./my_cache_dir \
  --output_dir ./my_output_folder

If you already have a mask for the reference image, proceed directly to the pipeline:

python pipelines/reference_object_pipeline.py \
  --input_image ./examples/pikachu_bag.jpg \
  --input_mask ./examples/pikachu_bag_mask.png \
  --project_info_file ./examples/prompts/project_pikachu.txt \
  --object_name "pikachu bag" \
  --num_prompts 10 \
  --num_iterations 1 \
  --working_dir ./datasets/my_product \
  --enable_annotation \
  --enable_qwen_augmentation \
  --qwen_augmentation_count 2 \
  --enable_cv_augmentation \
  --cv_augmentation_count 2 \
  --cache_dir ./my_cache_dir

Key Features:

  • Multi-perspective 3D-aware generation
  • Reference-based segmentation (no class labels needed)
  • Decoupled augmentation strategies:
    • Generative augmentation: Change lighting, context, occlusions
    • CV augmentation: Motion blur, compression, B/W, contrast
  • Robust retry mechanisms with automatic server restart
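
The retry behaviour can be pictured as a small wrapper: run a step, and on failure restart the backing server before trying again with backoff. This is only a sketch of the pattern (function names are illustrative, not the pipeline's actual API):

```python
import time

def run_with_retries(step, restart_server, max_attempts=3, backoff_s=0.0):
    """Call step(); on failure, restart the server and retry with linear backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            restart_server()
            time.sleep(backoff_s * attempt)

# Example: a step that fails twice before succeeding
calls = {"step": 0, "restarts": 0}

def flaky_step():
    calls["step"] += 1
    if calls["step"] < 3:
        raise RuntimeError("generation server hung")
    return "ok"

result = run_with_retries(flaky_step, lambda: calls.__setitem__("restarts", calls["restarts"] + 1))
```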

📊 Pipeline Comparison

| Feature | Scene Generation | Reference Object |
|---------|------------------|------------------|
| Input | Text descriptions | Single reference image |
| Best For | Multi-object detection | Custom single-object |
| Perspectives | Single view | Multi-view (configurable) |
| Generation | FLUX only | FLUX + Qwen Multicamera / Zero123++ |
| Annotation | Grounded-SAM2 | PerSAM + SAM2 |
| Augmentation | None (built into generation) | Generative + CV |
| Use Case | General datasets | Robotics, specialized objects |

🎬 Examples

Example 1: Industrial Parts Detection

# Generate dataset for factory automation
python pipelines/scene_generation_pipeline.py \
  --predefined_classes "bolt,nut,washer,screw,gear" \
  --num_prompts_per_execution 100 \
  --num_random_imgs 5 \
  --working_dir ./datasets/industrial_parts \
  --cache_dir ./my_cache_dir \
  --session new --run_prompts --run_images --run_annotations

Example 2: Custom Product Recognition

# Train model to recognize your specific product from all angles
python pipelines/reference_object_pipeline.py \
  --input_image ./my_product_photo.jpg \
  --input_mask ./my_product_mask.png \
  --object_name "my_product" \
  --working_dir ./datasets/product_detection \
  --enable_annotation \
  --enable_qwen_augmentation \
  --enable_cv_augmentation \
  --cache_dir ./my_cache_dir

Example 3: Resume from Last Session

# Resume previous run (annotations only)
python pipelines/scene_generation_pipeline.py \
  --working_dir ./datasets/office_objects \
  --session last \
  --run_annotations \
  --cache_dir ./my_cache_dir

🛠️ Utilities

OpenFabrik includes comprehensive utilities for the entire ML pipeline:

YOLO Training & Export

# Train model
python utilities/yolo_scripts/yolo_training.py \
  --dataset ./datasets/my_dataset/dataset.yaml \
  --model yolov8n-seg.pt \
  --epochs 100

# Export to ONNX
python utilities/yolo_scripts/yolo_export_onnx.py \
  --model ./runs/train/weights/best.pt

# Export to TensorRT (for production deployment)
python utilities/yolo_scripts/yolo_export_tensorrt.py \
  --model ./runs/train/weights/best.pt

Dataset Management

# Get dataset statistics
python utilities/yolo_scripts/statistics_yolo_dataset.py \
  --dataset ./datasets/my_dataset/dataset.yaml

# Split dataset into train/val
python utilities/yolo_scripts/yolo_split_dataset.py \
  --dataset ./raw_dataset \
  --output ./split_dataset \
  --split 0.8
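
A split like this boils down to shuffling the image list with a fixed seed and cutting at the ratio. A minimal sketch of that logic (independent of the script's actual implementation):

```python
import random

def split_names(image_names, train_ratio=0.8, seed=42):
    """Deterministically split file names into train/val lists."""
    names = sorted(image_names)          # sort first so the split is reproducible
    random.Random(seed).shuffle(names)   # fixed seed -> same split on every run
    cut = int(len(names) * train_ratio)
    return names[:cut], names[cut:]

train, val = split_names([f"img_{i:03d}.jpg" for i in range(10)], train_ratio=0.8)
```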

ROS2 Real-Time Inference

# Publish detections to ROS2 topics
python utilities/ros2_scripts/yolo_segmentation_publisher.py \
  --model ./weights/best.pt \
  --input-topic /camera/image_raw \
  --output-topic /detections/segmentation \
  --tensorrt  # Use TensorRT for faster inference

For high-performance C++ ROS2 YOLO inference, see yolo-ros2-inference.


🧠 Architecture

OpenFabrik follows a modular, pipeline-based architecture:

┌───────────────────────────────────────────────────┐
│                     PIPELINES                     │
├────────────────────┬──────────────────────────────┤
│  Scene Generation  │  Reference Object            │
│  (multi-object)    │  (single object, multi-view) │
└────────────────────┴──────────────────────────────┘
                        │
       ┌────────────────┼─────────────────┐
       │                │                 │
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│  Generation  │ │  Annotation  │ │ Augmentation │
│   Modules    │ │   Modules    │ │   Modules    │
├──────────────┤ ├──────────────┤ ├──────────────┤
│ • LLM        │ │ • Grounding  │ │ • Generative │
│   Prompts    │ │   DINO       │ │   (Qwen Edit)│
│ • FLUX       │ │ • SAM2       │ │ • CV Augment │
│ • Qwen       │ │ • PerSAM     │ │              │
│   Multicam   │ │              │ │              │
│ • Zero123++  │ │              │ │              │
└──────────────┘ └──────────────┘ └──────────────┘
                        │
           ┌────────────▼────────────┐
           │   YOLO Dataset Output   │
           │  (ready for training)   │
           └─────────────────────────┘

📚 Documentation


🎓 How It Works

OpenFabrik combines state-of-the-art models into automated pipelines:

Scene Generation Pipeline

  1. LLM Prompt Generation (Ollama)

    • Generates diverse scene descriptions
    • Supports iterative or batch mode
    • Maintains conversation context for diversity
  2. Image Synthesis (FLUX Diffusion)

    • State-of-the-art text-to-image generation
    • Configurable resolution and quality
    • Memory-optimized for batch processing
  3. Auto-Annotation (Grounding-DINO + SAM2)

    • Open-set object detection (any class via text)
    • Precise segmentation masks
    • Global class consistency across dataset
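
Global class consistency means every image's detections are mapped onto one shared class-index table, so "cup" gets the same index in every label file. The idea can be sketched as (a hypothetical helper, not the pipeline's code):

```python
def build_global_class_map(per_image_classes):
    """Assign one stable index per class name, in first-seen order across the dataset."""
    class_map = {}
    for classes in per_image_classes:
        for name in classes:
            class_map.setdefault(name, len(class_map))
    return class_map

class_map = build_global_class_map([["cup", "bottle"], ["bottle", "plate"]])
```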

Reference Object Pipeline

  1. White Background Generation (Qwen Edit)

    • Clean object isolation
    • Optimized for multi-view generation
  2. Multi-Perspective Generation (Qwen Multicamera / Zero123++)

    • Qwen Multicamera (default): generates configurable view/elevation/distance combinations using a LoRA adapter
    • Zero123++ (alternative): 6 viewing angles with 3D-aware transformations
    • Consistent object appearance across perspectives
  3. Context Generation (LLM + FLUX)

    • Places object in diverse environments
    • Multiple iterations per perspective
    • Configurable scene complexity
  4. Reference-Based Annotation (PerSAM + SAM2)

    • Segments object across all generated images
    • No class labels needed
    • High precision with reference guidance
  5. Augmentation (Generative + CV)

    • Generative: Lighting, occlusions, weather
    • CV: Motion blur, compression, B/W, contrast
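
The classical CV augmentations are cheap pixel-level transforms. As an illustration only (a NumPy sketch, not the pipeline's actual augmentation code), B/W conversion and contrast adjustment can look like:

```python
import numpy as np

def to_grayscale(img):
    """Luminance-weighted B/W conversion for an HxWx3 image with values in [0, 1]."""
    gray = img @ np.array([0.299, 0.587, 0.114])
    return np.repeat(gray[..., None], 3, axis=-1)

def adjust_contrast(img, factor):
    """Scale pixel values away from the mean; factor > 1 increases contrast."""
    mean = img.mean()
    return np.clip(mean + (img - mean) * factor, 0.0, 1.0)

img = np.zeros((2, 2, 3))
img[..., 0] = 1.0  # pure red test image
bw = to_grayscale(img)
```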

πŸ“ Citation

If you use OpenFabrik in your research, please cite:

@misc{openfabrik2025,
    author = {Rodriguez-Ramos, Alejandro and Campoy, Pascual},
    title = {OpenFabrik: Bootstrap Your Computer Vision Models Without Data or Annotations},
    howpublished = {\url{https://github.com/cvar-vision-dl/OpenFabrik}},
    doi = {10.5281/zenodo.18669083},
    year = {2025}
}

🤝 Community

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Related Work

OpenFabrik builds on these excellent projects:

Acknowledgments

Special thanks to:

  • The open-source computer vision community
  • All contributors and users providing feedback
  • Thanks to @GPatiA2 for recommending Qwen Multicamera, which upgraded the quality of one of the pipelines

📄 License

OpenFabrik is released under the MIT License. See LICENSE for details.


🌟 Star History

If OpenFabrik helps your project, consider giving it a star! ⭐


Author: Alejandro Rodríguez-Ramos [alejandrorodriguezramos.me]

Built with ❤️ by Computer Vision and Aerial Robotics (CVAR) for the open-source community

Report Bug • Request Feature • Discussions
