Skip to content

kennedy-kitoko/yolov12-vs-yolov13-attention-study

Repository files navigation

🚀 YOLOv12 vs YOLOv13: SDPA vs Flash Attention Comprehensive Study

MIT License Python 3.11 PyTorch Flash Attention

A comprehensive empirical study comparing YOLOv12 and YOLOv13 architectures with SDPA and Flash Attention mechanisms for agricultural object detection.

Authors: Kennedy Kitoko 🇨🇩
Institution: BEIJING INSTITUT OF TECHNOLOGI
Date: 26 June 2025

🎯 Abstract

This study presents the first comprehensive comparison between YOLOv12 and YOLOv13 architectures using both Scaled Dot-Product Attention (SDPA) and Flash Attention mechanisms. Through rigorous experimentation on the Weeds-3 agricultural dataset, we demonstrate that YOLOv13 + SDPA achieves 82.9% mAP50, representing a 6.2% improvement over YOLOv12 baselines. Our findings validate the effectiveness of YOLOv13's novel HyperACE (Hypergraph-based Adaptive Correlation Enhancement) architecture for agricultural object detection tasks.

🔬 Key Findings

  • 🥇 Best Configuration: YOLOv13 + SDPA (82.9% mAP50, 73.5% recall)
  • ⚡ Flash Attention Advantage: 89.4% precision (highest among all configurations)
  • 🧠 HyperACE Impact: +6.2% mAP50 improvement through hypergraph correlations
  • 🎯 Agricultural Validation: State-of-the-art performance on weed detection

📊 Results Summary

Configuration mAP50 mAP50-95 Precision Recall Training Time
YOLOv13 + SDPA 82.9% 47.4% 78.0% 73.5% ~56 min
YOLOv13 + Flash 82.3% 52.3% 89.4% 68.4% 65.7 min
YOLOv12 + SDPA 76.7% 46.1% 81.6% 66.4% 55.3 min
YOLOv12 + Flash 76.5% 47.9% 83.1% 63.2% 67.3 min

🖼️ Exemples Visuels

Prédictions sur des images de validation

YOLOv12 + Flash YOLOv12 + SDPA YOLOv13 + Flash YOLOv13 + SDPA
YOLOv12 Flash YOLOv12 SDPA YOLOv13 Flash YOLOv13 SDPA

Comparaison mAP50

Comparaison mAP50

Courbes d'apprentissage

Courbes d'apprentissage

Comparaison des mécanismes d'attention

Comparaison Attention

Analyse de la mémoire

Analyse mémoire

🏗️ Repository Structure

📦 yolov12-vs-yolov13-attention-study/
├── 📄 README.md                    # This file
├── 📄 LICENSE                      # MIT License
├── 📄 environment.yml              # Conda environment
├── 📄 requirements.txt             # pip requirements
├── 📄 paper.pdf                    # Research paper (LaTeX compiled)
│
├── 📁 src/                         # Source code
│   ├── 📄 comprehensive_yolo_experiments.py  # Main experiment script
│   ├── 📄 data_analysis.py                   # Results analysis
│   ├── 📄 visualization.py                   # Plot generation
│   └── 📄 reproduce_experiments.py           # Reproduction script
│
├── 📁 data/                        # Experimental data
│   ├── 📁 raw_results/            # Raw experiment outputs
│   │   ├── 📁 session_20250626_194521/    # First session (YOLOv12)
│   │   └── 📁 session_20250627_012822/    # Second session (YOLOv13)
│   ├── 📁 processed/               # Processed CSV files
│   │   ├── 📄 results_yolov12_sdpa.csv
│   │   ├── 📄 results_yolov12_flash.csv
│   │   ├── 📄 results_yolov13_sdpa.csv
│   │   └── 📄 results_yolov13_flash.csv
│   └── 📄 consolidated_results.json         # All results combined
│
├── 📁 figures/                     # Scientific visualizations
│   ├── 📄 mAP_comparison.png              # mAP50 comparison chart
│   ├── 📄 training_curves.png             # Loss/epoch curves
│   ├── 📄 attention_comparison.png        # SDPA vs Flash comparison
│   ├── 📄 memory_usage.png               # Memory efficiency analysis
│   └── 📄 architecture_diagram.png        # YOLOv12 vs YOLOv13 comparison
│
├── 📁 notebooks/                   # Jupyter analysis notebooks
│   ├── 📄 01_data_exploration.ipynb      # Dataset analysis
│   ├── 📄 02_results_analysis.ipynb      # Statistical analysis
│   └── 📄 03_visualization.ipynb         # Plot generation
│
├── 📁 experiments/                 # Experiment configurations
│   ├── 📄 hardware_specs.json           # Hardware configuration
│   ├── 📄 experiment_config.yaml        # Training parameters
│   └── 📄 reproduction_guide.md         # Step-by-step reproduction
│
├── 📁 docs/                        # Documentation
│   ├── 📄 methodology.md               # Experimental methodology
│   ├── 📄 results_interpretation.md    # Results discussion
│   └── 📄 future_work.md               # Future research directions
│
└── 📁 paper/                       # LaTeX paper source
    ├── 📄 paper.tex                    # Main LaTeX file
    ├── 📄 references.bib               # Bibliography
    └── 📁 figures/                     # Paper figures

🔧 Hardware Specifications

Experimental Setup:

  • CPU: AMD Ryzen 9 7945HX (12 cores)
  • GPU: NVIDIA GeForce RTX 4060 Laptop GPU (8188 MiB)
  • RAM: 39 GB available
  • OS: Linux (WSL2)
  • Driver: NVIDIA 576.57
  • CUDA: 11.8

Software Environment:

  • Python: 3.11.0
  • PyTorch: 2.2.2+cu118
  • Flash Attention: 2.7.3
  • Ultralytics: 8.3.63

🚀 Quick Start

1. Environment Setup

# Clone repository
git clone https://github.com/kennedy-kitoko/yolov12-vs-yolov13-attention-study.git
cd yolov12-vs-yolov13-attention-study

# Create conda environment
conda env create -f environment.yml
conda activate flash-attention

# Or use pip
pip install -r requirements.txt

2. Run Experiments

# Full comparison (4 experiments)
python src/comprehensive_yolo_experiments.py

# Quick validation (2 experiments)
python src/comprehensive_yolo_experiments.py --quick

# Reproduce specific configuration
python src/reproduce_experiments.py --config yolov13_sdpa

3. Generate Visualizations

# Create all plots
python src/visualization.py

# Launch Jupyter analysis
jupyter notebook notebooks/02_results_analysis.ipynb

📈 Methodology

Experimental Design

Dataset: Weeds-3 Agricultural Object Detection

  • Training Images: 3,664
  • Validation Images: 359
  • Classes: Weed detection in agricultural settings

Training Configuration:

  • Epochs: 20 (development), 100 (full training)
  • Batch Size: 8 (optimized for RTX 4060)
  • Image Size: 640×640
  • Optimizer: AdamW
  • Learning Rate: 0.001 (cosine decay)

Attention Mechanisms:

  • SDPA: PyTorch native Scaled Dot-Product Attention
  • Flash Attention: Memory-efficient attention with IO optimization

Evaluation Metrics

  • mAP50: Mean Average Precision at IoU=0.5
  • mAP50-95: Mean Average Precision at IoU=0.5:0.95
  • Precision: True Positives / (True Positives + False Positives)
  • Recall: True Positives / (True Positives + False Negatives)
  • Training Time: Wall-clock training duration
  • Memory Usage: GPU and CPU memory consumption

🧠 Key Innovations Validated

YOLOv13 Architecture Advances

  1. HyperACE (Hypergraph-based Adaptive Correlation Enhancement)

    • Captures high-order correlations between pixels
    • Adapts to complex agricultural scenarios
    • Result: +6.2% mAP50 improvement
  2. FullPAD (Full-Pipeline Aggregation-and-Distribution)

    • Optimizes information flow across backbone→neck→head
    • Enhances gradient propagation
    • Result: Superior convergence and stability
  3. DS-Blocks (Depthwise Separable Convolutions)

    • Maintains performance while reducing parameters
    • Efficient computation for deployment
    • Result: 2.4M parameters vs traditional approaches

Attention Mechanism Insights

SDPA Advantages:

  • ✅ Superior mAP50 performance
  • ✅ Better recall (fewer missed detections)
  • ✅ Faster training convergence
  • ✅ Native PyTorch optimization

Flash Attention Advantages:

  • ✅ Highest precision (89.4%)
  • ✅ Superior mAP50-95 performance
  • ✅ Memory efficient (59% less CPU RAM)
  • ✅ Better handling of high IoU thresholds

📊 Detailed Results

Performance Comparison

Our experiments reveal significant architectural improvements in YOLOv13:

YOLOv13 vs YOLOv12 Average Improvement:

  • mAP50: +6.0% (82.6% vs 76.6%)
  • mAP50-95: +2.9% (49.9% vs 47.0%)
  • Precision: +1.3% (83.7% vs 82.4%)
  • Recall: +6.2% (71.0% vs 64.8%)

Training Dynamics

Convergence Analysis:

  • YOLOv13 achieves faster initial convergence
  • Both architectures plateau around epoch 17-18
  • HyperACE shows superior final performance
  • Flash Attention demonstrates consistent precision gains

Memory Efficiency

GPU Memory Usage:

  • YOLOv12: ~3.0 GB
  • YOLOv13: ~4.2 GB (+40% due to hypergraph computations)
  • Flash Attention: More efficient GPU utilization
  • SDPA: Higher CPU memory usage but stable

🔮 Applications & Impact

Agricultural AI Deployment

Production Readiness:

  • 82.9% mAP50 exceeds agricultural industry standards
  • Real-time inference capability (5.7ms per image)
  • Deployable on edge devices (RTX 4060 compatibility)

Use Cases:

  • Precision agriculture weed detection
  • Automated crop monitoring systems
  • Agricultural robotics guidance
  • Yield optimization through early intervention

Scientific Contributions

  1. First comprehensive YOLOv12 vs YOLOv13 comparison
  2. Validation of hypergraph attention mechanisms
  3. SDPA vs Flash Attention empirical analysis
  4. Agricultural domain validation with real metrics

📚 Citation

If you use this work in your research, please cite:

@article{kitoko2025yolo_attention_study,
  title={YOLOv12 vs YOLOv13: SDPA vs Flash Attention Comprehensive Study for Agricultural Object Detection},
  author={Kitoko, Kennedy},
  journal={arXiv preprint arXiv:2506.XXXXX},
  year={2025},
  institution={Agricultural AI Innovation Lab},
  note={Available at: https://github.com/kennedy-kitoko/yolov12-vs-yolov13-attention-study}
}

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Areas for Contribution:

  • Extended dataset validation
  • Additional YOLO architecture comparisons
  • Deployment optimization studies
  • Real-world agricultural validation

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Ultralytics Team for the excellent YOLO implementations
  • Dao-AILab for Flash Attention development
  • PyTorch Team for SDPA native implementation
  • Agricultural AI Community for domain expertise and validation

📞 Contact

Kennedy Kitoko 🇨🇩
Agricultural AI Innovation Lab
📧 Email: kennedy.kitoko@agricultural-ai.org
🔗 LinkedIn: Kennedy Kitoko
🐦 Twitter: @KennedyKitoko


"Democratizing AI for Global Agriculture" 🌍🌱

This research contributes to making advanced agricultural AI accessible worldwide, with particular focus on developing nations and smallholder farmers.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published