Commit 8d50d69: Merge pull request #25 from StabRise/yolo_blog_post ("Added blog post")
---
title: 'Benchmarking YOLO Models on Spark Using ScaleDP'
date: '2025-11-19'
tags: ['spark', 'object detection', 'benchmarking', 'ScaleDP', 'GPU']
draft: false
project: 'scaledp'
authors: ['nmelnik']
displayImage: /static/images/blog/scaledp/yolo/yolo-scaledp-benchmarking.png
summary: 'Performance benchmarking of YOLO inference on Spark using ScaleDP with CPU and GPU acceleration.'
keywords: ['ScaleDP', 'YOLO', 'Benchmarking', 'Performance']
---

When processing large-scale document datasets with object detection, understanding performance characteristics is critical for production deployments. In my previous post, I demonstrated how to run YOLO models on Apache Spark using ScaleDP. Now, I want to share comprehensive benchmarking results showing how ScaleDP's YoloOnnxDetector performs under different configurations.

---

## Introduction

Performance optimization is key when processing millions of documents. The choice between CPU and GPU, partition size, and reader configuration all affect throughput. In this post, I'll share detailed benchmarks from running the YOLO11 Nano model on a test dataset of 1,000 PDF pages with both CPU and GPU acceleration.

## Test Environment

My test setup included:

- **CPU:** 13th Gen Intel(R) Core(TM) i9-13980HX (32 vCores)
- **GPU:** NVIDIA GeForce RTX 4090 Laptop
- **Model:** YOLO11 Nano (10.2 MB, ONNX format)
- **Dataset:** 1,000 PDF pages from document samples
- **Framework:** Apache Spark with ScaleDP

## Benchmark Methodology

I tested three key scenarios:

1. **End-to-End Pipeline:** PDF reading + image rendering + object detection
2. **Cached Images:** images pre-cached to isolate detection performance
3. **Detection Only:** pure YOLO inference performance

I varied the following parameters:

- **Pages per Partition:** 20, 50, and 100
- **Execution Device:** CPU and GPU (CUDA)
- **PDF Reader:** PdfBox and Ghostscript

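All of the end-to-end timings boil down to wall-clocking a Spark action. The harness below is an illustrative sketch of that measurement (the `benchmark` helper and the stand-in workload are my own, not from the ScaleDP notebook); on Spark you would pass a real action such as `lambda: results.select("boxes").count()`.

```python
import time

def benchmark(action, pages, repeats=3):
    """Time `action` over several runs; return the best total and per-page latency."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()  # on Spark: a real action, e.g. lambda: results.count()
        timings.append(time.perf_counter() - start)
    total_s = min(timings)
    return {"total_s": total_s, "per_page_ms": total_s * 1000 / pages}

# Stand-in CPU workload so the sketch runs anywhere
stats = benchmark(lambda: sum(x * x for x in range(200_000)), pages=1000)
print(f"{stats['total_s']:.2f}s total, {stats['per_page_ms']:.2f} ms/page")
```

Taking the best of several runs reduces the impact of JIT warm-up and first-run caching effects.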
## Results

Here are the detailed benchmark results for processing 1,000 pages:

| Pages per Partition | Device | Reader | Time (s) | Per Page (ms) | Notes          |
|---------------------|--------|--------|----------|---------------|----------------|
| 100                 | CPU    | PdfBox | 93       | 93            |                |
| 50                  | CPU    | PdfBox | 77       | 77            |                |
| 20                  | CPU    | GS     | 56       | 56            | Detection only |
| 20                  | GPU    | GS     | 27.2     | 27.2          |                |
| 20                  | GPU    | GS     | 14.7     | 14.7          | Detection only |

## Key Findings

### 1. Partition Size Impact

Smaller partitions (20 pages) perform better than larger ones (100 pages), suggesting that optimal parallelism is achieved with finer-grained partitions:

- **100 pages:** 93ms per page
- **50 pages:** 77ms per page
- **20 pages:** 56ms per page

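One plausible explanation is core utilization. Assuming one Spark task per partition and one core per task, the partition counts work out as follows (a sketch using the dataset size and core count from the test environment):

```python
import math

total_pages, vcores = 1000, 32  # dataset size and CPU cores from the test environment

for pages_per_partition in (100, 50, 20):
    partitions = math.ceil(total_pages / pages_per_partition)
    busy_cores = min(partitions, vcores)
    print(f"{pages_per_partition:>3} pages/partition -> "
          f"{partitions:>2} partitions, at most {busy_cores}/{vcores} cores busy")
```

With 100 pages per partition there are only 10 partitions, so at most 10 of the 32 vCores can work at once; at 20 pages per partition all cores stay busy.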
### 2. PDF Reader Performance

The Ghostscript (GS) reader outperforms PdfBox:

- **PdfBox (50 pages):** 77ms per page
- **Ghostscript (20 pages):** 56ms per page

This is a **27% improvement**, though note that the partition size also differs between these two rows, so part of the gain comes from finer partitioning.

### 3. GPU Acceleration

GPU acceleration provides a significant speedup over the CPU:

- **CPU (20 pages):** 56ms per page
- **GPU (20 pages, full pipeline):** 27.2ms per page
- **GPU (20 pages, detection only):** 14.7ms per page

This represents a **51.4% improvement** with the GPU for the full pipeline, and a **73.8% improvement** for detection only.

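These percentages are simply relative reductions in per-page latency. As a quick sanity check (the helper function is illustrative, not part of ScaleDP):

```python
def improvement(baseline_ms, new_ms):
    """Percentage reduction in per-page latency relative to a baseline."""
    return (baseline_ms - new_ms) / baseline_ms * 100

print(f"full pipeline:  {improvement(56, 27.2):.1f}%")  # ~51.4%
print(f"detection only: {improvement(56, 14.7):.1f}%")  # ~73.8%
```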
### 4. Image Caching

Caching images in memory between pipeline stages isolates detection from PDF reading overhead:

- **With PDF reading:** 56s for 1,000 pages (56ms per page)
- **With cached images (CPU):** 69s for 1,000 pages (69ms per page)
- **With cached images (GPU):** 14.7s for 1,000 pages (14.7ms per page)

The GPU benefit becomes even more apparent with cached images, achieving **~14.7ms per page** for pure detection.

## Throughput Analysis

Based on these benchmarks, here's what you can expect per node:

| Scenario                      | Pages/Hour | Pages/Day |
|-------------------------------|------------|-----------|
| CPU with PdfBox (100 pages)   | 38,710     | 929,000   |
| CPU with GS (20 pages)        | 64,285     | 1,542,857 |
| GPU with GS (20 pages, full)  | 132,352    | 3,176,470 |
| GPU cached (detection only)   | 244,216    | 5,861,184 |

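The throughput figures follow directly from the per-page latencies (3,600,000 ms per hour divided by ms per page); small differences against the table come from rounding in the reported latencies. A sketch of the conversion:

```python
def pages_per_hour(ms_per_page):
    """Convert per-page latency (ms) into sustained pages per hour."""
    return 3_600_000 / ms_per_page

scenarios = {
    "CPU with PdfBox (100 pages)": 93,
    "CPU with GS (20 pages)": 56,
    "GPU with GS (20 pages, full)": 27.2,
    "GPU cached (detection only)": 14.7,
}
for label, ms in scenarios.items():
    hourly = pages_per_hour(ms)
    print(f"{label}: {hourly:,.0f} pages/hour, {hourly * 24:,.0f} pages/day")
```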
## Recommendations for Production Deployments

Based on these findings, I recommend:

1. **Use GPU when available:** GPU acceleration delivered a 2-4x throughput improvement in these benchmarks, making it highly cost-effective for large-scale processing.

2. **Optimize partition size:** Use smaller partitions (around 20 pages) to achieve better parallelism and throughput.

3. **Choose the right PDF reader:** For both rendering quality and performance, prefer Ghostscript over PdfBox.

4. **Cache images between stages:** For pipelines with multiple stages, caching rendered images eliminates redundant PDF reading.

5. **Scale horizontally:** With these per-node throughputs, distribute processing across multiple nodes:
   - 10 GPU nodes: ~2.4M pages/hour
   - 50 GPU nodes: ~12.2M pages/hour

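The horizontal-scaling estimates above assume near-linear scaling from the measured single-node rate; real clusters lose some throughput to the driver and shuffle overhead. The arithmetic, as a sketch:

```python
# Naive linear-scaling estimate from the measured single-node rate
per_node_hourly = 244_216  # GPU, detection only, from the throughput table

for nodes in (10, 50):
    print(f"{nodes} GPU nodes: ~{per_node_hourly * nodes / 1e6:.1f}M pages/hour")
```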
## Running the Benchmarks Yourself

I've included a complete benchmarking notebook in the ScaleDP tutorials:

```bash
tutorials/object-detection/4.YoloOnnxDetectorBenchmarks.ipynb
```

You can run this notebook in Google Colab or on a local Spark installation:

```python
from scaledp import *

spark = ScaleDPSession(with_spark_pdf=True)

# Load PDF documents: 20 pages per partition, rendered with Ghostscript
df = spark.read.format("pdf") \
    .option("pagePerPartition", "20") \
    .option("reader", "gs") \
    .load("samples_1k.pdf")

# Define the detection pipeline
detector = YoloOnnxDetector(
    keepInputData=False,
    partitionMap=True,
    numPartitions=0,
    model="yolo11n.onnx",
    device=Device.CUDA,  # or Device.CPU
    scoreThreshold=0.6,
    labels=label_list,  # class names for your model; define before running
)

# Run inference; count() forces execution
results = detector.transform(df)
results.select("boxes").count()
```

## Factors Affecting Performance

Several factors can influence your benchmarking results:

- **Hardware specs:** CPU cores, GPU compute capability, RAM bandwidth
- **Model size:** YOLO11 Nano is very efficient; larger models will be slower
- **Input resolution:** default 640x640; adjust based on your use case
- **Spark configuration:** executor cores, memory, and driver settings

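On the Spark side, the usual tuning levers are standard configuration properties. The values below are illustrative starting points for experimentation, not recommendations from the benchmarks:

```python
# Standard Spark configuration properties; values are illustrative only.
spark_tuning = {
    "spark.executor.cores": "8",           # cores per executor
    "spark.executor.memory": "16g",        # heap per executor
    "spark.task.cpus": "1",                # cores reserved per task
    "spark.sql.shuffle.partitions": "50",  # align with your partition count
}

for key, value in spark_tuning.items():
    print(f"--conf {key}={value}")
```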
## Conclusion

ScaleDP's YoloOnnxDetector enables efficient, scalable object detection on Apache Spark. With GPU acceleration, you can process millions of document pages daily. These benchmarks show that thoughtful configuration choices (partition size, reader selection, and GPU utilization) can dramatically improve throughput.

For your specific use case, I recommend benchmarking on your own hardware and document types to validate these results and find the optimal configuration.

---

## References

- [ScaleDP Documentation](https://scaledp.stabrise.com/)
- [ScaleDP Benchmarking Notebook](https://github.com/StabRise/ScaleDP-Tutorials/blob/master/object-detection/4.YoloOnnxDetectorBenchmarks.ipynb)
- [Previous Post: Running YOLO Models on Spark Using ScaleDP](/blog/running-yolo-on-spark-with-scaledp)
- [YoloOnnxDetector Documentation](https://scaledp.stabrise.com/en/latest/models/detectors/yolo_onnx_detector.html)
- [Ultralytics YOLO](https://www.ultralytics.com/)

data/blog/running_yolo_on_spark_with_scaledp.mdx (7 additions)

@@ -300,6 +300,12 @@ results.show_image("image_with_boxes")
  For a complete, runnable example, see the [YOLO ONNX Detector tutorial notebook](https://github.com/StabRise/ScaleDP-Tutorials/blob/master/object-detection/1.YoloOnnxDetector.ipynb).
  You can run it directly in Google Colab for easy setup.

+ ## Benchmarking
+
+ I conducted benchmarks to evaluate the performance of `YoloOnnxDetector` on Spark with different configurations.
+ You can find the full benchmarking notebook in the [ScaleDP Tutorials repository](https://github.com/StabRise/ScaleDP-Tutorials/blob/master/object-detection/4.YoloOnnxDetectorBenchmarks.ipynb)
+ and the related post, [Benchmarking YOLO Models on Spark Using ScaleDP](/blog/benchmarking_yolo_in_scaledp_on_spark/).
+
  ## Pretrained YOLO Models in ScaleDP

  ScaleDP has built-in support for several pretrained YOLO models in ONNX format, including:

@@ -316,6 +322,7 @@ Running YOLO models on Spark with Scaledp enables scalable, distributed object d
  - [ScaleDP Documentation](https://scaledp.stabrise.com/)
  - [ScaleDP Tutorials](https://github.com/StabRise/ScaleDP-Tutorials)
+ - [Benchmarking YOLO Models on Spark Using ScaleDP](/blog/benchmarking_yolo_in_scaledp_on_spark/)
  - [ScaleDP GitHub Repository](https://github.com/StabRise/ScaleDP)
  - [Spark PDF Datasource](https://spark-pdf.stabrise.com/)
  - [Ultralytics YOLO](https://www.ultralytics.com/)