
Commit f526033

feat: paddleocr-vl model and example (huggingface#3273)
1 parent 42a4edc commit f526033

13 files changed

Lines changed: 5254 additions & 0 deletions

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
# PaddleOCR-VL

[PaddleOCR-VL](https://huggingface.co/PaddlePaddle/PaddleOCR-VL) is a state-of-the-art
vision-language model for document parsing, developed by PaddlePaddle. With only 0.9B
parameters, it achieves competitive performance against much larger models (72B+) while
maintaining fast inference speeds.

## Features

- **Multilingual**: Supports 109 languages including Chinese, English, Japanese, Korean, Arabic, and more
- **Multi-element Recognition**: Handles text, tables, formulas, and charts
- **Dynamic Resolution**: NaViT-style encoder processes images at variable resolutions without distortion
- **Multi-Image Processing**: Process multiple images (e.g., multi-page documents) in a single prompt
- **Video Support**: Extract and process video frames with temporal position encoding
- **Efficient**: Compact 0.9B parameters with grouped query attention (GQA)
- **Position Embedding Caching**: LFU cache for interpolated position embeddings improves performance

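The position embedding caching listed above can be pictured with a small least-frequently-used (LFU) cache. The sketch below is illustrative only and is not taken from the model code: the `LfuCache` type, the `(height, width)` grid key, and the use of a generic value in place of a tensor are all assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical cache key: the (height, width) of the patch grid.
type GridKey = (usize, usize);

/// Minimal LFU cache sketch. `V` stands in for an interpolated
/// position-embedding tensor (assumption: the real code stores tensors).
pub struct LfuCache<V> {
    capacity: usize,
    /// Each entry holds (value, use count, tick of last use).
    entries: HashMap<GridKey, (V, u64, u64)>,
    tick: u64,
}

impl<V: Clone> LfuCache<V> {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, entries: HashMap::new(), tick: 0 }
    }

    /// Return the cached value for `key`, computing and storing it on a miss.
    pub fn get_or_insert_with(&mut self, key: GridKey, compute: impl FnOnce() -> V) -> V {
        self.tick += 1;
        let tick = self.tick;
        if let Some((value, count, last_used)) = self.entries.get_mut(&key) {
            // Cache hit: bump the frequency and recency counters.
            *count += 1;
            *last_used = tick;
            return value.clone();
        }
        if self.capacity == 0 {
            // Nothing can be stored, so just compute.
            return compute();
        }
        if self.entries.len() >= self.capacity {
            // Evict the least frequently used entry; ties go to the oldest use.
            let victim = self
                .entries
                .iter()
                .min_by_key(|(_, (_, count, last_used))| (*count, *last_used))
                .map(|(k, _)| *k);
            if let Some(k) = victim {
                self.entries.remove(&k);
            }
        }
        let value = compute();
        self.entries.insert(key, (value.clone(), 1, tick));
        value
    }

    pub fn contains(&self, key: &GridKey) -> bool {
        self.entries.contains_key(key)
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

With dynamic resolution, documents of the same size map to the same grid key, so a frequency-based policy keeps the most common page sizes resident while rare sizes are evicted first.
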
## Command Line Options

| Option | Description | Default |
|--------|-------------|---------|
| `--image` | Path to document image (can be specified multiple times) | (required\*) |
| `--video` | Path to video file | (required\*) |
| `--fps` | Frames per second to extract from video | `1.0` |
| `--max-frames` | Maximum frames to extract from video | `16` |
| `--task` | Task type: `ocr`, `table`, `formula`, `chart` | `ocr` |
| `--model-id` | HuggingFace model ID | `PaddlePaddle/PaddleOCR-VL` |
| `--revision` | Model revision | `main` |
| `--max-length` | Maximum generation length | `1024` |
| `--cpu` | Run on CPU | `false` |
| `--bf16` | Use bfloat16 precision | `false` |
| `--seed` | Random seed | `299792458` |

\* Either `--image` or `--video` is required (mutually exclusive).

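The "required, mutually exclusive" rule for `--image` and `--video` can be expressed as a small validation step. This is a minimal sketch under stated assumptions, not the example's actual argument handling: the `Input` enum and `select_input` function are hypothetical names, and the sketch ignores `--batch`.

```rust
/// Hypothetical representation of the selected input source.
#[derive(Debug, PartialEq)]
pub enum Input {
    Images(Vec<String>),
    Video(String),
}

/// Enforce that exactly one of `--image` (repeatable) or `--video` is given.
/// Assumption: the real example may rely on its CLI parser for this check.
pub fn select_input(images: Vec<String>, video: Option<String>) -> Result<Input, String> {
    match (images.is_empty(), video) {
        (false, None) => Ok(Input::Images(images)),
        (true, Some(path)) => Ok(Input::Video(path)),
        (false, Some(_)) => Err("--image and --video are mutually exclusive".to_string()),
        (true, None) => Err("one of --image or --video is required".to_string()),
    }
}
```

Modelling the input as an enum means later code handles either images or a video, and the invalid combinations are rejected once at startup.
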
## Examples

### Basic Recognition

```bash
cargo run --example paddleocr-vl --release -- \
    --image candle-examples/examples/paddleocr-vl/test_ocr.png \
    --task ocr
```

### Table Recognition

```bash
cargo run --example paddleocr-vl --release -- \
    --image candle-examples/examples/paddleocr-vl/test_table.png \
    --task table
```

### Formula Recognition

```bash
cargo run --example paddleocr-vl --release -- \
    --image candle-examples/examples/paddleocr-vl/test_formula.png \
    --task formula
```

### Chart Recognition

```bash
cargo run --example paddleocr-vl --release -- \
    --image candle-examples/examples/paddleocr-vl/test_chart.png \
    --task chart
```

### Multi-Image (combined output)

Multi-image processing works with any task; `--task` defaults to `ocr`.

```bash
# Process multiple images with combined output
cargo run --example paddleocr-vl --release -- \
    --image candle-examples/examples/paddleocr-vl/test_ocr.png \
    --image candle-examples/examples/paddleocr-vl/test_ocr_page2.png
```

### Multi-Image (batch)

```bash
# Process the given images sequentially, with separate output for each
cargo run --example paddleocr-vl --release -- \
    --batch candle-examples/examples/paddleocr-vl/test_ocr.png candle-examples/examples/paddleocr-vl/test_ocr_page2.png

# With shell glob expansion
cargo run --example paddleocr-vl --release -- \
    --batch candle-examples/examples/paddleocr-vl/test_ocr*.png
```

### Video OCR

```bash
cargo run --example paddleocr-vl --release -- \
    --video candle-examples/examples/paddleocr-vl/test_video.mp4 \
    --task video \
    --fps 0.6 \
    --max-frames 64 \
    --max-length 2048
```
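
To make the interaction between `--fps` and `--max-frames` concrete, here is one plausible way to pick frame timestamps. It is a sketch under assumptions, not the example's actual extraction logic: the `sample_timestamps` function is hypothetical, and it assumes frames are taken at a fixed interval of `1 / fps` seconds starting at zero, stopping at the end of the video or at the frame cap, whichever comes first.

```rust
/// Hypothetical helper: timestamps (in seconds) at which frames would be
/// extracted, assuming uniform sampling at `fps` frames per second.
pub fn sample_timestamps(duration_secs: f64, fps: f64, max_frames: usize) -> Vec<f64> {
    // Guard against non-positive rates or empty videos.
    if fps <= 0.0 || duration_secs <= 0.0 {
        return Vec::new();
    }
    (0..max_frames)
        // The k-th frame is taken at k / fps seconds.
        .map(|k| k as f64 / fps)
        // Stop once the timestamp runs past the end of the video.
        .take_while(|t| *t < duration_secs)
        .collect()
}
```

Under this scheme, a 9-second clip sampled with `--fps 0.6` would yield 6 frames, well below a `--max-frames 64` cap; the cap only matters for longer videos or higher sampling rates.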
