|
| 1 | +# PaddleOCR-VL |
| 2 | + |
| 3 | +[PaddleOCR-VL](https://huggingface.co/PaddlePaddle/PaddleOCR-VL) is a state-of-the-art |
| 4 | +vision-language model for document parsing, developed by PaddlePaddle. With only 0.9B |
| 5 | +parameters, it achieves competitive performance against much larger models (72B+) while |
| 6 | +maintaining fast inference speeds. |
| 7 | + |
| 8 | +## Features |
| 9 | + |
| 10 | +- **Multilingual**: Supports 109 languages including Chinese, English, Japanese, Korean, Arabic, and more |
| 11 | +- **Multi-element Recognition**: Handles text, tables, formulas, and charts |
| 12 | +- **Dynamic Resolution**: NaViT-style encoder processes images at variable resolutions without distortion |
| 13 | +- **Multi-Image Processing**: Process multiple images (e.g., multi-page documents) in a single prompt |
| 14 | +- **Video Support**: Extract and process video frames with temporal position encoding |
| 15 | +- **Efficient**: Compact 0.9B parameters with grouped query attention (GQA) |
| 16 | +- **Position Embedding Caching**: LFU cache for interpolated position embeddings improves performance |
| 17 | + |
| 18 | +## Command Line Options |
| 19 | + |
| 20 | +| Option | Description | Default | |
| 21 | +|--------|-------------|---------| |
| 22 | +| `--image` | Path to document image (can be specified multiple times) | (required\*) | |
| 23 | +| `--video` | Path to video file | (required\*) | |
| 24 | +| `--fps` | Frames per second to extract from video | `1.0` | |
| 25 | +| `--max-frames` | Maximum frames to extract from video | `16` | |
| 26 | +| `--task` | Task type: `ocr`, `table`, `formula`, `chart` | `ocr` | |
| 27 | +| `--model-id` | HuggingFace model ID | `PaddlePaddle/PaddleOCR-VL` | |
| 28 | +| `--revision` | Model revision | `main` | |
| 29 | +| `--max-length` | Maximum generation length | `1024` | |
| 30 | +| `--cpu` | Run on CPU | `false` | |
| 31 | +| `--bf16` | Use bfloat16 precision | `false` | |
| 32 | +| `--seed` | Random seed | `299792458` | |
| 33 | + |
| 34 | +\* Either `--image` or `--video` is required (mutually exclusive). |
| 35 | + |
| 36 | +## Examples |
| 37 | + |
| 38 | +### Basic Recognition |
| 39 | + |
| 40 | +```bash |
| 41 | +cargo run --example paddleocr-vl --release -- \ |
| 42 | + --image candle-examples/examples/paddleocr-vl/test_ocr.png \ |
| 43 | + --task ocr |
| 44 | +``` |
| 45 | + |
| 46 | +### Table Recognition |
| 47 | + |
| 48 | +```bash |
| 49 | +cargo run --example paddleocr-vl --release -- \ |
| 50 | + --image candle-examples/examples/paddleocr-vl/test_table.png \ |
| 51 | + --task table |
| 52 | +``` |
| 53 | + |
| 54 | +### Formula Recognition |
| 55 | + |
| 56 | +```bash |
| 57 | +cargo run --example paddleocr-vl --release -- \ |
| 58 | + --image candle-examples/examples/paddleocr-vl/test_formula.png \ |
| 59 | + --task formula |
| 60 | +``` |
| 61 | + |
| 62 | +### Chart Recognition |
| 63 | + |
| 64 | +```bash |
| 65 | +cargo run --example paddleocr-vl --release -- \ |
| 66 | + --image candle-examples/examples/paddleocr-vl/test_chart.png \ |
| 67 | + --task chart |
| 68 | +``` |
| 69 | + |
| 70 | +### Multi-Image (combined output) |
| 71 | + |
| 72 | +Multi-Image OCR works with any task and uses `--task ocr` by default. |
| 73 | + |
| 74 | +```bash |
| 75 | +# Process multiple images with combined output |
| 76 | +cargo run --example paddleocr-vl --release -- \ |
| 77 | + --image candle-examples/examples/paddleocr-vl/test_ocr.png \ |
| 78 | + --image candle-examples/examples/paddleocr-vl/test_ocr_page2.png |
| 79 | +``` |
| 80 | + |
| 81 | +### Mutli-Image (batch) |
| 82 | + |
| 83 | +```bash |
| 84 | +# Process chosen images sequentially with distinct output |
| 85 | +cargo run --example paddleocr-vl --release -- \ |
| 86 | + --batch candle-examples/examples/paddleocr-vl/test_ocr.png candle-examples/examples/paddleocr-vl/test_ocr_page2.png |
| 87 | + |
| 88 | +# With shell glob expansion |
| 89 | +cargo run --example paddleocr-vl --release -- \ |
| 90 | + --batch candle-examples/examples/paddleocr-vl/test_ocr*.png |
| 91 | +``` |
| 92 | + |
| 93 | +### Video OCR |
| 94 | + |
| 95 | +```bash |
| 96 | +cargo run --example paddleocr-vl --release -- \ |
| 97 | + --video candle-examples/examples/paddleocr-vl/test_video.mp4 \ |
| 98 | + --task video \ |
| 99 | + --fps 0.6 \ |
| 100 | + --max-frames 64 \ |
| 101 | + --max-length 2048 |
| 102 | +``` |
0 commit comments