[Model] Add DeepSeek-OCR-2 PaddlePaddle implementation with full SFT and LoRA support by forBlank · Pull Request #4324 · PaddlePaddle/PaddleFormers

forBlank · 2026-04-20T09:18:57Z

Before submitting

Lint code. If there are lint issues, please format the code first.

# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Process previous code files separately
pre-commit run --files XXXX.py

Add test cases to the tests folder. If there are codecov issues, please add test cases first.

PR types

New features

PR changes

Models

Description

This PR migrates DeepSeek-OCR-2 from PyTorch to PaddlePaddle, including the full model definition, data pipeline, training configuration, and unit tests.

Model Architecture

DeepSeek-OCR-2 is a multi-modal OCR model with a three-stage vision-language architecture:

SAM ViT-B (12L, 768d)
Conv Downsample (4×)
Qwen2 Decoder-as-Encoder (24L, 896d, hybrid causal/non-causal attention)
Linear Projector (896 → 1280)
DeepSeek V2 LLM (12L, 1280d, MoE: 64 experts, 6 active/token)
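
As a rough illustration of how the hidden widths listed above chain together, the following sketch records each stage's input and output width and checks that adjacent stages agree. The stage names are hypothetical labels, and the 768 → 896 channel change inside the conv downsample is an assumption made so the chain closes; the PR description does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str      # hypothetical label, not an identifier from the PR
    in_dim: int    # hidden width consumed by the stage
    out_dim: int   # hidden width produced by the stage


# Widths taken from the PR description, except where marked as assumed.
STAGES = [
    Stage("sam_vit_b", 768, 768),
    Stage("conv_downsample", 768, 896),   # output width is an assumption
    Stage("qwen2_encoder", 896, 896),
    Stage("projector", 896, 1280),
    Stage("deepseek_v2_llm", 1280, 1280),
]


def check_chain(stages):
    """Return True if every stage's output width matches the next stage's input."""
    return all(a.out_dim == b.in_dim for a, b in zip(stages, stages[1:]))


def active_expert_fraction(num_experts=64, active_per_token=6):
    """Fraction of MoE experts that process any single token."""
    return active_per_token / num_experts


if __name__ == "__main__":
    print("widths chain correctly:", check_chain(STAGES))
    print("active expert fraction:", active_expert_fraction())
```

With 6 of 64 experts active, fewer than a tenth of the expert parameters are exercised per token.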

Changes

Model (paddleformers/transformers/deepseek_ocr2/)

configuration.py: DeepseekOCR2Config with SAM / Qwen2 encoder / LLM sub-configs
modeling.py: Full model implementation — DeepseekOCR2Model, DeepseekOCR2ForCausalLM, DeepseekOCR2ForConditionalGeneration; includes AOA weight mapping for checkpoint loading
conversation.py: Conversation format and prompt builder
__init__.py: Module exports

Registration (paddleformers/transformers/)

Register DeepseekOCR2Config / model classes to auto mapping and transformers/__init__.py
Add deepseek_vl_v2 as alias in SPECIAL_MODEL_TYPE_TO_MODULE_NAME

Data Pipeline

mm_plugin.py: DeepseekOCR2Plugin — dynamic crop (up to 6 tiles), global/local view preprocessing, image token expansion based on spatial crop ratio
template.py: Register deepseek_ocr2 conversation template
collate.py: mm_collate_fn_ds_ocr2 — handles images_crop / images_spatial_crop / images_seq_mask batching
workflow.py: Dispatch to mm_collate_fn_ds_ocr2 when model_type == "deepseek_ocr2"
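
To make the "dynamic crop (up to 6 tiles)" step more concrete, here is a minimal sketch of one common way to pick a tile grid: enumerate every (columns, rows) layout within the tile budget and keep the one whose aspect ratio is closest to the image's. The function name and the tie-breaking rule are assumptions for illustration; the actual logic in DeepseekOCR2Plugin may differ.

```python
def choose_tile_grid(width, height, max_tiles=6):
    """Pick a (cols, rows) grid with cols * rows <= max_tiles.

    The grid whose aspect ratio is closest to width / height wins;
    ties go to the layout with fewer tiles (an assumed rule).
    """
    target = width / height
    candidates = [
        (cols, rows)
        for cols in range(1, max_tiles + 1)
        for rows in range(1, max_tiles + 1)
        if cols * rows <= max_tiles
    ]
    return min(
        candidates,
        key=lambda g: (abs(target - g[0] / g[1]), g[0] * g[1]),
    )


if __name__ == "__main__":
    for size in [(1200, 600), (600, 600), (600, 1800)]:
        print(size, "->", choose_tile_grid(*size))
```

Under this scheme, the number of local views per image is cols * rows, which is the quantity an image-token expansion step would scale with.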

LoRA (cli/utils/llm_utils.py)

Add LoRA target modules for all four sub-modules: SAM ViT, Qwen2 Encoder, DeepSeek V2 LLM, Projector

Training Configs (examples/best_practices/DeepSeek-OCR-2/)

deepseek_ocr2_full_8k_config.yaml: Full fine-tuning, 8K seq len, sharding stage1, bf16
deepseek_ocr2_lora_8k_config.yaml: LoRA fine-tuning, rank=8, alpha=32
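
For a sense of what rank=8 and alpha=32 imply, the sketch below computes the LoRA scaling factor and the number of trainable parameters added to a single linear layer. It assumes the conventional alpha / rank scaling and a plain pair of low-rank matrices; the 1280 × 1280 layer is only an example shape based on the LLM width quoted in the description, and the function names are hypothetical.

```python
def lora_scaling(rank, alpha):
    """Conventional LoRA scaling applied to the low-rank update (assumed)."""
    return alpha / rank


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters in A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    rank, alpha = 8, 32
    d_in = d_out = 1280  # example shape only
    added = lora_param_count(d_in, d_out, rank)
    full = d_in * d_out
    print("scaling:", lora_scaling(rank, alpha))
    print("LoRA params:", added, "of", full, f"({added / full:.2%})")
```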

Tests (tests/transformers/deepseek_ocr2/)

test_modeling.py: Unit tests covering model construction, forward pass, and generation with tiny config

paddle-bot · 2026-04-20T09:19:04Z

Thanks for your contribution!

codecov-commenter · 2026-04-20T10:30:08Z

Codecov Report

❌ Patch coverage is 56.26761% with 621 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@86ec329). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
...ddleformers/transformers/deepseek_ocr2/modeling.py	63.28%	340 Missing ⚠️
paddleformers/datasets/template/mm_plugin.py	18.70%	113 Missing ⚠️
paddleformers/datasets/collate.py	1.40%	70 Missing ⚠️
...formers/transformers/deepseek_ocr2/conversation.py	45.31%	70 Missing ⚠️
paddleformers/transformers/deepseek_v3/modeling.py	82.41%	16 Missing ⚠️
...ormers/transformers/deepseek_ocr2/configuration.py	86.00%	7 Missing ⚠️
paddleformers/cli/train/sft/workflow.py	25.00%	3 Missing ⚠️
paddleformers/cli/utils/llm_utils.py	0.00%	2 Missing ⚠️

❌ Your patch status has failed because the patch coverage (56.26%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.
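
The figures in the report can be cross-checked with a little arithmetic. The sketch below sums the per-file missing lines from the table, recovers the approximate patch size from the reported percentage, and estimates how many additional patch lines would need to be covered to reach the 75% target. It assumes the set of counted patch lines stays fixed while coverage increases, which may not hold exactly if new source lines are added alongside tests.

```python
import math

# Missing-line counts, in the order the files appear in the table above.
MISSING_PER_FILE = [340, 113, 70, 70, 16, 7, 3, 2]

PATCH_COVERAGE_PCT = 56.26761   # reported patch coverage
TARGET = 0.75                   # reported target coverage


def patch_summary(missing_per_file, coverage_pct, target):
    missing = sum(missing_per_file)
    # missing = total * (1 - coverage), so solve for total.
    total = round(missing / (1 - coverage_pct / 100))
    covered = total - missing
    # Extra covered lines required to meet the target, never negative.
    needed = max(0, math.ceil(target * total) - covered)
    return missing, total, covered, needed


if __name__ == "__main__":
    missing, total, covered, needed = patch_summary(
        MISSING_PER_FILE, PATCH_COVERAGE_PCT, TARGET
    )
    print(f"missing={missing} total={total} covered={covered} needed={needed}")
```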

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #4324   +/-   ##
==========================================
  Coverage           ?   39.45%           
==========================================
  Files              ?      478           
  Lines              ?    90740           
  Branches           ?        0           
==========================================
  Hits               ?    35803           
  Misses             ?    54937           
  Partials           ?        0

lugimzzz · 2026-04-21T09:16:48Z

+per_device_eval_batch_size: 8
+per_device_train_batch_size: 8
+num_train_epochs: 2
+max_steps: -1


Please add a document with a brief introduction, and provide a way to download the dataset ./ocr_vl_sft-train_Bengali.jsonl.

lugimzzz · 2026-04-21T09:17:19Z

+            "sam_model.*mlp.lin1.*",
+            "sam_model.*mlp.lin2.*",
+            # Qwen2 Encoder-as-Decoder
+            "qwen2_model.*self_attn.qkv_proj.*",


Does the ViT need to be trained during LoRA?
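
For discussion, if the vision tower were to stay frozen during LoRA, one option is to drop the SAM patterns from the target list. The sketch below filters the three patterns quoted in the diff by prefix and shows which names would still match. It assumes the patterns are applied as regular expressions against full module or parameter names; the names in PARAM_NAMES are hypothetical and are not taken from the model.

```python
import re

# Patterns quoted from the diff under review.
TARGET_PATTERNS = [
    "sam_model.*mlp.lin1.*",
    "sam_model.*mlp.lin2.*",
    "qwen2_model.*self_attn.qkv_proj.*",
]

# Hypothetical names, for illustration only.
PARAM_NAMES = [
    "sam_model.blocks.0.mlp.lin1.weight",
    "sam_model.blocks.0.mlp.lin2.weight",
    "qwen2_model.layers.0.self_attn.qkv_proj.weight",
]


def drop_prefix(patterns, prefix):
    """Remove every target pattern that starts with the given prefix."""
    return [p for p in patterns if not p.startswith(prefix)]


def matched(names, patterns):
    """Return the names fully matched by at least one pattern."""
    return [n for n in names if any(re.fullmatch(p, n) for p in patterns)]


if __name__ == "__main__":
    without_vit = drop_prefix(TARGET_PATTERNS, "sam_model")
    print("all targets:   ", matched(PARAM_NAMES, TARGET_PATTERNS))
    print("ViT excluded:  ", matched(PARAM_NAMES, without_vit))
```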

forBlank added 8 commits April 20, 2026 14:58

[model(deepseek_v3)]: add GQA mode support alongside MLA

1ebe284

[model(deepseek_ocr2)]: add DeepSeek-OCR-2 model definition

3af2e1e

[model(deepseek_ocr2)]: register DeepSeek-OCR-2 to transformers auto mapping

f8f5b5b

[model(deepseek_ocr2)]: add DeepSeek-OCR-2 data preprocessing, collate and template

543a5fa

[model(deepseek_ocr2)]: add LoRA target modules for DeepSeek-OCR-2

2f58654

[model(deepseek_ocr2)]: add full and LoRA SFT training config for DeepSeek-OCR-2

1bf9233

[model(deepseek_ocr2)]: add unit tests for DeepSeek-OCR-2 model

7d37a4b

Merge remote-tracking branch 'upstream/develop' into deepseek_ocr2

63b5058

[model(deepseek_v3)] compatible support gready topk_method

1e1db87

lugimzzz reviewed Apr 21, 2026

forBlank wants to merge 9 commits into PaddlePaddle:develop from forBlank:deepseek_ocr2