
[Model] Add DeepSeek-OCR-2 PaddlePaddle implementation with full SFT and LoRA support #4324

Open
forBlank wants to merge 9 commits into PaddlePaddle:develop from forBlank:deepseek_ocr2

Conversation

@forBlank
Collaborator

Before submitting

  • Lint code. If there are lint issues, please format the code first.
# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Process previous code files separately
pre-commit run --files XXXX.py
  • Add test cases to the tests folder. If there are codecov issues, please add test cases first.

PR types

New features

PR changes

Models

Description

This PR migrates DeepSeek-OCR-2 from PyTorch to PaddlePaddle, including the full model definition, data pipeline, training configuration, and unit tests.

Model Architecture

DeepSeek-OCR-2 is a multi-modal OCR model with a three-stage vision-language architecture:

  • SAM ViT-B (12L, 768d)
  • Conv Downsample (4×)
  • Qwen2 Decoder-as-Encoder (24L, 896d, hybrid causal/non-causal attention)
  • Linear Projector (896 → 1280)
  • DeepSeek V2 LLM (12L, 1280d, MoE: 64 experts, 6 active/token)
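The "64 experts, 6 active/token" figure describes top-k expert routing. The following is a minimal, framework-free sketch of that idea and not the code in this PR: the function name `route_token`, the softmax gate, and the renormalization of the kept weights are assumptions made for illustration. Shared experts and load-balancing losses are not modeled.

```python
import math
import random

NUM_EXPERTS = 64  # from the PR description
TOP_K = 6         # experts active per token, from the PR description


def route_token(router_logits, top_k=TOP_K, renormalize=True):
    """Return [(expert_index, weight), ...] for the top_k experts of one token."""
    # Softmax over all expert logits (subtract the max for numerical stability).
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the top_k most probable experts.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = order[:top_k]

    # Assumption: renormalize the kept weights so they sum to 1.
    if renormalize:
        kept = sum(probs[i] for i in chosen)
        return [(i, probs[i] / kept) for i in chosen]
    return [(i, probs[i]) for i in chosen]


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
assignment = route_token(logits)
print(len(assignment), "experts selected for this token")
```

Only the selected experts run for a given token, so per-token compute scales with the 6 active experts rather than all 64.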

Changes

Model (paddleformers/transformers/deepseek_ocr2/)

  • configuration.py: DeepseekOCR2Config with SAM / Qwen2 encoder / LLM sub-configs
  • modeling.py: Full model implementation — DeepseekOCR2Model, DeepseekOCR2ForCausalLM, DeepseekOCR2ForConditionalGeneration; includes AOA weight mapping for checkpoint loading
  • conversation.py: Conversation format and prompt builder
  • __init__.py: Module exports

Registration (paddleformers/transformers/)

  • Register DeepseekOCR2Config / model classes to auto mapping and transformers/__init__.py
  • Add deepseek_vl_v2 as alias in SPECIAL_MODEL_TYPE_TO_MODULE_NAME

Data Pipeline

  • mm_plugin.py: DeepseekOCR2Plugin — dynamic crop (up to 6 tiles), global/local view preprocessing, image token expansion based on spatial crop ratio
  • template.py: Register deepseek_ocr2 conversation template
  • collate.py: mm_collate_fn_ds_ocr2 — handles images_crop / images_spatial_crop / images_seq_mask batching
  • workflow.py: Dispatch to mm_collate_fn_ds_ocr2 when model_type == "deepseek_ocr2"
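One common way to implement a dynamic crop with a tile budget is to pick the tile grid whose aspect ratio is closest to the image's. The sketch below illustrates that approach under stated assumptions: `candidate_grids`, `pick_grid`, and the `min_tiles` default are hypothetical and are not taken from `DeepseekOCR2Plugin`; only the limit of 6 tiles comes from the description above.

```python
def candidate_grids(max_tiles=6, min_tiles=1):
    """All (cols, rows) layouts whose tile count lies in [min_tiles, max_tiles]."""
    grids = {
        (cols, rows)
        for cols in range(1, max_tiles + 1)
        for rows in range(1, max_tiles + 1)
        if min_tiles <= cols * rows <= max_tiles
    }
    # Sort by tile count first so that ties prefer the cheaper layout.
    return sorted(grids, key=lambda g: (g[0] * g[1], g[0], g[1]))


def pick_grid(width, height, max_tiles=6, min_tiles=1):
    """Choose the layout whose cols/rows ratio is closest to width/height."""
    aspect = width / height
    return min(
        candidate_grids(max_tiles, min_tiles),
        key=lambda g: abs(aspect - g[0] / g[1]),
    )


for size in [(1280, 640), (640, 1920), (1500, 1000), (1000, 1000)]:
    print(size, "->", pick_grid(*size))
```

The chosen `(cols, rows)` pair plays the role that `images_spatial_crop` appears to play in the collate function: it records how many local tiles accompany the global view, which in turn determines how many image tokens the prompt needs.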

LoRA (cli/utils/llm_utils.py)

  • Add LoRA target modules for all four sub-modules: SAM ViT, Qwen2 Encoder, DeepSeek V2 LLM, Projector
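LoRA target modules of this kind are usually expressed as regular expressions over parameter names. The sketch below shows how such patterns select layers. The three patterns are the ones visible in the review snippet of this PR; the parameter names and the use of `re.search` are assumptions for illustration, and the real matching logic in the training code may differ.

```python
import re

# Patterns visible in this PR's diff for cli/utils/llm_utils.py.
TARGET_PATTERNS = [
    "sam_model.*mlp.lin1.*",
    "sam_model.*mlp.lin2.*",
    "qwen2_model.*self_attn.qkv_proj.*",
]

# Hypothetical parameter names, invented for this example.
PARAM_NAMES = [
    "model.sam_model.blocks.0.mlp.lin1.weight",
    "model.sam_model.blocks.0.mlp.lin2.weight",
    "model.sam_model.blocks.0.attn.qkv.weight",
    "model.qwen2_model.layers.3.self_attn.qkv_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
]


def lora_targets(names, patterns):
    """Return the names matched by at least one pattern (assumes re.search)."""
    compiled = [re.compile(p) for p in patterns]
    return [n for n in names if any(c.search(n) for c in compiled)]


all_targets = lora_targets(PARAM_NAMES, TARGET_PATTERNS)

# Dropping the sam_model patterns would leave the SAM ViT without adapters.
no_vit_patterns = [p for p in TARGET_PATTERNS if not p.startswith("sam_model")]
no_vit_targets = lora_targets(PARAM_NAMES, no_vit_patterns)

print(all_targets)
print(no_vit_targets)
```

Filtering the pattern list this way is one possible way to keep the vision tower frozen during LoRA fine-tuning, if that turns out to be the preferred setup.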

Training Configs (examples/best_practices/DeepSeek-OCR-2/)

  • deepseek_ocr2_full_8k_config.yaml: Full fine-tuning, 8K seq len, sharding stage1, bf16
  • deepseek_ocr2_lora_8k_config.yaml: LoRA fine-tuning, rank=8, alpha=32
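To give a feel for what rank=8 and alpha=32 mean, here is a small arithmetic sketch. It assumes the standard LoRA formulation, in which the low-rank update is scaled by alpha / rank; whether this PR's training stack uses exactly that scaling is not stated here. The 1280 × 1280 layer is a hypothetical example chosen to match the LLM hidden size.

```python
def lora_scaling(rank, alpha):
    """Scale applied to the low-rank update in standard LoRA (alpha / rank)."""
    return alpha / rank


def lora_param_count(in_features, out_features, rank):
    """Trainable parameters of one adapter: A is in x rank, B is rank x out."""
    return rank * (in_features + out_features)


RANK, ALPHA = 8, 32          # from deepseek_ocr2_lora_8k_config.yaml
HIDDEN = 1280                # LLM hidden size from the PR description

scaling = lora_scaling(RANK, ALPHA)
adapter_params = lora_param_count(HIDDEN, HIDDEN, RANK)
full_params = HIDDEN * HIDDEN  # hypothetical square projection, no bias

print(f"scaling = {scaling}")
print(f"adapter params = {adapter_params} ({adapter_params / full_params:.2%} of full)")
```

Under these assumptions, each adapted 1280 × 1280 projection trains 20,480 parameters instead of 1,638,400.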

Tests (tests/transformers/deepseek_ocr2/)

  • test_modeling.py: Unit tests covering model construction, forward pass, and generation with tiny config

@paddle-bot
Copy link

Copy Markdown

paddle-bot Bot commented Apr 20, 2026

Thanks for your contribution!

@codecov-commenter

Codecov Report

❌ Patch coverage is 56.26761% with 621 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@86ec329). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...ddleformers/transformers/deepseek_ocr2/modeling.py | 63.28% | 340 Missing ⚠️ |
| paddleformers/datasets/template/mm_plugin.py | 18.70% | 113 Missing ⚠️ |
| paddleformers/datasets/collate.py | 1.40% | 70 Missing ⚠️ |
| ...formers/transformers/deepseek_ocr2/conversation.py | 45.31% | 70 Missing ⚠️ |
| paddleformers/transformers/deepseek_v3/modeling.py | 82.41% | 16 Missing ⚠️ |
| ...ormers/transformers/deepseek_ocr2/configuration.py | 86.00% | 7 Missing ⚠️ |
| paddleformers/cli/train/sft/workflow.py | 25.00% | 3 Missing ⚠️ |
| paddleformers/cli/utils/llm_utils.py | 0.00% | 2 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (56.26%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.
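As a rough guide to how far the patch is from the target, the reported numbers can be turned into line counts. This is only back-of-the-envelope arithmetic on the figures in the report above, not anything Codecov itself outputs; the helper names are made up for the example.

```python
import math


def patch_totals(coverage_pct, missing_lines):
    """Recover (total, covered) patch lines from coverage % and missing lines."""
    total = round(missing_lines / (1 - coverage_pct / 100))
    return total, total - missing_lines


def lines_needed(total, covered, target_pct):
    """Additional covered lines required to reach target_pct."""
    return max(0, math.ceil(total * target_pct / 100) - covered)


# Per-file "Missing" counts from the report table.
missing_per_file = [340, 113, 70, 70, 16, 7, 3, 2]
missing = sum(missing_per_file)

total, covered = patch_totals(56.26761, missing)
extra = lines_needed(total, covered, 75.0)

print(f"patch lines: {total}, covered: {covered}, missing: {missing}")
print(f"additional covered lines needed for 75%: {extra}")
```

On these figures, the low-coverage data pipeline files (`mm_plugin.py` and `collate.py`, 183 missing lines together) account for a large share of the gap.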

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #4324   +/-   ##
==========================================
  Coverage           ?   39.45%           
==========================================
  Files              ?      478           
  Lines              ?    90740           
  Branches           ?        0           
==========================================
  Hits               ?    35803           
  Misses             ?    54937           
  Partials           ?        0           


per_device_eval_batch_size: 8
per_device_train_batch_size: 8
num_train_epochs: 2
max_steps: -1
Collaborator


Please add a short document that introduces this briefly and explains how to download the dataset ./ocr_vl_sft-train_Bengali.jsonl.

"sam_model.*mlp.lin1.*",
"sam_model.*mlp.lin2.*",
# Qwen2 Encoder-as-Decoder
"qwen2_model.*self_attn.qkv_proj.*",
Collaborator


Should the ViT be trained during LoRA fine-tuning?


3 participants