Add support for PP-DocLayoutV2 #1619
Conversation
Thanks for your contribution!
Looks like even though converting DocLayoutV2 to ONNX works, the changes break some tests.
Modifying this file from the version prior to this PR (3e77ec7) was unnecessary, so I reverted it.
@zhangbo9674 Do you know why the Windows build is failing? There seems to be some missing DLL in the test environment: https://github.com/PaddlePaddle/Paddle2ONNX/actions/runs/20767660339/job/59637230828?pr=1619#step:10:224
Exporting from the `.pdmodel` file segfaults:

```
paddle2onnx --model_dir ../PaddleOCR-VL/PP-DocLayoutV2/ --model_filename inference.pdmodel --params_filename inference.pdiparams --save_file pp-doclayoutv2.ooonnx
2026-01-07 03:14:53 [WARNING] The .pdmodel file is deprecated in paddlepaddle 3.0 and will be removed in the future. Try to convert from .pdmodel file to json file.
I0107 03:14:53.903254 2728575 program_interpreter.cc:257] New Executor is Running.
[Paddle2ONNX] Start parsing the Paddle model file...
[Paddle2ONNX] Use opset_version = 17 for ONNX export.

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   paddle2onnx::Export(char const*, char const*, char**, int*, int, bool, bool, bool, bool, bool, paddle2onnx::CustomOp*, int, char const*, char**, int*, char const*, bool*, bool, char**, int)

----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1767755694 (unix time) try "date -d @1767755694" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x0) received by PID 2728575 (TID 0x7655a5d0d140) from PID 0 ***]

Segmentation fault (core dumped)
```

Converting from the `.json` model instead works:

```
paddle2onnx --model_dir PP-DocLayoutV2_infer/ --model_filename inference.json --params_filename inference.pdiparams --save_file pp-doclayoutv2.onnx
[Paddle2ONNX] Start parsing the Paddle model file...
[Paddle2ONNX] Use opset_version = 17 for ONNX export.
[Paddle2ONNX] PaddlePaddle model is exported as ONNX format now.
2026-01-07 04:57:16 [INFO] Try to perform constant folding on the ONNX model with Polygraphy.
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[I] Folding Constants | Pass 1
2026-01-07 04:57:16.696936887 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze.277
[W] colored module is not installed, will not use colors when logging. To enable colors, please install the colored module: python3 -m pip install colored
[W] It looks like this model contains foldable nodes that produce large outputs.
    In order to avoid bloating the model, you may want to set a constant-folding size threshold.
    Note: Large tensors and their corresponding sizes were: {'Mul.204': '1 MiB'}
[W] Falling back to `onnx.shape_inference` because `onnxruntime.tools.symbolic_shape_infer` either could not be loaded or did not run successfully.
    Note that using ONNX-Runtime for shape inference may be faster and require less memory.
    Consider installing ONNX-Runtime or setting POLYGRAPHY_AUTOINSTALL_DEPS=1 in your environment variables to allow Polygraphy to do so automatically.
[I] Total Nodes | Original: 8512, After Folding: 3767 | 4745 Nodes Folded
[I] Folding Constants | Pass 2
[I] Total Nodes | Original: 3767, After Folding: 3753 | 14 Nodes Folded
[I] Folding Constants | Pass 3
[I] Total Nodes | Original: 3753, After Folding: 3753 | 0 Nodes Folded
2026-01-07 04:57:31 [INFO] ONNX model saved in pp-doclayoutv2.onnx.
```
@GreatV Hi, can you share your ONNX model?
Why is there such a large discrepancy between the output and Paddle Inference?

```python
#!/usr/bin/env python3
from __future__ import annotations

import argparse
from pathlib import Path

import numpy as np


def _parse_int_list(csv: str) -> list[int]:
    return [int(x.strip()) for x in csv.split(",") if x.strip()]


def _add_bool_arg(
    parser: argparse.ArgumentParser, name: str, default: bool, help_text: str
) -> None:
    dest = name.lstrip("-").replace("-", "_")
    group = parser.add_mutually_exclusive_group(required=False)
    group.add_argument(name, dest=dest, action="store_true", help=help_text)
    group.add_argument(
        f"--no_{dest}", dest=dest, action="store_false", help=f"Disable: {help_text}"
    )
    parser.set_defaults(**{dest: default})


def export_onnx(
    model_dir: Path,
    model_filename: str,
    params_filename: str,
    onnx_path: Path,
    opset_version: int,
    auto_update_opset: bool,
    enable_onnx_checker: bool,
    optimize_tool: str,
    verbose: bool,
) -> None:
    import paddle2onnx

    model_file = model_dir / model_filename
    params_file = model_dir / params_filename
    if not model_file.exists():
        raise FileNotFoundError(f"model file not found: {model_file}")
    if not params_file.exists():
        raise FileNotFoundError(f"params file not found: {params_file}")
    onnx_path.parent.mkdir(parents=True, exist_ok=True)
    paddle2onnx.export(
        str(model_file),
        str(params_file),
        str(onnx_path),
        opset_version=opset_version,
        auto_upgrade_opset=auto_update_opset,
        verbose=verbose,
        enable_onnx_checker=enable_onnx_checker,
        optimize_tool=optimize_tool,
    )


def build_paddle_predictor(
    model_dir: Path,
    model_filename: str,
    params_filename: str,
    disable_mkldnn: bool,
    disable_ir_optim: bool,
):
    import paddle.inference as paddle_infer

    model_file = model_dir / model_filename
    params_file = model_dir / params_filename
    config = paddle_infer.Config(str(model_file), str(params_file))
    config.disable_gpu()
    if disable_mkldnn:
        config.disable_mkldnn()
    if disable_ir_optim:
        config.switch_ir_optim(False)
    return paddle_infer.create_predictor(config)


def build_ort_session(onnx_path: Path, disable_ort_optim: bool):
    import onnxruntime as ort

    sess_options = ort.SessionOptions()
    if disable_ort_optim:
        sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
    return ort.InferenceSession(
        str(onnx_path),
        sess_options=sess_options,
        providers=["CPUExecutionProvider"],
    )


def generate_inputs(
    batch: int,
    seed: int,
    height: int,
    width: int,
    repeat_first: bool,
) -> dict[str, np.ndarray]:
    rng = np.random.default_rng(seed)
    if repeat_first:
        base = rng.standard_normal((1, 3, height, width)).astype(np.float32)
        image = np.repeat(base, batch, axis=0)
    else:
        image = rng.standard_normal((batch, 3, height, width)).astype(np.float32)
    im_shape = np.tile(np.array([[float(height), float(width)]], dtype=np.float32), (batch, 1))
    scale_factor = np.tile(np.array([[1.0, 1.0]], dtype=np.float32), (batch, 1))
    return {"image": image, "im_shape": im_shape, "scale_factor": scale_factor}


def main() -> int:
    parser = argparse.ArgumentParser(
        description="Compare Paddle Inference vs ONNXRuntime for PP-DocLayoutV2."
    )
    parser.add_argument("--model_dir", type=Path, required=True)
    parser.add_argument("--model_filename", type=str, default="inference.json")
    parser.add_argument("--params_filename", type=str, default="inference.pdiparams")
    parser.add_argument("--onnx_path", type=Path, required=True)
    parser.add_argument("--export_onnx", action="store_true")
    parser.add_argument("--opset_version", type=int, default=17)
    _add_bool_arg(
        parser,
        "--auto_update_opset",
        default=True,
        help_text="Auto update ONNX opset",
    )
    _add_bool_arg(
        parser,
        "--enable_onnx_checker",
        default=False,
        help_text="Run ONNX checker",
    )
    parser.add_argument("--optimize_tool", type=str, default="None")
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--batches", type=_parse_int_list, default=[1, 2, 4, 8])
    parser.add_argument("--seed", type=int, default=20260107)
    parser.add_argument("--height", type=int, default=800)
    parser.add_argument("--width", type=int, default=800)
    parser.add_argument("--repeat_first", action="store_true")
    parser.add_argument("--atol", type=float, default=1e-4)
    parser.add_argument("--rtol", type=float, default=1e-4)
    parser.add_argument("--show_max_loc", action="store_true")
    _add_bool_arg(
        parser,
        "--disable_mkldnn",
        default=True,
        help_text="Disable Paddle MKLDNN",
    )
    _add_bool_arg(
        parser,
        "--disable_ir_optim",
        default=True,
        help_text="Disable Paddle IR optim",
    )
    _add_bool_arg(
        parser,
        "--disable_ort_optim",
        default=True,
        help_text="Disable ORT graph optim",
    )
    args = parser.parse_args()

    if args.export_onnx or not args.onnx_path.exists():
        export_onnx(
            model_dir=args.model_dir,
            model_filename=args.model_filename,
            params_filename=args.params_filename,
            onnx_path=args.onnx_path,
            opset_version=args.opset_version,
            auto_update_opset=args.auto_update_opset,
            enable_onnx_checker=args.enable_onnx_checker,
            optimize_tool=args.optimize_tool,
            verbose=args.verbose,
        )

    predictor = build_paddle_predictor(
        model_dir=args.model_dir,
        model_filename=args.model_filename,
        params_filename=args.params_filename,
        disable_mkldnn=args.disable_mkldnn,
        disable_ir_optim=args.disable_ir_optim,
    )
    sess = build_ort_session(args.onnx_path, disable_ort_optim=args.disable_ort_optim)

    ort_output_names = [o.name for o in sess.get_outputs()]
    pd_output_names = predictor.get_output_names()
    common_outputs = [name for name in ort_output_names if name in set(pd_output_names)]
    if not common_outputs:
        raise RuntimeError(
            f"No common outputs between Paddle({pd_output_names}) and ORT({ort_output_names})"
        )

    print("Paddle inputs:", predictor.get_input_names())
    print("Paddle outputs:", pd_output_names)
    print("ORT inputs:", [i.name for i in sess.get_inputs()])
    print("ORT outputs:", ort_output_names)
    print("Compare outputs:", common_outputs)
    print()

    for batch in args.batches:
        inputs = generate_inputs(
            batch=batch,
            seed=args.seed + batch,
            height=args.height,
            width=args.width,
            repeat_first=args.repeat_first,
        )
        # Paddle
        for name in predictor.get_input_names():
            if name not in inputs:
                raise RuntimeError(f"Missing input '{name}' in generated inputs: {list(inputs)}")
            arr = inputs[name]
            h = predictor.get_input_handle(name)
            h.reshape(arr.shape)
            h.copy_from_cpu(arr)
        predictor.run()
        pd_outputs = {name: predictor.get_output_handle(name).copy_to_cpu() for name in pd_output_names}
        # ORT
        ort_outputs_list = sess.run(None, inputs)
        if len(ort_output_names) != len(ort_outputs_list):
            raise RuntimeError(
                f"ORT outputs mismatch: names={len(ort_output_names)} values={len(ort_outputs_list)}"
            )
        ort_outputs = dict(zip(ort_output_names, ort_outputs_list))

        print(f"batch={batch} repeat_first={args.repeat_first} seed={args.seed + batch}")
        for name in common_outputs:
            pd = pd_outputs[name]
            ort = ort_outputs[name]
            same_shape = pd.shape == ort.shape
            same_dtype = pd.dtype == ort.dtype
            print(f"  {name}: shape {pd.shape} vs {ort.shape} match={same_shape} dtype {pd.dtype} vs {ort.dtype} match={same_dtype}")
            if not same_shape:
                continue
            if np.issubdtype(pd.dtype, np.floating) and np.issubdtype(ort.dtype, np.floating):
                diff = pd - ort
                absdiff = np.abs(diff)
                max_abs = float(absdiff.max())
                mean_abs = float(absdiff.mean())
                allclose = bool(np.allclose(pd, ort, atol=args.atol, rtol=args.rtol))
                print(f"    max_abs={max_abs} mean_abs={mean_abs} allclose(atol={args.atol},rtol={args.rtol})={allclose}")
                if pd.ndim == 2 and pd.shape[1] <= 64:
                    print(f"    per_col_max={absdiff.max(axis=0)}")
                if args.show_max_loc:
                    max_idx = np.unravel_index(np.argmax(absdiff), absdiff.shape)
                    print(f"    max_loc={max_idx} pd={pd[max_idx]} ort={ort[max_idx]}")
            else:
                equal = bool(np.array_equal(pd, ort))
                print(f"    equal={equal}")
        print()
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

Run with:

```
python debug/compare_paddle_ort_ppdoclayoutv2.py --model_dir PP-DocLayoutV2_infer --onnx_path pp-doclayoutv2.onnx
```
```
--- Running PIR pass [add_shadow_output_after_dead_parameter_pass]
--- Running PIR pass [delete_quant_dequant_linear_op_pass]
--- Running PIR pass [delete_weight_dequant_linear_op_pass]
--- Running PIR pass [transfer_layout_pass]
--- Running PIR pass [common_subexpression_elimination_pass]
I0107 06:40:56.880336 2811109 print_statistics.cc:50] --- detected [870] subgraphs!
--- Running PIR pass [constant_folding_pass]
I0107 06:40:56.881546 2811109 pir_interpreter.cc:1601] New Executor is Running ...
I0107 06:40:56.881672 2811109 pir_interpreter.cc:1625] pir interpreter is running by multi-thread mode ...
I0107 06:40:56.932209 2811109 print_statistics.cc:44] --- detected [165, 2907] subgraphs!
--- Running PIR pass [dead_code_elimination_pass]
I0107 06:40:56.933084 2811109 print_statistics.cc:50] --- detected [54] subgraphs!
--- Running PIR pass [replace_fetch_with_shadow_output_pass]
I0107 06:40:56.933688 2811109 print_statistics.cc:50] --- detected [2] subgraphs!
--- Running PIR pass [remove_shadow_feed_pass]
--- Running PIR pass [inplace_pass]
I0107 06:40:57.141657 2811109 print_statistics.cc:50] --- detected [676] subgraphs!
I0107 06:40:57.142059 2811109 analysis_predictor.cc:1217] ======= pir optimization completed =======
Paddle inputs: ['im_shape', 'image', 'scale_factor']
Paddle outputs: ['fetch_name_0', 'fetch_name_1']
ORT inputs: ['im_shape', 'image', 'scale_factor']
ORT outputs: ['fetch_name_0', 'fetch_name_1']
Compare outputs: ['fetch_name_0', 'fetch_name_1']

I0107 06:40:57.943626 2811109 pir_interpreter.cc:1622] pir interpreter is running by trace mode ...
batch=1 repeat_first=False seed=20260108
  fetch_name_0: shape (300, 8) vs (300, 8) match=True dtype float32 vs float32 match=True
    max_abs=291.0 mean_abs=39.715877532958984 allclose(atol=0.0001,rtol=0.0001)=False
    per_col_max=[0.0000000e+00 1.1026859e-06 4.3411255e-03 1.2893677e-03 7.9345703e-04
 1.6479492e-03 2.9100000e+02 2.9100000e+02]
  fetch_name_1: shape (1,) vs (1,) match=True dtype int32 vs int32 match=True
    equal=True

batch=2 repeat_first=False seed=20260109
  fetch_name_0: shape (600, 8) vs (600, 8) match=True dtype float32 vs float32 match=True
    max_abs=291.0 mean_abs=39.5771369934082 allclose(atol=0.0001,rtol=0.0001)=False
    per_col_max=[0.0000000e+00 2.3014843e-05 1.4595032e-02 1.6174316e-03 7.9040527e-03
 1.5258789e-03 2.9100000e+02 2.9100000e+02]
  fetch_name_1: shape (2,) vs (2,) match=True dtype int32 vs int32 match=True
    equal=True

batch=4 repeat_first=False seed=20260111
  fetch_name_0: shape (1200, 8) vs (1200, 8) match=True dtype float32 vs float32 match=True
    max_abs=291.0 mean_abs=39.563392639160156 allclose(atol=0.0001,rtol=0.0001)=False
    per_col_max=[0.0000000e+00 6.0498714e-06 1.7265320e-02 2.3117065e-03 3.7841797e-03
 5.0048828e-03 2.9100000e+02 2.9100000e+02]
  fetch_name_1: shape (4,) vs (4,) match=True dtype int32 vs int32 match=True
    equal=True

batch=8 repeat_first=False seed=20260115
  fetch_name_0: shape (2400, 8) vs (2400, 8) match=True dtype float32 vs float32 match=True
    max_abs=291.0 mean_abs=39.15203094482422 allclose(atol=0.0001,rtol=0.0001)=False
    per_col_max=[0.0000000e+00 1.8328428e-06 1.6151428e-02 4.8522949e-03 8.9721680e-03
 6.7138672e-03 2.9100000e+02 2.9100000e+02]
  fetch_name_1: shape (8,) vs (8,) match=True dtype int32 vs int32 match=True
    equal=True
```
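For context on why `allclose` fails here even though most columns agree closely: NumPy's tolerance test is `|a - b| <= atol + rtol * |b|` elementwise, so a coordinate that comes back as 0 from one backend while the other reports ~291 fails by the full magnitude, regardless of how small `atol`/`rtol` are. A plain-Python sketch of the same rule (illustrative values only):

```python
def allclose_scalar(a: float, b: float, atol: float = 1e-4, rtol: float = 1e-4) -> bool:
    # The same per-element tolerance rule that numpy.allclose applies.
    return abs(a - b) <= atol + rtol * abs(b)

# A tiny score difference passes the check...
assert allclose_scalar(0.123456, 0.123457)
# ...but a zeroed-out coordinate compared against 291.0 fails outright.
assert not allclose_scalar(0.0, 291.0)
```

This matches the per-column maxima above: the score columns differ by at most ~1e-2, while the last two columns carry the full 291.0 discrepancy.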
I'm also encountering this problem. The ONNX model outputs zero coordinates, while the classification results and scores seem accurate.
Hi @zhaohb, you might want to try the version I exported myself, which differs slightly from the current PR implementation: pp-doclayoutv2.onnx
Hi, what changes did you need to make, and what OS are you using? Also, here is the ONNX model I exported using the changes in this PR: PP-DocLayoutV2.onnx
Hi @alex-dinh, I added tie-break logic for argsort.
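In case it helps others reading this thread: the general idea of a tie-break (a plain-Python sketch of the concept, not the actual mapper change) is that different backends may order equal scores differently during a sort, so adding the element index as a secondary key makes the resulting order deterministic everywhere:

```python
scores = [0.9, 0.5, 0.9, 0.5]

# A plain descending argsort leaves the order of the tied 0.9s (and 0.5s)
# backend-dependent. Using the original index as a secondary key pins it:
stable = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
assert stable == [0, 2, 1, 3]
```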
Hi @alex-dinh, just to clarify: we're still seeing some discrepancies in the coordinate outputs between the ONNX and Paddle inference results. Have you noticed any differences in your tests?
Hi @zhaohb, I see some discrepancies, but they are very minor. The PaddleX library also does some extra postprocessing (such as unclipping and merging) that I do not apply to the ONNX model output, which would explain the difference. However, the results in my local testing are very similar; I am only filtering out low-confidence boxes from the ONNX model. Here is the image I ran my test on:

Edit: here is my doclayout test script:
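For reference, filtering low-confidence boxes from a detector output shaped like the `(N, 8)` `fetch_name_0` tensor is just a threshold over the score column. The column layout below (`[class_id, score, coords...]`) is an assumption about this model's output, so adjust `score_col` to match what you actually observe:

```python
def filter_boxes(dets, score_thresh=0.5, score_col=1):
    # dets: rows shaped e.g. [class_id, score, x1, y1, x2, y2, ...]
    # (assumed layout; pick score_col to match the real model output)
    return [row for row in dets if row[score_col] >= score_thresh]

dets = [
    [0, 0.95, 10, 10, 100, 50, 0, 0],
    [1, 0.12, 20, 20, 80, 90, 0, 0],   # dropped: below threshold
    [2, 0.60, 5, 60, 200, 120, 0, 0],
]
kept = filter_boxes(dets)
assert [row[0] for row in kept] == [0, 2]
```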
GreatV
left a comment
We may need to incorporate additional unit tests for the `index_put` function, add the registration in `paddle2onnx/mappers_registry.h.in`, and update the copyright year in the newly appended files to 2026.
```
@@ -0,0 +1,169 @@
// Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
```

Suggested change:

```suggestion
// Copyright (c) 2026 PaddlePaddle Authors. All Rights Reserved.
```
```
@@ -0,0 +1,46 @@
// Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
```

Suggested change:

```suggestion
// Copyright (c) 2026 PaddlePaddle Authors. All Rights Reserved.
```
Pull request overview
This pull request adds support for converting PaddlePaddle's PP-DocLayoutV2 model to ONNX format. The implementation builds upon work from predict-woo but includes significant modifications and additional fixes needed for macOS compatibility.
Key changes:
- Updated PaddlePaddle dependency from development version to stable 3.0.0
- Added new `index_put` operation mapper to handle tensor indexing operations
- Enhanced existing tensor operation mappers (stack, squeeze2, slice, set_value) to properly handle PIR mode and edge cases
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| pyproject.toml | Updated paddlepaddle to stable 3.0.0 release and bumped minimum onnx version to 1.16.1 |
| paddle2onnx/mapper/tensor/stack.cc | Added logic to normalize mixed-rank inputs (scalars and single-element tensors) before stacking |
| paddle2onnx/mapper/tensor/squeeze2.cc | Added optimization to skip squeeze operation when target dimensions are not squeezable |
| paddle2onnx/mapper/tensor/slice.cc | Added early return for PIR mode when decrease_axis is present to avoid shape comparison failures |
| paddle2onnx/mapper/tensor/set_value.cc | Added support for PIR mode set_value_with_tensor operations with empty axes and alternative input names |
| paddle2onnx/mapper/tensor/index_put.h | New header file defining the IndexPutMapper class for handling index_put operations |
| paddle2onnx/mapper/tensor/index_put.cc | New implementation supporting both boolean mask and integer indexing with optional accumulation |
paddle2onnx/mapper/tensor/stack.cc (Outdated)

```cpp
      continue;
    } else if (x_info[i].Rank() == 1) {
      // Check if it's exactly [1] not [4] or other sizes
      if (x_info[i].shape.size() > 0 && x_info[i].shape[0] == 1) {
```
The logic for detecting single-element tensors has a potential issue. At line 47, the code checks `x_info[i].shape.size() > 0`, but for a rank-1 tensor, `shape.size()` is always equal to 1 (the rank), not merely > 0. This condition will always be true for rank-1 tensors. The intent seems correct, but the check is redundant since we already know `Rank() == 1` at this point.
```suggestion
      if (x_info[i].shape[0] == 1) {
```
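The intent of this branch, as I read it, is to normalize mixed-rank inputs before stacking: promote rank-0 scalars to shape `[1]` so that every input shares the same rank. A rough Python analogue of that normalization (a sketch of the idea, not the C++ mapper itself):

```python
def normalize_for_stack(inputs):
    # Promote scalar inputs to single-element lists so that all inputs
    # share rank 1 before stacking; rank-1 inputs pass through unchanged.
    return [x if isinstance(x, list) else [x] for x in inputs]

assert normalize_for_stack([1.0, [2.0], 3.0]) == [[1.0], [2.0], [3.0]]
```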
PP-DocLayoutV2.onnx will encounter an error when loaded with OpenVINO, while pp-doclayoutv2.onnx works normally.

Edit: here is my doclayout-openvino test script:
Hi @hzkitty,
Hi @hzkitty, this revealed an issue with macOS and paddle2onnx. I tried exporting the ONNX model on my Linux machine and your script runs successfully. Try this model: PP-DocLayoutV2.onnx (exported in a Linux environment). Output of
pyproject.toml

```diff
     "cmake>=3.16",
     "setuptools-scm",
-    "paddlepaddle==3.0.0",
+    "paddlepaddle==3.0.0.dev20250426",
```

Not sure why, but the Windows build requires `paddlepaddle==3.0.0.dev20250426` in the PR check workflow. For local development on macOS and Linux, changing to 3.0.0 or 3.1.0 is fine for `pip install -e .` to run successfully.
```
%PY_CMD% -m pip install tqdm filelock
%PY_CMD% -m pip install onnx==1.16.0 onnxruntime==1.19.0
%PY_CMD% -m pip install six hypothesis
%PY_CMD% -m pip install --pre paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/
```

My guess is that this used to work when `paddlepaddle==3.0.0.dev20250426` was the newest dev version, but broke when newer versions of paddlepaddle were released.
Hi @GreatV, I made edits according to your suggestions. Please see the latest commits!
I have created a new release which includes this PR. Please give it a try:


Credit to user predict-woo, but their code does not work out of the box; I needed to make more changes to get it working. Here are detailed instructions on how to set up and run the conversion. I am working on macOS, but if anyone else is able to test it out on Windows or Linux, I welcome your input.
Setup steps (macOS):
(Note: I tried using `pip install -e .`, but this results in an incorrect working directory when running paddle2onnx from the terminal.)

My environment package versions:
Test DocLayoutV2 conversion:
Test ONNX model:
On macOS, the exported model has a lot of unused weight initializers, so I also recommend optimizing the model via `onnxoptimizer` to avoid warnings and reduce model size. This issue is not present on Linux for some reason (I tested on Zorin OS).

On macOS, the paddle2onnx conversion produces these warnings, but the model is still functional: