
Add support for PP-DocLayoutV2 #1619

Merged
jzhang533 merged 4 commits into PaddlePaddle:develop from alex-dinh:develop
Jan 12, 2026

Conversation

@alex-dinh
Contributor

@alex-dinh alex-dinh commented Jan 5, 2026

Credit to user predict-woo, but their code does not work out of the box; I needed to make more changes to get it working. Here are detailed instructions on how to set up and run the conversion. I am working on macOS, but if anyone else is able to test it out on Windows or Linux, I welcome your input.

Setup steps (macOS):

brew install protobuf (if not installed already)
git clone https://github.com/alex-dinh/Paddle2ONNX
cd Paddle2ONNX
git submodule init
git submodule update
pip install .  

(Note: I tried pip install -e ., but this results in an incorrect working directory when running paddle2onnx from the terminal.)

My environment package versions:

paddleocr==3.3.2
paddlepaddle==3.0.0                  
paddlex==3.1.3                  
onnx==1.17.0                                 
onnxoptimizer==0.3.13                 
onnxruntime==1.22.1     

Test DocLayoutV2 conversion:

cd <path to DocLayoutV2 folder>
paddle2onnx --model_dir PP-DocLayoutV2 \
            --model_filename inference.json \
            --params_filename inference.pdiparams \
            --save_file PP-DocLayoutV2/inference.onnx

Test ONNX model:

import onnxruntime as ort
import cv2
import numpy as np

def preprocess_image_doclayout(image, target_input_size=(800, 800)):
    """
    Preprocessing for DocLayoutV2 with 800x800 input
    """
    # Get original dimensions
    orig_h, orig_w = image.shape[:2]

    # Resize, do not maintain aspect ratio
    target_h, target_w = target_input_size
    scale_h = target_h / orig_h
    scale_w = target_w / orig_w

    new_h, new_w = int(orig_h * scale_h), int(orig_w * scale_w)
    resized = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_LINEAR)

    # Convert BGR to RGB
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)

    # Scale to [0, 1] before applying ImageNet normalization
    input_blob = rgb.astype(np.float32) / 255.0

    # ImageNet normalization
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    input_blob = (input_blob - mean) / std

    # Transpose to CHW format and add batch dimension
    input_blob = input_blob.transpose(2, 0, 1)[np.newaxis, ...]

    return input_blob, scale_h, scale_w

def paddle_onnx_doclayout():
    model = ort.InferenceSession('./PP-DocLayoutV2/inference.onnx')  # Specify onnx path
    input_names = [i.name for i in model.get_inputs()]
    output_names = [o.name for o in model.get_outputs()]

    image = cv2.imread('test_input.png')  # Specify test input image
    input_blob, scale_h, scale_w = preprocess_image_doclayout(image)
    print(scale_h, scale_w)
    # im_shape and scale_factor are (batch, 2) float32 tensors
    im_shape = np.array([[800, 800]], dtype=np.float32)
    scale_factor = np.array([[scale_h, scale_w]], dtype=np.float32)
    input_feed = {input_names[0]: im_shape,
                  input_names[1]: input_blob,
                  input_names[2]: scale_factor}

    output = model.run(output_names, input_feed)[0]  # shape: (300, 8)
    # First 6 values of each row are [label_index, score, xmin, ymin, xmax, ymax]
    print(output[0])

    # Filter out low-confidence boxes
    boxes = output[output[:, 1] > 0.5]
    print(boxes)

if __name__ == '__main__':
    paddle_onnx_doclayout()
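As a standalone illustration of the filtering step at the end of the script (the rows below are dummy values, not real model output; columns follow the [label_index, score, xmin, ymin, xmax, ymax, ...] layout described above):

```python
import numpy as np

# Dummy (5, 8) detections standing in for the model's (300, 8) output.
# Columns: [label_index, score, xmin, ymin, xmax, ymax, extra, extra]
out = np.array([
    [22, 0.98, 129, 980, 1008, 1389, 0, 0],
    [15, 0.30, 10, 10, 50, 50, 0, 0],
    [17, 0.95, 132, 157, 910, 193, 0, 0],
    [15, 0.10, 0, 0, 5, 5, 0, 0],
    [11, 0.93, 941, 882, 1001, 909, 0, 0],
], dtype=np.float32)

# Keep rows whose confidence (column 1) exceeds 0.5
boxes = out[out[:, 1] > 0.5]
print(boxes[:, 0].tolist())  # → [22.0, 17.0, 11.0]
```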

On macOS, the exported model contains many unused weight initializers, so I also recommend optimizing the model with onnxoptimizer to avoid warnings and reduce model size. For some reason this issue is not present on Linux (I tested on Zorin OS).

python -m onnxoptimizer inference.onnx inference_optimized.onnx

On macOS, the paddle2onnx conversion produces these warnings, but the model is still functional:

2026-01-06 15:52:59 [WARNING]	Fail to fold onnx model with error: [ShapeInferenceError] (op_type:BatchNormalization, node name: BatchNormalization.1): [TypeInferenceError] Input 0 expected to have type but instead is null. Skip folding.
2026-01-06 15:53:00 [INFO]	ONNX model saved in PP-DocLayoutV2/inference.onnx.

@paddle-bot

paddle-bot bot commented Jan 5, 2026

Thanks for your contribution!

@CLAassistant

CLAassistant commented Jan 5, 2026

CLA assistant check
All committers have signed the CLA.

@alex-dinh
Contributor Author

Looks like even though converting DocLayoutV2 to ONNX works, the changes break some tests involving the strided_slice and stack ops. Marking this as a draft for now.

@alex-dinh alex-dinh marked this pull request as draft January 6, 2026 18:38
@alex-dinh alex-dinh marked this pull request as ready for review January 6, 2026 19:58
Contributor Author

Modifying this file from the version prior to this PR (3e77ec7) was unnecessary, reverted it.

@alex-dinh
Contributor Author

@zhangbo9674 Do you know why the Windows build is failing? There seems to be some missing DLL in the test environment: https://github.com/PaddlePaddle/Paddle2ONNX/actions/runs/20767660339/job/59637230828?pr=1619#step:10:224

@GreatV
Collaborator

GreatV commented Jan 7, 2026

paddle2onnx --model_dir ../PaddleOCR-VL/PP-DocLayoutV2/ --model_filename inference.pdmodel --params_filename inference.pdiparams --save_file pp-doclayoutv2.ooonnx
2026-01-07 03:14:53 [WARNING]   The .pdmodel file is deprecated in paddlepaddle 3.0 and will be removed in the future. Try to convert from .pdmodel file to json file.
I0107 03:14:53.903254 2728575 program_interpreter.cc:257] New Executor is Running.
[Paddle2ONNX] Start parsing the Paddle model file...
[Paddle2ONNX] Use opset_version = 17 for ONNX export.


--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   paddle2onnx::Export(char const*, char const*, char**, int*, int, bool, bool, bool, bool, bool, paddle2onnx::CustomOp*, int, char const*, char**, int*, char const*, bool*, bool, char**, int)

----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1767755694 (unix time) try "date -d @1767755694" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x0) received by PID 2728575 (TID 0x7655a5d0d140) from PID 0 ***]

Segmentation fault (core dumped)
paddle2onnx --model_dir PP-DocLayoutV2_infer/ --model_filename inference.json --params_filename inference.pdiparams --save_file pp-doclayoutv2.onnx
[Paddle2ONNX] Start parsing the Paddle model file...
[Paddle2ONNX] Use opset_version = 17 for ONNX export.
[Paddle2ONNX] PaddlePaddle model is exported as ONNX format now.
2026-01-07 04:57:16 [INFO]      Try to perform constant folding on the ONNX model with Polygraphy.
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[I] Folding Constants | Pass 1
2026-01-07 04:57:16.696936887 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze.277
[W] colored module is not installed, will not use colors when logging. To enable colors, please install the colored module: python3 -m pip install colored
[W] It looks like this model contains foldable nodes that produce large outputs.
In order to avoid bloating the model, you may want to set a constant-folding size threshold.
Note: Large tensors and their corresponding sizes were: {'Mul.204': '1 MiB'}
[W] Falling back to `onnx.shape_inference` because `onnxruntime.tools.symbolic_shape_infer` either could not be loaded or did not run successfully.
    Note that using ONNX-Runtime for shape inference may be faster and require less memory.
    Consider installing ONNX-Runtime or setting POLYGRAPHY_AUTOINSTALL_DEPS=1 in your environment variables to allow Polygraphy to do so automatically.
[I]     Total Nodes | Original:  8512, After Folding:  3767 |  4745 Nodes Folded
[I] Folding Constants | Pass 2
[I]     Total Nodes | Original:  3767, After Folding:  3753 |    14 Nodes Folded
[I] Folding Constants | Pass 3
[I]     Total Nodes | Original:  3753, After Folding:  3753 |     0 Nodes Folded
2026-01-07 04:57:31 [INFO]      ONNX model saved in pp-doclayoutv2.onnx.

@zhaohb

zhaohb commented Jan 7, 2026

@GreatV Hi can you share your onnx model?
Thank you.

@GreatV
Collaborator

GreatV commented Jan 7, 2026

Why is there such a large discrepancy between the output and Paddle Inference?

#!/usr/bin/env python3
from __future__ import annotations

import argparse
from pathlib import Path

import numpy as np


def _parse_int_list(csv: str) -> list[int]:
    return [int(x.strip()) for x in csv.split(",") if x.strip()]


def _add_bool_arg(
    parser: argparse.ArgumentParser, name: str, default: bool, help_text: str
) -> None:
    dest = name.lstrip("-").replace("-", "_")
    group = parser.add_mutually_exclusive_group(required=False)
    group.add_argument(name, dest=dest, action="store_true", help=help_text)
    group.add_argument(
        f"--no_{dest}", dest=dest, action="store_false", help=f"Disable: {help_text}"
    )
    parser.set_defaults(**{dest: default})


def export_onnx(
    model_dir: Path,
    model_filename: str,
    params_filename: str,
    onnx_path: Path,
    opset_version: int,
    auto_update_opset: bool,
    enable_onnx_checker: bool,
    optimize_tool: str,
    verbose: bool,
) -> None:
    import paddle2onnx

    model_file = model_dir / model_filename
    params_file = model_dir / params_filename
    if not model_file.exists():
        raise FileNotFoundError(f"model file not found: {model_file}")
    if not params_file.exists():
        raise FileNotFoundError(f"params file not found: {params_file}")

    onnx_path.parent.mkdir(parents=True, exist_ok=True)
    paddle2onnx.export(
        str(model_file),
        str(params_file),
        str(onnx_path),
        opset_version=opset_version,
        auto_upgrade_opset=auto_update_opset,
        verbose=verbose,
        enable_onnx_checker=enable_onnx_checker,
        optimize_tool=optimize_tool,
    )


def build_paddle_predictor(
    model_dir: Path,
    model_filename: str,
    params_filename: str,
    disable_mkldnn: bool,
    disable_ir_optim: bool,
):
    import paddle.inference as paddle_infer

    model_file = model_dir / model_filename
    params_file = model_dir / params_filename
    config = paddle_infer.Config(str(model_file), str(params_file))
    config.disable_gpu()
    if disable_mkldnn:
        config.disable_mkldnn()
    if disable_ir_optim:
        config.switch_ir_optim(False)
    return paddle_infer.create_predictor(config)


def build_ort_session(onnx_path: Path, disable_ort_optim: bool):
    import onnxruntime as ort

    sess_options = ort.SessionOptions()
    if disable_ort_optim:
        sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
    return ort.InferenceSession(
        str(onnx_path),
        sess_options=sess_options,
        providers=["CPUExecutionProvider"],
    )


def generate_inputs(
    batch: int,
    seed: int,
    height: int,
    width: int,
    repeat_first: bool,
) -> dict[str, np.ndarray]:
    rng = np.random.default_rng(seed)
    if repeat_first:
        base = rng.standard_normal((1, 3, height, width)).astype(np.float32)
        image = np.repeat(base, batch, axis=0)
    else:
        image = rng.standard_normal((batch, 3, height, width)).astype(np.float32)

    im_shape = np.tile(np.array([[float(height), float(width)]], dtype=np.float32), (batch, 1))
    scale_factor = np.tile(np.array([[1.0, 1.0]], dtype=np.float32), (batch, 1))
    return {"image": image, "im_shape": im_shape, "scale_factor": scale_factor}


def main() -> int:
    parser = argparse.ArgumentParser(
        description="Compare Paddle Inference vs ONNXRuntime for PP-DocLayoutV2."
    )
    parser.add_argument("--model_dir", type=Path, required=True)
    parser.add_argument("--model_filename", type=str, default="inference.json")
    parser.add_argument("--params_filename", type=str, default="inference.pdiparams")
    parser.add_argument("--onnx_path", type=Path, required=True)
    parser.add_argument("--export_onnx", action="store_true")
    parser.add_argument("--opset_version", type=int, default=17)
    _add_bool_arg(
        parser,
        "--auto_update_opset",
        default=True,
        help_text="Auto update ONNX opset",
    )
    _add_bool_arg(
        parser,
        "--enable_onnx_checker",
        default=False,
        help_text="Run ONNX checker",
    )
    parser.add_argument("--optimize_tool", type=str, default="None")
    parser.add_argument("--verbose", action="store_true")

    parser.add_argument("--batches", type=_parse_int_list, default=[1, 2, 4, 8])
    parser.add_argument("--seed", type=int, default=20260107)
    parser.add_argument("--height", type=int, default=800)
    parser.add_argument("--width", type=int, default=800)
    parser.add_argument("--repeat_first", action="store_true")
    parser.add_argument("--atol", type=float, default=1e-4)
    parser.add_argument("--rtol", type=float, default=1e-4)
    parser.add_argument("--show_max_loc", action="store_true")

    _add_bool_arg(
        parser,
        "--disable_mkldnn",
        default=True,
        help_text="Disable Paddle MKLDNN",
    )
    _add_bool_arg(
        parser,
        "--disable_ir_optim",
        default=True,
        help_text="Disable Paddle IR optim",
    )
    _add_bool_arg(
        parser,
        "--disable_ort_optim",
        default=True,
        help_text="Disable ORT graph optim",
    )

    args = parser.parse_args()

    if args.export_onnx or not args.onnx_path.exists():
        export_onnx(
            model_dir=args.model_dir,
            model_filename=args.model_filename,
            params_filename=args.params_filename,
            onnx_path=args.onnx_path,
            opset_version=args.opset_version,
            auto_update_opset=args.auto_update_opset,
            enable_onnx_checker=args.enable_onnx_checker,
            optimize_tool=args.optimize_tool,
            verbose=args.verbose,
        )

    predictor = build_paddle_predictor(
        model_dir=args.model_dir,
        model_filename=args.model_filename,
        params_filename=args.params_filename,
        disable_mkldnn=args.disable_mkldnn,
        disable_ir_optim=args.disable_ir_optim,
    )
    sess = build_ort_session(args.onnx_path, disable_ort_optim=args.disable_ort_optim)

    ort_output_names = [o.name for o in sess.get_outputs()]
    pd_output_names = predictor.get_output_names()

    common_outputs = [name for name in ort_output_names if name in set(pd_output_names)]
    if not common_outputs:
        raise RuntimeError(
            f"No common outputs between Paddle({pd_output_names}) and ORT({ort_output_names})"
        )

    print("Paddle inputs:", predictor.get_input_names())
    print("Paddle outputs:", pd_output_names)
    print("ORT inputs:", [i.name for i in sess.get_inputs()])
    print("ORT outputs:", ort_output_names)
    print("Compare outputs:", common_outputs)
    print()

    for batch in args.batches:
        inputs = generate_inputs(
            batch=batch,
            seed=args.seed + batch,
            height=args.height,
            width=args.width,
            repeat_first=args.repeat_first,
        )

        # Paddle
        for name in predictor.get_input_names():
            if name not in inputs:
                raise RuntimeError(f"Missing input '{name}' in generated inputs: {list(inputs)}")
            arr = inputs[name]
            h = predictor.get_input_handle(name)
            h.reshape(arr.shape)
            h.copy_from_cpu(arr)
        predictor.run()
        pd_outputs = {name: predictor.get_output_handle(name).copy_to_cpu() for name in pd_output_names}

        # ORT
        ort_outputs_list = sess.run(None, inputs)
        if len(ort_output_names) != len(ort_outputs_list):
            raise RuntimeError(
                f"ORT outputs mismatch: names={len(ort_output_names)} values={len(ort_outputs_list)}"
            )
        ort_outputs = dict(zip(ort_output_names, ort_outputs_list))

        print(f"batch={batch} repeat_first={args.repeat_first} seed={args.seed + batch}")
        for name in common_outputs:
            pd = pd_outputs[name]
            ort = ort_outputs[name]
            same_shape = pd.shape == ort.shape
            same_dtype = pd.dtype == ort.dtype
            print(f"  {name}: shape {pd.shape} vs {ort.shape} match={same_shape} dtype {pd.dtype} vs {ort.dtype} match={same_dtype}")
            if not same_shape:
                continue
            if np.issubdtype(pd.dtype, np.floating) and np.issubdtype(ort.dtype, np.floating):
                diff = pd - ort
                absdiff = np.abs(diff)
                max_abs = float(absdiff.max())
                mean_abs = float(absdiff.mean())
                allclose = bool(np.allclose(pd, ort, atol=args.atol, rtol=args.rtol))
                print(f"    max_abs={max_abs} mean_abs={mean_abs} allclose(atol={args.atol},rtol={args.rtol})={allclose}")
                if pd.ndim == 2 and pd.shape[1] <= 64:
                    print(f"    per_col_max={absdiff.max(axis=0)}")
                if args.show_max_loc:
                    max_idx = np.unravel_index(np.argmax(absdiff), absdiff.shape)
                    print(f"    max_loc={max_idx} pd={pd[max_idx]} ort={ort[max_idx]}")
            else:
                equal = bool(np.array_equal(pd, ort))
                print(f"    equal={equal}")
        print()

    return 0


if __name__ == "__main__":
    raise SystemExit(main())

python debug/compare_paddle_ort_ppdoclayoutv2.py --model_dir PP-DocLayoutV2_infer --onnx_path pp-doclayoutv2.onnx
--- Running PIR pass [add_shadow_output_after_dead_parameter_pass]
--- Running PIR pass [delete_quant_dequant_linear_op_pass]
--- Running PIR pass [delete_weight_dequant_linear_op_pass]
--- Running PIR pass [transfer_layout_pass]
--- Running PIR pass [common_subexpression_elimination_pass]
I0107 06:40:56.880336 2811109 print_statistics.cc:50] --- detected [870] subgraphs!
--- Running PIR pass [constant_folding_pass]
I0107 06:40:56.881546 2811109 pir_interpreter.cc:1601] New Executor is Running ...
I0107 06:40:56.881672 2811109 pir_interpreter.cc:1625] pir interpreter is running by multi-thread mode ...
I0107 06:40:56.932209 2811109 print_statistics.cc:44] --- detected [165, 2907] subgraphs!
--- Running PIR pass [dead_code_elimination_pass]
I0107 06:40:56.933084 2811109 print_statistics.cc:50] --- detected [54] subgraphs!
--- Running PIR pass [replace_fetch_with_shadow_output_pass]
I0107 06:40:56.933688 2811109 print_statistics.cc:50] --- detected [2] subgraphs!
--- Running PIR pass [remove_shadow_feed_pass]
--- Running PIR pass [inplace_pass]
I0107 06:40:57.141657 2811109 print_statistics.cc:50] --- detected [676] subgraphs!
I0107 06:40:57.142059 2811109 analysis_predictor.cc:1217] ======= pir optimization completed =======
Paddle inputs: ['im_shape', 'image', 'scale_factor']
Paddle outputs: ['fetch_name_0', 'fetch_name_1']
ORT inputs: ['im_shape', 'image', 'scale_factor']
ORT outputs: ['fetch_name_0', 'fetch_name_1']
Compare outputs: ['fetch_name_0', 'fetch_name_1']

I0107 06:40:57.943626 2811109 pir_interpreter.cc:1622] pir interpreter is running by trace mode ...
batch=1 repeat_first=False seed=20260108
  fetch_name_0: shape (300, 8) vs (300, 8) match=True dtype float32 vs float32 match=True
    max_abs=291.0 mean_abs=39.715877532958984 allclose(atol=0.0001,rtol=0.0001)=False
    per_col_max=[0.0000000e+00 1.1026859e-06 4.3411255e-03 1.2893677e-03 7.9345703e-04
 1.6479492e-03 2.9100000e+02 2.9100000e+02]
  fetch_name_1: shape (1,) vs (1,) match=True dtype int32 vs int32 match=True
    equal=True

batch=2 repeat_first=False seed=20260109
  fetch_name_0: shape (600, 8) vs (600, 8) match=True dtype float32 vs float32 match=True
    max_abs=291.0 mean_abs=39.5771369934082 allclose(atol=0.0001,rtol=0.0001)=False
    per_col_max=[0.0000000e+00 2.3014843e-05 1.4595032e-02 1.6174316e-03 7.9040527e-03
 1.5258789e-03 2.9100000e+02 2.9100000e+02]
  fetch_name_1: shape (2,) vs (2,) match=True dtype int32 vs int32 match=True
    equal=True

batch=4 repeat_first=False seed=20260111
  fetch_name_0: shape (1200, 8) vs (1200, 8) match=True dtype float32 vs float32 match=True
    max_abs=291.0 mean_abs=39.563392639160156 allclose(atol=0.0001,rtol=0.0001)=False
    per_col_max=[0.0000000e+00 6.0498714e-06 1.7265320e-02 2.3117065e-03 3.7841797e-03
 5.0048828e-03 2.9100000e+02 2.9100000e+02]
  fetch_name_1: shape (4,) vs (4,) match=True dtype int32 vs int32 match=True
    equal=True

batch=8 repeat_first=False seed=20260115
  fetch_name_0: shape (2400, 8) vs (2400, 8) match=True dtype float32 vs float32 match=True
    max_abs=291.0 mean_abs=39.15203094482422 allclose(atol=0.0001,rtol=0.0001)=False
    per_col_max=[0.0000000e+00 1.8328428e-06 1.6151428e-02 4.8522949e-03 8.9721680e-03
 6.7138672e-03 2.9100000e+02 2.9100000e+02]
  fetch_name_1: shape (8,) vs (8,) match=True dtype int32 vs int32 match=True
    equal=True

@zhaohb

zhaohb commented Jan 7, 2026

I'm also encountering this problem. The ONNX model outputs zero coordinates, while the classification results and scores seem accurate.

@GreatV
Collaborator

GreatV commented Jan 7, 2026

Hi, @zhaohb You might want to try the version I exported myself, which differs slightly from the current PR implementation. pp-doclayoutv2.onnx

@alex-dinh
Contributor Author

alex-dinh commented Jan 7, 2026

Hi, @zhaohb You might want to try the version I exported myself, which differs slightly from the current PR implementation. pp-doclayoutv2.onnx

Hi, what changes did you need to make, and what OS are you using?

Also, here is the onnx model I exported using the changes in this PR: PP-DocLayoutV2.onnx

@GreatV
Collaborator

GreatV commented Jan 7, 2026

Hi @alex-dinh, I added tie-break logic for argsort.
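For readers unfamiliar with the issue: an argsort without a tie-break can order equal scores differently across backends, which shuffles detection rows. A numpy sketch of the idea (illustrative only, not the actual C++/ONNX graph change in the export):

```python
import numpy as np

scores = np.array([0.9, 0.5, 0.9, 0.7], dtype=np.float32)

# np.lexsort treats the LAST key as the primary sort key, so this sorts by
# descending score and breaks ties deterministically by original index.
idx = np.lexsort((np.arange(len(scores)), -scores))
print(idx.tolist())  # → [0, 2, 3, 1]
```

A stable descending argsort (np.argsort(-scores, kind="stable")) gives the same result; the point is that the tie-break is made explicit rather than left to the backend.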

@zhaohb

zhaohb commented Jan 7, 2026

Hi, @zhaohb You might want to try the version I exported myself, which differs slightly from the current PR implementation. pp-doclayoutv2.onnx

Hi, what changes did you need to make, and what OS are you using?

Also, here is the onnx model I exported using the changes in this PR: PP-DocLayoutV2.onnx

Hi @alex-dinh
I've verified the ONNX model you provided, and the results are as expected. Thank you very much.

@zhaohb

zhaohb commented Jan 7, 2026

Hi, @zhaohb You might want to try the version I exported myself, which differs slightly from the current PR implementation. pp-doclayoutv2.onnx

Hi, what changes did you need to make, and what OS are you using?
Also, here is the onnx model I exported using the changes in this PR: PP-DocLayoutV2.onnx

Hi @alex-dinh I've verified the ONNX model you provided, and the results are as expected. Thank you very much.

Just to clarify: we're still seeing some discrepancies in the coordinate outputs between the ONNX and Paddle inference results. Have you noticed any differences in your tests? @alex-dinh

@alex-dinh
Contributor Author

alex-dinh commented Jan 7, 2026

Just to clarify: we're still seeing some discrepancies in the coordinate outputs between the ONNX and Paddle inference results. Have you noticed any differences in your tests? @alex-dinh

Hi @zhaohb, I see some discrepancies, but they are very minor. The PaddleX library also applies some extra postprocessing (such as unclipping and merging) that I do not apply to the ONNX model output, which would explain the difference. However, the results from my local testing are very similar. I am only filtering out low-confidence boxes from the ONNX model output.

# ONNX inference results
cls_id	score	xmin	ymin	xmax	ymax
22		0.988	129.06	980.56	1008.02	1389.72
22		0.981	129.53	676.27	1005.06	758.30
22		0.985	129.69	760.43	1006.29	869.83
22		0.986	129.77	461.66	1007.36	595.17
22		0.988	129.90	215.78	1006.80	459.33
22		0.967	130.72	923.87	1004.14	977.11
17		0.955	132.66	157.30	910.44	193.03
15		0.845	134.74	242.61	219.08	268.67
16		0.905	134.82	87.82	172.78	108.21
5		0.970	186.75	602.91	432.31	666.96
5		0.946	187.07	880.80	411.39	910.06
15		0.600	198.66	678.32	316.77	700.77
15		0.759	205.82	352.43	272.44	376.67
15		0.920	327.63	297.45	481.57	321.51
15		0.926	357.26	923.21	492.15	949.41
15		0.842	502.21	269.64	584.47	294.52
15		0.918	536.70	294.97	614.41	319.93
12		0.950	655.48	86.78	1004.64	110.35
15		0.930	781.65	488.08	916.63	513.95
15		0.921	835.40	323.57	996.94	348.95
15		0.920	850.75	242.72	954.85	267.82
11		0.934	941.74	882.36	1001.35	909.73
11		0.934	941.85	621.43	1001.59	648.19

# Paddle inference results
cls_id	score	xmin	ymin	xmax	ymax
22		0.988	129.33	981.08	1007.87	1390.29
22		0.982	129.86	676.31	1005.09	758.65
22		0.985	130.17	761.11	1006.52	870.00
22		0.987	130.20	461.91	1007.33	596.17
22		0.989	130.42	216.08	1006.34	459.67
22		0.968	130.83	923.80	1005.02	978.05
17		0.954	132.14	157.31	911.11	193.37
16		0.890	134.67	87.14	172.41	108.45
15		0.910	134.95	242.78	218.56	269.02
5		0.945	187.28	880.58	411.67	910.64
5		0.969	187.43	603.18	432.71	667.15
15		0.559	198.57	678.35	316.43	700.74
15		0.774	206.47	352.40	271.67	377.55
15		0.872	328.40	296.68	481.14	322.20
15		0.934	356.49	922.29	492.36	950.71
15		0.866	501.80	269.53	583.92	294.96
15		0.922	537.86	294.66	613.58	320.30
12		0.952	656.88	86.38	1004.87	110.47
15		0.936	782.03	488.54	916.38	514.59
15		0.936	835.46	323.44	995.74	349.93
15		0.927	851.37	242.47	954.60	268.37
11		0.935	941.90	882.26	1001.40	910.11
11		0.936	941.94	621.05	1001.64	649.50

Here is the image I ran my test on:

sb_policy_approx.png

Edit: here is my doclayout test script:
doclayout.py

Collaborator

@GreatV GreatV left a comment


We may need to incorporate additional unit tests for the index_put function, add the registration in paddle2onnx/mappers_registry.h.in, and update the copyright year in the newly appended files to 2026.

@@ -0,0 +1,169 @@
// Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
Collaborator

Suggested change
// Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
// Copyright (c) 2026 PaddlePaddle Authors. All Rights Reserved.

@@ -0,0 +1,46 @@
// Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
Collaborator

Suggested change
// Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
// Copyright (c) 2026 PaddlePaddle Authors. All Rights Reserved.


Copilot AI left a comment


Pull request overview

This pull request adds support for converting PaddlePaddle's PP-DocLayoutV2 model to ONNX format. The implementation builds upon work from predict-woo but includes significant modifications and additional fixes needed for macOS compatibility.

Key changes:

  • Updated PaddlePaddle dependency from development version to stable 3.0.0
  • Added new index_put operation mapper to handle tensor indexing operations
  • Enhanced existing tensor operation mappers (stack, squeeze2, slice, set_value) to properly handle PIR mode and edge cases

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File | Description
pyproject.toml | Updated paddlepaddle to stable 3.0.0 release and bumped minimum onnx version to 1.16.1
paddle2onnx/mapper/tensor/stack.cc | Added logic to normalize mixed-rank inputs (scalars and single-element tensors) before stacking
paddle2onnx/mapper/tensor/squeeze2.cc | Added optimization to skip the squeeze operation when target dimensions are not squeezable
paddle2onnx/mapper/tensor/slice.cc | Added early return for PIR mode when decrease_axis is present to avoid shape comparison failures
paddle2onnx/mapper/tensor/set_value.cc | Added support for PIR mode set_value_with_tensor operations with empty axes and alternative input names
paddle2onnx/mapper/tensor/index_put.h | New header file defining the IndexPutMapper class for handling index_put operations
paddle2onnx/mapper/tensor/index_put.cc | New implementation supporting both boolean mask and integer indexing with optional accumulation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

continue;
} else if (x_info[i].Rank() == 1) {
// Check if it's exactly [1] not [4] or other sizes
if (x_info[i].shape.size() > 0 && x_info[i].shape[0] == 1) {
Copilot AI Jan 7, 2026

The logic for detecting single-element tensors has a potential issue. At line 47, the code checks x_info[i].shape.size() > 0, but for a rank-1 tensor, shape.size() is always equal to 1 (the rank), not necessarily > 0. This condition will always be true for rank-1 tensors. The intent seems correct but the check is redundant since we already know Rank() == 1 at this point.

Suggested change
if (x_info[i].shape.size() > 0 && x_info[i].shape[0] == 1) {
if (x_info[i].shape[0] == 1) {

@hzkitty

hzkitty commented Jan 8, 2026

Hi, @zhaohb You might want to try the version I exported myself, which differs slightly from the current PR implementation. pp-doclayoutv2.onnx

Hi, what changes did you need to make, and what OS are you using?

Also, here is the onnx model I exported using the changes in this PR: PP-DocLayoutV2.onnx

PP-DocLayoutV2.onnx will encounter an error when loaded with OpenVINO, while pp-doclayoutv2.onnx works normally.

Edit: here is my doclayout-openvino test script:
doclayout-openvino.py

@alex-dinh

@zhaohb

zhaohb commented Jan 8, 2026

Hi, @zhaohb You might want to try the version I exported myself, which differs slightly from the current PR implementation. pp-doclayoutv2.onnx

Hi, what changes did you need to make, and what OS are you using?
Also, here is the onnx model I exported using the changes in this PR: PP-DocLayoutV2.onnx

PP-DocLayoutV2.onnx will encounter an error when loaded with OpenVINO, while pp-doclayoutv2.onnx works normally

Edit: here is my doclayout-openvino test script: doclayout-openvino.py

@alex-dinh

Hi @hzkitty ,
I've just finished preparing the OpenVINO IR model. Please feel free to test it.
https://www.modelscope.cn/models/zhaohb/PP-DocLayoutV2-ov/summary

@alex-dinh
Contributor Author

alex-dinh commented Jan 8, 2026

PP-DocLayoutV2.onnx will encounter an error when loaded with openvino, while pp-doclayoutv2.onnx works normally

Hi @hzkitty, this revealed an issue with macOS and paddle2onnx. I tried exporting the onnx model on my linux machine and your script runs successfully.

Try this model: PP-DocLayoutV2.onnx (Exported in a linux environment)


Output of doclayout-openvino.py using this new PP-DocLayoutV2.onnx:

=== Model Inputs ===
im_shape [?,2]
image [?,3,800,800]
scale_factor [?,2]
=== Model Outputs ===
fetch_name_0 [?,8]
fetch_name_1 [?]
--- DocLayoutV2 OpenVINO Output ---
cls_id	score	xmin	ymin	xmax	ymax
22	0.987	128.88	980.50	1008.50	1389.00
22	0.986	129.75	462.00	1007.00	595.50
22	0.988	130.00	215.62	1006.50	459.50
22	0.984	130.00	760.50	1006.50	869.50
22	0.981	130.00	676.50	1004.00	758.50
22	0.966	130.75	924.00	1003.00	977.00
17	0.955	132.75	157.38	909.50	193.12
15	0.851	134.50	242.62	219.00	268.75
16	0.906	135.00	87.69	172.88	108.12
5	0.970	186.88	603.00	432.50	667.50
5	0.945	187.62	881.00	412.00	909.50
15	0.607	198.50	678.50	316.75	700.50
15	0.764	206.12	352.50	272.50	376.75
15	0.918	327.75	297.00	481.50	321.00
15	0.926	358.00	923.00	492.75	949.00
15	0.844	502.25	269.75	584.50	294.75
15	0.921	536.50	295.50	614.50	320.50
12	0.950	655.50	86.75	1004.00	110.19
15	0.930	781.50	488.25	917.50	514.00
15	0.919	835.00	323.25	997.50	348.75
15	0.919	850.50	242.50	955.00	267.50
11	0.934	942.00	882.50	1001.00	909.50
11	0.932	942.00	621.50	1002.00	648.00

"cmake>=3.16",
"setuptools-scm",
"paddlepaddle==3.0.0",
"paddlepaddle==3.0.0.dev20250426",
Contributor Author

@alex-dinh alex-dinh Jan 8, 2026

Not sure why, but the Windows build requires "paddlepaddle==3.0.0.dev20250426" in the PR check workflow. For local development on macOS and Linux, changing to 3.0.0 or 3.1.0 is fine for pip install -e . to run successfully.

%PY_CMD% -m pip install tqdm filelock
%PY_CMD% -m pip install onnx==1.16.0 onnxruntime==1.19.0
%PY_CMD% -m pip install six hypothesis
%PY_CMD% -m pip install --pre paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/
Contributor Author

My guess is that this used to work when paddlepaddle==3.0.0.dev20250426 was the newest dev version, but broke when newer versions of paddlepaddle were released.

@alex-dinh
Contributor Author

We may need to incorporate additional unit tests for the index_put function, add the registration in paddle2onnx/mappers_registry.h.in, and update the copyright year in the newly appended files to 2026.

Hi @GreatV, I made edits according to your suggestions. Please see the latest commits!

Collaborator

@GreatV GreatV left a comment

LGTM

@GreatV GreatV requested a review from jzhang533 January 10, 2026 01:13
@jzhang533 jzhang533 merged commit 820b83d into PaddlePaddle:develop Jan 12, 2026
5 checks passed
@jzhang533
Collaborator

I have created a new release which includes this PR. Please give it a try:

https://pypi.org/project/paddle2onnx/2.1.0/
