Incorrect bbox positions in the JSON returned by the PaddleOCR-VL model #4694

I call PaddleOCR-VL as follows:

```
from pathlib import Path
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="PaddleOCR-VL.yaml")

input_file = "./datas/xxx.pdf"
output_path = Path("./output/output_xxx")

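# Run the pipeline with orientation classification and unwarping disabled.
output = pipeline.predict(
    input=input_file,
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
)

# Save each result as JSON under output_path.
for res in output:
    res.save_to_json(save_path=output_path)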
```

Inspecting the saved JSON file shows that the bbox positions in the `parsing_res_list` field are wrong. Here is a concrete case.
Part of `parsing_res_list` looks like this:

<img width="1880" height="1360" alt="Image" src="https://github.com/user-attachments/assets/2619c2fd-4fa0-4b45-bca5-97223ed7dc1c" />

The bboxes of entries 1, 2, and 3, visualized on the page image:

<img width="1340" height="236" alt="Image" src="https://github.com/user-attachments/assets/262be83e-4267-4395-8be2-f04ce3ccfd35" />

However, the "content" of the first bbox actually holds the full text of entries 1, 2, and 3, so the text and the box positions no longer match.
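To check how often this happens in a saved result, the entries with empty content can be listed. This is only a rough sketch: the key names `parsing_res_list` and `content` are taken from the description above and may be named differently in the actual JSON, and `load_result` and `find_empty_blocks` are hypothetical helpers, not part of PaddleX.

```python
import json
from pathlib import Path


def load_result(json_path):
    """Load one saved result file (hypothetical helper, not a PaddleX API)."""
    return json.loads(Path(json_path).read_text(encoding="utf-8"))


def find_empty_blocks(result, list_key="parsing_res_list", content_key="content"):
    """Return the indices of blocks whose content is missing or blank.

    The key names are assumptions based on the issue text.
    """
    empty = []
    for index, block in enumerate(result.get(list_key, [])):
        text = str(block.get(content_key) or "").strip()
        if not text:
            empty.append(index)
    return empty


# Made-up sample mimicking the reported case: one merged block
# followed by two leftover blocks with empty content.
sample = {
    "parsing_res_list": [
        {"content": "line 1 line 2 line 3", "bbox": [10, 10, 100, 20]},
        {"content": "", "bbox": [10, 20, 100, 30]},
        {"content": "", "bbox": [10, 30, 80, 40]},
    ]
}
empty_indices = find_empty_blocks(sample)
```

For a real file, `find_empty_blocks(load_result(path_to_json))` would report which entries are affected.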

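My guess is that when text blocks are merged, the bboxes are not merged along with them and the redundant boxes are not deleted. As a result, the content in `parsing_res_list` is indeed merged, but the corresponding bbox is wrong, and the blocks whose text was absorbed are still present, only with empty content.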
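As a stopgap on the user side, the expected behavior could be approximated roughly as below. This is a sketch under strong assumptions, not the pipeline's actual merge logic: it assumes each bbox is `[x_min, y_min, x_max, y_max]`, that the keys are named `content` and `bbox`, and that every empty-content block was absorbed into the nearest preceding block. That last heuristic is only a guess based on the case above and would be wrong for blocks that are legitimately empty. `union_bbox` and `absorb_empty_blocks` are hypothetical helper names.

```python
def union_bbox(boxes):
    """Smallest axis-aligned box covering all [x_min, y_min, x_max, y_max] boxes."""
    x_mins, y_mins, x_maxs, y_maxs = zip(*boxes)
    return [min(x_mins), min(y_mins), max(x_maxs), max(y_maxs)]


def absorb_empty_blocks(blocks, content_key="content", bbox_key="bbox"):
    """Fold each empty-content block into the preceding kept block.

    Assumption: an empty block directly following another block is a
    leftover from text merging. The input list is not modified.
    """
    merged = []
    for block in blocks:
        text = str(block.get(content_key) or "").strip()
        if merged and not text:
            # Grow the previous block's bbox to cover the leftover box,
            # then drop the leftover block itself.
            previous = merged[-1]
            previous[bbox_key] = union_bbox([previous[bbox_key], block[bbox_key]])
        else:
            merged.append(dict(block))
    return merged


# Made-up coordinates mimicking the reported case.
blocks = [
    {"content": "line 1 line 2 line 3", "bbox": [10, 10, 100, 20]},
    {"content": "", "bbox": [10, 20, 100, 30]},
    {"content": "", "bbox": [10, 30, 80, 40]},
    {"content": "next paragraph", "bbox": [10, 50, 100, 60]},
]
fixed = absorb_empty_blocks(blocks)
```

With this sample, the first block's bbox grows to cover all three lines and the two empty blocks disappear, which is what I would expect the pipeline output to look like.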

Please take a look and fix this issue.
