Conversation

@SpongeBob0318
Contributor

No description provided.

@paddle-bot
paddle-bot bot commented Nov 27, 2025

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Nov 27, 2025
layout_parsing_result[key].append(value)

if merge_talble:
    layout_parsing_result["parsing_res_list"] = merge_tables_across_pages(pages)
Contributor Author

Change this to use the same table ID.

else:
    layout_parsing_result["input_path"] = None

    layout_parsing_result["page_index"] = None
Contributor Author

Set width, height, and count to None.

:return: Normalized chapter title string.
"""
level = getattr(block, "title_level", 1)
title = getattr(block, "content", "").rstrip(".")
Contributor Author

Don't strip the trailing ".".

:param title: Original chapter title string.
:return: Normalized chapter title string.
"""
level = getattr(block, "title_level", 1)
Contributor Author

Change this to 2, and remove the # at the start of line 98.


if D > 0:
    bucket = "A"
else:
Contributor Author

Make the names more intuitive; don't use A/B/C.

Contributor Author

Rework the if/else branches for A, B, and C.

B_level = lvl
break

L_phys = phys_map.get(e["height"], 1)
Contributor Author

phys_map[e["height"]]
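
The two lookups differ in what happens when a height is missing from the map. A minimal sketch, using a made-up `phys_map` and entry `e` (both are hypothetical values for illustration; the real map is built elsewhere in the PR):

```python
# Hypothetical data for illustration only.
phys_map = {32: 1, 24: 2}
e = {"height": 18}

# .get() silently falls back to level 1 when the height is unknown.
level_with_default = phys_map.get(e["height"], 1)

# Direct indexing surfaces the missing key as a KeyError instead.
try:
    level_strict = phys_map[e["height"]]
except KeyError:
    level_strict = None
```

If every height is guaranteed to be in the map, direct indexing makes a violated assumption fail loudly rather than quietly assigning level 1.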


def assign_levels_to_parsing_res(parsing_res_list):
"""
parsing_res_list 是一个 LayoutBlock 对象列表
Contributor Author

Write this in English.

return (tables_match or rows_match), soup_prev, soup_curr


def perform_table_merge(soup_prev, soup_curr):
Contributor Author

Merge these together and add comments.

@@ -0,0 +1,193 @@
import json
Contributor Author

This borrows from MinerU (write it with an LLM; add an acknowledgement and an explanation).

Rename variables and functions to something intuitive
Add comments
pre-commit
How the functions are split up

@SpongeBob0318 SpongeBob0318 force-pushed the merge_table_and_title_level_policy branch from f0fbd59 to 5596f01 Compare December 9, 2025 07:24
title_level: Whether to assign title levels
Returns:
LayoutParsingResultV2: Combined parsing result
Collaborator

LayoutParsingResultV2

Comment on lines 1414 to 1430
for key in [
    "input_path",
    "page_count",
    "width",
    "height",
    "doc_preprocessor_res",
    "layout_det_res",
    "region_det_res",
    "overall_ocr_res",
    "table_res_list",
    "seal_res_list",
    "chart_res_list",
    "formula_res_list",
    "imgs_in_doc",
    "model_settings",
]:
    value = single_img_res.get(key, [])
Collaborator

for res_key in single_img_res
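
A rough sketch of what iterating over the result's own keys could look like, instead of maintaining a hard-coded key list. The function name `combine_page_results` and the sample page dicts are assumptions made for this example, not code from the PR:

```python
from collections import defaultdict


def combine_page_results(pages):
    """Collect every key of each per-page dict into a list of per-page values."""
    combined = defaultdict(list)
    for single_img_res in pages:
        # Iterate over whatever keys the page result actually has.
        for res_key in single_img_res:
            combined[res_key].append(single_img_res[res_key])
    return dict(combined)


# Hypothetical per-page results.
pages = [
    {"input_path": "a.pdf", "width": 100},
    {"input_path": "a.pdf", "width": 120, "table_res_list": []},
]
combined = combine_page_results(pages)
```

One trade-off to keep in mind: a key that is missing on some pages ends up with a shorter list here, whereas `.get(key, [])` over a fixed key list appends a placeholder for every page.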

return "".join(result)


# Calculate total columns including colspan and rowspan, accounting for merged cells
Collaborator

Use a consistent comment style.

return max_cols


# Calculate the actual number of columns in a single row
Collaborator

Same as above.

return sum(int(cell.get("colspan", 1)) for cell in row.find_all(["td", "th"]))


# Calculate the visual number of columns in a single row, excluding colspan (merged cells count as one)
Collaborator

Same as above.
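
For reference, the two per-row counts described in the comments above (columns with `colspan` expanded, and visual cells with each merged cell counted once) can be illustrated as follows. This sketch swaps in the standard library's `html.parser` instead of the BeautifulSoup calls the PR uses, and the class name, helper name, and sample table are all made up for the example:

```python
from html.parser import HTMLParser


class RowWidthParser(HTMLParser):
    """Track two per-row counts for every <tr> encountered."""

    def __init__(self):
        super().__init__()
        self.effective = []  # per-row sum of colspan values
        self.visual = []     # per-row cell count, a merged cell counts once

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.effective.append(0)
            self.visual.append(0)
        elif tag in ("td", "th") and self.effective:
            colspan = dict(attrs).get("colspan") or 1
            self.effective[-1] += int(colspan)
            self.visual[-1] += 1


def row_widths(html):
    """Return (effective, visual) column counts for each row in the HTML."""
    parser = RowWidthParser()
    parser.feed(html)
    return parser.effective, parser.visual


html = (
    "<table>"
    "<tr><th colspan='2'>Name</th><th>Qty</th></tr>"
    "<tr><td>a</td><td>b</td><td>3</td></tr>"
    "</table>"
)
effective, visual = row_widths(html)
```

The sketch ignores `rowspan` and nested tables, both of which the total-column calculation would have to account for.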

if any(w in content_u for w in keywords):
    RelativeOrder_level = level
    break
    bucket = "RelativeOrder"
Collaborator

This line looks unreachable?

            bucket = "RelativeOrder"
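
If the assignment sits in the same block directly after `break`, it can never run. A minimal sketch of the reordered loop, assuming a keyword table shaped like `{level: [keywords]}` (the table contents and the lower-case variable name are assumptions for this example):

```python
# Hypothetical keyword table and title text.
keyword_levels = {1: ["CHAPTER"], 2: ["SECTION"]}
content_u = "SECTION 2 RESULTS"

bucket = None
relative_order_level = None
for level, keywords in keyword_levels.items():
    if any(w in content_u for w in keywords):
        relative_order_level = level
        # Assign before `break`; anything after `break` in this block is skipped.
        bucket = "RelativeOrder"
        break
```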

else:
    bucket = "Cluster"

Cluster_level = cluster_map[e["height"]]
Collaborator

Cluster_level -> cluster_level

if block.label not in ("paragraph_title", "doc_title"):
    continue

content = getattr(block, "content", "")
Collaborator

block.content

Comment on lines 234 to 329
if len(entries) == 0:
    return parsing_res_list
Collaborator

Put the edge-case handling first.
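
A small illustration of the guard-clause layout, with the empty-input check placed before the main logic. The function `assign_levels` and its ranking rule (taller titles get a smaller level number) are placeholders invented for this example and are not the PR's clustering logic:

```python
def assign_levels(entries):
    """Hypothetical stand-in: guard clauses first, main logic after."""
    # Edge case handled up front, so the code below can assume non-empty input.
    if not entries:
        return entries

    # Placeholder logic: rank distinct heights, tallest first.
    heights = sorted({e["height"] for e in entries}, reverse=True)
    level_by_height = {h: i + 1 for i, h in enumerate(heights)}
    return [{**e, "level": level_by_height[e["height"]]} for e in entries]


leveled = assign_levels([{"height": 20}, {"height": 30}])
```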

nums += 1
merged_html = perform_table_merge(soup_prev, soup_curr)
prev_block.content = merged_html
curr_block.content = ""
Collaborator

group_id

@SpongeBob0318 SpongeBob0318 force-pushed the merge_table_and_title_level_policy branch 3 times, most recently from b4d844c to d5683c0 Compare December 15, 2025 09:32
@TingquanGao TingquanGao reopened this Dec 18, 2025
@SpongeBob0318 SpongeBob0318 force-pushed the merge_table_and_title_level_policy branch from 6a8db90 to 773a78d Compare December 18, 2025 06:53
2 participants