Open
Labels: bug (Something isn't working), html (issue related to html backend)
Description
Bug
As a follow-up to #2360 (comment), this is a report on the "Error while making rich table" error. The HTML page and the Docling command that reproduce it are given below.
Steps to reproduce
Given the HTML:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Page Title</title>
</head>
<body class="mceContentBody aui-theme-default wiki-content fullsize">
<h3>Header 3</h3>
<table class="wrapped confluenceTable">
<colgroup>
<col>
<col>
<col>
<col>
</colgroup>
<tbody>
<tr>
<th scope="col" class="confluenceTh"><br></th>
<th scope="col" class="confluenceTh">
<h3>Column 2</h3>
</th>
<th scope="col" class="confluenceTh">Column 3</th>
<th scope="col" class="confluenceTh"><br></th>
</tr>
<tr>
<td class="confluenceTd">...</td>
<td class="confluenceTd">
<ol>
<li>... <span style="color: rgb(0,0,255);"><br><span
style="color: rgb(0,0,0);">.. </span><br></span></li>
<li>...<ol>
<li>...</li>
<li>...<span style="color: rgb(36,36,36);">...<span> </span></span>
<span style="color: rgb(0,0,255);"><strong> </strong></span></li>
<li>...</li>
</ol>
</li>
</ol>
</td>
<td class="confluenceTd">
<div class="content-wrapper">
<p><img class="editor-inline-macro" src="./bad_2_files/macro" data-macro-name="jira"
data-macro-id="..." role="button" tabindex="0" aria-haspopup="true"
aria-label="jira macro" data-macro-parameters="..." data-macro-schema-version="1"></p>
</div>
</td>
<td class="confluenceTd"><br></td>
</tr>
<tr>
<td class="confluenceTd">...</td>
<td class="confluenceTd">
<ol>
<li>... <br>...<span style="color: rgb(0,0,255);"><br><span
style="color: rgb(0,0,0);">... </span></span></li>
<li>...<ol>
<li>...</li>
<li>...</li>
<li>...</li>
<li>... </li>
</ol>
</li>
</ol>
</td>
<td class="confluenceTd">
<div class="content-wrapper">
<p><img class="editor-inline-macro" src="./bad_2_files/macro(1)" data-macro-name="jira"
data-macro-id="..." role="button" tabindex="0" aria-haspopup="true"
aria-label="jira macro" data-macro-parameters="..." data-macro-schema-version="1"></p>
</div>
</td>
<td class="confluenceTd"><br></td>
</tr>
<tr>
<td class="confluenceTd"><br></td>
<td class="confluenceTd">...</td>
<td class="confluenceTd">
<div class="content-wrapper">
<p><img class="editor-inline-macro" src="./bad_2_files/macro(2)" data-macro-name="jira"
data-macro-id="..." role="button" tabindex="0" aria-haspopup="true"
aria-label="jira macro" data-macro-parameters="..." data-macro-schema-version="1"></p>
</div>
</td>
<td class="confluenceTd"><br></td>
</tr>
<tr>
<td class="confluenceTd"><br></td>
<td class="confluenceTd">... </td>
<td class="confluenceTd"><br></td>
<td class="confluenceTd"><br></td>
</tr>
<tr>
<td class="confluenceTd"><br></td>
<td class="confluenceTd">...</td>
<td class="confluenceTd"><br></td>
<td class="confluenceTd"><br></td>
</tr>
<tr>
<td class="confluenceTd"><br></td>
<td class="confluenceTd">...</td>
<td class="confluenceTd"><br></td>
<td class="confluenceTd"><br></td>
</tr>
</tbody>
</table>
</body>
</html>
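The log does not say which table cells are involved. As a rough aid for narrowing that down, here is a minimal sketch that lists the cells of the page above that contain nested markup (lists, headings, images, wrappers). It uses only the standard library and does not call Docling. The `NESTED_TAGS` set, the `CellScanner` class and the `cells_with_nested_markup` helper are hypothetical names, and the idea that such cells are the ones handled as "rich" cells is a guess that this report does not confirm.

```python
from html.parser import HTMLParser

# Tags counted as "nested structure" inside a cell.
# This set is an illustrative assumption, not taken from Docling.
NESTED_TAGS = {"ol", "ul", "li", "h1", "h2", "h3", "h4", "h5", "h6",
               "p", "div", "img"}


class CellScanner(HTMLParser):
    """Collect, per <th>/<td>, the nested tags found inside it.

    Nested tables are not handled; the page above has none.
    """

    def __init__(self):
        super().__init__()
        self.cells = []       # one set of tag names per cell, in document order
        self._current = None  # set for the cell being read, or None outside cells

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._current = set()
            self.cells.append(self._current)
        elif self._current is not None and tag in NESTED_TAGS:
            self._current.add(tag)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._current = None


def cells_with_nested_markup(html):
    """Return (cell_index, sorted_tag_names) for cells with nested markup."""
    scanner = CellScanner()
    scanner.feed(html)
    scanner.close()
    return [(i, sorted(tags)) for i, tags in enumerate(scanner.cells) if tags]


if __name__ == "__main__":
    with open("bad_2.html", encoding="utf-8") as fh:
        for index, tags in cells_with_nested_markup(fh.read()):
            print(index, tags)
```

Cells that contain only text or a lone `<br>` are skipped, so the output is limited to the cells with lists, headings, or macro images.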
Docling logs the errors below. The page is still converted, but the content of the resulting Markdown differs from the HTML:
$ docling --from html --to md ./bad_2.html
2025-10-08 05:07:10,335 - INFO - Loading plugin 'docling_defaults'
2025-10-08 05:07:10,336 - INFO - Registered ocr engines: ['easyocr', 'ocrmac', 'rapidocr', 'tesserocr', 'tesseract']
2025-10-08 05:07:10,341 - INFO - paths: [PosixPath('/tmp/tmpo7j6jom2/bad_2.html')]
2025-10-08 05:07:10,341 - INFO - detected formats: [<InputFormat.HTML: 'html'>]
2025-10-08 05:07:10,343 - INFO - Going to convert document batch...
2025-10-08 05:07:10,343 - INFO - Initializing pipeline for SimplePipeline with options hash 995a146ad601044538e6a923bea22f4e
2025-10-08 05:07:10,346 - INFO - Loading plugin 'docling_defaults'
2025-10-08 05:07:10,346 - INFO - Registered picture descriptions: ['vlm', 'api']
2025-10-08 05:07:10,346 - INFO - Processing document bad_2.html
2025-10-08 05:07:10,348 - INFO - deleted item in tree at stack: (1, 0, 1) => #/texts/2
2025-10-08 05:07:10,349 - ERROR - Error while making rich table: Cannot find all provided RefItems in doc: ['#/texts/2'].
2025-10-08 05:07:10,349 - ERROR - Error while making rich table: Cannot find all provided RefItems in doc: ['#/texts/3'].
2025-10-08 05:07:10,350 - ERROR - Error while making rich table: Cannot find all provided RefItems in doc: ['#/texts/9'].
2025-10-08 05:07:10,351 - ERROR - Error while making rich table: Cannot find all provided RefItems in doc: ['#/texts/16'].
2025-10-08 05:07:10,351 - ERROR - Error while making rich table: Cannot find all provided RefItems in doc: ['#/texts/17'].
2025-10-08 05:07:10,352 - ERROR - Error while making rich table: Cannot find all provided RefItems in doc: ['#/texts/18'].
2025-10-08 05:07:10,352 - ERROR - Error while making rich table: Cannot find all provided RefItems in doc: ['#/texts/19'].
2025-10-08 05:07:10,352 - INFO - Finished converting document bad_2.html in 0.01 sec.
2025-10-08 05:07:10,352 - INFO - writing Markdown output to bad_2.md
2025-10-08 05:07:10,359 - INFO - Processed 1 docs, of which 0 failed
2025-10-08 05:07:10,359 - INFO - All documents were converted in 0.02 seconds.
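The error lines in this log differ only in the RefItem path they name. A small sketch for pulling those paths out of a captured log, for example to compare runs across revisions. The `missing_refs` helper and the regular expression are illustrative and assume the message format shown above; neither is part of Docling.

```python
import re

# Matches the "rich table" error lines in the format shown in the log above.
RICH_TABLE_ERROR = re.compile(
    r"ERROR - Error while making rich table: "
    r"Cannot find all provided RefItems in doc: \[(?P<refs>[^\]]*)\]"
)


def missing_refs(log_text):
    """Return the RefItem paths named in 'rich table' error lines, in order."""
    refs = []
    for match in RICH_TABLE_ERROR.finditer(log_text):
        # Each error carries a Python-style list of quoted paths.
        refs.extend(re.findall(r"'([^']+)'", match.group("refs")))
    return refs


if __name__ == "__main__":
    import sys

    # Usage: docling --from html --to md ./bad_2.html 2>&1 | python this_script.py
    for ref in missing_refs(sys.stdin.read()):
        print(ref)
```

Applied to the log above, this would list the seven paths from `#/texts/2` through `#/texts/19` that the errors report as missing.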
Docling version
Output of docling --version:
$ docling --version
2025-10-08 05:05:22,769 - INFO - Loading plugin 'docling_defaults'
2025-10-08 05:05:22,770 - INFO - Registered ocr engines: ['easyocr', 'ocrmac', 'rapidocr', 'tesserocr', 'tesseract']
Docling version: 2.55.1
Docling Core version: 2.48.4
Docling IBM Models version: 3.9.1
Docling Parse version: 4.5.0
Python: cpython-310 (3.10.12)
Platform: Linux-6.8.0-79-generic-x86_64-with-glibc2.35
Note: the version installed in my environment is actually the build from revision 9705f40.
Python version
$ python -V
Python 3.10.12