serialize GroupItem meta prior to content, DocItem meta after content

vagenas · vagenas · commit 3287664d087f · 2025-10-28T21:27:44.000+01:00
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
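
The ordering rule in the title can be sketched in isolation as follows. This is a minimal illustration, not the code in `get_parts`: `Node` and `ordered_parts` are hypothetical stand-ins, and plain strings take the place of the serialization results that the real serializer builds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a document node (not a docling_core class)."""

    content: str
    meta: str = ""
    is_group: bool = False


def ordered_parts(node: Node) -> List[str]:
    """Return the non-empty text parts of a node in serialization order."""
    parts: List[str] = []
    if node.meta and node.is_group:
        # group nodes: meta goes before the content
        parts.append(node.meta)
    if node.content:
        parts.append(node.content)
    if node.meta and not node.is_group:
        # non-group (document item) nodes: meta goes after the content
        parts.append(node.meta)
    return parts


picture = Node(content="<!-- image -->", meta="Bar chart")
section = Node(
    content="Regarding foo...",
    meta="This section talks about foo.",
    is_group=True,
)
print(ordered_parts(picture))
print(ordered_parts(section))
```

This matches what the updated fixtures show: a picture description now follows the `<!-- image -->` placeholder, while a group summary such as "This section talks about foo." stays ahead of the group's content.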
diff --git a/docling_core/transforms/serializer/common.py b/docling_core/transforms/serializer/common.py
@@ -46,6 +46,7 @@
     FloatingItem,
     Formatting,
     FormItem,
+    GroupItem,
     InlineGroup,
     KeyValueItem,
     ListGroup,
@@ -454,17 +455,20 @@ def get_parts(
             else:
                 my_visited.add(node.self_ref)
 
+            meta_part = create_ser_result()
+            node_is_group = isinstance(node, GroupItem)
             if (
                 not params.use_legacy_annotations
                 and node.self_ref not in self.get_excluded_refs(**kwargs)
             ):
-                part = self.serialize_meta(
+                meta_part = self.serialize_meta(
                     item=node,
                     level=lvl,
                     **kwargs,
                 )
-                if part.text:
-                    parts.append(part)
+                if meta_part.text and node_is_group:
+                    # for GroupItems add meta prior to content
+                    parts.append(meta_part)
 
             if params.include_non_meta:
                 part = self.serialize(
@@ -477,6 +481,10 @@ def get_parts(
                 if part.text:
                     parts.append(part)
 
+            if meta_part.text and not node_is_group:
+                # for DocItems add meta after content
+                parts.append(meta_part)
+
         return parts
 
     @override
diff --git a/test/data/doc/2408.09869v3_enriched.gt.md b/test/data/doc/2408.09869v3_enriched.gt.md
@@ -2,12 +2,12 @@
 
 <!-- page break -->
 
-In this image, we can see some text and images.
-
 Figure 1: Sketch of Docling's default processing pipeline. The inner part of the model pipeline is easily customizable and extensible.
 
 <!-- image -->
 
+In this image, we can see some text and images.
+
 licensing (e.g. pymupdf [7]), poor speed or unrecoverable quality issues, such as merged text cells across far-apart text tokens or table columns (pypdfium, PyPDF) [15, 14].
 
 We therefore decided to provide multiple backend choices, and additionally open-source a custombuilt PDF parser, which is based on the low-level qpdf [4] library. It is made available in a separate package named docling-parse and powers the default PDF backend in Docling. As an alternative, we provide a PDF backend relying on pypdfium , which may be a safe backup choice in certain cases, e.g. if issues are seen with particular font encodings.
diff --git a/test/data/doc/2408.09869v3_enriched_p1_include_annotations_false.gt.md b/test/data/doc/2408.09869v3_enriched_p1_include_annotations_false.gt.md
@@ -1,9 +1,9 @@
 # Docling Technical Report
 
-In this image we can see a cartoon image of a duck holding a paper.
-
 <!-- image -->
 
+In this image we can see a cartoon image of a duck holding a paper.
+
 Version 1.0
 
 Christoph Auer Maksym Lysak Ahmed Nassar Michele Dolfi Nikolaos Livathinos Panos Vagenas Cesar Berrospi Ramis Matteo Omenetti Fabian Lindlbauer Kasper Dinkla Lokesh Mishra Yusik Kim Shubham Gupta Rafael Teixeira de Lima Valery Weber Lucas Morin Ingmar Meijer Viktor Kuropiatnyk Peter W. J. Staar
@@ -22,8 +22,6 @@ With Docling , we open-source a very capable and efficient document conversion t
 
 torch runtimes backing the Docling pipeline. We will deliver updates on this topic at in a future version of this report.
 
-{'summary': 'Typical Docling setup runtime characterization.', 'type': 'performance data'}
-
 Table 1: Runtime characteristics of Docling with the standard model pipeline and settings, on our test dataset of 225 pages, on two different systems. OCR is disabled. We show the time-to-solution (TTS), computed throughput in pages per second, and the peak memory used (resident set size) for both the Docling-native PDF backend and for the pypdfium backend, using 4 and 16 threads.
 
 | CPU                              | Thread budget   | native backend   | native backend   | native backend   | pypdfium backend   | pypdfium backend   | pypdfium backend   |
@@ -32,6 +30,8 @@ Table 1: Runtime characteristics of Docling with the standard model pipeline and
 | Apple M3 Max                     | 4               | 177 s 167 s      | 1.27 1.34        | 6.20 GB          | 103 s 92 s         | 2.18 2.45          | 2.56 GB            |
 | (16 cores) Intel(R) Xeon E5-2690 | 16 4 16         | 375 s 244 s      | 0.60 0.92        | 6.16 GB          | 239 s 143 s        | 0.94 1.57          | 2.42 GB            |
 
+{'summary': 'Typical Docling setup runtime characterization.', 'type': 'performance data'}
+
 ## 5 Applications
 
 Thanks to the high-quality, richly structured document conversion achieved by Docling, its output qualifies for numerous downstream applications. For example, Docling can provide a base for detailed enterprise document search, passage retrieval or classification use-cases, or support knowledge extraction pipelines, allowing specific treatment of different structures in the document, such as tables, figures, section structure or references. For popular generative AI application patterns, such as retrieval-augmented generation (RAG), we provide quackling , an open-source package which capitalizes on Docling's feature-rich document output to enable document-native optimized vector embedding and chunking. It plugs in seamlessly with LLM frameworks such as LlamaIndex [8]. Since Docling is fast, stable and cheap to run, it also makes for an excellent choice to build document-derived datasets. With its powerful table structure recognition, it provides significant benefit to automated knowledge-base construction [11, 10]. Docling is also integrated within the open IBM data prep kit [6], which implements scalable data transforms to build large-scale multi-modal training datasets.
diff --git a/test/data/doc/2408.09869v3_enriched_p1_mark_annotations_false.gt.md b/test/data/doc/2408.09869v3_enriched_p1_mark_annotations_false.gt.md
@@ -1,9 +1,9 @@
 # Docling Technical Report
 
-In this image we can see a cartoon image of a duck holding a paper.
-
 <!-- image -->
 
+In this image we can see a cartoon image of a duck holding a paper.
+
 Version 1.0
 
 Christoph Auer Maksym Lysak Ahmed Nassar Michele Dolfi Nikolaos Livathinos Panos Vagenas Cesar Berrospi Ramis Matteo Omenetti Fabian Lindlbauer Kasper Dinkla Lokesh Mishra Yusik Kim Shubham Gupta Rafael Teixeira de Lima Valery Weber Lucas Morin Ingmar Meijer Viktor Kuropiatnyk Peter W. J. Staar
@@ -22,8 +22,6 @@ With Docling , we open-source a very capable and efficient document conversion t
 
 torch runtimes backing the Docling pipeline. We will deliver updates on this topic at in a future version of this report.
 
-{'summary': 'Typical Docling setup runtime characterization.', 'type': 'performance data'}
-
 summary: Typical Docling setup runtime characterization.
 type: performance data
 
@@ -35,6 +33,8 @@ Table 1: Runtime characteristics of Docling with the standard model pipeline and
 | Apple M3 Max                     | 4               | 177 s 167 s      | 1.27 1.34        | 6.20 GB          | 103 s 92 s         | 2.18 2.45          | 2.56 GB            |
 | (16 cores) Intel(R) Xeon E5-2690 | 16 4 16         | 375 s 244 s      | 0.60 0.92        | 6.16 GB          | 239 s 143 s        | 0.94 1.57          | 2.42 GB            |
 
+{'summary': 'Typical Docling setup runtime characterization.', 'type': 'performance data'}
+
 ## 5 Applications
 
 Thanks to the high-quality, richly structured document conversion achieved by Docling, its output qualifies for numerous downstream applications. For example, Docling can provide a base for detailed enterprise document search, passage retrieval or classification use-cases, or support knowledge extraction pipelines, allowing specific treatment of different structures in the document, such as tables, figures, section structure or references. For popular generative AI application patterns, such as retrieval-augmented generation (RAG), we provide quackling , an open-source package which capitalizes on Docling's feature-rich document output to enable document-native optimized vector embedding and chunking. It plugs in seamlessly with LLM frameworks such as LlamaIndex [8]. Since Docling is fast, stable and cheap to run, it also makes for an excellent choice to build document-derived datasets. With its powerful table structure recognition, it provides significant benefit to automated knowledge-base construction [11, 10]. Docling is also integrated within the open IBM data prep kit [6], which implements scalable data transforms to build large-scale multi-modal training datasets.
diff --git a/test/data/doc/2408.09869v3_enriched_p1_mark_meta_true.gt.md b/test/data/doc/2408.09869v3_enriched_p1_mark_meta_true.gt.md
@@ -1,9 +1,9 @@
 # Docling Technical Report
 
-[Description] In this image we can see a cartoon image of a duck holding a paper.
-
 <!-- image -->
 
+[Description] In this image we can see a cartoon image of a duck holding a paper.
+
 Version 1.0
 
 Christoph Auer Maksym Lysak Ahmed Nassar Michele Dolfi Nikolaos Livathinos Panos Vagenas Cesar Berrospi Ramis Matteo Omenetti Fabian Lindlbauer Kasper Dinkla Lokesh Mishra Yusik Kim Shubham Gupta Rafael Teixeira de Lima Valery Weber Lucas Morin Ingmar Meijer Viktor Kuropiatnyk Peter W. J. Staar
@@ -22,8 +22,6 @@ With Docling , we open-source a very capable and efficient document conversion t
 
 torch runtimes backing the Docling pipeline. We will deliver updates on this topic at in a future version of this report.
 
-[Docling Legacy Misc] {'summary': 'Typical Docling setup runtime characterization.', 'type': 'performance data'}
-
 summary: Typical Docling setup runtime characterization.
 type: performance data
 
@@ -35,6 +33,8 @@ Table 1: Runtime characteristics of Docling with the standard model pipeline and
 | Apple M3 Max                     | 4               | 177 s 167 s      | 1.27 1.34        | 6.20 GB          | 103 s 92 s         | 2.18 2.45          | 2.56 GB            |
 | (16 cores) Intel(R) Xeon E5-2690 | 16 4 16         | 375 s 244 s      | 0.60 0.92        | 6.16 GB          | 239 s 143 s        | 0.94 1.57          | 2.42 GB            |
 
+[Docling Legacy Misc] {'summary': 'Typical Docling setup runtime characterization.', 'type': 'performance data'}
+
 ## 5 Applications
 
 Thanks to the high-quality, richly structured document conversion achieved by Docling, its output qualifies for numerous downstream applications. For example, Docling can provide a base for detailed enterprise document search, passage retrieval or classification use-cases, or support knowledge extraction pipelines, allowing specific treatment of different structures in the document, such as tables, figures, section structure or references. For popular generative AI application patterns, such as retrieval-augmented generation (RAG), we provide quackling , an open-source package which capitalizes on Docling's feature-rich document output to enable document-native optimized vector embedding and chunking. It plugs in seamlessly with LLM frameworks such as LlamaIndex [8]. Since Docling is fast, stable and cheap to run, it also makes for an excellent choice to build document-derived datasets. With its powerful table structure recognition, it provides significant benefit to automated knowledge-base construction [11, 10]. Docling is also integrated within the open IBM data prep kit [6], which implements scalable data transforms to build large-scale multi-modal training datasets.
diff --git a/test/data/doc/barchart.gt.md b/test/data/doc/barchart.gt.md
@@ -1,5 +1,3 @@
-Bar chart
-
 <!-- image -->
 
 |   Number of impellers |   single-frequency |   multi-frequency |
@@ -10,3 +8,5 @@ Bar chart
 |                     4 |               0.14 |              0.26 |
 |                     5 |               0.16 |              0.25 |
 |                     6 |               0.24 |              0.24 |
+
+Bar chart
diff --git a/test/data/doc/dummy_doc.yaml.md b/test/data/doc/dummy_doc.yaml.md
@@ -1,5 +1,9 @@
 # DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis
 
+Figure 1: Four examples of complex page layouts across different document categories
+
+<!-- image -->
+
 ...
 
 Bar chart
@@ -8,10 +12,6 @@ CC1=NNC(C2=CN3C=CN=C3C(CC3=CC(F)=CC(F)=C3)=N2)=N1
 
 {'myanalysis': {'prediction': 'abc'}, 'something_else': {'text': 'aaa'}}
 
-Figure 1: Four examples of complex page layouts across different document categories
-
-<!-- image -->
-
 A description annotation for this table.
 
 {'foo': 'bar'}
diff --git a/test/data/doc/group_with_metadata_default.md b/test/data/doc/group_with_metadata_default.md
@@ -8,10 +8,10 @@ This is some introductory text.
 
 This section talks about foo.
 
-This paragraph provides more details about foo.
-
 Regarding foo...
 
+This paragraph provides more details about foo.
+
 Here some foo specifics are listed.
 
 1. lorem
diff --git a/test/data/doc/group_with_metadata_marked.md b/test/data/doc/group_with_metadata_marked.md
@@ -8,10 +8,10 @@ This is some introductory text.
 
 [Summary] This section talks about foo.
 
-[Summary] This paragraph provides more details about foo.
-
 Regarding foo...
 
+[Summary] This paragraph provides more details about foo.
+
 [Summary] Here some foo specifics are listed.
 
 1. lorem