Commit b49e353

fix(markdown): serialize hyperlink as code always with single backticks
Signed-off-by: Cesar Berrospi Ramis <[email protected]>
1 parent 621f602 commit b49e353

5 files changed, with 75 additions and 6 deletions

docling_core/transforms/serializer/markdown.py

Lines changed: 3 additions & 1 deletion
@@ -211,7 +211,9 @@ def serialize(
             text_part = f"{num_hashes * '#'} {text}"
         elif isinstance(item, CodeItem):
             if params.format_code_blocks:
-                text_part = f"`{text}`" if is_inline_scope else f"```\n{text}\n```"
+                # inline items and all hyperlinks: use single backticks
+                bt = is_inline_scope or (params.include_hyperlinks and item.hyperlink)
+                text_part = f"`{text}`" if bt else f"```\n{text}\n```"
             else:
                 text_part = text
             escape_html = False
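
The condition introduced in this hunk can be exercised in isolation. The following is a minimal sketch, not docling-core's API: `format_code` and its plain arguments are hypothetical stand-ins for the serializer's `is_inline_scope`, `params.include_hyperlinks`, and `item.hyperlink`.

```python
from typing import Optional


def format_code(
    text: str,
    is_inline_scope: bool,
    include_hyperlinks: bool = True,
    hyperlink: Optional[str] = None,
) -> str:
    """Hypothetical stand-in for the CodeItem branch changed in this commit."""
    # Single backticks for inline items and for any code item that will be
    # rendered as a hyperlink; a fenced block otherwise.
    use_single_backticks = is_inline_scope or (include_hyperlinks and bool(hyperlink))
    if use_single_backticks:
        return f"`{text}`"
    return f"```\n{text}\n```"


# Paragraph-level code without a hyperlink keeps the fenced form.
print(format_code("print('hi')", is_inline_scope=False))
# Paragraph-level code with a hyperlink now uses single backticks.
print(format_code("print('hi')", is_inline_scope=False, hyperlink="#test"))
```

Note that when hyperlinks are excluded from the output (`include_hyperlinks=False` in this sketch), a hyperlinked paragraph-level code item still falls back to the fenced form.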

test/data/doc/inline_and_formatting.gt.dt

Lines changed: 4 additions & 0 deletions
@@ -43,5 +43,9 @@
 <formula>E=mc^2</formula>
 <text>& ampersand</text>
 </inline></section_header_level_1>
+<inline><text>A hyperlink on</text>
+<code><_unknown_>code in a line</code>
+</inline>
+<code><_unknown_>A hyperlink on code as paragraph</code>
 <text>The end.</text>
 </doctag>

test/data/doc/inline_and_formatting.gt.html

Lines changed: 2 additions & 0 deletions
@@ -150,6 +150,8 @@ <h1>Contribution guideline example</h1>
 <em><h1>Whole heading is italic</h1></em>
 <span class='inline-group'>Some <em><code>formatted_code</code></em></span>
 <h2><span class='inline-group'><em>Partially formatted</em> heading to_escape <code>not_to_escape</code> <a href="https://en.wikipedia.org/wiki/Albert_Einstein"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><mrow><mi>E</mi><mo>&#x0003D;</mo><mi>m</mi><msup><mi>c</mi><mn>2</mn></msup></mrow><annotation encoding="TeX">E=mc^2</annotation></math></a> &amp; ampersand</span></h2>
+<span class='inline-group'>A hyperlink on <a href="#link"><code>code in a line</code></a></span>
+<a href="#test"><pre><code>A hyperlink on code as paragraph</code></pre></a>
 <p>The end.</p>
 </div>
 </body>

test/data/doc/inline_and_formatting.gt.md

Lines changed: 4 additions & 0 deletions
@@ -20,4 +20,8 @@ Some *`formatted_code`*
 
 ## *Partially formatted* heading to\_escape `not_to_escape` [$E=mc^2$](https://en.wikipedia.org/wiki/Albert_Einstein) &amp; ampersand
 
+A hyperlink on [`code in a line`](#link)
+
+[`A hyperlink on code as paragraph`](#test)
+
 The end.
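
The two added ground-truth lines show the single-backtick form wrapped in Markdown link syntax. As a rough illustration of why that form suits links, the sketch below compares it with the fenced form, which is presumably what a paragraph-level code item would have produced before this commit. `wrap_link` is a hypothetical helper that only mimics `[text](target)` wrapping; it is not the serializer's own code.

```python
def wrap_link(text_part: str, target: str) -> str:
    """Hypothetical helper: wrap already-formatted text in Markdown link syntax."""
    return f"[{text_part}]({target})"


code = "A hyperlink on code as paragraph"

# Single backticks: the whole link stays on one line.
single = wrap_link(f"`{code}`", "#test")

# Fenced block: the link text is spread over three lines.
fenced = wrap_link(f"```\n{code}\n```", "#test")

print(single)
print(fenced)
```

The first string matches the line added to `inline_and_formatting.gt.md`; the second contains embedded newlines inside the link text.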

test/data/doc/inline_and_formatting.yaml

Lines changed: 62 additions & 5 deletions
@@ -8,7 +8,9 @@ body:
   - $ref: '#/texts/32'
   - $ref: '#/groups/8'
   - $ref: '#/texts/35'
-  - $ref: '#/texts/41'
+  - $ref: '#/groups/10'
+  - $ref: '#/texts/43'
+  - $ref: '#/texts/44'
   content_layer: body
   label: unspecified
   name: _root_
@@ -52,7 +54,7 @@ groups:
   - $ref: '#/texts/27'
   - $ref: '#/texts/28'
   content_layer: body
-  label: ordered_list
+  label: list
   name: list
   parent:
     $ref: '#/body'
@@ -128,10 +130,19 @@ groups:
   parent:
     $ref: '#/texts/35'
   self_ref: '#/groups/9'
+- children:
+  - $ref: '#/texts/41'
+  - $ref: '#/texts/42'
+  content_layer: body
+  label: inline
+  name: group
+  parent:
+    $ref: '#/body'
+  self_ref: '#/groups/10'
 key_value_items: []
 name: inline_and_formatting
 origin:
-  binary_hash: 16409076955457599155
+  binary_hash: 9005144896041945842
   filename: inline_and_formatting.md
   mimetype: text/markdown
 pages: {}
@@ -171,6 +182,7 @@ texts:
   formatting:
     bold: false
     italic: true
+    script: baseline
     strikethrough: false
     underline: false
   label: text
@@ -185,6 +197,7 @@ texts:
   formatting:
     bold: true
     italic: false
+    script: baseline
     strikethrough: false
     underline: false
   label: text
@@ -199,6 +212,7 @@ texts:
   formatting:
     bold: true
     italic: true
+    script: baseline
     strikethrough: false
     underline: false
   label: text
@@ -274,6 +288,7 @@ texts:
   formatting:
     bold: true
     italic: false
+    script: baseline
     strikethrough: false
     underline: false
   hyperlink: https://github.com/docling-project/docling
@@ -439,6 +454,7 @@ texts:
   formatting:
     bold: true
     italic: false
+    script: baseline
     strikethrough: false
     underline: false
   label: list_item
@@ -475,6 +491,7 @@ texts:
   formatting:
     bold: false
     italic: true
+    script: baseline
     strikethrough: false
     underline: false
   label: text
@@ -498,6 +515,7 @@ texts:
   formatting:
     bold: false
     italic: true
+    script: baseline
     strikethrough: false
     underline: false
   label: title
@@ -524,6 +542,7 @@ texts:
   formatting:
     bold: false
     italic: true
+    script: baseline
     strikethrough: false
     underline: false
   label: code
@@ -550,6 +569,7 @@ texts:
   formatting:
     bold: false
     italic: true
+    script: baseline
     strikethrough: false
     underline: false
   label: text
@@ -600,13 +620,50 @@ texts:
   prov: []
   self_ref: '#/texts/40'
   text: '& ampersand'
+- children: []
+  content_layer: body
+  label: text
+  orig: A hyperlink on
+  parent:
+    $ref: '#/groups/10'
+  prov: []
+  self_ref: '#/texts/41'
+  text: A hyperlink on
+- captions: []
+  children: []
+  code_language: unknown
+  content_layer: body
+  footnotes: []
+  hyperlink: '#link'
+  label: code
+  orig: code in a line
+  parent:
+    $ref: '#/groups/10'
+  prov: []
+  references: []
+  self_ref: '#/texts/42'
+  text: code in a line
+- captions: []
+  children: []
+  code_language: unknown
+  content_layer: body
+  footnotes: []
+  hyperlink: '#test'
+  label: code
+  orig: A hyperlink on code as paragraph
+  parent:
+    $ref: '#/body'
+  prov: []
+  references: []
+  self_ref: '#/texts/43'
+  text: A hyperlink on code as paragraph
 - children: []
   content_layer: body
   label: text
   orig: The end.
   parent:
     $ref: '#/body'
   prov: []
-  self_ref: '#/texts/41'
+  self_ref: '#/texts/44'
   text: The end.
-version: 1.3.0
+version: 1.8.0
