extract_thinker/markdown/markdown_converter.py | +21 -25
@@ -44,11 +44,21 @@ class MarkdownConverter:
 
 Instructions:
 1. Create proper Markdown with headings, lists, formatting etc.
+   - Format headings with # syntax (# for main headings, ## for sub-headings, etc.)
+   - Format lists with proper bullet points (*, -) or numbers (1., 2., etc.)
+   - Apply proper emphasis using **bold**, *italic*, or `code` where appropriate
+   - Create proper links using [text](url) format if applicable
+   - Format code blocks with triple backticks ``` if applicable
+   - Format tables using proper Markdown table syntax if applicable
+   - Use block quotes with > where appropriate
 2. Include ALL content from the image in your Markdown output.
 3. After the Markdown section, add a JSON block that follows the above schema.
 4. The JSON should break down the Markdown into logical sections with certainty scores.
 5. Make sure the certainty scores accurately reflect your confidence (1-10).
 6. Make sure you keep things like checkboxes, lists, etc. as is.
+7. IMPORTANT: If you see a placeholder for an image (e.g. [img-1.jpeg], <!-- image -->, etc.), NEVER include the placeholder directly in your output. ALWAYS replace it with the actual content or description of what the image shows.
+   Example: Replace [img-1.jpeg] with "Signature of William Smith" or appropriate description of what's in the image.
+8. MAINTAIN ALL THE ORIGINAL CONTENT AND MEANING - do not add or remove information.
 
 Your response format should be:
 
@@ -77,33 +87,19 @@ class MarkdownConverter:
 Convert the image into well-formatted Markdown content.
 
 Instructions:
-1. Create proper Markdown with headings, lists, formatting etc.
-2. Include ALL content from the image in your Markdown output.
-3. Format headings with # syntax (# for main headings, ## for sub-headings, etc.)
-4. Format lists with proper bullet points (*, -) or numbers (1., 2., etc.)
-5. Apply proper emphasis using **bold**, *italic*, or `code` where appropriate
-6. Create proper links using [text](url) format if applicable
-7. Format code blocks with triple backticks ``` if applicable
-8. Format tables using proper Markdown table syntax if applicable
-9. Make sure you keep things like checkboxes, lists, etc. as is.
+Create proper Markdown with headings, lists, formatting etc.
+Include ALL content from the image in your Markdown output.
+Format headings with # syntax (# for main headings, ## for sub-headings, etc.)
+Format lists with proper bullet points (*, -) or numbers (1., 2., etc.)
+Apply proper emphasis using **bold**, *italic*, or `code` where appropriate
+Create proper links using [text](url) format if applicable
+Format code blocks with triple backticks ``` if applicable
+Format tables using proper Markdown table syntax if applicable
+Make sure you keep things like checkboxes, lists, etc. as is.
+IMPORTANT: If you see a placeholder for an image (e.g. [img-1.jpeg], <!-- image -->, etc.), NEVER include the placeholder directly in your output. ALWAYS replace it with the actual content or description of what the image shows.
+Example: Replace [img-1.jpeg] with "Signature of William Smith" or appropriate description of what's in the image.
 
 Your response should ONLY include the formatted Markdown content without any additional JSON structure or explanations.
-"""
-
-MARKDOWN_VERIFICATION_PROMPT="""
-Look at the image provided and reformat the text content into well-structured Markdown.
-The text content is already accurate, but may lack proper Markdown formatting. Your task is to:
-
-1. Format headings with # syntax (# for main headings, ## for sub-headings, etc.)
-2. Format lists with proper bullet points (*, -) or numbers (1., 2., etc.)
-3. Apply proper emphasis using **bold**, *italic*, or `code` where appropriate
-4. Create proper links using [text](url) format
-5. Format code blocks with triple backticks ```
-6. Format tables using proper Markdown table syntax if applicable
-7. Use block quotes with > where appropriate
-8. MAINTAIN ALL THE ORIGINAL CONTENT AND MEANING - do not add or remove information
-
-Preserve all the original information while improving its readability through proper Markdown formatting.
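
Both prompts now tell the model never to emit image placeholders such as [img-1.jpeg] or <!-- image -->. A prompt instruction alone cannot guarantee that, so a caller might want to check the returned Markdown as well. The following is a minimal sketch of such a check, not part of this commit: the helper name `find_image_placeholders` and the regular expressions are hypothetical, and the patterns cover only the two placeholder shapes named in the prompt text.

```python
import re
from typing import List

# Hypothetical patterns, modelled on the two examples in the prompt text:
# "[img-1.jpeg]" and "<!-- image -->". Other placeholder shapes are not covered.
PLACEHOLDER_PATTERNS = [
    re.compile(r"\[img-\d+\.(?:jpe?g|png|gif)\]", re.IGNORECASE),
    re.compile(r"<!--\s*image\s*-->", re.IGNORECASE),
]


def find_image_placeholders(markdown: str) -> List[str]:
    """Return every leftover image placeholder found in the Markdown text."""
    found: List[str] = []
    for pattern in PLACEHOLDER_PATTERNS:
        # The patterns use only non-capturing groups, so findall returns full matches.
        found.extend(pattern.findall(markdown))
    return found


if __name__ == "__main__":
    sample = "Approved by: [img-1.jpeg]\n\n<!-- image -->\n"
    leftovers = find_image_placeholders(sample)
    if leftovers:
        print("Placeholders still present:", leftovers)
```

A non-empty result could be used to retry the conversion or to flag the page for review; which of those fits depends on how the converter is called.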