Commit 5f7de27

Merge pull request #170 from pandora-s-git/main
OCR Update
2 parents ab0b36d + 0077d9f commit 5f7de27

5 files changed, +2051 -312 lines changed

mistral/ocr/batch_ocr.ipynb

Lines changed: 2 additions & 2 deletions
@@ -23,7 +23,7 @@
 "### Used\n",
 "\n",
 "- OCR\n",
-"- Batch Inference"
+"- Batch Inference\n"
 ]
 },
 {
@@ -571,4 +571,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 0
-}
+}

mistral/ocr/data_extraction.ipynb

Lines changed: 1558 additions & 0 deletions
Large diffs are not rendered by default.

mistral/ocr/document_understanding.ipynb

Lines changed: 103 additions & 295 deletions
Large diffs are not rendered by default.

mistral/ocr/structured_ocr.ipynb

Lines changed: 19 additions & 15 deletions
@@ -10,16 +10,20 @@
 "\n",
 "---\n",
 "\n",
-"## OCR Exploration and Structured Outputs\n",
-"In this cookbook, we will explore the basics of OCR and leverage it together with existing models to achieve structured outputs fueled by our OCR model.\n",
+"## OCR Exploration and Simple Structured Outputs (Deprecated)\n",
+"In this cookbook, we will explore the basics of OCR and leverage it together with existing models to achieve structured outputs fueled by our OCR model (we recommend using the new Annotations feature instead for better results).\n",
 "\n",
 "You may want to do this in case current vision models are not powerful enough, hence enhancing their vision OCR capabilities with the OCR model to achieve better structured data extraction.\n",
 "\n",
 "---\n",
 "\n",
 "### Model Used\n",
 "- Mistral OCR\n",
-"- Pixtral 12B & Ministral 8B\n"
+"- Pixtral 12B & Ministral 8B\n",
+"\n",
+"---\n",
+"\n",
+"**For a more up to date guide on structured outputs visit our [Annotations cookbook](https://github.com/mistralai/cookbook/blob/main/mistral/ocr/data_extraction.ipynb) on Data Extraction.**\n"
 ]
 },
 {
@@ -35,7 +39,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 1,
+"execution_count": null,
 "metadata": {
 "id": "po7Cukllt8za"
 },
@@ -56,7 +60,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": null,
 "metadata": {
 "id": "MtKgrASwF3Ol"
 },
@@ -80,7 +84,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 3,
+"execution_count": null,
 "metadata": {
 "id": "odfkuCk6qSAw"
 },
@@ -108,7 +112,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 4,
+"execution_count": null,
 "metadata": {
 "colab": {
 "base_uri": "https://localhost:8080/"
@@ -175,7 +179,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 5,
+"execution_count": null,
 "metadata": {
 "colab": {
 "base_uri": "https://localhost:8080/",
@@ -255,7 +259,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 6,
+"execution_count": null,
 "metadata": {
 "colab": {
 "base_uri": "https://localhost:8080/"
@@ -328,7 +332,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 7,
+"execution_count": null,
 "metadata": {
 "colab": {
 "base_uri": "https://localhost:8080/"
@@ -456,7 +460,7 @@
 "id": "1m19STu2DDfI",
 "outputId": "06f99dfe-b697-4d82-bf20-0fa60435d47f"
 },
-"execution_count": 8,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "stream",
@@ -503,7 +507,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 9,
+"execution_count": null,
 "metadata": {
 "id": "oM2ensmIwh4H"
 },
@@ -584,7 +588,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 10,
+"execution_count": null,
 "metadata": {
 "colab": {
 "base_uri": "https://localhost:8080/"
@@ -656,7 +660,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 11,
+"execution_count": null,
 "metadata": {
 "colab": {
 "base_uri": "https://localhost:8080/",
@@ -709,4 +713,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 0
-}
+}
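
Most of the hunks in structured_ocr.ipynb change a code cell's `execution_count` from a number to `null` and leave the cell otherwise unchanged. A minimal sketch of how such a reset could be scripted with only Python's standard `json` module follows. The function name `clear_execution_counts` and the sample notebook are hypothetical and not part of this commit, which may just as well have been produced by an editor or another tool.

```python
import json


def clear_execution_counts(notebook: dict) -> dict:
    """Return a copy of a notebook dict with each code cell's execution_count set to None.

    Hypothetical helper for illustration; outputs and metadata are left untouched,
    matching what the hunks above show.
    """
    # A JSON round-trip gives a deep copy, so the caller's dict is not mutated.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None  # serialized as null in the .ipynb file
    return cleaned


# Tiny in-memory notebook (made-up content) standing in for a real .ipynb file.
sample = {
    "cells": [
        {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": ["x = 1\n"]},
        {"cell_type": "markdown", "metadata": {}, "source": ["### Used\n"]},
    ],
    "nbformat": 4,
    "nbformat_minor": 0,
}

cleaned = clear_execution_counts(sample)
print(json.dumps(cleaned["cells"][0]["execution_count"]))
```

To apply this to a file, one would load it with `json.load`, pass the result through the helper, and write it back with `json.dump`. Note that this sketch does not touch the `execution_count` field that `execute_result` outputs may carry.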

mistral/ocr/tool_usage.ipynb

Lines changed: 369 additions & 0 deletions
Large diffs are not rendered by default.