Commit a0ebe70

Added support for Mistral Document AI
1 parent 337c456 commit a0ebe70

12 files changed

Lines changed: 651 additions & 21 deletions

.env.template

Lines changed: 13 additions & 0 deletions
@@ -18,6 +18,19 @@ AZURE_OPENAI_ENDPOINT=https://your-openai-account.openai.azure.com/
 AZURE_OPENAI_KEY=your-openai-api-key
 AZURE_OPENAI_MODEL_DEPLOYMENT_NAME=gpt-4
 
+# OCR Provider Configuration
+# Choose which OCR provider to use: "azure" or "mistral" (default: azure)
+OCR_PROVIDER=azure
+
+# Azure Document Intelligence Configuration (for OCR_PROVIDER=azure)
+DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-doc-intelligence.cognitiveservices.azure.com/
+
+# Mistral Document AI Configuration (for OCR_PROVIDER=mistral)
+# Only required if OCR_PROVIDER is set to "mistral"
+MISTRAL_DOC_AI_ENDPOINT=https://your-endpoint.services.ai.azure.com/providers/mistral/azure/ocr
+MISTRAL_DOC_AI_KEY=your-mistral-api-key
+MISTRAL_DOC_AI_MODEL=mistral-document-ai-2505
+
 # To get your Principal ID, run:
 # az ad signed-in-user show --query id --output tsv

README.md

Lines changed: 75 additions & 5 deletions
@@ -35,6 +35,7 @@ Traditional OCR solutions extract text but miss the context. AI-only approaches
 
 ### 🔍 **Intelligent Document Understanding**
 - **Hybrid AI Pipeline**: Combines OCR precision with LLM reasoning
+- **Multiple OCR Providers**: Azure Document Intelligence or Mistral Document AI
 - **Context-Aware Extraction**: Understands relationships between data points
 - **Multi-Format Support**: PDFs, images, forms, invoices, medical records
 - **Zero-Shot Learning**: Works on new document types without training
@@ -81,9 +82,12 @@ graph TB
 
 subgraph "🧠 AI Processing Engine"
 B --> D
-D --> E[🔍 Azure Document Intelligence]
+D --> E{🔍 OCR Provider}
+E -->|Azure| E1[Azure Document Intelligence]
+E -->|Mistral| E2[Mistral Document AI]
 D --> F[🤖 GPT-4 Vision]
-E --> G[⚙️ Hybrid Processing Pipeline]
+E1 --> G[⚙️ Hybrid Processing Pipeline]
+E2 --> G
 F --> G
 end
 
@@ -105,6 +109,8 @@ graph TB
 style C fill:#e8f5e8,stroke:#388e3c,stroke-width:2px
 style D fill:#fff3e0,stroke:#f57c00,stroke-width:2px
 style E fill:#fce4ec,stroke:#c2185b,stroke-width:2px
+style E1 fill:#fce4ec,stroke:#c2185b,stroke-width:2px
+style E2 fill:#fce4ec,stroke:#c2185b,stroke-width:2px
 style F fill:#e0f2f1,stroke:#00695c,stroke-width:2px
 style G fill:#fff8e1,stroke:#ffa000,stroke-width:2px
 style H fill:#f1f8e9,stroke:#558b2f,stroke-width:2px
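
The branch added to the diagram (OCR Provider routing to either Azure Document Intelligence or Mistral Document AI, both feeding the hybrid pipeline) could be dispatched roughly as below. This is a sketch only: `run_azure_ocr`, `run_mistral_ocr`, `OCR_BACKENDS`, and `run_ocr` are hypothetical names that do not appear in the commit, and the two backends are placeholders that make no network calls.

```python
from typing import Callable, Dict

# An OCR backend takes raw document bytes and returns extracted text.
OcrFunction = Callable[[bytes], str]


def run_azure_ocr(document: bytes) -> str:
    """Placeholder for a call to Azure Document Intelligence (hypothetical)."""
    return f"[azure] {len(document)} bytes"


def run_mistral_ocr(document: bytes) -> str:
    """Placeholder for a call to Mistral Document AI (hypothetical)."""
    return f"[mistral] {len(document)} bytes"


# Registry keyed by the provider names used in the configuration.
OCR_BACKENDS: Dict[str, OcrFunction] = {
    "azure": run_azure_ocr,
    "mistral": run_mistral_ocr,
}


def run_ocr(provider: str, document: bytes) -> str:
    """Route a document to the configured OCR backend."""
    try:
        backend = OCR_BACKENDS[provider]
    except KeyError:
        raise ValueError(f"Unknown OCR provider: {provider!r}") from None
    return backend(document)
```

A registry keeps the hybrid pipeline independent of the provider: supporting a third backend would mean adding one entry rather than another conditional branch.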
@@ -124,7 +130,7 @@ graph TB
 | **📱 Frontend UI** | Streamlit (Optional) | Interactive document management interface |
 | **📁 Document Storage** | Azure Blob Storage | Secure, scalable document repository |
 | **🗄️ Metadata Database** | Azure Cosmos DB | Results, configurations, and analytics |
-| **🔍 OCR Engine** | Azure Document Intelligence | Structured text and layout extraction |
+| **🔍 OCR Engine** | Azure Document Intelligence or Mistral Document AI | Structured text and layout extraction |
 | **🧠 AI Reasoning** | Azure OpenAI (GPT-4 Vision) | Contextual understanding and extraction |
 | **🏗️ Container Registry** | Azure Container Registry | Private, secure container images |
 | **🔒 Security** | Managed Identity + RBAC | Zero-credential architecture |
@@ -299,7 +305,71 @@ Datasets are managed through the Streamlit frontend interface (deployed automati
 
 ---
 
-## 🖥️ Frontend Interface: User-Friendly Document Management
+### OCR Provider Configuration
+
+ARGUS supports **two OCR providers** for document text extraction:
+
+- **Azure Document Intelligence** (Default): Microsoft's enterprise OCR service with advanced layout understanding
+- **Mistral Document AI**: Mistral's document processing service with markdown-optimized output
+
+<details>
+<summary><b>🔧 Configure OCR Provider</b></summary>
+
+**Via Frontend (Recommended)**:
+1. Navigate to **Settings** tab in the web interface
+2. Select **OCR Provider** section
+3. Choose your provider:
+   - **Azure**: Uses Azure Document Intelligence (automatically configured during deployment)
+   - **Mistral**: Requires additional configuration (endpoint, API key, model name)
+4. For Mistral, enter:
+   - **Mistral Endpoint**: Your Mistral Document AI API endpoint URL
+   - **Mistral API Key**: Your Mistral API authentication key
+   - **Mistral Model**: Model name (default: `mistral-document-ai-2505`)
+5. Click **"Update OCR Provider"** to apply changes
+
+**Via Environment Variables**:
+Set the following environment variables in your deployment:
+
+```bash
+# Choose OCR provider
+OCR_PROVIDER=mistral # or "azure" (default)
+
+# Mistral-specific configuration (only needed if OCR_PROVIDER=mistral)
+MISTRAL_DOC_AI_ENDPOINT=https://your-endpoint.services.ai.azure.com/providers/mistral/azure/ocr
+MISTRAL_DOC_AI_KEY=your-mistral-api-key
+MISTRAL_DOC_AI_MODEL=mistral-document-ai-2505
+```
+
+**Update via Azure Portal**:
+1. Navigate to Azure Portal → Container Apps → Your Backend App
+2. Go to **Settings** → **Environment variables**
+3. Add/update the variables listed above
+4. **Restart** the container app
+
+**Update via Azure CLI**:
+```bash
+# Switch to Mistral
+az containerapp update \
+  --name <your-backend-app-name> \
+  --resource-group <your-resource-group> \
+  --set-env-vars \
+    OCR_PROVIDER="mistral" \
+    MISTRAL_DOC_AI_ENDPOINT="https://your-endpoint.../ocr" \
+    MISTRAL_DOC_AI_KEY="your-api-key" \
+    MISTRAL_DOC_AI_MODEL="mistral-document-ai-2505"
+
+# Switch back to Azure
+az containerapp update \
+  --name <your-backend-app-name> \
+  --resource-group <your-resource-group> \
+  --set-env-vars OCR_PROVIDER="azure"
+```
+
+**Note**: OCR provider selection is configured at the solution level and applies to all document processing operations.
+
+</details>
+
+---
 
 The Streamlit frontend is **automatically deployed** with `azd up` and provides a user-friendly interface for document management.
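
A backend might read the environment variables documented in this hunk along the following lines. This is a minimal sketch, not the commit's implementation: `OcrSettings` and `load_ocr_settings` are hypothetical names, and lower-casing the provider value and failing fast on missing Mistral settings are assumptions. The variable names and the defaults (`azure`, `mistral-document-ai-2505`) come from the README text.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

SUPPORTED_PROVIDERS = ("azure", "mistral")
DEFAULT_MISTRAL_MODEL = "mistral-document-ai-2505"


@dataclass(frozen=True)
class OcrSettings:
    """Resolved OCR configuration (hypothetical container type)."""
    provider: str
    azure_endpoint: Optional[str] = None
    mistral_endpoint: Optional[str] = None
    mistral_key: Optional[str] = None
    mistral_model: str = DEFAULT_MISTRAL_MODEL


def load_ocr_settings(env: Mapping[str, str] = os.environ) -> OcrSettings:
    """Read OCR settings from environment variables, defaulting to Azure."""
    # Assumption: provider names are matched case-insensitively.
    provider = env.get("OCR_PROVIDER", "azure").strip().lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"OCR_PROVIDER must be one of {SUPPORTED_PROVIDERS}, got {provider!r}")

    if provider == "mistral":
        # The endpoint and key are only required when Mistral is selected.
        required = ("MISTRAL_DOC_AI_ENDPOINT", "MISTRAL_DOC_AI_KEY")
        missing = [name for name in required if not env.get(name)]
        if missing:
            raise ValueError("OCR_PROVIDER=mistral requires: " + ", ".join(missing))

    return OcrSettings(
        provider=provider,
        azure_endpoint=env.get("DOCUMENT_INTELLIGENCE_ENDPOINT"),
        mistral_endpoint=env.get("MISTRAL_DOC_AI_ENDPOINT"),
        mistral_key=env.get("MISTRAL_DOC_AI_KEY"),
        mistral_model=env.get("MISTRAL_DOC_AI_MODEL") or DEFAULT_MISTRAL_MODEL,
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests, for example `load_ocr_settings({"OCR_PROVIDER": "azure"})`.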
@@ -677,7 +747,7 @@ Contributors will be recognized in:
 | Resource | Description | Link |
 |----------|-------------|------|
 | **📚 Documentation** | Complete setup and usage guides | [docs/](docs/) |
-| **🐛 Issue Tracker** | Bug reports and feature requests | [GitHub Issues](https://github.com/Azure-Samples/ARGUS/issues) |
+| **🐛 Issue Tracker** | Bug reports and feature requests | [GitHub Issues](https://github.com/Azure-Samples/ARGUS/issues) |
 | **💡 Discussions** | Community Q&A and ideas | [GitHub Discussions](https://github.com/Azure-Samples/ARGUS/discussions) |
 | **📧 Team Contact** | Direct contact for enterprise needs | See team section below |

api_documentation.md

Lines changed: 61 additions & 2 deletions
@@ -154,11 +154,21 @@ Retrieve current system configuration from Cosmos DB including datasets, prompts
 "invoice_number": {"type": "string"},
 "total_amount": {"type": "number"}
 }
+},
+"processing_options": {
+  "include_ocr": true,
+  "include_images": true,
+  "enable_summary": true,
+  "enable_evaluation": true,
+  "ocr_provider": "azure"
 }
 },
 "medical-dataset": {
 "system_prompt": "Extract medical information...",
-"output_schema": {...}
+"output_schema": {...},
+"processing_options": {
+  "ocr_provider": "mistral"
+}
 }
 }
 }
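
In the example above, `medical-dataset` sets only `ocr_provider`, so a consumer of this configuration has to fill in the remaining options. One way to do that is sketched below. `resolve_processing_options` and `DEFAULT_PROCESSING_OPTIONS` are hypothetical names. The `True` defaults for `include_ocr` and `include_images` match the frontend code in this commit and `azure` is the documented default provider, but the defaults for `enable_summary` and `enable_evaluation` are assumptions.

```python
from typing import Any, Dict

DEFAULT_PROCESSING_OPTIONS: Dict[str, Any] = {
    "include_ocr": True,
    "include_images": True,
    "enable_summary": True,      # assumed default, not stated in the docs
    "enable_evaluation": True,   # assumed default, not stated in the docs
    "ocr_provider": "azure",
}
SUPPORTED_PROVIDERS = ("azure", "mistral")


def resolve_processing_options(dataset_config: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a dataset's processing_options over the defaults and validate them."""
    overrides = dataset_config.get("processing_options") or {}

    # Reject misspelled keys instead of silently ignoring them.
    unknown = sorted(set(overrides) - set(DEFAULT_PROCESSING_OPTIONS))
    if unknown:
        raise ValueError(f"Unknown processing options: {unknown}")

    options = {**DEFAULT_PROCESSING_OPTIONS, **overrides}
    if options["ocr_provider"] not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported ocr_provider: {options['ocr_provider']!r}")
    return options
```

With this helper, `resolve_processing_options({"processing_options": {"ocr_provider": "mistral"}})` yields a complete options dictionary in which only the provider differs from the defaults.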
@@ -769,12 +779,61 @@ For production file uploads, you need to:
 "system_prompt": "string",
 "output_schema": "object",
 "max_pages": "number",
-"options": "object"
+"processing_options": {
+  "include_ocr": "boolean",
+  "include_images": "boolean",
+  "enable_summary": "boolean",
+  "enable_evaluation": "boolean",
+  "ocr_provider": "string (azure|mistral)"
+}
 }
 }
 }
 ```
 
+### OCR Provider Configuration
+
+ARGUS supports two OCR providers for document text extraction:
+
+1. **Azure Document Intelligence** (default)
+   - Uses Azure's Document Intelligence service
+   - Requires `DOCUMENT_INTELLIGENCE_ENDPOINT` environment variable
+   - Configured with `"ocr_provider": "azure"`
+
+2. **Mistral Document AI** (alternative)
+   - Uses Mistral's Document AI API
+   - Requires `MISTRAL_DOC_AI_ENDPOINT` and `MISTRAL_DOC_AI_KEY` environment variables
+   - Configured with `"ocr_provider": "mistral"`
+   - Supports base64-encoded PDFs and images
+   - Can use structured extraction with bbox annotation
+
+**Example Configuration with Mistral:**
+```json
+{
+  "id": "configuration",
+  "partitionKey": "configuration",
+  "datasets": {
+    "medical-dataset": {
+      "system_prompt": "Extract medical information...",
+      "output_schema": {...},
+      "processing_options": {
+        "include_ocr": true,
+        "include_images": true,
+        "enable_summary": true,
+        "enable_evaluation": true,
+        "ocr_provider": "mistral"
+      }
+    }
+  }
+}
+```
+
+**Environment Variables Required for Mistral:**
+```bash
+MISTRAL_DOC_AI_ENDPOINT=https://your-endpoint.services.ai.azure.com/providers/mistral/azure/ocr
+MISTRAL_DOC_AI_KEY=your-mistral-api-key
+```
+
 ### Event Grid Event Model
 ```json
 {
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+{
+  "Customer Name": "",
+  "Invoice Number": "",
+  "Date": "",
+  "Billing info": {
+    "Customer": "",
+    "Customer ID": "",
+    "Address": "",
+    "Phone": ""
+  },
+  "Payment Due": "",
+  "Salesperson": "",
+  "Payment Terms": "",
+  "Shipping info": {
+    "Recipient": "",
+    "Address": "",
+    "Phone": ""
+  },
+  "Delivery Date": "",
+  "Shipping Method": "",
+  "Shipping Terms": "",
+  "Table": {
+    "Items": [
+      {
+        "Qty": "",
+        "Item#": "",
+        "Description": "",
+        "Unit price": "",
+        "Discount": "",
+        "Line total": ""
+      }
+    ],
+    "Total Discount": "",
+    "Subtotal": "",
+    "Sales Tax": "",
+    "Total": ""
+  },
+  "Footer": {
+    "Customer Name": "",
+    "Address": "",
+    "Website": "",
+    "Phone number": "",
+    "Fax number": "",
+    "Email": ""
+  }
+}
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+You are an expert document processing assistant. Your task is to extract structured information from invoices.
+
+Carefully analyze the provided document and extract all relevant information according to the schema provided.
+
+For each field:
+- Extract the exact text as it appears in the document
+- If a field is not present, leave it as an empty string
+- For numerical values, extract them exactly as shown (including currency symbols if present)
+- For dates, preserve the original format
+- For tables, extract all rows of items
+
+Pay special attention to:
+1. Invoice number and date
+2. Billing and shipping addresses
+3. All line items in the table
+4. Total amounts and taxes
+5. Contact information in the footer
+
+Be thorough and accurate in your extraction.

frontend/process_files.py

Lines changed: 3 additions & 3 deletions
@@ -196,20 +196,20 @@ def process_files_tab():
 
     with col_a:
         include_ocr = st.checkbox(
-            "📄 Run OCR and use it in GPT Extraction",
+            "📄 Run OCR Processing",
             value=processing_options.get("include_ocr", True),
             help="Extract and analyze the text content from your documents using Optical Character Recognition (OCR). This captures all written information including tables, forms, and structured data. Essential for text-heavy documents like contracts, invoices, and reports. When enabled, the AI can understand and extract information from the document's text content."
         )
 
         include_images = st.checkbox(
-            "🖼️ Split in Images and use them in GPT Extraction",
+            "🖼️ Run GPT Vision",
             value=processing_options.get("include_images", True),
             help="Process document pages as images so the AI can visually understand layouts, charts, diagrams, handwritten notes, and visual elements that OCR might miss. This is particularly valuable for forms with checkboxes, complex layouts, signatures, charts, or documents where visual context matters. Combines with OCR for the most comprehensive analysis."
         )
 
         # Validation: Ensure at least one of OCR or Images is enabled
         if not include_ocr and not include_images:
-            st.error("⚠️ **Validation Error**: You must enable at least one of 'Include OCR Text' or 'Include Images' for GPT extraction to work properly.")
+            st.error("⚠️ **Validation Error**: You must enable at least one of 'OCR' or 'GPT Vision' for GPT extraction to work properly.")
             # Force at least one to be true
             include_ocr = True
             st.warning("🔧 **Auto-correction**: Automatically re-enabled 'Include OCR Text' to ensure proper functionality.")
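
The auto-correction rule in this hunk (if both checkboxes are off, OCR is switched back on) can be isolated from Streamlit so it is testable on its own. The following is a sketch of that idea; `enforce_extraction_inputs` is a hypothetical helper that is not part of the commit.

```python
from typing import Tuple


def enforce_extraction_inputs(include_ocr: bool, include_images: bool) -> Tuple[bool, bool, bool]:
    """Ensure at least one extraction input is enabled.

    Returns (include_ocr, include_images, corrected), where corrected is True
    when OCR had to be re-enabled because both inputs were switched off.
    """
    if not include_ocr and not include_images:
        # Mirror the UI behaviour: fall back to OCR rather than failing.
        return True, False, True
    return include_ocr, include_images, False
```

The UI code could then call the helper and show its error and warning messages only when the third element of the result is `True`.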
