Description
Feature Description
Introduce new samples in the Azure AI Document Processing repo that demonstrate how to integrate the new Phi-4 multi-modal model. These samples will cover:
- Data Extraction using Phi-4 with Vision
- Data Extraction combining Azure AI Document Intelligence with Phi-4 (with Vision)
- Classification utilizing Phi-4 with Vision
This enhancement will provide clear, hands-on examples of processing documents using both text and visual data, highlighting the advanced capabilities of the new Phi-4 multi-modal model.
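As a rough illustration of what the vision-based extraction sample might involve, the sketch below builds a chat-completions-style request that sends a document page image alongside an extraction prompt. The deployment name `Phi-4-multimodal-instruct` and the request shape are assumptions for illustration, not a confirmed API for these samples.

```python
import base64
import json

def build_extraction_request(image_bytes: bytes, schema_fields: list[str]) -> dict:
    """Build a chat-completions payload asking a multi-modal model to
    extract the given fields from a document page image."""
    image_b64 = base64.b64encode(image_bytes).decode("utf-8")
    prompt = (
        "Extract the following fields from this document and return them "
        "as a JSON object: " + ", ".join(schema_fields)
    )
    return {
        "model": "Phi-4-multimodal-instruct",  # assumed deployment name
        "messages": [
            {
                "role": "user",
                "content": [
                    # Text instruction and the page image travel in one message
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                ],
            }
        ],
        "temperature": 0.0,  # deterministic output suits structured extraction
    }

payload = build_extraction_request(b"\x89PNG...", ["invoice_number", "total"])
print(json.dumps(payload["messages"][0]["content"][0], indent=2))
```

The combined Document Intelligence + Phi-4 sample would presumably extend this pattern by adding OCR text from Document Intelligence as an additional text content item, grounding the model's visual reasoning in the extracted layout.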
Use Case
This feature would be useful for businesses that need to analyze complex documents containing both textual and visual information, such as multi-page reports, forms with embedded images, or rich media content, using open multi-modal models like Phi-4.
Motivation
The introduction of Phi-4 marks a significant advancement in multi-modal capabilities for open, small language models. Demonstrating these capabilities through samples will help developers adopt and implement these techniques quickly, leading to more efficient and insightful document analysis in real-world applications.