[Feature]: Enhance Samples with Phi-4 Multi-Modal Model for Comprehensive Document Processing #78

@jamesmcroft

Description

Feature Description

Introduce new samples in the Azure AI Document Processing repo that demonstrate how to integrate the new Phi-4 multi-modal model. These samples will cover:

  • Data Extraction using Phi-4 with Vision
  • Data Extraction combining Azure AI Document Intelligence with Phi-4 (with Vision)
  • Classification using Phi-4 with Vision

This enhancement will provide clear, hands-on examples of processing documents using both text and visual data, highlighting the advanced capabilities of the new Phi-4 multi-modal model.
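To make the scope concrete, a rough sketch of the first scenario (data extraction with Phi-4 with Vision) could look like the following. This is a hypothetical illustration, not the eventual sample code: it assumes a Phi-4 multi-modal deployment reachable through the `azure-ai-inference` Python package, and the endpoint, model name, input file, and extraction fields are all placeholders.

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    ImageContentItem,
    ImageUrl,
    SystemMessage,
    TextContentItem,
    UserMessage,
)
from azure.core.credentials import AzureKeyCredential

# Placeholder configuration; point these at your own deployment.
endpoint = os.environ["AZURE_AI_ENDPOINT"]
key = os.environ["AZURE_AI_KEY"]
model = "Phi-4-multimodal-instruct"  # assumed deployment name

client = ChatCompletionsClient(endpoint=endpoint, credential=AzureKeyCredential(key))

# Send a document page image and ask for structured fields back as JSON.
response = client.complete(
    model=model,
    messages=[
        SystemMessage(content="You extract structured data from document images and reply with JSON only."),
        UserMessage(content=[
            TextContentItem(text="Extract the invoice number, invoice date, and total amount as JSON."),
            ImageContentItem(image_url=ImageUrl.load(
                image_file="invoice-page-1.jpeg",  # hypothetical sample document page
                image_format="jpeg",
            )),
        ]),
    ],
)

print(response.choices[0].message.content)
```

The second scenario would extend this by first running an Azure AI Document Intelligence layout analysis and passing the extracted text to Phi-4 alongside the page image, while the classification scenario would differ mainly in the prompt and the expected output (a category label rather than extracted fields).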

Use Case

This feature would be useful for businesses that need to analyze complex documents containing both textual and visual information, such as multi-page reports, forms with images, or rich media content, using open multi-modal models such as Phi-4.

Motivation

The introduction of the Phi-4 multi-modal model marks a significant advancement in multi-modal capabilities for open, small language models. Demonstrating these capabilities through samples will help developers quickly adopt and implement these techniques, leading to more efficient and insightful document analysis in real-world applications.

Metadata

Labels

enhancement (New feature or request)
