@enoch3712
Owner

No description provided.

Copilot AI review requested due to automatic review settings June 9, 2025 09:40
@enoch3712 enoch3712 linked an issue Jun 9, 2025 that may be closed by this pull request
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds advanced configuration support for Azure Document Intelligence. It adds tests for advanced features in the document loader, updates the AzureConfig and DocumentLoaderAzureForm implementations to handle a “features” parameter, and refreshes the documentation to match.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/test_document_loader_azure_document_intelligence.py | Added tests to validate advanced features configuration and extraction behavior |
| extract_thinker/document_loader/document_loader_azure_document_intelligence.py | Integrated advanced feature parameters into initialization and processing logic |
| docs/core-concepts/document-loaders/azure-form.md | Updated documentation to include advanced feature usage and removed legacy field references |
Comments suppressed due to low confidence (2)

extract_thinker/document_loader/document_loader_azure_document_intelligence.py:368

  • For consistency with other advanced feature checks, consider verifying that the 'languages' feature is enabled in the configuration (e.g. checking if 'languages' is in config.features) before processing language data.
if hasattr(result, 'languages'):
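
A minimal sketch of what the suggested check could look like, written against stand-in objects rather than the real loader. `ExampleConfig` and `extract_languages` are hypothetical names used only for illustration, and the `locale` and `confidence` attributes on each language entry are assumed, not taken from this PR's diff.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any, List


@dataclass
class ExampleConfig:
    """Hypothetical stand-in for the loader config; only `features` matters here."""
    features: List[str] = field(default_factory=list)


def extract_languages(result: Any, config: ExampleConfig) -> List[dict]:
    """Return detected languages only when the 'languages' feature was requested."""
    # Check the configuration first, so language data is never processed
    # when the caller did not ask for it.
    if "languages" not in config.features:
        return []
    # Then check what the result actually carries; a missing or None
    # attribute is treated as "nothing detected".
    detected = getattr(result, "languages", None) or []
    return [
        {"locale": lang.locale, "confidence": lang.confidence}
        for lang in detected
    ]


# Usage with a fake analysis result (assumed shape, for illustration only).
fake_result = SimpleNamespace(
    languages=[SimpleNamespace(locale="en", confidence=0.9)]
)
enabled = extract_languages(fake_result, ExampleConfig(features=["languages"]))
disabled = extract_languages(fake_result, ExampleConfig())
```

Gating on the configuration before inspecting the result keeps this branch consistent with the other feature checks the comment refers to. The `hasattr`-style check remains useful as a second guard.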

docs/core-concepts/document-loaders/azure-form.md:127

  • Remove legacy field references (e.g. 'key', 'language', 'pages', 'reading_order') from the configuration options table to avoid confusion, ensuring that only current options (like 'subscription_key', 'model_id', 'features', etc.) are documented.
| `endpoint` | str | None | Azure endpoint URL |

assert "content" in pages[0]
assert isinstance(pages[0]["content"], str)
assert len(pages[0]["content"]) > 0
print(f"High-res OCR extracted {len(pages[0]['content'])} characters")

Copilot AI Jun 9, 2025

[nitpick] Avoid using print statements in tests; consider removing them or replacing with a logging mechanism to keep test output clean.

Suggested change
- print(f"High-res OCR extracted {len(pages[0]['content'])} characters")
+ logger.info(f"High-res OCR extracted {len(pages[0]['content'])} characters")
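
The suggested replacement only works if a `logger` exists in the test module, which the snippet above does not show. A minimal sketch of one way to set that up with the standard `logging` module follows. `report_extraction` is a hypothetical helper that wraps the assertions quoted above, and the sample `pages` value is made up for illustration.

```python
import logging

# Module-level logger, named after the module as is conventional.
logger = logging.getLogger(__name__)


def report_extraction(pages: list) -> int:
    """Check the first page's content and log its length instead of printing it."""
    assert "content" in pages[0]
    content = pages[0]["content"]
    assert isinstance(content, str)
    assert len(content) > 0
    # %-style arguments defer string formatting until the record is emitted.
    logger.info("High-res OCR extracted %d characters", len(content))
    return len(content)


# Usage with made-up data shaped like the loader output in the test.
char_count = report_extraction([{"content": "hello"}])
```

With pytest, records at INFO level are typically hidden for passing tests unless log output is enabled, which keeps the test output clean while the diagnostic stays available.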

@enoch3712 enoch3712 merged commit db5a590 into main Jun 9, 2025
6 checks passed
Development

Successfully merging this pull request may close these issues.

[Feature] AzureDocumentIntelligence as document loader
