eora21 (Collaborator) commented on Jan 8, 2026

Description:

  • Extract common utility modules (constants.py, file_utils.py) for reusable file operations and shared constants
  • Refactor UpstageUniversalInformationExtraction tool: rewrite docstrings in Google style, add a generate_schema() method for schema generation, use the common utilities, improve error handling, and align the supported file extensions with the API documentation (hwp/hwpx removed)
  • Refactor UpstagePrebuiltInformationExtraction tool: replace ChatUpstage with direct httpx calls that send multipart/form-data requests, as the Prebuilt API requires; use the common utilities, improve docstrings, and improve error handling
  • Remove the duplicate get_from_param_or_env function from document_parse.py and document_parse_parsers.py; both now import it from utils.value_retriever
  • Export the information extraction tools from __init__.py as part of the public API
  • Refactor test files: move common fixtures to conftest.py, switch from the deprecated tool.extract() to tool.invoke(), and update mock targets
  • Remove unused information_extraction_check.py module (functionality moved to file_utils.py)
  • Reorganize file structure: move tools to tools/ directory and utilities to utils/ directory
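To show roughly what the extracted file_utils.py helper does (validate a local file, then base64-encode it), here is a minimal sketch. Only KILOBYTE and MEGABYTE are named in the PR, and their binary (1024-based) values are an assumption. MAX_FILE_SIZE, SUPPORTED_EXTENSIONS, and encode_file_to_base64 are hypothetical names with illustrative values. hwp/hwpx are left out of the extension set, consistent with the extension change described above.

```python
"""Hypothetical sketch of shared constants plus a file-to-base64 helper.

Names other than KILOBYTE/MEGABYTE are illustrative, not the PR's code.
"""

import base64
from pathlib import Path
from typing import Union

# Shared size constants (binary units assumed).
KILOBYTE = 1024
MEGABYTE = 1024 * KILOBYTE

# Assumed limit and extension subset, for illustration only; the real values
# should follow the Upstage API documentation. hwp/hwpx are intentionally absent.
MAX_FILE_SIZE = 50 * MEGABYTE
SUPPORTED_EXTENSIONS = frozenset({".pdf", ".png", ".jpg", ".jpeg"})


def encode_file_to_base64(path: Union[str, Path]) -> str:
    """Validate a local file and return its contents as a base64 string.

    Raises:
        FileNotFoundError: If the path is not an existing regular file.
        ValueError: If the extension is unsupported or the file is too large.
    """
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"File not found: {file_path}")

    extension = file_path.suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file extension: {extension!r}")

    size = file_path.stat().st_size
    if size > MAX_FILE_SIZE:
        raise ValueError(
            f"File is {size / MEGABYTE:.1f} MB; "
            f"the limit is {MAX_FILE_SIZE / MEGABYTE:.0f} MB"
        )

    # Read the raw bytes and encode them as ASCII-safe base64 text.
    return base64.b64encode(file_path.read_bytes()).decode("ascii")
```

Keeping validation and encoding in one helper lets both tools share the same error messages instead of each re-implementing the checks.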

Dependencies: N/A

Twitter handle: N/A

eora21 self-assigned this on Jan 8, 2026
eora21 force-pushed the refactor/information-extraction-tools branch from 3d2fb4a to f539042 on January 8, 2026 at 04:58
eora21 added 9 commits on January 8, 2026 at 14:00:
- Add constants.py for shared constants (KILOBYTE, MEGABYTE)
- Add file_utils.py for reusable file-to-base64 conversion
- Extract common file validation and encoding logic

- Use common utilities (constants, file_utils)
- Convert docstrings to Google style
- Add generate_schema() method for schema generation
- Enhance error handling
- Remove hwp/hwpx from supported extensions per API docs

- Move common fixtures to conftest.py
- Update to use tool.invoke() instead of tool.extract()
- Update mock targets to ChatUpstage
- Remove duplicate fixtures

- Export UpstagePrebuiltInformationExtraction
- Export UpstageUniversalInformationExtraction

- Import get_from_param_or_env from utils.value_retriever instead of duplicating it
- Maintain backward compatibility

- Replace ChatUpstage with httpx for multipart/form-data requests
- Use common utilities (constants)
- Convert docstrings to Google style
- Enhance error handling
- Align with Prebuilt API requirements (multipart/form-data only)

- Functionality moved to file_utils.py
- No longer referenced in codebase

- Fix line length violations (E501) by breaking long lines
- Fix mypy type errors for HumanMessage content and json.loads
- Remove hardcoded API key from universal_information_extraction.py
- Add type casts and type checks for proper type safety
- Add mypy cache directories to .gitignore
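The Prebuilt change replaces ChatUpstage with a direct httpx request carrying a multipart/form-data body. Below is a minimal sketch of that kind of call. It assumes a Bearer-token header and hypothetical form fields named `document` and `model`. PREBUILT_URL is an assumed placeholder, so check the Upstage API reference for the real endpoint and field names. The client is typed with a small Protocol that covers only `httpx.Client.post(url, headers=..., data=..., files=...)`, so a real httpx.Client can be passed in unchanged.

```python
"""Hypothetical sketch of a multipart/form-data call to a Prebuilt API.

PREBUILT_URL, the form field names, and the auth header are assumptions.
"""

from pathlib import Path
from typing import Any, BinaryIO, Dict, Mapping, Optional, Protocol, Tuple

PREBUILT_URL = "https://api.upstage.ai/v1/information-extraction"  # assumed


class ResponseLike(Protocol):
    def raise_for_status(self) -> Any: ...

    def json(self) -> Any: ...


class HttpClientLike(Protocol):
    """The subset of httpx.Client.post that this sketch relies on."""

    def post(
        self,
        url: str,
        *,
        headers: Optional[Mapping[str, str]] = None,
        data: Optional[Mapping[str, Any]] = None,
        files: Optional[Mapping[str, Tuple[str, BinaryIO]]] = None,
    ) -> ResponseLike: ...


def extract_prebuilt(
    client: HttpClientLike, api_key: str, file_path: str, model: str
) -> Dict[str, Any]:
    """Upload a file as multipart/form-data and return the parsed JSON body."""
    path = Path(file_path)
    with path.open("rb") as document:
        response = client.post(
            PREBUILT_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            # httpx sends `files` together with `data` as multipart/form-data.
            files={"document": (path.name, document)},
            data={"model": model},
        )
    # Surface HTTP errors (4xx/5xx) as exceptions before parsing the body.
    response.raise_for_status()
    return response.json()
```

In real use this would be called with an `httpx.Client()` instance and the appropriate Prebuilt model name. Passing the client in is one way to keep the function testable without patching; the PR's own tests instead patch httpx.Client.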
eora21 force-pushed the refactor/information-extraction-tools branch from f539042 to 03486bb on January 8, 2026 at 05:02
eora21 added 4 commits on January 8, 2026 at 14:02:
- Fix PrebuiltInformationExtraction: Update mock strategy from ChatUpstage to httpx.Client to match actual implementation
- Fix UniversalInformationExtraction: Replace ChatUpstage patch with direct api_wrapper assignment to avoid model_validator issues
- Fix API key handling in PrebuiltInformationExtraction.__init__ to properly handle explicit api_key parameter
- Explicitly set upstage_api_key after super().__init__() so that BaseTool initialization does not drop it

All tests now pass (289 passed, 1 skipped)
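The API key handling described in this commit (an explicit api_key wins, and the resolved value is assigned after super().__init__()) can be sketched as follows. This get_from_param_or_env follows the common "parameter, then environment variable, then default" order. Its exact signature in utils.value_retriever is an assumption, as is the UPSTAGE_API_KEY variable name. BaseToolStandIn is a hypothetical plain-Python stand-in for BaseTool that resets the field in its initializer, which shows why the assignment has to come after super().__init__().

```python
"""Sketch of explicit-parameter-first API key resolution (assumed shapes)."""

import os
from typing import Optional


def get_from_param_or_env(
    key: str,
    param: Optional[str] = None,
    env_key: Optional[str] = None,
    default: Optional[str] = None,
) -> str:
    """Return `param` if given, else a non-empty env var `env_key`, else `default`."""
    if param is not None:
        return param  # An explicit argument always wins.
    if env_key and os.environ.get(env_key):
        return os.environ[env_key]
    if default is not None:
        return default
    raise ValueError(
        f"Did not find {key}; set the `{env_key}` environment variable "
        f"or pass `{key}` as a named parameter."
    )


class BaseToolStandIn:
    """Hypothetical stand-in for a base class whose __init__ resets fields."""

    def __init__(self, **kwargs: object) -> None:
        self.upstage_api_key: Optional[str] = None  # simulates the key being dropped


class PrebuiltToolSketch(BaseToolStandIn):
    """Hypothetical tool showing the resolve-then-assign ordering."""

    def __init__(self, api_key: Optional[str] = None, **kwargs: object) -> None:
        resolved = get_from_param_or_env("api_key", api_key, "UPSTAGE_API_KEY")
        super().__init__(**kwargs)
        # Assign after super().__init__() so the base initializer cannot clobber it.
        self.upstage_api_key = resolved
```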
- Format long function signatures onto a single line
- Remove unused 'patch' import from test_universal_information_extraction
- Apply ruff auto-fixes
eora21 requested a review from inahjeon on January 8, 2026 at 06:43