feat: add zhipu_ocr and remove knowledgebase #212

Open
TNCNBO wants to merge 4 commits into HKUDS:main from A985zhougan:main

TNCNBO commented Mar 9, 2026

Description

A clear and concise description of the changes.

Related Issues

  • Closes #...
  • Related to #...

Module(s) Affected

  • agents
  • api
  • config
  • core
  • knowledge
  • logging
  • services
  • tools
  • utils
  • web (Frontend)
  • docs (Documentation)
  • scripts
  • tests
  • Other: ...

Checklist

  • I have read and followed the contribution guidelines.
  • My code follows the project's coding standards.
  • I have run pre-commit run --all-files and fixed any issues.
  • I have added relevant tests for my changes.
  • I have updated the documentation (if necessary).
  • My changes do not introduce any new security vulnerabilities.

Additional Notes

Add any other context or screenshots about the pull request here.

A985zhougan and others added 4 commits March 2, 2026 22:24
…ntext handling

- Remove RelevanceAnalyzer agent and consolidate workflow to two core agents
- Rename knowledge_context parameter to context for broader applicability
- Make context parameter optional in GenerateAgent.process() with sensible default
- Update agent documentation to reflect simplified responsibilities
- Relax numpy version constraint in requirements.txt for better compatibility
- Add comprehensive test suite for question generation and mimic functionality
- Include sample exam PDF for testing question extraction and generation
- Simplify prompt templates to use generic "Context" instead of "Knowledge"
- Update all agent imports and exports to remove RelevanceAnalyzer references
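The optional `context` parameter described in the list above could look roughly like the following. This is a minimal sketch, not the project's code: only the names `GenerateAgent`, `process()`, and `context` come from the commit message. The `requirement` argument, the `DEFAULT_CONTEXT` constant, the empty-string default, and the dictionary return value are all assumptions made for illustration.

```python
from typing import Dict, Optional

# Assumed default: the commit only says "sensible default", not what it is.
DEFAULT_CONTEXT = ""


class GenerateAgent:
    """Illustrative stand-in for the agent named in the commit message."""

    def process(self, requirement: str, context: Optional[str] = None) -> Dict[str, str]:
        # Fall back to the default when the caller supplies no context,
        # so callers without a knowledge base can still generate questions.
        effective_context = context if context is not None else DEFAULT_CONTEXT
        # Hypothetical payload; the real method presumably calls an LLM.
        return {"requirement": requirement, "context": effective_context}
```

With this shape, `GenerateAgent().process("Write one question")` works without any context, while callers that still have reference material can pass it as `context=...`.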
…on generation pipeline

- Add Zhipu GLM-OCR PDF parser implementation with layout analysis and image extraction
- Introduce configurable PDF parser selection (zhipu/mineru) in question config
- Increase max parallel questions from 1 to 50 for improved throughput
- Extend LLM request timeouts from 120s to 600s for longer-running operations
- Add comprehensive test suite for PDF parsing, OCR, extraction, and full pipeline flows
- Update exam mimic workflow to dynamically select parser based on configuration
- Enhance API responses with parser type information in status messages
- Export new Zhipu parser from question tools module
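One way to picture the configurable parser selection listed above is a small registry keyed by the configured name. This is a hedged sketch only: the `pdf_parser` config key, the `mineru` default, the function names, and the status message wording are assumptions, and the parser functions are placeholders that do not call the real Zhipu GLM-OCR or MinerU tooling.

```python
from typing import Callable, Dict, Tuple

ParserFn = Callable[[str], str]


def parse_with_zhipu(pdf_path: str) -> str:
    # Placeholder: a real implementation would call the Zhipu GLM-OCR service.
    return f"[zhipu] parsed {pdf_path}"


def parse_with_mineru(pdf_path: str) -> str:
    # Placeholder: a real implementation would invoke MinerU.
    return f"[mineru] parsed {pdf_path}"


# Hypothetical registry mapping config values to parser callables.
PARSERS: Dict[str, ParserFn] = {
    "zhipu": parse_with_zhipu,
    "mineru": parse_with_mineru,
}


def select_parser(question_config: dict) -> Tuple[str, ParserFn]:
    # Assumed key name and default; the PR does not state either.
    name = str(question_config.get("pdf_parser", "mineru")).lower()
    if name not in PARSERS:
        raise ValueError(
            f"Unknown pdf_parser {name!r}; expected one of {sorted(PARSERS)}"
        )
    return name, PARSERS[name]


def parse_exam(pdf_path: str, question_config: dict) -> Dict[str, str]:
    name, parser = select_parser(question_config)
    # Include the parser type in the status message, as the commit describes.
    return {"status": f"Parsed PDF with {name} parser", "content": parser(pdf_path)}
```

Failing fast on an unknown parser name keeps a typo in the config from silently falling back to a different parser.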

2 participants