The following is my current thinking about the roadmap. It shouldn't be seen as set in stone, and the timeboxing is also tentative and might be a bit overambitious.
Please give your feedback!
Note About Versioning
Kreuzberg follows a rapid development cycle with regular major version releases. Why? Because using major versions (1.x, 2.x, etc.) rather than pre-1.0 versioning (0.1.11, etc.) allows us to evolve the library's interfaces confidently while being explicit about breaking changes. We follow SemVer and always will.
Current: Version 2.x
Core Functionality
Unified async/sync API for document text extraction
Support for PDF, images, Office documents, and markup formats
OCR capabilities via Tesseract integration
Text extraction and metadata extraction via Pandoc
Efficient batch processing
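To make the "unified async/sync" and batch ideas concrete, here is a minimal sketch of the general pattern in plain Python. It is not Kreuzberg's actual API: `extract_text`, `extract_text_sync`, and `batch_extract_text` are hypothetical names, and reading a UTF-8 text file stands in for real extraction.

```python
import asyncio
from pathlib import Path


async def extract_text(path: Path) -> str:
    """Hypothetical async extractor; reading a text file stands in for real work."""
    # asyncio.to_thread moves the blocking file read off the event loop.
    return await asyncio.to_thread(path.read_text, encoding="utf-8")


def extract_text_sync(path: Path) -> str:
    """Sync counterpart that reuses the async implementation."""
    return asyncio.run(extract_text(path))


async def batch_extract_text(paths: list[Path]) -> list[str]:
    """Extract several files concurrently; results keep the input order."""
    return await asyncio.gather(*(extract_text(p) for p in paths))
```

The sync wrapper simply drives the async function with `asyncio.run`, so both entry points share one implementation. Note that `asyncio.run` cannot be called from code that is already running inside an event loop.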
Version 3.x (Q2 2025)
Extensibility
Architecture Update:
Support for creating and using custom extractors for any file format
Capability to override existing extractors
Pre-processing, validation, and post-processing hooks
Extended metadata extraction
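One way the extensibility items above could fit together is a registry keyed by MIME type. The sketch below is only an illustration under that assumption; `ExtractorRegistry`, `register`, `pre_hooks`, and `post_hooks` are hypothetical names, not the planned 3.x interface.

```python
from dataclasses import dataclass, field
from typing import Callable

Extractor = Callable[[bytes], str]
PreHook = Callable[[bytes], bytes]
PostHook = Callable[[str], str]


@dataclass
class ExtractorRegistry:
    """Hypothetical registry mapping MIME types to extractor callables."""

    extractors: dict[str, Extractor] = field(default_factory=dict)
    pre_hooks: list[PreHook] = field(default_factory=list)
    post_hooks: list[PostHook] = field(default_factory=list)

    def register(self, mime_type: str, extractor: Extractor) -> None:
        # Registering a MIME type that is already known overrides the old extractor.
        self.extractors[mime_type] = extractor

    def extract(self, mime_type: str, data: bytes) -> str:
        # Validation: reject formats that have no registered extractor.
        if mime_type not in self.extractors:
            raise ValueError(f"no extractor registered for {mime_type}")
        # Pre-processing hooks transform the raw bytes, in registration order.
        for pre in self.pre_hooks:
            data = pre(data)
        text = self.extractors[mime_type](data)
        # Post-processing hooks transform the extracted text, in registration order.
        for post in self.post_hooks:
            text = post(text)
        return text


registry = ExtractorRegistry()
registry.register("text/plain", lambda data: data.decode("utf-8"))
registry.post_hooks.append(str.strip)
```

Calling `registry.register("text/plain", ...)` a second time replaces the first extractor, which is the override behaviour the roadmap item describes.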
Enhanced Document Structure
Optional Features (available via extra install groups):
Possible path forward (TBD):
Version 4.x (Q3 2025)
Model-Based Processing
Optional Vision Model Integration:
Optional Specialized OCR:
Optional Heuristics:
Version 5.x (Q4 2025)
Integration & Ecosystem
Optional Enterprise Integrations:
TBD...