What's New?
Check out What's New for the full scoop.
Quick Summary
- Unified API: Consistent method names across all chunkers (chunk_text, chunk_file, chunk_texts, chunk_files)
- PlainTextChunker merged into DocumentChunker: Handle both text and documents with one class
- SentenceSplitter rename: split() renamed to split_text(), also added split_file()
- Shorter CLI flags: -l for --lang, -h for --host, -m for --metadata, -t for --tokenizer-timeout
- Visualizer overhaul: Fullscreen mode, 3-row layout, smoother hovers
- Code chunking improvements: Fixed comment artifacts, added string protection
- More code languages: ColdFusion, VB.NET, PHP 8 attributes, Pascal support
- Dependency fixes: No more pkg_resources headaches
- Direct imports: Now you can do from chunklet import DocumentChunker without performance issues
- Test coverage: From 87% to 90.67%
Install
# Upgrade to latest
pip install chunklet-py -U
# Or install a specific version
pip install chunklet-py==2.2.0
Migration
Upgrading from v2.1.x? Here's what changed:
| Old | New |
| --- | --- |
| chunker.chunk() | chunker.chunk_text() or chunker.chunk_file() |
| chunker.batch_chunk() | chunker.chunk_texts() or chunker.chunk_files() |
| splitter.split() | splitter.split_text() |
The old methods still work; they'll just yell at you with a deprecation warning.
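
If you want to track down leftover calls to the old names, here is a rough sketch of how a warning-emitting alias behaves and how to capture its warnings with Python's standard warnings module. DemoChunker and find_deprecated_calls are hypothetical stand-ins made up for this example; they are not chunklet's classes, and the warning category chunklet actually uses is assumed to be DeprecationWarning.

```python
import warnings


class DemoChunker:
    """Hypothetical stand-in for illustration; not chunklet's real class."""

    def chunk_text(self, text: str) -> list[str]:
        # Toy behaviour: split on blank lines and drop empty pieces.
        return [part.strip() for part in text.split("\n\n") if part.strip()]

    def chunk(self, text: str) -> list[str]:
        # Old name kept as a thin alias: warn, then delegate to the new name.
        warnings.warn(
            "chunk() is deprecated; use chunk_text() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.chunk_text(text)


def find_deprecated_calls(func, *args):
    """Call func(*args) and return (result, deprecation messages it raised)."""
    with warnings.catch_warnings(record=True) as caught:
        # Make sure no warning is hidden by the default filters.
        warnings.simplefilter("always")
        result = func(*args)
    messages = [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]
    return result, messages


if __name__ == "__main__":
    chunker = DemoChunker()
    result, messages = find_deprecated_calls(chunker.chunk, "first\n\nsecond")
    print(result)
    print(messages)
```

To make old call sites fail loudly in a test run instead of just warning, you can start Python with -W error::DeprecationWarning, which turns those warnings into exceptions.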
Full Changelog
Everything else is in the changelog.