added auto_download logic to download data runtime (#3883)
- **Add auto-download for NLTK for Python Environment** When a user imports
`tokenize`, it automatically downloads the NLTK data.
- Added `AUTO_DOWNLOAD_NLTK` flag in `tokenize.py` to download
`NLTK_DATA`
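The behaviour described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this commit: reading `AUTO_DOWNLOAD_NLTK` from an environment variable, the helper names (`env_flag`, `nltk_has`, `nltk_fetch`, `ensure_nltk_data`), and the `REQUIRED` package table are all hypothetical. Only `nltk.data.find` and `nltk.download` are real NLTK calls.

```python
import os
from typing import Callable, Dict, List, Optional

# Hypothetical table of NLTK packages and their data categories (an assumption,
# not taken from the commit).
REQUIRED: Dict[str, str] = {
    "punkt_tab": "tokenizers",
    "averaged_perceptron_tagger_eng": "taggers",
}


def env_flag(name: str, default: bool = True) -> bool:
    """Read a boolean flag from the environment (assumed mechanism)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def nltk_has(category: str, package: str) -> bool:
    """Return True if the package is already present in the NLTK data path."""
    import nltk  # imported lazily so this module loads without NLTK installed

    try:
        nltk.data.find(f"{category}/{package}")
        return True
    except LookupError:
        return False


def nltk_fetch(package: str) -> bool:
    """Download one NLTK package quietly."""
    import nltk

    return bool(nltk.download(package, quiet=True))


def ensure_nltk_data(
    required: Dict[str, str] = REQUIRED,
    enabled: Optional[bool] = None,
    has: Callable[[str, str], bool] = nltk_has,
    fetch: Callable[[str], bool] = nltk_fetch,
) -> List[str]:
    """Fetch every missing package when the flag is on; return what was fetched."""
    if enabled is None:
        enabled = env_flag("AUTO_DOWNLOAD_NLTK")
    if not enabled:
        return []
    fetched = []
    for package, category in required.items():
        if not has(category, package):
            fetch(package)
            fetched.append(package)
    return fetched
```

Calling `ensure_nltk_data()` at module import time would give the "download on import" effect the commit message describes. The `has` and `fetch` parameters exist only so the logic can be exercised without network access.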
CHANGELOG.md (+2 -2)
@@ -1,17 +1,17 @@
-## 0.16.16-dev2
+## 0.16.16
 
 ### Enhancements
 
 ### Features
 - **Vectorize layout (inferred, extracted, and OCR) data structure** Using `np.ndarray` to store a group of layout elements or text regions instead of using a list of objects. This improves the memory efficiency and compute speed around layout merging and deduplication.
 
 ### Fixes
+- **Add auto-download for NLTK for Python Enviroment** When user import tokenize, It will automatic download nltk data from `tokenize.py` file. Added `AUTO_DOWNLOAD_NLTK` flag in `tokenize.py` to download `NLTK_DATA`.
 - **Correctly patch pdfminer to avoid PDF repair**. The patch applied to pdfminer's parser caused it to occasionally split tokens in content streams, throwing `PDFSyntaxError`. Repairing these PDFs sometimes failed (since they were not actually invalid) resulting in unnecessary OCR fallback.