Commit e464c16

perf: upgrade pdfminer-six to 20260107 (#536)
Fixes a ~15-18% performance regression introduced in pdfminer-six 20251230, where f-strings were evaluated eagerly even when logging was disabled.

See: pdfminer/pdfminer.six#1233
Fix: pdfminer/pdfminer.six#1234

---

> [!NOTE]
> Addresses a PDF parsing performance issue by updating the dependency and aligning versioning.
>
> - Upgrade `pdfminer-six` to `20260107` in `requirements/base.txt` and constrain it in `requirements/constraints.in` (perf fix)
> - Bump version to `0.0.92` in `prepline_general/api/__version__.py` and `preprocessing-pipeline-family.yaml`
> - Update `CHANGELOG.md` with the perf fix note
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit cd8a7ef.</sup>
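For context, the kind of regression described in the commit message can be reproduced in a few lines. The following is a minimal sketch of eager versus lazy log-message formatting in general, not code taken from pdfminer.six; the `Expensive` class and the logger name are hypothetical and exist only for illustration.

```python
import logging

# Hypothetical logger name; DEBUG records are disabled at WARNING level.
logger = logging.getLogger("eager_vs_lazy_demo")
logger.setLevel(logging.WARNING)


class Expensive:
    """Hypothetical object that counts how often it is rendered to a string."""

    def __init__(self) -> None:
        self.renders = 0

    def __str__(self) -> str:
        self.renders += 1
        return "expensive-repr"


eager = Expensive()
lazy = Expensive()

# Eager: Python builds the f-string before logger.debug() is called,
# so __str__ runs even though the record is then discarded.
logger.debug(f"value: {eager}")

# Lazy: the logging module interpolates %-style arguments only when the
# record is actually emitted, so __str__ never runs here.
logger.debug("value: %s", lazy)

print(eager.renders, lazy.renders)  # prints "1 0"
```

In a hot parsing loop the eager form pays the formatting cost on every call, which is consistent with the slowdown the commit message reports when logging is disabled.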
1 parent 912ccbf commit e464c16

5 files changed: +10 -4 lines changed
CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -1,3 +1,6 @@
+## 0.0.92
+* Upgrade pdfminer-six to 20260107 to fix ~15-18% performance regression from eager f-string evaluation
+
 ## 0.0.91
 * Upgrade packages to resolve CVEs
 

prepline_general/api/__version__.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "0.0.91" # pragma: no cover
+__version__ = "0.0.92" # pragma: no cover

preprocessing-pipeline-family.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
 name: general
-version: 0.0.91
+version: 0.0.92

requirements/base.txt

Lines changed: 2 additions & 1 deletion
@@ -216,8 +216,9 @@ pandas==2.3.3
 # unstructured-inference
 pdf2image==1.17.0
 # via unstructured
-pdfminer-six==20251230
+pdfminer-six==20260107
 # via
+# -c requirements/constraints.in
 # unstructured
 # unstructured-inference
 pi-heif==1.1.1

requirements/constraints.in

Lines changed: 3 additions & 1 deletion
@@ -5,4 +5,6 @@
 ####################################################################################################
 numpy<2.0.0
 # later versions of Starlette break middleware
-starlette==0.41.2
+starlette==0.41.2
+# pdfminer.six 20260107 includes performance fix
+pdfminer-six==20260107
