feat: round numbers to reduce nondeterministic behavior (#3740)
This PR rounds the floating-point coordinate values in
`pdfminer_processing.py`. This helps eliminate randomness in bounding
box overlap detection caused by machine-precision differences. Currently
the rounding uses the resolution reported by `np.finfo(float)`, which is
`1e-15`. Note that `float` here is 64-bit, so this is the `np.float64`
resolution, not the `np.float32` one.
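As a rough illustration of the idea (not the code in `pdfminer_processing.py`), the sketch below reads the precision NumPy reports for each float type and rounds coordinates to that many decimals. The helper name `round_coords` and the sample values are assumptions made up for this example.

```python
import numpy as np

# np.finfo(float) describes Python's float, i.e. 64-bit floats.
PRECISION_64 = np.finfo(float).precision       # decimal digits for float64
PRECISION_32 = np.finfo(np.float32).precision  # decimal digits for float32


def round_coords(coords, decimals=PRECISION_64):
    """Hypothetical helper: round coordinates to a fixed number of decimals."""
    return np.round(np.asarray(coords, dtype=float), decimals)


# Two mathematically equal coordinates that differ in the last bit.
x_a = 0.1 + 0.2
x_b = 0.3

equal_before = x_a == x_b
equal_after = bool(round_coords(x_a) == round_coords(x_b))
```

Here the raw values compare unequal, while the rounded ones compare equal, which is the kind of last-bit noise the rounding is meant to absorb.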
## future work
We should reduce the rounding to only 6 digits after the decimal point,
since the data type `float32` has a resolution of only `1e-6`. However,
doing so would break tests. A follow-up is required to tune the
threshold values in `pdfminer_processing.py` so that they work with
`1e-6` resolution.
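To see why coarser rounding can interact with thresholds, here is a minimal sketch under stated assumptions: `rounded_overlap_ratio` is a hypothetical helper, boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the sample boxes are invented. It is not the overlap logic used in `pdfminer_processing.py`.

```python
import numpy as np


def rounded_overlap_ratio(box_a, box_b, decimals):
    """Hypothetical helper: fraction of box_a covered by box_b,
    computed after rounding all coordinates to `decimals` places."""
    ax1, ay1, ax2, ay2 = np.round(np.asarray(box_a, dtype=float), decimals)
    bx1, by1, bx2, by2 = np.round(np.asarray(box_b, dtype=float), decimals)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    area_a = (ax2 - ax1) * (ay2 - ay1)
    if area_a <= 0:
        return 0.0
    return float(inter_w * inter_h / area_a)


# box_b's left edge differs from box_a's by less than 1e-6.
box_a = (10.0, 10.0, 20.0, 20.0)
box_b = (10.0000004, 10.0, 20.0, 20.0)

ratio_fine = rounded_overlap_ratio(box_a, box_b, decimals=15)
ratio_coarse = rounded_overlap_ratio(box_a, box_b, decimals=6)
```

With 15 decimals the tiny offset survives and the ratio stays just below 1, while with 6 decimals the offset is rounded away and the ratio becomes exactly 1. Any comparison against a threshold that sits near such values can change outcome, which is one plausible reason the thresholds would need retuning.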
CHANGELOG.md (+3 -1)
@@ -1,7 +1,9 @@
-## 0.16.1-dev4
+## 0.16.1-dev5

 ### Enhancements

+* **Round coordinates** Round coordinates when computing bounding box overlaps in `pdfminer_processing.py` to nearest machine precision. This can help reduce nondeterministic behavior from machine precision that affects which bounding boxes to combine.