Commit e764bc5
feat: round numbers to reduce nondeterministic behavior (#3740)
This PR rounds the floating-point numbers associated with coordinates in
`pdfminer_processing.py`. This helps eliminate randomness caused by
machine precision in bounding-box overlap detection. The rounding is
meant to match machine precision for `np.float32`, but it is computed
with `np.finfo(float)`, which describes 64-bit floats and yields a
resolution of `1e-15`.
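A minimal sketch of the idea, assuming coordinates are rounded with `np.round` to a number of decimals derived from `np.finfo(...).resolution`. The helpers `decimals_for` and `round_coords` are hypothetical names for illustration, not the functions in `pdfminer_processing.py`. It also shows that `np.finfo(float)` reports `1e-15` while `np.finfo(np.float32)` reports `1e-06`.

```python
import numpy as np

# NumPy reports an approximate decimal resolution for each float type.
# Python's `float` is a 64-bit double (1e-15); float32 gives 1e-06.
FLOAT64_RES = np.finfo(float).resolution
FLOAT32_RES = np.finfo(np.float32).resolution


def decimals_for(resolution) -> int:
    """Hypothetical helper: power-of-ten resolution -> decimal places (1e-15 -> 15)."""
    return int(round(-np.log10(float(resolution))))


def round_coords(coords, resolution=FLOAT64_RES) -> np.ndarray:
    """Hypothetical helper: round coordinates to drop last-bit floating-point noise."""
    return np.round(np.asarray(coords, dtype=np.float64), decimals_for(resolution))


# The same coordinate computed with a different summation order can differ
# in the last bit, which is enough to flip an exact comparison.
x_a = (0.1 + 0.2) + 0.3
x_b = (0.3 + 0.2) + 0.1
print(x_a == x_b)                                        # exact comparison
print(round_coords([x_a])[0] == round_coords([x_b])[0])  # after rounding
```

Rounding narrows, rather than fully removes, this kind of discrepancy: two values that straddle a rounding boundary can still round to different results.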
## Future work
We should reduce the rounding to only 6 digits after the decimal point,
since the `float32` data type has a resolution of only `1e-6`. However,
doing so currently breaks tests. A follow-up is required to tune the
threshold values in `pdfminer_processing.py` so that they work with
`1e-6` resolution.

1 parent 3240e3d
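To illustrate why the 6-digit follow-up under Future work needs threshold tuning, here is a sketch with a hypothetical overlap check using a strict `>` threshold. `overlap_fraction`, `passes_threshold`, and the `0.5` threshold are illustrative assumptions, not the actual logic or values in `pdfminer_processing.py`.

```python
import numpy as np

# Hypothetical helpers for illustration only; not the functions used in
# pdfminer_processing.py.


def round_bbox(bbox, decimals):
    """Round an (x1, y1, x2, y2) box to `decimals` places."""
    return np.round(np.asarray(bbox, dtype=np.float64), decimals)


def overlap_fraction(a, b):
    """Fraction of box a's area that is covered by box b."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    return inter_w * inter_h / area_a if area_a > 0 else 0.0


def passes_threshold(a, b, decimals, threshold=0.5):
    """Strict '>' check against a hypothetical overlap threshold after rounding."""
    frac = overlap_fraction(round_bbox(a, decimals), round_bbox(b, decimals))
    return bool(frac > threshold)


box_a = (0.0, 0.0, 1.0, 1.0)
box_b = (0.4999996, 0.0, 1.5, 1.0)  # unrounded overlap: 0.5000004 of box_a

for decimals in (15, 6):
    print(decimals, passes_threshold(box_a, box_b, decimals))
```

With 15 decimals the check passes; with 6 decimals the left edge of `box_b` rounds to `0.5`, the overlap lands exactly on the threshold, and the strict comparison fails. Decisions made near a threshold can therefore change when the rounding is coarsened, which is why thresholds would need retuning.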
File tree (3 files changed, +23 -12 lines):
- unstructured
- partition/pdf_image