Commit fd0cc54
Fix: sort elements extracted by pdfminer (#244)

1 parent 66fb179

### Summary
- Sort the elements extracted by `pdfminer` so that `aggregate_by_block()` returns consistent results.
### Testing
Test PDF:
[recalibrating-risk-report_4-4.pdf](https://github.com/Unstructured-IO/unstructured-inference/files/12835342/recalibrating-risk-report_4-4.pdf)
```python
from unstructured_inference.inference.layout import process_file_with_model

# Local copy of the test PDF linked above
f_path = "dist/docs/recalibrating-risk-report_4-4.pdf"
layout = process_file_with_model(
    filename=f_path,
    model_name=None,
)
# Print every element found on the first page, separated by blank lines, then the count
elements = layout.pages[0].elements
print("\n\n".join(str(el) for el in elements))
print(len(elements))
```
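The summary says sorting the `pdfminer` elements makes `aggregate_by_block()` consistent. The exact sort key this commit uses is not shown here, so the following is only a minimal sketch of the idea, under stated assumptions: each element carries a bounding box with a top-left origin (y grows downward), and aggregation joins element text in list order. `TextElement`, `sort_elements`, and `aggregate_in_block` are hypothetical names, not the `unstructured_inference` API, and the `(y1, x1)` reading-order key is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box type: (x1, y1, x2, y2), origin at top-left, y grows downward.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class TextElement:
    """Hypothetical stand-in for a text element extracted by pdfminer."""
    text: str
    x1: float
    y1: float
    x2: float
    y2: float


def sort_elements(elements: List[TextElement]) -> List[TextElement]:
    """Return elements in an assumed reading order: top-to-bottom, then left-to-right.

    The trailing key fields break ties so the order is fully deterministic.
    """
    return sorted(elements, key=lambda el: (el.y1, el.x1, el.y2, el.x2, el.text))


def aggregate_in_block(elements: List[TextElement], block: Box) -> str:
    """Hypothetical aggregation: join the text of elements lying fully inside
    `block`, in whatever order the input list happens to be in."""
    bx1, by1, bx2, by2 = block
    inside = [
        el
        for el in elements
        if el.x1 >= bx1 and el.y1 >= by1 and el.x2 <= bx2 and el.y2 <= by2
    ]
    return " ".join(el.text for el in inside)


first = TextElement("first", 10, 10, 60, 20)
second = TextElement("second", 70, 10, 130, 20)
third = TextElement("third", 10, 25, 60, 35)
block: Box = (0, 0, 200, 50)

# The same three elements arriving in two different extraction orders.
order_a = [first, second, third]
order_b = [third, first, second]

print(aggregate_in_block(order_a, block))  # first second third
print(aggregate_in_block(order_b, block))  # third first second
# After sorting, both extraction orders aggregate to the same text.
print(aggregate_in_block(sort_elements(order_b), block))  # first second third
```

Sorting once, before aggregation, keeps the aggregation step itself order-agnostic; the extra tie-breaking fields make sure elements that share a top-left corner still come out in a fixed order.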
3 files changed (+7, -1), including files under `unstructured_inference/inference/`.

The diff contents were not captured; only the hunk line numbers survive:
- File 1: 4 lines added at the top (original lines 1-3 become lines 5-7).
- File 2: 1 line replaced (line 1).
- File 3: 2 lines added after original line 572 (original lines 573-575 become lines 575-577).