fix: parsing pdf error - new_cells as str has no "copy" (#3130)
Closes #3119.
### Testing
Parsing the provided PDF should be successful.
[testing_brochure_2.pdf](https://github.com/user-attachments/files/15518094/testing_brochure_2.pdf)
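```
from unstructured.partition.pdf import partition_pdf

filename = "testing_brochure_2.pdf"

# Open the PDF in binary mode and pass the file object via `file=`.
with open(filename, "rb") as pdf_content:
    elements = partition_pdf(
        file=pdf_content,
        infer_table_structure=True,
        extract_image_block_types=["Image", "Table"],
        chunking_strategy="by_title",
        max_characters=1000,
        new_after_n_chars=3000,
        combine_text_under_n_chars=1000,
    )

# Print the text of every returned element, separated by blank lines.
print("\n\n".join([str(el) for el in elements]))
```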
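Since this fix leaves `element.metadata.text_as_html` as an empty string when a table is not recognized, it may be useful to check which table elements ended up without HTML after a successful parse. The following is only a sketch: the helper name `tables_without_html` is hypothetical, and it assumes each element exposes a `category` attribute and a `metadata.text_as_html` attribute. The stand-in objects are built with `SimpleNamespace` so the snippet runs without a PDF.

```python
from types import SimpleNamespace


def tables_without_html(elements):
    """Return table-like elements whose HTML rendering is missing or empty.

    Assumes (not confirmed by this commit) that elements expose `category`
    and `metadata.text_as_html`; anything else is skipped.
    """
    missing = []
    for el in elements:
        if getattr(el, "category", None) != "Table":
            continue
        metadata = getattr(el, "metadata", None)
        html = getattr(metadata, "text_as_html", None)
        # An empty string (or None) means no table structure was produced.
        if not html:
            missing.append(el)
    return missing


# Hypothetical stand-ins for parsed elements, for illustration only.
sample = [
    SimpleNamespace(category="Table",
                    metadata=SimpleNamespace(text_as_html="<table></table>")),
    SimpleNamespace(category="Table",
                    metadata=SimpleNamespace(text_as_html="")),
    SimpleNamespace(category="NarrativeText",
                    metadata=SimpleNamespace(text_as_html=None)),
]

unrecognized = tables_without_html(sample)
print(len(unrecognized))
```

With real output, the same helper could be called on the `elements` list returned by `partition_pdf` above, provided the assumed attributes are present on those elements.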
CHANGELOG.md (+2 -1)
@@ -1,4 +1,4 @@
-## 0.14.4-dev6
+## 0.14.4
 
 ### Enhancements
 
@@ -12,6 +12,7 @@
 
 ### Fixes
 
+* **Address the issue of unrecognized tables in `UnstructuredTableTransformerModel`** When a table is not recognized, the `element.metadata.text_as_html` attribute is set to an empty string.
 * **Remove root handlers in ingest logger**. Removes root handlers in ingest loggers to ensure secrets aren't accidentally exposed in Colab notebooks.
 * **Fix V2 S3 Destination Connector authentication** Fixes bugs with S3 Destination Connector where the connection config was neither registered nor properly deserialized.
 * **Clarified dependence on particular version of `python-docx`** Pinned `python-docx` version to ensure a particular method `unstructured` uses is included.