Closed
Description
Describe the bug
Same issue as Unstructured-IO/unstructured-inference#303; I couldn't find an equivalent ticket on this project. On Windows, a temp file created with NamedTemporaryFile() cannot be reopened by name while it is still open, i.e. within the scope of the NamedTemporaryFile() context.
In line:
a temp file is created to pass as the filename to process_file_with_ocr
-> pdf2image.convert_from_path
which then invokes pdfinfo on the temp file, yielding an error like:
pdf2image.exceptions.PDFPageCountError: Unable to get page count.
I/O Error: Couldn't open file <temp file path here>: No error.
To Reproduce
On Windows
Note: the issue outlined in Unstructured-IO/unstructured-inference#303 will occur first, but once that is fixed (e.g. by applying Unstructured-IO/unstructured-inference#323), the OCR code fails as described above.
import os
import tempfile

# Print the operating system name ('nt' on Windows)
print(os.name)

# Create a temporary file
with tempfile.NamedTemporaryFile() as tmp_file:
    # Write some data to the file
    tmp_file.write(b'Hello, world!')
    tmp_file.flush()  # Flush the buffer to make sure data is written

    # Get the name of the file
    file_name = tmp_file.name

    # Reopen the file by name while tmp_file is still open.
    # This is the step that fails on Windows.
    with open(file_name, 'r') as file:
        # Read the data from the file
        content = file.read()
        print("Content of the temp file:", content)
Expected behavior
Expected not to error, and to be able to support tempfiles on Windows
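One possible workaround, shown only as a minimal sketch and not necessarily the fix adopted in this project: create the temp file with delete=False, close it before handing its name to other code, and remove it manually afterwards. The helper name closed_named_tempfile is hypothetical and used here for illustration only.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def closed_named_tempfile(data: bytes, suffix: str = ""):
    """Hypothetical helper: write data to a temp file, close it, yield its path."""
    # delete=False keeps the file on disk after close(), so it can be
    # reopened by name (also by another process) on Windows.
    tmp_file = tempfile.NamedTemporaryFile(delete=False, suffix=suffix)
    try:
        tmp_file.write(data)
        tmp_file.close()  # Release the handle before anything reopens the file
        yield tmp_file.name
    finally:
        tmp_file.close()  # No-op if the file is already closed
        if os.path.exists(tmp_file.name):
            os.remove(tmp_file.name)  # Manual cleanup replaces delete=True


with closed_named_tempfile(b"Hello, world!") as file_name:
    # The file is closed here, so reopening it by name is allowed on Windows too
    with open(file_name, "r") as file:
        content = file.read()
    print("Content of the temp file:", content)
```

On Python 3.12 and later, NamedTemporaryFile(delete_on_close=False) offers a similar effect while keeping automatic deletion when the context manager exits; the sketch above avoids it so that it also runs on older versions.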