bug/windows reopen temp file (pdf hi_res) #3076

Closed
@KristianMischke

Describe the bug
Same issue as Unstructured-IO/unstructured-inference#303; I couldn't find an equivalent ticket on this project. On Windows, a file created with NamedTemporaryFile() cannot be reopened by name while the original handle is still open, i.e. within the scope of the with block.

The line

with tempfile.NamedTemporaryFile() as tmp_file:

creates a temp file whose name is passed to process_file_with_ocr -> pdf2image.convert_from_path, which then invokes pdfinfo on the temp file, yielding an error like:

pdf2image.exceptions.PDFPageCountError: Unable to get page count.
I/O Error: Couldn't open file <temp file path here>: No error.
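
For context, here is a minimal sketch of one way to hand a path to a path-based consumer (such as pdfinfo) without holding the file open: write the bytes into a tempfile.TemporaryDirectory and close the file before the consumer runs. The helper name `with_temp_path` and the `read_back` stand-in are hypothetical, not part of unstructured or pdf2image.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def with_temp_path(data: bytes, consumer: Callable[[str], T], suffix: str = ".pdf") -> T:
    """Write data to a closed temp file and pass its path to consumer.

    Hypothetical helper: the file lives inside a TemporaryDirectory, so no
    handle is open when consumer runs, and the directory is removed afterwards.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "input" + suffix)
        with open(path, "wb") as f:
            f.write(data)
        # The handle is closed here, so the path can be opened again on Windows.
        return consumer(path)


def read_back(path: str) -> bytes:
    # Stand-in for a path-based consumer such as pdf2image.convert_from_path.
    with open(path, "rb") as f:
        return f.read()


print(with_temp_path(b"%PDF-1.4 example bytes", read_back))
```

The consumer must finish with the path before returning, since the directory is deleted when the with block exits.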

To Reproduce
On Windows

Note: the issue outlined in Unstructured-IO/unstructured-inference#303 will occur first. Once that is fixed (e.g. by applying Unstructured-IO/unstructured-inference#323), the OCR code fails as described above.

import os
import tempfile

# Print the operating system name ("nt" on Windows)
print(os.name)


# Create a temporary file
with tempfile.NamedTemporaryFile() as tmp_file:
    # Write some data to the file
    tmp_file.write(b'Hello, world!')
    tmp_file.flush()  # Flush the buffer to make sure data is written

    # Get the name of the file
    file_name = tmp_file.name

    # Reopen the file by name while tmp_file is still open.
    # On Windows this raises PermissionError; on Unix it succeeds.
    with open(file_name, 'r') as file:
        # Read the data from the file
        content = file.read()
        print("Content of the temp file:", content)

Expected behavior
The code should run without error, and temp files should be supported on Windows.
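
A possible workaround, sketched under the assumption that manual cleanup is acceptable: create the file with delete=False, close it before reopening by name, and remove it afterwards. This is only an illustration of the pattern, not the fix adopted by the project.

```python
import os
import tempfile

# delete=False keeps the file on disk after close(), so cleanup is manual.
tmp_file = tempfile.NamedTemporaryFile(delete=False)
try:
    tmp_file.write(b"Hello, world!")
    tmp_file.close()  # Release the handle before reopening by name

    with open(tmp_file.name, "rb") as file:
        content = file.read()
    print("Content of the temp file:", content)
finally:
    tmp_file.close()  # Harmless if the file is already closed
    os.remove(tmp_file.name)
```

On Python 3.12 and later, NamedTemporaryFile(delete_on_close=False) should allow the file to be closed and reopened inside the with block while still being deleted on exit.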

Labels: bug (Something isn't working)