
Camelot does not delete temporary folders created during processing until the process exits (behavior changed since 0.11.0) #537

Open
@imbellis

Description


Version 1.0.0 appears to have changed Camelot so that the temporary folders it creates are deleted only when the process exits.
This causes problems for long-running processes: the number of folders keeps growing until it eventually fills up the temp directory.

Looking through the code, this appears to be caused by a change to the `TemporaryDirectory` class in `utils.py`.
The reason for the change isn't documented, but an optional flag controlling this behavior might be a better solution.
Reverting to the older version of `TemporaryDirectory` breaks the code, perhaps because the created folder is referenced elsewhere in the handler class. Adding code that deletes the created folders immediately after `parse()` and `_parse_page()` return their output seems to work without breaking anything.
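A minimal sketch of the optional flag suggested above, assuming a context manager shaped like Camelot's `TemporaryDirectory` (one whose `__enter__` returns a directory path). The `keep_until_exit` parameter is hypothetical and not part of Camelot's API. When it is true, cleanup is deferred to interpreter exit via `atexit` (the 1.0.0 behavior described in this issue); when it is false, the folder is removed as soon as the `with` block ends (the earlier behavior).

```python
import atexit
import shutil
import tempfile
from typing import Optional


class TemporaryDirectory:
    """Yield the path of a fresh temporary directory.

    Hypothetical sketch: ``keep_until_exit`` is NOT a real Camelot option.
    """

    def __init__(self, keep_until_exit: bool = False) -> None:
        self.keep_until_exit = keep_until_exit
        self.name: Optional[str] = None

    def __enter__(self) -> str:
        self.name = tempfile.mkdtemp()
        if self.keep_until_exit:
            # Deferred cleanup: the folder outlives the with-block and is
            # removed only when the interpreter exits. ignore_errors avoids
            # a traceback at exit if the folder was already deleted.
            atexit.register(shutil.rmtree, self.name, ignore_errors=True)
        return self.name

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        if not self.keep_until_exit and self.name is not None:
            # Immediate cleanup: remove the folder as soon as the block ends,
            # so a long-running process does not accumulate folders.
            shutil.rmtree(self.name, ignore_errors=True)
```

With the default `keep_until_exit=False`, folders are removed as each `with` block ends. Code that still uses the folder afterwards (such as the handler-class reference mentioned above) would either opt in to deferred cleanup or be restructured to finish with the files inside the block.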

Steps to reproduce the bug

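A standard call to `read_pdf`, for example:

```python
import camelot

# filepath, page_num, input_region and flavor are supplied by the caller
tables = camelot.read_pdf(filepath=filepath,
                          pages=str(page_num),
                          backend="pdfium",
                          table_regions=[input_region],
                          flavor=flavor)
```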
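To observe the growth in a long-running process, one option is to count the folders in the temp directory before and after repeated `read_pdf` calls. The helper below is a hypothetical sketch using only the standard library. It assumes Camelot's folders are created in the default temp directory with the default `tmp` prefix that `tempfile.mkdtemp()` uses; that assumption is worth confirming against the actual folder names.

```python
import os
import tempfile
from typing import Optional


def count_temp_dirs(root: Optional[str] = None, prefix: str = "tmp") -> int:
    """Count sub-directories of ``root`` whose names start with ``prefix``.

    ``root`` defaults to the system temp directory. The "tmp" prefix matches
    the default names produced by tempfile.mkdtemp(); whether Camelot's
    folders use that default is an assumption.
    """
    root = root if root is not None else tempfile.gettempdir()
    with os.scandir(root) as entries:
        # Only directories count; plain files with the same prefix are ignored.
        return sum(1 for e in entries if e.is_dir() and e.name.startswith(prefix))
```

Calling `count_temp_dirs()` before and after a loop of `read_pdf` calls in one process would show the count climbing if folders are only removed at exit. Other programs writing to the same temp directory can skew the numbers, so a quiet machine gives the clearest signal.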

**PDF**


The behavior is not specific to any particular PDF or data.


**Environment**

I would expect the same behavior on all operating systems.
- OS: Tested on Ubuntu, RHEL, and Windows (WSL)
- Python version: 3.12.3
- Numpy version: 2.1.2
- OpenCV version: 4.10.0
- Ghostscript version: 0.7
- camelot version: 1.0.0


Metadata


- Assignees: No one assigned
- Labels: bug (Something isn't working)
- Type: No type
- Projects: No projects
- Milestone: No milestone
- Relationships: None yet
- Development: No branches or pull requests
