Skip to content

Fix pdf_combine() "too many open files" error#32

Open
trevorld wants to merge 2 commits intoropensci:masterfrom
trevorld:too-many-open-files
Open

Fix pdf_combine() "too many open files" error#32
trevorld wants to merge 2 commits intoropensci:masterfrom
trevorld:too-many-open-files

Conversation

@trevorld
Copy link
Contributor

  • Use ClosedFileInputSource when combining PDFs so files are opened/closed on demand rather than held open throughout the write phase.
  • Also fixed an inner loop variable shadowing (i → j)

closes #21

@trevorld trevorld changed the title Fix pdf_combine() "too many open files" error Fix pdf_combine() "too many open files" error Mar 12, 2026
@jeroen
Copy link
Member

jeroen commented Mar 12, 2026

Is there any reason this needs a new function, instead of changing the existing read_pdf_with_password?

@trevorld
Copy link
Contributor Author

Is there any reason this needs a new function, instead of changing the existing read_pdf_with_password?

Do you mean:

  1. Should we always use ClosedFileInputSource?

or

  1. Should we add an extra argument to read_pdf_with_password() to switch between the two types of objects?

w.r.t. 1) in #21 (comment) it sounded like the ClosedFileInputSource get closed more aggressively and we might need to exert effort to keep them open in other functions

w.r.t. 2) I suppose we could do this if you'd like. The big difference between the two is one uses

pdf->processFile(infile, value.get_cstring());

and the other uses

auto cis = std::make_shared<ClosedFileInputSource>(infile);
pdf->processInputSource(cis, value.get_cstring());

so we could use an ifelse statement a couple of times in the code.

@trevorld trevorld force-pushed the too-many-open-files branch from 6476ce5 to bece410 Compare March 13, 2026 04:41
* Use ClosedFileInputSource when combining PDFs so files are opened/closed
  on demand rather than held open throughout the write phase.
* Also fixed an inner loop variable shadowing (i → j)

closes ropensci#21

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@trevorld trevorld force-pushed the too-many-open-files branch from bece410 to 39accf3 Compare March 13, 2026 05:04
@trevorld
Copy link
Contributor Author

Is there any reason this needs a new function, instead of changing the existing read_pdf_with_password?

  • I've added a use_cfis flag to read_pdf_with_password(). If true we use ClosedFileInputSource
  • I've also put a skip() before the reproducible example since on my machine it takes a few minutes and creates a 455Mb temporary file. I've added a simple 2 file case before the skip which is much faster.

@trevorld
Copy link
Contributor Author

Although I still skip() it I changed the reproducible "too many files" example in the new tests to instead combine 10,000 blank pdf files produced by

grDevices::pdf(blank_file)
grid::grid.newpage()
invisible(grDevices::dev.off())

Although on my machine it still takes 40 seconds and produces a 30Mb temporary file this is an order of magnitude better than combining pdf-example-password.original.pdf 10,000 times.

@trevorld trevorld force-pushed the too-many-open-files branch from c57be1e to 2033417 Compare March 13, 2026 17:36
This seems to be an order faster and produces an order
smaller temporary output file than using `pdf-example-password.original.pdf`
@trevorld trevorld force-pushed the too-many-open-files branch from 2033417 to 5feb713 Compare March 13, 2026 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Error: too many open files

2 participants