Skip to content

[Feature] Add. PdfDoc.saveToTargetPath() for Writing PDF Directly to File Path #87

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

YoungSeok-Choi
Copy link

@YoungSeok-Choi YoungSeok-Choi commented Jan 26, 2025

What?

Currentiy there is only way to save an modified PDF File(PDFDocument.save()) which is creates a entire Copy of PDF size buffer

this PR add a new feature(PDFDocument.saveToTargetPath()) totally same functionality with PDFDocument.save() but reduces memory usage by writing PDF data directly to specific directory path (without copying Unit8Array)

Why?

I created this feature to address high memory usage when working in serverless environments (e.g., AWS Lambda). When processing large PDFs (e.g., 1GB), the current implementation requires at least 2GB of memory, resulting in substantial resource costs.

This PR reduces memory requirements, which is especially important in memory-constrained environments.

For more details, please refer to the related issue I created:

How?

  • PDFDocument.save(): Creates a Uint8Array for the entire size of the modified PDF file and writes this array.
  • PDFDocument.saveToTargetPath(): Writes the modified PDF file directly to a specified path using a Writable stream, avoiding the creation of a large Uint8Array.

Testing?

  • Verifying the input(outputPath) options is valid.
  • Ensuring that the results from PDFDocument.save() and PDFDocument.saveToTargetPath() are identical.

New Dependencies?

No Additional Dependency

Screenshots

Suggested Reading?

Yes

Anything Else?

Nothing!

Checklist

  • I read CONTRIBUTING.md.
  • I read MAINTAINERSHIP.md#pull-requests.
  • I added/updated unit tests for my changes.
  • I added/updated integration tests for my changes.
  • I ran the integration tests.
  • I tested my changes in Node, Deno, and the browser.
  • I viewed documents produced with my changes in Adobe Acrobat, Foxit Reader, Firefox, and Chrome.
  • I added/updated doc comments for any new/modified public APIs.
  • My changes work for both new and existing PDF files.
  • I ran the linter on my changes.

@Sharcoux
Copy link
Collaborator

This lib is cross-platform, including web. How can this work?

@YoungSeok-Choi
Copy link
Author

This lib is cross-platform, including web. How can this work?

Ahh.. right!
i should consider other platforms
i will do about that

thanks for reviewing! 🙏🙏

@YoungSeok-Choi
Copy link
Author

This lib is cross-platform, including web. How can this work?

By the way
The main concept of PdfDoc.saveToTargetPath() is that any kind of modified PDF
can write the fIle to specific directory Path (formed PDF format) regardless platform

currently my PR Only might work only Node.js Environment

@Sharcoux
Copy link
Collaborator

Sharcoux commented Feb 6, 2025

Maybe for other environments we could just write an error in the console explaining that this feature is not available on that environment?

@YoungSeok-Choi
Copy link
Author

YoungSeok-Choi commented Feb 6, 2025

Maybe for other environments we could just write an error in the console explaining that this feature is not available on that environment?

That's an idea!

Fist of all, i would try to handle all other environments include Node.js but, if it's hard to cover
then i will figure it out the way to report an error to non-node.js environments.

let me know you by next weekend @Sharcoux

@YoungSeok-Choi
Copy link
Author

i had some issue of memory usage of this feature.. 😂
beacause of lack of knowledge of stream(node)

give me some time for working on that!

@cjam
Copy link

cjam commented Apr 8, 2025

Would this PR help with adding streaming support to this library? This library is great, but I've ran into a memory issue in a serverless environment since it doesn't support streams it consumes more memory than my serverless function is allocated. In my case, I'm merging pdfs and would love to be able to stream pdfs into a writeable stream and direct it to disk, allowing this library to merge larger documents while maintaining constant-ish memory usage.

As per the feedback from @Sharcoux (WRT to compatbility), perhaps the Web streams api would be a better a more compatible approach which would allow the stream to be redirected to a file in node, or a blob or something in the browser.

I'd be happy to assist, but might be more effective with a pointer as to what portion of the codebase to focus on / high level approach.

@Sharcoux
Copy link
Collaborator

Sharcoux commented Apr 8, 2025

It would not. We read the pdf before parsing it. Then you can write it. Streaming won't help much as long as we need the full data to do the parsing.

@Sharcoux
Copy link
Collaborator

Sharcoux commented Apr 8, 2025

But that PR might still improve memory perfs as highlighted by @YoungSeok-Choi

I agree with the use of web stream api

@cjam
Copy link

cjam commented Apr 8, 2025

@Sharcoux thanks for the quick reply, is there any issue in the backlog for streaming support? I looked in this repo but didn't find anything, I think the original repo had a few things in there backlog / roadmap. Would be great to land have a discussion about approach / feasability.

@Sharcoux
Copy link
Collaborator

It's unlikely that we'll work on this, but feel free to open a PR and start discussing stuff. This is a community package. We'll just monitor the PRs and releases, and try to help when we can.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants