Split and delete forms by reecebrowne · Pull Request #6277 · Stirling-Tools/Stirling-PDF

reecebrowne · 2026-04-30T13:51:54Z

Delete orphaned forms when removing pages and maintain forms correctly when splitting

aikido-pr-checks · 2026-04-30T13:52:59Z

-                        log.error("Error closing document", e);
-                    }
+        Set<Integer> keep = new HashSet<>(keepIndices);
+        try (PDDocument doc = pdfDocumentFactory.load(sourceFile)) {


Loading the source PDF (pdfDocumentFactory.load(...)) inside a per-output loop causes repeated expensive PDF parses; reuse a single loaded representation or a lightweight clone instead.

Details

✨ AI Reasoning
The new approach persists the uploaded PDF to disk and then, for each output range, calls the PDF factory to load the source file and then mutates (removePage) the loaded document. Loading a PDDocument from disk (parsing PDF) is an I/O and CPU-heavy operation. When there are many output parts, repeated pdfDocumentFactory.load(...) in a loop scales linearly with the number of parts and duplicates work that could be done once (or batched) by operating on a shared cached representation or by using a lighter-weight copy/cloning strategy. This harms throughput for large splits and high-concurrency requests. The change was introduced in this diff where writeRangeToZip loads the source file per range.

🔧 How do I fix it?
Move constant work outside loops. Use StringBuilder instead of string concatenation in loops. Cache compiled regex patterns. Use hash-based lookups instead of nested loops. Batch database operations instead of N+1 queries.
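The difference the bot describes can be sketched without PDFBox. In this minimal sketch, `LoadOnceSketch`, `parse`, `splitViaReload`, and `splitViaSharedSource` are hypothetical stand-ins rather than code from this PR, a list of strings stands in for a parsed document, and the counter exists only to make the number of parses visible.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; a List<String> stands in for a parsed PDF. */
class LoadOnceSketch {
    /** Counts simulated parses so the two strategies can be compared. */
    static int parseCount = 0;

    /** Pretends to parse a file with the given number of pages (the expensive step). */
    static List<String> parse(int pages) {
        parseCount++;
        List<String> result = new ArrayList<>();
        for (int i = 0; i < pages; i++) {
            result.add("page-" + i);
        }
        return result;
    }

    /** Reload per range: one parse per output part, then drop pages outside the range. */
    static List<List<String>> splitViaReload(int pages, List<int[]> ranges) {
        List<List<String>> parts = new ArrayList<>();
        for (int[] r : ranges) {
            List<String> doc = parse(pages); // parsed again for every range
            // Remove from the back so earlier indices do not shift.
            for (int i = doc.size() - 1; i >= 0; i--) {
                if (i < r[0] || i > r[1]) {
                    doc.remove(i);
                }
            }
            parts.add(doc);
        }
        return parts;
    }

    /** Shared source: parse once, then copy each inclusive range out of the same list. */
    static List<List<String>> splitViaSharedSource(int pages, List<int[]> ranges) {
        List<String> doc = parse(pages); // parsed once, outside the loop
        List<List<String>> parts = new ArrayList<>();
        for (int[] r : ranges) {
            parts.add(new ArrayList<>(doc.subList(r[0], r[1] + 1)));
        }
        return parts;
    }

    public static void main(String[] args) {
        List<int[]> ranges = List.of(new int[] {0, 1}, new int[] {2, 4}, new int[] {5, 5});
        parseCount = 0;
        splitViaReload(6, ranges);
        System.out.println("reload parses: " + parseCount);
        parseCount = 0;
        splitViaSharedSource(6, ranges);
        System.out.println("shared parses: " + parseCount);
    }
}
```

A shared source only works when producing one part does not mutate state that the next part needs; the reload variant pays one parse per part to get a fresh copy each time.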

stirlingbot · 2026-04-30T15:51:57Z

🚀 V2 Auto-Deployment Complete!

Your V2 PR with embedded architecture has been deployed!

🔗 Direct Test URL (non-SSL) http://54.175.155.236:6277

🔐 Secure HTTPS URL: https://6277.ssl.stirlingpdf.cloud

This deployment will be automatically cleaned up when the PR is closed.

🔄 Auto-deployed for approved V2 contributors.

ConnorYoh · 2026-04-30T16:26:08Z

+    private Set<COSDictionary> collectLiveWidgetDictionaries(PDDocument document) {
+        Set<COSDictionary> live = new HashSet<>();
+        int pageCount = document.getNumberOfPages();
+        for (int i = 0; i < pageCount; i++) {


Potential place for a threaded implementation, if you've got a chonky doc, that is. May not be worth it.

I don't think this is worth doing. I used very big docs to test and had no issues, and I don't think the overhead for this is all that significant.
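For context, a minimal sketch of what a collection pass like this might do, assuming it gathers the widget annotations still attached to the remaining pages. The `Page` and `Annotation` types here are hypothetical stand-ins, not PDFBox classes, and the body of the real loop is not shown in the diff hunk above.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; none of these types come from PDFBox or this PR. */
class LiveWidgetSketch {
    /** Stand-in annotation; only those with subtype "Widget" belong to form fields. */
    static final class Annotation {
        final String subtype;

        Annotation(String subtype) {
            this.subtype = subtype;
        }
    }

    /** Stand-in page holding its annotations. */
    static final class Page {
        final List<Annotation> annotations;

        Page(List<Annotation> annotations) {
            this.annotations = annotations;
        }
    }

    /** Single pass over the remaining pages, visiting each annotation once. */
    static Set<Annotation> collectLiveWidgets(List<Page> pages) {
        // Identity-based set: distinct widget objects are tracked separately.
        Set<Annotation> live = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Page page : pages) {
            for (Annotation a : page.annotations) {
                if ("Widget".equals(a.subtype)) {
                    live.add(a);
                }
            }
        }
        return live;
    }
}
```

Each annotation is visited once, so the work grows linearly with the number of annotations on the remaining pages.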

ConnorYoh · 2026-04-30T16:28:25Z

+        return live;
+    }
+
+    private List<PDField> pruneFieldList(List<PDField> fields, Set<COSDictionary> liveWidgets) {


Fancy recursion! What happens if the number of fields gets really big?

It depends on tree depth, not field count. This matches how PDFBox does it; if this fails due to tree depth, then we have bigger problems.
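The point about depth versus count can be sketched with a small recursive prune. This is an illustration under assumptions, not the PR's `pruneFieldList`: the `Field` type is hypothetical and widgets are represented by string ids instead of `COSDictionary` objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; not the FormUtils implementation. */
class PruneSketch {
    /** Stand-in field node: a group with kids, or a terminal field with widget ids. */
    static final class Field {
        final String name;
        final List<Field> kids = new ArrayList<>();
        final List<String> widgets = new ArrayList<>();

        Field(String name) {
            this.name = name;
        }
    }

    /** Returns the fields that still have at least one live widget beneath them. */
    static List<Field> prune(List<Field> fields, Set<String> liveWidgets) {
        List<Field> kept = new ArrayList<>();
        for (Field f : fields) {
            if (f.kids.isEmpty()) {
                // Terminal field: keep only its live widgets; drop it if none remain.
                f.widgets.retainAll(liveWidgets);
                if (!f.widgets.isEmpty()) {
                    kept.add(f);
                }
            } else {
                // Group: recursion adds one stack frame per nesting level,
                // so stack depth tracks tree depth rather than field count.
                List<Field> survivors = prune(f.kids, liveWidgets);
                f.kids.clear();
                f.kids.addAll(survivors);
                if (!f.kids.isEmpty()) {
                    kept.add(f);
                }
            }
        }
        return kept;
    }
}
```

Siblings are handled by the loop and only nesting triggers a recursive call, so a flat form with tens of thousands of fields uses a single stack frame in this sketch.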

ConnorYoh · 2026-05-01T13:03:39Z

+            group.setPartialName("group");
+
+            PDTextField kept = new PDTextField(acroForm);
+            kept.setPartialName("kept");
+            PDAnnotationWidget keptWidget = new PDAnnotationWidget();
+            keptWidget.setRectangle(new PDRectangle(50, 50, 100, 20));
+            keptWidget.setPage(pageA);
+            kept.setWidgets(List.of(keptWidget));
+            pageA.getAnnotations().add(keptWidget);
+
+            PDTextField dropped = new PDTextField(acroForm);
+            dropped.setPartialName("dropped");


Are these names shown to the user? Does it matter if they are English-only?

These are just for tests. I shouldn't have added this file, though; it can just be moved to FormUtilsTest.

ConnorYoh · 2026-05-01T13:05:51Z

+                            continue;
+                        }
+                        if (hasForm) {
+                            writeRangeViaReload(


You have writeRangeViaReload and writeRangeViaSharedSource, but you also have writeSplitViaReload/sharedSource in another file. Do they share implementation that could reduce duplication?

Great call out, will sort

Delete orphaned forms when removing pages and maintain forms correctly when splitting

255626f

reecebrowne requested review from ConnorYoh, Frooodle, Ludy87 and jbrunton96 as code owners April 30, 2026 13:51

dosubot Bot added the size:XL (This PR changes 500-999 lines ignoring generated files.) and Bugfix (Pull requests that fix bugs) labels Apr 30, 2026

stirlingbot Bot added the Java (Pull requests that update Java code), Back End (Issues related to back-end development), API (API-related issues or pull requests), and Test (Testing-related issues or pull requests) labels and removed the Bugfix (Pull requests that fix bugs) label Apr 30, 2026

aikido-pr-checks Bot reviewed Apr 30, 2026

Comment thread app/common/src/main/java/stirling/software/common/util/FormUtils.java

aikido-pr-checks Bot reviewed Apr 30, 2026

Comment thread app/core/src/main/java/stirling/software/SPDF/controller/api/SplitPDFController.java Outdated

Better comments

aa32a27

aikido-pr-checks Bot reviewed Apr 30, 2026

Comment thread app/core/src/main/java/stirling/software/SPDF/controller/api/RearrangePagesPDFController.java Outdated

improve

605a440

aikido-pr-checks Bot reviewed Apr 30, 2026

Comment thread app/core/src/main/java/stirling/software/SPDF/controller/api/SplitPDFController.java

Merge branch 'main' into bug/splitting-forms

cc57c2b

ConnorYoh reviewed May 1, 2026

reecebrowne wants to merge 4 commits into main from bug/splitting-forms

2 participants
