Exception reading PDF files in PagePdfDocumentReader

**Bug description**
Parsing a publicly available PDF file (https://www.novo-pi.com/ozempic.pdf) with `PagePdfDocumentReader` throws a `StringIndexOutOfBoundsException`:
```

Failed to ingest PDF file ozempic-pi.pdf
java.lang.RuntimeException: Failed to ingest PDF file ozempic-pi.pdf
	at com.vodori.platform.ai.advisor.service.DocumentIngestionService.ingestSupportingDocuments(DocumentIngestionService.java:103)
	at com.vodori.platform.ai.advisor.DocumentIngestionTest.setUp(DocumentIngestionTest.java:56)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
Caused by: java.lang.StringIndexOutOfBoundsException: Index 0 out of bounds for length 0
	at java.base/jdk.internal.util.Preconditions$1.apply(Preconditions.java:55)
	at java.base/jdk.internal.util.Preconditions$1.apply(Preconditions.java:52)
	at java.base/jdk.internal.util.Preconditions$4.apply(Preconditions.java:213)
	at java.base/jdk.internal.util.Preconditions$4.apply(Preconditions.java:210)
	at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:98)
	at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:106)
	at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:302)
	at java.base/java.lang.String.checkIndex(String.java:4832)
	at java.base/java.lang.StringLatin1.charAt(StringLatin1.java:46)
	at java.base/java.lang.String.charAt(String.java:1555)
	at org.springframework.ai.reader.pdf.layout.CharacterFactory.getCharacterFromTextPosition(CharacterFactory.java:97)
	at org.springframework.ai.reader.pdf.layout.CharacterFactory.createCharacterFromTextPosition(CharacterFactory.java:46)
	at org.springframework.ai.reader.pdf.layout.ForkPDFLayoutTextStripper.writeLine(ForkPDFLayoutTextStripper.java:114)
	at org.springframework.ai.reader.pdf.layout.ForkPDFLayoutTextStripper.writeTextPositionList(ForkPDFLayoutTextStripper.java:148)
	at org.springframework.ai.reader.pdf.layout.ForkPDFLayoutTextStripper.iterateThroughTextList(ForkPDFLayoutTextStripper.java:136)
	at org.springframework.ai.reader.pdf.layout.ForkPDFLayoutTextStripper.writePage(ForkPDFLayoutTextStripper.java:85)
	at org.springframework.ai.reader.pdf.layout.PDFLayoutTextStripperByArea.writePage(PDFLayoutTextStripperByArea.java:150)
	at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:380)
	at org.springframework.ai.reader.pdf.layout.ForkPDFLayoutTextStripper.processPage(ForkPDFLayoutTextStripper.java:68)
	at org.springframework.ai.reader.pdf.layout.PDFLayoutTextStripperByArea.extractRegions(PDFLayoutTextStripperByArea.java:123)
	at org.springframework.ai.reader.pdf.PagePdfDocumentReader.get(PagePdfDocumentReader.java:141)
	at org.springframework.ai.reader.pdf.PagePdfDocumentReader.get(PagePdfDocumentReader.java:48)
	at org.springframework.ai.document.DocumentReader.read(DocumentReader.java:25)
	at com.vodori.platform.ai.advisor.service.DocumentIngestionService.ingestSupportingDocuments(DocumentIngestionService.java:79)
	... 3 more
```
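The innermost application frame in the trace is `CharacterFactory.getCharacterFromTextPosition`, which ends up calling `String.charAt` with index 0 on a string of length 0. I have not confirmed where the empty string comes from; my assumption is that the PDF contains a glyph whose extracted text is empty. The sketch below only reproduces that failure mode with plain Java and shows one possible guard. `EmptyGlyphGuard` and `firstCharOrSpace` are hypothetical names for illustration, not Spring AI code.

```java
class EmptyGlyphGuard {

    // Hypothetical helper (not part of Spring AI): returns the first character
    // of the extracted glyph text, or a space when the text is null or empty.
    public static char firstCharOrSpace(String unicode) {
        if (unicode == null || unicode.isEmpty()) {
            return ' ';
        }
        return unicode.charAt(0);
    }

    public static void main(String[] args) {
        // Stand-in for a glyph whose extracted text is empty (assumption).
        String emptyGlyphText = "";

        try {
            // Unguarded access, which is what the stack trace suggests happens.
            emptyGlyphText.charAt(0);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("Unguarded: " + e.getMessage());
        }

        // Guarded access falls back to a space instead of throwing.
        System.out.println("Guarded: '" + firstCharOrSpace(emptyGlyphText) + "'");
    }
}
```

Whether a space is the right fallback, or whether such glyphs should be skipped entirely, depends on how the layout stripper uses the character, so treat the fallback value as a placeholder.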

**Environment**
* Spring AI 1.0.M8
* Spring Boot 3.4.5
* macOS
* Java 21

**Steps to reproduce**
Load the PDF linked above as a `Resource` and run the following, which I pulled from the Spring AI docs:
```
List<Document> pages = new PagePdfDocumentReader(resource,
        PdfDocumentReaderConfig.builder()
            .withPageTopMargin(0)
            .withPageExtractedTextFormatter(
                ExtractedTextFormatter.builder().withNumberOfTopTextLinesToDelete(0).build())
            .withPagesPerDocument(1).build()).read();
```

**Expected behavior**
A list of `Document` objects, one per page, with no exception thrown.



Exception reading PDF files in PagePdfDocumentReader #3054
