The PDF Reader API is a Spring Boot application that lets users upload and parse PDF documents, extract their text, and process the extracted text with a large language model (LLM). It supports both standard and password-protected PDFs and uses OpenAI's GPT-4 Turbo for data extraction.
- 📄 Upload and parse PDF files
- 🔑 Secure PDF parsing with dynamic password generation
- 🧠 Extract structured data using GPT-4 Turbo
- 📂 Parse PDFs stored on disk
- Java 17
- Spring Boot
- Apache PDFBox (for PDF text extraction)
- OpenAI API (for processing extracted text)
- Maven (for dependency management)
git clone <repository-url>
cd pdf-reader-api
Create an application.yml file inside src/main/resources/ with the following content:
spring:
  servlet:
    multipart:
      enabled: true
      max-file-size: 10MB
      max-request-size: 20MB
theokanning:
  openai:
    api-key: YOUR_OPENAI_API_KEY
Replace YOUR_OPENAI_API_KEY with a valid OpenAI API key.
mvn clean install
mvn spring-boot:run
The application will start at http://localhost:8080
Endpoint:
POST /api/pdf/parse
Request:
- file (Multipart File) → Upload a PDF file
Response:
{
  "name": "Krishna Kumar",
  "email": "[email protected]",
  "opening_balance": 1000,
  "closing_balance": 1500
}
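
As a rough illustration of calling this endpoint from Java, the sketch below builds the multipart request by hand with the JDK's `java.net.http` API. The part name `file` and the path come from the description above; the class name, the boundary string, and the `application/pdf` part content type are assumptions made for the example, not something the project prescribes.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper; not part of the project itself.
class ParseRequestSketch {

    // Builds a multipart/form-data body with a single part named "file".
    static byte[] multipartBody(String boundary, String filename, byte[] pdfBytes) {
        String header = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\"" + filename + "\"\r\n"
                + "Content-Type: application/pdf\r\n\r\n";
        String footer = "\r\n--" + boundary + "--\r\n";

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.writeBytes(header.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(pdfBytes);
        out.writeBytes(footer.getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // Builds (but does not send) the POST request for /api/pdf/parse.
    static HttpRequest parseRequest(String baseUrl, String boundary, String filename, byte[] pdfBytes) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/pdf/parse"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(multipartBody(boundary, filename, pdfBytes)))
                .build();
    }
}
```

With the application running, the request could be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, which should return the JSON body shown above as a string.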
Endpoint:
POST /api/pdf/parse-secure
Request:
- file (Multipart File)
- firstname (String)
- dob (String, format: YYYY-MM-DD)
Response:
{
  "name": "Krishna Kumar",
  "email": "[email protected]",
  "opening_balance": 1000,
  "closing_balance": 1500
}
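
This README does not spell out how the password is generated from `firstname` and `dob`. The sketch below shows one purely hypothetical scheme (first four letters of the first name in upper case, followed by day and month of birth) to illustrate the idea; the class name, method name, and the scheme itself are assumptions, and the project's actual rule may differ. It also shows how the `YYYY-MM-DD` format could be validated with `LocalDate.parse`.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical illustration only; the real password rule is not documented here.
class PasswordSketch {

    // ASSUMED scheme: up to four leading letters of the first name (upper case) + DDMM of the birth date.
    static String derivePassword(String firstname, String dob) {
        // LocalDate.parse expects ISO-8601 (YYYY-MM-DD) and throws
        // DateTimeParseException for any other format.
        LocalDate date = LocalDate.parse(dob);

        String name = firstname.trim().toUpperCase(Locale.ROOT);
        String prefix = name.substring(0, Math.min(4, name.length()));

        return prefix + date.format(DateTimeFormatter.ofPattern("ddMM"));
    }
}
```

Under this assumed scheme, `derivePassword("Krishna", "1990-05-17")` yields `KRIS1705`. The resulting string could then be passed to PDFBox when opening the document, for example via `PDDocument.load(file, password)` in PDFBox 2.x or `Loader.loadPDF(file, password)` in 3.x.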
Endpoint:
GET /api/pdf/parse-from-disk
Request:
- filename (String) → Name of the PDF file stored in src/main/resources/pdf/
Response:
{
  "name": "Krishna Kumar",
  "email": "[email protected]",
  "opening_balance": 1000,
  "closing_balance": 1500
}
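
A minimal sketch of building this request from Java follows. It assumes `filename` is passed as a query parameter, which is typical for a GET endpoint but not stated explicitly above; the class and method names are made up for the example.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper; not part of the project itself.
class ParseFromDiskSketch {

    // Builds (but does not send) the GET request, URL-encoding the file name
    // so that spaces and other special characters survive the query string.
    static HttpRequest parseFromDiskRequest(String baseUrl, String filename) {
        String encoded = URLEncoder.encode(filename, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(
                        URI.create(baseUrl + "/api/pdf/parse-from-disk?filename=" + encoded))
                .GET()
                .build();
    }
}
```

Encoding the name on the client side keeps a file such as `my statement.pdf` from producing an invalid URI.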
- Needs slight fix
- Shivam - Initial Development
This project is licensed under the MIT License.