PDF Reader API

Overview

The PDF Reader API is a Spring Boot application that lets users upload and parse PDF documents, extract their text, and process the extracted text with an LLM (large language model). It supports both standard and password-protected PDFs, and it uses OpenAI's GPT-4 Turbo for data extraction.

Features

📄 Upload and parse PDF files
🔑 Secure PDF parsing with dynamic password generation
🧠 Extract structured data using GPT-4 Turbo
📂 Parse PDFs stored on disk

Technologies Used

Java 17
Spring Boot
Apache PDFBox (for PDF text extraction)
OpenAI API (for processing extracted text)
Maven (for dependency management)

Installation and Setup

1️⃣ Clone the Repository

 git clone <repository-url>
 cd pdf-reader-api

2️⃣ Configure Application Properties

Create an application.yml file inside src/main/resources/ with the following content:

spring:
  servlet:
    multipart:
      enabled: true
      max-file-size: 10MB
      max-request-size: 20MB

theokanning:
  openai:
    api-key: YOUR_OPENAI_API_KEY

Replace YOUR_OPENAI_API_KEY with a valid OpenAI API key.

3️⃣ Build and Run the Application

mvn clean install
mvn spring-boot:run

The application will start at http://localhost:8080

API Endpoints

🟢 Upload & Parse PDF

Endpoint:

POST /api/pdf/parse

Request:

file (Multipart File) → Upload a PDF file

Response:

{
  "name": "Krishna Kumar",
  "email": "[email protected]",
  "opening_balance": 1000,
  "closing_balance": 1500
}
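For readers who want to call this endpoint without Postman, here is a minimal client sketch using only the JDK's java.net.http package. The part name file and the path /api/pdf/parse come from the description above; the class name PdfUploadClient, its helper methods, and the boundary string are illustrative assumptions, not part of this project.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.io.ByteArrayOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the repository.
final class PdfUploadClient {

    // Builds a multipart/form-data body with a single file part named "file".
    static byte[] multipartBody(String boundary, String fileName, byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String header = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/pdf\r\n\r\n";
        out.writeBytes(header.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(content);
        // Closing delimiter: the boundary followed by two extra dashes.
        out.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // Creates the POST request; baseUrl is assumed to be http://localhost:8080 locally.
    static HttpRequest uploadRequest(String baseUrl, String fileName, byte[] content) {
        String boundary = "PdfReaderBoundary" + System.nanoTime();
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/pdf/parse"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(multipartBody(boundary, fileName, content)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: PdfUploadClient <path-to-pdf>");
            return;
        }
        Path pdf = Path.of(args[0]);
        HttpRequest request = uploadRequest("http://localhost:8080",
                pdf.getFileName().toString(), Files.readAllBytes(pdf));
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Run it with the application started and pass the path of a local PDF as the only argument.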

🔐 Secure PDF Parsing (Password Required)

Endpoint:

POST /api/pdf/parse-secure

Request:

file (Multipart File)
firstname (String)
dob (String, format: YYYY-MM-DD)

Response:

{
  "name": "Krishna Kumar",
  "email": "[email protected]",
  "opening_balance": 1000,
  "closing_balance": 1500
}
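This README does not spell out how the password is generated from firstname and dob. Purely as an illustration, the sketch below assumes a rule of the kind often used for statement PDFs: the first four letters of the first name in upper case, followed by the day and month of birth as DDMM. The rule, the class name PdfPasswordSketch, and the method derivePassword are all assumptions; check the service code for the actual scheme.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical illustration; the real password rule may differ.
final class PdfPasswordSketch {

    private static final DateTimeFormatter DAY_MONTH = DateTimeFormatter.ofPattern("ddMM");

    // Assumed rule: first four letters of the first name (upper case) + DDMM of the birth date.
    static String derivePassword(String firstname, String dob) {
        String name = firstname.trim().toUpperCase(Locale.ROOT);
        String prefix = name.substring(0, Math.min(4, name.length()));
        // LocalDate.parse expects ISO-8601 (YYYY-MM-DD) and throws DateTimeParseException otherwise.
        LocalDate date = LocalDate.parse(dob);
        return prefix + date.format(DAY_MONTH);
    }

    public static void main(String[] args) {
        System.out.println(derivePassword("Krishna", "1990-08-15")); // prints KRIS1508
    }
}
```

Parsing dob with LocalDate rejects malformed dates early, before any attempt is made to open the PDF.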

📂 Parse PDF from Disk

Endpoint:

GET /api/pdf/parse-from-disk

Request:

filename (String) → Name of the PDF file stored in src/main/resources/pdf/

Response:

{
  "name": "Krishna Kumar",
  "email": "[email protected]",
  "opening_balance": 1000,
  "closing_balance": 1500
}
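Because filename is supplied by the caller, it is worth making sure it cannot point outside the PDF directory. The sketch below shows one way to do that with java.nio.file; the class PdfPathResolver and its resolve method are hypothetical and not taken from this project, and the base directory src/main/resources/pdf is the one named above.

```java
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper, not part of the repository.
final class PdfPathResolver {

    // Resolves filename inside baseDir and rejects anything that escapes it.
    static Path resolve(Path baseDir, String filename) {
        Path base = baseDir.toAbsolutePath().normalize();
        Path candidate = base.resolve(filename).normalize();
        // Absolute paths and "../" sequences end up outside base after normalization.
        if (candidate.equals(base) || !candidate.startsWith(base)) {
            throw new IllegalArgumentException("Invalid filename: " + filename);
        }
        if (!filename.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            throw new IllegalArgumentException("Not a PDF file: " + filename);
        }
        return candidate;
    }

    public static void main(String[] args) {
        Path base = Path.of("src/main/resources/pdf");
        System.out.println(resolve(base, "statement.pdf"));
        try {
            resolve(base, "../application.yml");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Normalizing before the startsWith check is what catches traversal attempts such as ../application.yml.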

Possible Issues and Fixes

The data extracted from the PDF file shows up in the timeline, but Postman still reports an error

This is a known issue that still needs a small fix.

Contributors

Shivam - Initial Development

License

This project is licensed under the MIT License.
