Skip to content

Imshivam1/pdf-reader-api

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 

Repository files navigation

PDF Reader API

Overview

The PDF Reader API is a Spring Boot-based application that allows users to upload and parse PDF documents, extract text, and process the extracted data using an LLM (Language Model). This API supports standard PDFs and password-protected PDFs, and it integrates OpenAI's GPT-4 Turbo for data extraction.

Features

  • 📄 Upload and parse PDF files
  • 🔑 Secure PDF parsing with dynamic password generation
  • 🧠 Extract structured data using GPT-4 Turbo
  • 📂 Parse PDFs stored on disk

Technologies Used

  • Java 17
  • Spring Boot
  • Apache PDFBox (for PDF text extraction)
  • OpenAI API (for processing extracted text)
  • Maven (for dependency management)

Installation and Setup

1️⃣ Clone the Repository

 git clone <repository-url>
 cd pdf-reader-api

2️⃣ Configure Application Properties

Create an application.yml file inside src/main/resources/ with the following content:

spring:
  servlet:
    multipart:
      enabled: true
      max-file-size: 10MB
      max-request-size: 20MB

theokanning:
  openai:
    api-key: YOUR_OPENAI_API_KEY

Replace YOUR_OPENAI_API_KEY with a valid OpenAI API key.

3️⃣ Build and Run the Application

mvn clean install
mvn spring-boot:run

The application will start at http://localhost:8080


API Endpoints

🟢 Upload & Parse PDF

Endpoint:

POST /api/pdf/parse

Request:

  • file (Multipart File) → Upload a PDF file

Response:

{
  "name": "Krishna Kumar",
  "email": "[email protected]",
  "opening_balance": 1000,
  "closing_balance": 1500
}

🔐 Secure PDF Parsing (Password Required)

Endpoint:

POST /api/pdf/parse-secure

Request:

  • file (Multipart File)
  • firstname (String)
  • dob (String, format: YYYY-MM-DD)

Response:

{
  "name": "Krishna Kumar",
  "email": "[email protected]",
  "opening_balance": 1000,
  "closing_balance": 1500
}

📂 Parse PDF from Disk

Endpoint:

GET /api/pdf/parse-from-disk

Request:

  • filename (String) → Name of the PDF file stored in src/main/resources/pdf/

Response:

{
  "name": "Krishna Kumar",
  "email": "[email protected]",
  "opening_balance": 1000,
  "closing_balance": 1500
}

Possible Issues and Fixes

It is giving the data in timeline from the pdf file but still showing error in postman

  • Needs slight fix

Contributors

  • Shivam - Initial Development

License

This project is licensed under the MIT License.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages