Gepardec/llama-lesen

Lerne dem Llama lesen (Teach the Llama to Read)

A Quarkus CLI application that processes handwritten PDF forms using LLM analysis to extract structured data.

Overview

This application automates the process of digitizing handwritten forms by:

  1. Extracting images from PDF documents
  2. Analyzing handwritten content using Google Vertex AI
  3. Storing the structured data in a PostgreSQL database
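
The three steps above can be pictured as a small pipeline. The following is a minimal sketch, not the project's actual code: the types `PageImage` and `FormData` and the method `process` are hypothetical names chosen for illustration, and the stubs in `main` stand in for PDF extraction, the Vertex AI call, and the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class PipelineSketch {

    /** One page image extracted from a PDF (hypothetical type). */
    record PageImage(int pageNumber, byte[] data) {}

    /** Structured result for one page (hypothetical type). */
    record FormData(int pageNumber, String text) {}

    /** Runs the three steps in order and returns everything that was stored. */
    static List<FormData> process(
            String pdfPath,
            Function<String, List<PageImage>> extractImages, // step 1: PDF -> images
            Function<PageImage, FormData> analyze,           // step 2: image -> structured data
            List<FormData> store) {                          // step 3: stand-in for the database
        for (PageImage image : extractImages.apply(pdfPath)) {
            store.add(analyze.apply(image));
        }
        return store;
    }

    public static void main(String[] args) {
        // Stubs only: no PDF is read and no LLM is called here.
        List<FormData> stored = process(
                "form.pdf",
                path -> List.of(new PageImage(1, new byte[0]), new PageImage(2, new byte[0])),
                image -> new FormData(image.pageNumber(), "stub text for page " + image.pageNumber()),
                new ArrayList<>());
        stored.forEach(System.out::println);
    }
}
```

Passing each step in as a function keeps the flow testable without a PDF library, a cloud account, or a running database.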

Prerequisites

  • Java 21+
  • Maven 3.8+
  • PostgreSQL database
  • Google Cloud project with Vertex AI API enabled

Quick Start

1. Environment Setup

export DB_URL="jdbc:postgresql://localhost:5432/lerne_lama_db"
export DB_USER="lerne-lama"
export DB_PASSWORD="your-secure-password"
export VERTEX_AI_PROJECT_ID="your-gcp-project"
export VERTEX_AI_LOCATION="us-east5"
export VERTEX_AI_MODEL="llama-4-scout-17b-16e-instruct-maas"
export VERTEX_AI_CREDENTIALS_PATH="/path/to/service-account-key.json"
export VERTEX_AI_PUBLISHER="meta"

Important: Never commit service account keys or credentials to version control.
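
Before starting the application, it can help to confirm that the variables are actually set. The sketch below is a hypothetical standalone helper, not part of this repository. It assumes that all eight variables listed above are required, which the README does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for checking the environment; not part of the project. */
public class EnvCheck {

    // Variable names copied from the export statements above.
    static final List<String> REQUIRED = List.of(
            "DB_URL", "DB_USER", "DB_PASSWORD",
            "VERTEX_AI_PROJECT_ID", "VERTEX_AI_LOCATION", "VERTEX_AI_MODEL",
            "VERTEX_AI_CREDENTIALS_PATH", "VERTEX_AI_PUBLISHER");

    /** Returns the names of required variables that are unset or blank. */
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> missing = missing(System.getenv());
        if (missing.isEmpty()) {
            System.out.println("All required variables are set.");
        } else {
            // Only names are printed, never values, so secrets stay out of logs.
            System.out.println("Missing variables: " + String.join(", ", missing));
        }
    }
}
```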

2. Build and Run

# Build the application
./mvnw clean package

# Run in development mode
./mvnw quarkus:dev

# Process a PDF file
java -jar target/quarkus-app/quarkus-run.jar -f document.pdf

Usage

# Basic usage
java -jar target/quarkus-app/quarkus-run.jar -f path/to/form.pdf

# With verbose output
java -jar target/quarkus-app/quarkus-run.jar -f path/to/form.pdf -v

# Show help
java -jar target/quarkus-app/quarkus-run.jar --help

Configuration

Key configuration options in application.yml:

  • Database: PostgreSQL connection settings
  • Vertex AI: Model configuration and authentication
  • Prompts: System and user prompts for LLM analysis

Architecture

The application follows a hexagonal (ports and adapters) architecture:

  • Domain: Core business models and ports
  • Application: Business logic and services
  • Infrastructure: External adapters (database, LLM, CLI)
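
To illustrate how the three layers relate, here is a minimal sketch under stated assumptions. The names `Form`, `FormRepository`, `FormService`, and `InMemoryFormRepository` are hypothetical and do not come from the repository. The key point is that the application layer depends only on a port (an interface), while the adapter that implements it lives in the infrastructure layer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class HexagonalSketch {

    // Domain: a model and a port (all names hypothetical).
    record Form(String id, String content) {}

    interface FormRepository {
        void save(Form form);
        Optional<Form> findById(String id);
    }

    // Application: business logic that knows only the port, not the adapter.
    static class FormService {
        private final FormRepository repository;

        FormService(FormRepository repository) {
            this.repository = repository;
        }

        Form register(String id, String content) {
            Form form = new Form(id, content.strip());
            repository.save(form);
            return form;
        }
    }

    // Infrastructure: an in-memory adapter standing in for a database adapter.
    static class InMemoryFormRepository implements FormRepository {
        private final Map<String, Form> forms = new HashMap<>();

        @Override
        public void save(Form form) {
            forms.put(form.id(), form);
        }

        @Override
        public Optional<Form> findById(String id) {
            return Optional.ofNullable(forms.get(id));
        }
    }

    public static void main(String[] args) {
        FormRepository repository = new InMemoryFormRepository();
        FormService service = new FormService(repository);
        service.register("form-1", "  Jane Doe  ");
        System.out.println(repository.findById("form-1").orElseThrow());
    }
}
```

Because `FormService` receives the port through its constructor, a test can pass the in-memory adapter while production wiring could pass a PostgreSQL-backed one without changing the service.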

Testing

# Run all tests
./mvnw test

# Run with coverage
./mvnw test jacoco:report

Development

The application uses:

  • Quarkus framework
  • PDFBox for PDF processing
  • Google Vertex AI for document analysis
  • PostgreSQL with Hibernate ORM
  • PicoCLI for command-line interface
