A Quarkus CLI application that processes handwritten PDF forms using LLM analysis to extract structured data.
This application automates the digitization of handwritten forms by:
- Extracting images from PDF documents
- Analyzing the handwritten content with Google Vertex AI
- Storing the extracted structured data in a PostgreSQL database

Prerequisites:
- Java 21+
- Maven 3.8+
- PostgreSQL database
- Google Cloud project with Vertex AI API enabled

Configuration via environment variables:
export DB_URL="jdbc:postgresql://localhost:5432/lerne_lama_db"
export DB_USER="lerne-lama"
export DB_PASSWORD="your-secure-password"
export VERTEX_AI_PROJECT_ID="your-gcp-project"
export VERTEX_AI_LOCATION="us-east5"
export VERTEX_AI_MODEL="llama-4-scout-17b-16e-instruct-maas"
export VERTEX_AI_CREDENTIALS_PATH="/path/to/service-account-key.json"
export VERTEX_AI_PUBLISHER="meta"

Important: Never commit service account keys or credentials to version control.
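
A missing variable is easier to diagnose before the application starts than after a failed database or Vertex AI call. The following is a minimal sketch of such a pre-flight check. `EnvCheck` and its `missing` method are hypothetical names, not part of this project, and the sketch assumes that all eight variables listed above are required, which this README does not confirm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the application: reports which of the
// environment variables listed above are missing or blank.
final class EnvCheck {

    // Names copied from the export lines above. Treating all of them as
    // required is an assumption; some may have defaults in application.yml.
    static final List<String> REQUIRED = List.of(
            "DB_URL", "DB_USER", "DB_PASSWORD",
            "VERTEX_AI_PROJECT_ID", "VERTEX_AI_LOCATION", "VERTEX_AI_MODEL",
            "VERTEX_AI_CREDENTIALS_PATH", "VERTEX_AI_PUBLISHER");

    // Returns the required variable names that are absent or blank in env.
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> missing = missing(System.getenv());
        if (!missing.isEmpty()) {
            // Print only the names, never the values, to avoid leaking secrets.
            System.err.println("Missing environment variables: " + String.join(", ", missing));
            System.exit(1);
        }
        System.out.println("All required environment variables are set.");
    }
}
```

Saved as `EnvCheck.java`, the sketch can be run directly with `java EnvCheck.java`. It exits with status 1 and lists the variable names when any are unset.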
# Build the application
./mvnw clean package
# Run in development mode
./mvnw quarkus:dev
# Process a PDF file
java -jar target/quarkus-app/quarkus-run.jar -f document.pdf

# Basic usage
java -jar target/quarkus-app/quarkus-run.jar -f path/to/form.pdf
# With verbose output
java -jar target/quarkus-app/quarkus-run.jar -f path/to/form.pdf -v
# Show help
java -jar target/quarkus-app/quarkus-run.jar --help

Key configuration options in application.yml:
- Database: PostgreSQL connection settings
- Vertex AI: Model configuration and authentication
- Prompts: System and user prompts for LLM analysis

Built with a hexagonal architecture:
- Domain: Core business models and ports
- Application: Business logic and services
- Infrastructure: External adapters (database, LLM, CLI)
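
To make the three layers concrete, here is a minimal sketch of how a port, an application service, and an adapter might relate. Every type name in it (`FormData`, `FormAnalyzer`, `FormRepository`, `FormProcessingService`, and the stub adapters) is hypothetical and chosen for illustration; the project's actual classes may be organized differently. The stubs stand in for the real Vertex AI and PostgreSQL adapters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: all names here are hypothetical, not the project's own.
final class HexagonalSketch {

    // Domain: a core model plus the ports the business logic depends on.
    record FormData(String sourceFile, Map<String, String> fields) {}

    interface FormAnalyzer {      // outbound port for LLM analysis
        Map<String, String> analyze(byte[] pageImage);
    }

    interface FormRepository {    // outbound port for persistence
        void save(FormData data);
    }

    // Application: business logic that knows only the ports.
    static final class FormProcessingService {
        private final FormAnalyzer analyzer;
        private final FormRepository repository;

        FormProcessingService(FormAnalyzer analyzer, FormRepository repository) {
            this.analyzer = analyzer;
            this.repository = repository;
        }

        FormData process(String sourceFile, byte[] pageImage) {
            FormData data = new FormData(sourceFile, analyzer.analyze(pageImage));
            repository.save(data);
            return data;
        }
    }

    // Infrastructure: adapters that implement the ports. These stubs replace
    // the real LLM and database adapters so the sketch needs no services.
    static final class StubAnalyzer implements FormAnalyzer {
        @Override
        public Map<String, String> analyze(byte[] pageImage) {
            return Map.of("bytes", String.valueOf(pageImage.length));
        }
    }

    static final class InMemoryRepository implements FormRepository {
        final List<FormData> saved = new ArrayList<>();

        @Override
        public void save(FormData data) {
            saved.add(data);
        }
    }

    public static void main(String[] args) {
        InMemoryRepository repository = new InMemoryRepository();
        FormProcessingService service =
                new FormProcessingService(new StubAnalyzer(), repository);
        FormData result = service.process("form.pdf", new byte[] {1, 2, 3});
        System.out.println(result.fields() + ", saved records: " + repository.saved.size());
    }
}
```

Because the service depends only on the two interfaces, the stub adapters can be swapped for real ones (or for test doubles) without touching the business logic, which is the main benefit this layering aims for.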
# Run all tests
./mvnw test
# Run with coverage
./mvnw test jacoco:report

The application uses:
- Quarkus framework
- PDFBox for PDF processing
- Google Vertex AI for document analysis
- PostgreSQL with Hibernate ORM
- PicoCLI for the command-line interface