PDF Utils

A collection of lightweight Rust crates for parsing PDF documents, extracting their text, and verifying their digital signatures. Designed specifically for zero-knowledge environments and constrained systems.

🎯 Design Philosophy

This repository provides minimal, dependency-light Rust crates for working with PDFs in zero-knowledge friendly environments. All core logic avoids heavy dependencies like lopdf, flate2, and openssl, making it suitable for:

Zero-knowledge virtual machines (e.g., SP1, Risc0)
WASM targets for web applications
Constrained, auditable environments requiring minimal attack surface
Blockchain applications needing PDF verification

📦 Crates Overview

`extractor` - PDF Text Extraction

Extracts plain text from PDF files with support for:

Common font encodings (ToUnicode, Differences, built-in maps)
CID fonts and glyph name mapping
Minimal PDF parsing with no external PDF libraries
Support for StandardEncoding, WinAnsiEncoding, MacRomanEncoding, and PDFDocEncoding
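
To illustrate how a Differences array overrides a base encoding, here is a minimal sketch. It is not the extractor crate's implementation: `SimpleEncoding`, `glyph_name_to_char`, and the three-entry glyph table are hypothetical, and the base encoding is simplified to ASCII passthrough rather than a full WinAnsiEncoding or StandardEncoding table.

```rust
use std::collections::HashMap;

/// Hypothetical, tiny glyph-name table; a real one would cover far more names.
fn glyph_name_to_char(name: &str) -> Option<char> {
    match name {
        "bullet" => Some('\u{2022}'),
        "Euro" => Some('\u{20AC}'),
        "eacute" => Some('\u{00E9}'),
        _ => None,
    }
}

/// A single-byte font encoding: a base mapping plus per-code overrides.
pub struct SimpleEncoding {
    /// Overrides from a `/Differences` array: character code -> glyph name.
    differences: HashMap<u8, String>,
}

impl SimpleEncoding {
    /// Builds the override table from already-parsed `(start_code, glyph_names)`
    /// runs, mirroring the `/Differences [code /name /name ... code /name]` layout.
    pub fn from_differences(runs: &[(u8, &[&str])]) -> Self {
        let mut differences = HashMap::new();
        for (start, names) in runs {
            for (i, name) in names.iter().enumerate() {
                // Each name in a run applies to the next consecutive code.
                differences.insert(start.wrapping_add(i as u8), name.to_string());
            }
        }
        Self { differences }
    }

    /// Decodes a byte string, preferring overrides over the base encoding.
    pub fn decode(&self, bytes: &[u8]) -> String {
        bytes.iter().map(|&b| self.decode_byte(b)).collect()
    }

    fn decode_byte(&self, b: u8) -> char {
        if let Some(name) = self.differences.get(&b) {
            return glyph_name_to_char(name).unwrap_or('\u{FFFD}');
        }
        // Simplified base encoding (an assumption of this sketch):
        // ASCII passes through, anything else becomes U+FFFD.
        if b.is_ascii() { b as char } else { '\u{FFFD}' }
    }
}
```

A ToUnicode CMap, when present, would normally take priority over this kind of glyph-name lookup; the sketch covers only the Differences path.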

`signature-validator` - Digital Signature Verification

Verifies embedded digital signatures in PDFs using:

Raw PKCS#7/CMS parsing
Rust ASN.1 decoding
RSA signature verification with SHA-1, SHA-256, SHA-384, and SHA-512 digests
Content integrity and signature authenticity checks
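
The content integrity check rests on the signature's ByteRange entry, which names the two spans of the file that the signature covers; the gap between them holds the signature value itself. A minimal sketch of collecting those bytes follows. The function name `signed_bytes` is hypothetical and this is not necessarily how signature-validator structures the step; hashing the result and comparing it against the digest stored in the CMS structure is left out.

```rust
/// Returns the bytes covered by a signature, given the four integers of a
/// `/ByteRange [off1 len1 off2 len2]` entry.
/// Returns `None` if either range falls outside the file.
pub fn signed_bytes(pdf: &[u8], byte_range: [usize; 4]) -> Option<Vec<u8>> {
    let [off1, len1, off2, len2] = byte_range;

    // `checked_add` guards against overflow from hostile offsets;
    // `get` guards against ranges that run past the end of the file.
    let first = pdf.get(off1..off1.checked_add(len1)?)?;
    let second = pdf.get(off2..off2.checked_add(len2)?)?;

    // The signed content is the concatenation of both spans.
    let mut out = Vec::with_capacity(first.len() + second.len());
    out.extend_from_slice(first);
    out.extend_from_slice(second);
    Some(out)
}
```

Returning `None` instead of panicking on a bad range keeps the function usable on untrusted input, which matters when it runs inside a proof or a browser.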

`core` - Combined PDF Verification

Combines extractor and signature-validator to:

Validate that specific text appears in a signed PDF
Check its exact byte offset on a given page
Return boolean results for use in proofs or UIs
Provide unified interface for PDF verification
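
The offset check can be pictured as a prefix comparison at a fixed position in the extracted page text. The sketch below assumes the offset counts bytes of the extracted text; `text_at_offset` is a hypothetical name, not a function exported by the core crate.

```rust
/// Returns true if `needle` starts exactly at byte `offset` of `page_text`.
/// An offset past the end of the text yields false rather than a panic.
pub fn text_at_offset(page_text: &str, needle: &str, offset: usize) -> bool {
    page_text
        .as_bytes()
        .get(offset..)
        .map_or(false, |rest| rest.starts_with(needle.as_bytes()))
}
```

Comparing bytes rather than slicing the `&str` directly avoids a panic when the offset lands in the middle of a multi-byte UTF-8 character.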

`wasm` - WebAssembly Interface

A thin WebAssembly wrapper around the core crate:

Browser-compatible PDF verification
JavaScript/TypeScript bindings
Base64 PDF input/output support

🚀 Quick Start

Basic Text Extraction

use extractor::extract_text;

let pdf_bytes = std::fs::read("document.pdf")?;
let pages = extract_text(pdf_bytes)?;
println!("Page 1: {}", pages[0]);

Signature Verification

use signature_validator::verify_pdf_signature;

let pdf_bytes = std::fs::read("signed_document.pdf")?;
let is_valid = verify_pdf_signature(&pdf_bytes)?;
println!("Signature valid: {}", is_valid);

Combined Verification

use core::verify_text;

let pdf_bytes = std::fs::read("document.pdf")?;
// Arguments: PDF bytes, page index, expected text, byte offset of the text on that page
let result = verify_text(pdf_bytes, 0, "Sample Text", 100)?;
println!("Text matches at offset 100: {}", result.substring_matches);

🧪 Testing

All crates share the same workspace. Run the public tests with:

cargo test

Some crates have additional private tests that rely on PDF files not included in this repository. To run them, add the private_tests feature:

cargo test --features private_tests
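
For readers unfamiliar with feature-gated tests, the sketch below shows one common way to arrange them. It assumes the crate declares the feature in the `[features]` table of its Cargo.toml (for example `private_tests = []`); the helper `has_pdf_header` and the fixture path are hypothetical and do not come from this repository.

```rust
/// Hypothetical helper, present only to give the tests something to exercise.
pub fn has_pdf_header(bytes: &[u8]) -> bool {
    bytes.starts_with(b"%PDF-")
}

#[cfg(test)]
mod header_tests {
    use super::has_pdf_header;

    // Public test: always compiled under `cargo test`.
    #[test]
    fn accepts_inline_header() {
        assert!(has_pdf_header(b"%PDF-1.7\n"));
    }

    // Private test: compiled only with `cargo test --features private_tests`.
    #[cfg(feature = "private_tests")]
    #[test]
    fn accepts_private_fixture() {
        // Hypothetical path to a PDF that is not checked into the repository.
        let bytes = std::fs::read("private-fixtures/contract.pdf").expect("fixture missing");
        assert!(has_pdf_header(&bytes));
    }
}
```

With this layout a plain `cargo test` never touches the missing files, so the public test suite passes on a fresh clone.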

Run tests for a specific crate:

cargo test -p extractor
cargo test -p signature-validator
cargo test -p core
cargo test -p wasm

📋 Feature Support

Feature	Support
Text Extraction	✅
Font Encoding	✅
Digital Signatures	✅
PKCS#7/CMS	✅
Multi-page Documents	✅
Compressed Streams	✅
Position-based Matching	✅
Combined Verification	✅
WebAssembly	✅
Image Extraction	❌
Form Field Processing	❌
ECDSA Signatures	❌
Certificate Chain Validation	❌
Timestamp Verification	❌
Multiple Signatures	❌
Complex Layout Analysis	❌

🔧 Dependencies

Minimal external dependencies - Only essential crates like miniz_oxide, rsa, sha2
No heavy PDF libraries - Custom lightweight PDF parser
Zero-knowledge friendly - All algorithms compatible with ZK-VMs
WASM compatible - All crates compile to WebAssembly

📚 Documentation

Each crate has its own detailed documentation.

🤝 Contributing

This project is designed for zero-knowledge applications. When contributing:

Keep dependencies minimal
Ensure ZK-VM compatibility
Add tests for new features
Document any breaking changes

📄 License

This project is licensed under the same terms as the parent repository.
