cosmez/dotnetpdf

dotnet.pdf

dotnetpdf is a .NET-based tool for PDF processing, inspired by PDFtk Server.

Architecture (v2.0+)

dotnetpdf has been refactored into a modular architecture:

  • CLI Application (dotnet.pdf) - Command-line interface for PDF operations
  • Core Library (DotNet.Pdf.Core) - Reusable PDF processing services
    • Service-oriented architecture with dedicated classes for each PDF operation
    • Full dependency injection support
    • Comprehensive logging and error handling
    • Thread-safe operations

Core Services

  • PdfTextExtractionService - Extract text from PDF documents
  • PdfBookmarkService - Process PDF bookmarks and outlines
  • PdfInformationService - Extract document metadata
  • PdfAttachmentService - Handle PDF attachments
  • PdfPageObjectService - Analyze page objects
  • PdfFormFieldService - Inspect form fields
  • PdfWatermarkService - Add watermarks to documents

Features

dotnetpdf commands

  • split: Split a single PDF into multiple files.
  • merge: Combine multiple PDFs into one file.
  • convert: Convert PDF pages to images.
  • imagetopdf: Convert images to a PDF file.
  • text: Extract text from a PDF.
  • bookmarks: Extract PDF bookmarks (outlines).
  • info: Retrieve PDF metadata.
  • rotate: Rotate PDF pages by 90, 180, or 270 degrees.
  • remove: Remove specific pages from a PDF.
  • insert: Insert blank pages into a PDF at specified positions.
  • reorder: Reorder PDF pages according to a specified sequence.
  • list-attachments: List PDF attachments with metadata information.
  • extract-attachments: Extract PDF attachments to disk.
  • list-objects: List all graphical objects on a given page.
  • list-forms: List all interactive form fields in a document.
  • watermark: Add a text or image watermark to all pages of a document.

Installation

Install dotnetpdf as a .NET global tool:

dotnet tool install --global Emm.DotnetPdf
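
After installation, the `dotnetpdf` command should be available on your PATH. The sketch below is one way to check that from a POSIX shell. It assumes the default global-tool location on Linux and macOS (`$HOME/.dotnet/tools`), which may differ on your setup; the `status` and `tools_dir` variables are illustrative only.

```shell
# Default location for .NET global tools on Linux/macOS (assumption: not customized).
tools_dir="$HOME/.dotnet/tools"

if command -v dotnetpdf >/dev/null 2>&1; then
  status="found"
  echo "dotnetpdf is available at: $(command -v dotnetpdf)"
else
  status="missing"
  echo "dotnetpdf is not on PATH; if it is installed, try adding $tools_dir to PATH"
fi
```

To upgrade later, `dotnet tool update --global Emm.DotnetPdf` takes the same package ID as the install command.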

Usage

# Split a PDF using autogenerated names
dotnetpdf split --input <input.pdf>

# Split a PDF specifying output name
dotnetpdf split --input <input.pdf> -names '{page}_{original}_pdf'
# Split a PDF using bookmarks as output names, specifying range
dotnetpdf split --input <input.pdf> --use-bookmarks --range 1-5

# Split a PDF using a text file that lists the output filenames, one per PDF page
dotnetpdf split --input <input.pdf> --output-script <script.txt>

# Merge PDFs
dotnetpdf merge --output <output> --input <input1> --input <input2> 

# Merge all PDFs in a directory
dotnetpdf merge --output <output> --input-directory <directory> --recursive false

# Convert PDF to images
dotnetpdf convert --input <input.pdf> --output <directory> --range 1-5 --encoder .png --dpi 100

# Convert image to PDF
dotnetpdf imagetopdf --input <input.png> --output <output.pdf>

# Print PDF text to stdout
dotnetpdf text --input <input.pdf> --format text

# Print PDF text to stdout as JSON
dotnetpdf text --input <input.pdf> --format json

# Print PDF Bookmarks to stdout
dotnetpdf bookmarks --input <input.pdf>

# Extract PDF Information
dotnetpdf info --input <input.pdf> --format json

# Rotate PDF pages (90, 180, or 270 degrees)
dotnetpdf rotate --input <input.pdf> --output <rotated.pdf> --rotation 180

# Rotate specific pages only
dotnetpdf rotate --input <input.pdf> --output <rotated.pdf> --range 1-3 --rotation 90

# Remove specific pages from PDF
dotnetpdf remove --input <input.pdf> --output <cleaned.pdf> --pages 2,4,6

# Insert blank pages at specified positions
dotnetpdf insert --input <input.pdf> --output <expanded.pdf> --positions 1:2,5:1

# Insert blank pages with custom dimensions in points (595 x 842 is approximately A4)
dotnetpdf insert --input <input.pdf> --output <expanded.pdf> --positions 3:1 --width 595 --height 842

# Reorder PDF pages
dotnetpdf reorder --input <input.pdf> --output <reordered.pdf> --order 3,1,2,4

# List PDF attachments
dotnetpdf list-attachments --input <input.pdf>

# List PDF attachments in JSON format
dotnetpdf list-attachments --input <input.pdf> --format json

# Extract all PDF attachments
dotnetpdf extract-attachments --input <input.pdf> --output <output-directory>

# Extract specific attachment by index
dotnetpdf extract-attachments --input <input.pdf> --output <output-directory> --index 0

# List all objects on page 1
dotnetpdf list-objects --input <input.pdf> --page 1

# List all form fields in a document as JSON
dotnetpdf list-forms --input <input.pdf> --format json

# Add a text watermark
dotnetpdf watermark --input <input.pdf> --output <watermarked.pdf> --text "CONFIDENTIAL"

# Add an image watermark with custom options
dotnetpdf watermark --input <input.pdf> --output <watermarked.pdf> --image <logo.png> --scale 0.5 --opacity 128

# Print Help
dotnetpdf --help
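
The `--output-script` option shown above reads output filenames from a text file, one per page. Below is a minimal sketch of preparing such a file for a three-page document. The `chapter-NN.pdf` names, the file name `script.txt`, and the assumption that line N names the output for page N are illustrative and not confirmed by the tool's documentation. The split itself runs only if `dotnetpdf` and a sample `input.pdf` are present.

```shell
# Hypothetical script file: one output filename per page of a 3-page PDF.
script_file="script.txt"
: > "$script_file"   # create or truncate the file

for page in 1 2 3; do
  printf 'chapter-%02d.pdf\n' "$page" >> "$script_file"
done

# Show the generated filenames for review before splitting.
cat "$script_file"

# Run the split only when the tool and a sample input are available.
if command -v dotnetpdf >/dev/null 2>&1 && [ -f input.pdf ]; then
  dotnetpdf split --input input.pdf --output-script "$script_file"
fi
```

Generating the file with a loop keeps the names consistent, and reviewing it before the split makes it easy to spot a mismatch between the number of lines and the number of pages.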

Using DotNet.Pdf.Core Library

The core functionality is available as a reusable library:

using DotNet.Pdf.Core;
using Microsoft.Extensions.Logging;

// Setup logging
var loggerFactory = LoggerFactory.Create(builder => builder.AddConsole());

// Create PDF processor
var pdfProcessor = new PdfProcessor(loggerFactory);

// Extract text
var texts = pdfProcessor.GetPdfText("document.pdf", pageRange: null, password: "");
foreach (var pageText in texts)
{
    Console.WriteLine($"Page {pageText.Page}: {pageText.Text}");
}

// Get document information
var info = pdfProcessor.GetPdfInformation("document.pdf", password: "");
Console.WriteLine($"Title: {info.Title}, Pages: {info.Pages}");

// Extract bookmarks
var bookmarks = pdfProcessor.GetPdfBookmarks("document.pdf", password: "");
foreach (var bookmark in bookmarks)
{
    Console.WriteLine($"Level {bookmark.Level}: {bookmark.Title}");
}

Dependency Injection Setup

// In ASP.NET Core or Generic Host
services.AddSingleton<PdfProcessor>(provider =>
{
    var loggerFactory = provider.GetRequiredService<ILoggerFactory>();
    return new PdfProcessor(loggerFactory);
});

Migration from v1.x

See MIGRATION.md for a detailed guide to migrating from the old static API to the new service-oriented architecture.
