OfficeIMO.Html.Pdf - HTML and PDF bridge

OfficeIMO.Html.Pdf provides first-party HTML-to-PDF and PDF-to-HTML bridge workflows for OfficeIMO. It does not introduce a browser-grade HTML/CSS renderer or a second PDF parser; it composes the existing OfficeIMO engines.

Install

dotnet add package OfficeIMO.Html.Pdf

Quick start

using OfficeIMO.Html.Pdf;

string html = """
<h1>Quarterly update</h1>
<p>Generated by OfficeIMO.</p>
<table>
  <tr><th>Area</th><th>Status</th></tr>
  <tr><td>PDF bridge</td><td>Green</td></tr>
</table>
""";

html.SaveAsPdf("quarterly-update.pdf", new HtmlPdfSaveOptions {
    Profile = HtmlPdfProfile.Semantic
});

Examples

Use the document profile for print-oriented HTML

using System.IO; // File.ReadAllText; only needed when implicit usings are disabled
using OfficeIMO.Html.Pdf;
using OfficeIMO.Word.Html;
using OfficeIMO.Word.Pdf;

string html = File.ReadAllText("invoice.html");

var options = HtmlPdfSaveOptions.CreateDocumentProfile();
options.WordHtmlOptions = HtmlToWordOptions.CreateOfficeIMOProfile();
options.WordPdfOptions = new PdfSaveOptions {
    Title = "Invoice 2026-001",
    Author = "Evotec",
    IncludePageNumbers = true
};

html.SaveAsPdf("invoice.pdf", options);

Use the semantic profile for Markdown-like HTML

using System.IO; // File.WriteAllBytes; only needed when implicit usings are disabled
using OfficeIMO.Html.Pdf;
using OfficeIMO.Markdown.Pdf;

string html = """
<h1>Release notes</h1>
<p>This release includes <strong>PDF</strong> improvements.</p>
<ul><li>Split</li><li>Merge</li><li>Readback</li></ul>
""";

var options = HtmlPdfSaveOptions.CreateSemanticProfile();
options.MarkdownPdfOptions = new MarkdownPdfSaveOptions {
    VisualTheme = MarkdownPdfVisualTheme.TechnicalDocument()
};

byte[] pdfBytes = html.SaveAsPdf(options);
File.WriteAllBytes("release-notes.pdf", pdfBytes);

Capture HTML import and PDF export diagnostics

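using System;    // Console; only needed when implicit usings are disabled
using System.IO; // File.ReadAllText
using OfficeIMO.Html.Pdf;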

string html = File.ReadAllText("complex.html");
var options = HtmlPdfSaveOptions.CreateDocumentProfile();

var result = html.TrySaveAsPdf("complex.pdf", options);
if (!result.Succeeded) {
    foreach (string diagnostic in result.Diagnostics) {
        Console.WriteLine(diagnostic);
    }
}

foreach (var warning in options.ConversionReport.Warnings) {
    Console.WriteLine($"{warning.Source}: {warning.Message}");
}

Convert PDF to HTML

using OfficeIMO.Html.Pdf;

string html = PdfHtmlConverter.ToHtml("quarterly-update.pdf", new PdfHtmlSaveOptions {
    Profile = PdfHtmlProfile.Semantic
});

PdfHtmlConverter.SaveAsHtml("quarterly-update.pdf", "quarterly-update-review.html", new PdfHtmlSaveOptions {
    Profile = PdfHtmlProfile.PositionedReview,
    IncludeLinkAnnotations = true,
    IncludeFormWidgets = true,
    ImageExportMode = PdfHtmlImageExportMode.EmbeddedDataUri
});
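
The two directions can also be chained as a quick smoke test. The following is a minimal sketch that reuses only the calls shown in the examples above (`SaveAsPdf`, `PdfHtmlConverter.ToHtml`, and the `Semantic` profiles); the file name and the text check are illustrative assumptions. The recovered markup should not be expected to match the input byte for byte, since the bridge rebuilds HTML from the logical PDF model.

```csharp
using System;
using OfficeIMO.Html.Pdf;

string html = "<h1>Round trip</h1><p>Hello from OfficeIMO.</p>";

// HTML -> PDF through the semantic profile.
html.SaveAsPdf("round-trip.pdf", new HtmlPdfSaveOptions {
    Profile = HtmlPdfProfile.Semantic
});

// PDF -> HTML through the semantic profile.
string recovered = PdfHtmlConverter.ToHtml("round-trip.pdf", new PdfHtmlSaveOptions {
    Profile = PdfHtmlProfile.Semantic
});

// Illustrative check only: look for the heading text, not for identical markup.
Console.WriteLine(recovered.Contains("Round trip")
    ? "Heading text found in recovered HTML"
    : "Heading text not found in recovered HTML");
```

A check like this is mainly useful for catching content that is dropped on the way through, not for verifying layout.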

Export only selected pages

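using System;    // Console; only needed when implicit usings are disabled
using System.IO; // File.WriteAllText
using OfficeIMO.Html.Pdf;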
using OfficeIMO.Pdf;

var options = new PdfHtmlSaveOptions {
    Profile = PdfHtmlProfile.PositionedReview,
    PageRanges = new[] { new PdfPageRange(1, 2), new PdfPageRange(5, 5) },
    IncludeLinkAnnotations = true,
    IncludeFormWidgets = true,
    ImageExportMode = PdfHtmlImageExportMode.PlaceholderOnly
};

PdfHtmlConversionResult result = PdfHtmlConverter.ToHtmlResult("packet.pdf", options);
File.WriteAllText("packet-review.html", result.Html);

Console.WriteLine($"Rendered {result.Summary.RenderedPageCount} page(s)");
Console.WriteLine($"Tables: {result.Summary.TableCount}");
Console.WriteLine($"Links: {result.Summary.LinkCount}");

Convert an already-loaded logical PDF

using System.IO; // File.WriteAllText; only needed when implicit usings are disabled
using OfficeIMO.Html.Pdf;
using OfficeIMO.Pdf;

PdfLogicalDocument logical = PdfDocument.Open("statement.pdf").Read.Logical("1-3");

string html = logical.ToHtml(new PdfHtmlSaveOptions {
    Profile = PdfHtmlProfile.Semantic,
    IncludeMetadata = true,
    IncludePageContainers = true
});

File.WriteAllText("statement.html", html);

Profiles

  • Semantic HTML to PDF: HTML -> OfficeIMO.Markdown.Html -> MarkdownDoc -> OfficeIMO.Markdown.Pdf -> OfficeIMO.Pdf.
  • Document HTML to PDF: HTML -> OfficeIMO.Word.Html -> WordDocument -> OfficeIMO.Word.Pdf -> OfficeIMO.Pdf.
  • Semantic PDF to HTML: PDF -> OfficeIMO.Pdf logical model -> structured HTML.
  • Positioned review PDF to HTML: PDF -> OfficeIMO.Pdf logical model -> page wrappers with positioned text/table/link/form hints.

Profile contract APIs expose each profile's stable identifier, intended use, fidelity guarantees, diagnostics, and unsupported scope, so that wrappers, manifests, UI selectors, and product docs can consume them.
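
As a rough illustration of choosing between the two HTML-to-PDF pipelines listed above, the sketch below wraps the two factory methods from the earlier examples in a small helper. `CreateOptions` and its `printOriented` flag are hypothetical names introduced here, not part of the package, and the sketch assumes both factories return `HtmlPdfSaveOptions`, as their usage in the examples suggests.

```csharp
using OfficeIMO.Html.Pdf;

// Hypothetical helper (not an OfficeIMO API): maps a caller-side flag
// to one of the two HTML-to-PDF profile factories.
static HtmlPdfSaveOptions CreateOptions(bool printOriented) =>
    printOriented
        ? HtmlPdfSaveOptions.CreateDocumentProfile()   // HTML -> Word -> PDF
        : HtmlPdfSaveOptions.CreateSemanticProfile();  // HTML -> Markdown -> PDF

string html = "<h1>Status</h1><p>All systems nominal.</p>";

// Render the same input through both pipelines to compare the output.
html.SaveAsPdf("status-semantic.pdf", CreateOptions(printOriented: false));
html.SaveAsPdf("status-document.pdf", CreateOptions(printOriented: true));
```

Rendering one input through both profiles is a simple way to see which pipeline suits a given kind of HTML before committing to one.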

Boundaries

  • HTML ingestion belongs in OfficeIMO.Markdown.Html or OfficeIMO.Word.Html.
  • PDF layout, reading, and logical extraction belong in OfficeIMO.Pdf.
  • This package chooses and composes the bridge profile.
  • Browser-grade CSS layout and pixel-perfect PDF rendering are out of scope for this package.

Related packages

Targets and license