This matrix describes the current first-party OfficeIMO.Pdf state. It is intentionally blunt so PSWriteOffice can wrap what exists and avoid promising PSWritePDF/iText parity too early.

OfficeIMO.Drawing is a valid first-party dependency for shared drawing concepts such as colors, font metadata, image metadata, text measurement, and reusable drawing primitives. Prefer that reusable layer when a primitive belongs across OfficeIMO packages, while keeping PDF syntax, page objects, and PDF-specific layout inside OfficeIMO.Pdf.

Status values:

- Supported: public API exists and has tests.
- Partial: useful capability exists, but scope is deliberately limited.
- Planned: roadmap item, no dependable public API yet.
- External bridge: exists elsewhere in OfficeIMO but not in the dependency-free PDF engine.

| Area | Capability | Status | Current API / Notes |
|---|---|---|---|
| Create | Build a PDF from fluent blocks | Partial | PdfDoc.Create(), headings with Word-like spacing-before suppression at fresh page/column starts, paragraphs, rich text with scoped per-run standard font family, font-size, and background/highlight changes, Word-compatible default half-inch paragraph tab stops with PdfParagraphStyle.DefaultTabStopWidth overrides, explicit paragraph tab runs with dotted, hyphen, or underscore leaders and left/center/right/decimal alignment through PdfParagraphBuilder.Tab(PdfTabLeaderStyle.Dots, PdfTabAlignment.DecimalSeparator), Word-like flow-object spacing-before suppression at fresh page/column starts, invisible Spacer(...) flow gaps, simple bullets/numbering plus rich list item runs through PdfListItem, RichBullets(...), and RichNumbered(...), panels, rows/columns, simple tables, JPEG/PNG images, headers, footers, page numbers; page-scoped content compose supports direct Item(...) groups, nested element groups, Spacer(...) rhythm blocks, and PageBreak() page transitions alongside columns and rows |
| Create | Save to bytes/path/stream workflow | Supported | ToBytes, Save(string), Save(Stream), SaveAsync(string), and SaveAsync(Stream) |
| Create | Metadata | Supported | PdfDoc.Meta(title, author, subject, keywords) |
| Create | Page setup | Partial | PdfOptions.PageSize, PdfOptions.Margins, PdfOptions.BackgroundColor, PageSize.FromInches(...), PageSize.FromCentimeters(...), PageMargins.UniformInches(...), PageMargins.FromInches(...), PageMargins.UniformCentimeters(...), PageMargins.FromCentimeters(...), PdfDoc.Size(...), Margin(...), Margin(PageMargins), Orientation(...), Portrait(), Landscape(), Background(...), top-level PdfDoc.Page(...) / Section(...), PdfDoc.Compose(...Page...) / Compose(...Section...), and matching PdfPageCompose methods provide Word-like size, orientation, margin, page-background color, and scoped flow setup with immediate intrinsic scalar validation and reusable Word-compatible PageMargins presets; richer section inheritance, mid-page section breaks, and image/background-shape page fills remain roadmap work |
| Create | Tables | Partial | Basic styling, proportional standard-font wrapping for cells and captions, rich PdfTableCell text runs with scoped color, bold/italic, underline/strike, font size, background/highlight, baseline, tabs, and links rendered through the shared rich text engine, table-cell images through PdfTableCell.WithImages(...), report-friendly TableStyles.Light() defaults, Word-like table presets with neutral header/footer separators, including TableNormal, TableGrid, TableGridLight, PlainTable1, GridTable1Light, and Accent1-6 variants with Word default theme border, separator, and soft band colors for the existing light grid/list styles, canonical style normalization through TableStyles.GetCanonicalWordStyleName(...) / TryGetCanonicalWordStyleName(...), canonical display names through TableStyles.CanonicalWordStyleNames, accepted aliases through TableStyles.SupportedWordStyleNames, row/header/footer separators, side-specific per-cell border overrides with independent side colors, widths, solid/dashed/dotted/dash-dot strokes, two-line borders, and diagonal-up/diagonal-down cell lines, body column fills, per-cell fills, proportional per-cell data bars through PdfCellDataBar, per-cell vector icons through PdfCellIcon, per-cell padding overrides, column and per-cell horizontal/vertical cell alignment, configurable cell spacing, configurable visual header/footer row counts with render-time bounds validation, optional repeated-header row count through PdfTableStyle.RepeatHeaderRowCount, table-wide and per-row minimum heights, table-wide and per-row row-break policies, table left indentation and max-width caps with left/center/right placement, spacing before/after with Word-like spacing-before suppression at fresh page/column starts, keep-together and keep-with-next page flow with matching first-row preflight diagnostics that honor configured column widths, fixed/min/max column widths including proportional fitting for oversized fixed-width tables in top-level and row/column frames, relative column width weights, column-scoped style bounds validation for sizing/fills/horizontal and vertical alignment, OfficeIMO.Drawing-backed auto-fit column sizing with token minimums, initial PdfTableCell column spans, row spans, rectangular merged cells with combined-box alignment, overlong row-span validation, row-spanned-cell header/footer boundary validation, row-spanned explicit cell fills/borders, explicit cell fill/data-bar/icon/border/padding/alignment coordinate bounds validation plus row-span and column-span continuation-slot skips, row/header/footer separators, body-column background fills that skip merged-cell continuation columns, row/background fills, and default table border grids that skip row-spanned and rectangular merged-cell interiors, cell-owned URI or named-destination links including linked column/row-spanned cell annotations over the merged text frame in top-level and row/column flows, and cell-owned named-destination anchors through PdfTableCell.NamedDestinationName / WithNamedDestination(...), row-by-row pagination, oversized-row line splitting, repeated header rows, caption-plus-first-row overflow diagnostics, generic line-item visual rhythm gates, and PDF-level clipping when cell text would escape its cell rectangle with a small antialiasing tolerance exist; richer merged-cell conflict behavior and report tables are still roadmap work |
| Create | Rows and columns | Partial | PdfRowCompose supports percentage columns with explicit gutters plus reusable PdfRowStyle defaults/overrides for Word-like column gutters, optional vertical column separators, row-level spacing, keep-together, and keep-with-next page flow through PdfOptions.DefaultRowStyle, PdfDoc.DefaultRowStyle(...), PdfPageCompose.DefaultRowStyle(...), PdfTheme.RowStyle, or per-row Style(...) / ColumnSeparator(...); column flows can use Item(...) groups and Spacer(...) for invisible vertical rhythm without fake blank text; the native Word exporter maps Word section columns with explicit column breaks, inline paragraph column breaks, explicit unequal section column widths from Word section properties, Word section column separator lines, and a heading/keep-with-next-aware automatic distribution fallback for multi-column sections without explicit breaks through this same row/column flow; kept rows that exceed the available page content height fail with a clear diagnostic, and richer balanced newspaper-style section flow remains roadmap work |
| Create | Images | Partial | JPEG and simple non-interlaced 8-bit grayscale/grayscale-alpha/RGB/RGBA PNG placement, including PNG alpha soft masks; image payload validation uses OfficeIMO.Drawing.OfficeImageReader and rejects unsupported recognized formats clearly; flow images can use shared OfficeImageFit stretch/contain/cover placement, shared OfficeClipPath rectangle/rounded/freeform clipping, and optional URI link annotations with contents metadata in top-level, row/column, and table-cell flows |
| Create | Drawing primitives | Partial | Flow lines, rectangles, rounded rectangles, ellipses, polygons, paths, and simple grouped drawing scenes render from shared OfficeIMO.Drawing descriptors with solid fill, two-stop linear gradient fill, simple offset shadow, stroke/width/dash style/line cap/line join/fill and stroke opacity/affine transform/clipping path/alignment/spacing/keep-with-next flow plus optional URI link annotations with contents metadata on generic shape and drawing scene blocks and on vector convenience helpers; richer gradients and richer shape effects remain roadmap work |
| Create | Headers and footers | Partial | Simple generated page headers and footers support PdfOptions, document-level PdfDoc.Header(...) / PdfDoc.Footer(...), and page-scoped PdfPageCompose.Header(...) / Footer(...) configuration with literal text formats, {page} / {pages} tokens, composed text/token segment builders, Word-like left/center/right text zones through PdfHeaderCompose.Zones(...) and PdfFooterCompose.Zones(...), simple images through Image(...), simple shared drawing shapes through Shape(...), plus first-page/even-page text, image, and shape variants through FirstPageZones(...), EvenPagesZones(...), FirstPageImage(...), EvenPagesImage(...), FirstPageShape(...), and EvenPagesShape(...), font, size, text color, alignment, margin-relative offsets with placement validation, first-page overrides, and odd/even page overrides; zone text is measured and rejected when it would overflow or overlap. First/even/odd selection is scoped to the current document or section flow, while visible page tokens continue by default across flows for Word-like section numbering. PdfOptions.PageNumberStart, PdfDoc.PageNumberStart(...), and PdfPageCompose.PageNumberStart(...) can explicitly restart the visible page number without breaking first/even/odd variant selection, and PdfOptions.PageNumberStyle, PdfDoc.PageNumberStyle(...), and PdfPageCompose.PageNumberStyle(...) can render decimal, roman, or alphabetic page tokens. The native Word-to-PDF path maps simple default/first/even header and footer text/images/shapes into this model, including left/center/right paragraph alignment, simple text-box text routed through header/footer zones, and simple two-/three-cell header/footer table text/images/shapes through first-party zones; richer table header and footer fidelity remains roadmap work |
| Create | Themes and styles | Partial | PdfTheme bundles default text, paragraph, heading, list, table, panel, rule, image, drawing, and row styles for PdfOptions.ApplyTheme(...), PdfDoc.Theme(...), and PdfPageCompose.Theme(...); PdfTheme.WordLike() provides a generic opt-in document theme with neutral typography, readable paragraph/list/table rhythm, heading hierarchy, table footer separators for summary rows, and flow-object spacing without introducing invoice/report-specific engine concepts |
| Create | Fonts | Partial | Standard PDF fonts only; document defaults, header/footer fonts, default text styles, and rich text runs can select Helvetica, Times, or Courier family variants without embedding. Helvetica and Times family measurement uses built-in glyph-width tables, including common WinAnsi punctuation and accented Latin letters, for generated layout and standard-font readback; TrueType/OpenType embedding is planned |
| Create | Outlines/bookmarks | Partial | PdfOptions.CreateOutlineFromHeadings writes nested PDF outlines from H1/H2/H3 blocks, generic PdfDoc.Bookmark(...) / compose Bookmark(...) helpers write simple PDF named destinations from the current top-level or row/column flow position with duplicate-name validation, rich PdfListItem can anchor per-item named destinations in top-level and row/column list flows, and paragraph LinkToBookmark(...) runs plus bookmark-targeted H1/H2/H3 links write internal GoTo annotations targeting those named destinations with missing-target validation |
| Create | Forms | Partial | PdfDoc.TextField(...), PdfDoc.CheckBox(...), PdfDoc.ChoiceField(...), PdfDoc.MultiSelectChoiceField(...), and PdfDoc.RadioButtonGroup(...) write initial simple AcroForm text fields, check boxes, scalar choice fields, multi-select choice fields, and vertical radio button groups in top-level page flow, compose item/element flow, and row/column flow; PdfTableCell.WithCheckBoxes(...) writes simple check boxes inside table cells and PdfTableCell.WithFormFields(...) writes simple table-cell text and scalar choice fields. Generated fields include visible normal appearance streams, /Widget annotations, catalog /AcroForm, generated /NeedAppearances false, Helvetica default resource registration, button Off/selected appearance states, choice field flags/options, and radio parent/kid widgets; PdfFormFieldStyle can set generated background, border, text, and button mark colors plus border width; generated fields are immediately readable by PdfInspector / PdfLogicalDocument and can be filled or filled-and-flattened by PdfFormFiller. The native Word exporter maps simple body-level and table-cell dropdown, combo box, and date picker content controls to these first-party form primitives. Richer field widgets and broader Word/Excel/PowerPoint mapped form export remain roadmap work |
| Shared drawing | Color interop | Supported | PdfColor.FromOfficeColor, PdfColor.FromOfficeColorOrNull, PdfColor.ToOfficeColor, and implicit OfficeColor to PdfColor conversion |
| Shared drawing | Image metadata | Supported | PDF image validation/rendering stores OfficeImageInfo on internal image blocks and uses OfficeImageReader for format detection |
| Shared drawing | Font metadata, text measurement, image fitting, vector descriptors | Partial | Generated table auto-fit sizing uses OfficeIMO.Drawing.OfficeTextMeasurer; flow images use OfficeIMO.Drawing.OfficeImageFit; flow lines, rectangles, rounded rectangles, ellipses, polygons, paths, and grouped scenes use OfficeIMO.Drawing.OfficeShape / OfficeDrawing, including shared stroke dash/cap/join, two-stop linear gradient fill, simple offset shadow, fill/stroke opacity, affine transform, and clipping path descriptors; keep expanding this shared layer instead of duplicating reusable primitives inside OfficeIMO.Pdf |
| Read | Load PDF object model | Partial | PdfReadDocument.Load(byte[]/path/stream) handles the current pragmatic parser scope and prefers the trailer /Root catalog when stale catalog objects remain in the file |
| Read | Lightweight document probe | Supported | PdfInspector.Probe(byte[]/path/stream) returns PdfDocumentProbe.HeaderVersion, HasEncryption, HasSignatures, HasForms, HasAnnotations, HasOutlines, HasCatalogViewSettings, HasPageLabels, HasCatalogNameTrees, HasNamedDestinations, HasOpenActions, HasViewerPreferences, HasTaggedContent, HasXmpMetadata, HasCatalogUri, HasOutputIntents, HasEmbeddedFiles, HasOptionalContent, and HasActiveContent without full parsing so wrappers can choose safe read/manipulation paths |
| Read | Wrapper validation/preflight | Supported | PdfValidator.Validate(byte[]/path/stream) and PdfInspector.Preflight(byte[]/path/stream) return IsValid/CanRead, CanExtractText, CanExtractImages, CanReadLogicalObjects, CanRewrite, CanManipulatePages, CanFillSimpleFormFields, CanFlattenSimpleFormFields, CanFillAndFlattenSimpleFormFields, Can(PdfPreflightCapability), GetCapabilityDiagnostics(PdfPreflightCapability), parsed DocumentInfo when available, structured ReadBlockers / RewriteBlockers, HasReadBlocker(...) / HasRewriteBlocker(...) helpers, and diagnostics for encrypted, signed, form-bearing, complex-outline-bearing, complex-page-label-bearing, unsupported-catalog-name-tree-bearing, complex-named-destination-name-tree-bearing, complex-open-action-dictionary-bearing, complex-viewer-preference-bearing, complex-XMP-metadata-bearing, complex-catalog-URI-bearing, tagged, complex-output-intent-bearing, complex-embedded-file-bearing, complex-optional-content-bearing, active-content-bearing, invalid rewrite object references, unsupported page content stream filters, invalid, empty, or parser-unsupported inputs; simple direct catalog view settings, simple outlines including simple GoTo action outline entries, simple direct page labels, direct named destinations, simple destination name trees including leaf /Kids, destination-array open actions, simple GoTo open-action dictionaries, simple viewer preferences, simple catalog XMP metadata streams, simple catalog URI base dictionaries, simple output intents, simple embedded-file attachment trees, and simple optional-content metadata are detected but no longer block rewrite; image extraction can still be allowed when document inspection succeeds but content-stream filters block text or logical-object extraction, and simple AcroForm fill/flatten gates are reported separately because form PDFs still block generic page-rewrite helpers while dedicated form helpers can support a narrower safe path |
| Read | Page count, page sizes, and rotation | Supported | PdfInspector.Inspect(byte[]/path/stream) and InspectPageRanges(byte[]/path/stream, PdfPageRange...) return PdfDocumentInfo.PageCount, HeaderVersion, PdfPageInfo geometry, RotationDegrees, page-level link annotations, page-level AcroForm widget annotations when readable, signature marker state, form marker state, annotation marker state, outline marker state, catalog-view-setting marker state, page-label marker state, catalog-name-tree marker state, named-destination marker state, open-action marker state, viewer-preference marker state, tagged-structure marker state, XMP metadata marker state, catalog URI marker state, output-intent marker state, embedded-file marker state, optional-content marker state, and active-content marker state; page-range inspection preserves caller order and overlaps while narrowing page labels, page-resolved outlines, named destinations, open actions, AcroForm fields, and form widgets to selected source pages |
| Read | Catalog view and identity | Partial | PdfReadDocument and PdfInspector.Inspect(...) expose simple catalog CatalogPageMode, CatalogPageLayout, CatalogVersion, and CatalogLanguage |
| Read | Page labels | Partial | PdfReadDocument.PageLabels and PdfInspector.Inspect(...).PageLabels read simple direct catalog /PageLabels number trees as generic page-label rules with StartPageIndex, StartPageNumber, Style, Prefix, StartNumber, PageLabelCount, and HasReadablePageLabels; rewrite-style copied-page label reindexing follows the trailer-root page tree so stale catalog objects do not skew selected-page labels; complex page-label trees remain marker/blocker-only until richer number-tree support exists |
| Read | Viewer preferences | Partial | PdfReadDocument.ViewerPreferences and PdfInspector.Inspect(...).ViewerPreferences read simple catalog /ViewerPreferences dictionaries as generic key/value entries with Count, GetValue(...), GetBoolean(...), and HasReadableViewerPreferences; complex viewer preference graphs remain readable only as markers/blockers until richer typed models exist |
| Read | Metadata | Supported | PdfReadDocument.Metadata, PdfInspector.Inspect(...).Metadata, PdfTextExtractor.GetMetadata(byte[]/path/stream) |
| Read | Outlines/bookmarks | Partial | PdfReadDocument.Outlines and PdfInspector.Inspect(...).Outlines read simple outline trees, indirect destinations, direct/name-tree named-destination targets, and simple GoTo action destinations from the trailer-root catalog; rewrite-style manipulation preserves simple outline trees, including simple GoTo action outline entries, whose destinations point only at copied pages, drops outline trees when a selected-page operation would leave stale outline destinations, and still blocks complex non-GoTo or additional-action outline trees |
| Read | Named destinations | Partial | PdfReadDocument.NamedDestinations and PdfInspector.Inspect(...).NamedDestinations read simple direct catalog /Dests dictionaries and simple catalog /Names /Dests name trees, including leaf /Kids, exposing Name, PageNumber, DestinationTop, NamedDestinationCount, and NamedDestinationNames; PdfReadPage.GetLinkAnnotations() / PdfInspector.Inspect(...) also read simple URI links plus GoTo/direct named-destination page links with distinct LinkUris and LinkDestinationNames; malformed or unsupported name-tree destinations remain blocked for rewrite |
| Read | Document open action | Partial | PdfReadDocument.OpenAction and PdfInspector.Inspect(...).OpenAction read simple destination-array and simple GoTo dictionary /OpenAction entries, exposing ActionType, PageNumber, DestinationTop, and HasReadableOpenAction; complex open actions remain readable only as markers/blockers until richer action support exists |
| Read | Text extraction | Partial | PdfReadDocument.ExtractText, PdfReadPage.ExtractText, PdfTextExtractor.ExtractAllText(byte[]/path/stream), PdfTextExtractor.ExtractAllTextByPageRanges(byte[]/path/stream, PdfPageRange...), PdfTextExtractor.ExtractTextByPage(byte[]/path/stream), and PdfTextExtractor.ExtractTextByPageRanges(byte[]/path/stream, PdfPageRange...); byte/path/stream whole-document extraction can write UTF-8 text to output paths or caller-owned streams, selected range extraction can return one concatenated text result or write one text file/stream for wrapper-style Convert-PdfToText -Pages, byte-array/path/stream page extraction can write deterministic source-page-0001.txt files, and range-list text extraction preserves caller order plus repeated or overlapping selections while writing selected source-page-numbered files with or without layout options |
| Read | Text positions/spans | Partial | PdfReadPage.GetTextSpans() returns generated standard-font spans with glyph-width-based advances when /Widths is omitted, including common WinAnsi punctuation and accented Latin letters |
| Read | Image extraction | Partial | PdfImageExtractor.ExtractImages(byte[]/path/stream/document), ExtractImagesByPageRanges(byte[]/path/stream/document, PdfPageRange...), and PdfReadDocument.ExtractImages() return page image XObjects; byte-array, path, and stream extraction can also write deterministic source-page-0001-image-0001.png files for all pages or selected source-page ranges, while range-list image extraction preserves caller order and deduplicates overlapping selections; JPEG images are returned as JPEG files and simple PNG-predictor Flate images as PNG files, including compatible grayscale/RGB Flate images with grayscale /SMask alpha as gray-alpha/RGBA PNGs |
| Read | Logical object model | Partial | PdfLogicalDocument.Load(byte[]/path/stream, PdfTextLayoutOptions?), LoadPageRanges(byte[]/path/stream, options, PdfPageRange...), From(PdfReadDocument, ...), and FromPageRanges(PdfReadDocument, options, PdfPageRange...) expose one wrapper-friendly read surface with metadata, selected source pages in caller order when ranges are used, PagesBySourcePageNumber, HasSourcePage(...), and GetPages(...) helpers that preserve range-selection duplicates, document-level TextBlocks, Headings, Paragraphs, ListItems, Tables, and Images, flattened logical Elements, ElementsByKind, ElementsByPageNumber, HasElementKind(...), and GetElements(...) helpers on both documents and pages, line-level text blocks, heuristic headings, list item objects with marker/level/text hints, heuristic paragraph groups, leader rows, detected tables with row/column/cell objects, image XObjects, URI/named-destination link annotation objects with document-level Links, LinksByUri, LinksByDestinationName, GetLinksByUri(...), and GetLinksByDestinationName(...), page-level AcroForm widget objects with current /AS and named /AP /N normal appearance states, catalog view settings, outlines/bookmarks, page-label rules, named destinations, open actions, viewer preferences, AcroForm /NeedAppearances, /SigFlags, named signature flag helpers, and /DA metadata, simple AcroForm fields with typed PdfFormFieldKind, inherited common /Ff flag helpers, scalar and array current/default values, selected/default-selected choice-option matching, inherited text /MaxLen, inherited AcroForm/field-tree /DA default appearance strings, inherited /Q text alignment, inherited simple choice /Opt options, distinct field page-number helpers, field-local widget page lookups, named, kind-based, and page-number form-field lookup helpers, document-level FormWidgets, FormWidgetsByFieldName, FormWidgetsByPageNumber, GetFormWidgets(string), GetFormWidgets(int), and simple form-widget page/rectangle objects. PdfLogicalDocument.ToMarkdown(...), PdfLogicalPage.ToMarkdown(...), and PdfTextExtractor.ExtractMarkdown(...) / ExtractMarkdownByPage(...) / ExtractMarkdownByPageRanges(...) / ExtractMarkdownByPageRangesAsDocument(...) render the same logical model as Markdown with headings, paragraphs, lists, detected tables, image placeholders, optional link/form annotations, UTF-8 output-path/stream helpers, and deterministic per-page .md files for wrapper pipelines. Range-based logical loads filter page labels, page-resolved outlines, named destinations, open actions, AcroForm fields, and form widgets to selected source pages while preserving duplicate selected page widgets in caller order. The two-page line-item statement fixture now guards source-page ordering, table readback, totals readback, and selected range ordering through the logical model. This is the first AST-style surface for PSWriteOffice-style workflows, but heading/paragraph/table/list detection remains heuristic rather than a full tagged-PDF or Word-like semantic reconstruction |
| Read | Simple structure extraction | Partial | PdfReadPage.ExtractStructured(...), PdfReadDocument.ExtractStructuredPages(...), PdfReadDocument.ExtractHeadingsByPage(...), PdfReadDocument.ExtractListItemsByPage(...), PdfReadDocument.ExtractParagraphsByPage(...), PdfTextExtractor.ExtractStructuredByPage(byte[]/path/stream, options), ExtractStructuredByPageRanges(byte[]/path/stream, options, PdfPageRange...), ExtractHeadingsByPage(byte[]/path/stream, options), ExtractHeadingsByPageRanges(byte[]/path/stream, options, PdfPageRange...), ExtractListItemsByPage(byte[]/path/stream, options), ExtractListItemsByPageRanges(byte[]/path/stream, options, PdfPageRange...), ExtractParagraphsByPage(byte[]/path/stream, options), ExtractParagraphsByPageRanges(byte[]/path/stream, options, PdfPageRange...), PdfTextExtractor.ExtractTablesByPage(byte[]/path/stream, options), and ExtractTablesByPageRanges(byte[]/path/stream, options, PdfPageRange...) expose column-aware text, heuristic headings, heuristic paragraph groups, list item marker/level hints, dot/hyphen/underscore leader rows that preserve decimal/currency value punctuation, and heuristic table rows/geometry for wrapper-friendly readback while preserving selected source page numbers for heading/list-item/paragraph/table results; PdfTextExtractor.ExtractTablesByPage(pdfBytes, outputDirectory, baseName, options), ExtractTablesByPage(inputPath, outputDirectory, options), ExtractTablesByPage(stream, outputDirectory, baseName, options), and matching ExtractTablesByPageRanges(...) overloads write deterministic escaped CSV files per detected table for all pages or selected source-page ranges, including the two-page line-item statement fixture with selected source-page order, line-item rows, and totals guarded for wrapper use |
| Manipulate | Split by page range | Partial | PdfPageExtractor.ExtractPageRange(byte[]/path/stream, firstPage, lastPage), ExtractPageRange(..., PdfPageRange), ExtractPageRanges(..., PdfPageRange...), SplitPages(byte[]/path/stream), and SplitPageRanges(..., PdfPageRange...) return bytes for wrapper pipelines; PdfPageRange.Parse(...), TryParse(...), ParseMany("1-3,5"), and TryParseMany(...) parse one-based single pages plus inclusive first-last / first..last range lists while preserving caller order; path and stream split helpers can also write deterministic source-page-0001.pdf and source-pages-0001-0003.pdf files; simple direct catalog /PageMode, /PageLayout, /Version, /Lang, simple direct /PageLabels number trees, simple outlines including simple GoTo action outline entries whose destinations point only at copied pages, direct /Dests dictionaries, simple /Names /Dests name trees including leaf /Kids, destination-array /OpenAction entries, simple GoTo open-action dictionaries, simple /ViewerPreferences dictionaries, simple catalog /Metadata XMP XML streams, simple catalog /URI base dictionaries, simple /OutputIntents metadata graphs, simple /Names /EmbeddedFiles attachment trees, simple catalog /AF associated-file arrays, and simple /OCProperties optional-content metadata are preserved, with copied-page labels reindexed, stale destinations/open actions pruned, stale outline trees/name-tree destinations dropped, and stale named-destination link annotations removed when their target pages are not copied; the two-page line-item statement fixture now guards split/extract readback through the logical model; currently scoped to PDFs handled by the OfficeIMO parser |
| Manipulate | Merge PDFs | Partial | PdfMerger.Merge(byte[]/stream inputs) and PdfMerger.MergeFilesToBytes(path inputs) can return bytes or write to output streams, while PdfMerger.MergeFiles(...) writes merged files from params paths or enumerable path lists and can write enumerable file-list inputs to output streams for wrapper pipelines; simple direct catalog /PageMode, /PageLayout, /Version, /Lang, simple direct /PageLabels number trees, simple outline trees including simple GoTo action outline entries, direct /Dests dictionaries, simple /Names /Dests name trees, destination-array /OpenAction entries, simple GoTo open-action dictionaries, simple /ViewerPreferences dictionaries, simple catalog /Metadata XMP XML streams, simple catalog /URI base dictionaries, simple /OutputIntents metadata graphs, simple /Names /EmbeddedFiles attachment trees, simple catalog /AF associated-file arrays, and simple /OCProperties optional-content metadata are preserved from the first source; the two-page line-item statement fixture now guards merge-after-split readback through the logical model; currently scoped to parser-supported PDFs |
| Manipulate | Extract pages | Partial | PdfPageExtractor.ExtractPages(byte[]/path/stream, pageNumbers), ExtractPageRange(...), and ExtractPageRanges(..., PdfPageRange...) create a new PDF from selected pages/ranges in requested order, including repeated selections and overlapping ranges as cloned page objects, and preserve simple reachable URI and named-destination link annotations plus simple direct catalog /PageMode, /PageLayout, /Version, /Lang, simple direct /PageLabels number trees, simple outlines including simple GoTo action outline entries whose destinations point only at copied pages, direct /Dests dictionaries, simple /Names /Dests name trees including leaf /Kids, destination-array /OpenAction entries, simple GoTo open-action dictionaries, simple /ViewerPreferences dictionaries, simple catalog /Metadata XMP XML streams, simple catalog /URI base dictionaries, simple /OutputIntents metadata graphs, simple /Names /EmbeddedFiles attachment trees, simple catalog /AF associated-file arrays, and simple /OCProperties optional-content metadata, with copied-page labels reindexed, stale destinations/open actions pruned, stale outline trees/name-tree destinations dropped, and stale named-destination link annotations removed when their target pages are not copied; helpers can return bytes from path inputs and write byte, stream, or path inputs to caller-owned output streams |
| Manipulate | Import pages | Partial | PdfPageImporter.AppendPages, PrependPages, InsertPages, InsertPageRange, AppendPageRanges, PrependPageRanges, and InsertPageRanges import selected one-based source pages, inclusive source ranges from firstPage / lastPage pairs or PdfPageRange, parsed range lists, repeated selections/ranges as cloned pages, or all source pages when no selection is supplied, before, after, or inside a target PDF using byte-array, path, or stream inputs; InsertPages, InsertPageRange, and InsertPageRanges keep the target document as the primary catalog/metadata source even when inserting at page 1; helpers return bytes, write to paths, or write byte, stream, or path inputs to caller-owned output streams for wrapper pipelines and reuse parser-supported extraction plus merge object-copy behavior |
| Manipulate | Duplicate pages | Partial | PdfPageEditor.DuplicatePages(byte[]/path/stream, pageNumbers), DuplicatePageRange(byte[]/path/stream, firstPage, lastPage or PdfPageRange), and DuplicatePageRanges(..., PdfPageRange...) keep original document order and insert cloned copies immediately after each selected source page, including repeated page selections or repeated/overlapping parsed ranges as repeated clones, with byte-returning path helpers and output stream/path helpers for byte, stream, or path inputs in wrapper pipelines |
| Manipulate | Move pages | Partial | PdfPageEditor.MovePages(byte[]/path/stream, insertBeforePageNumber, pageNumbers), MovePageRange(byte[]/path/stream, insertBeforePageNumber, firstPage, lastPage or PdfPageRange), and MovePageRanges(..., PdfPageRange...) move selected one-based source pages, inclusive page ranges, or parsed range lists as a group in original relative order before another source page, or to the end with pageCount + 1; range-list movement treats overlaps as one moved page set and helpers include byte-returning path helpers plus output stream/path helpers for byte, stream, or path inputs in wrapper pipelines |
| Manipulate | Reorder pages | Partial | PdfPageEditor.ReorderPages(byte[]/path/stream, pageNumbers) and ReorderPageRanges(byte[]/path/stream, PdfPageRange...) create a new PDF containing every page exactly once in the requested order; range-list reorder can reuse PdfPageRange.ParseMany("3,1-2") for wrapper grammar, return bytes from file paths, or write byte, stream, or path inputs to output streams |
| Manipulate | Delete pages | Partial | PdfPageEditor.DeletePages(byte[]/path/stream, pageNumbers), DeletePageRange(byte[]/path/stream, firstPage, lastPage or PdfPageRange), and DeletePageRanges(..., PdfPageRange...) create a new PDF without selected pages, one inclusive page range, or a parsed range list; overlapping delete ranges are treated as one deletion set; helpers can return bytes from file paths or write byte, stream, or path inputs to output streams, and deleting every page is rejected |
| Manipulate | Rotate pages | Partial | PdfPageEditor.RotatePages(byte[]/path/stream, degrees, pageNumbers), RotatePageRange(byte[]/path/stream, degrees, firstPage, lastPage or PdfPageRange), and RotatePageRanges(..., PdfPageRange...) set /Rotate for selected pages, inclusive page ranges, parsed range lists, or all pages when no selection is supplied; range-list rotation treats overlaps as one selected page set and can return bytes from file paths or write byte, stream, or path inputs to output streams |
| Manipulate | Update metadata | Partial | PdfMetadataEditor.UpdateMetadata(byte[]/stream/path, ...) and UpdateMetadataToBytes(path, ...) preserve unspecified fields, while ReplaceMetadata(byte[]/stream/path, ...), ReplaceMetadataToBytes(path, ...), and path output helpers replace the Info dictionary fields; helpers can write byte, stream, or path inputs to caller-owned output streams, and path helpers can also return bytes |
| Manipulate | Text/image stamp/watermark | Partial | PdfStamper.StampText(byte[]/stream/path, ...), StampTextToBytes(path, ...), WatermarkText(byte[]/stream/path, ...), WatermarkTextToBytes(path, ...), StampImage(byte[]/stream/path PDF, byte[]/stream image, ...), StampImageToBytes(path PDF, byte[]/stream image, ...), WatermarkImage(byte[]/stream/path PDF, byte[]/stream image, ...), and WatermarkImageToBytes(path PDF, byte[]/stream image, ...) append content streams to selected pages, return bytes for wrapper pipelines, and can write byte, stream, or path PDF inputs to paths or caller-owned output streams; PdfTextStampOptions.UsePageRange(...) / UsePageRanges(...) and PdfImageStampOptions.UsePageRange(...) / UsePageRanges(...) select inclusive one-based page ranges or parsed range lists from firstPage / lastPage pairs or PdfPageRange without wrappers materializing page arrays, with overlapping range-list selections treated as one page selection set; simple PNG alpha soft masks are preserved for image stamps/watermarks |
| Forms | Inspect fields | Partial | PdfInspector.Inspect(...) and Preflight(...).DocumentInfo can list simple AcroForm fields through PdfDocumentInfo.FormFields, including document-level /NeedAppearances, /SigFlags, named /SigFlags helpers for signatures-exist and append-only, and /DA, fully qualified names, raw field types, typed PdfFormFieldKind, simple display Value, scalar or array Values, simple default display DefaultValue, scalar or array DefaultValues, selected/default-selected choice-option matching, alternate/mapping names, inherited common /Ff flag helpers such as read-only/required/no-export/text/button/choice/signature/button-kind/choice-kind hints, inherited text /MaxLen, inherited AcroForm/field-tree /DA default appearance strings, inherited /Q text alignment, inherited simple choice /Opt options with export/display text, distinct widget page numbers per field, field-local WidgetsByPageNumber and GetWidgets(int) helpers, and simple widget annotation field-name/page/rectangle/current-appearance/normal-appearance-state metadata plus named annotation /F flag helpers when readable; PdfDocumentInfo and PdfLogicalDocument expose FormFieldsByName, FormFieldsByKind, FormFieldsByPageNumber, FormFieldNames, TryGetFormField(...), GetFormFields(PdfFormFieldKind), and GetFormFields(int) so wrappers can query the same simple fields without hand-scanning raw lists, plus document-level and page-level FormWidgets, FormWidgetsByFieldName, FormWidgetsByPageNumber, GetFormWidgets(string), and GetFormWidgets(int) lookup helpers for widget geometry and appearance state; rewrite-style page manipulation remains blocked for form PDFs until broader preservation exists |
| Forms | Fill fields | Partial | PdfFormFiller.FillFields(...) can update simple AcroForm text/choice-style string values and button name values by fully qualified field name from bytes, paths, or streams, accepts choice values as export values or /Opt display text when available while storing the export value and painting display text, supports multi-select choice arrays through PdfFormFieldValue.FromValues(...), updates radio button groups by switching only the matching child widget appearance state on, generates simple text-widget normal appearance streams and simple button-widget Off/selected appearance states for widgets with /Rect, marks /NeedAppearances true, returns bytes from path inputs, writes path inputs to paths or caller-owned output streams, and rejects signed or active-content PDFs; rich widgets, JavaScript actions, and full appearance regeneration remain roadmap work |
| Forms | Flatten forms | Partial | PdfFormFiller.FlattenFields(...) and FillAndFlattenFields(...) can paint simple text-widget appearances, simple choice-widget text appearances with /Opt display text when available for scalar or array selected values, and simple button-widget normal appearance states into page content, generating minimal button appearances when needed, remove those page annotations, and remove the AcroForm tree for parser-supported PDFs from bytes, paths, or streams; helpers return bytes from path inputs and write path inputs to paths or caller-owned output streams; rich/custom appearances, JavaScript actions, and safe complex form preservation remain roadmap work |
| Security | Encryption/signatures/redaction | Partial | PdfInspector.Probe reports encryption/signature/form/outline/catalog-view-setting/page-label/catalog-name-tree/named-destination/open-action/viewer-preference/tagged-structure/XMP-metadata/catalog-URI/output-intent/embedded-file/optional-content/active-content markers and PdfInspector.Preflight turns unsupported markers into read/rewrite decisions with diagnostics plus structured PdfReadBlockerKind and PdfRewriteBlockerKind entries; encrypted PDFs fail with a clear unsupported diagnostic for parser-supported read/manipulation flows; signed PDFs, form PDFs, complex outline PDFs, complex page-label PDFs, unsupported catalog name-tree PDFs, malformed or unsupported named-destination name-tree PDFs, complex open-action dictionary PDFs, complex viewer-preference PDFs, complex XMP metadata PDFs, complex catalog URI PDFs, tagged PDFs, complex output-intent PDFs, complex embedded-file/associated-file PDFs, complex optional-content PDFs, and active-content PDFs are blocked for rewrite-style manipulation. Simple direct catalog view settings, simple outlines including simple GoTo action outline entries, simple direct page labels, direct named destinations, simple destination name trees including leaf /Kids, destination-array open actions, simple GoTo open-action dictionaries, simple viewer preferences, simple catalog XMP metadata streams, simple catalog URI base dictionaries, simple output intents, simple embedded-file attachment trees, simple associated-file arrays, and simple optional-content metadata are preserved. Creation, validation, redaction, and encrypted reading remain planned |
| Convert | Word to PDF without QuestPDF | Partial | OfficeIMO.Word.Pdf now defaults to the first-party engine; PdfSaveOptions.PageSize and Margins provide a QuestPDF-free page setup surface using first-party OfficeIMO.Pdf geometry types, with explicit PageSize geometry preserved unless PdfSaveOptions.Orientation is set; the current native path maps basic Word sections, page setup, Word document background color, Word section columns with explicit and inline paragraph column breaks, explicit unequal section column widths, Word section column separator lines, and heading/keep-with-next-aware automatic distribution for multi-column sections without explicit breaks, page breaks, headings including linked headings, paragraphs/runs with common Word/PDF font family requests mapped to standard Helvetica, Times, and Courier PDF families, isolated run color, font-size, superscript/subscript baseline, justified paragraph alignment, text-wrapping breaks, and highlight/background state, paragraph spacing/indents, simple tab stops with leaders/alignment, keep-with-next/keep-lines/widow-control flags, simple shaded and uniform/non-uniform bordered paragraphs, Word horizontal lines and paragraph top/bottom border rules, simple level-0 bullet/decimal lists with rich list-item runs, list-item bookmarks, links/bookmarks with tooltip metadata, generated table-of-contents entries with internal links to heading destinations, heading-based PDF outlines, footnote/endnote markers, simple tables with supported Word table style presets, rich text runs inside table cells, default and per-cell table margins, table cell spacing, table-level borders, uniform/non-uniform, double, and diagonal cell borders, uniform and non-uniform row heights, row-level break policies, preferred DXA table widths that fit into narrower native PDF column frames, explicit autofit-to-contents tables, cell fills, left/center/right table placement, uniform column and non-uniform per-cell horizontal/vertical alignment, 
simple merged cells, separated first-row visual table styling and repeated leading table header rows, and linked cells including linked merged cells, paragraph-aligned images, simple VML shapes plus the DrawingML preset flow shapes exposed by WordShape, simple body text boxes rendered through first-party panel paragraphs, simple body, table-cell, header, and footer picture content controls rendered as first-party PDF images, simple body repeating-section text items rendered as ordinary first-party PDF paragraphs, simple table-cell repeating-section text items rendered as first-party rich table-cell text, simple header/footer repeating-section text items rendered as first-party zone text, simple header/footer text boxes with extractable text routed through first-party zones, simple inline body/table/header/footer text content controls, simple body-level and table-cell Word check boxes as inspectable PDF AcroForm check boxes with readback and Poppler raster-baseline coverage in the native Word report fixture, simple body-level and table-cell Word dropdown, combo box, and date picker content controls as inspectable PDF AcroForm choice/text fields, simple header/footer Word check boxes, dropdowns, combo boxes, and date pickers as static first-party zone text, simple default/first/even header and footer text/images/shapes with left/center/right paragraph alignment, Word PAGE/NUMPAGES header/footer fields and their simple numeric format switches, and simple header/footer table-cell text/images/shapes mapped to first-party zones, simple footnote/endnote markers with end-of-section note text, metadata, and page-number footer settings including Word section page-number starts/styles into OfficeIMO.Pdf; the Poppler lane now includes a daily-layout Word fixture covering TOC, margins, page background color, columns including inline column breaks, separator lines, fonts, colors, lists, links, images, headers/footers, and a table inside the column flow. 
PdfSaveOptions.Warnings records unsupported native header/footer visual content such as shapes without supported geometry, text boxes without extractable text, SmartArt, equations, unsupported content controls, and embedded documents, plus unsupported body SmartArt, equations, unsupported header/footer content controls, embedded documents, and unhandled body elements that are not yet faithfully mapped. The old QuestPDF/SkiaSharp engine path has been removed from OfficeIMO.Word.Pdf; remaining work is fidelity and coverage in the first-party exporter |
| Convert | Excel to PDF | Partial | OfficeIMO.Excel.Pdf provides the first Excel-to-PDF package surface. The exporter maps selected or all visible workbook worksheets into first-party OfficeIMO.Pdf headings and tables, honors worksheet print areas, worksheet orientation, worksheet margins, hidden workbook worksheet filtering for default all-sheet exports, hidden worksheet rows and columns, repeated print-title rows through the PDF table header model, manual worksheet row and column page breaks as explicit PDF page breaks while preserving repeated header/title rows across split table chunks, simple worksheet header/footer text zones with first-page and even-page text variants plus page-number, page-count, sheet-name, date, time, workbook file-name, and workbook path tokens, simple line-level header/footer font family/style, font size, and RGB text color when representable as one first-party PDF header/footer line style, and supported header/footer images, worksheet merged cells through PDF table column/row spans, supported worksheet drawing images anchored into exported PDF table cells when the anchor cell is exported and otherwise emitted as PDF flow images in anchor order, supported column/bar/line/area/scatter/radar/pie/doughnut worksheet chart families as first-party vector drawing snapshots when chart data can be read, and common number formats plus basic explicit cell font emphasis, font color, fill color, two-color conditional color-scale fills, conditional data bars, conditional icon-set indicators, horizontal/vertical alignment, simple cell borders including dashed, dotted, dash-dot, double, and diagonal strokes, external cell hyperlinks, internal workbook links as sheet-level PDF named destinations, explicit worksheet column widths, explicit worksheet row heights, manual worksheet print scale, and fit-to-width table sizing through first-party table/rich-text/image primitives; supports explicit page size/margin options through reusable PDF geometry types; 
can return bytes or write to paths/streams; and now has a Poppler raster baseline for a daily two-sheet workbook covering worksheet header/footer text/images, orientation/margins, merged title cells, fills/borders, number formats, explicit row/column sizing, hidden row/column filtering, anchored worksheet images, chart snapshots, and internal/external links. ExcelPdfSaveOptions.Warnings records unsupported or simplified export features such as mixed or rich per-run worksheet header/footer formatting, unsupported or unreadable worksheet/header/footer images, unsupported or unreadable chart snapshots, and row truncation from MaxRowsPerSheet. Richer worksheet header/footer formatting beyond the current line-level style mapping, cell-specific internal workbook-link destinations, fit-to-height and automatic multi-page pagination/scaling, richer worksheet image placement fidelity beyond exported table-cell anchors, richer chart fidelity beyond initial column/bar/line/area/scatter/radar/pie/doughnut snapshots, richer cell style fidelity such as additional conditional formats and locale-specific formats, richer merged-cell edge cases, and broader unsupported-feature diagnostics remain roadmap work |
| Convert | PowerPoint to PDF | Planned | Later phases after the PDF layout engine matures |
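The page-selection rules in the Manipulate rows (duplicate, move, reorder, delete) can be modelled on plain lists of one-based page numbers. The sketch below is a language-neutral illustration in Python, not the OfficeIMO.Pdf API: every function name is hypothetical, the range grammar is assumed from the `"3,1-2"` example in the Reorder row, and rejecting descending ranges is an assumption of this sketch. The move model also assumes that when `insert_before` names a page that is itself being moved, the group lands before the next page that stays in place.

```python
from typing import Iterable, List


def parse_ranges(spec: str) -> List[int]:
    """Expand a range list such as "3,1-2" into one-based page numbers, in order."""
    pages: List[int] = []
    for part in spec.split(","):
        first, sep, last = part.strip().partition("-")
        start = int(first)
        end = int(last) if sep else start
        if start < 1 or end < start:  # assumption: descending ranges are rejected
            raise ValueError(f"invalid range {part!r}")
        pages.extend(range(start, end + 1))
    return pages


def _check_pages(page_count: int, pages: Iterable[int]) -> None:
    for page in pages:
        if not 1 <= page <= page_count:
            raise ValueError(f"page {page} is outside 1..{page_count}")


def duplicate_pages(page_count: int, selection: List[int]) -> List[int]:
    """Keep document order; add one clone after a page per time it is selected."""
    _check_pages(page_count, selection)
    result: List[int] = []
    for page in range(1, page_count + 1):
        result.append(page)
        result.extend([page] * selection.count(page))
    return result


def move_pages(page_count: int, insert_before: int, selection: List[int]) -> List[int]:
    """Move the selected set, in original relative order, before a page or to the end."""
    moved = sorted(set(selection))  # overlaps collapse into one moved page set
    _check_pages(page_count, moved)
    if not 1 <= insert_before <= page_count + 1:  # page_count + 1 means "to the end"
        raise ValueError("insert_before is out of range")
    kept = [p for p in range(1, page_count + 1) if p not in moved]
    index = sum(1 for p in kept if p < insert_before)
    return kept[:index] + moved + kept[index:]


def reorder_pages(page_count: int, order: List[int]) -> List[int]:
    """Accept only an order that names every page exactly once."""
    if sorted(order) != list(range(1, page_count + 1)):
        raise ValueError("reorder must list every page exactly once")
    return list(order)


def delete_pages(page_count: int, selection: List[int]) -> List[int]:
    """Treat overlapping selections as one deletion set; refuse to delete every page."""
    doomed = set(selection)
    _check_pages(page_count, doomed)
    remaining = [p for p in range(1, page_count + 1) if p not in doomed]
    if not remaining:
        raise ValueError("deleting every page is rejected")
    return remaining
```

For a five-page document, `move_pages(5, 6, [2, 1])` yields `[3, 4, 5, 1, 2]` and `delete_pages(5, parse_ranges("2-3,3-4"))` yields `[1, 5]`. A wrapper can use a model like this to validate user input before calling the real editor methods.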
Word-to-PDF equation note: simple OMML equations with extractable math text are mapped as static first-party PDF text in body paragraphs, table cells, headers, and footers. Equation warnings in the Word-to-PDF row refer to equations without extractable text.
For PSWriteOffice parity work, call PdfInspector.Preflight before read or rewrite-style operations and wrap only the rows marked Supported or carefully expose Partial rows with clear naming. Prefer the direct capability gates for command dispatch: CanExtractText for text/structured text readback, CanExtractImages for image extraction, CanReadLogicalObjects for PdfLogicalDocument PDF-to-object conversion, CanManipulatePages for extract/split/merge/import/edit/stamp/metadata rewrite, CanFillSimpleFormFields for simple AcroForm value updates, and CanFlattenSimpleFormFields or CanFillAndFlattenSimpleFormFields for simple text/choice/button-widget flattening. Use Can(PdfPreflightCapability) and GetCapabilityDiagnostics(PdfPreflightCapability) when writing generic wrappers, and keep HasReadBlocker(...), HasRewriteBlocker(...), ReadBlockers, and RewriteBlockers for advanced user-facing explanation, with Diagnostics as the readable log/error text. Creation wrappers should expose Word-like primitives such as document defaults, sections, paragraphs, tables, drawings, images, headers, footers, and page setup rather than template nouns such as invoices or statements. For Word-like table style pickers, show TableStyles.CanonicalWordStyleNames, accept TableStyles.SupportedWordStyleNames, and normalize caller input with TableStyles.GetCanonicalWordStyleName(...) or TryGetCanonicalWordStyleName(...) before storing wrapper configuration. 
The following can now be wrapped as early capabilities: page extraction, range extraction, splitting, basic merge, duplicate, move, delete, reorder, rotate, metadata editing, and text/image stamp/watermark operations; simple form-field inventory (typed field kind, common flag helpers, scalar or array current/default values, selected/default-selected choice options, text max length, and choice options); simple AcroForm value fill with basic text-widget appearances; and simple text/choice/button-widget flattening. Richer image transparency cases, full appearance regeneration, complex form flattening, and advanced page editing should stay behind feature work until the import/edit pipeline is stronger.
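
The gate-per-command dispatch described above can be sketched as a lookup from wrapper operation to capability gate. This is a language-neutral model in Python, not the OfficeIMO.Pdf API: the gate names come from the guidance above, while `PreflightModel`, `check_operation`, and the operation names are hypothetical, and treating an unknown gate as blocked is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Wrapper operation (hypothetical names) -> capability gate named in the guidance.
REQUIRED_GATE: Dict[str, str] = {
    "extract-text": "CanExtractText",
    "extract-images": "CanExtractImages",
    "read-objects": "CanReadLogicalObjects",
    "extract-pages": "CanManipulatePages",
    "merge": "CanManipulatePages",
    "stamp": "CanManipulatePages",
    "update-metadata": "CanManipulatePages",
    "fill-form": "CanFillSimpleFormFields",
    "flatten-form": "CanFlattenSimpleFormFields",
    "fill-and-flatten-form": "CanFillAndFlattenSimpleFormFields",
}


@dataclass
class PreflightModel:
    """Hypothetical stand-in for a preflight result: gate name -> allowed."""

    gates: Dict[str, bool]
    diagnostics: List[str] = field(default_factory=list)

    def can(self, gate: str) -> bool:
        # Assumption: a gate that was not reported is treated as blocked.
        return self.gates.get(gate, False)


def check_operation(preflight: PreflightModel, operation: str) -> str:
    """Return the gate that allowed the operation, or raise with diagnostics."""
    gate = REQUIRED_GATE.get(operation)
    if gate is None:
        raise KeyError(f"no capability gate registered for {operation!r}")
    if not preflight.can(gate):
        detail = "; ".join(preflight.diagnostics) or "no diagnostics recorded"
        raise PermissionError(f"{operation} is blocked by {gate}: {detail}")
    return gate


# Illustrative flags for a signed PDF: readable, but rewrite-style work is blocked.
signed = PreflightModel(
    gates={"CanExtractText": True, "CanManipulatePages": False},
    diagnostics=["signature marker found"],
)
check_operation(signed, "extract-text")  # allowed by CanExtractText
```

Keeping the mapping in one table means a new wrapper command only needs one new entry, and a command with no registered gate fails closed instead of reaching the rewrite path.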