Docx Renderer Extension

Vladimir Schneider edited this page Aug 13, 2019 · 11 revisions

flexmark-java Docx-Renderer extension

Overview

Renders the parsed Markdown AST to docx format using the docx4j library.

See the DocxConverterCommonMark Sample for code and Customizing Docx Rendering for an overview and information on customizing the styles.

A Pegdown version can be found in the DocxConverterPegdown Sample.

⚠️ With Java 7, the Emoji extension will not load GitHub-provided images. Use Java 8+, or do not set EmojiExtension.USE_SHORTCUT_TYPE to EmojiShortcutType.GITHUB or EmojiShortcutType.ANY_GITHUB_PREFERRED, both of which cause GitHub-provided images to be used.

Syntax

Renders the AST generated by the flexmark-java parser. This extension implements no special syntax.

Limited Attributes Node Handling

  • .className on paragraph elements sets the docx styleId to className if a style with that id is found. This allows specific style ids to be used to change paragraph formatting.
  • Use {style=""} to set attributes on text or block elements. Only the following are processed:
    • color - text color
    • background-color - shade fill color, pattern always solid.
    • font-family - not implemented
    • font-size - not implemented
    • font-weight - set/clear bold (if using numeric weights then >= 550 sets bold, less clears it)
    • font-style - set/clear italic
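
The font-weight rule above can be illustrated with a small stand-alone sketch. It is not the extension's code: the class and method names are hypothetical, and the handling of the "bold" and "normal" keywords is an assumption. Only the numeric threshold of 550 comes from the list above.

```java
import java.util.Locale;
import java.util.Optional;

/** Sketch of mapping a CSS-like font-weight value to a bold on/off decision. */
final class FontWeightSketch {
    // Threshold stated above: numeric weights >= 550 set bold, lower values clear it.
    static final int BOLD_THRESHOLD = 550;

    /**
     * Returns true to set bold, false to clear it, or empty when the value is not understood.
     * Keyword handling ("bold", "normal") is an assumption of this sketch.
     */
    static Optional<Boolean> boldFor(String fontWeight) {
        String value = fontWeight.trim().toLowerCase(Locale.ROOT);
        if (value.equals("bold")) return Optional.of(true);
        if (value.equals("normal")) return Optional.of(false);
        try {
            return Optional.of(Integer.parseInt(value) >= BOLD_THRESHOLD);
        } catch (NumberFormatException e) {
            return Optional.empty(); // unknown value: leave the run unchanged
        }
    }

    public static void main(String[] args) {
        System.out.println(boldFor("700")); // prints "Optional[true]"
        System.out.println(boldFor("400")); // prints "Optional[false]"
    }
}
```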

Parsing Details

artifact: flexmark-docx-converter

The following options are available:

Defined in DocxRenderer class:

  • STYLES_XML default getResourceString("/styles.xml") , default styles section if missing in wordprocessing package
  • NUMBERING_XML default getResourceString("/numbering.xml") , default numbering section if missing in wordprocessing package
  • RENDER_BODY_ONLY default false , when rendering to string will only output the body of the document part. Used for tests.
  • MAX_IMAGE_WIDTH default 0 , max image width, 0 no max
  • DEFAULT_LINK_RESOLVER default true , use default link resolver, which uses the DOC_RELATIVE_URL and DOC_ROOT_URL options
  • DOC_RELATIVE_URL default "" , the prefix to use for all relative URLs: not starting with protocol or /
  • DOC_ROOT_URL default "" , the prefix to use for all absolute URLs: ones starting with /
  • LINEBREAK_ON_INLINE_HTML_BR default true , convert inline HTML <br> to line break in the docx
  • TABLE_CAPTION_TO_PARAGRAPH default true , convert table captions to paragraphs, styled with TableCaption style id
  • TABLE_CAPTION_BEFORE_TABLE default false , insert caption before table
  • TOC_GENERATE default false , whether to generate TOC, even if no TOC Markdown element is present in the file
  • TOC_INSTRUCTION default "TOC \\o \"1-3\" \\h \\z \\u " , defines the instruction string used for the TOC element
  • NO_CHARACTER_STYLES default false , when true will not set character style but explicitly set the run values from the style
  • CODE_HIGHLIGHT_SHADING default "" , when non-empty will use this color as a highlight and also forces NO_CHARACTER_STYLES to true. See NOTE on Highlight Colors.
  • DOC_EMOJI_IMAGE_VERT_OFFSET default -0.10 , vertical offset of emoji image as a factor of line height at point of insertion. The final value is rounded to nearest pt so jumps of 1 pt for small changes of this value can occur.
  • DOC_EMOJI_IMAGE_VERT_SIZE default 1.05 , size of emoji image as a factor of line height at point of insertion.
  • LOCAL_HYPERLINK_MISSING_HIGHLIGHT default "red" , when non-empty will highlight unresolved hyperlinks local to the document with this color. See NOTE on Highlight Colors.
  • LOCAL_HYPERLINK_MISSING_FORMAT default "Missing target id: #%s" , when non-empty uses String.format() on the given string with the missing ref anchor as the argument to generate a tooltip for unresolved hyperlinks
  • LOCAL_HYPERLINK_SUFFIX default "" , appends this suffix to in-document hyperlink anchor references. Needed in some cases for post-processing.
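
As a rough illustration of the link-related options above, the following sketch shows how the DOC_RELATIVE_URL and DOC_ROOT_URL prefixes and the LOCAL_HYPERLINK_MISSING_FORMAT string could be applied. The class and helper names are hypothetical. Joining prefix and URL by plain concatenation, and leaving "#anchor" links untouched, are assumptions of this sketch rather than documented behavior of the default link resolver.

```java
/** Sketch of the URL prefixing described for DOC_RELATIVE_URL and DOC_ROOT_URL. */
final class LinkPrefixSketch {
    // Hypothetical helper: true when the URL starts with a scheme such as "https:" or "mailto:".
    static boolean hasProtocol(String url) {
        return url.matches("[A-Za-z][A-Za-z0-9+.-]*:.*");
    }

    /** Applies the relative or root prefix; prefixes are concatenated as-is (assumption). */
    static String resolve(String url, String docRelativeUrl, String docRootUrl) {
        if (hasProtocol(url) || url.startsWith("#")) return url; // "#" handling is an assumption
        if (url.startsWith("/")) return docRootUrl + url;        // absolute: starts with /
        return docRelativeUrl + url;                             // relative: everything else
    }

    /** Builds the tooltip for an unresolved in-document link; empty format means no tooltip. */
    static String missingTooltip(String format, String anchor) {
        return format.isEmpty() ? "" : String.format(format, anchor);
    }

    public static void main(String[] args) {
        String relative = "https://example.com/docs/";
        String root = "https://example.com";
        System.out.println(resolve("images/a.png", relative, root)); // prints "https://example.com/docs/images/a.png"
        System.out.println(resolve("/img/a.png", relative, root));   // prints "https://example.com/img/a.png"
        System.out.println(missingTooltip("Missing target id: #%s", "intro")); // prints "Missing target id: #intro"
    }
}
```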

NOTE on Highlight Colors

Docx format requires a named color. Any color provided that does not match a named color will be converted to the closest named color.

When CODE_HIGHLIGHT_SHADING is set to "shade", the closest named color to the SourceText style's shade fill color is used, if available.
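
One plausible way to pick the "closest named color" is a nearest-neighbor search in RGB space, sketched below. This is an illustration only: the class name is hypothetical, the color table uses the conventional RGB values for the named highlight colors, and squared Euclidean distance is an assumed metric. The extension may match colors differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: snap an arbitrary RGB color to the nearest named highlight color. */
final class NamedColorSketch {
    // Named highlight colors with their conventional RGB values (assumed table).
    static final Map<String, Integer> NAMED = new LinkedHashMap<>();
    static {
        NAMED.put("black", 0x000000);
        NAMED.put("blue", 0x0000FF);
        NAMED.put("cyan", 0x00FFFF);
        NAMED.put("green", 0x00FF00);
        NAMED.put("magenta", 0xFF00FF);
        NAMED.put("red", 0xFF0000);
        NAMED.put("yellow", 0xFFFF00);
        NAMED.put("white", 0xFFFFFF);
        NAMED.put("darkBlue", 0x000080);
        NAMED.put("darkCyan", 0x008080);
        NAMED.put("darkGreen", 0x008000);
        NAMED.put("darkMagenta", 0x800080);
        NAMED.put("darkRed", 0x800000);
        NAMED.put("darkYellow", 0x808000);
        NAMED.put("darkGray", 0x808080);
        NAMED.put("lightGray", 0xC0C0C0);
    }

    /** Returns the name whose RGB value has the smallest squared distance to rgb. */
    static String closest(int rgb) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : NAMED.entrySet()) {
            int named = entry.getValue();
            int dr = ((rgb >> 16) & 0xFF) - ((named >> 16) & 0xFF);
            int dg = ((rgb >> 8) & 0xFF) - ((named >> 8) & 0xFF);
            int db = (rgb & 0xFF) - (named & 0xFF);
            int distance = dr * dr + dg * dg + db * db;
            if (distance < bestDistance) {
                bestDistance = distance;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(closest(0xFE0102)); // prints "red"
        System.out.println(closest(0x0A0A90)); // prints "darkBlue"
    }
}
```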

Style Names used for rendering various markdown elements

Block element styles:

  • DEFAULT_STYLE default "Normal", style to use for the markdown element
  • LOOSE_PARAGRAPH_STYLE default "ParagraphTextBody", style to use for loose list type items
  • TIGHT_PARAGRAPH_STYLE default "BodyText", style to use for tight list type items
  • PREFORMATTED_TEXT_STYLE default "PreformattedText", style to use for fenced code and indented code
  • BLOCK_QUOTE_STYLE default "Quotations", style to use for block quotes
  • ASIDE_BLOCK_STYLE default "AsideBlock", style to use for aside blocks
  • HORIZONTAL_LINE_STYLE default "HorizontalLine", style to use for thematic breaks
  • TABLE_CAPTION default "TableCaption", style to use for table captions
  • TABLE_CONTENTS default "TableContents", style to use for table bodies
  • TABLE_HEADING default "TableHeading", style to use for table headings
  • FOOTNOTE_STYLE default "Footnote", style to use for footnote text
  • BULLET_LIST_STYLE default "BulletList", numbering list style to use for bullet list item paragraph
  • NUMBERED_LIST_STYLE default "NumberedList", numbering list style to use for numbered list item paragraph

Inline element styles:

  • BOLD_STYLE default "StrongEmphasis", style to use for the markdown element
  • ITALIC_STYLE default "Emphasis", style to use for the markdown element
  • STRIKE_THROUGH_STYLE default "Strikethrough", style to use for the markdown element
  • SUBSCRIPT_STYLE default "Subscript", style to use for the markdown element
  • SUPERSCRIPT_STYLE default "Superscript", style to use for the markdown element
  • INS_STYLE default "Underlined", style to use for the markdown element
  • INLINE_CODE_STYLE default "SourceText", style to use for the markdown element
  • HYPERLINK_STYLE default "Hyperlink", style to use for the markdown element
  • FOOTNOTE_ANCHOR_STYLE default "FootnoteReference", style to use for the markdown element

List Element Styles

Unordered lists use the numbering list style named BulletList, while ordered lists use NumberedList. If these are not present, the default numbering style with id = 2 is used for unordered lists and the default numbering style with id = 3 is used for ordered lists.

The following options are equivalent to the Renderer properties of the same name and are included in DocxRenderer for convenience.

For the TOC_INSTRUCTION string, see Docx4j GettingStarted under the heading TOC Content Control.

NOTE: Word does not handle inserted HTML very well. Any HTML not suppressed will be escaped, i.e. it will render into the document as text. The exception is the <br> tag, which, if enabled, will be rendered as a line break.

  • ESCAPE_HTML_BLOCKS default value of ESCAPE_HTML, escape html blocks found in the document
  • ESCAPE_HTML_COMMENT_BLOCKS default value of ESCAPE_HTML_BLOCKS, escape html comment blocks found in the document.
  • ESCAPE_HTML default false, escape all html found in the document
  • ESCAPE_INLINE_HTML_COMMENTS default value of ESCAPE_HTML_BLOCKS, escape inline html comments found in the document
  • ESCAPE_INLINE_HTML default value of ESCAPE_HTML, escape inline html found in the document
  • PERCENT_ENCODE_URLS default false, percent encode urls
  • RECHECK_UNDEFINED_REFERENCES default false, recheck the existence of references in Parser.REFERENCES for link and image refs marked undefined. Used when new references are added after parsing
  • SUPPRESS_HTML_BLOCKS default value of SUPPRESS_HTML, suppress html output for html blocks
  • SUPPRESS_HTML_COMMENT_BLOCKS default value of SUPPRESS_HTML_BLOCKS, suppress html output for html comment blocks
  • SUPPRESS_HTML default false, suppress html output for all html
  • SUPPRESS_INLINE_HTML_COMMENTS default value of SUPPRESS_INLINE_HTML, suppress html output for inline html comments
  • SUPPRESS_INLINE_HTML default value of SUPPRESS_HTML, suppress html output for inline html
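
Several of the options above take their default from another option rather than from a fixed value. The sketch below models that cascade for the SUPPRESS_* family using plain maps. It is a stand-alone illustration with hypothetical class and method names, not the flexmark-java options API; only the parent relationships are taken from the list above.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of options whose defaults cascade from another option. */
final class CascadingOptionsSketch {
    // Each option maps to the option that supplies its default; null means "default false".
    static final Map<String, String> PARENT = new HashMap<>();
    static {
        PARENT.put("SUPPRESS_HTML", null);
        PARENT.put("SUPPRESS_HTML_BLOCKS", "SUPPRESS_HTML");
        PARENT.put("SUPPRESS_HTML_COMMENT_BLOCKS", "SUPPRESS_HTML_BLOCKS");
        PARENT.put("SUPPRESS_INLINE_HTML", "SUPPRESS_HTML");
        PARENT.put("SUPPRESS_INLINE_HTML_COMMENTS", "SUPPRESS_INLINE_HTML");
    }

    // Values the caller set explicitly.
    private final Map<String, Boolean> explicit = new HashMap<>();

    CascadingOptionsSketch set(String key, boolean value) {
        explicit.put(key, value);
        return this;
    }

    /** Walks up the parent chain until an explicitly set value is found. */
    boolean get(String key) {
        for (String k = key; k != null; k = PARENT.get(k)) {
            Boolean value = explicit.get(k);
            if (value != null) return value;
        }
        return false; // the root option defaults to false
    }

    public static void main(String[] args) {
        CascadingOptionsSketch options = new CascadingOptionsSketch()
                .set("SUPPRESS_HTML", true)
                .set("SUPPRESS_HTML_BLOCKS", false);
        System.out.println(options.get("SUPPRESS_HTML_COMMENT_BLOCKS"));  // prints "false"
        System.out.println(options.get("SUPPRESS_INLINE_HTML_COMMENTS")); // prints "true"
    }
}
```

In this model, setting only SUPPRESS_HTML affects every more specific option, while an explicit value on a more specific option overrides what it would otherwise inherit.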