
Add bidirectional (RTL) text support #2935

Open
abaye123 wants to merge 1 commit into bpampuch:master from abaye123:rtl-support

@abaye123

Closes #184 (RTL/Hebrew/Arabic rendering).

Summary

Adds right-to-left text rendering to pdfmake. It implements the visible parts of the Unicode Bidirectional Algorithm (UAX#9) at the line level. Glyph reversal within a single text() call is still left to fontkit (already part of pdfkit), which does that well; pdfmake now handles the ordering that fontkit cannot see across multiple text() calls.

No changes required in pdfkit — the fix is entirely in pdfmake.

What works

  • rtl: true on any text/table/list node (or in defaultStyle).
  • Auto-bidi for mixed paragraphs even without rtl: true (Hebrew embedded in English etc.).
  • Default alignment becomes right when rtl: true and no explicit alignment is set.
  • Bracket mirroring per UAX#9 L4 ((שלום) renders with the brackets pointing the right way for a Hebrew reader).
  • Currency symbols (₪, €, £, $, ¥, ¢ and U+20A0–U+20CF) treated as ET so '₪3.50' renders as a single LTR group inside RTL text — currency sign visually to the left of the amount.
  • Tables: rtl: true reverses column order, with correct colSpan handling.
  • Lists: rtl: true puts bullets/numbers on the right edge.
  • Margins: [left, top, right, bottom] mirrors to [right, top, left, bottom] in RTL.
  • Justify: last line of a justify paragraph in RTL snaps to the right edge instead of the left.
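As a rough illustration of how the flag is meant to be used, a document definition could look like the sketch below. The `rtl` property is the one proposed by this PR and is not part of released pdfmake. The font name 'Hebrew' is an assumption: it stands for any font registered by the caller that contains Hebrew glyphs.

```javascript
// Sketch of a document definition using the `rtl` flag proposed in this PR.
// Assumption: the caller has registered a font family named 'Hebrew'.
const docDefinition = {
  defaultStyle: { font: 'Hebrew' },
  content: [
    // Pure RTL paragraph; alignment should default to 'right' since none is set.
    { text: 'שלום עולם', rtl: true },
    // Mixed paragraph without the flag; relies on the auto-bidi pass.
    { text: 'The word שלום means peace.' },
    // RTL table: column order is expected to be reversed.
    {
      rtl: true,
      table: {
        body: [
          ['פריט', 'מחיר'],
          ['קפה', '₪3.50'],
        ],
      },
    },
    // RTL list: bullets are expected on the right edge.
    { rtl: true, ul: ['אחת', 'שתיים', 'שלוש'] },
  ],
};
```

Setting `rtl: true` once in `defaultStyle` instead of on each node should have the same effect for a document that is RTL throughout.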

How it works

  • src/helpers/bidi.js (new) — minimal UAX#9 implementation: bidi class detection, paragraph direction (P2/P3), W1–W7, N1/N2, I1/I2, L1, L2, L4 (bracket mirroring).
  • src/helpers/bidi.js applyBidiToLine() — segments a line's inlines by (sourceInline, bidiLevel, isStrongRTL), applies L2 visual reordering, mirrors brackets in odd-level segments, manually reverses neutral-only RTL segments (that fontkit otherwise leaves in logical order), swaps leadingCut/trailingCut to match fontkit's runtime char reversal.
  • LayoutBuilder.buildNextLine calls applyBidiToLine after the line is built but before it's added to the page.
  • DocMeasure.measureTable reverses the column order when rtl is in effect, walking left-to-right in the original row and writing right-to-left in the new row so colSpan descriptors land at the correct (leftmost) slot of their visual span block.
  • LayoutBuilder.processList reserves the marker gap on the right (vs. left) when rtl is set, and pins the marker to the right end of its block to keep the visible bullet-to-text gap symmetric with LTR.
  • ElementWriter.alignLine recognizes the RTL last-line case for justify.
  • helpers/node.js getNodeMargin mirrors [left, top, right, bottom] when rtl is in effect.
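The L2 reordering step mentioned above can be sketched independently of pdfmake. This is a minimal illustration of the rule as stated in UAX#9 (from the highest level down to the lowest odd level, reverse every contiguous run at that level or higher), not the code in src/helpers/bidi.js. The function name `reorderByLevels` and the flat `items`/`levels` arrays are assumptions made for the example.

```javascript
// UAX#9 rule L2 sketch: reorder items given their resolved embedding levels.
// `items` may be characters or whole inlines; `levels` has one level per item.
function reorderByLevels(items, levels) {
  const out = items.slice();
  const lv = levels.slice();
  if (lv.length === 0) return out;

  const highest = Math.max(...lv);
  let lowestOdd = Infinity;
  for (const l of lv) {
    if (l % 2 === 1 && l < lowestOdd) lowestOdd = l;
  }
  if (lowestOdd === Infinity) return out; // no RTL content, nothing to do

  // Reverse the half-open range [start, end) in both arrays.
  const reverseRange = (arr, start, end) => {
    for (let a = start, b = end - 1; a < b; a++, b--) {
      const tmp = arr[a];
      arr[a] = arr[b];
      arr[b] = tmp;
    }
  };

  for (let level = highest; level >= lowestOdd; level--) {
    let i = 0;
    while (i < lv.length) {
      if (lv[i] < level) {
        i++;
        continue;
      }
      let j = i;
      while (j < lv.length && lv[j] >= level) j++;
      reverseRange(out, i, j);
      reverseRange(lv, i, j);
      i = j;
    }
  }
  return out;
}

// Digits (level 2) inside an RTL run (level 1) keep their left-to-right order.
console.log(reorderByLevels(['A', 'B', '1', '2', 'C'], [1, 1, 2, 2, 1]).join(' '));
// prints "C 1 2 B A"
```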

Test plan

  • npm run lint clean
  • npm run mocha — 419 unit tests pass (37 new for bidi/RTL: 32 in tests/unit/helpers/bidi.spec.js, 5 in tests/unit/DocMeasure.spec.js)
  • npm run build:node — clean Babel output
  • Visual regression via examples/rtl_hebrew.js — 10 sections covering pure RTL, mixed bidi, embedded English/digits, alignment, bold/large styling, multi-line wrapping, RTL table with prices, ordered/bullet lists, brackets and quotes, justify

Known limitations (out of scope)

  • Arabic letter shaping (joining/contextual forms) — needs HarfBuzz; out of scope. Hebrew works fully.
  • Explicit bidi embedding controls (LRE/RLE/PDF/LRI/RLI/FSI/PDI) — implicit ordering only.

Sample

examples/rtl_hebrew.js produces examples/pdfs/rtl_hebrew.pdf. Run:

node examples/rtl_hebrew.js

(Uses Arial from C:\Windows\Fonts because Roboto has no Hebrew glyphs.)

Implements UAX#9 visual reordering at the line level so Hebrew/Arabic
content renders in correct visual right-to-left order. The shaping/glyph
reversal itself is left to fontkit (already part of pdfkit); pdfmake
now only handles the parts fontkit can't see across separate text()
calls: inline ordering, bracket mirroring, RTL-only inlines, and
per-script segmentation.

Document model:
  * New `rtl: true` style property (works on text nodes, tables, lists,
    and via defaultStyle).
  * When effective rtl=true and no explicit alignment, default alignment
    becomes 'right'.
  * Mixed Hebrew/Latin paragraphs without an explicit rtl flag still
    auto-reorder via bidi (paragraph base = LTR, embedded RTL runs
    handled).

Layout:
  * Per-line UAX#9 pass in LayoutBuilder.buildNextLine — segments inlines
    by source / bidi-level / strong-RTL status, applies L2 reordering,
    L4 bracket mirroring, and a manual reversal for neutral-only RTL
    segments (which fontkit otherwise leaves in logical order, breaking
    visual flow with adjacent Hebrew inlines).
  * leadingCut / trailingCut are swapped on RTL-level inlines to track
    fontkit's runtime char reversal at the line edges.
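Bracket mirroring (L4) is simple to show in isolation. The sketch below swaps a small set of paired ASCII brackets when the segment's embedding level is odd. The helper name `mirrorBrackets` and the reduced bracket table are assumptions for illustration; a full implementation would cover every character with the Unicode Bidi_Mirrored property.

```javascript
// UAX#9 rule L4 sketch: mirror paired brackets in text resolved to an odd (RTL) level.
// Only a few ASCII pairs are listed here; the real Unicode mirroring table is larger.
const MIRROR_PAIRS = new Map([
  ['(', ')'], [')', '('],
  ['[', ']'], [']', '['],
  ['{', '}'], ['}', '{'],
  ['<', '>'], ['>', '<'],
]);

function mirrorBrackets(text, level) {
  if (level % 2 === 0) return text; // even level means LTR, leave untouched
  return Array.from(text, (ch) => MIRROR_PAIRS.get(ch) || ch).join('');
}

console.log(mirrorBrackets('(abc)', 1)); // prints ")abc("
```

Once the segment is then drawn right to left, the swapped brackets enclose the text the way an RTL reader expects.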

Tables:
  * `rtl: true` reverses column order in DocMeasure.measureTable, with
    proper colSpan handling: the descriptor stays at the leftmost slot
    of its (visual) span block so extendWidthsForColSpans keeps working.
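A possible shape for the column reversal, assuming the usual pdfmake convention that a cell with `colSpan: n` is followed by n - 1 placeholder cells in the same row. The helper name `mirrorRow` is hypothetical and this is a sketch of the idea, not the code in DocMeasure.measureTable.

```javascript
// Mirror one table row while keeping each colSpan descriptor at the
// leftmost slot of its span block in the new (visual) order.
function mirrorRow(row) {
  const n = row.length;
  const out = new Array(n);
  let i = 0;
  while (i < n) {
    const cell = row[i];
    const declared = cell && cell.colSpan > 1 ? cell.colSpan : 1;
    const span = Math.min(declared, n - i); // clamp spans that overrun the row
    // Original block covers columns i .. i+span-1; mirrored, it starts here:
    const leftmost = n - i - span;
    out[leftmost] = cell;
    // Placeholders follow the descriptor, as they did in the original row.
    for (let k = 1; k < span; k++) out[leftmost + k] = row[i + k];
    i += span;
  }
  return out;
}

const row = [{ text: 'A', colSpan: 2 }, {}, { text: 'B' }];
console.log(mirrorRow(row).map((c) => c.text || '(span)').join(' | '));
// prints "B | A | (span)"
```

A plain `row.slice().reverse()` would put the placeholder before the descriptor, which is the case the PR's colSpan handling is meant to avoid.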

Lists:
  * `rtl: true` puts bullets/numbers on the right edge of the marker
    block (LTR puts them at the left), with the same visual gap to text.

Margins:
  * In RTL mode, `margin: [left, top, right, bottom]` mirrors so
    margin[0] becomes the visual-right (logical-start) margin.
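The margin mirroring amounts to swapping the first and third entries of a four-value margin. A minimal sketch, with the hypothetical helper name `mirrorMargin`; other margin forms (a single number or a two-value array) are horizontally symmetric and are returned unchanged here.

```javascript
// Swap left and right in a [left, top, right, bottom] margin when rtl is set.
function mirrorMargin(margin, rtl) {
  if (!rtl || !Array.isArray(margin) || margin.length !== 4) return margin;
  const [left, top, right, bottom] = margin;
  return [right, top, left, bottom];
}

console.log(mirrorMargin([10, 0, 40, 5], true)); // prints [ 40, 0, 10, 5 ]
```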

Justify:
  * Last line of a justify paragraph in RTL snaps to the right margin
    instead of the default left.

Currency / numbers:
  * Currency symbols (₪ € £ ¥ $ ¢ and U+20A0–U+20CF) classified as ET
    so W5 folds them into the adjacent EN run — '₪3.50' renders as one
    LTR group inside an RTL paragraph, with ₪ visually to the left of
    the digits.
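The interaction between the ET classification and rule W5 can be sketched as follows. Both function names are assumptions for illustration, and the class array is a simplified stand-in for whatever the bidi helper tracks internally.

```javascript
// Code points the commit message lists as European terminators (ET).
function isCurrencyET(ch) {
  const cp = ch.codePointAt(0);
  return (
    cp === 0x24 || // $
    cp === 0xa2 || // ¢
    cp === 0xa3 || // £
    cp === 0xa5 || // ¥
    (cp >= 0x20a0 && cp <= 0x20cf) // Currency Symbols block, includes ₪ and €
  );
}

// UAX#9 rule W5 sketch: a run of ET adjacent to EN becomes EN.
function applyW5(classes) {
  const out = classes.slice();
  let i = 0;
  while (i < out.length) {
    if (out[i] !== 'ET') {
      i++;
      continue;
    }
    let j = i;
    while (j < out.length && out[j] === 'ET') j++;
    const touchesEN =
      (i > 0 && out[i - 1] === 'EN') || (j < out.length && out[j] === 'EN');
    if (touchesEN) {
      for (let k = i; k < j; k++) out[k] = 'EN';
    }
    i = j;
  }
  return out;
}

// A Hebrew letter (R) followed by a currency sign and two digits.
console.log(applyW5(['R', 'ET', 'EN', 'EN']).join(' ')); // prints "R EN EN EN"
```

Because the sign ends up in the same EN run as the digits, it is reordered together with them as one LTR group.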

No changes required in pdfkit: fontkit already does Hebrew glyph
ordering for individual text() calls. The pdfmake layer handles
everything that spans multiple text() calls.

New example: examples/rtl_hebrew.js (10 sections covering pure RTL,
mixed bidi, alignment, lists, tables, brackets, justify).
Tests: tests/unit/helpers/bidi.spec.js + RTL table cases in
tests/unit/DocMeasure.spec.js — total 37 new tests.

Known limitations (out of scope for this PR; can follow up):
  * Arabic shaping (joining/contextual forms) — needs HarfBuzz.
  * Bidi explicit embedding controls (LRE/RLE/PDF/LRI/RLI/FSI/PDI).
@abaye123

Hi @liborm85, thanks for the consideration! Pushed a fix for the CI failures (amended into the original commit):

  • Download Hebrew font on demand in the example so the build doesn't need it committed
  • Suppress the URL warning in bidi.js
  • Renamed ot[new name] to satisfy the typos check

CI should be green now. Ready to test when you get a chance 🙏

@abaye123

abaye123 commented Jun 4, 2026

Hi @liborm85, is there any prospect of this being merged?

Development

Successfully merging this pull request may close these issues.

Right to left language (RTL)
