Add bidirectional (RTL) text support #2935
Open
abaye123 wants to merge 1 commit into
Implements UAX#9 visual reordering at the line level so Hebrew/Arabic
content renders in correct visual right-to-left order. The shaping/glyph
reversal itself is left to fontkit (already part of pdfkit) — pdfmake
now only handles the parts fontkit can't see across: inline ordering,
bracket mirroring, RTL-only inlines, and per-script segmentation.
Document model:
* New `rtl: true` style property (works on text nodes, tables, lists,
and via defaultStyle).
* When effective rtl=true and no explicit alignment, default alignment
becomes 'right'.
* Mixed Hebrew/Latin paragraphs without an explicit rtl flag still
auto-reorder via bidi (paragraph base = LTR, embedded RTL runs
handled).
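A minimal sketch of how the flag might look in a document definition, and of the default-alignment rule described above. The `rtl` property is the one proposed in this PR; `effectiveAlignment` is a hypothetical helper written for illustration, not pdfmake's actual style resolution.

```javascript
// Sketch only: `rtl` is the style property proposed in this PR. The helper
// below is a hypothetical stand-in for pdfmake's real style resolution.
const docDefinition = {
  content: [
    { text: 'שלום עולם', rtl: true },                      // no alignment given
    { text: 'שלום עולם', rtl: true, alignment: 'center' }, // explicit alignment
    { text: 'Hello שלום world' },                          // no rtl flag
  ],
};

// Hypothetical helper: an explicit alignment wins; otherwise an effective
// rtl=true defaults to 'right' and everything else to 'left'.
function effectiveAlignment(node, defaultStyle = {}) {
  const rtl = node.rtl !== undefined ? node.rtl : defaultStyle.rtl === true;
  const alignment = node.alignment || defaultStyle.alignment;
  if (alignment) return alignment;
  return rtl ? 'right' : 'left';
}

const alignments = docDefinition.content.map((n) => effectiveAlignment(n));
console.log(alignments); // [ 'right', 'center', 'left' ]
```

The third node keeps the LTR paragraph base, so only its embedded Hebrew run would be reordered by the bidi pass.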
Layout:
* Per-line UAX#9 pass in LayoutBuilder.buildNextLine — segments inlines
by source / bidi-level / strong-RTL status, applies L2 reordering,
L4 bracket mirroring, and a manual reversal for neutral-only RTL
segments (which fontkit otherwise leaves in logical order, breaking
visual flow with adjacent Hebrew inlines).
* leadingCut / trailingCut are swapped on RTL-level inlines to track
fontkit's runtime char reversal at the line edges.
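The two UAX#9 rules named above can be sketched on a toy model. This is an illustration under simplifying assumptions, not the code in this PR: segments are plain `{ text, level }` objects, and `reorderSegments` / `mirrorBrackets` are hypothetical names. Mirroring is limited to ASCII bracket pairs.

```javascript
// Toy model: each segment is { text, level }, where level is the resolved
// bidi embedding level (even = LTR, odd = RTL). All names are illustrative.

// UAX#9 rule L2: from the highest level down to the lowest odd level,
// reverse every contiguous run of segments at that level or higher.
function reorderSegments(segments) {
  const out = segments.slice();
  const odd = out.map((s) => s.level).filter((l) => l % 2 === 1);
  if (odd.length === 0) return out; // pure LTR line: nothing to reorder
  const highest = Math.max(...out.map((s) => s.level));
  const lowestOdd = Math.min(...odd);
  for (let level = highest; level >= lowestOdd; level--) {
    let i = 0;
    while (i < out.length) {
      if (out[i].level < level) { i++; continue; }
      let j = i;
      while (j < out.length && out[j].level >= level) j++;
      // Reverse the run out[i..j-1] in place.
      out.splice(i, j - i, ...out.slice(i, j).reverse());
      i = j;
    }
  }
  return out;
}

// UAX#9 rule L4, simplified to ASCII pairs: swap each bracket for its mirror.
// Intended to be applied only to text in odd-level (RTL) segments.
const MIRROR = { '(': ')', ')': '(', '[': ']', ']': '[', '{': '}', '}': '{', '<': '>', '>': '<' };
function mirrorBrackets(text) {
  return Array.from(text, (ch) => MIRROR[ch] || ch).join('');
}

const line = [
  { text: 'abc ', level: 0 },
  { text: 'R1 ', level: 1 },
  { text: '123', level: 2 }, // digits embedded in an RTL run
  { text: ' R2', level: 1 },
  { text: ' def', level: 0 },
];
const visual = reorderSegments(line).map((s) => s.text);
console.log(visual); // [ 'abc ', ' R2', '123', 'R1 ', ' def' ]
```

The level-2 run is reversed on its own first and then again as part of the level-1 run, which is why the digits stay in the middle while the two RTL segments swap places.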
Tables:
* \
tl: true\ reverses column order in DocMeasure.measureTable, with
proper colSpan handling: the descriptor stays at the leftmost slot
of its (visual) span block so extendWidthsForColSpans keeps working.
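One way to picture the colSpan handling is the sketch below. It assumes pdfmake's usual row shape, where a cell with `colSpan: n` is followed by n-1 placeholder cells; `reverseRowForRtl` is a hypothetical name and not the function in this PR.

```javascript
// Hypothetical sketch: reverse the visual column order of one table row while
// keeping each colSpan descriptor at the leftmost slot of its span block.
function reverseRowForRtl(row) {
  const n = row.length;
  const out = new Array(n);
  let col = 0;
  while (col < n) {
    const cell = row[col];
    // Clamp the span so a malformed colSpan cannot run past the row end.
    const span = Math.min((cell && cell.colSpan) || 1, n - col);
    // Logical block [col, col + span - 1] maps to the visual block that
    // starts at n - col - span; the descriptor goes first, placeholders after.
    const start = n - col - span;
    out[start] = cell;
    for (let k = 1; k < span; k++) out[start + k] = row[col + k];
    col += span;
  }
  return out;
}

const row = [{ text: 'A', colSpan: 2 }, {}, 'B', 'C'];
const reversed = reverseRowForRtl(row);
console.log(reversed); // [ 'C', 'B', { text: 'A', colSpan: 2 }, {} ]
```

A plain `row.reverse()` would instead put the placeholder before the descriptor, which is the case the commit message says would break extendWidthsForColSpans.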
Lists:
* \
tl: true\ puts bullets/numbers on the right edge of the marker
block (LTR puts them at the left), with the same visual gap to text.
Margins:
* In RTL mode, `margin: [left, top, right, bottom]` mirrors so
margin[0] becomes the visual-right (logical-start) margin.
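As a small illustration of the mirroring rule, assuming the four-element margin form only (`mirrorMargin` is a hypothetical name, not the PR's getNodeMargin):

```javascript
// Hypothetical sketch: swap the horizontal components of a four-element
// margin when rtl is in effect. Top and bottom are left untouched.
function mirrorMargin(margin, rtl) {
  const [left, top, right, bottom] = margin;
  return rtl ? [right, top, left, bottom] : [left, top, right, bottom];
}

const mirrored = mirrorMargin([40, 10, 5, 20], true);
console.log(mirrored); // [ 5, 10, 40, 20 ]
```

With this swap, the value an author writes in margin[0] ends up on the visual right, which is where an RTL paragraph starts.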
Justify:
* Last line of a justify paragraph in RTL snaps to the right margin
instead of the default left.
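The last-line rule amounts to a one-line offset choice. The helper below is a hypothetical sketch of that arithmetic, not the code in ElementWriter.alignLine:

```javascript
// Hypothetical helper: horizontal offset of a justified paragraph's line
// inside its available width.
function justifyLineOffset(availableWidth, lineWidth, isLastLine, rtl) {
  // Non-final lines are stretched to the full width, so they start at 0.
  if (!isLastLine) return 0;
  // The last line is not stretched: LTR keeps it at the left edge, while
  // RTL shifts it so its right edge meets the right margin.
  return rtl ? availableWidth - lineWidth : 0;
}

const rtlLastLineOffset = justifyLineOffset(500, 320, true, true);
console.log(rtlLastLineOffset); // 180
```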
Currency / numbers:
* Currency symbols (₪ € £ ¥ $ ¢ and U+20A0–U+20CF) classified as ET
so W5 folds them into the adjacent EN run — '₪3.50' renders as one
LTR group inside an RTL paragraph, with ₪ visually to the left of
the digits.
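A rough sketch of the classification and of rule W5 follows. It is a toy under stated assumptions: bidi classes are plain strings, earlier W rules are assumed to have run already, and `isCurrencySymbol` / `applyW5` are hypothetical names.

```javascript
// Hypothetical sketch: the currency symbols listed above are treated as
// European Terminators (ET).
const EXTRA_CURRENCY = new Set(['$', '¢', '£', '¥']);
function isCurrencySymbol(ch) {
  const cp = ch.codePointAt(0);
  // U+20A0..U+20CF is the Currency Symbols block (includes ₪ and €).
  return EXTRA_CURRENCY.has(ch) || (cp >= 0x20a0 && cp <= 0x20cf);
}

// UAX#9 rule W5: a sequence of ETs adjacent to a European Number (EN)
// changes to all ENs. Input is an array of bidi class names.
function applyW5(classes) {
  const out = classes.slice();
  let i = 0;
  while (i < out.length) {
    if (out[i] !== 'ET') { i++; continue; }
    let j = i;
    while (j < out.length && out[j] === 'ET') j++; // end of the ET run
    const enBefore = i > 0 && out[i - 1] === 'EN';
    const enAfter = j < out.length && out[j] === 'EN';
    if (enBefore || enAfter) for (let k = i; k < j; k++) out[k] = 'EN';
    i = j;
  }
  return out;
}

// '₪3' as classes: the shekel sign (ET) folds into the digit (EN).
const folded = applyW5(['R', 'ET', 'EN', 'R']);
console.log(folded); // [ 'R', 'EN', 'EN', 'R' ]
```

Once the symbol shares the EN class with the digits, the whole amount resolves to one LTR run, which is how the symbol ends up on the visual left of the number.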
No changes required in pdfkit — fontkit already does Hebrew glyph
ordering for individual text() calls. The pdfmake layer handles
everything fontkit can't see across.
New example: examples/rtl_hebrew.js (10 sections covering pure RTL,
mixed bidi, alignment, lists, tables, brackets, justify).
Tests: tests/unit/helpers/bidi.spec.js + RTL table cases in
tests/unit/DocMeasure.spec.js — total 37 new tests.
Known limitations (out of scope for this PR; can follow up):
* Arabic shaping (joining/contextual forms) — needs HarfBuzz.
* Bidi explicit embedding controls (LRE/RLE/PDF/LRI/RLI/FSI/PDI).
Author
Hi @liborm85, thanks for the consideration! Pushed a fix for the CI failures (amended into the original commit):
CI should be green now. Ready to test when you get a chance 🙏
Author
Hi @liborm85, is there any prospect of this being merged?
Closes #184 (RTL/Hebrew/Arabic rendering).
Summary
Adds proper right-to-left text rendering to pdfmake. Implements the visible parts of the Unicode Bidi Algorithm (UAX#9) at the line level, leaning on fontkit (already part of pdfkit) for the per-text-call glyph reversal that it does well, and filling in the parts fontkit can't see across multiple `text()` calls.
No changes are required in pdfkit; the fix is entirely in pdfmake.
What works
* `rtl: true` on any text/table/list node (or in `defaultStyle`).
* Mixed content reorders automatically without `rtl: true` (Hebrew embedded in English etc.).
* Default alignment becomes `right` when `rtl: true` and no explicit alignment is set.
* Bracket mirroring: `(שלום)` renders with the brackets pointing the right way for a Hebrew reader.
* Currency: `'₪3.50'` renders as a single LTR group inside RTL text, with the currency sign visually to the left of the amount.
* Tables: `rtl: true` reverses column order, with correct colSpan handling.
* Lists: `rtl: true` puts bullets/numbers on the right edge.
* Margins: `[left, top, right, bottom]` mirrors to `[right, top, left, bottom]` in RTL.
How it works
* `src/helpers/bidi.js` (new): a minimal UAX#9 implementation covering bidi class detection, paragraph direction (P2/P3), W1–W7, N1/N2, I1/I2, L1, L2, and L4 (bracket mirroring).
* `src/helpers/bidi.js` `applyBidiToLine()`: segments a line's inlines by `(sourceInline, bidiLevel, isStrongRTL)`, applies L2 visual reordering, mirrors brackets in odd-level segments, manually reverses neutral-only RTL segments (which fontkit otherwise leaves in logical order), and swaps leadingCut/trailingCut to match fontkit's runtime char reversal.
* `LayoutBuilder.buildNextLine` calls `applyBidiToLine` after the line is built but before it's added to the page.
* `DocMeasure.measureTable` reverses the column order when `rtl` is in effect, walking left-to-right in the original row and writing right-to-left in the new row so colSpan descriptors land at the correct (leftmost) slot of their visual span block.
* `LayoutBuilder.processList` reserves the marker gap on the right (vs. left) when `rtl` is set, and pins the marker to the right end of its block to keep the visible bullet-to-text gap symmetric with LTR.
* `ElementWriter.alignLine` recognizes the RTL last-line case for justify.
* `helpers/node.js` `getNodeMargin` mirrors `[left, top, right, bottom]` when `rtl` is in effect.
Test plan
* `npm run lint`: clean
* `npm run mocha`: 419 unit tests pass (37 new for bidi/RTL: 32 in `tests/unit/helpers/bidi.spec.js`, 5 in `tests/unit/DocMeasure.spec.js`)
* `npm run build:node`: clean Babel output
* `examples/rtl_hebrew.js`: 10 sections covering pure RTL, mixed bidi, embedded English/digits, alignment, bold/large styling, multi-line wrapping, RTL table with prices, ordered/bullet lists, brackets and quotes, justify
Known limitations (out of scope)
Sample
`examples/rtl_hebrew.js` produces `examples/pdfs/rtl_hebrew.pdf`. Run:
(Uses Arial from `C:\Windows\Fonts` because Roboto has no Hebrew glyphs.)