layout: SplitAfterLine produces literal hyphen artifact when hyphenated word pair lands in same half #256

@carlos7ags

Summary

When SplitAfterLine (added in PR #253) is called on a paragraph with SetHyphens("auto") and both fragments of a hyphenated word land in the same half (head or tail), the reconstructed paragraph's run text contains a literal - artifact.

Example: the original word "linguistic" hyphenates to part="linguis-" + rest="tic" between rendered lines 5 and 6. If the user splits at line 8, both fragments of the pair go into the head. cloneWithWords joins them as "linguis-tic" (with a literal hyphen) instead of recombining them into "linguistic".

The user-visible artifact is a hyphen mid-word in the rendered output of the head, in a position that wasn't in the original source text.
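The artifact can be illustrated with a minimal, standalone sketch. The `word` type, its `SpaceAfter` field, and `naiveJoin` are hypothetical stand-ins rather than the library's real `Word` type or `cloneWithWords`; they only model a join that copies fragment text verbatim.

```go
package main

import (
	"fmt"
	"strings"
)

// word is a hypothetical stand-in for the layout package's Word type.
type word struct {
	Text       string
	SpaceAfter bool // assumption: false when the next fragment continues the same word
}

// naiveJoin concatenates fragment text verbatim, so a layout-time
// hyphen on a "part" fragment survives into the rebuilt run text.
func naiveJoin(words []word) string {
	var b strings.Builder
	for _, w := range words {
		b.WriteString(w.Text)
		if w.SpaceAfter {
			b.WriteByte(' ')
		}
	}
	return strings.TrimRight(b.String(), " ")
}

func main() {
	// Both fragments of "linguistic" end up in the head.
	head := []word{{"a", true}, {"linguis-", false}, {"tic", true}, {"theory", false}}
	fmt.Println(naiveJoin(head)) // prints "a linguis-tic theory"
}
```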

Why it doesn't bite the page-split path

Hyphenation runs only inside Paragraph.Layout() (paragraph.go ~line 390), not in wrapWords, which is what PlanLayout (page-split) uses. PR #250 added a guard test (TestNoHyphenInPageSplitOverflow) that locks this in. So today only SplitAfterLine is affected, since it goes through Layout.

Documented behavior

TestSplitAfterLineHyphenationInternalToHead in layout/split_test.go documents the current state: the head re-lays at the same width with a stable line count, but the hyphen artifact persists in the run text.

Fix sketch

  1. Add Word.HyphenatedBoundary bool field.
  2. hyphenateWord (paragraph.go ~line 868) sets it on both part and rest.
  3. cloneWithWords (paragraph.go ~line 1646), when joining two consecutive words that BOTH have HyphenatedBoundary == true, strips the trailing - from the prev text and joins without space, recovering the original word.
  4. wordToRun does NOT propagate the flag (it's a measurement-time signal, not user data).

When the pair is split across halves (part in head, rest in tail), each half's words slice contains only one of them, so neither half's join logic triggers. Each renders correctly: head ends with "linguis-", tail starts with "tic".
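The join rule from the fix sketch could look roughly like the following. This is a simplified sketch, not the actual cloneWithWords: the `Word` type here is a trimmed-down stand-in carrying only the proposed `HyphenatedBoundary` field, and `joinWords` is a hypothetical helper that assumes words are otherwise separated by a single space.

```go
package main

import (
	"fmt"
	"strings"
)

// Word is a trimmed-down, hypothetical stand-in for the layout Word type.
type Word struct {
	Text               string
	HyphenatedBoundary bool // proposed field: set on both part and rest
}

// joinWords rebuilds run text. When two consecutive words are both
// flagged and the previous one ends in "-", the hyphen is stripped and
// the fragments are merged without a space.
func joinWords(words []Word) string {
	var out []string
	for i, w := range words {
		if i > 0 && words[i-1].HyphenatedBoundary && w.HyphenatedBoundary &&
			strings.HasSuffix(words[i-1].Text, "-") {
			last := len(out) - 1
			out[last] = strings.TrimSuffix(out[last], "-") + w.Text
			continue
		}
		out = append(out, w.Text)
	}
	return strings.Join(out, " ")
}

func main() {
	part := Word{Text: "linguis-", HyphenatedBoundary: true}
	rest := Word{Text: "tic", HyphenatedBoundary: true}

	// Pair in the same half: the original word is recovered.
	fmt.Println(joinWords([]Word{{Text: "a"}, part, rest, {Text: "theory"}})) // prints "a linguistic theory"

	// Pair split across halves: neither half sees two flagged neighbors.
	fmt.Println(joinWords([]Word{{Text: "a"}, part}))      // prints "a linguis-"
	fmt.Println(joinWords([]Word{rest, {Text: "theory"}})) // prints "tic theory"
}
```

The suffix check keeps a flagged rest fragment from merging with the part fragment of a different hyphenated word that happens to follow it. A source word that itself ends in a hyphen would still need separate handling.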

Scope

Narrow: this only affects callers that use both SetHyphens("auto") and SplitAfterLine. It is not user-visible for the HubSpot-PDF clamp/appendix flow unless that flow opts into hyphenation.
