Fix partial paragraph highlighting disappearing in {{content}} (Fixes #446) #854
Open
namcusamlc wants to merge 1 commit into
This PR resolves an issue where partially selected highlights on pages with dense `<p>` tags (such as Gemini App and Investopedia) would fail to render or partially disappear when evaluating the template's `{{content}}` variable. This directly addresses GitHub Issue #446 ("BUG: {{content}} no longer adding highlights properly").

The Problem
Saved highlights are stored with precise XPaths, text offsets, and character lengths relative to the original page's DOM (e.g. `fullHtml`). However, the template extraction pipeline had three problems:

- The `getPageContent` response payload stripped all rich highlight metadata (including `xpath`, `startOffset`, `endOffset`, and `id`), leaving only a flat array of plain-text strings.
- On pages with dense `<p>` elements, partial selections (a few words in a paragraph rather than the full block) could not be matched accurately by plain-text search, so highlights silently failed or disappeared during Markdown generation.
- It used `range.surroundContents` to inject `<mark>` tags. That method throws an `InvalidStateError` whenever the range partially contains an element, which is exactly what happens when a selection crosses a tag boundary such as an inline `<i>`, `<strong>`, `<em>`, or `<a>`.

The Solution
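The first change below keeps the full highlight records in the `getPageContent` payload. Why that matters can be shown with a minimal sketch under stated assumptions: only `xpath`, `startOffset`, `endOffset`, and `id` are named in this PR, while `HighlightRecord`, its `content` field, the helper names, and the idea that offsets index into the element's text content are hypothetical. It illustrates one way text-only matching goes wrong: once a highlight is flattened to a string, a search can only guess which occurrence was meant, whereas stored offsets identify the exact span.

```typescript
// Hypothetical, simplified stand-in for the extension's AnyHighlightData type.
interface HighlightRecord {
  id: string;
  xpath: string;       // locates the highlighted element in the original DOM
  startOffset: number; // assumed: index into that element's text content
  endOffset: number;   // assumed: exclusive end index
  content: string;     // assumed field holding the highlighted text
}

// Old shape: only the text survived the getPageContent response.
function flattenHighlights(highlights: HighlightRecord[]): string[] {
  return highlights.map((h) => h.content);
}

// Text-only matching: the first occurrence wins, whether or not it was the one selected.
function locateByText(elementText: string, text: string): number {
  return elementText.indexOf(text);
}

// With the stored offsets, the selected span is known without any searching.
function sliceByOffsets(elementText: string, h: HighlightRecord): string {
  return elementText.slice(h.startOffset, h.endOffset);
}

// The user highlighted the second "Buy low", which starts at index 20.
const paragraph = "Buy low. Sell high. Buy low again.";
const highlight: HighlightRecord = {
  id: "h1",
  xpath: "/html/body/p[1]",
  startOffset: 20,
  endOffset: 27,
  content: "Buy low",
};

const guessed = locateByText(paragraph, flattenHighlights([highlight])[0]); // 0, the wrong occurrence
const exact = sliceByOffsets(paragraph, highlight); // "Buy low" starting at index 20
```

The offsets are only meaningful against the DOM they were recorded in, which is why they have to be applied to the original page rather than to already-cleaned content.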
- Updated `src/content.ts` and `src/utils/content-extractor.ts` to return the full `AnyHighlightData` objects, with their XPaths and offsets, rather than plain text strings.
- Passed `fullHtml` directly to `processHighlights`. We now parse the original document DOM first, evaluate the exact stored `xpath` to find the target element, and apply the highlight there.
- Replaced `range.surroundContents` with a more robust `range.extractContents()` pattern. The content within the highlighted range is extracted, appended inside a new `<mark>` node, and inserted back, which avoids the exception when selections cross inline HTML boundaries.
- Added a `findTextNodeAtOffset` helper that traverses text nodes with a `TreeWalker` to pinpoint the exact start and end offsets of partial selections.
- Passed the highlighted document to `DefuddleClass.parse()`. This lets Defuddle extract the article structure with the `<mark>` tags intact, so the highlights are preserved in the generated Markdown.

Verification Plan
Manual Verification
- Tested on pages with dense `<p>` tags (Investopedia, Gemini Web App).
- Checked the highlights in both the `{{content}}` and "Clip to Obsidian" output.
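The `findTextNodeAtOffset` step from The Solution can be modelled without a browser. In the extension the helper walks live text nodes with a `TreeWalker`; in this hypothetical sketch, the walker's output (the text of each node visited with `NodeFilter.SHOW_TEXT`, in document order) is passed in as a string array so the offset arithmetic runs anywhere. The `edge` parameter is an assumption of the sketch, not part of the PR: a start offset that falls exactly on a node boundary goes to the beginning of the next node, and an end offset to the end of the previous one, so the range does not pick up empty text fragments.

```typescript
// Which side of a selection an offset belongs to (assumption of this sketch).
type Edge = "start" | "end";

interface TextPosition {
  nodeIndex: number; // index into the text nodes, in document order
  offset: number;    // character offset within that text node
}

// Hypothetical model of a findTextNodeAtOffset helper. `texts` stands in for the
// contents of each text node a TreeWalker over NodeFilter.SHOW_TEXT would visit.
function findTextNodeAtOffset(texts: string[], offset: number, edge: Edge): TextPosition | null {
  if (offset < 0) return null;
  let remaining = offset;
  for (let i = 0; i < texts.length; i++) {
    const len = texts[i].length;
    // A start on a boundary belongs to the next node; an end belongs to this one.
    const inside = edge === "start" ? remaining < len : remaining <= len;
    if (inside) return { nodeIndex: i, offset: remaining };
    remaining -= len;
  }
  return null; // offset lies past the end of the element's text
}

// <p>The <em>quick</em> brown fox</p> yields three text nodes.
const texts = ["The ", "quick", " brown fox"];
// Highlight "quick brown": element-level offsets 4 to 15.
const start = findTextNodeAtOffset(texts, 4, "start"); // first char of "quick"
const end = findTextNodeAtOffset(texts, 15, "end");    // after " brown"
```

In a browser, each position would be mapped back to the corresponding walker node and passed to `range.setStart(node, offset)` and `range.setEnd(node, offset)`.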
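The `range.extractContents()` wrapping from The Solution can be sketched as follows. The function name and the small structural interfaces are hypothetical; they exist only so the sketch compiles and runs without a DOM, and a browser `Range` and `Document` provide the same methods. The reason the pattern tolerates inline tags comes from the DOM Standard: `surroundContents()` throws when the range partially contains a non-text node, whereas `extractContents()` clones the selected part of such a node into the returned fragment and leaves the remainder in place.

```typescript
// Minimal structural types (hypothetical) so this runs without a DOM; a browser
// Document, Range, and HTMLElement provide these methods.
interface ContainerLike {
  appendChild(child: unknown): unknown;
}
interface RangeLike {
  extractContents(): unknown; // a DocumentFragment in the browser
  insertNode(node: unknown): void;
}
interface DocumentLike<E extends ContainerLike> {
  createElement(tagName: string): E;
}

// Wrap whatever the range covers in a new <mark>.
function wrapRangeInMark<E extends ContainerLike>(range: RangeLike, doc: DocumentLike<E>): E {
  const mark = doc.createElement("mark");
  // Moves the selected content out of the document. An element the range only
  // partly covers (e.g. an <em> it starts inside) is split: the selected part is
  // cloned into the fragment and the rest stays where it was.
  const fragment = range.extractContents();
  mark.appendChild(fragment);
  // After extraction the range is collapsed at the cut point, so the mark lands there.
  range.insertNode(mark);
  return mark;
}
```

For `<p>The <em>quick</em> brown fox</p>` and a range from inside "quick" (after "qu") to after " brown", this yields `<p>The <em>qu</em><mark><em>ick</em> brown</mark> fox</p>`, whereas `surroundContents()` would throw on the same range because the `<em>` is only partially selected.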