Fix markdown task list parsing after blank lines#3018
Conversation
|
Hi maintainers, could you please review this PR and approve the workflows? This fixes the task list parsing issue when there's a blank line before it. Thanks! |
|
Hi @matthew-carroll ! 👋 I wanted to clarify the CI failures in this PR — they appear to be pre-existing issues unrelated to my changes. My changes only touch:
Regarding the CI failures:
All 165 unit tests in the markdown test suite pass with my changes ✅ Thanks for reviewing! |
|
@dreamwithdiki I'm currently testing an AI reviewer on this PR, so ignore that. But in the meantime, for the other failures, please rebase your work on the latest from main. I don't think there should be any failures with the latest on main. Or maybe just one build failure. |
…reamwithdiki/super_editor into fix/task-list-after-blank-line
|
@dreamwithdiki can you update the PR description to include an explanation about what was wrong with the original implementation? Ideally, provide some literal Markdown samples to make it clear. I'll need to see that to be able to validate how you solved it. |
|
@matthew-carroll I have updated the PR description with a detailed explanation of the original issue, the AST structure changes caused by loose lists, and how this PR solves it. Let me know if you need any further clarification! |
Fixes #2924
What was wrong with the original implementation?
The original implementation of
_addTaskassumed that all task list items were "tight lists," meaning it expected the checkbox<input>element to always be the directfirstChildof the<li>element.However, according to the standard Markdown specification, if there is a blank line between list items, the list becomes a "loose list". When the
markdownpackage parses a loose list, it wraps the text content of the list items inside a<p>(paragraph) tag.Literal Markdown Sample:
In the AST generated by the markdown parser, the above loose list produces this structure:
Because the
firstChildof the<li>became a<p>tag instead of the<input>tag, the original logic failed to recognize it as a task item and either failed to parse or misinterpreted the node, leading to the deserialization failure.How this PR solves it:
I refactored the task parsing logic to be resilient against paragraph wrappers:
_findTaskCheckbox(Element element): Instead of blindly checking the first child, it recursively searches for the<input type="checkbox">within the list item's children._taskTextContent(Element element): It safely extracts the raw text content of the task, ignoring the<input>element itself and safely stripping out any structural whitespaces introduced by the<p>wrapper.This ensures that both tight lists (direct
<input>) and loose lists (<p>wrapped<input>) are deserialized correctly intoTaskNodeobjects. I have also added a comprehensive unit test to prevent regression on this specific case.