Commit 5347295

Lntanohuang, Asksksn, and claude authored
fix(html-parser): correct h4 heading mapping from ##### to #### (#13833)
## Summary

- Fix incorrect Markdown heading mapping for `h4` in `TITLE_TAGS` dictionary
- `h4` was mapped to `"#####"` (h5 level) instead of `"####"` (correct h4 level)

Closes #13819

## Details

In `deepdoc/parser/html_parser.py`, the `TITLE_TAGS` dictionary had a typo where `h4` was assigned 5 `#` characters instead of 4, causing h4 headings to be converted to h5-level Markdown headings during HTML parsing.

## Test plan

- [ ] Parse an HTML document containing `<h4>` tags and verify the output uses `####` (4 hashes)
- [ ] Verify other heading levels remain correct

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Asksksn <Asksksn@noreply.gitcode.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 2faaa9f commit 5347295

1 file changed: +1 -1 lines changed

deepdoc/parser/html_parser.py

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ def get_encoding(file):
     "table", "pre", "code", "blockquote",
     "figure", "figcaption"
 ]
-TITLE_TAGS = {"h1": "#", "h2": "##", "h3": "###", "h4": "#####", "h5": "#####", "h6": "######"}
+TITLE_TAGS = {"h1": "#", "h2": "##", "h3": "###", "h4": "####", "h5": "#####", "h6": "######"}
 
 
 class RAGFlowHtmlParser:
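
The "verify other heading levels remain correct" item in the test plan could be checked with a small standalone sketch like the one below. It copies the fixed `TITLE_TAGS` literal from the diff rather than importing the parser module, and the helper name `mismatched_heading_tags` is hypothetical (it is not part of the repository). It only validates the mapping itself and does not exercise the HTML parsing path.

```python
# Copied from the fixed line in deepdoc/parser/html_parser.py (see the diff above).
TITLE_TAGS = {"h1": "#", "h2": "##", "h3": "###", "h4": "####", "h5": "#####", "h6": "######"}


def mismatched_heading_tags(title_tags):
    """Return the tags whose Markdown prefix does not match their heading level.

    Hypothetical helper for illustration: a tag "hN" is expected to map to
    exactly N '#' characters.
    """
    bad = []
    for tag, prefix in title_tags.items():
        level = int(tag[1:])  # "h4" -> 4
        if prefix != "#" * level:
            bad.append(tag)
    return bad


if __name__ == "__main__":
    # The fixed mapping should report no mismatches.
    print(mismatched_heading_tags(TITLE_TAGS))

    # The pre-fix mapping (h4 with five hashes) should flag only h4.
    pre_fix = dict(TITLE_TAGS, h4="#####")
    print(mismatched_heading_tags(pre_fix))
```

A check of this shape would have caught the original typo, since it derives the expected prefix from the tag name instead of comparing against a second hand-written table.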
