← Back to docs
Parse errors that JustHTML can detect and report.
By default, JustHTML silently recovers from errors (like browsers do). To collect errors:
from justhtml import JustHTML
doc = JustHTML ("<p>Hello" , collect_errors = True )
for error in doc .errors :
print (f"{ error .line } :{ error .column } - { error .category } :{ error .code } " )
doc.errors is ordered by source position (line, column), with unknown positions (if any) appearing last.
Each error has a category field:
tokenizer: lexical/tokenization errors
treebuilder: tree construction (structure) errors
security: sanitizer findings (only when you opt in via unsafe_handling="collect")
To reject malformed HTML entirely:
from justhtml import JustHTML , StrictModeError
try :
doc = JustHTML ("<p>Hello" , strict = True )
except StrictModeError as e :
print (e ) # Shows source location
In strict mode, JustHTML raises on the earliest error by source position.
Error Locations (Line/Column)
JustHTML reports a source location for each parse error as a best-effort pointer to where the parser detected the problem in the input stream.
Coordinates are 1-based: the first character in the input is (line=1, column=1).
Tokenizer-detected character errors (for example unexpected-null-character) should point at the exact offending character within the input, even if that character is emitted as part of a larger run of text.
Tree-builder (structure) errors are associated with the token that triggered the error.
In practice this usually means the error points at (or near) the triggering token location, because the tree builder operates on tokens rather than individual characters.
When available, JustHTML will highlight the full triggering tag range.
EOF-related errors point to the end-of-input position where the parser realized it could not continue.
This means error locations are not universally “at the beginning” or “at the end” of a token: character-level errors point at the character, while token-level (tree builder) errors generally point at the triggering token’s start.
Node Locations (Optional)
Sometimes you want a source location for a node , not just for parse errors.
For performance reasons, node locations are disabled by default . To enable them, pass track_node_locations=True when parsing:
from justhtml import JustHTML
doc = JustHTML ("<p>hi</p>" , track_node_locations = True )
p = doc .query ("p" )[0 ]
print (p .origin_location ) # (1, 1)
print (p .origin_line ) # 1
print (p .origin_col ) # 1
print (p .origin_offset ) # 0 (0-indexed)
Each node exposes best-effort origin metadata:
origin_location -> (line, col) | None (both 1-indexed)
origin_line -> int | None (1-indexed)
origin_col -> int | None (1-indexed)
origin_offset -> int | None (0-indexed offset into the input)
Notes:
If track_node_locations=False (default), these are typically None.
Locations are best-effort. When the tree builder creates or moves nodes as part of error recovery, the reported origin is the location of the token that created the node (or the closest available source position).
Enabling node tracking adds overhead. If you only need error locations, prefer collect_errors=True / strict=True.
Example: Reporting missing includes
import sys
from pathlib import Path
from justhtml import JustHTML
with open (sys .argv [1 ]) as f :
html = f .read ()
doc = JustHTML (html , track_node_locations = True )
for include_node in doc .query ("x-include" ):
src = include_node .attrs .get ("src" , "" )
if not Path (src ).exists ():
line , col = include_node .origin_location or (0 , 0 )
print (f"Missing include source: { src } ({ sys .argv [1 ]} :{ line } .{ col } )" )
Errors detected during tokenization (lexical analysis).
Code
Description
eof-in-doctype
Unexpected end of file in DOCTYPE declaration
eof-in-doctype-name
Unexpected end of file while reading DOCTYPE name
eof-in-doctype-public-identifier
Unexpected end of file in DOCTYPE public identifier
eof-in-doctype-system-identifier
Unexpected end of file in DOCTYPE system identifier
expected-doctype-name-but-got-right-bracket
Expected DOCTYPE name but got >
missing-whitespace-before-doctype-name
Missing whitespace after <!DOCTYPE
abrupt-doctype-public-identifier
DOCTYPE public identifier ended abruptly
abrupt-doctype-system-identifier
DOCTYPE system identifier ended abruptly
missing-quote-before-doctype-public-identifier
Missing quote before DOCTYPE public identifier
missing-quote-before-doctype-system-identifier
Missing quote before DOCTYPE system identifier
missing-doctype-public-identifier
Missing DOCTYPE public identifier
missing-doctype-system-identifier
Missing DOCTYPE system identifier
missing-whitespace-before-doctype-public-identifier
Missing whitespace before DOCTYPE public identifier
missing-whitespace-after-doctype-public-identifier
Missing whitespace after DOCTYPE public identifier
missing-whitespace-between-doctype-public-and-system-identifiers
Missing whitespace between DOCTYPE identifiers
missing-whitespace-after-doctype-name
Missing whitespace after DOCTYPE name
unexpected-character-after-doctype-public-keyword
Unexpected character after PUBLIC keyword
unexpected-character-after-doctype-system-keyword
Unexpected character after SYSTEM keyword
unexpected-character-after-doctype-public-identifier
Unexpected character after public identifier
unexpected-character-after-doctype-system-identifier
Unexpected character after system identifier
Comment Errors
Code
Description
eof-in-comment
Unexpected end of file in comment
abrupt-closing-of-empty-comment
Comment ended abruptly with -->
incorrectly-closed-comment
Comment ended with --!> instead of -->
incorrectly-opened-comment
Incorrectly opened comment
Code
Description
eof-in-tag
Unexpected end of file in tag
eof-before-tag-name
Unexpected end of file before tag name
empty-end-tag
Empty end tag </> is not allowed
invalid-first-character-of-tag-name
Invalid first character of tag name
unexpected-question-mark-instead-of-tag-name
Unexpected ? instead of tag name
unexpected-character-after-solidus-in-tag
Unexpected character after / in tag
Code
Description
duplicate-attribute
Duplicate attribute name
missing-attribute-value
Missing attribute value
unexpected-character-in-attribute-name
Unexpected character in attribute name
unexpected-character-in-unquoted-attribute-value
Unexpected character in unquoted attribute value
missing-whitespace-between-attributes
Missing whitespace between attributes
unexpected-equals-sign-before-attribute-name
Unexpected = before attribute name
Code
Description
eof-in-script-html-comment-like-text
Unexpected end of file in script with HTML-like comment
eof-in-script-in-script
Unexpected end of file in nested script tag
Code
Description
eof-in-cdata
Unexpected end of file in CDATA section
cdata-in-html-content
CDATA section only allowed in SVG/MathML content
Character Reference Errors
Code
Description
control-character-reference
Invalid control character in character reference
illegal-codepoint-for-numeric-entity
Invalid codepoint in numeric character reference
missing-semicolon-after-character-reference
Missing semicolon after character reference
named-entity-without-semicolon
Named entity used without semicolon
noncharacter-character-reference
Noncharacter in character reference
Code
Description
unexpected-null-character
Unexpected NULL character (U+0000)
noncharacter-in-input-stream
Noncharacter in input stream
Errors detected during tree construction.
Code
Description
unexpected-doctype
Unexpected DOCTYPE declaration
unknown-doctype
Unknown DOCTYPE (expected <!DOCTYPE html>)
expected-doctype-but-got-chars
Expected DOCTYPE but got text content
expected-doctype-but-got-eof
Expected DOCTYPE but reached end of file
expected-doctype-but-got-start-tag
Expected DOCTYPE but got start tag
expected-doctype-but-got-end-tag
Expected DOCTYPE but got end tag
Code
Description
unexpected-start-tag
Unexpected start tag in current context
unexpected-end-tag
Unexpected end tag in current context
unexpected-end-tag-before-html
Unexpected end tag before <html>
unexpected-end-tag-before-head
Unexpected end tag before <head>
unexpected-end-tag-after-head
Unexpected end tag after <head>
unexpected-start-tag-ignored
Start tag ignored in current context
unexpected-start-tag-implies-end-tag
Start tag implicitly closes previous element
Code
Description
expected-closing-tag-but-got-eof
Expected closing tag but reached end of file
expected-named-closing-tag-but-got-eof
Expected specific closing tag but reached end of file
Code
Description
invalid-codepoint
Invalid character (U+0000 NULL or U+000C FORM FEED)
invalid-codepoint-before-head
Invalid character before <head>
invalid-codepoint-in-body
Invalid character in <body>
invalid-codepoint-in-table-text
Invalid character in table text
invalid-codepoint-in-select
Invalid character in <select>
invalid-codepoint-in-foreign-content
Invalid character in SVG/MathML content
Code
Description
foster-parenting-character
Text content in table requires foster parenting
foster-parenting-start-tag
Start tag in table requires foster parenting
unexpected-character-implies-table-voodoo
Unexpected character in table triggers foster parenting
unexpected-start-tag-implies-table-voodoo
Start tag in table triggers foster parenting
unexpected-end-tag-implies-table-voodoo
End tag in table triggers foster parenting
unexpected-implied-end-tag-in-table-view
Unexpected implied end tag while closing table
eof-in-table
Unexpected end of file in table
unexpected-cell-in-table-body
Unexpected table cell outside of table row
unexpected-form-in-table
Form element not allowed in table context
unexpected-hidden-input-in-table
Hidden input in table triggers foster parenting
Code
Description
unexpected-token-in-frameset
Unexpected content in <frameset>
unexpected-token-after-frameset
Unexpected content after <frameset>
unexpected-token-after-after-frameset
Unexpected content after frameset closed
Code
Description
unexpected-token-after-body
Unexpected content after </body>
unexpected-char-after-body
Unexpected character after </body>
Column Group / Template Table Context Errors
Code
Description
unexpected-characters-in-column-group
Text not allowed in <colgroup>
unexpected-characters-in-template-column-group
Text not allowed in template column group
unexpected-start-tag-in-column-group
Start tag not allowed in <colgroup>
unexpected-start-tag-in-template-column-group
Start tag not allowed in template column group
unexpected-start-tag-in-template-table-context
Start tag not allowed in template table context
Code
Description
unexpected-start-tag-in-cell-fragment
Start tag not allowed in cell fragment context
unexpected-end-tag-in-fragment-context
End tag not allowed in fragment parsing context
Code
Description
unexpected-hidden-input-after-head
Unexpected hidden input after <head>
Code
Description
unexpected-doctype-in-foreign-content
Unexpected DOCTYPE in SVG/MathML content
unexpected-html-element-in-foreign-content
HTML element breaks out of SVG/MathML content
unexpected-end-tag-in-foreign-content
Mismatched end tag in SVG/MathML content
Code
Description
unexpected-start-tag-in-select
Unexpected start tag in <select>
unexpected-end-tag-in-select
Unexpected end tag in <select>
unexpected-select-in-select
Unexpected nested <select> in <select>
Code
Description
end-tag-too-early
End tag closed early (unclosed children)
adoption-agency-1.3
Misnested tags require adoption agency algorithm
non-void-html-element-start-tag-with-trailing-solidus
Self-closing syntax on non-void element (e.g., <div/>)
image-start-tag
Deprecated <image> tag (use <img> instead)
Errors reported by the sanitizer when you opt in via unsafe_handling="collect".
Code
Description
unsafe-html
Unsafe HTML detected by sanitization policy (see error.message for details)