Releases: EmilStenstrom/justhtml
Release v1.12.0
Security
- (Severity: High) Markdown output now HTML-escapes text-node content before applying Markdown escaping, preventing attacker-controlled text such as `<script>` from turning into raw HTML when `to_markdown()` output is rendered.
- (Severity: Moderate) Sanitization now hardens `script` and `style` raw-text content by neutralizing embedded closing-tag sequences and dropping non-text children, preventing sanitized DOM trees from serializing into breakout HTML.
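A minimal standard-library sketch of the two ideas above. The helper names `escape_text_for_markdown` and `neutralize_closing_tag` are hypothetical, and inserting a backslash into `</script` is one common neutralization technique, not necessarily what justhtml does internally.

```python
import html
import re

# Markdown punctuation a renderer could interpret; backslash-escaping keeps it literal.
_MARKDOWN_SPECIALS = re.compile(r"([\\`*_{}\[\]()#+\-.!|~])")


def escape_text_for_markdown(text: str) -> str:
    """Escape a text node for Markdown output (hypothetical helper)."""
    # Step 1: HTML-escape first, so "<script>" reaches the renderer as "&lt;script&gt;"
    # and is displayed as text instead of being emitted as a raw HTML tag.
    escaped = html.escape(text, quote=False)
    # Step 2: backslash-escape Markdown syntax characters.
    return _MARKDOWN_SPECIALS.sub(r"\\\1", escaped)


def neutralize_closing_tag(raw_text: str, tag: str) -> str:
    """Break up "</tag" sequences (case-insensitively) inside raw-text content."""
    pattern = re.compile(r"</(" + re.escape(tag) + r")", re.IGNORECASE)
    # "</script" becomes "<\/script", which no longer closes the element when serialized.
    return pattern.sub(r"<\\/\1", raw_text)


print(escape_text_for_markdown("<script>alert(1)</script>"))
# &lt;script&gt;alert\(1\)&lt;/script&gt;
print(neutralize_closing_tag('var s = "</ScRiPt><img src=x>";', "script"))
# var s = "<\/ScRiPt><img src=x>";
```

The order matters: escaping HTML first guarantees no literal `<` survives into the Markdown source, whatever the Markdown step does afterwards.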
Release v1.11.0
Added
- Sanitization: Add `SanitizationPolicy.strip_invisible_unicode` to strip invisible Unicode used for obfuscation from text and attribute values before other sanitizer checks run.
Changed
- Sanitization: `strip_invisible_unicode` is enabled by default and covers variation selectors, zero-width/bidi controls, and private-use characters.
Security
- (Severity: Low) Harden sanitization against invisible-Unicode obfuscation in text, attributes, and URL-like values such as disguised `javascript:` schemes.
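Stripping runs before the other checks so that later filters, such as a URL-scheme check, see the de-obfuscated value. A rough sketch, assuming Unicode general categories `Cf` (format) and `Co` (private use) plus the variation-selector blocks approximate the covered set; justhtml's exact character list may differ, and `Cf` also includes characters such as the soft hyphen. The function names are hypothetical.

```python
import unicodedata


def is_invisible(ch: str) -> bool:
    """Rough invisibility test (an assumption, not justhtml's exact character set)."""
    cp = ord(ch)
    # Variation selectors U+FE00..U+FE0F and U+E0100..U+E01EF are category Mn, so list them explicitly.
    if 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF:
        return True
    # Cf covers zero-width characters and bidi controls; Co is private use.
    return unicodedata.category(ch) in ("Cf", "Co")


def strip_invisible(value: str) -> str:
    return "".join(ch for ch in value if not is_invisible(ch))


href = "java\u200bscript:alert(1)"  # a zero-width space hides the scheme from a naive check
print(href.startswith("javascript:"))                   # False
print(strip_invisible(href).startswith("javascript:"))  # True
```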
Release v1.10.0
Security
- (Severity: Low) Harden JustHTML against denial of service from attacker-controlled, deeply nested HTML. Parsing post-processing, deep cloning, pretty HTML serialization, and Markdown rendering now use iterative traversal instead of recursion, preventing `RecursionError` crashes on pathological nesting.
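The recursion-to-iteration change can be illustrated with an explicit stack. This sketch uses a hypothetical minimal `Node` class, not justhtml's, and shows a deep clone and a depth count that both work on nesting far beyond Python's default recursion limit of 1000.

```python
class Node:
    """Hypothetical minimal DOM node, not justhtml's Node class."""

    __slots__ = ("name", "children")

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: list["Node"] = []


def clone(root: Node) -> Node:
    """Deep-copy a tree with an explicit stack instead of recursion."""
    new_root = Node(root.name)
    stack = [(root, new_root)]
    while stack:
        src, dst = stack.pop()
        for child in src.children:
            copy = Node(child.name)
            dst.children.append(copy)  # appended in order, so sibling order is preserved
            stack.append((child, copy))
    return new_root


def depth(root: Node) -> int:
    """Maximum nesting depth, also computed iteratively."""
    best = 0
    stack = [(root, 1)]
    while stack:
        node, d = stack.pop()
        best = max(best, d)
        stack.extend((child, d + 1) for child in node.children)
    return best


# Build <div><div>...</div></div> nested 5000 levels deep.
root = Node("div")
current = root
for _ in range(5000):
    nxt = Node("div")
    current.children.append(nxt)
    current = nxt

print(depth(clone(root)))  # 5001, with no RecursionError
```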
Release v1.9.1
Fixed
- Serialization: Preserve literal text inside `script` and `style` elements during HTML serialization, so round-trips no longer turn raw text content such as `>` or `&` into entity text.
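A sketch of the distinction, with a hypothetical `serialize_text` helper. It approximates ordinary text escaping with `html.escape`, which (unlike the HTML spec's serializer) does not turn U+00A0 into `&nbsp;`.

```python
import html

# The note names script and style; the HTML spec's serializer also treats a few
# other parents (for example xmp and iframe) this way, omitted here.
RAW_TEXT_PARENTS = frozenset({"script", "style"})


def serialize_text(parent_tag: str, text: str) -> str:
    if parent_tag.lower() in RAW_TEXT_PARENTS:
        # Raw text is emitted verbatim; escaping it would change the script or CSS.
        return text
    # Ordinary text nodes escape &, < and > so they cannot be read back as markup.
    return html.escape(text, quote=False)


print(serialize_text("script", "if (a > b && c) {}"))  # if (a > b && c) {}
print(serialize_text("p", "a > b && c"))               # a &gt; b &amp;&amp; c
```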
Release v1.9.0
Added
- Builder: Add `justhtml.builder` with explicit `element()`, `text()`, `comment()`, and `doctype()` factories for programmatic HTML construction.
- Parser: Allow `JustHTML(...)` to accept built nodes directly and normalize them through the existing HTML5 parser.
- Docs: Add a dedicated Building HTML guide and expand the API/README documentation around programmatic HTML generation.
Changed
- Sanitization: Preserve doctypes by default in document mode.
- Sanitization: Add `<caption>` to the default allowed tag set.
- Typing: Normalize `SanitizationPolicy.allowed_tags` to `frozenset[str]`, improving type safety when composing policies.
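Why a `frozenset[str]` field composes well: set union returns a new immutable set, so deriving one policy from another never mutates the original. `Policy` and `allow()` below are hypothetical stand-ins, not justhtml's `SanitizationPolicy` API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Policy:
    """Hypothetical stand-in for a sanitizer policy (not justhtml's SanitizationPolicy)."""

    allowed_tags: frozenset[str]

    def __post_init__(self) -> None:
        # Normalize to a lower-cased frozenset even if a caller passed a set or list.
        object.__setattr__(self, "allowed_tags", frozenset(t.lower() for t in self.allowed_tags))

    def allow(self, *tags: str) -> "Policy":
        # frozenset | frozenset yields a new frozenset; the original policy is untouched.
        return replace(self, allowed_tags=self.allowed_tags | frozenset(tags))


base = Policy(allowed_tags=frozenset({"p", "TABLE"}))
with_caption = base.allow("caption")
print(sorted(with_caption.allowed_tags))  # ['caption', 'p', 'table']
```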
Fixed
- Builder & Serialization: Preserve arbitrary doctype names and identifiers across build/serialize/parse round-trips.
- Builder: Reject unsupported namespaces up front; builder namespaces are limited to HTML, SVG, and MathML.
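A sketch of an up-front namespace check. The three URIs are the standard HTML, SVG, and MathML namespaces from the HTML specification; the function name and the choice to treat `None` as HTML are assumptions for illustration.

```python
from typing import Optional

HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
SUPPORTED_NAMESPACES = frozenset({HTML_NS, SVG_NS, MATHML_NS})


def check_namespace(namespace: Optional[str]) -> str:
    """Fail fast at build time instead of producing a tree the HTML parser cannot represent."""
    if namespace is None:
        return HTML_NS  # assumption: an omitted namespace means HTML
    if namespace not in SUPPORTED_NAMESPACES:
        raise ValueError(f"unsupported namespace: {namespace!r}")
    return namespace
```

Rejecting early means the error points at the builder call that caused it, rather than surfacing later as a confusing parse or serialization difference.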
Release v1.8.0
Added
- CLI: Add a `--strict` flag that fails with exit code 2 and prints an error message on any parse error.
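A sketch of how such a flag can be wired with `argparse`. Only the flag name and exit code 2 come from the note; `main`, `parse_errors`, the program name, and the message format are hypothetical.

```python
import argparse
import sys


def main(argv: list[str], parse_errors: list[str]) -> int:
    """Hypothetical CLI wiring; parse_errors stands in for errors reported by the parser."""
    parser = argparse.ArgumentParser(prog="example-cli")
    parser.add_argument("--strict", action="store_true", help="fail on any parse error")
    args = parser.parse_args(argv)
    if args.strict and parse_errors:
        print(f"error: {parse_errors[0]}", file=sys.stderr)
        return 2
    return 0
```

A real entry point would call `sys.exit(main(...))`. Note that `argparse` also exits with status 2 on usage errors, so a calling script cannot tell the two failures apart by exit code alone.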
Release v1.7.0
Added
- Selectors: Add `query_one()` on `JustHTML` and `Node` for retrieving the first match (or `None`).
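The "first match or `None`" contract lets callers use a single `is None` check instead of indexing into a possibly empty list. The equivalent standard-library idiom, shown on a plain list standing in for selector results (`first_or_none` is a hypothetical helper, not part of justhtml):

```python
from collections.abc import Iterable
from typing import Optional, TypeVar

T = TypeVar("T")


def first_or_none(matches: Iterable[T]) -> Optional[T]:
    # next() with a default returns None instead of raising StopIteration when nothing matches.
    return next(iter(matches), None)


print(first_or_none(["<p>a</p>", "<p>b</p>"]))  # <p>a</p>
print(first_or_none([]))                         # None
```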
Fixed
- Packaging: Include `py.typed` in wheels for PEP 561 type-hinting support.
Changed
- Performance: ~9% faster `JustHTML(...).to_html(pretty=False)` than 1.6.0 on the `web100k` `justhtml_to_html` benchmark (200 files x 3 iterations): 7.244s -> 6.571s (median).
- Performance: Multiple internal speedups in the serializer, tokenizer, tree builder, and transforms lower per-document overhead.
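The ~9% figure is the reduction in median wall time; expressed as throughput, the same two numbers give roughly a 10% speedup. The arithmetic:

```python
before_s, after_s = 7.244, 6.571  # median wall times from the note

time_saved = (before_s - after_s) / before_s  # fraction of runtime removed
speedup = before_s / after_s                  # relative throughput

print(f"{time_saved:.1%} less time")  # 9.3% less time
print(f"{speedup:.2f}x throughput")   # 1.10x throughput
```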
Docs
- Expand API and selector documentation (including performance notes).
Release v1.6.0
Added
- Text extraction: Add `separator_blocks_only` to `to_text()` (and CLI `--separator-blocks-only`) to apply `separator` only between block-level elements.
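A toy model of the difference. Each text run is tagged with an id for its nearest block-level ancestor; `join_text` is hypothetical and ignores the whitespace handling the real `to_text()` may apply. For `<p>Hello <b>world</b></p><p>Bye</p>`:

```python
def join_text(runs: list[tuple[int, str]], separator: str, blocks_only: bool) -> str:
    """Join (block_id, text) runs; with blocks_only, separate only at block boundaries."""
    if not blocks_only:
        # Separator between every text run, including inline siblings.
        return separator.join(text for _, text in runs)
    out: list[str] = []
    prev_block = None
    for block_id, text in runs:
        if out and block_id != prev_block:
            out.append(separator)  # entering a new block-level element
        out.append(text)
        prev_block = block_id
    return "".join(out)


runs = [(0, "Hello "), (0, "world"), (1, "Bye")]
print(repr(join_text(runs, "\n", blocks_only=False)))  # 'Hello \nworld\nBye'
print(repr(join_text(runs, "\n", blocks_only=True)))   # 'Hello world\nBye'
```

Inline elements such as `<b>` no longer split a sentence across lines, while paragraphs stay separated.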
Changed
- Transforms: Improve performance of URL attribute handling and comment sanitization when applying DOM transforms.
Release v1.5.0
Added
- Serialization & Sanitization: Introduce additional serialization contexts, and update the docs to explain why sanitized content must be placed in the right context (see docs/sanitization.md).
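Why context matters: the same string can be harmless in one position and dangerous in another. A sketch with two hypothetical context names, using only `html.escape`; justhtml's own contexts are described in docs/sanitization.md and are not reproduced here.

```python
import html


def escape_for(context: str, value: str) -> str:
    """Hypothetical context-aware escaper covering two contexts."""
    if context == "text":
        # Element content: &, < and > are enough; quotes are harmless here.
        return html.escape(value, quote=False)
    if context == "attribute":
        # Quoted attribute value: quotes must also be escaped or they end the value.
        return html.escape(value, quote=True)
    raise ValueError(f"unknown context: {context!r}")


payload = '" onmouseover="alert(1)'
print(escape_for("text", payload))       # " onmouseover="alert(1)
print(escape_for("attribute", payload))  # &quot; onmouseover=&quot;alert(1)
```

Output escaped for the `text` context is safe between tags but would break out of an attribute value, which is why the destination has to be chosen up front.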
Changed
- Sanitization: Rebuild the sanitizer pipeline entirely from basic transform blocks (see docs/transforms.md).
- Tokenizer: Add fast-path handling for tag names and attribute parsing to reduce overhead in common cases.
- Sanitization: Speed up URL normalization and scheme validation while preserving policy semantics (see docs/url-cleaning.md).
- Transforms: Optimize sanitizer transform dispatch and attribute rewrite hot paths for lower per-node overhead (see docs/transforms.md).
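A sketch of scheme allow-listing as a general technique. The allowed schemes, the function name, and the rule of dropping ASCII control characters and spaces before reading the scheme are assumptions; see docs/url-cleaning.md for justhtml's actual rules.

```python
import re
from urllib.parse import urlsplit

ALLOWED_SCHEMES = frozenset({"http", "https", "mailto"})  # hypothetical policy
# Conservative: drop ASCII control characters and spaces anywhere before reading the scheme.
_CONTROL_OR_SPACE = re.compile(r"[\x00-\x20\x7f]")


def is_allowed_url(value: str) -> bool:
    cleaned = _CONTROL_OR_SPACE.sub("", value)
    try:
        scheme = urlsplit(cleaned).scheme  # urlsplit lower-cases the scheme
    except ValueError:
        return False  # e.g. a malformed IPv6 host
    # No scheme means a relative URL, which this sketch allows.
    return scheme == "" or scheme in ALLOWED_SCHEMES


print(is_allowed_url(" java\tscript:alert(1)"))  # False
print(is_allowed_url("https://example.com/a"))   # True
```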
Release v1.4.0
Changed
- Serializer: Always escape `<` and `>` in quoted attribute values, and escape `<` in unquoted values, for spec-compliant output. This follows a WHATWG HTML specification and browser change not yet reflected in the html5lib test suite.
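A sketch of double-quoted attribute escaping in the spirit of this change, assuming the spec's attribute-mode rules (`&`, U+00A0, and `"`) plus the newly added `<` and `>`. The function name is hypothetical, and unquoted values are not covered.

```python
def escape_quoted_attribute(value: str) -> str:
    """Escape a value for serialization inside double quotes."""
    return (
        value.replace("&", "&amp;")  # must run first so later entities are not double-escaped
        .replace("\u00a0", "&nbsp;")
        .replace('"', "&quot;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


print(escape_quoted_attribute('</noscript><img src=x onerror="alert(1)">'))
# &lt;/noscript&gt;&lt;img src=x onerror=&quot;alert(1)&quot;&gt;
```

With `<` and `>` escaped, markup-like text inside a value survives serialization without any literal angle brackets.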