Releases: EmilStenstrom/justhtml

Release v1.12.0

17 Mar 21:58

Security

  • (Severity: High) Markdown output now HTML-escapes text-node content before applying Markdown escaping, preventing attacker-controlled text such as <script> from turning into raw HTML when to_markdown() output is rendered.
  • (Severity: Moderate) Sanitization now hardens script and style raw-text content by neutralizing embedded closing-tag sequences and dropping non-text children, preventing sanitized DOM trees from serializing into breakout HTML.
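
The ordering described in the first item above (HTML-escape first, then Markdown-escape) can be sketched as follows. This is a minimal illustration and not JustHTML's actual implementation: the function name `escape_text_for_markdown` and the set of Markdown punctuation are assumptions made for the example.

```python
import html
import re

# Punctuation that commonly carries meaning in Markdown. This set is an
# illustrative assumption, not JustHTML's actual escaping table.
_MARKDOWN_SPECIALS = re.compile(r"([\\`*_{}\[\]()#+\-.!|])")


def escape_text_for_markdown(text: str) -> str:
    """Make a text node inert as both HTML and Markdown (hypothetical helper)."""
    # Step 1: neutralize HTML metacharacters, because Markdown renderers
    # typically pass raw HTML through to the output unchanged.
    escaped = html.escape(text, quote=False)
    # Step 2: backslash-escape Markdown punctuation. The entities produced
    # in step 1 (&amp;, &lt;, &gt;) contain none of these characters.
    return _MARKDOWN_SPECIALS.sub(r"\\\1", escaped)


print(escape_text_for_markdown("<script>alert(1)</script>"))
```

Doing the HTML step first matters: the Markdown step only adds backslashes, so it cannot reintroduce a literal `<` or `>`.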

Release v1.11.0

15 Mar 22:04

Added

  • Sanitization: Add SanitizationPolicy.strip_invisible_unicode to strip invisible Unicode used for obfuscation from text and attribute values before other sanitizer checks run.

Changed

  • Sanitization: strip_invisible_unicode is enabled by default and covers variation selectors, zero-width/bidi controls, and private-use characters.

Security

  • (Severity: Low) Harden sanitization against invisible-Unicode obfuscation in text, attributes, and URL-like values such as disguised javascript: schemes.
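
To make the three character classes named above concrete, here is a rough sketch of stripping them from a string. It is not the library's code: `strip_invisible` is a hypothetical helper, and the exact code points JustHTML removes may differ from the ones chosen here.

```python
import unicodedata

# Zero-width characters and bidi controls (an illustrative, assumed list).
_ZERO_WIDTH_AND_BIDI = frozenset(
    "\u200b\u200c\u200d\u2060\ufeff"   # zero-width space/joiners, word joiner, BOM
    "\u200e\u200f\u061c"               # left-to-right, right-to-left, Arabic letter marks
    "\u202a\u202b\u202c\u202d\u202e"   # bidi embeddings and overrides
    "\u2066\u2067\u2068\u2069"         # bidi isolates
)


def _is_variation_selector(code_point: int) -> bool:
    # Variation Selectors block and Variation Selectors Supplement block.
    return 0xFE00 <= code_point <= 0xFE0F or 0xE0100 <= code_point <= 0xE01EF


def strip_invisible(value: str) -> str:
    """Drop invisible characters before any other checks run (hypothetical helper)."""
    return "".join(
        ch
        for ch in value
        if ch not in _ZERO_WIDTH_AND_BIDI
        and not _is_variation_selector(ord(ch))
        and unicodedata.category(ch) != "Co"  # "Co" is the private-use category
    )


# A zero-width space hides the scheme from a naive prefix check.
disguised = "java\u200bscript:alert(1)"
print(disguised.startswith("javascript:"))                   # False
print(strip_invisible(disguised).startswith("javascript:"))  # True
```

Running the strip before scheme validation is what lets the later check see the real `javascript:` prefix.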

Release v1.10.0

15 Mar 14:59

Security

  • (Severity: Low) Harden JustHTML against denial-of-service from attacker-controlled deeply nested HTML. Parsing post-processing, deep cloning, pretty HTML serialization, and Markdown rendering now use iterative traversal instead of recursion, preventing RecursionError crashes on pathological nesting.
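
The general technique of replacing recursion with an explicit stack can be illustrated with a toy tree. The `Node` class and the `max_depth` and `build_chain` functions below are stand-ins invented for this sketch; they are not JustHTML's node types or traversal code.

```python
from __future__ import annotations

import sys
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy element node (a stand-in, not a JustHTML class)."""
    name: str
    children: list[Node] = field(default_factory=list)


def max_depth(root: Node) -> int:
    """Compute tree depth with an explicit stack instead of recursion."""
    deepest = 0
    stack = [(root, 1)]
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        for child in node.children:
            stack.append((child, depth + 1))
    return deepest


def build_chain(length: int) -> Node:
    """Build a pathological chain of `length` nested elements."""
    root = Node("div")
    current = root
    for _ in range(length - 1):
        child = Node("div")
        current.children.append(child)
        current = child
    return root


# Nesting far deeper than the interpreter's recursion limit is handled fine.
depth = sys.getrecursionlimit() * 10
print(max_depth(build_chain(depth)) == depth)
```

A recursive version of `max_depth` would raise RecursionError on this input, because the stack depth would track the nesting depth of the document.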

Release v1.9.1

10 Mar 20:09

Fixed

  • Serialization: Preserve literal text inside script and style elements during HTML serialization so round-trips do not turn raw text content like > or & into entity text.
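
As a rough illustration of the rule behind this fix, a serializer can branch on the parent element when writing text. The helper `serialize_text` is hypothetical and deliberately ignores the other special cases in the HTML serialization algorithm.

```python
import html

# Elements whose text content is written out verbatim in this sketch.
RAW_TEXT_ELEMENTS = frozenset({"script", "style"})


def serialize_text(text: str, parent_tag: str) -> str:
    """Serialize a text node given its parent's tag name (hypothetical helper)."""
    if parent_tag.lower() in RAW_TEXT_ELEMENTS:
        # Raw text: entities are not decoded here by HTML parsers, so
        # escaping would change what the script or stylesheet contains.
        return text
    return html.escape(text, quote=False)


print(serialize_text("if (a > b && c) {}", "script"))
print(serialize_text("a > b & c", "p"))
```

Because a parser does not decode entities inside script or style, escaping there would not round-trip: `&gt;` would come back as the five literal characters rather than `>`.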

Release v1.9.0

08 Mar 22:46

Added

  • Builder: Add justhtml.builder with explicit element(), text(), comment(), and doctype() factories for programmatic HTML construction.
  • Parser: Allow JustHTML(...) to accept built nodes directly and normalize them through the existing HTML5 parser.
  • Docs: Add a dedicated Building HTML guide and expand the API/README documentation around programmatic HTML generation.

Changed

  • Sanitization: Preserve doctypes by default in document mode.
  • Sanitization: Add <caption> to the default allowed tag set.
  • Typing: Normalize SanitizationPolicy.allowed_tags to frozenset[str], improving type safety when composing policies.

Fixed

  • Builder & Serialization: Preserve arbitrary doctype names and identifiers across build/serialize/parse round-trips.
  • Builder: Reject unsupported namespaces up front; builder namespaces are limited to HTML, SVG, and MathML.

Release v1.8.0

05 Mar 17:08

Added

  • CLI: Add --strict flag to fail with exit code 2 and print an error message on any parse error.
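
The behaviour described above (print an error, exit with code 2) follows a common command-line pattern, sketched here with argparse. This is not the JustHTML CLI: the `run` function, the `example-cli` program name, and the way parse errors are passed in are all assumptions made for the illustration.

```python
import argparse
import sys


def run(argv: list[str], parse_errors: list[str]) -> int:
    """Return the process exit code for a hypothetical strict-mode CLI.

    `parse_errors` stands in for whatever errors the parser collected.
    """
    parser = argparse.ArgumentParser(prog="example-cli")
    parser.add_argument("--strict", action="store_true",
                        help="fail on any parse error")
    args = parser.parse_args(argv)

    if args.strict and parse_errors:
        for message in parse_errors:
            print(f"error: {message}", file=sys.stderr)
        return 2  # non-zero exit signals failure to shells and CI jobs
    return 0


if __name__ == "__main__":
    # Placeholder error list; a real tool would fill this in while parsing.
    sys.exit(run(sys.argv[1:], parse_errors=[]))
```

A distinct non-zero exit code lets a shell script or CI job treat malformed markup as a hard failure without parsing the tool's output.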

Release v1.7.0

08 Feb 20:42

Added

  • Selectors: Add query_one() on JustHTML and Node for retrieving the first match (or None).

Fixed

  • Packaging: Include py.typed in wheels for PEP 561 type hinting support.

Changed

  • Performance: ~9% faster JustHTML(...).to_html(pretty=False) than 1.6.0 on the web100k justhtml_to_html benchmark (200 files x 3 iterations): 7.244s -> 6.571s (median).
  • Performance: Multiple internal speedups in serializer, tokenizer, tree builder, and transforms for lower per-document overhead.
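
For reference, the "~9%" figure above corresponds to the reduction in median wall-clock time. The arithmetic, using only the two timings quoted in the benchmark line, is:

```python
# Median timings quoted above for the benchmark (seconds).
before = 7.244  # 1.6.0
after = 6.571   # 1.7.0

# Fraction of the original time that was saved.
reduction = (before - after) / before
# Equivalent throughput ratio (how many times faster).
speedup = before / after

print(f"time reduction: {reduction:.1%}")
print(f"speedup factor: {speedup:.2f}x")
```

The two numbers describe the same change from different angles: a time reduction of roughly 9% is a throughput gain of roughly 10%.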

Docs

  • Expand API and selector documentation (including performance notes).

Release v1.6.0

06 Feb 22:41

Added

  • Text extraction: Add separator_blocks_only to to_text() (and CLI --separator-blocks-only) to apply the separator only between block-level elements.
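
One way to picture the difference is with a toy tree in which each node is either a string or a `(tag, children)` tuple. The sketch below is an assumption about the intended behaviour, not JustHTML's implementation: the `BLOCK_TAGS` set, the whitespace stripping, and this standalone `to_text` function are all invented for the example.

```python
# Tags treated as block-level in this sketch (an assumed, partial list).
BLOCK_TAGS = frozenset({"div", "p", "li", "ul", "ol", "h1", "h2", "section"})


def _collect(node, segments, blocks_only):
    """Append text from `node` to `segments`, splitting as configured."""
    if isinstance(node, str):
        if blocks_only:
            segments[-1] += node   # inline text stays in the current segment
        else:
            segments.append(node)  # every text node is its own segment
        return
    tag, children = node
    is_block = tag in BLOCK_TAGS
    if blocks_only and is_block:
        segments.append("")        # entering a block starts a new segment
    for child in children:
        _collect(child, segments, blocks_only)
    if blocks_only and is_block:
        segments.append("")        # leaving a block starts a new segment


def to_text(root, separator=" ", separator_blocks_only=False):
    """Join extracted text with `separator` (toy stand-in, not the real API)."""
    segments = [""]
    _collect(root, segments, separator_blocks_only)
    return separator.join(s.strip() for s in segments if s.strip())


doc = ("div", [("p", ["Hello ", ("b", ["wor"]), "ld"]), ("p", ["Bye"])])
print(to_text(doc, separator="|"))
print(to_text(doc, separator="|", separator_blocks_only=True))
```

In the first call the separator lands inside the word split by the inline `b` element. In the second call it appears only between the two paragraphs.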

Changed

  • Transforms: Improve performance of URL attribute handling and comment sanitization when applying DOM transforms.

Release v1.5.0

01 Feb 23:16

Added

  • Serialization & Sanitization: Introduce additional serialization contexts, and update the docs to stress the importance of placing sanitized content in the right context (see docs/sanitization.md).

Changed

  • Sanitization: Switch the sanitizer pipeline to be built up entirely of basic transform blocks (see docs/transforms.md).
  • Tokenizer: Add fast-path handling for tag names and attribute parsing to reduce overhead in common cases.
  • Sanitization: Speed up URL normalization and scheme validation while preserving policy semantics (see docs/url-cleaning.md).
  • Transforms: Optimize sanitizer transform dispatch and attribute rewrite hot paths for lower per-node overhead (see docs/transforms.md).

Release v1.4.0

29 Jan 21:47

Changed

  • Serializer: Always escape < and > in quoted attribute values, and escape < in unquoted values, for spec-compliant output. This follows a change to the WHATWG HTML specification and to browsers that is not yet covered by the html5lib test suite.
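
For the quoted case, the escaping described above can be sketched as a small helper. `escape_attribute_value` is a hypothetical name, and the sketch covers only `&`, `"`, `<`, and `>` in double-quoted values; it is not the library's serializer.

```python
def escape_attribute_value(value: str) -> str:
    """Escape a value for a double-quoted HTML attribute (hypothetical helper)."""
    return (
        value.replace("&", "&amp;")   # must run first so later entities survive
        .replace('"', "&quot;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


title = 'x"><script>alert(1)</script>'
print(f'<a title="{escape_attribute_value(title)}">link</a>')
```

Replacing `&` first is what keeps the entities produced by the later replacements from being escaped a second time.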