JustHTML Documentation

A pure Python HTML5 parser that just works.

Quickstart - Get up and running in 2 minutes
Learn by examples - Real-world StackOverflow tasks rewritten with JustHTML
API Reference - Complete public API documentation
Command Line - Use justhtml to extract HTML, text, or Markdown
AI Agent Instructions - Copy/paste usage context for LLMs and coding agents
Extracting Text - to_text() and to_markdown()
CSS Selectors - Query elements with familiar CSS syntax
Transforms - Apply declarative DOM transforms after parsing
- Linkify - Convert URLs/emails in text nodes into links
Building HTML - Programmatically build node trees and normalize them with JustHTML(...)
Fragment Parsing - Parse HTML fragments in context
Sanitization & Security - Overview of safe-by-default sanitization and policy configuration
- HTML Cleaning - Tags/attributes allowlists and inline styles
- URL Cleaning - URL validation, URL handling, and srcset
- Unsafe Handling - What happens when unsafe input is encountered (strip/collect/raise)
- Migrating from Bleach - Guide for replacing Bleach cleaner/filter pipelines
Streaming - Memory-efficient parsing for large files
Encoding & Byte Input - How byte streams are decoded (including windows-1252 fallback)
Error Codes - Parse error codes and their meanings
Correctness Testing - How we verify 100% HTML5 compliance
Playground - Run JustHTML in your browser
