A pure Python HTML5 parser that just works.
- Quickstart - Get up and running in 2 minutes
- Learn by examples - Real-world StackOverflow tasks rewritten with JustHTML
- API Reference - Complete public API documentation
- Command Line - Use `justhtml` to extract HTML, text, or Markdown
- AI Agent Instructions - Copy/paste usage context for LLMs and coding agents
- Extracting Text - `to_text()` and `to_markdown()`
- CSS Selectors - Query elements with familiar CSS syntax
- Transforms - Apply declarative DOM transforms after parsing
- Linkify - Convert URLs/emails in text nodes into links
- Building HTML - Programmatically build node trees and normalize them with `JustHTML(...)`
- Fragment Parsing - Parse HTML fragments in context
- Sanitization & Security - Overview of safe-by-default sanitization and policy configuration
- HTML Cleaning - Tags/attributes allowlists and inline styles
- URL Cleaning - URL validation, URL handling, and `srcset`
- Unsafe Handling - What happens when unsafe input is encountered (strip/collect/raise)
- Migrating from Bleach - Guide for replacing Bleach cleaner/filter pipelines
- Streaming - Memory-efficient parsing for large files
- Encoding & Byte Input - How byte streams are decoded (including `windows-1252` fallback)
- Error Codes - Parse error codes and their meanings
- Correctness Testing - How we verify 100% HTML5 compliance
- Playground - Run JustHTML in your browser