All notable changes to smelt are documented here.
First public release.
- Universal `convert` tool: single entry point for PDF, DOCX, PPTX, XLSX, images, audio, web pages, YouTube, HTML, EPUB, and text files. 10 additional per-type tools are available via the `expose_specialized_tools` config setting
- 8 backends with automatic fallback: MinerU, Docling, PyMuPDF, Trafilatura, faster-whisper, YouTube transcript API, BeautifulSoup, Pandoc
- 4 output formats: Markdown, XML, JSON, DocTags; all tools accept a `format` parameter
- Conversion cache with LRU eviction and 4 MCP resource endpoints (list, full text, metadata, per-page)
- Image handling modes — reference, base64, or discard — configurable per format
- Batch conversion — convert all supported files in a directory in one call
- Backend and pagination controls — choose backend, page ranges, and truncation limits per request
- YAML configuration with env-var overrides (`SMELT_{SECTION}_{KEY}`) and multi-path search
- Docker support: CPU and GPU compose profiles with pre-built Dockerfiles
- Fix exception type mismatch between pipeline and backends (proper `NoBackendError`)
- Fix empty-table crash when all cells are blank
- Fix heading level parsing in backends
- Fix cell index tracking for inline tables
- Add `InternalDoc.validate()` to catch malformed documents early
- Full CI pipeline: lint → typecheck → test → publish (Python 3.11–3.13 matrix)
- 81 test files covering unit, backend, formatter, integration, and adversarial edge cases
- ruff + mypy strict mode
- GPL-3.0-or-later license
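The "automatic fallback" across backends, together with the `NoBackendError` fix, can be illustrated with a small sketch. This is an assumed shape, not smelt's pipeline: backends are modeled as plain callables tried in order, `convert_with_fallback` is a hypothetical name, and the local `NoBackendError` class only borrows the name from this changelog. A real implementation would likely catch narrower exception types than `Exception`.

```python
from typing import Callable, Sequence

# Hypothetical backend shape: takes a path, returns converted text.
Backend = Callable[[str], str]


class NoBackendError(Exception):
    """Raised when every backend fails (name from the changelog; shape assumed)."""


def convert_with_fallback(
    path: str, backends: Sequence[tuple[str, Backend]]
) -> tuple[str, str]:
    """Try each (name, backend) pair in order; return the first success as (name, output)."""
    errors: list[str] = []
    for name, backend in backends:
        try:
            return name, backend(path)
        except Exception as exc:  # broad catch keeps the sketch short
            errors.append(f"{name}: {exc}")
    # Every backend failed: surface one consistent exception type to the caller.
    raise NoBackendError("all backends failed: " + "; ".join(errors))
```

Raising a single, dedicated exception type when the chain is exhausted gives callers one thing to catch, which is the kind of consistency the exception-mismatch fix above points at.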