Changelog

All notable changes to smelt are documented here.

0.1.0 — 2026-03-05

First public release.

New Features

  • Universal convert tool — single entry point for PDF, DOCX, PPTX, XLSX, images, audio, web pages, YouTube, HTML, EPUB, and text files. 10 additional per-type tools available via expose_specialized_tools config
  • 8 backends with automatic fallback: MinerU, Docling, PyMuPDF, Trafilatura, faster-whisper, YouTube transcript API, BeautifulSoup, Pandoc
  • 4 output formats: Markdown, XML, JSON, DocTags — all tools accept a format parameter
  • Conversion cache with LRU eviction and 4 MCP resource endpoints (list, full text, metadata, per-page)
  • Image handling modes — reference, base64, or discard — configurable per format
  • Batch conversion — convert all supported files in a directory in one call
  • Backend and pagination controls — choose backend, page ranges, and truncation limits per request
  • YAML configuration with env-var overrides (SMELT_{SECTION}_{KEY}) and multi-path search
  • Docker support — CPU and GPU compose profiles with pre-built Dockerfiles
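The SMELT_{SECTION}_{KEY} override pattern listed above can be illustrated with a minimal sketch. It is not smelt's actual loader: the function names, the `cache` section, and its `max_entries` and `enabled` keys are hypothetical, and the type coercion rule (convert to the type of the existing default) is an assumption made for the example.

```python
from collections.abc import Mapping
from typing import Any


def _coerce(raw: str, default: Any) -> Any:
    """Convert an env-var string to the type of the existing default value."""
    if isinstance(default, bool):  # check bool before int: bool is an int subclass
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    if isinstance(default, int):
        return int(raw)
    if isinstance(default, float):
        return float(raw)
    return raw


def apply_env_overrides(
    config: Mapping[str, Mapping[str, Any]],
    environ: Mapping[str, str],
    prefix: str = "SMELT",
) -> dict[str, dict[str, Any]]:
    """Return a copy of config with {prefix}_{SECTION}_{KEY} values applied."""
    merged: dict[str, dict[str, Any]] = {}
    for section, values in config.items():
        merged[section] = dict(values)
        for key, default in values.items():
            # Hypothetical mapping: section and key are upper-cased and joined.
            name = f"{prefix}_{section}_{key}".upper()
            if name in environ:
                merged[section][key] = _coerce(environ[name], default)
    return merged


# Hypothetical section and keys, standing in for values loaded from YAML.
defaults = {"cache": {"max_entries": 64, "enabled": True}}
env = {"SMELT_CACHE_MAX_ENTRIES": "128"}
print(apply_env_overrides(defaults, env))
# {'cache': {'max_entries': 128, 'enabled': True}}
```

Iterating over the keys already present in the configuration, rather than splitting the variable name on underscores, sidesteps the ambiguity that arises when a section or key name itself contains an underscore. In real use, `os.environ` would be passed as `environ`.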

Bug Fixes

  • Fix exception type mismatch between pipeline and backends (proper NoBackendError)
  • Fix empty table crash when all cells are blank
  • Fix heading level parsing in backends
  • Fix cell index tracking for inline tables
  • Add InternalDoc.validate() to catch malformed documents early
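As an illustration of the class of bug behind the empty-table fix above, the sketch below shows one way a Markdown table renderer can guard against input where every cell is blank. It is not smelt's formatter code: `render_markdown_table` and its row representation are hypothetical, chosen only to show the guard.

```python
def render_markdown_table(rows: list[list[str]]) -> str:
    """Render rows as a Markdown table; return "" if there is nothing to show."""
    cells = [[cell.strip() for cell in row] for row in rows]
    # Guard: no rows, or every cell blank -> skip the table instead of crashing.
    if not any(cell for row in cells for cell in row):
        return ""
    # Pad ragged rows so every row has the same number of columns.
    width = max(len(row) for row in cells)
    padded = [row + [""] * (width - len(row)) for row in cells]
    header, *body = padded
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


print(repr(render_markdown_table([["", " "], ["", ""]])))  # ''
print(render_markdown_table([["a", "b"], ["1", ""]]))
```

Placing the check before any width computation means the `max()` call never sees an empty sequence, which is a common source of crashes in this situation.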

Infrastructure

  • Full CI pipeline: lint → typecheck → test → publish (Python 3.11–3.13 matrix)
  • 81 test files covering unit, backend, formatter, integration, and adversarial edge cases
  • ruff + mypy strict mode
  • GPL-3.0-or-later license