| 1 | +<!DOCTYPE html> |
| 2 | +<html lang="en"> |
| 3 | +<head> |
| 4 | + <meta charset="UTF-8"> |
| 5 | + <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| 6 | + <title>ODT to HTML in JavaScript — Convert .odt Files with odf-kit</title> |
| 7 | + <meta name="description" content="Convert .odt files to HTML in Node.js with odf-kit. Pure JavaScript, no LibreOffice required. npm install odf-kit. Extracts text, headings, tables, lists, bold, italic, and hyperlinks."> |
| 8 | + <meta name="keywords" content="odt to html javascript, npm odt-to-html, odt parser npm, read odt nodejs, convert odt html node, odt-parser npm, npm install odt-to-html, parse odt javascript, odt reader javascript, openDocument html conversion"> |
| 9 | + <meta name="author" content="GitHubNewbie0"> |
| 10 | + <meta name="robots" content="index, follow"> |
| 11 | + <link rel="canonical" href="https://githubnewbie0.github.io/odf-kit/guides/odt-to-html-javascript.html"> |
| 12 | + |
| 13 | + <meta property="og:type" content="article"> |
| 14 | + <meta property="og:title" content="ODT to HTML in JavaScript — odf-kit"> |
| 15 | + <meta property="og:description" content="Convert .odt files to HTML in Node.js with odf-kit. Pure JavaScript, no LibreOffice required."> |
| 16 | + <meta property="og:url" content="https://githubnewbie0.github.io/odf-kit/guides/odt-to-html-javascript.html"> |
| 17 | + |
| 18 | + <link rel="preconnect" href="https://fonts.googleapis.com"> |
| 19 | + <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin> |
| 20 | + <link href="https://fonts.googleapis.com/css2?family=DM+Sans:ital,wght@0,400;0,500;0,700&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet"> |
| 21 | + |
| 22 | + <style> |
| 23 | + :root { |
| 24 | + --bg: #fafaf8; |
| 25 | + --surface: #ffffff; |
| 26 | + --text: #1a1a1a; |
| 27 | + --text-secondary: #5a5a5a; |
| 28 | + --accent: #2563eb; |
| 29 | + --accent-hover: #1d4ed8; |
| 30 | + --border: #e5e5e0; |
| 31 | + --code-bg: #1e1e2e; |
| 32 | + --code-text: #cdd6f4; |
| 33 | + --code-keyword: #cba6f7; |
| 34 | + --code-string: #a6e3a1; |
| 35 | + --code-comment: #6c7086; |
| 36 | + --code-type: #89b4fa; |
| 37 | + --code-fn: #f9e2af; |
| 38 | + --code-num: #fab387; |
| 39 | + --tag-bg: #f0fdf4; |
| 40 | + --tag-text: #166534; |
| 41 | + --tag-border: #bbf7d0; |
| 42 | + --shadow-lg: 0 4px 16px rgba(0,0,0,0.08), 0 2px 4px rgba(0,0,0,0.04); |
| 43 | + } |
| 44 | + *, *::before, *::after { margin: 0; padding: 0; box-sizing: border-box; } |
| 45 | + html { scroll-behavior: smooth; font-size: 16px; } |
| 46 | + body { font-family: 'DM Sans', -apple-system, BlinkMacSystemFont, sans-serif; background: var(--bg); color: var(--text); line-height: 1.7; -webkit-font-smoothing: antialiased; } |
| 47 | + .container { max-width: 740px; margin: 0 auto; padding: 0 1.5rem; } |
| 48 | + .nav { padding: 1rem 0; border-bottom: 1px solid var(--border); } |
| 49 | + .nav .container { display: flex; align-items: center; gap: 0.5rem; font-size: 0.88rem; } |
| 50 | + .nav a { color: var(--accent); text-decoration: none; } |
| 51 | + .nav a:hover { text-decoration: underline; } |
| 52 | + .nav .sep { color: var(--text-secondary); } |
| 53 | + .hero { padding: 3.5rem 0 2rem; border-bottom: 1px solid var(--border); } |
| 54 | + .hero-badge { display: inline-block; font-family: 'JetBrains Mono', monospace; font-size: 0.72rem; font-weight: 500; color: var(--tag-text); background: var(--tag-bg); border: 1px solid var(--tag-border); padding: 0.25rem 0.75rem; border-radius: 100px; margin-bottom: 1rem; } |
| 55 | + .hero h1 { font-size: 2.2rem; font-weight: 700; letter-spacing: -0.03em; line-height: 1.15; margin-bottom: 0.75rem; } |
| 56 | + .hero p { font-size: 1.05rem; color: var(--text-secondary); line-height: 1.6; max-width: 620px; } |
| 57 | + .content { padding: 2.5rem 0; } |
| 58 | + .content h2 { font-size: 1.4rem; font-weight: 700; letter-spacing: -0.02em; margin: 2.5rem 0 0.75rem; } |
| 59 | + .content h2:first-child { margin-top: 0; } |
| 60 | + .content p { color: var(--text-secondary); margin-bottom: 1.25rem; line-height: 1.65; } |
| 61 | + .content p a { color: var(--accent); text-decoration: none; } |
| 62 | + .content p a:hover { text-decoration: underline; } |
| 63 | + .content ul { color: var(--text-secondary); margin: 0 0 1.25rem 1.5rem; line-height: 1.65; } |
| 64 | + .content li { margin-bottom: 0.4rem; } |
| 65 | + .content code { font-family: 'JetBrains Mono', monospace; font-size: 0.85em; background: #f0f0ec; padding: 0.15rem 0.4rem; border-radius: 4px; } |
| 66 | + .code-block { background: var(--code-bg); border-radius: 10px; padding: 1.5rem; overflow-x: auto; box-shadow: var(--shadow-lg); margin-bottom: 1.5rem; position: relative; } |
| 67 | + .code-block::before { content: ''; position: absolute; top: 12px; left: 14px; width: 9px; height: 9px; border-radius: 50%; background: #f38ba8; box-shadow: 14px 0 0 #f9e2af, 28px 0 0 #a6e3a1; } |
| 68 | + .code-block pre { padding-top: 0.75rem; font-family: 'JetBrains Mono', monospace; font-size: 0.8rem; line-height: 1.7; color: var(--code-text); } |
| 69 | + .kw { color: var(--code-keyword); } .str { color: var(--code-string); } .cmt { color: var(--code-comment); } .typ { color: var(--code-type); } .fn { color: var(--code-fn); } .num { color: var(--code-num); } |
| 70 | + .install-cmd { display: inline-block; font-family: 'JetBrains Mono', monospace; font-size: 0.88rem; background: var(--code-bg); color: var(--code-text); padding: 0.55rem 1.2rem; border-radius: 8px; margin-bottom: 1.5rem; } |
| 71 | + .install-cmd .prompt { color: var(--code-comment); user-select: none; } |
| 72 | + .comparison-table { width: 100%; border-collapse: collapse; font-size: 0.88rem; margin-bottom: 1.5rem; } |
| 73 | + .comparison-table th { text-align: left; padding: 0.65rem 1rem; border-bottom: 2px solid var(--border); font-weight: 600; } |
| 74 | + .comparison-table td { padding: 0.55rem 1rem; border-bottom: 1px solid var(--border); color: var(--text-secondary); } |
| 75 | + .comparison-table tr td:first-child { color: var(--text); font-weight: 500; } |
| 76 | + .check { color: #16a34a; } |
| 77 | + .cross { color: #dc2626; } |
| 78 | + .partial { color: #d97706; } |
| 79 | + .cta { padding: 3rem 0; text-align: center; border-top: 1px solid var(--border); } |
| 80 | + .cta h2 { font-size: 1.4rem; font-weight: 700; margin-bottom: 0.5rem; } |
| 81 | + .cta p { color: var(--text-secondary); margin-bottom: 1.25rem; } |
| 82 | + .btn { display: inline-flex; align-items: center; gap: 0.5rem; font-family: 'DM Sans', sans-serif; font-size: 0.95rem; font-weight: 500; text-decoration: none; padding: 0.65rem 1.4rem; border-radius: 8px; transition: all 0.15s ease; } |
| 83 | + .btn-primary { background: var(--accent); color: #fff; } |
| 84 | + .btn-primary:hover { background: var(--accent-hover); } |
| 85 | + .btn-secondary { background: var(--surface); color: var(--text); border: 1px solid var(--border); } |
| 86 | + .btn-secondary:hover { border-color: #ccc; } |
| 87 | + .cta-buttons { display: flex; gap: 0.75rem; justify-content: center; flex-wrap: wrap; } |
| 88 | + footer { padding: 2rem 0; text-align: center; border-top: 1px solid var(--border); } |
| 89 | + footer p { font-size: 0.8rem; color: var(--text-secondary); } |
| 90 | + footer a { color: var(--accent); text-decoration: none; } |
| 91 | + footer a:hover { text-decoration: underline; } |
| 92 | + @media (max-width: 640px) { |
| 93 | + .hero h1 { font-size: 1.7rem; } |
| 94 | + .code-block pre { font-size: 0.73rem; } |
| 95 | + .comparison-table { font-size: 0.8rem; } |
| 96 | + .comparison-table th, .comparison-table td { padding: 0.4rem 0.5rem; } |
| 97 | + } |
| 98 | + </style> |
| 99 | + |
| 100 | + <script data-goatcounter="https://odf-kit.goatcounter.com/count" |
| 101 | + async src="//gc.zgo.at/count.js"></script> |
| 102 | +</head> |
| 103 | +<body> |
| 104 | + |
| 105 | + <nav class="nav"> |
| 106 | + <div class="container"> |
| 107 | + <a href="../">odf-kit</a> |
| 108 | + <span class="sep">/</span> |
| 109 | + <span>Guides</span> |
| 110 | + <span class="sep">/</span> |
| 111 | + <span>ODT to HTML</span> |
| 112 | + </div> |
| 113 | + </nav> |
| 114 | + |
| 115 | + <section class="hero"> |
| 116 | + <div class="container"> |
| 117 | + <div class="hero-badge">Developer Guide</div> |
| 118 | + <h1>ODT to HTML in JavaScript</h1> |
| 119 | + <p>Convert .odt files to HTML in Node.js — pure JavaScript, no LibreOffice required. odf-kit reads .odt files and outputs clean, semantic HTML with headings, tables, lists, and inline formatting preserved.</p> |
| 120 | + </div> |
| 121 | + </section> |
| 122 | + |
| 123 | + <section class="content"> |
| 124 | + <div class="container"> |
| 125 | + |
| 126 | + <h2>The problem with ODT-to-HTML in JavaScript</h2> |
| 127 | + <p>The JavaScript ecosystem has <code>mammoth</code> for .docx-to-HTML conversion. For .odt files, the options are much thinner: <code>odt2html</code> was abandoned in 2016, <code>odt.js</code> is limited and unmaintained, and most solutions fall back to shelling out to LibreOffice headless. That approach requires LibreOffice to be installed on the server, adds significant startup latency, and is impractical in serverless environments and impossible on edge runtimes.</p> |
| 128 | + <p>odf-kit fills this gap. The <code>odf-kit/reader</code> module is a pure JavaScript ODT parser that reads .odt files directly, with no native dependencies, no LibreOffice, and no subprocess calls.</p> |
| 129 | + |
| 130 | + <h2>Installation</h2> |
| 131 | + <div class="install-cmd"><span class="prompt">$ </span>npm install odf-kit</div> |
| 132 | + <p>Requires Node.js 22 or later. ESM only — use <code>import</code>, not <code>require</code>.</p> |
| 133 | + |
| 134 | + <h2>Quick start: ODT to HTML in one line</h2> |
| 135 | + <div class="code-block"> |
| 136 | + <pre><span class="kw">import</span> { <span class="fn">odtToHtml</span> } <span class="kw">from</span> <span class="str">"odf-kit/reader"</span>; |
| 137 | +<span class="kw">import</span> { readFileSync } <span class="kw">from</span> <span class="str">"fs"</span>; |
| 138 | + |
| 139 | +<span class="kw">const</span> bytes = <span class="kw">new</span> <span class="typ">Uint8Array</span>(readFileSync(<span class="str">"document.odt"</span>)); |
| 140 | +<span class="kw">const</span> html = <span class="fn">odtToHtml</span>(bytes); |
| 141 | + |
| 142 | +<span class="cmt">// html is a complete HTML document string:</span> |
| 143 | +<span class="cmt">// &lt;!DOCTYPE html&gt;&lt;html&gt;&lt;head&gt;...&lt;/head&gt;&lt;body&gt;...&lt;/body&gt;&lt;/html&gt;</span></pre> |
| 144 | + </div> |
| 145 | + |
| 146 | + <h2>Fragment output (embed in an existing page)</h2> |
| 147 | + <p>Pass <code>{ fragment: true }</code> to get just the body content without the HTML document wrapper — useful when embedding output into an existing page:</p> |
| 148 | + <div class="code-block"> |
| 149 | + <pre><span class="kw">import</span> { <span class="fn">odtToHtml</span> } <span class="kw">from</span> <span class="str">"odf-kit/reader"</span>; |
| 150 | +<span class="kw">import</span> { readFileSync } <span class="kw">from</span> <span class="str">"fs"</span>; |
| 151 | + |
| 152 | +<span class="kw">const</span> bytes = <span class="kw">new</span> <span class="typ">Uint8Array</span>(readFileSync(<span class="str">"document.odt"</span>)); |
| 153 | +<span class="kw">const</span> fragment = <span class="fn">odtToHtml</span>(bytes, { fragment: <span class="kw">true</span> }); |
| 154 | + |
| 155 | +<span class="cmt">// Returns inner body content only:</span> |
| 156 | +<span class="cmt">// &lt;h1&gt;Title&lt;/h1&gt;&lt;p&gt;Body text...&lt;/p&gt;&lt;table&gt;...&lt;/table&gt;</span></pre> |
| 157 | + </div> |
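<p>To drop the fragment into a page of your own, one option is a small template function. The sketch below is illustrative only: <code>escapeHtml</code> and <code>renderPage</code> are hypothetical helpers, not part of odf-kit, and the <code>fragment</code> string is a stand-in for what <code>odtToHtml(bytes, { fragment: true })</code> would return.</p>

```javascript
// Escape plain text for safe use inside HTML element content or attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: wraps an HTML fragment in a minimal page shell.
// The title is plain text and gets escaped; the fragment is already HTML
// and is inserted unchanged.
function renderPage(title, fragment) {
  return [
    "<!DOCTYPE html>",
    '<html lang="en">',
    `<head><meta charset="UTF-8"><title>${escapeHtml(title)}</title></head>`,
    `<body><main class="odt-content">${fragment}</main></body>`,
    "</html>",
  ].join("\n");
}

// Stand-in for the output of odtToHtml(bytes, { fragment: true }).
const fragment = "<h1>Title</h1><p>Body text...</p>";
const page = renderPage("Q3 & Q4 report", fragment);
```

<p>Only the title is escaped here, because the fragment is already HTML. Whether the fragment needs additional sanitizing depends on how far you trust the source documents.</p>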
| 158 | + |
| 159 | + <h2>Access the document model</h2> |
| 160 | + <p>Use <code>readOdt()</code> when you need access to the structured document model — for example to extract metadata, iterate over paragraphs, or build a custom renderer:</p> |
| 161 | + <div class="code-block"> |
| 162 | + <pre><span class="kw">import</span> { <span class="fn">readOdt</span> } <span class="kw">from</span> <span class="str">"odf-kit/reader"</span>; |
| 163 | +<span class="kw">import</span> { readFileSync } <span class="kw">from</span> <span class="str">"fs"</span>; |
| 164 | + |
| 165 | +<span class="kw">const</span> bytes = <span class="kw">new</span> <span class="typ">Uint8Array</span>(readFileSync(<span class="str">"document.odt"</span>)); |
| 166 | +<span class="kw">const</span> doc = <span class="fn">readOdt</span>(bytes); |
| 167 | + |
| 168 | +<span class="cmt">// Document metadata from meta.xml</span> |
| 169 | +console.<span class="fn">log</span>(doc.metadata.title); |
| 170 | +console.<span class="fn">log</span>(doc.metadata.creator); |
| 171 | +console.<span class="fn">log</span>(doc.metadata.creationDate); |
| 172 | + |
| 173 | +<span class="cmt">// Structured body — array of paragraphs, headings, lists, tables</span> |
| 174 | +<span class="kw">for</span> (<span class="kw">const</span> block <span class="kw">of</span> doc.body) { |
| 175 | + <span class="kw">if</span> (block.kind === <span class="str">"heading"</span>) { |
| 176 | + console.<span class="fn">log</span>(`H${block.level}: ${block.spans.<span class="fn">map</span>(s => s.text).<span class="fn">join</span>(<span class="str">""</span>)}`); |
| 177 | + } |
| 178 | +} |
| 179 | + |
| 180 | +<span class="cmt">// Or convert to HTML via toHtml()</span> |
| 181 | +<span class="kw">const</span> html = doc.<span class="fn">toHtml</span>({ fragment: <span class="kw">true</span> });</pre> |
| 182 | + </div> |
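<p>Building on the loop above, a table of contents can be derived from the heading blocks. This is a minimal sketch that assumes only the fields shown in the example (<code>kind</code>, <code>level</code>, <code>spans[].text</code>). The sample <code>body</code> array, the <code>"paragraph"</code> kind, and the helper names are made up for illustration; in real code the array would be <code>doc.body</code>.</p>

```javascript
// Sample data mimicking the block shape used in the example above.
// Assumption: real blocks from readOdt() may carry more fields than these.
const body = [
  { kind: "heading", level: 1, spans: [{ text: "Annual " }, { text: "Report" }] },
  { kind: "paragraph", spans: [{ text: "Intro text." }] },
  { kind: "heading", level: 2, spans: [{ text: "Revenue" }] },
];

// Hypothetical helper: keep only headings and flatten their spans to text.
function buildOutline(blocks) {
  return blocks
    .filter((block) => block.kind === "heading")
    .map((block) => ({
      level: block.level,
      text: block.spans.map((span) => span.text).join(""),
    }));
}

// Hypothetical helper: render the outline as an indented plain-text list,
// two spaces of indentation per heading level below 1.
function formatOutline(outline) {
  return outline
    .map(({ level, text }) => `${"  ".repeat(level - 1)}- ${text}`)
    .join("\n");
}

const outline = buildOutline(body);
console.log(formatOutline(outline));
```

<p>Keeping the outline as plain objects makes it easy to render the same data as a nested list, a sidebar, or JSON.</p>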
| 183 | + |
| 184 | + <h2>What gets extracted</h2> |
| 185 | + <table class="comparison-table"> |
| 186 | + <thead> |
| 187 | + <tr><th>Element</th><th>HTML output</th><th>Status</th></tr> |
| 188 | + </thead> |
| 189 | + <tbody> |
| 190 | + <tr><td>Paragraphs</td><td><code>&lt;p&gt;</code></td><td><span class="check">✓</span></td></tr> |
| 191 | + <tr><td>Headings (levels 1–6)</td><td><code>&lt;h1&gt;</code>–<code>&lt;h6&gt;</code></td><td><span class="check">✓</span></td></tr> |
| 192 | + <tr><td>Bold text</td><td><code>&lt;strong&gt;</code></td><td><span class="check">✓</span></td></tr> |
| 193 | + <tr><td>Italic text</td><td><code>&lt;em&gt;</code></td><td><span class="check">✓</span></td></tr> |
| 194 | + <tr><td>Underline</td><td><code>&lt;u&gt;</code></td><td><span class="check">✓</span></td></tr> |
| 195 | + <tr><td>Strikethrough</td><td><code>&lt;s&gt;</code></td><td><span class="check">✓</span></td></tr> |
| 196 | + <tr><td>Superscript / subscript</td><td><code>&lt;sup&gt;</code> / <code>&lt;sub&gt;</code></td><td><span class="check">✓</span></td></tr> |
| 197 | + <tr><td>Hyperlinks</td><td><code>&lt;a href&gt;</code></td><td><span class="check">✓</span></td></tr> |
| 198 | + <tr><td>Bullet lists</td><td><code>&lt;ul&gt;&lt;li&gt;</code></td><td><span class="check">✓</span></td></tr> |
| 199 | + <tr><td>Numbered lists</td><td><code>&lt;ol&gt;&lt;li&gt;</code></td><td><span class="check">✓</span></td></tr> |
| 200 | + <tr><td>Nested lists</td><td>Nested <code>&lt;ul&gt;</code>/<code>&lt;ol&gt;</code></td><td><span class="check">✓</span></td></tr> |
| 201 | + <tr><td>Tables (including merged cells)</td><td><code>&lt;table&gt;&lt;tr&gt;&lt;td&gt;</code> with colspan/rowspan</td><td><span class="check">✓</span></td></tr> |
| 202 | + <tr><td>Document metadata</td><td>title, creator, dates</td><td><span class="check">✓</span></td></tr> |
| 203 | + <tr><td>Line breaks</td><td><code>&lt;br&gt;</code></td><td><span class="check">✓</span></td></tr> |
| 204 | + <tr><td>Named styles (bold headings, etc.)</td><td>Resolved from styles.xml</td><td><span class="check">✓</span></td></tr> |
| 205 | + <tr><td>Images</td><td>—</td><td><span class="partial">Roadmap</span></td></tr> |
| 206 | + <tr><td>Fonts, colors, font sizes</td><td>—</td><td><span class="partial">Roadmap</span></td></tr> |
| 207 | + <tr><td>Footnotes / endnotes</td><td>—</td><td><span class="partial">Roadmap</span></td></tr> |
| 208 | + </tbody> |
| 209 | + </table> |
| 210 | + |
| 211 | + <h2>Why not shell out to LibreOffice?</h2> |
| 212 | + <p>LibreOffice headless (<code>libreoffice --headless --convert-to html</code>) is the common fallback for ODT-to-HTML conversion. It works, but it comes with real costs:</p> |
| 213 | + <ul> |
| 214 | + <li><strong>Startup time:</strong> LibreOffice can take 2–5 seconds to start, even for a small document</li> |
| 215 | + <li><strong>Deployment complexity:</strong> LibreOffice must be installed on every server, container, and CI runner</li> |
| 216 | + <li><strong>Serverless incompatibility:</strong> LibreOffice cannot run on Cloudflare Workers or other edge runtimes, and it fits platforms like AWS Lambda or Vercel only with significant workarounds such as custom layers or container images, if at all</li> |
| 217 | + <li><strong>Size:</strong> LibreOffice is several hundred megabytes and often dominates Docker image size</li> |
| 218 | + <li><strong>Security surface:</strong> a process spawned from your application with filesystem access is a meaningful attack surface</li> |
| 219 | + </ul> |
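<p>For comparison, the subprocess route usually looks something like the following sketch. The <code>--headless</code>, <code>--convert-to</code>, and <code>--outdir</code> flags are LibreOffice's own; the function names, the <code>libreoffice</code> binary name on <code>PATH</code>, and the 60-second timeout are assumptions for illustration.</p>

```javascript
// Build the argument list for a headless LibreOffice HTML conversion.
function buildConvertArgs(inputPath, outDir) {
  return ["--headless", "--convert-to", "html", "--outdir", outDir, inputPath];
}

// Hypothetical wrapper: spawns LibreOffice and waits for it to exit.
// Assumes a "libreoffice" binary is installed and on PATH.
async function convertWithLibreOffice(inputPath, outDir, binary = "libreoffice") {
  // Dynamic imports keep the snippet usable from both ESM and CommonJS.
  const { execFile } = await import("node:child_process");
  const { promisify } = await import("node:util");
  const run = promisify(execFile);

  // execFile passes arguments as an array, so no shell quoting is involved.
  // The timeout (an assumed value) kills a conversion that hangs.
  await run(binary, buildConvertArgs(inputPath, outDir), { timeout: 60000 });
}
```

<p>Calling <code>convertWithLibreOffice("document.odt", "out")</code> would write <code>out/document.html</code>, but only on a machine where LibreOffice is installed, and each call pays the process startup cost described above.</p>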
| 220 | + <p>odf-kit parses .odt files directly in JavaScript. No subprocess, no installed software, no startup penalty. It works anywhere Node.js 22 or later runs.</p> |
| 221 | + |
| 222 | + <h2>Comparison with other options</h2> |
| 223 | + <table class="comparison-table"> |
| 224 | + <thead> |
| 225 | + <tr><th>Option</th><th>Pure JS</th><th>Maintained</th><th>Tables</th><th>Metadata</th></tr> |
| 226 | + </thead> |
| 227 | + <tbody> |
| 228 | + <tr><td>odf-kit/reader</td><td><span class="check">✓</span></td><td><span class="check">✓</span> Active (2026)</td><td><span class="check">✓</span></td><td><span class="check">✓</span></td></tr> |
| 229 | + <tr><td>LibreOffice headless</td><td><span class="cross">✗</span> Subprocess</td><td><span class="check">✓</span></td><td><span class="check">✓</span></td><td><span class="partial">Partial</span></td></tr> |
| 230 | + <tr><td>odt2html (npm)</td><td><span class="check">✓</span></td><td><span class="cross">✗</span> Abandoned (2016)</td><td><span class="cross">✗</span></td><td><span class="cross">✗</span></td></tr> |
| 231 | + <tr><td>odt.js</td><td><span class="check">✓</span></td><td><span class="cross">✗</span> Unmaintained</td><td><span class="partial">Partial</span></td><td><span class="cross">✗</span></td></tr> |
| 232 | + </tbody> |
| 233 | + </table> |
| 234 | + |
| 235 | + <h2>odf-kit does more than read</h2> |
| 236 | + <p>The same library that reads .odt files can also create them from scratch and fill existing templates with data. If you're building a document pipeline — generate, fill, or convert — odf-kit handles all three without additional dependencies.</p> |
| 237 | + <ul> |
| 238 | + <li><a href="fill-odt-template-javascript.html">Fill .odt templates with JavaScript</a> — design in LibreOffice, fill from code</li> |
| 239 | + <li><a href="../">Create .odt files programmatically</a> — headings, tables, images, lists, page layout</li> |
| 240 | + </ul> |
| 241 | + |
| 242 | + </div> |
| 243 | + </section> |
| 244 | + |
| 245 | + <section class="cta"> |
| 246 | + <div class="container"> |
| 247 | + <h2>Convert your first .odt file</h2> |
| 248 | + <div class="install-cmd" style="margin-bottom: 1.25rem;"><span class="prompt">$ </span>npm install odf-kit</div> |
| 249 | + <div class="cta-buttons"> |
| 250 | + <a href="https://github.com/GitHubNewbie0/odf-kit#readme" class="btn btn-primary">Full documentation</a> |
| 251 | + <a href="https://www.npmjs.com/package/odf-kit" class="btn btn-secondary">npm</a> |
| 252 | + </div> |
| 253 | + </div> |
| 254 | + </section> |
| 255 | + |
| 256 | + <footer> |
| 257 | + <div class="container"> |
| 258 | + <p><strong>odf-kit</strong> · Apache 2.0 · <a href="https://github.com/GitHubNewbie0/odf-kit">GitHub</a> · <a href="https://www.npmjs.com/package/odf-kit">npm</a> · <a href="../">Home</a></p> |
| 259 | + </div> |
| 260 | + </footer> |
| 261 | + |
| 262 | +</body> |
| 263 | +</html> |