Commit f175110

Add ODT-to-HTML guide page; update index to v0.5.2, 410 tests
1 parent c4e2149 commit f175110

2 files changed

Lines changed: 269 additions & 4 deletions

Lines changed: 263 additions & 0 deletions
@@ -0,0 +1,263 @@
1+
<!DOCTYPE html>
2+
<html lang="en">
3+
<head>
4+
<meta charset="UTF-8">
5+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
6+
<title>ODT to HTML in JavaScript — Convert .odt Files with odf-kit</title>
7+
<meta name="description" content="Convert .odt files to HTML in Node.js with odf-kit. Pure JavaScript, no LibreOffice required. npm install odf-kit. Extracts text, headings, tables, lists, bold, italic, and hyperlinks.">
8+
<meta name="keywords" content="odt to html javascript, npm odt-to-html, odt parser npm, read odt nodejs, convert odt html node, odt-parser npm, npm install odt-to-html, parse odt javascript, odt reader javascript, openDocument html conversion">
9+
<meta name="author" content="GitHubNewbie0">
10+
<meta name="robots" content="index, follow">
11+
<link rel="canonical" href="https://githubnewbie0.github.io/odf-kit/guides/odt-to-html-javascript.html">
12+
13+
<meta property="og:type" content="article">
14+
<meta property="og:title" content="ODT to HTML in JavaScript — odf-kit">
15+
<meta property="og:description" content="Convert .odt files to HTML in Node.js with odf-kit. Pure JavaScript, no LibreOffice required.">
16+
<meta property="og:url" content="https://githubnewbie0.github.io/odf-kit/guides/odt-to-html-javascript.html">
17+
18+
<link rel="preconnect" href="https://fonts.googleapis.com">
19+
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
20+
<link href="https://fonts.googleapis.com/css2?family=DM+Sans:ital,wght@0,400;0,500;0,700&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet">
21+
22+
<style>
23+
:root {
24+
--bg: #fafaf8;
25+
--surface: #ffffff;
26+
--text: #1a1a1a;
27+
--text-secondary: #5a5a5a;
28+
--accent: #2563eb;
29+
--accent-hover: #1d4ed8;
30+
--border: #e5e5e0;
31+
--code-bg: #1e1e2e;
32+
--code-text: #cdd6f4;
33+
--code-keyword: #cba6f7;
34+
--code-string: #a6e3a1;
35+
--code-comment: #6c7086;
36+
--code-type: #89b4fa;
37+
--code-fn: #f9e2af;
38+
--code-num: #fab387;
39+
--tag-bg: #f0fdf4;
40+
--tag-text: #166534;
41+
--tag-border: #bbf7d0;
42+
--shadow-lg: 0 4px 16px rgba(0,0,0,0.08), 0 2px 4px rgba(0,0,0,0.04);
43+
}
44+
*, *::before, *::after { margin: 0; padding: 0; box-sizing: border-box; }
45+
html { scroll-behavior: smooth; font-size: 16px; }
46+
body { font-family: 'DM Sans', -apple-system, BlinkMacSystemFont, sans-serif; background: var(--bg); color: var(--text); line-height: 1.7; -webkit-font-smoothing: antialiased; }
47+
.container { max-width: 740px; margin: 0 auto; padding: 0 1.5rem; }
48+
.nav { padding: 1rem 0; border-bottom: 1px solid var(--border); }
49+
.nav .container { display: flex; align-items: center; gap: 0.5rem; font-size: 0.88rem; }
50+
.nav a { color: var(--accent); text-decoration: none; }
51+
.nav a:hover { text-decoration: underline; }
52+
.nav .sep { color: var(--text-secondary); }
53+
.hero { padding: 3.5rem 0 2rem; border-bottom: 1px solid var(--border); }
54+
.hero-badge { display: inline-block; font-family: 'JetBrains Mono', monospace; font-size: 0.72rem; font-weight: 500; color: var(--tag-text); background: var(--tag-bg); border: 1px solid var(--tag-border); padding: 0.25rem 0.75rem; border-radius: 100px; margin-bottom: 1rem; }
55+
.hero h1 { font-size: 2.2rem; font-weight: 700; letter-spacing: -0.03em; line-height: 1.15; margin-bottom: 0.75rem; }
56+
.hero p { font-size: 1.05rem; color: var(--text-secondary); line-height: 1.6; max-width: 620px; }
57+
.content { padding: 2.5rem 0; }
58+
.content h2 { font-size: 1.4rem; font-weight: 700; letter-spacing: -0.02em; margin: 2.5rem 0 0.75rem; }
59+
.content h2:first-child { margin-top: 0; }
60+
.content p { color: var(--text-secondary); margin-bottom: 1.25rem; line-height: 1.65; }
61+
.content p a { color: var(--accent); text-decoration: none; }
62+
.content p a:hover { text-decoration: underline; }
63+
.content ul { color: var(--text-secondary); margin: 0 0 1.25rem 1.5rem; line-height: 1.65; }
64+
.content li { margin-bottom: 0.4rem; }
65+
.content code { font-family: 'JetBrains Mono', monospace; font-size: 0.85em; background: #f0f0ec; padding: 0.15rem 0.4rem; border-radius: 4px; }
66+
.code-block { background: var(--code-bg); border-radius: 10px; padding: 1.5rem; overflow-x: auto; box-shadow: var(--shadow-lg); margin-bottom: 1.5rem; position: relative; }
67+
.code-block::before { content: ''; position: absolute; top: 12px; left: 14px; width: 9px; height: 9px; border-radius: 50%; background: #f38ba8; box-shadow: 14px 0 0 #f9e2af, 28px 0 0 #a6e3a1; }
68+
.code-block pre { padding-top: 0.75rem; font-family: 'JetBrains Mono', monospace; font-size: 0.8rem; line-height: 1.7; color: var(--code-text); }
69+
.kw { color: var(--code-keyword); } .str { color: var(--code-string); } .cmt { color: var(--code-comment); } .typ { color: var(--code-type); } .fn { color: var(--code-fn); } .num { color: var(--code-num); }
70+
.install-cmd { display: inline-block; font-family: 'JetBrains Mono', monospace; font-size: 0.88rem; background: var(--code-bg); color: var(--code-text); padding: 0.55rem 1.2rem; border-radius: 8px; margin-bottom: 1.5rem; }
71+
.install-cmd .prompt { color: var(--code-comment); user-select: none; }
72+
.comparison-table { width: 100%; border-collapse: collapse; font-size: 0.88rem; margin-bottom: 1.5rem; }
73+
.comparison-table th { text-align: left; padding: 0.65rem 1rem; border-bottom: 2px solid var(--border); font-weight: 600; }
74+
.comparison-table td { padding: 0.55rem 1rem; border-bottom: 1px solid var(--border); color: var(--text-secondary); }
75+
.comparison-table tr td:first-child { color: var(--text); font-weight: 500; }
76+
.check { color: #16a34a; }
77+
.cross { color: #dc2626; }
78+
.partial { color: #d97706; }
79+
.cta { padding: 3rem 0; text-align: center; border-top: 1px solid var(--border); }
80+
.cta h2 { font-size: 1.4rem; font-weight: 700; margin-bottom: 0.5rem; }
81+
.cta p { color: var(--text-secondary); margin-bottom: 1.25rem; }
82+
.btn { display: inline-flex; align-items: center; gap: 0.5rem; font-family: 'DM Sans', sans-serif; font-size: 0.95rem; font-weight: 500; text-decoration: none; padding: 0.65rem 1.4rem; border-radius: 8px; transition: all 0.15s ease; }
83+
.btn-primary { background: var(--accent); color: #fff; }
84+
.btn-primary:hover { background: var(--accent-hover); }
85+
.btn-secondary { background: var(--surface); color: var(--text); border: 1px solid var(--border); }
86+
.btn-secondary:hover { border-color: #ccc; }
87+
.cta-buttons { display: flex; gap: 0.75rem; justify-content: center; flex-wrap: wrap; }
88+
footer { padding: 2rem 0; text-align: center; border-top: 1px solid var(--border); }
89+
footer p { font-size: 0.8rem; color: var(--text-secondary); }
90+
footer a { color: var(--accent); text-decoration: none; }
91+
footer a:hover { text-decoration: underline; }
92+
@media (max-width: 640px) {
93+
.hero h1 { font-size: 1.7rem; }
94+
.code-block pre { font-size: 0.73rem; }
95+
.comparison-table { font-size: 0.8rem; }
96+
.comparison-table th, .comparison-table td { padding: 0.4rem 0.5rem; }
97+
}
98+
</style>
99+
100+
<script data-goatcounter="https://odf-kit.goatcounter.com/count"
101+
async src="//gc.zgo.at/count.js"></script>
102+
</head>
103+
<body>
104+
105+
<nav class="nav">
106+
<div class="container">
107+
<a href="../">odf-kit</a>
108+
<span class="sep">/</span>
109+
<span>Guides</span>
110+
<span class="sep">/</span>
111+
<span>ODT to HTML</span>
112+
</div>
113+
</nav>
114+
115+
<section class="hero">
116+
<div class="container">
117+
<div class="hero-badge">Developer Guide</div>
118+
<h1>ODT to HTML in JavaScript</h1>
119+
<p>Convert .odt files to HTML in Node.js — pure JavaScript, no LibreOffice required. odf-kit reads .odt files and outputs clean, semantic HTML with headings, tables, lists, and inline formatting preserved.</p>
120+
</div>
121+
</section>
122+
123+
<section class="content">
124+
<div class="container">
125+
126+
<h2>The problem with ODT-to-HTML in JavaScript</h2>
127+
<p>The JavaScript ecosystem has <code>mammoth</code> for .docx-to-HTML conversion. For .odt files, the options have been far worse: <code>odt2html</code> was abandoned in 2016, <code>odt.js</code> is limited and unmaintained, and most solutions fall back to shelling out to LibreOffice headless, which requires LibreOffice to be installed on the server, adds significant startup latency, and is impractical in most serverless and edge environments.</p>
128+
<p>odf-kit fills this gap. The <code>odf-kit/reader</code> module is a pure JavaScript ODT parser that reads .odt files directly, with no native dependencies, no LibreOffice, and no subprocess calls.</p>
129+
130+
<h2>Installation</h2>
131+
<div class="install-cmd"><span class="prompt">$ </span>npm install odf-kit</div>
132+
<p>Requires Node.js 22 or later. ESM only — use <code>import</code>, not <code>require</code>.</p>
133+
134+
<h2>Quick start: ODT to HTML in one line</h2>
135+
<div class="code-block">
136+
<pre><span class="kw">import</span> { <span class="fn">odtToHtml</span> } <span class="kw">from</span> <span class="str">"odf-kit/reader"</span>;
137+
<span class="kw">import</span> { readFileSync } <span class="kw">from</span> <span class="str">"fs"</span>;
138+
139+
<span class="kw">const</span> bytes = <span class="kw">new</span> <span class="typ">Uint8Array</span>(readFileSync(<span class="str">"document.odt"</span>));
140+
<span class="kw">const</span> html = <span class="fn">odtToHtml</span>(bytes);
141+
142+
<span class="cmt">// html is a complete HTML document string:</span>
143+
<span class="cmt">// &lt;!DOCTYPE html&gt;&lt;html&gt;&lt;head&gt;...&lt;/head&gt;&lt;body&gt;...&lt;/body&gt;&lt;/html&gt;</span></pre>
144+
</div>
145+
146+
<h2>Fragment output (embed in an existing page)</h2>
147+
<p>Pass <code>{ fragment: true }</code> to get just the body content without the HTML document wrapper — useful when embedding output into an existing page:</p>
148+
<div class="code-block">
149+
<pre><span class="kw">import</span> { <span class="fn">odtToHtml</span> } <span class="kw">from</span> <span class="str">"odf-kit/reader"</span>;
150+
<span class="kw">import</span> { readFileSync } <span class="kw">from</span> <span class="str">"fs"</span>;
151+
152+
<span class="kw">const</span> bytes = <span class="kw">new</span> <span class="typ">Uint8Array</span>(readFileSync(<span class="str">"document.odt"</span>));
153+
<span class="kw">const</span> fragment = <span class="fn">odtToHtml</span>(bytes, { fragment: <span class="kw">true</span> });
154+
155+
<span class="cmt">// Returns inner body content only:</span>
156+
<span class="cmt">// &lt;h1&gt;Title&lt;/h1&gt;&lt;p&gt;Body text...&lt;/p&gt;&lt;table&gt;...&lt;/table&gt;</span></pre>
157+
</div>
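<p>As a rough illustration of the embedding step, the sketch below wraps a fragment in a host page. The helper names <code>escapeHtml</code> and <code>renderPage</code> and the <code>odt-content</code> class are hypothetical and not part of odf-kit, and the sample string stands in for what <code>odtToHtml(bytes, { fragment: true })</code> would return:</p>

```javascript
// Escape text so it is safe inside HTML element content or attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap a converted fragment in a host page (hypothetical helper).
// `fragment` is assumed to be the string returned by
// odtToHtml(bytes, { fragment: true }); it is inserted as-is.
function renderPage(title, fragment) {
  return [
    "<!DOCTYPE html>",
    '<html lang="en">',
    `<head><meta charset="UTF-8"><title>${escapeHtml(title)}</title></head>`,
    `<body><main class="odt-content">${fragment}</main></body>`,
    "</html>",
  ].join("\n");
}

// Stand-in for real converter output, so the sketch runs without an .odt file.
const sampleFragment = "<h1>Title</h1><p>Body text</p>";
const page = renderPage("Q3 <Report>", sampleFragment);
console.log(page);
```

<p>Only the title is escaped here; the fragment is trusted markup. If the .odt files come from untrusted users, it may be worth passing the fragment through an HTML sanitizer before embedding it.</p>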
158+
159+
<h2>Access the document model</h2>
160+
<p>Use <code>readOdt()</code> when you need access to the structured document model — for example to extract metadata, iterate over paragraphs, or build a custom renderer:</p>
161+
<div class="code-block">
162+
<pre><span class="kw">import</span> { <span class="fn">readOdt</span> } <span class="kw">from</span> <span class="str">"odf-kit/reader"</span>;
163+
<span class="kw">import</span> { readFileSync } <span class="kw">from</span> <span class="str">"fs"</span>;
164+
165+
<span class="kw">const</span> bytes = <span class="kw">new</span> <span class="typ">Uint8Array</span>(readFileSync(<span class="str">"document.odt"</span>));
166+
<span class="kw">const</span> doc = <span class="fn">readOdt</span>(bytes);
167+
168+
<span class="cmt">// Document metadata from meta.xml</span>
169+
console.<span class="fn">log</span>(doc.metadata.title);
170+
console.<span class="fn">log</span>(doc.metadata.creator);
171+
console.<span class="fn">log</span>(doc.metadata.creationDate);
172+
173+
<span class="cmt">// Structured body — array of paragraphs, headings, lists, tables</span>
174+
<span class="kw">for</span> (<span class="kw">const</span> block <span class="kw">of</span> doc.body) {
175+
<span class="kw">if</span> (block.kind === <span class="str">"heading"</span>) {
176+
console.<span class="fn">log</span>(`H${block.level}: `, block.spans.<span class="fn">map</span>(s => s.text).<span class="fn">join</span>(<span class="str">""</span>));
177+
}
178+
}
179+
180+
<span class="cmt">// Or convert to HTML via toHtml()</span>
181+
<span class="kw">const</span> html = doc.<span class="fn">toHtml</span>({ fragment: <span class="kw">true</span> });</pre>
182+
</div>
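<p>Building on the heading loop above, a document outline could be sketched as follows. This assumes only the block shape used in that loop (<code>kind</code>, <code>level</code>, <code>spans[].text</code>); the <code>"paragraph"</code> kind and the helper names <code>buildOutline</code> and <code>formatOutline</code> are illustrative assumptions, and the sample array stands in for a real <code>doc.body</code>:</p>

```javascript
// Collect headings from a body array shaped like the blocks in the loop above:
// { kind: "heading", level: number, spans: [{ text: string }] }.
function buildOutline(body) {
  return body
    .filter((block) => block.kind === "heading")
    .map((block) => ({
      level: block.level,
      text: block.spans.map((span) => span.text).join(""),
    }));
}

// Render the outline as indented plain text, two spaces per heading level.
function formatOutline(outline) {
  return outline
    .map((entry) => "  ".repeat(entry.level - 1) + entry.text)
    .join("\n");
}

// Hand-written stand-in for doc.body; a real one would come from readOdt(bytes).
// The "paragraph" kind is an assumption made for this sketch.
const sampleBody = [
  { kind: "heading", level: 1, spans: [{ text: "Annual " }, { text: "Report" }] },
  { kind: "paragraph", spans: [{ text: "Intro text." }] },
  { kind: "heading", level: 2, spans: [{ text: "Revenue" }] },
];

console.log(formatOutline(buildOutline(sampleBody)));
// Annual Report
//   Revenue
```

<p>Keeping the outline as plain data (level plus text) leaves the choice of rendering, such as a nested list or a sidebar, to the caller.</p>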
183+
184+
<h2>What gets extracted</h2>
185+
<table class="comparison-table">
186+
<thead>
187+
<tr><th>Element</th><th>HTML output</th><th>Status</th></tr>
188+
</thead>
189+
<tbody>
190+
<tr><td>Paragraphs</td><td><code>&lt;p&gt;</code></td><td><span class="check">✓</span></td></tr>
191+
<tr><td>Headings (levels 1–6)</td><td><code>&lt;h1&gt;</code> to <code>&lt;h6&gt;</code></td><td><span class="check">✓</span></td></tr>
192+
<tr><td>Bold text</td><td><code>&lt;strong&gt;</code></td><td><span class="check">✓</span></td></tr>
193+
<tr><td>Italic text</td><td><code>&lt;em&gt;</code></td><td><span class="check">✓</span></td></tr>
194+
<tr><td>Underline</td><td><code>&lt;u&gt;</code></td><td><span class="check">✓</span></td></tr>
195+
<tr><td>Strikethrough</td><td><code>&lt;s&gt;</code></td><td><span class="check">✓</span></td></tr>
196+
<tr><td>Superscript / subscript</td><td><code>&lt;sup&gt;</code> / <code>&lt;sub&gt;</code></td><td><span class="check">✓</span></td></tr>
197+
<tr><td>Hyperlinks</td><td><code>&lt;a href&gt;</code></td><td><span class="check">✓</span></td></tr>
198+
<tr><td>Bullet lists</td><td><code>&lt;ul&gt;&lt;li&gt;</code></td><td><span class="check">✓</span></td></tr>
199+
<tr><td>Numbered lists</td><td><code>&lt;ol&gt;&lt;li&gt;</code></td><td><span class="check">✓</span></td></tr>
200+
<tr><td>Nested lists</td><td>Nested <code>&lt;ul&gt;</code>/<code>&lt;ol&gt;</code></td><td><span class="check">✓</span></td></tr>
201+
<tr><td>Tables (including merged cells)</td><td><code>&lt;table&gt;&lt;tr&gt;&lt;td&gt;</code> with colspan/rowspan</td><td><span class="check">✓</span></td></tr>
202+
<tr><td>Document metadata</td><td>title, creator, dates</td><td><span class="check">✓</span></td></tr>
203+
<tr><td>Line breaks</td><td><code>&lt;br&gt;</code></td><td><span class="check">✓</span></td></tr>
204+
<tr><td>Named styles (bold headings, etc.)</td><td>Resolved from styles.xml</td><td><span class="check">✓</span></td></tr>
205+
<tr><td>Images</td><td></td><td><span class="partial">Roadmap</span></td></tr>
206+
<tr><td>Fonts, colors, font sizes</td><td></td><td><span class="partial">Roadmap</span></td></tr>
207+
<tr><td>Footnotes / endnotes</td><td></td><td><span class="partial">Roadmap</span></td></tr>
208+
</tbody>
209+
</table>
210+
211+
<h2>Why not shell out to LibreOffice?</h2>
212+
<p>LibreOffice headless (<code>libreoffice --headless --convert-to html</code>) is the common fallback for ODT-to-HTML conversion. It works, but it comes with real costs:</p>
213+
<ul>
214+
<li><strong>Startup time</strong> — LibreOffice takes 2–5 seconds to start even for a small document</li>
215+
<li><strong>Deployment complexity</strong> — LibreOffice must be installed on every server, container, and CI runner</li>
216+
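<li><strong>Serverless constraints</strong> — LibreOffice cannot run on Cloudflare Workers or other edge runtimes that do not allow spawning processes, and on function platforms such as AWS Lambda it has to be bundled as a large custom layer or container image</li>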
217+
<li><strong>Size</strong> — LibreOffice is several hundred megabytes; it dominates Docker image size</li>
218+
<li><strong>Security surface</strong> — a process spawned from your application with filesystem access is a meaningful attack surface</li>
219+
</ul>
220+
<p>odf-kit parses .odt files directly in JavaScript. No subprocess, no installed software, no startup penalty. It works anywhere Node.js runs.</p>
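<p>To make the no-subprocess point concrete, here is a minimal sketch of a fetch-style request handler, using the <code>Request</code> and <code>Response</code> globals available in Node.js 18 and later. The factory name <code>makeOdtHandler</code> is hypothetical, and the converter is injected so the sketch runs on its own; in real use you would presumably pass <code>odtToHtml</code>. Whether the converter throws on malformed input is an assumption:</p>

```javascript
// Build a fetch-style handler around any (bytes) => string converter.
// In real use, `convert` would be odtToHtml from "odf-kit/reader".
function makeOdtHandler(convert) {
  return async function handle(request) {
    if (request.method !== "POST") {
      return new Response("POST an .odt file as the request body", { status: 405 });
    }
    const bytes = new Uint8Array(await request.arrayBuffer());
    if (bytes.length === 0) {
      return new Response("Empty request body", { status: 400 });
    }
    try {
      const html = convert(bytes);
      return new Response(html, {
        headers: { "content-type": "text/html; charset=utf-8" },
      });
    } catch (err) {
      // Assumption: the converter throws on input it cannot parse.
      return new Response("Could not convert document", { status: 422 });
    }
  };
}

// Demo with a fake converter so the sketch runs without any .odt file.
const fakeConvert = (bytes) => `<p>${bytes.length} bytes received</p>`;
const handle = makeOdtHandler(fakeConvert);

const demoRequest = new Request("http://localhost/convert", {
  method: "POST",
  body: new Uint8Array([1, 2, 3]),
});
handle(demoRequest)
  .then((response) => response.text())
  .then((text) => console.log(text));
```

<p>Because the conversion is an in-process function call, the handler has no temporary files or child processes to clean up; the same shape could be adapted to whichever function signature your hosting platform expects.</p>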
221+
222+
<h2>Comparison with other options</h2>
223+
<table class="comparison-table">
224+
<thead>
225+
<tr><th>Option</th><th>Pure JS</th><th>Maintained</th><th>Tables</th><th>Metadata</th></tr>
226+
</thead>
227+
<tbody>
228+
<tr><td>odf-kit/reader</td><td><span class="check">✓</span></td><td><span class="check">✓</span> Active (2026)</td><td><span class="check">✓</span></td><td><span class="check">✓</span></td></tr>
229+
<tr><td>LibreOffice headless</td><td><span class="cross">✗</span> Subprocess</td><td><span class="check">✓</span></td><td><span class="check">✓</span></td><td><span class="partial">Partial</span></td></tr>
230+
<tr><td>odt2html (npm)</td><td><span class="check">✓</span></td><td><span class="cross">✗</span> Abandoned (2016)</td><td><span class="cross">✗</span></td><td><span class="cross">✗</span></td></tr>
231+
<tr><td>odt.js</td><td><span class="check">✓</span></td><td><span class="cross">✗</span> Unmaintained</td><td><span class="partial">Partial</span></td><td><span class="cross">✗</span></td></tr>
232+
</tbody>
233+
</table>
234+
235+
<h2>odf-kit does more than read</h2>
236+
<p>The same library that reads .odt files can also create them from scratch and fill existing templates with data. If you're building a document pipeline — generate, fill, or convert — odf-kit handles all three without additional dependencies.</p>
237+
<ul>
238+
<li><a href="fill-odt-template-javascript.html">Fill .odt templates with JavaScript</a> — design in LibreOffice, fill from code</li>
239+
<li><a href="../">Create .odt files programmatically</a> — headings, tables, images, lists, page layout</li>
240+
</ul>
241+
242+
</div>
243+
</section>
244+
245+
<section class="cta">
246+
<div class="container">
247+
<h2>Convert your first .odt file</h2>
248+
<div class="install-cmd" style="margin-bottom: 1.25rem;"><span class="prompt">$ </span>npm install odf-kit</div>
249+
<div class="cta-buttons">
250+
<a href="https://github.com/GitHubNewbie0/odf-kit#readme" class="btn btn-primary">Full documentation</a>
251+
<a href="https://www.npmjs.com/package/odf-kit" class="btn btn-secondary">npm</a>
252+
</div>
253+
</div>
254+
</section>
255+
256+
<footer>
257+
<div class="container">
258+
<p><strong>odf-kit</strong> · Apache 2.0 · <a href="https://github.com/GitHubNewbie0/odf-kit">GitHub</a> · <a href="https://www.npmjs.com/package/odf-kit">npm</a> · <a href="../">Home</a></p>
259+
</div>
260+
</footer>
261+
262+
</body>
263+
</html>
