The Web client runs OCR entirely in the browser using the IBM Granite Docling 258M model via WebGPU or WASM. No server is required.
Install the required peer dependencies:
npm install @huggingface/transformers onnxruntime-webFor PDF support, also install:
npm install unpdfWebGPU is the recommended backend for best performance. If WebGPU is unavailable, the client falls back to WASM.
| Backend | Performance | Compatibility |
|---|---|---|
webgpu |
Fast (GPU-accelerated) | Chrome 113+, Edge 113+ |
wasm |
Slower (CPU-based) | All modern browsers |
auto |
Tries WebGPU first, falls back to WASM | All modern browsers |
import { createWebClient } from "docling-sdk/web";
const client = createWebClient({ device: "webgpu" });Or through the main factory:
import { Docling } from "docling-sdk";
const client = new Docling({
web: {
device: "webgpu",
maxNewTokens: 4096,
},
});Download and load the model. The first call downloads the model (~500 MB), which is then cached in IndexedDB:
client.on("loading", ({ progress, status }) => {
console.log(`Loading: ${Math.round(progress * 100)}% - ${status}`);
});
await client.initialize();
// client.ready === trueTerminate the Web Worker and release resources:
client.destroy();
// client.ready === false| Property | Type | Description |
|---|---|---|
client.ready |
boolean |
true after initialize() completes |
client.processing |
boolean |
true while processImage() is running |
processImage accepts several input types:
const result = await client.processImage(input);| Input type | Description |
|---|---|
File |
File from <input type="file"> |
Blob |
Binary data blob |
string |
Data URL or base64 string |
HTMLCanvasElement |
Canvas element |
HTMLImageElement |
Image element |
ImageBitmap |
Decoded image bitmap |
OffscreenCanvas |
Offscreen canvas |
const result = await client.processImage(file, {
maxNewTokens: 8192, // Override max tokens for this call
});The result object contains all output formats:
const result = await client.processImage(file);
result.raw; // Raw DocTags markup
result.markdown; // Converted Markdown
result.html; // Converted HTML
result.plainText; // Plain text
result.json; // Structured WebOCRDocument
result.tables; // Extracted tables: { headers: string[], rows: string[][] }[]
result.overlays; // Bounding boxes: { tagType: string, bbox: {...} }[]The Web client also implements DoclingClientBase, so you can convert PDFs (requires unpdf peer dependency):
const result = await client.convert(pdfBuffer, "document.pdf", {
to_formats: ["md"],
});
console.log(result.document.md_content);Convenience methods:
const md = await client.toMarkdown(pdfBuffer, "doc.pdf");
const html = await client.toHtml(pdfBuffer, "doc.pdf");
const text = await client.extractText(pdfBuffer, "doc.pdf");Subscribe to events with on and unsubscribe with off:
client.on("loading", ({ progress, status }) => {
// Model download progress (0-1)
});
client.on("ready", () => {
// Model loaded and ready
});
client.on("status", ({ status }) => {
// Processing status updates
});
client.on("stream", ({ chunk, progress }) => {
// Streaming tokens during inference
process.stdout.write(chunk);
});
client.on("complete", (result) => {
// Processing complete, result is WebOCRResult
});
client.on("error", ({ message, code }) => {
// Error during initialization or processing
});| Event | Data | When |
|---|---|---|
loading |
{ progress: number, status: string } |
During model download |
ready |
undefined |
Model loaded |
status |
{ status: string } |
Processing status change |
stream |
{ chunk: string, progress: number } |
Token generated during inference |
complete |
WebOCRResult |
Processing finished |
error |
{ message: string, code?: string } |
Error occurred |
The model is cached in IndexedDB after the first download:
// Get cache size in bytes
const size = await client.getCacheSize();
console.log(`Cache: ${(size / 1024 / 1024).toFixed(1)} MB`);
// Clear the cache (forces re-download on next initialize())
const cleared = await client.clearCache();The docling-sdk/web entry point exports converter and extractor functions that work without the full client:
Convert a WebOCRDocument (the .json field of WebOCRResult) to different formats:
import {
doclingToHtml,
doclingToMarkdown,
doclingToPlainText,
doclingToJson,
} from "docling-sdk/web";
const html = doclingToHtml(result.json);
const markdown = doclingToMarkdown(result.json);
const text = doclingToPlainText(result.json);
const json = doclingToJson(result.json);Extract structured data from the document:
import { extractTables, tableToCSV, extractOverlays } from "docling-sdk/web";
const tables = extractTables(result.json);
for (const table of tables) {
console.log(table.headers);
console.log(table.rows);
console.log(tableToCSV(table));
}
const overlays = extractOverlays(result.raw);
for (const overlay of overlays) {
console.log(overlay.tagType, overlay.bbox);
}Render PDF pages to images (requires unpdf):
import { renderPdfToImages } from "docling-sdk/web";
const pages = await renderPdfToImages(pdfBuffer);
for (const page of pages) {
console.log(page.pageNumber, page.width, page.height);
// page.dataUrl is a data URL you can pass to processImage()
}import { clearModelCache, getModelCacheSize } from "docling-sdk/web";
const size = await getModelCacheSize();
await clearModelCache();The repository includes a full demo application:
npm run examples:web-ocrThis starts a Vite dev server at http://localhost:5173 with a drag-and-drop interface for testing the Web OCR client.
- Getting Started -- first Web OCR conversion
- Configuration -- Web config options
- Cross-Runtime -- browser entry points
- API Reference -- full method reference