93 changes: 93 additions & 0 deletions docs/examples/agent_skill/docling-document-intelligence/EXAMPLE.md
@@ -0,0 +1,93 @@
# Using the Docling agent skill

[Agent Skills](https://agentskills.io/specification) are folders of instructions that AI coding agents (Cursor, Claude Code, GitHub Copilot, etc.) can load when relevant.

## Where this bundle lives

- **Cursor (local):** `~/.cursor/skills/docling-document-intelligence/` (or copy this folder there).
- **Docling repository (docs + PRs):** `docs/examples/agent_skill/docling-document-intelligence/` in [github.com/docling-project/docling](https://github.com/docling-project/docling).

The two trees are kept in sync; use either source.

## Install (copy into your agent’s skills directory)

```bash
# From a checkout of the Docling repo
mkdir -p ~/.cursor/skills   # create the skills directory if it does not exist yet
cp -r docs/examples/agent_skill/docling-document-intelligence ~/.cursor/skills/

# Or copy from another machine / archive into e.g. ~/.claude/skills/
```
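The copy step can also be scripted, which helps when installing into more than one agent's skills directory. The following is a minimal sketch, not part of the bundle: the function name `install_skill` is hypothetical, and it assumes only that a valid bundle contains `SKILL.md` at its root.

```python
import shutil
from pathlib import Path


def install_skill(bundle_dir: Path, skills_root: Path) -> Path:
    """Copy a skill bundle into an agent's skills directory (hypothetical helper)."""
    # Sanity check: the bundle root is expected to contain SKILL.md.
    if not (bundle_dir / "SKILL.md").is_file():
        raise FileNotFoundError(f"{bundle_dir} has no SKILL.md; not a skill bundle?")
    # Create the skills directory (and any missing parents) before copying.
    skills_root.mkdir(parents=True, exist_ok=True)
    target = skills_root / bundle_dir.name
    # dirs_exist_ok=True (Python 3.8+) lets a re-install overwrite an older copy.
    shutil.copytree(bundle_dir, target, dirs_exist_ok=True)
    return target


# Example call, run from a checkout of the Docling repo:
# install_skill(
#     Path("docs/examples/agent_skill/docling-document-intelligence"),
#     Path.home() / ".cursor" / "skills",
# )
```

Calling it a second time with `Path.home() / ".claude" / "skills"` would install the same bundle for another agent.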

No configuration is needed beyond installing the Python dependencies (see "Running the helper scripts directly").

## Usage

Open your agent-enabled IDE and ask, for example:

```
Parse report.pdf and give me a structural outline
```

```
Convert https://arxiv.org/pdf/2408.09869 to markdown
```

```
Chunk invoice.pdf for RAG ingestion with 512 token chunks
```

```
Process scanned.pdf using the VLM pipeline
```

The agent should read `SKILL.md`, match the task, and run the appropriate pipeline.

## Running the helper scripts directly

From the **bundle root** (the `docling-document-intelligence` directory):

```bash
# Install the script dependencies
pip install -r scripts/requirements.txt

# Convert with default settings
python3 scripts/docling-convert.py report.pdf

# Select the RapidOCR engine
python3 scripts/docling-convert.py report.pdf --ocr-engine rapidocr

# Emit chunks of at most 512 tokens
python3 scripts/docling-convert.py report.pdf --format chunks --max-tokens 512

# Run the local VLM pipeline
python3 scripts/docling-convert.py scanned.pdf --pipeline vlm-local

# Run the VLM pipeline against a remote API endpoint
python3 scripts/docling-convert.py doc.pdf \
  --pipeline vlm-api \
  --vlm-api-url http://localhost:8000/v1/chat/completions \
  --vlm-api-model ibm-granite/granite-docling-258M
```
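When driving the converter from another program, it can be convenient to assemble the argument list in code instead of concatenating strings. The sketch below is illustrative only: `build_convert_command` is a hypothetical helper, it uses just the flags shown in the commands above, and it assumes the working directory is the bundle root.

```python
import sys
from typing import List, Optional


def build_convert_command(
    source: str,
    fmt: Optional[str] = None,
    max_tokens: Optional[int] = None,
    pipeline: Optional[str] = None,
    ocr_engine: Optional[str] = None,
    out: Optional[str] = None,
) -> List[str]:
    """Build an argv list for scripts/docling-convert.py (hypothetical helper)."""
    cmd = [sys.executable, "scripts/docling-convert.py", source]
    # Only flags that appear in the documented commands are emitted here.
    options = [
        ("--format", fmt),
        ("--max-tokens", max_tokens),
        ("--pipeline", pipeline),
        ("--ocr-engine", ocr_engine),
        ("--out", out),
    ]
    for flag, value in options:
        if value is not None:  # omit unset options so the script's defaults apply
            cmd += [flag, str(value)]
    return cmd


# Print the command that chunks report.pdf into pieces of at most 512 tokens.
print(build_convert_command("report.pdf", fmt="chunks", max_tokens=512))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; keeping it a list avoids shell-quoting problems with file names that contain spaces.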

## Evaluate and refine

```bash
python3 scripts/docling-convert.py report.pdf --format json --out /tmp/doc.json
python3 scripts/docling-convert.py report.pdf --format markdown --out /tmp/doc.md
python3 scripts/docling-evaluate.py /tmp/doc.json --markdown /tmp/doc.md
```

If the report shows `warn` or `fail`, follow its `recommended_actions`, re-run the conversion,
and optionally append a note to `improvement-log.md` (see `SKILL.md` section 6).
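The refine decision can be expressed as a small check on the evaluator's report. This is a sketch under stated assumptions: it supposes the report is available as JSON with a top-level `status` field (a hypothetical key name) holding `pass`, `warn`, or `fail`, alongside the `recommended_actions` list mentioned above. The actual report layout may differ, so check the output of `docling-evaluate.py` before relying on it.

```python
import json
from typing import List, Tuple


def needs_refinement(report_json: str) -> Tuple[bool, List[str]]:
    """Return (should_reconvert, actions) for an evaluation report.

    Assumed layout: {"status": "...", "recommended_actions": [...]}.
    The "status" key is a hypothetical name chosen for this sketch.
    """
    report = json.loads(report_json)
    status = str(report.get("status", "")).lower()
    actions = list(report.get("recommended_actions", []))
    # Anything other than warn/fail is treated as "no refinement needed".
    return status in {"warn", "fail"}, actions


# Made-up sample report, for illustration only.
sample = '{"status": "warn", "recommended_actions": ["retry with --pipeline vlm-local"]}'
retry, actions = needs_refinement(sample)
if retry:
    for action in actions:
        print("suggested:", action)
```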

## What the skill covers

| Task | How to ask |
|---|---|
| Parse PDF / DOCX / PPTX / HTML / image | "parse this file" |
| Convert to Markdown | "convert to markdown" |
| Export as structured JSON | "export as JSON" |
| Chunk for RAG | "chunk for RAG", "prepare for ingestion" |
| Analyze structure | "show me the headings and tables" |
| Use VLM pipeline | "use the VLM pipeline", "process scanned PDF" |
| Use remote inference | "use vLLM", "call the API pipeline" |

## Further reading

- [Agent Skills specification](https://agentskills.io/specification)
- [Docling documentation](https://docling-project.github.io/docling/)
- [Docling GitHub](https://github.com/docling-project/docling)
34 changes: 34 additions & 0 deletions docs/examples/agent_skill/docling-document-intelligence/README.md
@@ -0,0 +1,34 @@
# Docling agent skill (Cursor & compatible assistants)

This folder is an **[Agent Skill](https://agentskills.io/specification)**-style bundle for AI coding assistants: structured instructions (`SKILL.md`), a pipeline reference (`pipelines.md`), helper scripts under `scripts/`, and an evaluator for conversion quality.

It complements the official [Docling documentation](https://docling-project.github.io/docling/) and the [`docling` CLI](https://docling-project.github.io/docling/reference/cli/); use it when you want agents to follow a consistent **convert → export JSON → evaluate → refine** workflow.

The same layout is published in the Docling repo at `docs/examples/agent_skill/docling-document-intelligence/` (for docs and PRs).

## Contents

| Path | Purpose |
|------|---------|
| [`SKILL.md`](SKILL.md) | Full skill instructions (pipelines, chunking, evaluation loop) |
| [`pipelines.md`](pipelines.md) | Standard vs VLM pipelines, OCR engines, API notes |
| [`EXAMPLE.md`](EXAMPLE.md) | Installing into `~/.cursor/skills/`; running scripts |
| [`improvement-log.md`](improvement-log.md) | Optional template for local “what worked” notes |
| [`scripts/docling-convert.py`](scripts/docling-convert.py) | CLI: Markdown / JSON / RAG chunks |
| [`scripts/docling-evaluate.py`](scripts/docling-evaluate.py) | Heuristic quality report on JSON (+ optional Markdown) |
| [`scripts/requirements.txt`](scripts/requirements.txt) | Minimal pip deps for the scripts |

## Quick start (from this directory)

```bash
pip install -r scripts/requirements.txt
python3 scripts/docling-convert.py https://arxiv.org/pdf/2408.09869 --out /tmp/out.md
python3 scripts/docling-convert.py https://arxiv.org/pdf/2408.09869 --format json --out /tmp/out.json
python3 scripts/docling-evaluate.py /tmp/out.json --markdown /tmp/out.md
```

Use `--pipeline vlm-local` or `--pipeline vlm-api` for vision-model pipelines; see `SKILL.md` and `pipelines.md`.
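The quick start can also be run as one scripted sequence that stops at the first failing step. The following is a minimal sketch, assuming it is run from this directory; `quick_start_commands` and `run_quick_start` are hypothetical names, and the output file names mirror the commands above.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def quick_start_commands(source: str, out_dir: str) -> List[List[str]]:
    """Return the three quick-start commands as argv lists (hypothetical helper)."""
    out = Path(out_dir)
    md_path, json_path = str(out / "out.md"), str(out / "out.json")
    convert = [sys.executable, "scripts/docling-convert.py", source]
    return [
        convert + ["--out", md_path],                        # Markdown export
        convert + ["--format", "json", "--out", json_path],  # JSON export
        [sys.executable, "scripts/docling-evaluate.py", json_path, "--markdown", md_path],
    ]


def run_quick_start(source: str, out_dir: str, dry_run: bool = True) -> None:
    """Print the commands, or execute them in order when dry_run is False."""
    for cmd in quick_start_commands(source, out_dir):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


# Dry run: only prints the commands, nothing is converted.
run_quick_start("https://arxiv.org/pdf/2408.09869", "/tmp")
```

Pass `dry_run=False` to execute the steps once the dependencies from `scripts/requirements.txt` are installed.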

## License

MIT (aligned with [Docling](https://github.com/docling-project/docling)).