
xxmikexx1/docling-langchain

 
 


Docling LangChain integration

PyPI version | Python version | uv | Code style: black | Imports: isort | Pydantic v2 | pre-commit | License: MIT

A Docling integration for LangChain.

Installation

Install langchain-docling with your package manager, e.g. pip:

pip install langchain-docling

Development setup

To develop this package, you need Python 3.9 through 3.13 and uv. You can then install the dependencies from the root directory of your local clone:

uv sync

Usage

Basic usage

Basic usage of DoclingLoader looks as follows:

from langchain_docling import DoclingLoader

FILE_PATH = ["https://arxiv.org/pdf/2408.09869"]  # Docling Technical Report

loader = DoclingLoader(file_path=FILE_PATH)
docs = loader.load()
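
To get a feel for what load() returns, the following minimal sketch summarizes a list of loaded documents. It assumes each item exposes the standard LangChain Document attributes page_content and metadata. The helper name summarize_docs and the metadata keys in the stand-in data are made up for illustration and are not part of the package.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_docs(docs, preview_chars=80):
    """Return the document count, one-line previews, and metadata key counts."""
    previews = [d.page_content[:preview_chars].replace("\n", " ") for d in docs]
    key_counts = Counter(key for d in docs for key in d.metadata)
    return len(docs), previews, dict(key_counts)


# Stand-in objects so the sketch runs without downloading anything.
# In real use, pass the result of DoclingLoader(...).load() instead.
sample_docs = [
    SimpleNamespace(
        page_content="Docling\nTechnical Report",
        metadata={"source": "report.pdf"},  # hypothetical metadata
    ),
    SimpleNamespace(
        page_content="Abstract",
        metadata={"source": "report.pdf", "section": "intro"},  # hypothetical
    ),
]

count, previews, key_counts = summarize_docs(sample_docs)
print(count, previews, key_counts)
```

Replacing sample_docs with the docs list from the example above should work unchanged, as long as the loader returns Document-like objects.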

Advanced usage

When initializing a DoclingLoader, you can use the following parameters:

  • file_path: source as single str (URL or local file) or iterable thereof
  • converter (optional): any specific Docling converter instance to use
  • convert_kwargs (optional): any specific kwargs for conversion execution
  • export_type (optional): export mode to use: ExportType.DOC_CHUNKS (default) or ExportType.MARKDOWN
  • md_export_kwargs (optional): any specific Markdown export kwargs (for Markdown mode)
  • chunker (optional): any specific Docling chunker instance to use (for doc-chunk mode)
  • meta_extractor (optional): any specific metadata extractor to use
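
As a rough sketch of how these parameters fit together, the snippet below switches between the two export modes. It assumes ExportType can be imported from langchain_docling.loader. The helper names as_path_list and make_loader are hypothetical, and as_path_list only illustrates the two accepted shapes of file_path, not the loader's internals.

```python
def as_path_list(file_path):
    """Illustrate the two accepted shapes: a single str or an iterable of str."""
    return [file_path] if isinstance(file_path, str) else list(file_path)


def make_loader(file_path, markdown=False):
    """Build a DoclingLoader in Markdown mode or in the default doc-chunk mode."""
    # Imported lazily so this sketch can be read and loaded without the
    # package installed. The ExportType import path is an assumption.
    from langchain_docling import DoclingLoader
    from langchain_docling.loader import ExportType

    export_type = ExportType.MARKDOWN if markdown else ExportType.DOC_CHUNKS
    return DoclingLoader(file_path=as_path_list(file_path), export_type=export_type)


# Example call (requires langchain-docling and network access):
# loader = make_loader("https://arxiv.org/pdf/2408.09869", markdown=True)
# docs = loader.load()
```

In Markdown mode, md_export_kwargs applies and chunker does not; in doc-chunk mode it is the other way around, as the parameter list above notes.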

Neo4j ingestion

The package also provides a helper to ingest PDF files into a Neo4j vector store using Docling's hybrid chunking strategy:

from langchain_openai import OpenAIEmbeddings
from langchain_docling import ingest_pdfs_to_neo4j

vector_store = ingest_pdfs_to_neo4j(
    file_paths=["/path/to/report.pdf"],
    embedding=OpenAIEmbeddings(),
    url="bolt://localhost:7687",
    username="neo4j",
    password="secret",
)
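
Once ingestion has finished, the returned store can presumably be queried. The sketch below assumes the returned object follows LangChain's VectorStore interface, in particular similarity_search(query, k=...). The README does not state the return type, so treat this as an assumption. The helper top_hits and the stand-in class _FakeStore are illustrative only.

```python
from types import SimpleNamespace


def top_hits(vector_store, query, k=3):
    """Return (snippet, metadata) pairs for the k most similar chunks."""
    results = vector_store.similarity_search(query, k=k)
    return [(doc.page_content[:100], doc.metadata) for doc in results]


class _FakeStore:
    """Stand-in so the sketch runs without a Neo4j instance or embeddings."""

    def __init__(self, docs):
        self._docs = docs

    def similarity_search(self, query, k=4):
        # A real store ranks by embedding similarity; this one just truncates.
        return self._docs[:k]


fake_store = _FakeStore(
    [
        SimpleNamespace(page_content="Chunk one", metadata={"source": "report.pdf"}),
        SimpleNamespace(page_content="Chunk two", metadata={"source": "report.pdf"}),
    ]
)
hits = top_hits(fake_store, "What is Docling?", k=1)
print(hits)
```

With a real deployment, pass the vector_store from the example above in place of fake_store.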

Docs and examples

For more details and usage examples, check out this page.
