cmi-docx is a Python library by the Child Mind Institute that extends python-docx with higher-level tooling for .docx file manipulation. It provides two complementary APIs:
- Imperative API -- wrapper classes around
python-docxobjects for find/replace, formatting, insertion, and comments on existing documents. - Declarative API -- an async-first, component-based system for constructing documents from scratch with conditional rendering, lazy evaluation, and template support.
- Find and replace across an entire document (body, headers, footers, and tables), even when the target text is split across multiple runs by Word's internal formatting.
- Style-aware replacement -- apply bold, italic, underline, font size, color, and more to replacement text.
- Paragraph insertion -- insert paragraphs by text, by object, or as images at any position in the document body.
- Run-level formatting -- read and write formatting on individual runs (bold, italic, underline, superscript, subscript, font size, font color).
- Paragraph formatting -- control alignment, line spacing, space before/after, and font properties for entire paragraphs.
- Table and cell formatting -- toggle table sections, set cell background colors, and configure cell borders.
- Word comments -- programmatically add comments to paragraphs, runs, or ranges of runs, with automatic comment preservation during text edits.
- Declarative document construction -- build documents as a tree of
Section,Paragraph,TextRun,Table, andImageRuncomponents, then render to apython-docxDocumentwithawait doc.to_docx(). - Async and lazy evaluation -- declare children as coroutines or callables; they are resolved concurrently via
asyncio.gather. - Conditional rendering -- attach a
conditioncallable to any component to include or exclude it at render time without building its subtree. - Template support -- open an existing
.docxas a template, apply placeholder replacements, and insert new content at a specific paragraph index.
Install from PyPI:
pip install cmi-docxThe imperative API wraps python-docx objects with extension classes that add search, replace, formatting, and insertion capabilities.
import docx
from cmi_docx import ExtendDocument, RunStyle
doc = docx.Document()
paragraph = doc.add_paragraph("Hello {{")
paragraph.add_run("FULL_NAME}}")
extend_doc = ExtendDocument(doc)
extend_doc.replace("{{FULL_NAME}}", "Jane Doe", RunStyle(bold=True))
print(doc.paragraphs[0].text) # "Hello *Jane Doe*"import docx
from cmi_docx import ExtendParagraph, ParagraphStyle
from docx.enum.text import WD_ALIGN_PARAGRAPH
doc = docx.Document()
paragraph = doc.add_paragraph("Formatted paragraph.")
ExtendParagraph(paragraph).format(
ParagraphStyle(
bold=True,
italic=True,
font_size=14,
alignment=WD_ALIGN_PARAGRAPH.CENTER,
)
)import docx
from cmi_docx import ExtendParagraph, RunStyle
doc = docx.Document()
paragraph = doc.add_paragraph("")
paragraph.add_run("Hello ")
paragraph.add_run("world!")
ExtendParagraph(paragraph).insert_run(1, "beautiful ", RunStyle(bold=True))
print(paragraph.text) # "Hello beautiful world!"import docx
from cmi_docx import add_comment
document = docx.Document()
paragraph = document.add_paragraph("This needs review.")
add_comment(document, paragraph, "Reviewer", "Please check this section.")The declarative API lets you build documents as a tree of components. All children are resolved concurrently, and components can be conditionally included or lazily constructed.
import asyncio
from cmi_docx.declarative import Document, Section, Paragraph, TextRun
async def main() -> None:
doc = Document(
sections=[
Section(
children=[
Paragraph(text="Main Heading", heading=1),
Paragraph(
children=[
TextRun(text="Bold text", bold=True),
TextRun(text=" and "),
TextRun(text="italic text", italic=True),
],
),
],
),
],
title="My Document",
creator="Author Name",
)
docx_doc = await doc.to_docx()
docx_doc.save("output.docx")
asyncio.run(main())Components accept coroutines as children, which are resolved concurrently:
import asyncio
from cmi_docx.declarative import Document, Section, Paragraph
async def fetch_paragraph() -> Paragraph:
await asyncio.sleep(0.1) # Simulate an API call
return Paragraph(text="Content fetched asynchronously")
async def main() -> None:
doc = Document(
sections=[
Section(
children=[
Paragraph(text="Static content"),
fetch_paragraph(),
],
),
],
)
docx_doc = await doc.to_docx()
docx_doc.save("output.docx")
asyncio.run(main())Attach a condition callable to skip components without building their subtree:
from cmi_docx.declarative import Document, Section, Paragraph
include_details = False
doc = Document(
sections=[
Section(
children=[
Paragraph(text="Always visible"),
Paragraph(
text="Only shown when details are enabled",
condition=lambda: include_details,
),
],
),
],
)Open an existing .docx as a template, replace placeholders, and insert new content:
import asyncio
from pathlib import Path
from cmi_docx.declarative import Document, DocumentTemplate, Section, Paragraph
async def main() -> None:
doc = Document(
sections=[
Section(
children=[Paragraph(text="Inserted content")],
),
],
)
template = DocumentTemplate(
path=Path("template.docx"),
replacements={"{{NAME}}": "Alice", "{{DATE}}": "2025-01-01"},
paragraph_index=1, # Insert after the first template paragraph
)
docx_doc = await doc.to_docx(template=template)
docx_doc.save("output.docx")
asyncio.run(main())