A deterministic documentation pipeline that:
- builds Markdown and SVG documentation from structured glossary definitions
- synchronizes generated Markdown files to Atlassian Confluence Cloud
The system combines:
- glossary-driven document generation
- incremental documentation builds
- semantic content normalization
- deterministic Confluence synchronization
- attachment handling
- conflict detection
- versioned archive history
The repository consists of two connected components:
doc-automation (src/doc-automation/): builds documentation files from templates using a centralized glossary (terms.yaml).
Features:
- YAML-based terminology management
- automatic placeholder replacement
- Markdown and SVG processing
- incremental builds
- archive/version history
- dry-run preview mode
- restore/version inspection
confluence-push (src/confluence-push/): publishes generated Markdown files to Confluence Cloud.
Features:
- deterministic synchronization
- semantic structural diffing
- conflict detection
- attachment synchronization
- dry-run preview mode
- overwrite protection
- normalized HTML comparison
Project structure overview:
- src/ → core Python modules
- res/ → input resources (markdown, images, SVGs)
- build/ → generated output (not versioned)
.
├── Makefile # run from CLI with `make`
├── requirements.txt
├── README.md
├── LICENSE
├── confluence.yaml # Confluence publishing configuration
├── terms.yaml # Terminology and mapping definitions
├── .gitignore
│
├── src/
│ ├── doc-automation/ # Documentation generation pipeline
│ │ └── build_docs.py # Main script to build documentation from ./res/doc-automation/input
│ │ # as defined in terms.yaml
│ │
│ └── confluence-push/ # Confluence publishing pipeline
│ ├── svg_converter.py # Converts .svg to .png
│ ├── confluence_storage.py # Handles Confluence formats and storage operations
│ └── yaml_publish.py # Publishes content defined in confluence.yaml to Confluence
│
├── res/
│ └── doc-automation/ # Input resources for doc generation
│ ├── add/ # Additional assets (images, figures)
│ │ └── *.png
│ └── input/ # Source documentation files
│ ├── *.md
│ └── *.svg
│
├── build/
│ └── doc-automation/ # Generated build artifacts
│ └── output/ # Final rendered documentation output
Instead of writing raw text:
🧑✈️🧭 Dateneigner (Data Owner). Fokus: Verantwortung & Steuerung
you write structured placeholders:
{{dataOwner.label}}. Fokus: {{dataOwner.focus}}
During the build process, placeholders are resolved using terms.yaml.
Generated Markdown files are then synchronized to Confluence.
This ensures:
- consistent terminology
- centralized domain language
- deterministic publishing
- reproducible documentation
Installation:

make install

This automatically:
- creates .venv via venv
- upgrades pip
- installs all dependencies

Setup commands:

make help
make venv
make install
make clean
make PYTHON=/usr/bin/python3.11 install   (use a specific Python interpreter)

Note 1: make build automatically runs install, which:
- creates the virtual environment if missing
- installs dependencies
- upgrades pip

Note 2: all publish commands automatically ensure the virtual environment exists and dependencies are installed.
Example terms.yaml:

terms:
  data:
    label: Daten
    description: Strukturierte oder unstrukturierte Fakten, Messwerte oder Beobachtungen, die als Grundlage von Entscheidungen oder Analysen dienen. Grundsätzlich sind drei Arten von Daten (numerische Daten wie Messwerte, kategoriale Daten wie Beschreibungen und Metadaten (Daten über Daten)) zu unterscheiden.
    slogan: Daten sind eine strategische Ressource des BLW
    related:
      - dataCustodian
      - dataSteward
      - dataOwner

Required fields:
- label
- description

Placeholder syntax:

{{term:data}}
{{term:data.label}}
{{data.label}}
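A minimal sketch of how the three placeholder forms above could be resolved against the `terms` mapping of terms.yaml (already parsed, e.g. with PyYAML). The names `resolve` and `UnknownTermError` are hypothetical, and two behaviors are assumptions rather than documented facts: a bare `{{term:x}}` resolves to the term's `label`, and force mode substitutes `[UNKNOWN:<key>]` for anything it cannot resolve.

```python
import re

# Hypothetical regex for the three documented forms:
#   {{term:data}}   {{term:data.label}}   {{data.label}}
PLACEHOLDER = re.compile(
    r"\{\{(?:term:(?P<t1>\w+)(?:\.(?P<f1>\w+))?|(?P<t2>\w+)\.(?P<f2>\w+))\}\}"
)


class UnknownTermError(KeyError):
    """Raised in strict mode when a placeholder cannot be resolved."""


def resolve(text: str, terms: dict, force: bool = False) -> str:
    """Replace every placeholder in text with the matching glossary value."""

    def substitute(match: re.Match) -> str:
        term = match.group("t1") or match.group("t2")
        # Assumption: {{term:x}} without a field means the term's label.
        field = match.group("f1") or match.group("f2") or "label"
        entry = terms.get(term)
        if isinstance(entry, dict) and field in entry:
            return str(entry[field])
        key = term if entry is None else f"{term}.{field}"
        if force:
            # Assumed --force behavior: keep building, mark the gap visibly.
            return f"[UNKNOWN:{key}]"
        raise UnknownTermError(key)

    return PLACEHOLDER.sub(substitute, text)


terms = {"dataOwner": {"label": "Dateneigner (Data Owner)",
                       "focus": "Verantwortung & Steuerung"}}
print(resolve("{{dataOwner.label}}. Fokus: {{dataOwner.focus}}", terms))
# → Dateneigner (Data Owner). Fokus: Verantwortung & Steuerung
```

Raising in the default path mirrors the strict-by-default validation described later; the lenient path only applies when force mode is requested explicitly.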
make build

or

python ./src/doc-automation/build_docs.py

Behavior:
- processes all .md and .svg files
- replaces placeholders
- archives previous versions
- skips unchanged files
- validates glossary structure

make rebuild

Behavior: recreates the venv and rebuilds the documentation.
python ./src/doc-automation/build_docs.py --dry-run

or

python ./src/doc-automation/build_docs.py -dr

Behavior:
- previews changes
- shows diffs
- performs no file modifications
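A dry-run preview like the one described above can be produced with the standard library's `difflib`. This is a sketch: the function name `preview_changes` and the `(current)`/`(rendered)` labels are illustrative, not taken from build_docs.py.

```python
import difflib


def preview_changes(path: str, current: str, rendered: str) -> str:
    """Return a unified diff between the file on disk and the new build output.

    Nothing is written; in dry-run mode the diff is only printed.
    """
    return "".join(
        difflib.unified_diff(
            current.splitlines(keepends=True),
            rendered.splitlines(keepends=True),
            fromfile=f"{path} (current)",
            tofile=f"{path} (rendered)",
        )
    )


diff = preview_changes("docs/file.md", "Fokus: {{dataOwner.focus}}\n",
                       "Fokus: Verantwortung & Steuerung\n")
print(diff or "no changes")
```

An empty string means the rendered output is identical, which is also the condition under which an incremental build would skip the file.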
make publish-force

or

python ./src/doc-automation/build_docs.py --force

or

python ./src/doc-automation/build_docs.py -f

Behavior:
- continues despite unknown terms
- inserts placeholders for unresolved terms

Example placeholder inserted for an unresolved term:

[UNKNOWN:term]
python ./src/doc-automation/build_docs.py --restore docs/file.md

or

python ./src/doc-automation/build_docs.py -r docs/file.md

python ./src/doc-automation/build_docs.py --multi-version-restore docs/file.md

or

python ./src/doc-automation/build_docs.py -R docs/file.md

Before modification, files are archived to:

build/doc-automation/archive/

Example:

archive/docs/api/
  20260410_142301_123456_order.md
  20260410_150012_654321_order.md
Features:
- preserved folder structure
- timestamped backups
- full rollback history
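The archive naming above (date, time, microseconds, original file name, folder structure preserved) can be sketched with `datetime.strftime("%Y%m%d_%H%M%S_%f")`. The helpers `archive_path`, `archive` and `versions` are hypothetical; how `--restore` and `--multi-version-restore` choose among the versions is not documented here, so the sketch only lists them.

```python
import re
import shutil
from datetime import datetime
from pathlib import Path
from typing import List

ARCHIVE_ROOT = Path("build/doc-automation/archive")
STAMP_FORMAT = "%Y%m%d_%H%M%S_%f"  # e.g. 20260410_142301_123456


def archive_path(rel_path: Path, when: datetime, root: Path = ARCHIVE_ROOT) -> Path:
    """docs/api/order.md -> <root>/docs/api/<stamp>_order.md"""
    return root / rel_path.parent / f"{when.strftime(STAMP_FORMAT)}_{rel_path.name}"


def archive(src: Path, rel_path: Path, root: Path = ARCHIVE_ROOT) -> Path:
    """Copy the current file into the archive before it is overwritten."""
    dest = archive_path(rel_path, datetime.now(), root)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    return dest


def versions(rel_path: Path, root: Path = ARCHIVE_ROOT) -> List[Path]:
    """All archived versions of one file, oldest first."""
    folder = root / rel_path.parent
    pattern = re.compile(r"\d{8}_\d{6}_\d{6}_" + re.escape(rel_path.name))
    if not folder.is_dir():
        return []
    # The stamp is fixed-width, so sorting by name is also chronological.
    return sorted(p for p in folder.iterdir() if pattern.fullmatch(p.name))
```

Because every timestamp has the same width, no date parsing is needed to order the history.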
The build system only processes changed files using:
- file hash comparison
- cached build state
- glossary change detection
Cache file:
build/doc-automation/.build_cache.json
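The three change signals above can be combined as sketched below: a changed glossary invalidates everything (any placeholder may now resolve differently), otherwise only files whose SHA-256 differs from the cached value are rebuilt. The cache schema (`glossary`, `files`) and the function names are assumptions, not the actual format of .build_cache.json.

```python
import hashlib
import json
from pathlib import Path
from typing import List

CACHE_FILE = Path("build/doc-automation/.build_cache.json")


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_cache(cache_file: Path = CACHE_FILE) -> dict:
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    return {"glossary": None, "files": {}}  # first run: build everything


def files_to_build(inputs: List[Path], glossary: Path, cache: dict) -> List[Path]:
    """Return the inputs that must be rebuilt."""
    if sha256_of(glossary) != cache.get("glossary"):
        # Any term may have changed, so every placeholder could resolve differently.
        return list(inputs)
    known = cache.get("files", {})
    return [p for p in inputs if known.get(str(p)) != sha256_of(p)]


def save_cache(inputs: List[Path], glossary: Path, cache_file: Path = CACHE_FILE) -> None:
    state = {
        "glossary": sha256_of(glossary),
        "files": {str(p): sha256_of(p) for p in inputs},
    }
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(state, indent=2, sort_keys=True), encoding="utf-8")
```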
Create a .env file:
CONFLUENCE_BASE_URL=https://your-domain.atlassian.net/wiki
CONFLUENCE_EMAIL=your-email@example.com
CONFLUENCE_API_TOKEN=your-api-token

Example confluence.yaml:

pages:
  - id: "1413054556"
    file: "build/doc-automation/output/example.md"
    attachments:
      diagram.svg:
        path: "./build/doc-automation/output/diagram.svg"
        caption: "Architecture diagram"

make publish

or

python ./src/confluence-push/yaml_publish.py confluence.yaml

make publish-dry

or

python ./src/confluence-push/yaml_publish.py confluence.yaml --dry-run

make publish-force

or

python ./src/confluence-push/yaml_publish.py confluence.yaml --force

make build

Runs:

python ./src/doc-automation/build_docs.py

make all

Runs:
- documentation build
- Confluence publish

Equivalent to:

make build
make publish

make all-force

Runs:
- documentation build
- forced Confluence overwrite

Equivalent to:

make build
make publish-force

The synchronization engine performs semantic structural comparison.
Pipeline:
Markdown
↓
HTML conversion
↓
normalization
↓
block extraction
↓
remote normalization
↓
semantic comparison
Compared block types:
- headings (h1–h4)
- paragraphs (p)
- list items (li)
- table cells (td, th)
Ignored differences:
- whitespace
- Confluence editor formatting noise
- serialization artifacts
- paragraph/list duplication
Dry-run and conflict detection display semantic diffs:
--- confluence
+++ local

The system compares normalized structure instead of raw HTML.
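The block extraction and semantic diff steps can be sketched with the standard library's `html.parser` and `difflib`, assuming the local Markdown has already been converted to HTML and the remote page is available as storage-format XHTML. Only the compared block types listed above are kept, whitespace (including non-breaking spaces) is collapsed, and a `<p>` directly inside `li`, `td` or `th` is folded into its container. That last rule is this sketch's reading of "paragraph/list duplication", not confirmed behavior; all names are hypothetical.

```python
import difflib
from html.parser import HTMLParser

BLOCK_TAGS = {"h1", "h2", "h3", "h4", "p", "li", "td", "th"}
CONTAINERS = {"li", "td", "th"}


class BlockExtractor(HTMLParser):
    """Collect each compared block as '<tag>: <normalized text>'."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True turns &nbsp; etc. into characters
        self.blocks = []
        self._stack = []  # open blocks: (tag, text parts or None, slot index)

    def handle_starttag(self, tag, attrs):
        if tag not in BLOCK_TAGS:
            return
        if tag == "p" and self._stack and self._stack[-1][0] in CONTAINERS:
            # Assumption: <li><p>x</p></li> is editor noise; text goes to the <li>.
            self._stack.append((tag, None, None))
        else:
            self.blocks.append(None)  # reserve a slot to keep document order
            self._stack.append((tag, [], len(self.blocks) - 1))

    def handle_endtag(self, tag):
        if not self._stack or self._stack[-1][0] != tag:
            return  # inline or mismatched closing tag
        _, parts, slot = self._stack.pop()
        if parts is not None:
            text = " ".join("".join(parts).split())  # collapse all whitespace
            self.blocks[slot] = f"{tag}: {text}" if text else None

    def handle_data(self, data):
        # Text belongs to the innermost open block that collects text.
        for _, parts, _ in reversed(self._stack):
            if parts is not None:
                parts.append(data)
                break


def extract_blocks(html: str) -> list:
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return [block for block in parser.blocks if block]


def semantic_diff(remote_html: str, local_html: str) -> str:
    return "\n".join(difflib.unified_diff(
        extract_blocks(remote_html), extract_blocks(local_html),
        fromfile="confluence", tofile="local", lineterm=""))
```

Comparing lists of normalized blocks rather than HTML strings is what makes attribute order, editor wrappers and whitespace irrelevant to the result.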
A conflict occurs when:
- remote normalized content differs from local normalized content
Behavior:
- warning is displayed
- semantic diff is shown
- explicit confirmation is required
No silent overwrites occur.
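Read literally, the conflict rule above reduces to a small decision function: identical normalized blocks mean nothing to do, dry-run only shows the diff, `--force` overwrites without asking, and every other difference requires an explicit "yes". The function name, return values and prompt text are hypothetical, and the real tool may additionally consult the stored state hash.

```python
from typing import Callable, Sequence


def ask_user(prompt: str) -> bool:
    return input(prompt).strip().lower() in {"y", "yes"}


def decide(remote_blocks: Sequence[str], local_blocks: Sequence[str], diff_text: str,
           force: bool = False, dry_run: bool = False,
           confirm: Callable[[str], bool] = ask_user) -> str:
    """Return 'skip', 'preview', 'overwrite' or 'abort' for one page."""
    if list(remote_blocks) == list(local_blocks):
        return "skip"  # normalized content already matches
    if dry_run:
        print(diff_text)
        return "preview"  # show the semantic diff, change nothing
    if force:
        return "overwrite"  # explicit force mode: no prompt
    print("WARNING: Confluence content differs from local content")
    print(diff_text)
    return "overwrite" if confirm("Overwrite the Confluence page? [y/N] ") else "abort"
```

Injecting `confirm` keeps the prompt testable and lets a non-interactive run default to refusing.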
Attachments use marker syntax:
@attach diagram.svg
or
@attach diagram.svg | Architecture diagram
Features:
- uploaded only when content changes
- deterministic naming via content hash
- SVG → PNG conversion
- attachment reuse when identical
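A sketch of the marker parsing and content-hash naming described above. The regular expression, the `Attachment` type and the naming scheme `<stem>-<12 hex chars>.png` are assumptions for illustration; the source only states that names are derived from a content hash and that SVGs are uploaded as PNG.

```python
import hashlib
import re
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Optional

# Hypothetical pattern for "@attach <name>" and "@attach <name> | <caption>".
MARKER = re.compile(
    r"^@attach[ \t]+(?P<name>[^\s|]+)(?:[ \t]*\|[ \t]*(?P<caption>[^\n]*?))?[ \t]*$",
    re.MULTILINE,
)


@dataclass(frozen=True)
class Attachment:
    name: str
    caption: Optional[str] = None


def parse_markers(markdown: str) -> List[Attachment]:
    return [Attachment(m["name"], m["caption"]) for m in MARKER.finditer(markdown)]


def hashed_name(name: str, uploaded_bytes: bytes, length: int = 12) -> str:
    """Hypothetical scheme: diagram.svg -> diagram-<sha256 prefix>.png.

    SVGs are uploaded as converted PNGs, so the PNG bytes are hashed.
    """
    path = PurePosixPath(name)
    suffix = ".png" if path.suffix.lower() == ".svg" else path.suffix
    digest = hashlib.sha256(uploaded_bytes).hexdigest()[:length]
    return f"{path.stem}-{digest}{suffix}"


def needs_upload(name: str, existing: set) -> bool:
    # Identical content yields an identical name, so an existing attachment is reused.
    return name not in existing
```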
Synchronization state is stored locally:
build/confluence-push/.state.json
Example:
{
  "1413054556": {
    "hash": "abcdef123456"
  }
}

The system does not rely on Confluence page properties.
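A sketch of reading and writing a state file shaped like the example above. Which content is hashed is not specified in the source; here it is assumed to be the local content sent for a page, stored as a full SHA-256 hex digest (the short value in the example may be a placeholder). The write goes through a temporary file and `os.replace`, so an interrupted run cannot leave truncated JSON. All function names are hypothetical.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path

STATE_FILE = Path("build/confluence-push/.state.json")


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def load_state(path: Path = STATE_FILE) -> dict:
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}  # first run: nothing published yet


def save_state(state: dict, path: Path = STATE_FILE) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory, then atomically replace.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2, sort_keys=True)
    os.replace(tmp, path)


def is_unchanged(state: dict, page_id: str, content: str) -> bool:
    return state.get(page_id, {}).get("hash") == content_hash(content)


def record(state: dict, page_id: str, content: str) -> None:
    state[page_id] = {"hash": content_hash(content)}
```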
The glossary validator checks:
- required root keys
- required fields
- field types
- unknown references
Strict validation is enabled by default.
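The four checks above can be sketched against the terms.yaml example shown earlier. The concrete rules are assumptions drawn from that example: `terms` is the only required root key, `label` and `description` are required, `label`/`description`/`slogan` must be strings, `related` must be a list, and every `related` entry must name an existing term. `strict=True` as the default mirrors the statement above.

```python
from typing import List

REQUIRED_ROOT_KEYS = ("terms",)
REQUIRED_FIELDS = ("label", "description")
FIELD_TYPES = {"label": str, "description": str, "slogan": str, "related": list}


def validate_glossary(glossary: dict) -> List[str]:
    """Return human-readable problems; an empty list means the glossary is valid."""
    errors = [f"missing root key: {key}" for key in REQUIRED_ROOT_KEYS if key not in glossary]
    terms = glossary.get("terms", {})
    if not isinstance(terms, dict):
        return errors + ["'terms' must be a mapping"]
    for name, entry in terms.items():
        if not isinstance(entry, dict):
            errors.append(f"{name}: entry must be a mapping")
            continue
        errors += [f"{name}: missing required field '{f}'" for f in REQUIRED_FIELDS if f not in entry]
        for field, expected in FIELD_TYPES.items():
            if field in entry and not isinstance(entry[field], expected):
                errors.append(f"{name}.{field}: expected {expected.__name__}")
        related = entry.get("related")
        if isinstance(related, list):
            errors += [f"{name}.related: unknown reference '{ref}'"
                       for ref in related if ref not in terms]
    return errors


def check_glossary(glossary: dict, strict: bool = True) -> List[str]:
    errors = validate_glossary(glossary)
    if errors and strict:
        raise ValueError("invalid terms.yaml:\n" + "\n".join(errors))
    return errors
```

Collecting every problem before failing means a single run reports all glossary errors at once.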
Build system:
- Markdown (.md)
- SVG (.svg)
Confluence synchronization:
- Markdown pages
- image attachments
End-to-end pipeline:

terms.yaml
↓
placeholder resolution
↓
Markdown/SVG processing
↓
archive creation
↓
incremental build
↓
generated Markdown
↓
Confluence normalization pipeline
↓
semantic diff
↓
conflict handling
↓
Confluence update
Design principles:
- local files are the source of truth
- comparisons must be structural, not textual
- deterministic output is preferred over formatting preservation
- automation should be safe and inspectable
- explicit conflicts are preferred over silent divergence
This repository provides a deterministic documentation pipeline with:
- centralized glossary management
- automated document generation
- incremental builds
- archive/version history
- semantic Confluence synchronization
- conflict-safe publishing
- reproducible documentation output