Incremental Markdown Backup via ETAPI (Python script) #9664

ricolandia · 2026-05-07T19:03:27Z

I started this backup topic in #9612.

I wrote (well, not me: an AI model did) a small Python script that backs up all your TriliumNext notes as individual .md files. On subsequent runs, it only downloads notes that have actually changed since the last backup.

I've tested it, and it works.

No plugins, no Node.js: just a Python script (.py), the ETAPI, and a cron job for daily backups.

What it does

First run: exports every note as a .md file, organized in folders mirroring your Trilium note tree
Subsequent runs: queries only notes modified on or after the date of the last backup, then skips any note whose dateModified hasn't changed, so large vaults back up in seconds after the first run
Each .md file includes a front matter block with trilium_id, created, and modified — useful if you ever want to re-import or diff versions
Saves a hidden .backup_state.json in the backup folder to track state between runs
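
The skip decision described above can be sketched roughly like this. It is a simplified illustration, not the script itself: `needs_backup` is a hypothetical helper, and the state dictionary is assumed to hold the same `backed_up` mapping (note ID to last saved `dateModified`) that the script writes to .backup_state.json.

```python
def needs_backup(state: dict, note_id: str, date_modified: str) -> bool:
    """Return True when the note was never saved or has changed since."""
    last_saved = state.get("backed_up", {}).get(note_id)
    # Timestamps such as "2026-04-21 11:32:00.000+0000" compare correctly as
    # plain strings as long as they all carry the same UTC offset.
    return not last_saved or last_saved < date_modified


state = {"backed_up": {"abc123def456": "2026-04-21 11:32:00.000+0000"}}
print(needs_backup(state, "abc123def456", "2026-04-21 11:32:00.000+0000"))
print(needs_backup(state, "abc123def456", "2026-04-22 08:00:00.000+0000"))
print(needs_backup(state, "newnote", "2026-04-22 08:00:00.000+0000"))
```

The first call reports an unchanged note, the other two a modified note and a note that was never saved.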

Requirements

System (Debian/Ubuntu)

sudo apt update
sudo apt install python3 python3-pip

Python libraries

pip install requests --break-system-packages

If you're on Ubuntu 23+ or Debian 12+, the --break-system-packages flag is required because pip will otherwise refuse to install globally. Alternatively, use a virtualenv (see below).

Optional: virtualenv (cleaner for multiple projects)

python3 -m venv .venv
source .venv/bin/activate
pip install requests

Setup

Download trilium_backup_incremental.py
Edit the three configuration lines at the top of the file:

SERVER     = "http://localhost:8080"   # your Trilium server address
TOKEN      = "YOUR_ETAPI_TOKEN"        # Settings → ETAPI → generate token
BACKUP_DIR = Path("/home/youruser/Backup_Trilium_MD")  # destination folder

How to get your ETAPI token: in TriliumNext, go to Menu → Options → ETAPI and click Generate new token.
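
If you would rather not keep the token inside the script (for example because the file is shared or under version control), one option is to read the settings from environment variables. This is only a sketch and not part of the posted script: the variable names `TRILIUM_SERVER`, `TRILIUM_ETAPI_TOKEN` and `TRILIUM_BACKUP_DIR`, and the `load_config` helper, are made up for illustration.

```python
import os
from pathlib import Path


def load_config(env=os.environ) -> dict:
    """Build the three settings from environment variables (hypothetical names)."""
    token = env.get("TRILIUM_ETAPI_TOKEN", "")
    if not token:
        raise SystemExit("TRILIUM_ETAPI_TOKEN is not set")
    return {
        # Drop a trailing slash so f"{server}/etapi/..." does not double it
        "server": env.get("TRILIUM_SERVER", "http://localhost:8080").rstrip("/"),
        "token": token,
        "backup_dir": Path(env.get("TRILIUM_BACKUP_DIR", "Backup_Trilium_MD")),
    }


if __name__ == "__main__":
    cfg = load_config({"TRILIUM_ETAPI_TOKEN": "YOUR_ETAPI_TOKEN"})
    print(cfg["server"], cfg["backup_dir"])
```

Keep in mind that cron jobs run with a minimal environment, so the variables would have to be defined in the crontab itself or in a small wrapper script.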

Usage

First run (full backup)

python3 trilium_backup_incremental.py

Output example:

First backup — exporting all notes...
347 note(s) to process...
  [1/347] saved: Home
  [2/347] saved: Journal
  ...
✓ Completed: 347 note(s) saved, 0 skipped.
Backup at: /home/youruser/Backup_Trilium_MD

Subsequent runs (incremental)

python3 trilium_backup_incremental.py

Output example:

Last backup: 2026-04-20T14:32:00+00:00
Searching for notes modified since then...
12 note(s) to process...
  [1/12] saved: Meeting notes 2026-04-21
  [2/12] no changes: Home
  ...
✓ Completed: 1 note(s) saved, 11 skipped.

Backup folder structure

Backup_Trilium_MD/
├── .backup_state.json        ← internal state file (hidden)
├── Home.md
├── Journal/
│   ├── 2026-04-20.md
│   └── 2026-04-21.md
├── Projects/
│   ├── Project A.md
│   └── Project B.md
└── ...

Each .md file looks like:

---
title: "Meeting notes 2026-04-21"
trilium_id: abc123def456
created: 2026-04-21 09:00:00.000+0000
modified: 2026-04-21 11:32:00.000+0000
---

Note content here...
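
To use the front matter for diffing or re-importing, the header can be read back with a few lines of Python. This is a minimal sketch that assumes the exact layout shown above (flat `key: value` lines between two `---` markers); `read_front_matter` is a hypothetical helper, not part of the script, and it is not a full YAML parser.

```python
def read_front_matter(text: str) -> dict:
    """Parse the simple 'key: value' block between the two '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Split on the first colon only, so timestamps keep their colons
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields


sample = '''---
title: "Meeting notes 2026-04-21"
trilium_id: abc123def456
created: 2026-04-21 09:00:00.000+0000
modified: 2026-04-21 11:32:00.000+0000
---

Note content here...
'''
meta = read_front_matter(sample)
print(meta["trilium_id"], meta["modified"])
```

Titles that themselves contain double quotes would not round-trip cleanly with this approach, since the script writes the title without escaping.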

Scheduling automatic backups (cron)

To run a backup every day at 2:00 AM:

crontab -e

Add this line (adjust the path):

0 2 * * * python3 /home/youruser/scripts/trilium_backup_incremental.py >> /home/youruser/trilium_backup.log 2>&1

The >> ...log 2>&1 part saves all output to a log file so you can review past runs.

Notes and limitations

Only backs up text, code, and mermaid note types. Canvas notes, renderNotes, relation maps, and other special types are skipped intentionally (their content is not plain text).
HTML → Markdown conversion is basic. Trilium stores text notes as HTML internally. The script does a simple conversion (headings, paragraphs, line breaks). For better Markdown fidelity, you could pipe the content through pandoc. I tried that, but something went wrong and I backed it out.
Attachments are not downloaded. This is a text-only backup. If you need attachments, the native Trilium export (right-click → Export → Markdown + attachments) covers that case.
The state file (.backup_state.json) is hidden by default. To inspect it: cat /your/backup/dir/.backup_state.json
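
For anyone who wants to try the pandoc route, here is one possible way to wire it in. I don't know what failed in the earlier attempt, so treat this as an untested-against-Trilium sketch: `html_to_markdown` is a hypothetical replacement for the script's converter, it assumes pandoc is installed and on the PATH, and it falls back to crude tag stripping when pandoc is missing or returns an error.

```python
import re
import shutil
import subprocess


def html_to_markdown(html: str) -> str:
    """Convert HTML to Markdown with pandoc if available, else strip tags."""
    pandoc = shutil.which("pandoc")
    if pandoc:
        try:
            result = subprocess.run(
                [pandoc, "--from=html", "--to=gfm", "--wrap=none"],
                input=html,
                capture_output=True,
                encoding="utf-8",
                check=True,
                timeout=60,
            )
            return result.stdout
        except (subprocess.SubprocessError, OSError):
            pass  # fall through to the crude fallback below
    # Fallback: remove tags only, no Markdown structure is produced
    return re.sub(r"<[^>]+>", "", html)


if __name__ == "__main__":
    print(html_to_markdown("<h1>Title</h1><p>Hello</p>"))
```

Passing the HTML on stdin avoids temporary files, and `--wrap=none` keeps pandoc from re-wrapping long lines, which makes later diffs less noisy.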

Script

#!/usr/bin/env python3
"""Incremental Trilium backup via ETAPI.

First run: full backup of all notes.
Subsequent runs: only notes modified since the last backup.
"""

from __future__ import annotations

import json
import re
import sys
from datetime import datetime, timezone
from pathlib import Path

try:
    import requests
except ImportError:
    sys.exit("requests not found. Install with: pip install requests --break-system-packages")

# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------

SERVER    = "http://localhost:8080"
TOKEN     = "YOUR_ETAPI_TOKEN"
BACKUP_DIR = Path("/home/youruser/Backup_Trilium_MD")
STATE_FILE = BACKUP_DIR / ".backup_state.json"

# ---------------------------------------------------------------------------

HEADERS = {"Authorization": TOKEN}


def api_get(path: str, **kwargs) -> dict | list:
    url = f"{SERVER}/etapi{path}"
    r = requests.get(url, headers=HEADERS, **kwargs)
    r.raise_for_status()
    return r.json()


def get_note_meta(note_id: str) -> dict:
    return api_get(f"/notes/{note_id}")


def get_note_content(note_id: str) -> str:
    url = f"{SERVER}/etapi/notes/{note_id}/content"
    r = requests.get(url, headers=HEADERS)
    r.raise_for_status()
    return r.text


def search_notes(query: str) -> list[dict]:
    data = api_get("/notes", params={"search": query, "limit": 10000})
    if isinstance(data, dict):
        return data.get("results", [])
    return data


def get_note_path(note_id: str) -> str:
    parts = []
    current_id = note_id
    visited = set()

    while current_id and current_id != "root" and current_id not in visited:
        visited.add(current_id)
        try:
            meta = get_note_meta(current_id)
        except Exception:
            break
        parts.append(sanitize_filename(meta.get("title", current_id)))
        branches = meta.get("parentBranchIds", [])
        if not branches:
            break
        try:
            branch = api_get(f"/branches/{branches[0]}")
            current_id = branch.get("parentNoteId", "")
        except Exception:
            break

    parts.reverse()
    return "/".join(parts) if parts else note_id


def sanitize_filename(name: str) -> str:
    name = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", name)
    return name.strip(". ") or "_"


def html_to_md_basic(html: str) -> str:
    try:
        from html.parser import HTMLParser

        class TextExtractor(HTMLParser):
            def __init__(self):
                super().__init__()
                self.lines = []
                self._in_tag = []

            def handle_starttag(self, tag, attrs):
                self._in_tag.append(tag)
                if tag in ("br", "p", "h1", "h2", "h3", "h4", "li"):
                    self.lines.append("\n")
                if tag.startswith("h") and tag[1:].isdigit():
                    level = int(tag[1:])
                    self.lines.append("#" * level + " ")

            def handle_endtag(self, tag):
                if self._in_tag and self._in_tag[-1] == tag:
                    self._in_tag.pop()

            def handle_data(self, data):
                self.lines.append(data)

        extractor = TextExtractor()
        extractor.feed(html)
        return "".join(extractor.lines)
    except Exception:
        return re.sub(r"<[^>]+>", "", html)


def load_state() -> dict:
    if STATE_FILE.exists():
        with open(STATE_FILE, encoding="utf-8") as f:
            return json.load(f)
    return {"last_backup": None, "backed_up": {}}


def save_state(state: dict) -> None:
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    with open(STATE_FILE, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2, ensure_ascii=False)


def backup_note(note_id: str, meta: dict, state: dict) -> bool:
    note_type = meta.get("type", "text")
    if note_type not in ("text", "code", "mermaid"):
        return False

    try:
        content = get_note_content(note_id)
    except Exception as e:
        print(f"  ⚠ Error fetching content for {note_id}: {e}")
        return False

    title = sanitize_filename(meta.get("title", note_id))
    note_path = get_note_path(note_id)
    folder = BACKUP_DIR / Path(note_path).parent if "/" in note_path else BACKUP_DIR
    folder.mkdir(parents=True, exist_ok=True)

    if meta.get("mime", "") in ("text/html", "") and note_type == "text":
        body = html_to_md_basic(content)
    else:
        body = content

    date_created = meta.get("dateCreated", "")
    date_modified = meta.get("dateModified", "")
    front_matter = (
        f"---\n"
        f"title: \"{title}\"\n"
        f"trilium_id: {note_id}\n"
        f"created: {date_created}\n"
        f"modified: {date_modified}\n"
        f"---\n\n"
    )

    filepath = folder / f"{title}.md"
    with open(filepath, "w", encoding="utf-8") as f:
        f.write(front_matter + body)

    state["backed_up"][note_id] = date_modified
    return True


def main() -> int:
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    state = load_state()
    last_backup = state.get("last_backup")
    now = datetime.now(timezone.utc).isoformat()

    if last_backup:
        print(f"Last backup: {last_backup}")
        print("Searching for notes modified since then...")
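        # The search uses only the date part (YYYY-MM-DD) of the last backup, so it
        # can return notes already saved that day; the per-note check below skips them.
        query = f'note.dateModified >= "{last_backup[:10]}"'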
        try:
            notes = search_notes(query)
        except Exception:
            print("Incremental search failed, falling back to full backup...")
            notes = search_notes("note.type = text")
    else:
        print("First backup — exporting all notes...")
        notes = search_notes("note.type = text")

    if not notes:
        print("No notes found.")
        return 0

    print(f"{len(notes)} note(s) to process...")

    saved = 0
    skipped = 0

    for i, note_stub in enumerate(notes, start=1):
        note_id = note_stub.get("noteId")
        if not note_id:
            continue

        try:
            meta = get_note_meta(note_id)
        except Exception as e:
            print(f"  [{i}/{len(notes)}] ⚠ {note_id}: {e}")
            continue

        date_modified = meta.get("dateModified", "")
        last_saved = state["backed_up"].get(note_id)

        if last_saved and last_saved >= date_modified:
            skipped += 1
            print(f"  [{i}/{len(notes)}] no changes: {meta.get('title', note_id)}", end="\r")
            continue

        ok = backup_note(note_id, meta, state)
        if ok:
            saved += 1
            print(f"  [{i}/{len(notes)}] saved: {meta.get('title', note_id)}")
        else:
            skipped += 1

    state["last_backup"] = now
    save_state(state)

    print(f"\n✓ Completed: {saved} note(s) saved, {skipped} skipped.")
    print(f"Backup at: {BACKUP_DIR}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

---

Screenshot:

Feedback welcome.

ricolandia · 2026-05-08T22:41:19Z

Version 2 - Corrected script

Main fixes: note_id suffix in filenames to avoid collisions, comprehensive initial fetch, retry queue for failures, and a per-note comparison on the full dateModified timestamp (the search query itself still filters by date only).

Summary of changes:

Name collisions (main bug): files are now named Note Title [abc123].md, where abc123 is the noteId. This ensures absolute uniqueness — two notes with the same title in the same folder will no longer overwrite each other.
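
As a quick illustration of the naming scheme, the sketch below builds the filename the same way the v2 script does. `sanitize_filename` is copied from the script; `note_filename` is a hypothetical wrapper added here only to show the result, and the note IDs are invented.

```python
import re


def sanitize_filename(name: str) -> str:
    """Replace characters that are invalid in file names (same rule as the script)."""
    name = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", name)
    return name.strip(". ") or "_"


def note_filename(title: str, note_id: str) -> str:
    """Build 'Title [noteId].md' so equal titles never share a filename."""
    return f"{sanitize_filename(title)} [{note_id}].md"


print(note_filename("Meeting notes", "abc123"))
print(note_filename("Meeting notes", "def456"))
print(note_filename("Q1/Q2 plan", "ghi789"))
```

Two notes titled "Meeting notes" end up as separate files, and the slash in the third title is replaced with an underscore so it cannot create an extra folder level.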

Comprehensive initial fetch: the first run now fetches text, code, and mermaid content separately and merges the results with deduplication, instead of fetching only text.
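
The merge step amounts to concatenating the per-type search results and keeping the first occurrence of each noteId. A minimal standalone sketch, where `merge_unique` is a hypothetical helper name and the result lists are invented sample data:

```python
def merge_unique(*result_lists: list[dict]) -> list[dict]:
    """Concatenate search results, keeping the first entry for each noteId."""
    seen: set[str] = set()
    unique: list[dict] = []
    for results in result_lists:
        for note in results:
            note_id = note.get("noteId")
            if note_id and note_id not in seen:
                seen.add(note_id)
                unique.append(note)
    return unique


text_notes = [{"noteId": "a1"}, {"noteId": "b2"}]
code_notes = [{"noteId": "c3"}, {"noteId": "a1"}]   # "a1" repeated on purpose
mermaid_notes = [{"title": "entry without an id"}]  # dropped: no noteId

merged = merge_unique(text_notes, code_notes, mermaid_notes)
print([n["noteId"] for n in merged])
```

Order is preserved, so notes are still processed in the order the searches returned them.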

Retry queue: .backup_state.json now includes a "failed" key that logs notes that encountered errors. On the next run, these notes are always reprocessed regardless of their dateModified value. If your previous state file doesn't have this key, the script creates it automatically (backward compatibility).
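
Roughly, the retry queue works like the sketch below. The helper names `ids_to_process` and `should_skip` are hypothetical and the note IDs are invented; only the `backed_up` and `failed` keys come from the actual state file.

```python
def ids_to_process(state: dict, modified_ids: list[str]) -> list[str]:
    """Return modified notes plus any note that failed on a previous run."""
    # Older state files have no "failed" key; create it on the fly
    state.setdefault("backed_up", {})
    state.setdefault("failed", {})
    ordered = list(modified_ids)
    for note_id in state["failed"]:
        if note_id not in ordered:
            ordered.append(note_id)
    return ordered


def should_skip(state: dict, note_id: str, date_modified: str) -> bool:
    """Skip only notes that are unchanged AND not waiting in the retry queue."""
    last_saved = state["backed_up"].get(note_id)
    return bool(
        last_saved
        and last_saved >= date_modified
        and note_id not in state["failed"]
    )


old_state = {"last_backup": "2026-05-07T02:00:00+00:00",
             "backed_up": {"a1": "2026-05-06 10:00:00.000+0000"}}
print(ids_to_process(old_state, ["b2"]))

old_state["failed"]["a1"] = "timeout"
print(ids_to_process(old_state, ["b2"]))
print(should_skip(old_state, "a1", "2026-05-06 10:00:00.000+0000"))
```

In the last call the note is unchanged, yet it is not skipped because it sits in the retry queue.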

Metadata caching: get_note_meta now uses an in-memory cache (_meta_cache), significantly reducing repetitive API calls during hierarchical path construction.
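
The effect of the cache is easy to see with a stand-in for the API call. In this sketch `make_cached_fetcher` and `fake_fetch` are hypothetical names; the real script uses a module-level `_meta_cache` dictionary instead of a closure, but the idea is the same.

```python
from typing import Callable


def make_cached_fetcher(fetch: Callable[[str], dict]) -> Callable[[str], dict]:
    """Wrap a metadata fetcher so each note ID is requested at most once."""
    cache: dict[str, dict] = {}

    def cached(note_id: str) -> dict:
        if note_id not in cache:
            cache[note_id] = fetch(note_id)
        return cache[note_id]

    return cached


calls: list[str] = []


def fake_fetch(note_id: str) -> dict:
    """Stand-in for the ETAPI request; records every call it receives."""
    calls.append(note_id)
    return {"noteId": note_id, "title": f"Note {note_id}"}


get_meta = make_cached_fetcher(fake_fetch)
for note_id in ("parent", "child", "parent", "parent"):
    get_meta(note_id)
print(len(calls))
```

Four lookups result in only two calls to the fetcher. Note that in the posted script only note metadata is cached; the `/branches/...` lookups in get_note_path still go to the API every time.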

One thing that remains unchanged: the script still doesn't delete files for notes that were removed in Trilium. If you want that behavior, it's a separate feature to be added later.
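
If someone wants to experiment with that feature, a cautious first step would be to only list the candidate files rather than delete them. The sketch below is an assumption about how it could work, not something the script does: `find_orphans` is a hypothetical helper, it relies on the v2 `Title [noteId].md` naming, and the caller is assumed to supply the set of note IDs that currently exist in Trilium.

```python
import re
from pathlib import Path

# Matches the " [noteId].md" suffix that the v2 script appends to filenames
ID_SUFFIX = re.compile(r" \[([^\[\]]+)\]\.md$")


def find_orphans(backup_dir: Path, live_ids: set[str]) -> list[Path]:
    """List backup files whose noteId is not in the set of existing notes."""
    orphans = []
    for path in sorted(backup_dir.rglob("*.md")):
        match = ID_SUFFIX.search(path.name)
        if match and match.group(1) not in live_ids:
            orphans.append(path)
    return orphans


if __name__ == "__main__":
    # Dry run: report only, nothing is deleted
    for path in find_orphans(Path("Backup_Trilium_MD"), live_ids=set()):
        print("not in Trilium anymore:", path)
```

Files without the ID suffix (for example those written by the first version of the script) are ignored, so they are never reported by mistake.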

The code:

#!/usr/bin/env python3
"""Trilium incremental backup via ETAPI.

First run: performs a full backup of all notes.
Subsequent runs: downloads only notes modified since the last backup.

Each note is saved as an individual .md file, preserving
Trilium's folder structure.

v2 Fixes:
  - Name deduplication: files include the note_id as a suffix to avoid collisions
  - Comprehensive initial fetch: captures text, code, and mermaid notes (one query per type, merged)
  - Retry queue: notes that failed in the previous run are retried
  - Per-note comparison uses the full dateModified timestamp; the search query filters by date only
  - Note metadata caching to reduce API calls

Usage:
  python3 trilium_backup_incremental.py

Scheduling (daily cron at 2 AM):
  0 2 * * * python3 /path/to/trilium_backup_incremental.py
"""

from __future__ import annotations

import json
import re
import sys
from datetime import datetime, timezone
from pathlib import Path

try:
    import requests
except ImportError:
    sys.exit("requests not found. Install with: pip install requests --break-system-packages")

# ---------------------------------------------------------------------------
# Configuration — edit
# ---------------------------------------------------------------------------

SERVER     = "http://localhost:8080"   # your Trilium server address
TOKEN      = "YOUR_ETAPI_TOKEN"        # your ETAPI token
BACKUP_DIR = Path("/home/youruser/Backup_Trilium_MD")
STATE_FILE = BACKUP_DIR / ".backup_state.json"

# ---------------------------------------------------------------------------

HEADERS = {"Authorization": TOKEN}

# In-memory cache to avoid repeated metadata calls for parent notes
_meta_cache: dict[str, dict] = {}


def api_get(path: str, **kwargs) -> dict | list:
    url = f"{SERVER}/etapi{path}"
    r = requests.get(url, headers=HEADERS, **kwargs)
    r.raise_for_status()
    return r.json()


def get_note_meta(note_id: str) -> dict:
    if note_id not in _meta_cache:
        _meta_cache[note_id] = api_get(f"/notes/{note_id}")
    return _meta_cache[note_id]


def get_note_content(note_id: str) -> str:
    url = f"{SERVER}/etapi/notes/{note_id}/content"
    r = requests.get(url, headers=HEADERS)
    r.raise_for_status()
    return r.text


def search_notes(query: str) -> list[dict]:
    """Search notes using a Trilium search query."""
    data = api_get("/notes", params={"search": query, "limit": 10000})
    if isinstance(data, dict):
        return data.get("results", [])
    return data


def get_note_path(note_id: str) -> str:
    """Rebuild the note's hierarchical path (used for the folder structure).

    Uses the metadata cache to avoid repeated calls.
    """
    parts = []
    current_id = note_id
    visited: set[str] = set()

    while current_id and current_id != "root" and current_id not in visited:
        visited.add(current_id)
        try:
            meta = get_note_meta(current_id)
        except Exception:
            break
        parts.append(sanitize_filename(meta.get("title", current_id)))
        branches = meta.get("parentBranchIds", [])
        if not branches:
            break
        try:
            branch = api_get(f"/branches/{branches[0]}")
            current_id = branch.get("parentNoteId", "")
        except Exception:
            break

    parts.reverse()
    return "/".join(parts) if parts else note_id


def sanitize_filename(name: str) -> str:
    """Replace characters that are invalid in file names."""
    name = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", name)
    return name.strip(". ") or "_"


def html_to_md_basic(html: str) -> str:
    """Minimal HTML→Markdown conversion."""
    try:
        from html.parser import HTMLParser

        class TextExtractor(HTMLParser):
            def __init__(self):
                super().__init__()
                self.lines: list[str] = []
                self._in_tag: list[str] = []

            def handle_starttag(self, tag, attrs):
                self._in_tag.append(tag)
                if tag in ("br", "p", "h1", "h2", "h3", "h4", "li"):
                    self.lines.append("\n")
                if tag.startswith("h") and tag[1:].isdigit():
                    level = int(tag[1:])
                    self.lines.append("#" * level + " ")

            def handle_endtag(self, tag):
                if self._in_tag and self._in_tag[-1] == tag:
                    self._in_tag.pop()

            def handle_data(self, data):
                self.lines.append(data)

        extractor = TextExtractor()
        extractor.feed(html)
        return "".join(extractor.lines)
    except Exception:
        return re.sub(r"<[^>]+>", "", html)


def load_state() -> dict:
    if STATE_FILE.exists():
        with open(STATE_FILE, encoding="utf-8") as f:
            return json.load(f)
    # backed_up: {note_id: dateModified}
    # failed:    {note_id: reason}  (retried on the next run)
    return {"last_backup": None, "backed_up": {}, "failed": {}}


def save_state(state: dict) -> None:
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    # Ensure the "failed" key always exists in the state file
    state.setdefault("failed", {})
    with open(STATE_FILE, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2, ensure_ascii=False)


def backup_note(note_id: str, meta: dict, state: dict) -> bool:
    """Back up a single note. Returns True if it was saved."""
    note_type = meta.get("type", "text")
    if note_type not in ("text", "code", "mermaid"):
        return False

    try:
        content = get_note_content(note_id)
    except Exception as e:
        msg = f"Error downloading content: {e}"
        print(f"  ⚠ {note_id}: {msg}")
        # Record the failure so it is retried on the next run
        state["failed"][note_id] = msg
        return False

    title = sanitize_filename(meta.get("title", note_id))

    try:
        note_path = get_note_path(note_id)
    except Exception as e:
        print(f"  ⚠ Error rebuilding path for {note_id}: {e}. Saving in the backup root.")
        note_path = title

    # Folder = all path components except the last one (which is the note title)
    if "/" in note_path:
        folder = BACKUP_DIR / Path(note_path).parent
    else:
        folder = BACKUP_DIR
    folder.mkdir(parents=True, exist_ok=True)

    # Convert HTML if needed
    if meta.get("mime", "") in ("text/html", "") and note_type == "text":
        body = html_to_md_basic(content)
    else:
        body = content

    # -------------------------------------------------------------------
    # FIX: note_id suffix to avoid collisions between notes with the
    # same title in the same folder.
    # Format: "Note Title [abc123].md"
    # -------------------------------------------------------------------
    filename = f"{title} [{note_id}].md"
    filepath = folder / filename

    date_created  = meta.get("dateCreated", "")
    date_modified = meta.get("dateModified", "")
    front_matter  = (
        f"---\n"
        f"title: \"{title}\"\n"
        f"trilium_id: {note_id}\n"
        f"created: {date_created}\n"
        f"modified: {date_modified}\n"
        f"---\n\n"
    )

    try:
        with open(filepath, "w", encoding="utf-8") as f:
            f.write(front_matter + body)
    except OSError as e:
        msg = f"Error writing file: {e}"
        print(f"  ⚠ {note_id}: {msg}")
        state["failed"][note_id] = msg
        return False

    # Saved successfully: remove from "failed" if it was there
    state["backed_up"][note_id] = date_modified
    state["failed"].pop(note_id, None)
    return True


def collect_notes_to_process(state: dict) -> tuple[list[dict], bool]:
    """Decide which notes to fetch and return (list, is_full_backup).

    Logic:
      1. No last_backup → full backup.
      2. With last_backup → incremental search by modification date
         + reprocess the notes in the "failed" queue.
    """
    last_backup = state.get("last_backup")
    failed_ids  = set(state.get("failed", {}).keys())

    if not last_backup:
        print("First backup: exporting all notes...")
        # Fetch every supported type (one query per type)
        notes = (
            search_notes("note.type = text")
            + search_notes("note.type = code")
            + search_notes("note.type = mermaid")
        )
        # Remove duplicates (a note may appear in more than one query)
        seen: set[str] = set()
        unique: list[dict] = []
        for n in notes:
            nid = n.get("noteId")
            if nid and nid not in seen:
                seen.add(nid)
                unique.append(n)
        return unique, True

    print(f"Last backup: {last_backup}")
    print("Searching for notes modified since then...")

    # The search query filters by date only (YYYY-MM-DD). last_backup is stored
    # in UTC, while dateModified can carry a local UTC offset, so the cutoff is
    # moved back one day to avoid missing notes because of that difference.
    # Notes returned again are skipped by the dateModified check in main().
    last_dt = datetime.fromisoformat(last_backup)
    cutoff = datetime.fromtimestamp(last_dt.timestamp() - 86400, tz=timezone.utc)
    cutoff_date = cutoff.date().isoformat()  # YYYY-MM-DD
    query = f'note.dateModified >= "{cutoff_date}"'

    try:
        notes = search_notes(query)
    except Exception as e:
        print(f"Incremental search failed ({e}), falling back to full backup...")
        notes = (
            search_notes("note.type = text")
            + search_notes("note.type = code")
            + search_notes("note.type = mermaid")
        )

    # Add notes that failed previously (retry)
    if failed_ids:
        print(f"Retrying {len(failed_ids)} note(s) that failed previously...")
        existing_ids = {n.get("noteId") for n in notes}
        for fid in failed_ids:
            if fid not in existing_ids:
                notes.append({"noteId": fid})

    # Dedup
    seen = set()
    unique = []
    for n in notes:
        nid = n.get("noteId")
        if nid and nid not in seen:
            seen.add(nid)
            unique.append(n)

    return unique, False


def main() -> int:
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    state = load_state()
    # Ensure the minimal state structure (compatibility with the previous version)
    state.setdefault("backed_up", {})
    state.setdefault("failed", {})

    notes, is_full = collect_notes_to_process(state)

    if not notes:
        print("No notes found to back up.")
        return 0

    print(f"{len(notes)} note(s) to process...")

    saved   = 0
    skipped = 0
    errors  = 0

    now = datetime.now(timezone.utc).isoformat()

    for i, note_stub in enumerate(notes, start=1):
        note_id = note_stub.get("noteId")
        if not note_id:
            continue

        try:
            meta = get_note_meta(note_id)
        except Exception as e:
            print(f"  [{i}/{len(notes)}] ⚠ {note_id}: metadata unavailable ({e})")
            state["failed"][note_id] = f"metadata unavailable: {e}"
            errors += 1
            continue

        date_modified = meta.get("dateModified", "")
        last_saved    = state["backed_up"].get(note_id)

        # Skip if unchanged since the last backup AND not in the failed queue
        if (
            last_saved
            and last_saved >= date_modified
            and note_id not in state.get("failed", {})
        ):
            skipped += 1
            print(f"  [{i}/{len(notes)}] no changes: {meta.get('title', note_id)}", end="\r")
            continue

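        ok = backup_note(note_id, meta, state)
        if ok:
            saved += 1
            print(f"  [{i}/{len(notes)}] ✓ saved: {meta.get('title', note_id)}")
        elif note_id in state["failed"]:
            errors += 1
        else:
            # Unsupported note type: skipped on purpose, not an error
            skipped += 1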

    state["last_backup"] = now
    save_state(state)

    print(f"\n✓ Completed: {saved} saved, {skipped} skipped, {errors} error(s).")
    if state["failed"]:
        print(f"⚠ {len(state['failed'])} failed note(s) will be retried on the next backup:")
        for fid, reason in list(state["failed"].items())[:10]:
            print(f"   {fid}: {reason}")
        if len(state["failed"]) > 10:
            print(f"   ... and {len(state['failed']) - 10} more")
    print(f"Backup at: {BACKUP_DIR}")
    return 0 if errors == 0 else 1


if __name__ == "__main__":
    raise SystemExit(main())
