Background
We discussed the current NodularMarkdown design, where each markdown block is persisted as a separate node record and the editor reconstructs a linear markdown view from that hierarchy.
That design appears to be optimized for treating subparts of a markdown document as first-class resources:
- block-level linking / mentioning
- zooming into a sub-heading
- block-level collections / relationships
- subpart search and breadcrumbing
- block-level persistence and structure metadata
However, if those capabilities are not required, the current approach likely carries substantial overhead for large documents.
Problem
For very large markdown documents (for example, documents with 10,000+ lines), the current nodular model likely hurts performance for several reasons:
- persistence fanout: many records per document
- fetch/load overhead: recursive tree loading and flattening
- mutation churn: per-block writes and structure updates
- rendering overhead: one editor/render component per block
- editing fragility: current inline formatting / cursor / contenteditable issues
Search can still be supported without block-level storage if the document is stored as one canonical resource and we derive lightweight search/index metadata separately.
Proposal
Rebuild markdown editing around a ProseMirror-based architecture and store each document as a single canonical resource.
Editor direction
Use ProseMirror as the document model foundation.
Implementation options:
- ProseMirror directly for maximum control
- Tiptap as a thin headless layer on top of ProseMirror for faster delivery
Current recommendation:
- Use ProseMirror as the architectural base
- Tiptap is acceptable if used mainly as a productivity layer
- Keep our own schema, commands, block browser, context menus, serializers, and custom block definitions
Why this direction
A ProseMirror-based editor is a much better fit for the issues we are seeing today:
- prevents many cursor jumping / selection mapping problems
- avoids brittle DOM-first formatting behavior
- supports structured block and inline schema definitions
- supports custom blocks, commands, node views, slash menus, context menus, and embeds
- is a stronger foundation if we rebuild markdown from scratch while preserving the same UI and block types
Storage direction
If the backend remains PostgreSQL, store documents in PostgreSQL rather than introducing a separate NoSQL/document database.
Recommended canonical storage model:
- content jsonb as the canonical ProseMirror/Tiptap document
- content_text text as flattened plain text for search/snippets/export helpers
- search_vector tsvector derived from content_text
- schema_version int
- normal metadata columns such as title, updated_at, etc.
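As a minimal sketch of how the derived columns could be filled in before a write: the code below flattens a ProseMirror-style JSON document into plain text and assembles a row. The `DocumentRow` shape and the `extractText` and `buildRow` names are hypothetical; only the `type` / `content` / `text` node fields follow ProseMirror's JSON serialization. It assumes `search_vector` is computed inside Postgres from `content_text` (for example with `to_tsvector`), so it does not appear here.

```typescript
// Shape of a node in ProseMirror's JSON serialization.
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  text?: string;
}

// Hypothetical row shape mirroring the columns listed above.
// search_vector is omitted: assumed to be derived by Postgres from content_text.
interface DocumentRow {
  title: string;
  content: PMNode; // persisted as jsonb
  content_text: string; // flattened plain text
  schema_version: number;
  updated_at: string;
}

// Flatten a document to plain text: inline text is concatenated,
// sibling blocks are separated by newlines, empty blocks are skipped.
function extractText(node: PMNode): string {
  if (node.type === "text") return node.text ?? "";
  const children = node.content ?? [];
  const parts = children.map(extractText);
  const hasInlineText = children.some((c) => c.type === "text");
  return hasInlineText
    ? parts.join("")
    : parts.filter((p) => p.length > 0).join("\n");
}

// Assemble the row to persist; `now` is injectable so the result is testable.
function buildRow(
  title: string,
  doc: PMNode,
  schemaVersion: number,
  now: Date = new Date()
): DocumentRow {
  return {
    title,
    content: doc,
    content_text: extractText(doc),
    schema_version: schemaVersion,
    updated_at: now.toISOString(),
  };
}
```

Deriving `content_text` at write time keeps reads cheap, at the cost of having to regenerate it whenever the extraction rules change; `schema_version` gives us a way to detect rows that need that.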
Storage recommendation
Use PostgreSQL jsonb rather than:
- a separate NoSQL/document database
- markdown text as the only canonical persisted representation
Recommended role split:
- Canonical persisted format: ProseMirror/Tiptap JSON in jsonb
- Import/export format: Markdown
- Search format: flattened text + tsvector
Recommended product direction
If block-level addressability is not needed, default to a single-resource document model for long-form markdown.
Reserve nodular / block-addressable documents for cases where we explicitly need features like:
- block-level linking / mentions
- focused subtrees / heading zoom
- block-level metadata or relationships
- subpart-as-resource workflows
Important caveat
Changing storage alone will not solve all large-document performance issues.
Even with a single persisted document, a 10,000-line editor may still be slow unless we also improve rendering strategy.
We should plan for:
- single-resource persistence
- editor-native schema model
- minimal use of custom node views for text blocks
- node views only for truly interactive/custom blocks
- chunking or virtualization in read mode
- possibly sectional lazy rendering in edit mode for very large documents
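To make the chunking / sectional rendering idea concrete, here is a rough sketch that splits a document's top-level blocks into heading-delimited sections and picks which sections to render around the one currently in view. The function names, the `overscan` parameter, and the use of `"heading"` as the node type name are assumptions for illustration, not a settled design.

```typescript
// Shape of a node in ProseMirror's JSON serialization.
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  text?: string;
}

// Split top-level blocks into sections, starting a new section at each heading.
// Blocks before the first heading form their own leading section.
function splitIntoSections(doc: PMNode): PMNode[][] {
  const sections: PMNode[][] = [];
  let current: PMNode[] = [];
  for (const block of doc.content ?? []) {
    if (block.type === "heading" && current.length > 0) {
      sections.push(current);
      current = [];
    }
    current.push(block);
  }
  if (current.length > 0) sections.push(current);
  return sections;
}

// Hypothetical helper: indexes of the sections to render, given the section
// currently in view and how many neighbours to keep mounted on each side.
function sectionsToRender(
  total: number,
  visibleIndex: number,
  overscan: number
): number[] {
  const start = Math.max(0, visibleIndex - overscan);
  const end = Math.min(total - 1, visibleIndex + overscan);
  const indexes: number[] = [];
  for (let i = start; i <= end; i++) indexes.push(i);
  return indexes;
}
```

This is most directly applicable to read mode. In edit mode the editor generally needs the whole document in its state, so any lazy rendering there would have to happen at the view layer and needs separate evaluation.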
Suggested implementation shape
Canonical editor model
- ProseMirror document JSON
Canonical persistence model
- single Postgres row per document
- jsonb content field
Search model
- derived plain-text extraction
- Postgres full-text search (tsvector)
- optional future lightweight heading/snippet index if we need “jump to match” UX without block-level persistence
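The optional heading index could be derived from the canonical JSON rather than persisted per block. The sketch below is one possible shape, assuming headings are top-level nodes of type `"heading"` with a numeric `level` attribute; `HeadingEntry`, `buildHeadingIndex`, and `headingFor` are hypothetical names.

```typescript
// Shape of a node in ProseMirror's JSON serialization.
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  text?: string;
}

// One derived index entry per top-level heading.
interface HeadingEntry {
  blockIndex: number; // position of the heading among top-level blocks
  level: number;
  text: string;
}

// Concatenate all text nodes under a node.
function inlineText(node: PMNode): string {
  if (node.type === "text") return node.text ?? "";
  return (node.content ?? []).map(inlineText).join("");
}

// Build the index in document order.
function buildHeadingIndex(doc: PMNode): HeadingEntry[] {
  const entries: HeadingEntry[] = [];
  (doc.content ?? []).forEach((block, blockIndex) => {
    if (block.type !== "heading") return;
    const rawLevel = block.attrs?.["level"];
    entries.push({
      blockIndex,
      level: typeof rawLevel === "number" ? rawLevel : 1,
      text: inlineText(block),
    });
  });
  return entries;
}

// Nearest heading at or before a matching block, for "jump to match" UX.
function headingFor(
  index: HeadingEntry[],
  blockIndex: number
): HeadingEntry | undefined {
  let found: HeadingEntry | undefined;
  for (const entry of index) {
    if (entry.blockIndex > blockIndex) break;
    found = entry;
  }
  return found;
}
```

Because the index is derived, it can be rebuilt at any time from `content`, which avoids reintroducing block-level persistence just to support navigation.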
Markdown role
- import/export boundary format
- not the primary in-memory editing model
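To illustrate Markdown as a boundary format rather than the editing model, here is a deliberately small export sketch that walks the canonical JSON and emits Markdown for headings and paragraphs only. It ignores marks (bold, italic, links) and throws on any other block type, so it is not a real serializer; `toMarkdown` is a hypothetical name and the `"heading"` / `"paragraph"` type names are assumptions about our schema. The `prosemirror-markdown` package provides a configurable parser and serializer that may be a better starting point.

```typescript
// Shape of a node in ProseMirror's JSON serialization.
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  text?: string;
}

// Concatenate all text nodes under a node (marks are ignored in this sketch).
function inlineText(node: PMNode): string {
  if (node.type === "text") return node.text ?? "";
  return (node.content ?? []).map(inlineText).join("");
}

// Export top-level blocks to Markdown, one blank line between blocks.
function toMarkdown(doc: PMNode): string {
  const blocks: string[] = [];
  for (const block of doc.content ?? []) {
    if (block.type === "heading") {
      const rawLevel = block.attrs?.["level"];
      // Clamp to the 1..6 range that ATX headings allow.
      const level =
        typeof rawLevel === "number" ? Math.min(Math.max(rawLevel, 1), 6) : 1;
      const hashes = new Array(level + 1).join("#");
      blocks.push(hashes + " " + inlineText(block));
    } else if (block.type === "paragraph") {
      blocks.push(inlineText(block));
    } else {
      // Fail loudly so unmapped block types are noticed during design.
      throw new Error("No markdown mapping for node type: " + block.type);
    }
  }
  return blocks.join("\n\n");
}
```

Failing on unknown node types is intentional here: each custom block needs an explicit decision about how, or whether, it round-trips through Markdown.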
Deliverables for this investigation / implementation
- Decide whether to use raw ProseMirror or Tiptap as the implementation layer
- Define a document schema that maps to the current UI and existing block types
- Define which current block types should be regular schema nodes versus custom node views
- Design Postgres persistence schema using jsonb
- Design markdown import/export pipeline
- Design derived search indexing pipeline
- Evaluate rendering strategy for very large documents
- Define migration strategy from current nodular markdown documents
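For the migration item, one possible starting point is a depth-first flattening of the existing block hierarchy into a single document. The sketch below assumes a hypothetical `BlockRecord` shape (`id`, `parentId`, `position`, `type`, `text`) and a one-to-one mapping from block type to schema node type; the real nodular columns and type mapping will differ and need to be confirmed.

```typescript
// Shape of a node in ProseMirror's JSON serialization.
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  text?: string;
}

// Hypothetical shape of a persisted nodular block; actual columns may differ.
interface BlockRecord {
  id: string;
  parentId: string | null; // null for top-level blocks
  position: number; // order among siblings
  type: string; // assumed to map directly to a schema node type
  text: string;
}

const ROOT_KEY = "__root__"; // assumed not to collide with a real block id

// Flatten the block tree into one linear document, visiting each block
// before its children and ordering siblings by position.
// Records whose parent chain never reaches the root are silently dropped.
function flattenToDoc(records: BlockRecord[]): PMNode {
  const byParent: Record<string, BlockRecord[]> = {};
  for (const record of records) {
    const key = record.parentId === null ? ROOT_KEY : record.parentId;
    const siblings = byParent[key] ?? [];
    siblings.push(record);
    byParent[key] = siblings;
  }

  const blocks: PMNode[] = [];
  const visit = (key: string): void => {
    const children = (byParent[key] ?? [])
      .slice()
      .sort((a, b) => a.position - b.position);
    for (const child of children) {
      const node: PMNode = { type: child.type };
      if (child.text.length > 0) {
        node.content = [{ type: "text", text: child.text }];
      }
      blocks.push(node);
      visit(child.id);
    }
  };
  visit(ROOT_KEY);

  return { type: "doc", content: blocks };
}
```

A real migration would also need to carry over inline formatting, block attributes, and any links that target individual blocks, none of which this sketch handles.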
Acceptance criteria
- We have a documented editor architecture decision
- We have a documented storage schema for Postgres
- We have a clear decision on ProseMirror vs Tiptap usage
- We have a migration plan from current nodular markdown
- We have a plan for large-document rendering/performance testing
- We have a clear separation between canonical editor state, import/export format, and search representation
Open questions
- Should NON_NODULAR_MARKDOWN evolve into the new primary long-form document type?
- Do we need any block-level addressability at all, or only lightweight derived heading anchors?
- What performance target should we set for 10,000+ line documents?
- Which current custom blocks can be represented as regular ProseMirror nodes without node views?
- Do we need to support document-level collaboration/versioning in the first phase?
Initial recommendation
Proceed with a design spike around:
- ProseMirror-based editor foundation
- optional Tiptap productivity layer
- Postgres jsonb canonical storage
- derived plain-text search fields
- large-document rendering strategy
This looks like the most robust path if we are willing to rebuild markdown from scratch while keeping the same UI and block vocabulary.