- Overview
- Phase 1 - Context Builder
- Phase 2 - String Transformations
- Phase 3 - AST Processing
- Phase 4 - Format Generation
- Pipeline Entry Points
- Performance & Validation
- Extension & Migration Notes
Legal Markdown JS ships with a four-phase processing pipeline that removes
duplicate remark executions and lets every consumer share a cached AST. The
architecture is implemented in src/core/pipeline and was introduced while
working on the incremental pipeline initiative (Issue #122), with Phase 2
formalized in Issue #149. The phases are:
| Phase | Responsibility | Key Outputs |
|---|---|---|
| 1 | Parse frontmatter, resolve force-commands, merge options | ProcessingContext |
| 2 | Apply string transformations before AST parsing | Preprocessed content, field mappings |
| 3 | Run the remark processor exactly once | Cached LegalMarkdownProcessorResult (content, AST, metadata, exports) |
| 4 | Generate requested artefacts without re-processing | HTML, PDF, Markdown, metadata files |
```mermaid
flowchart LR
    A[Phase 1\nContext Builder] --> B[Phase 2\nString Transformations]
    B --> C[Phase 3\nAST Processing]
    C --> D[Phase 4\nFormat Generator]
    A -.->|ProcessingContext| B
    B -.->|Preprocessed Content| C
    C -.->|LegalMarkdownProcessorResult| D
    D -->|Generated files| E[CLI / Integrations]
```
The pipeline modules are exported from src/core/pipeline/index.ts, making them
available to CLI services, integrations and future API consumers.
See Also:
- String Transformations - Detailed Phase 2 documentation
- Remark Integration - Phase 3 AST plugins
Phase 1 lives in src/core/pipeline/context-builder.ts. It is responsible for
transforming raw markdown + CLI options into a normalized ProcessingContext:
- Parses YAML frontmatter via `parseYamlFrontMatter`
- Extracts, renders and applies force-commands (`extractForceCommands`, `parseForceCommands`, `applyForceCommands`)
- Merges CLI options, force-commands and additional metadata
- Enables field tracking automatically when highlighting is requested
- Provides validation helpers (`validateProcessingContext`) and metadata merging (`mergeMetadata`) for downstream code
The returned ProcessingContext bundles the raw content (with and without
YAML), resolved options, merged metadata and the base path that the remark phase
uses for relative imports. Debug logging is emitted when options.debug is true
to make tracing easier.
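
To make the shape of that result concrete, here is a minimal sketch of a context builder. The `SketchOptions` and `SketchContext` interfaces, their field names and `buildSketchContext` are assumptions made for illustration; the real `ProcessingContext` in src/core/pipeline/context-builder.ts may look different. Frontmatter is passed in already parsed, and the merge precedence (force-commands overriding CLI options) is also an assumption.

```typescript
// Hypothetical shapes for illustration only; not the library's real types.
interface SketchOptions {
  debug?: boolean;
  highlight?: boolean;
  enableFieldTracking?: boolean;
}

interface SketchContext {
  content: string; // markdown without the YAML block
  rawContent: string; // markdown including the YAML block
  metadata: Record<string, unknown>;
  options: SketchOptions;
  basePath: string; // used later for relative imports
}

function buildSketchContext(
  rawContent: string,
  frontmatter: Record<string, unknown>,
  cliOptions: SketchOptions,
  forceCommands: SketchOptions,
  basePath: string
): SketchContext {
  // Assumed precedence: force-commands override CLI options.
  const options: SketchOptions = { ...cliOptions, ...forceCommands };
  // Field tracking is switched on automatically when highlighting is requested.
  if (options.highlight) {
    options.enableFieldTracking = true;
  }
  // Strip a leading '---' delimited YAML block, if present.
  const content = rawContent.replace(/^---\n[\s\S]*?\n---\n?/, '');
  return { content, rawContent, metadata: { ...frontmatter }, options, basePath };
}
```

Keeping both `content` and `rawContent` in one object lets later phases pick whichever they need without re-parsing the frontmatter.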
Added in Issue #149 - Formalizes the previously undocumented "Phase 1.5"
Phase 2 performs string-level transformations on raw markdown content BEFORE
remark AST parsing. These transformations are implemented in
src/core/pipeline/string-transformations.ts and handle patterns that would be
fragmented and impossible to match once the content is parsed into an AST.
Some features cannot work as remark plugins because the AST fragments multi-line patterns. For example, a multi-line optional clause with markdown formatting:
```markdown
[l. **Warranties**
The seller provides warranties.]{includeWarranties}
```

When remark parses this into an AST, it splits the pattern across multiple nodes
(paragraph, text, strong, etc.), making it impossible for a plugin to match the
complete `[content]{condition}` pattern.
Phase 2 applies transformations in a specific order:
- Field Pattern Normalization - Convert custom patterns (like `|field|`) to the standard `{{field}}` format
- Optional Clauses Processing - Evaluate `[content]{condition}` patterns and conditionally include/exclude content
- Template Loops Processing - Expand Handlebars blocks (`{{#each}}`, `{{#if}}`) with data
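
The first two steps can be sketched on a raw string as follows. This is an illustrative approximation, not the code in string-transformations.ts: the function names, the regular expressions and the flat metadata lookup are all assumptions, nested brackets are not handled, and the Handlebars loop step is left out because it needs the Handlebars runtime.

```typescript
type Metadata = Record<string, unknown>;

// Step 1: rewrite |field| into {{field}} (pattern assumed for illustration).
function normalizeFieldPatterns(content: string): string {
  return content.replace(/\|([A-Za-z_][\w.]*)\|/g, '{{$1}}');
}

// Step 2: keep or drop [content]{condition} blocks.
// The negated character class also matches newlines, so the clause body
// may span several lines and contain markdown formatting.
function processOptionalClauses(content: string, metadata: Metadata): string {
  return content.replace(
    /\[([^\[\]]*)\]\{(\w+)\}/g,
    (_match: string, body: string, condition: string) =>
      metadata[condition] ? body : ''
  );
}

// Order matters: fields are normalized before clauses are evaluated.
function applySketchTransformations(content: string, metadata: Metadata): string {
  return processOptionalClauses(normalizeFieldPatterns(content), metadata);
}
```

Because the input is still a plain string at this point, a single regular expression can see the opening `[`, the formatted body and the trailing `{condition}` together, which is exactly what an AST plugin cannot do.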
- Optional Clauses: `[content]{condition}` - Can span multiple lines with markdown formatting
- Template Loops: `{{#each items}}...{{/each}}` - Handlebars-powered iteration and conditionals
- Field Normalization: Ensures all fields use consistent `{{field}}` syntax before Handlebars compilation
See String Transformations for detailed documentation and decision tree for when to use string transforms vs AST plugins.
Two optional flags can be enabled to route field tracking through internal tokens instead of legacy inline spans generated directly in Phase 2:
- `astFieldTracking`
- `logicBranchHighlighting`
When astFieldTracking is enabled:
- Phase 2 emits internal tags (`<lm-field>`, `<lm-logic-start>`, `<lm-logic-end>`) rather than final `<span class="legal-field ...">`.
- Phase 3 converts those tags into final tracking spans while resolving remaining `{{...}}` fields in the same AST pass.
- Legacy span-heuristic guards are bypassed in this route.
When both flags are enabled and field tracking is on, winner conditional
branches (#if / #unless) are annotated with:
- `data-field="logic.branch.N"` (stable DFS order)
- `data-logic-helper="if|unless"`
- `data-logic-result="true|false"`
Phase 3 uses the existing remark processor from
src/extensions/remark/legal-markdown-processor.ts. The differences introduced
by the pipeline work are:
- `processLegalMarkdownWithRemark` now returns a `LegalMarkdownProcessorResult` that includes the processed markdown, metadata, exported file list and a cached AST (`ast?: Root`)
- Additional metadata gathered in Phase 1 and Phase 2 is passed through with the `additionalMetadata` option so header numbering and other plugins see the consolidated state
- Plugin order validation migrated next to the remark plugins. The validator and registry live in `src/plugins/remark/plugin-order-validator.ts` and `src/plugins/remark/plugin-metadata-registry.ts`, keeping ordering rules close to the implementations (see Remark Content Processing)
Because Phase 3 only runs once, all format generation steps consume the same AST and metadata snapshot, eliminating the previous "format × remark runs" combinatorial explosion.
New in Phase 3: The buildRemarkPipeline() function in
src/core/pipeline/pipeline-builder.ts serves as the single source of truth
for determining plugin execution order. This phase-based architecture prevents
subtle ordering bugs like Issue #120 (conditionals evaluating before variables
expand).
How it works:
- Group by Phase - Plugins are grouped into 5 explicit phases:
  - Phase 1: CONTENT_LOADING (imports)
  - Phase 2: VARIABLE_EXPANSION (mixins, template fields)
  - Phase 3: CONDITIONAL_EVAL (loops, conditionals)
  - Phase 4: STRUCTURE_PARSING (headers, cross-references)
  - Phase 5: POST_PROCESSING (dates, field tracking)
- Topological Sort Within Phases - Within each phase, plugins are sorted using Kahn's algorithm based on `runBefore`/`runAfter` constraints
- Validate Capabilities - Ensures required capabilities (e.g., `variables:resolved`) are provided by earlier plugins
- Environment-Aware Validation - Validation mode adapts automatically:
  - `strict`: Development/CI (throws on violations)
  - `warn`: Production (logs warnings, continues)
  - `silent`: No validation output
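
The grouping and sorting steps can be illustrated with a small, self-contained sketch of Kahn's algorithm. The `SketchPlugin` descriptor and both function names are hypothetical and exist only for this example; the real registry metadata and `buildRemarkPipeline()` internals may differ, and capability validation is omitted here.

```typescript
// Hypothetical plugin descriptor; the real registry metadata may differ.
interface SketchPlugin {
  name: string;
  phase: number; // 1..5, matching the phase list above
  runBefore?: string[];
  runAfter?: string[];
}

// Order the plugins of one phase with Kahn's algorithm.
function sortWithinPhase(plugins: SketchPlugin[]): string[] {
  const names = plugins.map(p => p.name);
  const edges = new Map<string, string[]>(); // name -> plugins that must run later
  const inDegree = new Map<string, number>();
  names.forEach(n => {
    edges.set(n, []);
    inDegree.set(n, 0);
  });

  const addEdge = (from: string, to: string): void => {
    // Constraints that point outside this phase are ignored in this sketch.
    if (!edges.has(from) || !edges.has(to)) return;
    edges.get(from)!.push(to);
    inDegree.set(to, inDegree.get(to)! + 1);
  };
  plugins.forEach(p => {
    (p.runBefore || []).forEach(other => addEdge(p.name, other));
    (p.runAfter || []).forEach(other => addEdge(other, p.name));
  });

  // Start with plugins that have no unmet prerequisites.
  const queue = names.filter(n => inDegree.get(n) === 0);
  const ordered: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift()!;
    ordered.push(current);
    edges.get(current)!.forEach(next => {
      inDegree.set(next, inDegree.get(next)! - 1);
      if (inDegree.get(next) === 0) queue.push(next);
    });
  }
  if (ordered.length !== names.length) {
    throw new Error('Cycle detected in runBefore/runAfter constraints');
  }
  return ordered;
}

// Group by phase first, then sort each group independently.
function orderPlugins(plugins: SketchPlugin[]): string[] {
  const phases = Array.from(new Set(plugins.map(p => p.phase))).sort((a, b) => a - b);
  let result: string[] = [];
  phases.forEach(phase => {
    result = result.concat(sortWithinPhase(plugins.filter(p => p.phase === phase)));
  });
  return result;
}
```

Sorting per phase keeps the coarse ordering (imports before variables before conditionals) independent of any individual plugin's constraints, so a missing `runAfter` declaration cannot move a plugin into an earlier phase.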
Example Usage:
```typescript
import { buildRemarkPipeline } from './core/pipeline';
import { GLOBAL_PLUGIN_REGISTRY } from './plugins/remark';

const pipeline = buildRemarkPipeline(
  {
    enabledPlugins: [
      'remarkImports',
      'remarkTemplateFields',
      'processTemplateLoops',
    ],
    metadata: { author: 'Jane Doe' },
    options: { debug: true },
    validationMode: 'strict',
  },
  GLOBAL_PLUGIN_REGISTRY
);

// pipeline.names: ['remarkImports', 'remarkTemplateFields', 'processTemplateLoops']
// pipeline.byPhase: Map { 1 => ['remarkImports'], 2 => ['remarkTemplateFields'], 3 => ['processTemplateLoops'] }
// pipeline.capabilities: Set { 'content:imported', 'fields:expanded', 'variables:resolved', 'conditionals:evaluated' }
```

Critical Guarantee for Issue #120:
The phase-based ordering guarantees that `remarkTemplateFields` (plugin phase 2,
VARIABLE_EXPANSION) always runs before `processTemplateLoops` (plugin phase 3,
CONDITIONAL_EVAL), ensuring variables are expanded before conditionals evaluate.
This prevents the bug where conditionals would evaluate against unexpanded
`{{variable}}` syntax.
The `processTemplateLoops` transformation (pipeline Phase 2, String Transformations) supports two syntaxes:
- Handlebars syntax (recommended, standard): Uses native Handlebars engine
- Legacy syntax (deprecated): Custom expression evaluation (to be removed in v4.0.0)
Since v3.5.0, Legal Markdown automatically detects which syntax is used and routes to the appropriate processor. The system logs detailed migration hints when legacy syntax is detected.
Templates using Handlebars syntax benefit from:
- Industry-standard template features
- Subexpressions: `{{formatDate (addYears date 2) "legal"}}`
- Parent context access: `{{../parentVariable}}`
- Native loop helpers: `{{@index}}`, `{{@first}}`, `{{@last}}`
- 30+ registered helpers (date, number, string, math)
See docs/helpers/README.md for complete reference.
The legacy processor includes full expression evaluation for conditional blocks, providing features beyond standard Handlebars (these will be removed in v4.0.0).
Comparison Operators:
- `==` - Equal (loose equality, works with strings and numbers)
- `!=` - Not equal
- `>` - Greater than
- `<` - Less than
- `>=` - Greater than or equal
- `<=` - Less than or equal
Boolean Operators:
- `&&` - Logical AND (higher precedence)
- `||` - Logical OR (lower precedence)
Simple comparison:
```markdown
---
city:
  legal: 'madrid'
---

{{#if city.legal == "madrid"}} Clausula de Madrid {{/if}}
```

Numeric comparison:

```markdown
---
contract:
  amount: 50000
---

{{#if contract.amount > 10000}} High value contract {{/if}}
```

Complex boolean expression:

```markdown
---
contract:
  amount: 50000
  jurisdiction: 'spain'
---

{{#if contract.amount > 10000 && contract.jurisdiction == "spain"}} Spanish
high-value contract {{/if}}
```

The evaluation happens in `evaluateCondition()` which:
- Detects boolean operators (`&&`, `||`) → delegates to `evaluateBooleanExpression()`
- Detects comparison operators → delegates to `evaluateComparisonExpression()`
- Falls back to truthiness check for simple variables

Value parsing in `parseComparisonValue()` handles:

- String literals (quoted with `"` or `'`)
- Numeric literals (integers and decimals)
- Boolean literals (`true`, `false`)
- Null literal (`null`)
- Variable references (resolved via `resolveVariablePath()`)
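
The dispatch described above can be approximated with the sketch below. It reuses the function names from this section so the mapping is easy to follow, but the bodies are an independent, simplified re-implementation written for illustration and are assumptions throughout. In particular, boolean handling is folded into `evaluateCondition()` instead of a separate `evaluateBooleanExpression()`, and operators inside quoted strings are not supported.

```typescript
type Data = Record<string, unknown>;

// Walk a dotted path such as "contract.amount" through nested objects.
function resolveVariablePath(path: string, data: Data): unknown {
  let current: unknown = data;
  for (const key of path.split('.')) {
    if (current === null || typeof current !== 'object') return undefined;
    current = (current as Data)[key];
  }
  return current;
}

// Literals are tried first; anything else is treated as a variable reference.
function parseComparisonValue(token: string, data: Data): unknown {
  const t = token.trim();
  const quoted = /^(["'])(.*)\1$/.exec(t);
  if (quoted) return quoted[2];
  if (/^-?\d+(\.\d+)?$/.test(t)) return Number(t);
  if (t === 'true') return true;
  if (t === 'false') return false;
  if (t === 'null') return null;
  return resolveVariablePath(t, data);
}

function evaluateComparisonExpression(expr: string, data: Data): boolean {
  // Two-character operators are listed first so ">=" is not read as ">".
  const match = /^(.+?)\s*(==|!=|>=|<=|>|<)\s*(.+)$/.exec(expr.trim());
  if (!match) return false;
  const left: any = parseComparisonValue(match[1], data);
  const right: any = parseComparisonValue(match[3], data);
  switch (match[2]) {
    case '==': return left == right; // loose equality, as documented above
    case '!=': return left != right;
    case '>': return left > right;
    case '<': return left < right;
    case '>=': return left >= right;
    default: return left <= right;
  }
}

function evaluateCondition(expr: string, data: Data): boolean {
  // Splitting on || before && gives && the higher precedence.
  if (expr.indexOf('||') !== -1) {
    return expr.split('||').some(part => evaluateCondition(part, data));
  }
  if (expr.indexOf('&&') !== -1) {
    return expr.split('&&').every(part => evaluateCondition(part, data));
  }
  if (/(==|!=|>=|<=|>|<)/.test(expr)) {
    return evaluateComparisonExpression(expr, data);
  }
  // Simple variable: fall back to a truthiness check.
  return Boolean(resolveVariablePath(expr.trim(), data));
}
```

Splitting on `||` at the outermost level means each `&&` group is evaluated as a unit, which reproduces the precedence rule listed under Boolean Operators without needing a full expression parser.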
This makes Legal Markdown more expressive than Handlebars, which requires helper functions for comparisons.
| Feature | Handlebars | Liquid | Legal Markdown (Legacy) |
|---|---|---|---|
| Comparison ops | ❌ (requires helpers) | ✅ | ✅ |
| Boolean operators | ❌ (requires helpers) | ✅ | ✅ |
| Syntax | `{{#if (eq a b)}}` | `{% if a == b %}` | `{{#if a == b}}` |
| Numeric comparisons | `{{#if (gt amount 1000)}}` | `{% if amount > 1000 %}` | `{{#if amount > 1000}}` |
Note: This comparison applies to the legacy syntax only. Standard Handlebars syntax is now recommended and uses Handlebars' own conditional system.
Phase 4 is implemented by src/core/pipeline/format-generator.ts and focuses on
artefact creation:
- `generateAllFormats` writes HTML, PDF, Markdown and metadata exports without re-running the remark phase. It orchestrates Html/Pdf generators and simple file writes from the cached AST and processed markdown
- `processAndGenerateFormats` is a convenience helper that executes Phases 2, 3, and 4 together when Phase 1 already provided a `ProcessingContext`
- Highlight variants share the same cached AST and differ only by the `includeHighlighting` flag passed into the Html/Pdf generators
- Output directories are created on demand and comprehensive error messages are thrown when creation fails
- Every invocation returns both the generated file list and timing statistics so callers can present user feedback or feed telemetry
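
The idea of fanning one cached result out to several formats can be sketched as below. Everything in this block is hypothetical: `SketchResult` stands in for `LegalMarkdownProcessorResult`, the renderer and `write` callbacks replace the real Html/Pdf generators and file system calls, and the `.HIGHLIGHT` file suffix is an assumed naming scheme.

```typescript
// Hypothetical cached result; stands in for LegalMarkdownProcessorResult.
interface SketchResult {
  content: string;
  metadata: Record<string, unknown>;
}

type Renderer = (result: SketchResult, includeHighlighting: boolean) => string;

interface GenerationSummary {
  generatedFiles: string[];
  elapsedMs: number;
}

function generateFormatsSketch(
  result: SketchResult,
  renderers: Record<string, Renderer>, // keyed by file extension
  baseName: string,
  highlight: boolean,
  write: (path: string, body: string) => void
): GenerationSummary {
  const started = Date.now();
  const generatedFiles: string[] = [];
  Object.keys(renderers).forEach(ext => {
    // Every renderer reads the same cached result; nothing is re-parsed.
    const variants = highlight ? [false, true] : [false];
    variants.forEach(includeHighlighting => {
      const suffix = includeHighlighting ? '.HIGHLIGHT' : ''; // assumed naming
      const path = baseName + suffix + '.' + ext;
      write(path, renderers[ext](result, includeHighlighting));
      generatedFiles.push(path);
    });
  });
  // Return the file list together with timing, as described in the last bullet.
  return { generatedFiles, elapsedMs: Date.now() - started };
}
```

Passing the writer in as a callback keeps the sketch free of file system dependencies; a caller could supply a function that creates directories on demand and writes to disk.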
Two services drive the pipeline at runtime:
- `src/cli/service.ts` - `generateFormattedOutputWithOptions` performs the four phases sequentially, ensuring HTML/PDF/Markdown/metadata artefacts come from a single remark pass and archiving reuses the processed content
- `src/cli/interactive/service.ts` - the interactive CLI maps user selections to CLI options, builds the processing context, applies string transformations, runs the remark phase once, then calls `generateAllFormats` with the resulting AST
Both services keep their existing single-format behaviour (plain remark output) for scenarios where only markdown is requested.
- Integration benchmark tests (`tests/integration/pipeline-3-phase.integration.test.ts`) confirm a ~50-75% reduction in processing time for multi-format runs by comparing the legacy behaviour with the four-phase pipeline
- Unit suites in `tests/unit/core/pipeline/context-builder.test.ts` and `tests/unit/core/pipeline/format-generator.test.ts` cover metadata merging, error handling and format generation pathways
- Additional CLI and remark plugin tests were updated to assert that the AST is preserved across output modes and plugin order validation is enforced when requested (`tests/unit/plugins/remark/imports.unit.test.ts`, `tests/unit/plugins/remark/css-classes.unit.test.ts`)
- The pipeline exports are additive; existing consumers that call `processLegalMarkdownWithRemark` directly remain supported
- New helpers (`buildProcessingContext`, `applyStringTransformations`, `generateAllFormats`, `processAndGenerateFormats`) provide clear integration points for upcoming API layers or background workers
- Legacy pipeline code paths tracked in `plans/2026-03-03-phase2-phase3-span-refactor-plan.md` can be migrated incrementally by swapping in the four-phase helpers without touching business logic
- Issue #149: The `remarkClauses` plugin has been removed. Optional clauses are now processed in Phase 2 (String Transformations), enabling proper handling of multi-line content with markdown formatting