Processing Pipeline Architecture

Overview
Phase 1 - Context Builder
Phase 2 - String Transformations
Phase 3 - AST Processing
Phase 4 - Format Generation
Pipeline Entry Points
Performance & Validation
Extension & Migration Notes

Overview

Legal Markdown JS ships with a four-phase processing pipeline that removes duplicate remark executions and lets every consumer share a cached AST. The architecture is implemented in src/core/pipeline and was introduced while working on the incremental pipeline initiative (Issue #122), with Phase 2 formalized in Issue #149. The phases are:

Phase	Responsibility	Key Outputs
1	Parse frontmatter, resolve force-commands, merge options	`ProcessingContext`
2	Apply string transformations before AST parsing	Preprocessed content, field mappings
3	Run the remark processor exactly once	Cached `LegalMarkdownProcessorResult` (content, AST, metadata, exports)
4	Generate requested artefacts without re-processing	HTML, PDF, Markdown, metadata files

flowchart LR
    A[Phase 1\nContext Builder] --> B[Phase 2\nString Transformations]
    B --> C[Phase 3\nAST Processing]
    C --> D[Phase 4\nFormat Generator]

    A -.->|ProcessingContext| B
    B -.->|Preprocessed Content| C
    C -.->|LegalMarkdownProcessorResult| D
    D -->|Generated files| E[CLI / Integrations]

The pipeline modules are exported from src/core/pipeline/index.ts, making them available to CLI services, integrations and future API consumers.

See Also:

String Transformations - Detailed Phase 2 documentation
Remark Integration - Phase 3 AST plugins

Phase 1 - Context Builder

Phase 1 lives in src/core/pipeline/context-builder.ts. It transforms raw markdown and CLI options into a normalized ProcessingContext:

Parses YAML frontmatter via parseYamlFrontMatter
Extracts, renders and applies force-commands (extractForceCommands, parseForceCommands, applyForceCommands)
Merges CLI options, force-commands and additional metadata
Enables field tracking automatically when highlighting is requested
Provides validation helpers (validateProcessingContext) and metadata merging (mergeMetadata) for downstream code

The returned ProcessingContext bundles the raw content (with and without YAML), resolved options, merged metadata and the base path that the remark phase uses for relative imports. Debug logging is emitted when options.debug is true to make tracing easier.
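As an illustration of the shape described above, here is a minimal sketch of a context builder. The names `CliOptionsSketch`, `ProcessingContextSketch`, `splitFrontmatter` and `buildContextSketch`, the option names `highlight` and `enableFieldTracking`, and the merge priority are all assumptions made for this example; the real module uses parseYamlFrontMatter and also handles force-commands, which are omitted here.

```typescript
// Hypothetical shapes: the real ProcessingContext may carry more fields.
interface CliOptionsSketch {
  debug?: boolean;
  highlight?: boolean;
  enableFieldTracking?: boolean;
  basePath?: string;
}

interface ProcessingContextSketch {
  rawContent: string; // content including the YAML block
  content: string; // content with the YAML block removed
  metadata: Record<string, unknown>;
  options: CliOptionsSketch;
  basePath: string;
}

// Splits a leading `---` delimited block from the body. Only flat
// `key: value` pairs are understood; a real YAML parser handles far more.
function splitFrontmatter(raw: string): { yaml: Record<string, unknown>; body: string } {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(raw);
  if (!match) return { yaml: {}, body: raw };
  const [whole = '', block = ''] = match;
  const yaml: Record<string, unknown> = {};
  for (const line of block.split(/\r?\n/)) {
    const idx = line.indexOf(':');
    if (idx > 0) yaml[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { yaml, body: raw.slice(whole.length) };
}

function buildContextSketch(
  raw: string,
  cliOptions: CliOptionsSketch,
  additionalMetadata: Record<string, unknown> = {}
): ProcessingContextSketch {
  const { yaml, body } = splitFrontmatter(raw);
  const options: CliOptionsSketch = { ...cliOptions };
  // Highlighting implies field tracking, as described above.
  if (options.highlight) options.enableFieldTracking = true;
  if (options.debug) console.log('[context-builder sketch] keys:', Object.keys(yaml));
  return {
    rawContent: raw,
    content: body,
    // Assumption: additional metadata overrides frontmatter on key clashes.
    metadata: { ...yaml, ...additionalMetadata },
    options,
    basePath: options.basePath ?? '.',
  };
}

const leaseContext = buildContextSketch('---\ntitle: Lease\n---\n# Lease\n', { highlight: true });
```

Here `leaseContext.content` holds only the markdown body, while `leaseContext.rawContent` still includes the YAML block, mirroring the "with and without YAML" pair mentioned above.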

Phase 2 - String Transformations

Added in Issue #149 - Formalizes the previously undocumented "Phase 1.5"

Phase 2 performs string-level transformations on raw markdown content BEFORE remark AST parsing. These transformations are implemented in src/core/pipeline/string-transformations.ts and handle patterns that would be fragmented and impossible to match once the content is parsed into an AST.

Why String Transformations Are Necessary

Some features cannot work as remark plugins because the AST fragments multi-line patterns. For example, a multi-line optional clause with markdown formatting:

[l. **Warranties**

The seller provides warranties.]{includeWarranties}

When remark parses this into an AST, it splits the pattern across multiple nodes (paragraph, text, strong, etc.), making it impossible for a plugin to match the complete [content]{condition} pattern.
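To make the contrast concrete, the following sketch matches the same clause on the raw string, where it is still one contiguous span. The regular expression and the function name `applyOptionalClausesSketch` are assumptions for illustration only; the sketch supports neither nested square brackets nor dotted conditions, and the real implementation may parse clauses differently.

```typescript
type ClauseMetadata = Record<string, unknown>;

// Matches `[content]{condition}`. The negated character class also matches
// newlines, so the content may span several lines.
// Assumption: the content contains no nested square brackets.
const OPTIONAL_CLAUSE = /\[([^\[\]]+)\]\{([A-Za-z_]\w*)\}/g;

function applyOptionalClausesSketch(content: string, metadata: ClauseMetadata): string {
  // Keep the clause body when the condition is truthy, otherwise drop it.
  return content.replace(OPTIONAL_CLAUSE, (_match: string, body: string, condition: string) =>
    metadata[condition] ? body : ''
  );
}

const warrantyClause =
  '[l. **Warranties**\n\nThe seller provides warranties.]{includeWarranties}';

const clauseIncluded = applyOptionalClausesSketch(warrantyClause, { includeWarranties: true });
const clauseExcluded = applyOptionalClausesSketch(warrantyClause, { includeWarranties: false });
```

Because the pattern requires `]{` between the two parts, ordinary markdown links such as `[text](url)` are left untouched by this sketch.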

Transformation Order

Phase 2 applies transformations in a specific order:

Field Pattern Normalization - Convert custom patterns (like |field|) to standard {{field}} format
Optional Clauses Processing - Evaluate [content]{condition} patterns and conditionally include/exclude content
Template Loops Processing - Expand Handlebars blocks ({{#each}}, {{#if}}) with data
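The ordering above can be sketched as an array of string-to-string steps applied in sequence. Everything here is a simplified stand-in: the step names, the single-pipe field delimiter, and the tiny `{{#each}}` expander are assumptions for illustration, whereas the real Phase 2 delegates loops to Handlebars.

```typescript
type StringTransform = (content: string, data: Record<string, unknown>) => string;

// Step 1: rewrite a custom `|field|` pattern into `{{field}}`.
// Caveat: this naive pattern would also match compact table cells like `|a|`.
const normalizeFields: StringTransform = (content) =>
  content.replace(/\|([A-Za-z_][\w.]*)\|/g, '{{$1}}');

// Step 2: simplified optional clauses of the form `[content]{flag}`.
const optionalClauses: StringTransform = (content, data) =>
  content.replace(/\[([^\[\]]+)\]\{(\w+)\}/g, (_m: string, body: string, flag: string) =>
    data[flag] ? body : ''
  );

// Step 3: minimal `{{#each list}}...{{/each}}` expansion that supports `{{this}}` only.
const expandEach: StringTransform = (content, data) =>
  content.replace(
    /\{\{#each (\w+)\}\}([\s\S]*?)\{\{\/each\}\}/g,
    (_m: string, key: string, body: string) => {
      const items = data[key];
      if (!Array.isArray(items)) return '';
      return items.map((item) => body.replace(/\{\{this\}\}/g, () => String(item))).join('');
    }
  );

// The array order encodes the transformation order.
const orderedSteps: StringTransform[] = [normalizeFields, optionalClauses, expandEach];

function runStringTransforms(content: string, data: Record<string, unknown>): string {
  return orderedSteps.reduce((acc, step) => step(acc, data), content);
}

const transformed = runStringTransforms(
  'Client: |client|[ (premium)]{isPremium}\n{{#each parties}}- {{this}}\n{{/each}}',
  { isPremium: true, parties: ['Buyer', 'Seller'] }
);
```

In this sketch `{{client}}` is deliberately left unresolved after normalization, so that a later stage can substitute the value.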

Key Features

Optional Clauses: [content]{condition} - Can span multiple lines with markdown formatting
Template Loops: {{#each items}}...{{/each}} - Handlebars-powered iteration and conditionals
Field Normalization: Ensures all fields use consistent {{field}} syntax before Handlebars compilation

See String Transformations for detailed documentation and a decision tree for choosing between string transforms and AST plugins.

Experimental AST-First Field Tracking (Phase 2 -> Phase 3)

Two optional flags can be enabled to route field tracking through internal tokens instead of legacy inline spans generated directly in Phase 2:

astFieldTracking
logicBranchHighlighting

When astFieldTracking is enabled:

Phase 2 emits internal tags (<lm-field>, <lm-logic-start>, <lm-logic-end>) rather than final <span class="legal-field ...">.
Phase 3 converts those tags into final tracking spans while resolving remaining {{...}} fields in the same AST pass.
Legacy span-heuristic guards are bypassed in this route.

When both flags are enabled and field tracking is on, winning conditional branches (#if / #unless) are annotated with:

data-field="logic.branch.N" (stable DFS order)
data-logic-helper="if|unless"
data-logic-result="true|false"
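The "stable DFS order" numbering can be pictured with the sketch below. The `LogicBranch` node shape, the pre-order traversal and the 1-based counter are assumptions for this example; the source only states that N follows a stable depth-first order.

```typescript
// Hypothetical node shape for a winning conditional branch and its nested branches.
interface LogicBranch {
  helper: 'if' | 'unless';
  result: boolean;
  children: LogicBranch[];
}

interface BranchAnnotation {
  'data-field': string;
  'data-logic-helper': 'if' | 'unless';
  'data-logic-result': 'true' | 'false';
}

// Walks the branches depth-first (pre-order) and numbers them as they are
// visited, so the same template always yields the same identifiers.
function annotateBranches(roots: LogicBranch[]): BranchAnnotation[] {
  const annotations: BranchAnnotation[] = [];
  const visit = (branch: LogicBranch): void => {
    annotations.push({
      // Assumption: numbering starts at 1.
      'data-field': `logic.branch.${annotations.length + 1}`,
      'data-logic-helper': branch.helper,
      'data-logic-result': branch.result ? 'true' : 'false',
    });
    branch.children.forEach((child) => visit(child));
  };
  roots.forEach((root) => visit(root));
  return annotations;
}

const branchAnnotations = annotateBranches([
  { helper: 'if', result: true, children: [{ helper: 'unless', result: false, children: [] }] },
  { helper: 'if', result: true, children: [] },
]);
```

In this example the nested `unless` branch is numbered before the second top-level `if`, which is what makes the order depth-first rather than level-by-level.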

Phase 3 - AST Processing

Phase 3 uses the existing remark processor from src/extensions/remark/legal-markdown-processor.ts. The differences introduced by the pipeline work are:

processLegalMarkdownWithRemark now returns a LegalMarkdownProcessorResult that includes the processed markdown, metadata, exported file list and a cached AST (ast?: Root)
Additional metadata gathered in Phase 1 and Phase 2 is passed through with the additionalMetadata option so header numbering and other plugins see the consolidated state
Plugin order validation has moved next to the remark plugins. The validator and registry live in src/plugins/remark/plugin-order-validator.ts and src/plugins/remark/plugin-metadata-registry.ts, keeping ordering rules close to the implementations (see Remark Content Processing)

Because Phase 3 runs only once, all format generation steps consume the same AST and metadata snapshot. This removes the previous "format × remark runs" multiplication, in which each requested format triggered its own remark run.

Pipeline Builder: Single Source of Truth for Plugin Ordering

New in Phase 3: The buildRemarkPipeline() function in src/core/pipeline/pipeline-builder.ts serves as the single source of truth for determining plugin execution order. This phase-based architecture prevents subtle ordering bugs like Issue #120 (conditionals evaluating before variables expand).

How it works:

Group by Phase - Plugins are grouped into 5 explicit plugin phases, which are separate from the four pipeline phases described above:
- Phase 1: CONTENT_LOADING (imports)
- Phase 2: VARIABLE_EXPANSION (mixins, template fields)
- Phase 3: CONDITIONAL_EVAL (loops, conditionals)
- Phase 4: STRUCTURE_PARSING (headers, cross-references)
- Phase 5: POST_PROCESSING (dates, field tracking)
Topological Sort Within Phases - Within each phase, plugins are sorted using Kahn's algorithm based on runBefore/runAfter constraints
Validate Capabilities - Ensures required capabilities (e.g., variables:resolved) are provided by earlier plugins
Environment-Aware Validation - Validation mode adapts automatically:
- strict: Development/CI (throws on violations)
- warn: Production (logs warnings, continues)
- silent: No validation output
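The first two steps (grouping by phase, then a topological sort within each phase) can be sketched as follows. The `PluginSpec` shape, the function names and the plugin name `remarkMixins` are assumptions for illustration; the real registry metadata and buildRemarkPipeline() may differ, and capability validation is left out.

```typescript
// Hypothetical plugin metadata; the real registry entries may differ.
interface PluginSpec {
  name: string;
  phase: number; // 1 = CONTENT_LOADING ... 5 = POST_PROCESSING
  runAfter?: string[];
  runBefore?: string[];
}

// Kahn's algorithm over one phase: repeatedly emit a plugin whose
// predecessors have all been emitted already.
function sortPhase(plugins: PluginSpec[]): string[] {
  const names = new Set(plugins.map((p) => p.name));
  const successors = new Map<string, string[]>();
  const inDegree = new Map<string, number>();
  for (const p of plugins) {
    successors.set(p.name, []);
    inDegree.set(p.name, 0);
  }
  const addEdge = (from: string, to: string): void => {
    if (!names.has(from) || !names.has(to)) return; // ignore plugins outside this phase
    successors.get(from)!.push(to);
    inDegree.set(to, inDegree.get(to)! + 1);
  };
  for (const p of plugins) {
    (p.runAfter ?? []).forEach((dep) => addEdge(dep, p.name));
    (p.runBefore ?? []).forEach((next) => addEdge(p.name, next));
  }
  const queue = plugins.map((p) => p.name).filter((n) => inDegree.get(n) === 0);
  const ordered: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift()!;
    ordered.push(current);
    for (const next of successors.get(current)!) {
      inDegree.set(next, inDegree.get(next)! - 1);
      if (inDegree.get(next) === 0) queue.push(next);
    }
  }
  if (ordered.length !== plugins.length) {
    throw new Error('Cycle detected in runBefore/runAfter constraints');
  }
  return ordered;
}

// Phases run in ascending order; constraints only reorder plugins inside a phase.
function orderPlugins(plugins: PluginSpec[]): string[] {
  const phases = Array.from(new Set(plugins.map((p) => p.phase))).sort((a, b) => a - b);
  const ordered: string[] = [];
  for (const phase of phases) {
    ordered.push(...sortPhase(plugins.filter((p) => p.phase === phase)));
  }
  return ordered;
}

const pluginOrder = orderPlugins([
  { name: 'processTemplateLoops', phase: 3 },
  { name: 'remarkTemplateFields', phase: 2, runAfter: ['remarkMixins'] },
  { name: 'remarkMixins', phase: 2 },
  { name: 'remarkImports', phase: 1 },
]);
```

Because phases are concatenated in ascending order, a phase 2 plugin can never end up after a phase 3 plugin, regardless of the order in which plugins are registered.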

Example Usage:

import { buildRemarkPipeline } from './core/pipeline';
import { GLOBAL_PLUGIN_REGISTRY } from './plugins/remark';

const pipeline = buildRemarkPipeline(
  {
    enabledPlugins: [
      'remarkImports',
      'remarkTemplateFields',
      'processTemplateLoops',
    ],
    metadata: { author: 'Jane Doe' },
    options: { debug: true },
    validationMode: 'strict',
  },
  GLOBAL_PLUGIN_REGISTRY
);

// pipeline.names: ['remarkImports', 'remarkTemplateFields', 'processTemplateLoops']
// pipeline.byPhase: Map { 1 => ['remarkImports'], 2 => ['remarkTemplateFields'], 3 => ['processTemplateLoops'] }
// pipeline.capabilities: Set { 'content:imported', 'fields:expanded', 'variables:resolved', 'conditionals:evaluated' }

Critical Guarantee for Issue #120:

The phase-based ordering guarantees that remarkTemplateFields (plugin phase 2, VARIABLE_EXPANSION) always runs before processTemplateLoops (plugin phase 3, CONDITIONAL_EVAL), so variables are expanded before conditionals evaluate. This prevents the bug where conditionals would evaluate against unexpanded {{variable}} syntax.

Template Loop Conditional Evaluation

The processTemplateLoops transformation (applied during pipeline Phase 2, String Transformations) supports two syntaxes:

Handlebars syntax (recommended, standard): Uses native Handlebars engine
Legacy syntax (deprecated): Custom expression evaluation (to be removed in v4.0.0)

Since v3.5.0, Legal Markdown automatically detects which syntax is used and routes to the appropriate processor. The system logs detailed migration hints when legacy syntax is detected.
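One way such detection could work is sketched below. This is a guess at a plausible heuristic, not the project's actual detection logic: it treats any infix operator inside a `{{#if ...}}` expression as a sign of legacy syntax. The names `detectTemplateSyntax` and `LEGACY_OPERATORS` are hypothetical.

```typescript
type TemplateSyntax = 'handlebars' | 'legacy';

// Infix operators that stock Handlebars does not accept inside {{#if ...}}.
const LEGACY_OPERATORS = /==|!=|>=|<=|&&|\|\||>|</;

function detectTemplateSyntax(content: string): TemplateSyntax {
  const ifBlocks = /\{\{#if\s+([^}]+)\}\}/g;
  let match: RegExpExecArray | null;
  while ((match = ifBlocks.exec(content)) !== null) {
    const expression = match[1] ?? '';
    // Strip quoted strings so an operator inside a literal does not count.
    const withoutStrings = expression.replace(/"[^"]*"|'[^']*'/g, '');
    if (LEGACY_OPERATORS.test(withoutStrings)) return 'legacy';
  }
  return 'handlebars';
}

const legacyDetected = detectTemplateSyntax('{{#if city.legal == "madrid"}} Clause {{/if}}');
const handlebarsDetected = detectTemplateSyntax('{{#if (gt amount 1000)}} High value {{/if}}');
```

A router built on this sketch would send `legacy` templates to the custom expression evaluator and log a migration hint, and hand everything else to the Handlebars engine.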

Handlebars Engine (New Standard)

Templates using Handlebars syntax benefit from:

Industry-standard template features
Subexpressions: {{formatDate (addYears date 2) "legal"}}
Parent context access: {{../parentVariable}}
Native loop helpers: {{@index}}, {{@first}}, {{@last}}
30+ registered helpers (date, number, string, math)

See docs/helpers/README.md for complete reference.

Legacy Expression Evaluation (Deprecated)

The legacy processor includes full expression evaluation for conditional blocks, providing features beyond standard Handlebars (these will be removed in v4.0.0).

Supported Operators

Comparison Operators:

== - Equal (loose equality, works with strings and numbers)
!= - Not equal
> - Greater than
< - Less than
>= - Greater than or equal
<= - Less than or equal

Boolean Operators:

&& - Logical AND (higher precedence)
|| - Logical OR (lower precedence)

Examples

Simple comparison:

---
city:
  legal: 'madrid'
---

{{#if city.legal == "madrid"}} Clausula de Madrid {{/if}}

Numeric comparison:

---
contract:
  amount: 50000
---

{{#if contract.amount > 10000}} High value contract {{/if}}

Complex boolean expression:

---
contract:
  amount: 50000
  jurisdiction: 'spain'
---

{{#if contract.amount > 10000 && contract.jurisdiction == "spain"}} Spanish
high-value contract {{/if}}

Implementation Details

The evaluation happens in evaluateCondition() which:

Detects boolean operators (&&, ||) → delegates to evaluateBooleanExpression()
Detects comparison operators → delegates to evaluateComparisonExpression()
Falls back to truthiness check for simple variables

Value parsing in parseComparisonValue() handles:

String literals (quoted with " or ')
Numeric literals (integers and decimals)
Boolean literals (true, false)
Null literal (null)
Variable references (resolved via resolveVariablePath())

This makes the legacy syntax more expressive than stock Handlebars conditionals, which require helper functions for comparisons.
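The evaluation flow listed under Implementation Details can be sketched as below. The function names follow the ones mentioned above, but their signatures and bodies are assumptions written for this example. In particular, the sketch splits on `&&` and `||` naively, so operators inside string literals are not supported.

```typescript
type Scope = Record<string, unknown>;

// Walks a dotted path such as "contract.amount" through nested objects.
function resolveVariablePath(path: string, scope: Scope): unknown {
  return path.split('.').reduce<unknown>(
    (value, key) =>
      value !== null && typeof value === 'object' ? (value as Scope)[key] : undefined,
    scope
  );
}

function parseComparisonValue(token: string, scope: Scope): unknown {
  const t = token.trim();
  if (/^(".*"|'.*')$/.test(t)) return t.slice(1, -1); // string literal
  if (/^-?\d+(\.\d+)?$/.test(t)) return Number(t); // numeric literal
  if (t === 'true') return true;
  if (t === 'false') return false;
  if (t === 'null') return null;
  return resolveVariablePath(t, scope); // variable reference
}

function evaluateComparisonExpression(expr: string, scope: Scope): boolean {
  // Two-character operators are listed first so ">=" is not read as ">".
  const match = /^(.+?)\s*(==|!=|>=|<=|>|<)\s*(.+)$/.exec(expr.trim());
  // No operator: fall back to a truthiness check on the single value.
  if (!match) return Boolean(parseComparisonValue(expr, scope));
  const left = parseComparisonValue(match[1] ?? '', scope);
  const right = parseComparisonValue(match[3] ?? '', scope);
  switch (match[2]) {
    case '==': return left == right; // loose equality, as documented above
    case '!=': return left != right;
    case '>': return (left as number) > (right as number);
    case '<': return (left as number) < (right as number);
    case '>=': return (left as number) >= (right as number);
    case '<=': return (left as number) <= (right as number);
    default: return false;
  }
}

// Splitting on || first and && second gives && the higher precedence.
function evaluateBooleanExpression(expr: string, scope: Scope): boolean {
  return expr.split('||').some((orPart) =>
    orPart.split('&&').every((andPart) => evaluateComparisonExpression(andPart, scope))
  );
}

function evaluateCondition(expr: string, scope: Scope): boolean {
  if (expr.includes('&&') || expr.includes('||')) return evaluateBooleanExpression(expr, scope);
  return evaluateComparisonExpression(expr, scope);
}

const contractScope: Scope = {
  city: { legal: 'madrid' },
  contract: { amount: 50000, jurisdiction: 'spain' },
};
const isSpanishHighValue = evaluateCondition(
  'contract.amount > 10000 && contract.jurisdiction == "spain"',
  contractScope
);
```

With the scope shown, `isSpanishHighValue` is `true`, matching the "Complex boolean expression" example above.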

Comparison with Other Template Engines

Feature	Handlebars	Liquid	Legal Markdown (Legacy)
Comparison ops	❌ (requires helpers)	✅	✅
Boolean operators	❌ (requires helpers)	✅	✅
Syntax	`{{#if (eq a b)}}`	`{% if a == b %}`	`{{#if a == b}}`
Numeric comparisons	`{{#if (gt amount 1000)}}`	`{% if amount > 1000 %}`	`{{#if amount > 1000}}`

Note: This comparison applies to the legacy syntax only. Standard Handlebars syntax is now recommended and uses Handlebars' own conditional system.

Phase 4 - Format Generation

Phase 4 is implemented by src/core/pipeline/format-generator.ts and focuses on artefact creation:

generateAllFormats writes HTML, PDF, Markdown and metadata exports without re-running the remark phase. It orchestrates Html/Pdf generators and simple file writes from the cached AST and processed markdown
processAndGenerateFormats is a convenience helper that executes Phases 2, 3, and 4 together when Phase 1 already provided a ProcessingContext
Highlight variants share the same cached AST and differ only by the includeHighlighting flag passed into the Html/Pdf generators
Output directories are created on demand and comprehensive error messages are thrown when creation fails
Every invocation returns both the generated file list and timing statistics so callers can present user feedback or feed telemetry
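The idea of producing several artefacts from one cached result, and returning a file list plus timings, can be sketched as follows. The types, renderer table and `generateFormatsSketch` function are hypothetical stand-ins; the real generateAllFormats works with the Html/Pdf generators and the file system, whereas this sketch takes an injected `write` callback and uses trivial string renderers.

```typescript
// Hypothetical, trimmed-down stand-in for the cached Phase 3 result.
interface CachedResult {
  content: string;
  metadata: Record<string, unknown>;
}

type OutputFormat = 'html' | 'markdown' | 'metadata';

interface GenerationSummary {
  files: string[];
  timingsMs: Record<string, number>;
}

// Each renderer reads the cached result; none of them re-runs the processor.
const renderers: Record<OutputFormat, (r: CachedResult, highlight: boolean) => string> = {
  html: (r, highlight) =>
    `<article${highlight ? ' class="highlight"' : ''}>${r.content}</article>`,
  markdown: (r) => r.content,
  metadata: (r) => JSON.stringify(r.metadata, null, 2),
};

const fileExtensions: Record<OutputFormat, string> = {
  html: 'html',
  markdown: 'md',
  metadata: 'json',
};

function generateFormatsSketch(
  result: CachedResult,
  formats: OutputFormat[],
  baseName: string,
  write: (path: string, data: string) => void,
  highlight = false
): GenerationSummary {
  const summary: GenerationSummary = { files: [], timingsMs: {} };
  for (const format of formats) {
    const started = Date.now();
    const path = `${baseName}.${fileExtensions[format]}`;
    write(path, renderers[format](result, highlight));
    summary.files.push(path);
    summary.timingsMs[format] = Date.now() - started;
  }
  return summary;
}

// Usage: collect output in memory instead of touching the file system.
const writtenFiles = new Map<string, string>();
const generationSummary = generateFormatsSketch(
  { content: '# Lease', metadata: { title: 'Lease' } },
  ['html', 'markdown', 'metadata'],
  'out/lease',
  (path, data) => {
    writtenFiles.set(path, data);
  }
);
```

A highlighted variant would call the same function with `highlight` set to `true` on the same cached result, which is the point of sharing one processing pass across outputs.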

Pipeline Entry Points

Two services drive the pipeline at runtime:

src/cli/service.ts - generateFormattedOutputWithOptions performs the four phases sequentially, ensuring HTML/PDF/Markdown/metadata artefacts come from a single remark pass and archiving reuses the processed content
src/cli/interactive/service.ts - the interactive CLI maps user selections to CLI options, builds the processing context, applies string transformations, runs the remark phase once, then calls generateAllFormats with the resulting AST

Both services keep their existing single-format behaviour (plain remark output) for scenarios where only markdown is requested.

Performance & Validation

Integration benchmark tests (tests/integration/pipeline-3-phase.integration.test.ts) compare the legacy behaviour with the four-phase pipeline and confirm a reduction of roughly 50-75% in processing time for multi-format runs
Unit suites in tests/unit/core/pipeline/context-builder.test.ts and tests/unit/core/pipeline/format-generator.test.ts cover metadata merging, error handling and format generation pathways
Additional CLI and remark plugin tests were updated to assert that the AST is preserved across output modes and plugin order validation is enforced when requested (tests/unit/plugins/remark/imports.unit.test.ts, tests/unit/plugins/remark/css-classes.unit.test.ts)

Extension & Migration Notes

The pipeline exports are additive; existing consumers that call processLegalMarkdownWithRemark directly remain supported
New helpers (buildProcessingContext, applyStringTransformations, generateAllFormats, processAndGenerateFormats) provide clear integration points for upcoming API layers or background workers
Legacy pipeline code paths tracked in plans/2026-03-03-phase2-phase3-span-refactor-plan.md can be migrated incrementally by swapping in the four-phase helpers without touching business logic
Issue #149: The remarkClauses plugin has been removed. Optional clauses are now processed in Phase 2 (String Transformations), enabling proper handling of multi-line content with markdown formatting
