- Remark Integration Architecture
- Overview
- Relationship to the Three-Phase Pipeline
- Plugin Suite
- Plugin Metadata Registry & Order Validation
- Legal Markdown Processor Result
- Diagnostics & Tooling
- Testing Coverage
The remark integration is the core AST-processing engine for Legal Markdown JS.
It replaces legacy regex transforms with unified/remark plugins that understand
markdown structure and enables safe manipulation of document content. This
module powers Phase 2 of the processing pipeline and exposes a single entry
point: processLegalMarkdownWithRemark.
Phase 2 of the pipeline (docs/architecture/03_processing_pipeline.md)
delegates all markdown transformation work to the remark processor. The
surrounding phases set the stage:
- Phase 1 (Context Builder) provides merged CLI options, force-commands and metadata through a ProcessingContext
- Phase 2 (Remark Processor) consumes that context and runs the plugin suite once, caching the resulting AST
- Phase 3 (Format Generator) reuses the cached AST and processed markdown to produce HTML, PDF, Markdown and metadata exports without re-processing
This separation keeps the remark integration focused on AST manipulation while allowing external services (CLI, API, workers) to orchestrate input/output concerns without duplicating work.
Historical Issue: An earlier implementation of generatePdfFormats() called
PdfGenerator.generatePdf(markdown), which internally regenerated HTML from
markdown. This violated the 3-phase pipeline's "process once, output many"
principle, causing:
- Double HTML conversion (once for HTML output, once inside PDF generation)
- Inconsistent output between saved HTML files and embedded PDF HTML
- Path resolution issues for CSS files (especially highlight.css)
- Performance degradation
Solution: Phase 3 now uses a two-step approach:
- Generate HTML once using HtmlGenerator.generateHtml(markdown, options)
- Convert that pre-generated HTML to PDF using PdfGenerator.generatePdfFromHtml(html, outputPath, options)
This ensures the PDF uses exactly the same HTML that would be saved to disk, with all CSS already resolved and applied.
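The two-step approach can be sketched as follows. This is a minimal illustration, not the project's implementation: the `HtmlRenderer` and `PdfRenderer` interfaces, the `RenderOptions` shape and the `generateHtmlAndPdf` helper are hypothetical stand-ins, and they are synchronous for brevity whereas the real generators are likely asynchronous.

```typescript
// Hypothetical stand-in types: the real HtmlGenerator / PdfGenerator
// signatures may differ (for example, they are probably asynchronous).
interface RenderOptions {
  includeHighlighting?: boolean;
}

interface HtmlRenderer {
  generateHtml(markdown: string, options: RenderOptions): string;
}

interface PdfRenderer {
  generatePdfFromHtml(html: string, outputPath: string, options: RenderOptions): string;
}

// Counts calls so the "generate once" property can be observed.
let htmlCalls = 0;

const htmlRenderer: HtmlRenderer = {
  generateHtml(markdown, options) {
    htmlCalls += 1;
    const cssClass = options.includeHighlighting ? 'highlight' : 'plain';
    return `<article class="${cssClass}">${markdown}</article>`;
  },
};

const pdfRenderer: PdfRenderer = {
  // A real implementation would render the HTML into a PDF file; this stub
  // only records which HTML it was handed.
  generatePdfFromHtml(html, outputPath, _options) {
    return `${outputPath} <= ${html}`;
  },
};

// Phase 3 sketch: a single HTML render feeds both the HTML file and the PDF.
function generateHtmlAndPdf(markdown: string, basePath: string, options: RenderOptions) {
  const html = htmlRenderer.generateHtml(markdown, options);
  const pdf = pdfRenderer.generatePdfFromHtml(html, `${basePath}.pdf`, options);
  return { html, pdf };
}

const outputs = generateHtmlAndPdf('# Contract', 'out/contract', {
  includeHighlighting: true,
});
```

Because the PDF step receives the HTML string rather than the markdown, it has no way to re-run the conversion, which is what keeps the saved HTML and the PDF content identical.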
Verification: See tests/integration/no-reprocessing.integration.test.ts
which uses spies to verify:
- generateHtml() is called exactly once per variant (normal/highlight)
- generatePdfFromHtml() is used instead of generatePdf()
- The remark processor runs exactly once regardless of output formats
Remark plugins live under src/plugins/remark. They follow unified conventions,
receive the shared options object, and can rely on the metadata collected during
Phase 1.
The plugin suite is organized into 5 explicit processing phases to ensure deterministic execution order and prevent subtle bugs like variables evaluating after conditionals (Issue #120):
| Phase | Group | Plugins | Description |
|---|---|---|---|
| 1 | Content Loading | remarkImports | Load and merge external content via @import directives. Must run first to establish the complete document AST |
| 2 | Variable Expansion | remarkMixins, remarkTemplateFields | Expand mixin definitions and resolve {{field}} template variables. Critical: runs BEFORE conditionals |
| 3 | Conditional Evaluation | processTemplateLoops, remarkClauses | Evaluate conditional logic and loops. Variables are guaranteed to be resolved at this point |
| 4 | Structure Parsing | remarkLegalHeadersParser, remarkHeaders, remarkCrossReferences, remarkCrossReferencesAst | Parse document structure, legal headers and cross-references. Requires fully-expanded content |
| 5 | Post-Processing | remarkDates, remarkSignatureLines, remarkFieldTracking, remarkDebugAst | Final transformations, date resolution, field tracking and debugging utilities |
This phase-based architecture ensures that:
- Variables are always expanded before conditional evaluation (fixes Issue #120)
- Imports load content before any transformation plugins run
- Document structure is parsed after all content expansions complete
- Field tracking and reporting happen last to capture the final state
- AST-based imports ensure embedded documents retain structure and HTML
- Template field resolution integrates helper functions and guards against double-processing by modifying nodes directly
- Field tracking differentiates filled, empty and logic fields and is the basis for highlight output and reporting (docs/architecture/05_field_tracking.md)
- Header parsing externalises legal-numbering rules so downstream tools can reference consistent identifiers and CSS classes (see updated table in docs/output/css-classes.md)
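As a rough illustration of phase-based ordering, the sketch below sorts a shuffled plugin list by phase. The enum member names follow the registry, but the numeric values, the `PhasedPlugin` type and the `orderByPhase` helper are assumptions made for this example only.

```typescript
// Enum member names follow the registry; the numeric values are assumed.
enum ProcessingPhase {
  CONTENT_LOADING = 1,
  VARIABLE_EXPANSION = 2,
  CONDITIONAL_EVAL = 3,
  STRUCTURE_PARSING = 4,
  POST_PROCESSING = 5,
}

// Hypothetical, trimmed-down view of a plugin's metadata.
interface PhasedPlugin {
  name: string;
  phase: ProcessingPhase;
}

// Stable sort by phase: plugins in the same phase keep their relative order.
function orderByPhase(plugins: PhasedPlugin[]): string[] {
  return plugins
    .map((plugin, index) => ({ plugin, index }))
    .sort((a, b) => a.plugin.phase - b.plugin.phase || a.index - b.index)
    .map(entry => entry.plugin.name);
}

const shuffled: PhasedPlugin[] = [
  { name: 'remarkFieldTracking', phase: ProcessingPhase.POST_PROCESSING },
  { name: 'processTemplateLoops', phase: ProcessingPhase.CONDITIONAL_EVAL },
  { name: 'remarkHeaders', phase: ProcessingPhase.STRUCTURE_PARSING },
  { name: 'remarkTemplateFields', phase: ProcessingPhase.VARIABLE_EXPANSION },
  { name: 'remarkImports', phase: ProcessingPhase.CONTENT_LOADING },
];

const phaseOrder = orderByPhase(shuffled);
// phaseOrder: remarkImports, remarkTemplateFields, processTemplateLoops,
//             remarkHeaders, remarkFieldTracking
```

However the plugins are listed, remarkTemplateFields ends up ahead of processTemplateLoops, which is the ordering Issue #120 depends on.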
The Legal Markdown processing pipeline includes an automatic plugin order validation system that ensures plugins are executed in the correct sequence. This is critical because some plugins depend on the output of others, and incorrect ordering can lead to processing errors or incorrect document output.
The plugin order validation system uses metadata-driven dependency declarations with topological sorting to determine the correct plugin execution order.
Each plugin registers its metadata including phase assignment and
capabilities in src/plugins/remark/plugin-metadata-registry.ts:
export const PLUGIN_METADATA_LIST: PluginMetadata[] = [
// PHASE 1: CONTENT_LOADING
{
name: 'remarkImports',
phase: ProcessingPhase.CONTENT_LOADING,
description: 'Process @import directives and insert content as AST nodes',
capabilities: ['content:imported', 'metadata:merged'],
runBefore: ['remarkLegalHeadersParser', 'remarkMixins'],
required: false,
version: '2.0.0',
},
// PHASE 2: VARIABLE_EXPANSION
{
name: 'remarkMixins',
phase: ProcessingPhase.VARIABLE_EXPANSION,
description: 'Expand mixin definitions from frontmatter',
requiresPhases: [ProcessingPhase.CONTENT_LOADING],
requiresCapabilities: ['metadata:merged'],
capabilities: ['mixins:expanded'],
runBefore: ['remarkTemplateFields'],
required: false,
},
{
name: 'remarkTemplateFields',
phase: ProcessingPhase.VARIABLE_EXPANSION,
description: 'Expand template fields ({{field}}) with metadata values',
capabilities: ['fields:expanded', 'variables:resolved'],
runAfter: ['remarkMixins'],
required: true,
},
// PHASE 3: CONDITIONAL_EVAL
{
name: 'processTemplateLoops',
phase: ProcessingPhase.CONDITIONAL_EVAL,
description: 'Process conditional loops and template logic',
requiresPhases: [ProcessingPhase.VARIABLE_EXPANSION],
requiresCapabilities: ['variables:resolved'],
capabilities: ['conditionals:evaluated'],
required: true,
},
{
name: 'remarkClauses',
phase: ProcessingPhase.CONDITIONAL_EVAL,
description: 'Process conditional clauses {{#if condition}}...{{/if}}',
required: false,
},
// PHASE 4: STRUCTURE_PARSING
{
name: 'remarkLegalHeadersParser',
phase: ProcessingPhase.STRUCTURE_PARSING,
description: 'Parse legal header syntax (l., ll., lll.) into AST metadata',
requiresPhases: [ProcessingPhase.CONTENT_LOADING],
capabilities: ['headers:parsed'],
runBefore: ['remarkHeaders', 'remarkCrossReferences'],
required: true,
},
{
name: 'remarkCrossReferences',
phase: ProcessingPhase.STRUCTURE_PARSING,
description: 'Process cross-references between document sections',
requiresCapabilities: ['headers:parsed'],
capabilities: ['crossrefs:resolved'],
runAfter: ['remarkLegalHeadersParser'],
runBefore: ['remarkHeaders'],
required: false,
},
{
name: 'remarkHeaders',
phase: ProcessingPhase.STRUCTURE_PARSING,
description: 'Process and number legal headers',
requiresCapabilities: ['headers:parsed'],
capabilities: ['headers:numbered'],
runAfter: ['remarkLegalHeadersParser', 'remarkCrossReferences'],
required: true,
},
// PHASE 5: POST_PROCESSING
{
name: 'remarkDates',
phase: ProcessingPhase.POST_PROCESSING,
description: 'Process date references (@today syntax)',
required: false,
},
{
name: 'remarkFieldTracking',
phase: ProcessingPhase.POST_PROCESSING,
description: 'Track field usage for highlighting and reporting',
requiresCapabilities: ['fields:expanded'],
required: false,
},
];
Key Metadata Fields:
- phase: Explicit phase assignment (1-5) ensures deterministic ordering
- capabilities: Semantic tags describing what the plugin produces (e.g., 'variables:resolved', 'headers:parsed')
- requiresPhases: Which phases must complete before this plugin runs
- requiresCapabilities: Which capabilities must be available before this plugin runs
- runBefore/runAfter: Fine-grained ordering within a phase (legacy constraints, still supported)
- required: Whether the plugin is mandatory for correct processing
Plugins declare dependencies using a two-tier system:
1. Phase-Level Dependencies (Coarse-Grained)
- phase: Mandatory field assigning the plugin to one of 5 phases
- requiresPhases: Array of phases that must complete before this plugin runs
- requiresCapabilities: Array of semantic capabilities that must be available (e.g., 'variables:resolved')
- capabilities: Array of semantic tags this plugin provides (e.g., 'headers:parsed')
2. Plugin-Level Dependencies (Fine-Grained, within phases)
- runBefore: Array of plugin names that must run AFTER this plugin
- runAfter: Array of plugin names that must run BEFORE this plugin
Example: Phase-based ordering prevents Issue #120
// remarkTemplateFields runs in Phase 2 (VARIABLE_EXPANSION)
{
phase: ProcessingPhase.VARIABLE_EXPANSION,
capabilities: ['variables:resolved']
}
// processTemplateLoops runs in Phase 3 (CONDITIONAL_EVAL)
// and explicitly requires variables to be resolved first
{
phase: ProcessingPhase.CONDITIONAL_EVAL,
requiresPhases: [ProcessingPhase.VARIABLE_EXPANSION],
requiresCapabilities: ['variables:resolved']
}
This guarantees that variables always expand before conditionals evaluate,
preventing the subtle bug where conditionals would evaluate against unexpanded
{{variable}} syntax instead of actual values.
These constraints create a dependency graph that the validator resolves using topological sorting within each phase.
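The sketch below shows one way such a within-phase sort could work, using Kahn's algorithm over runBefore/runAfter edges. It is an assumption-laden illustration rather than the validator's actual code: the `OrderingConstraints` type and `sortWithinPhase` function are hypothetical, and constraints naming plugins outside the phase are simply ignored here.

```typescript
// Hypothetical, trimmed-down view of a plugin's ordering metadata.
interface OrderingConstraints {
  name: string;
  runBefore?: string[]; // plugins that must run after this one
  runAfter?: string[]; // plugins that must run before this one
}

// Kahn's algorithm over the plugins of a single phase.
// Throws if the constraints contain a cycle.
function sortWithinPhase(plugins: OrderingConstraints[]): string[] {
  const names = plugins.map(p => p.name);
  const known = new Set(names);
  const successors = new Map<string, Set<string>>();
  const inDegree = new Map<string, number>();
  names.forEach(name => {
    successors.set(name, new Set<string>());
    inDegree.set(name, 0);
  });

  const addEdge = (from: string, to: string): void => {
    // Ignore constraints that mention plugins outside this phase.
    if (!known.has(from) || !known.has(to)) return;
    const targets = successors.get(from)!;
    if (targets.has(to)) return; // the same edge may be declared on both sides
    targets.add(to);
    inDegree.set(to, inDegree.get(to)! + 1);
  };

  for (const plugin of plugins) {
    for (const later of plugin.runBefore ?? []) addEdge(plugin.name, later);
    for (const earlier of plugin.runAfter ?? []) addEdge(earlier, plugin.name);
  }

  // Start with every plugin that has no unmet predecessor.
  const ready = names.filter(name => inDegree.get(name) === 0);
  const ordered: string[] = [];
  while (ready.length > 0) {
    const current = ready.shift()!;
    ordered.push(current);
    successors.get(current)!.forEach(next => {
      const remaining = inDegree.get(next)! - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    });
  }

  // Any plugin left unplaced is part of a cycle.
  if (ordered.length !== known.size) {
    throw new Error('Circular dependency detected');
  }
  return ordered;
}

// Phase 4 constraints copied from the registry excerpt, listed out of order.
const structureOrder = sortWithinPhase([
  { name: 'remarkHeaders', runAfter: ['remarkLegalHeadersParser', 'remarkCrossReferences'] },
  { name: 'remarkCrossReferences', runAfter: ['remarkLegalHeadersParser'], runBefore: ['remarkHeaders'] },
  { name: 'remarkLegalHeadersParser', runBefore: ['remarkHeaders', 'remarkCrossReferences'] },
]);
// structureOrder: remarkLegalHeadersParser, remarkCrossReferences, remarkHeaders
```

A side effect of Kahn's algorithm is that cycles surface naturally: if two plugins each declare runAfter on the other, neither ever reaches in-degree zero and the function throws.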
legal-markdown-processor.ts automatically validates plugin order when the
validatePluginOrder option is enabled (the default in development):
// Validate plugin order if requested
if (options.validatePluginOrder) {
const validationResult = validatePluginOrder(orderedPluginNames);
if (!validationResult.isValid) {
console.warn('[Plugin Order Validation] Issues detected:');
validationResult.violations.forEach(violation => {
console.warn(` - ${violation}`);
});
if (validationResult.hasCriticalViolations) {
throw new Error(
'Critical plugin order violations detected. ' +
'The processing pipeline may produce incorrect results.'
);
}
}
}
The PluginOrderValidator class (src/plugins/remark/plugin-order-validator.ts) provides comprehensive validation logic including:
- Dependency violations: runBefore/runAfter constraint violations
- Conflicts: Plugins that cannot be used together
- Circular dependencies: Detects impossible ordering scenarios
- Capability validation: Ensures required capabilities are provided before use
- Phase validation: Verifies plugins don't require later phases
import { PluginOrderValidator } from './plugins/remark/plugin-order-validator';
import { GLOBAL_PLUGIN_REGISTRY } from './plugins/remark/plugin-metadata-registry';
const validator = new PluginOrderValidator(GLOBAL_PLUGIN_REGISTRY);
const result = validator.validate(
['remarkImports', 'remarkTemplateFields', 'remarkHeaders'],
{
throwOnError: false,
logWarnings: true,
debug: false,
}
);
if (!result.valid) {
console.error('Validation errors:', result.errors);
console.warn('Validation warnings:', result.warnings);
console.log('Suggested order:', result.suggestedOrder);
}
The validator returns a detailed result object:
interface PluginOrderValidationResult {
valid: boolean; // Overall validation status
errors: PluginOrderError[]; // Critical errors
warnings: PluginOrderWarning[]; // Non-critical warnings
suggestedOrder?: string[]; // Suggested correct order (if validation failed)
}
interface PluginOrderError {
type:
| 'dependency-violation'
| 'conflict'
| 'circular-dependency'
| 'capability-missing' // NEW in Phase 3
| 'phase-dependency'; // NEW in Phase 3
plugin: string;
relatedPlugin?: string;
message: string;
}
NEW in Phase 3: The validator now automatically checks:
- Capability Dependencies: Ensures that when a plugin requires a capability (via requiresCapabilities), there's at least one earlier plugin in the execution order that provides it (via capabilities)
- Phase Dependencies: Validates that plugins with requiresPhases only reference earlier phases, preventing impossible dependencies
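Both checks can be sketched in a few lines. The code below is a simplified illustration, not the PluginOrderValidator implementation: the `CapabilityMetadata` and `OrderIssue` types and the `checkCapabilitiesAndPhases` function are hypothetical, and phases are represented as plain numbers 1-5.

```typescript
// Hypothetical, trimmed-down metadata; phases are plain numbers 1-5 here.
interface CapabilityMetadata {
  name: string;
  phase: number;
  capabilities?: string[];
  requiresCapabilities?: string[];
  requiresPhases?: number[];
}

interface OrderIssue {
  type: 'capability-missing' | 'phase-dependency';
  plugin: string;
  message: string;
}

// Walks the execution order once, tracking which capabilities exist so far.
function checkCapabilitiesAndPhases(executionOrder: CapabilityMetadata[]): OrderIssue[] {
  const issues: OrderIssue[] = [];
  const available = new Set<string>();

  for (const plugin of executionOrder) {
    for (const needed of plugin.requiresCapabilities ?? []) {
      if (!available.has(needed)) {
        issues.push({
          type: 'capability-missing',
          plugin: plugin.name,
          message: `Plugin "${plugin.name}" requires capability "${needed}" but no earlier plugin provides it`,
        });
      }
    }
    for (const requiredPhase of plugin.requiresPhases ?? []) {
      // A plugin may only depend on phases that finish before its own.
      if (requiredPhase >= plugin.phase) {
        issues.push({
          type: 'phase-dependency',
          plugin: plugin.name,
          message: `Plugin "${plugin.name}" in phase ${plugin.phase} cannot require phase ${requiredPhase}`,
        });
      }
    }
    // Capabilities become available only to plugins later in the order.
    for (const provided of plugin.capabilities ?? []) available.add(provided);
  }
  return issues;
}

const templateFields: CapabilityMetadata = {
  name: 'remarkTemplateFields',
  phase: 2,
  capabilities: ['fields:expanded', 'variables:resolved'],
};
const templateLoops: CapabilityMetadata = {
  name: 'processTemplateLoops',
  phase: 3,
  requiresPhases: [2],
  requiresCapabilities: ['variables:resolved'],
};

const missingProvider = checkCapabilitiesAndPhases([templateLoops]);
const completeOrder = checkCapabilitiesAndPhases([templateFields, templateLoops]);
```

Here `missingProvider` contains a single capability-missing issue, while `completeOrder` is empty because remarkTemplateFields provides 'variables:resolved' earlier in the order.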
// Example: Capability validation catches this error
const result = validator.validate(['processTemplateLoops']); // Missing remarkTemplateFields
// Error: Plugin "processTemplateLoops" requires capability "variables:resolved"
// but no earlier plugin provides it
Content imports must always run in Phase 1 to establish the complete document:
// ✅ CORRECT - remarkImports in Phase 1
{
phase: ProcessingPhase.CONTENT_LOADING, // Phase 1
}
// ❌ INCORRECT - Would violate phase ordering
{
phase: ProcessingPhase.VARIABLE_EXPANSION, // Phase 2 - TOO LATE
}
Critical for Issue #120: Variables MUST expand before conditionals evaluate:
// ✅ CORRECT - Variables in Phase 2, Conditionals in Phase 3
{
name: 'remarkTemplateFields',
phase: ProcessingPhase.VARIABLE_EXPANSION, // Phase 2
capabilities: ['variables:resolved']
}
{
name: 'processTemplateLoops',
phase: ProcessingPhase.CONDITIONAL_EVAL, // Phase 3
requiresCapabilities: ['variables:resolved']
}
// ❌ INCORRECT - Would cause conditionals to evaluate against {{var}} syntax
// (This was the bug in Issue #120)
{
name: 'processTemplateLoops',
phase: ProcessingPhase.VARIABLE_EXPANSION, // Phase 2 - WRONG
}
{
name: 'remarkTemplateFields',
phase: ProcessingPhase.CONDITIONAL_EVAL, // Phase 3 - WRONG
}
Header parsing must happen before header numbering, both in Phase 4:
// ✅ CORRECT - Both in Phase 4, but runBefore/runAfter ensures order
{
name: 'remarkLegalHeadersParser',
phase: ProcessingPhase.STRUCTURE_PARSING, // Phase 4
runBefore: ['remarkHeaders']
}
{
name: 'remarkHeaders',
phase: ProcessingPhase.STRUCTURE_PARSING, // Phase 4
runAfter: ['remarkLegalHeadersParser']
}
// ❌ INCORRECT - Wrong phase assignment
{
name: 'remarkHeaders',
phase: ProcessingPhase.VARIABLE_EXPANSION, // Phase 2 - TOO EARLY
}
Cross-references must extract |key| patterns before header numbering:
// ✅ CORRECT - Explicit ordering within Phase 4
{
name: 'remarkCrossReferences',
phase: ProcessingPhase.STRUCTURE_PARSING, // Phase 4
runBefore: ['remarkHeaders']
}
{
name: 'remarkHeaders',
phase: ProcessingPhase.STRUCTURE_PARSING, // Phase 4
runAfter: ['remarkCrossReferences']
}
Symptom: Console warnings about plugin order violations
Solution:
- Check the violation messages for specific ordering requirements
- Review plugin metadata in plugin-metadata-registry.ts
- Reorder plugins in the processor to match the suggested order
- Run tests to verify correct behavior
Symptom: Document output doesn't match expectations
Diagnostic Steps:
- Enable plugin order validation: { validatePluginOrder: true }
- Check console for validation warnings
- Enable debug mode: { debug: true }
- Review plugin execution order in logs
- Compare with expected order from metadata registry
Example Debug Output:
const result = await processLegalMarkdownWithRemark(content, {
validatePluginOrder: true,
debug: true,
});
// Output:
// [Plugin Order Validation] Checking order: remarkImports, remarkTemplateFields, ...
// [Plugin Order Validation] ✓ Valid order
// [remarkImports] Processing 2 imports
// [remarkTemplateFields] Expanding 5 template fields
// [remarkHeaders] Processing 3 headers
Symptom: Error: "Circular dependency detected"
Solution:
- Review plugin metadata for conflicting constraints
- Remove unnecessary runBefore/runAfter declarations
- Ensure constraints form a directed acyclic graph (DAG)
Example Fix:
// ❌ INCORRECT - Circular dependency
{
name: 'pluginA',
runAfter: ['pluginB']
},
{
name: 'pluginB',
runAfter: ['pluginA']
}
// ✅ CORRECT - Remove one constraint
{
name: 'pluginA',
runAfter: ['pluginB']
},
{
name: 'pluginB',
// No runAfter needed
}
When adding a new plugin to the processing pipeline:
- Determine the correct phase for your plugin:
  - Phase 1 (CONTENT_LOADING): Loads external content
  - Phase 2 (VARIABLE_EXPANSION): Expands variables/mixins
  - Phase 3 (CONDITIONAL_EVAL): Evaluates conditions/loops
  - Phase 4 (STRUCTURE_PARSING): Parses document structure
  - Phase 5 (POST_PROCESSING): Final transformations
- Register plugin metadata in plugin-metadata-registry.ts:
{
name: 'remarkMyNewPlugin',
phase: ProcessingPhase.STRUCTURE_PARSING, // Choose appropriate phase
description: 'Description of what it does',
// Phase-level dependencies
requiresPhases: [ProcessingPhase.VARIABLE_EXPANSION], // If needed
requiresCapabilities: ['variables:resolved'], // If needed
capabilities: ['my-feature:processed'], // What this provides
// Fine-grained ordering within phase (if needed)
runAfter: ['remarkImports'],
runBefore: ['remarkHeaders'],
required: false,
}
- Add plugin to processor in legal-markdown-processor.ts:
processor.use(remarkMyNewPlugin, {
metadata: result.metadata,
debug: options.debug,
});
- Add unit tests to verify plugin behavior:
# Create test file
tests/unit/plugins/remark/my-new-plugin.unit.test.ts
# Test phase assignment, capabilities, and interactions
- Run validation to ensure correct ordering:
npm test -- tests/integration/plugin-order-validation.integration.test.ts
npm test -- tests/unit/plugins/remark/plugin-metadata-registry.unit.test.ts
- Verify behavior with integration tests showing plugin interactions
- Choose the Right Phase: Assign plugins to the earliest phase where they can safely run
- Use Capabilities: Declare semantic capabilities to make dependencies explicit and self-documenting
- Minimal Constraints: Only declare necessary runBefore/runAfter constraints within a phase
- Document Dependencies: Explain why ordering matters in plugin description and comments
- Test Phase Assignment: Add unit tests to plugin-metadata-registry.unit.test.ts for new plugins
- Integration Tests: Add tests for plugin interaction scenarios, especially cross-phase dependencies
- Enable Validation: Always validate in development mode to catch ordering issues early
- Monitor Output: Watch for validation warnings in production logs
Phase Assignment Guidelines:
- If your plugin loads content from files → Phase 1 (CONTENT_LOADING)
- If your plugin expands variables or mixins → Phase 2 (VARIABLE_EXPANSION)
- If your plugin evaluates conditionals or loops → Phase 3 (CONDITIONAL_EVAL)
- If your plugin parses structure (headers, refs) → Phase 4 (STRUCTURE_PARSING)
- If your plugin does final formatting or tracking → Phase 5 (POST_PROCESSING)
processLegalMarkdownWithRemark returns a LegalMarkdownProcessorResult with:
- content: processed markdown (respecting noIndent, noHeaders, etc.)
- metadata: merged metadata, including values injected during Phase 1
- ast: cached mdast tree available for downstream format generation
- exportedFiles: metadata exports written during processing
- reports: field tracking and diagnostic reports when enabled
Phase 3 consumes this object directly, so any plugin that needs to expose extra information can attach it here without reprocessing documents.
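For orientation, the result shape might look roughly like the sketch below. Only the field names come from the list above. Every type, the `LegalMarkdownProcessorResultSketch` name and the `buildMetadataExport` consumer are assumptions for illustration, and the real interface may differ.

```typescript
// Field names come from the documented result; every type is an assumption.
interface LegalMarkdownProcessorResultSketch {
  content: string;
  metadata: Record<string, unknown>;
  ast?: { type: string; children?: unknown[] }; // stand-in for an mdast root node
  exportedFiles: string[];
  reports?: Record<string, unknown>;
}

// Hypothetical Phase 3 consumer: builds a metadata export from the cached
// result without touching the original markdown again.
function buildMetadataExport(result: LegalMarkdownProcessorResultSketch): string {
  return JSON.stringify({
    metadata: result.metadata,
    topLevelNodes: result.ast?.children?.length ?? 0,
  });
}

const sampleResult: LegalMarkdownProcessorResultSketch = {
  content: '# NDA',
  metadata: { title: 'NDA' },
  ast: { type: 'root', children: [{ type: 'heading' }] },
  exportedFiles: [],
};

const metadataExport = buildMetadataExport(sampleResult);
// metadataExport: {"metadata":{"title":"NDA"},"topLevelNodes":1}
```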
- Enable debug: true to see plugin-by-plugin logging, order validation output and timing information throughout the pipeline
- Use remarkDebugAst locally to inspect AST transformations
- Dedicated utilities (tests/unit/utils/plugin-order-validator.unit.test.ts) cover the validator behaviour, ensuring dependency errors are actionable
- Repository scripts continue to expose quick smoke checks for HTML comment preservation, legal header numbering and cross-reference resolution
- tests/integration/plugin-order-validation.integration.test.ts asserts that order violations are detected and reported with actionable guidance
- Suite-specific unit tests (e.g. tests/unit/plugins/remark/imports.unit.test.ts, tests/unit/plugins/remark/css-classes.unit.test.ts) verify plugin-specific behaviours such as AST insertion, CSS class application and highlight markup
- Field tracking expectations remain documented and enforced in tests/unit/plugins/remark/html-comments.unit.test.ts and related cases
- tests/integration/legacy-remark-parity.integration.test.ts verifies functional equivalence between legacy processors and remark plugins
- tests/integration/imports-with-html.integration.test.ts provides end-to-end verification of HTML preservation through the import pipeline