Smart Archiving

Legal Markdown JS includes a smart archiving system that organizes processed files according to what processing did to them. It compares the original and processed content and uses the result to decide whether to archive one file or two, so that both templates and results are preserved.

Overview
Archive Logic
Content Comparison
Archive Scenarios
Configuration and Usage
Archive Features
Use Cases
Best Practices
Integration with Other Features
Troubleshooting

Overview

The smart archiving system automatically decides whether to save one or two versions of your documents based on whether processing changed the content. This intelligent approach ensures:

Template preservation for reusable documents
Result preservation for processed outputs
Storage efficiency by avoiding unnecessary duplicates
Clear organization with descriptive file naming

Key Benefits

Automatic Decision Making: No manual choices needed
Template Safety: Original templates are never lost
Audit Trail: Complete record of processing results
Storage Optimization: Eliminates redundant file storage

Archive Logic

The smart archiving system makes decisions based on content comparison:

// Smart archiving workflow (simplified pseudocode; readFile, writeFile and archive are placeholders)
const originalContent = readFile('document.md');
const processedContent = await processLegalMarkdown(originalContent);

if (areContentsIdentical(originalContent, processedContent)) {
  // Archive only the original file
  archive('document.md', 'archive/document.md');
} else {
  // Archive both versions with clear suffixes
  archive('document.md', 'archive/document.ORIGINAL.md'); // Template
  writeFile('archive/document.PROCESSED.md', processedContent); // Result
}

Decision Tree

Document Processing
       │
       ▼
Content Comparison
       │
   ┌───┴───┐
   ▼       ▼
Identical Different
   │       │
   ▼       ▼
Single   Dual
Archive  Archive
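The decision tree can also be written as a small pure function. This is a hypothetical sketch, not the library's code: planArchive and ArchiveEntry are illustrative names, and the function only computes which files would be written rather than touching the disk.

```typescript
// Hypothetical sketch: decide which archive files to create for one document.
// Identical content after normalization -> one file; different content -> two files.
import * as path from 'node:path';

interface ArchiveEntry {
  fileName: string; // file name inside the archive directory
  content: string;  // text to write under that name
}

// Same normalization as the comparison function in this document
const normalize = (content: string): string => content.replace(/\r\n/g, '\n').trim();

function planArchive(sourcePath: string, original: string, processed: string): ArchiveEntry[] {
  const ext = path.extname(sourcePath);        // e.g. ".md"
  const stem = path.basename(sourcePath, ext); // e.g. "legal-notice"

  if (normalize(original) === normalize(processed)) {
    // Single archive: keep the original under its own name
    return [{ fileName: `${stem}${ext}`, content: original }];
  }

  // Dual archive: the template and the processed result side by side
  return [
    { fileName: `${stem}.ORIGINAL${ext}`, content: original },
    { fileName: `${stem}.PROCESSED${ext}`, content: processed },
  ];
}

console.log(planArchive('docs/legal-notice.md', '# Notice\n', '# Notice').map((e) => e.fileName).join(', '));
// legal-notice.md
console.log(planArchive('service-agreement.md', '# {{title}}\n', '# Service Agreement\n').map((e) => e.fileName).join(', '));
// service-agreement.ORIGINAL.md, service-agreement.PROCESSED.md
```

Keeping the decision free of file I/O makes it easy to test; copying, writing, and conflict renaming would happen in a separate step.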

Content Comparison

The content comparison:

Normalizes line endings (converts Windows \r\n to Unix \n)
Trims leading and trailing whitespace (ignores blank lines and spaces at the start and end of the file)
Compares the remaining text exactly (any other change, including whitespace inside the document, counts as a difference)

// Content normalization for comparison
function areContentsIdentical(content1: string, content2: string): boolean {
  const normalize = (content: string) => content.replace(/\r\n/g, '\n').trim();
  return normalize(content1) === normalize(content2);
}
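To show what this normalization does and does not ignore, the short check below runs the same function on three made-up sample strings:

```typescript
// Same normalization as above: convert CRLF to LF, then trim both ends.
function areContentsIdentical(content1: string, content2: string): boolean {
  const normalize = (content: string) => content.replace(/\r\n/g, '\n').trim();
  return normalize(content1) === normalize(content2);
}

const unixText = '# Notice\n\nStatic text.\n';
const windowsText = '# Notice\r\n\r\nStatic text.\r\n';
const extraInnerSpaces = '# Notice\n\nStatic  text.\n';

console.log(areContentsIdentical(unixText, windowsText));      // true: only line endings differ
console.log(areContentsIdentical(unixText, extraInnerSpaces)); // false: trim() leaves inner whitespace alone
```

Because trim() only touches the ends of the string, a doubled space inside a sentence is still treated as a change and triggers dual archiving.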

Comparison Features

Cross-platform compatibility: Treats Windows (\r\n) and Unix (\n) line endings as equal
Whitespace tolerance: Ignores leading and trailing whitespace
Content focus: Any other change to the text counts as a difference
Text-based: Compares the decoded file contents as strings

Archive Scenarios

Scenario 1: Static Documents

For documents where processing doesn't change the content (e.g., documents that have frontmatter but no template features in the body):

---
title: Legal Notice
client: Acme Corp
date: 2024-01-01
---
# Legal Notice

This is a static legal notice.

Archive Result:

archive/legal-notice.md  # Single file - template is preserved

The document contains no template variables, imports, or conditional content, so processing produces identical output.

Scenario 2: Template Documents

For documents with imports, mixins, or variable substitution:

---
title: Service Agreement
client: Acme Corp
effective_date: 2024-01-01
---
# {{title}}

@import clauses/standard-terms.md

This agreement between Legal Services Inc. and {{client}}
is effective {{formatDate(effective_date, "MMMM Do, YYYY")}}.

[Confidentiality clause applies]{client.confidentiality_required}

Archive Result:

archive/service-agreement.ORIGINAL.md   # Template file
archive/service-agreement.PROCESSED.md  # Processed result

The document contains template variables and imports, so processing produces different output that needs to be preserved separately.

Scenario 3: Conflict Resolution

When files with the same name already exist in the archive:

# First document
legal-md contract.md --archive-source ./archive
# Creates: archive/contract.md

# Second document with same name
legal-md contract.md --archive-source ./archive
# Creates: archive/contract_1.md (automatic renaming)

# Template with changed content, when template.ORIGINAL.md and template.PROCESSED.md already exist
legal-md template.md --archive-source ./archive
# Creates: archive/template.ORIGINAL_1.md
#          archive/template.PROCESSED_1.md
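The renaming pattern in this scenario can be sketched as a counter appended before the extension. This is an assumption inferred from the example names above (contract_1.md, template.ORIGINAL_1.md); resolveArchiveName is a hypothetical helper, and how the real tool numbers names beyond _1 is not documented here.

```typescript
// Hypothetical sketch: choose a free archive file name by appending _1, _2, ...
// The counter goes before the extension, as in contract_1.md and template.ORIGINAL_1.md.
function resolveArchiveName(fileName: string, existing: Set<string>): string {
  if (!existing.has(fileName)) return fileName; // no conflict: keep the name

  const dot = fileName.lastIndexOf('.');
  const stem = dot > 0 ? fileName.slice(0, dot) : fileName; // e.g. "template.ORIGINAL"
  const ext = dot > 0 ? fileName.slice(dot) : '';           // e.g. ".md"

  let counter = 1;
  while (existing.has(`${stem}_${counter}${ext}`)) {
    counter++; // try the next suffix until one is free
  }
  return `${stem}_${counter}${ext}`;
}

const archived = new Set(['contract.md', 'template.ORIGINAL.md', 'template.PROCESSED.md']);
console.log(resolveArchiveName('contract.md', archived));          // contract_1.md
console.log(resolveArchiveName('template.ORIGINAL.md', archived)); // template.ORIGINAL_1.md
console.log(resolveArchiveName('notice.md', archived));            // notice.md
```

Resolving the ORIGINAL and PROCESSED names independently could give a pair different counters if only one of them already exists; an implementation would likely pick one counter that is free for both names so the pair stays matched.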

Scenario 4: Complex Templates

For documents with multiple template features:

---
clients:
  - name: "Acme Corp"
    premium: true
  - name: "Beta Inc"
    premium: false
base_rate: 500
---

@import headers/legal-header.md

{{#clients}}
## Service Agreement for {{name}}

Rate: {{formatCurrency(base_rate, "USD")}}
{{#if premium}}
- Premium support included
- 24/7 availability
{{/if}}

{{/clients}}

Archive Result: Dual archiving (template + processed), because the import, loop, conditional, and helper calls all change the output.

Configuration and Usage

Basic Usage

# Enable smart archiving with default directory
legal-md document.md --archive-source

# Custom archive directory
legal-md document.md --archive-source ./completed

# With PDF generation
legal-md document.md --pdf --highlight --archive-source ./processed

Environment Configuration

# Set default archive directory
export ARCHIVE_DIR="./processed-documents"

# Use default directory
legal-md document.md --archive-source
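The examples above imply an order of precedence: an explicit path passed to --archive-source wins, then ARCHIVE_DIR when the flag is given without a value, then a built-in default. The following is a hypothetical sketch of that lookup; resolveArchiveDir is an illustrative name, and FALLBACK_ARCHIVE_DIR is a placeholder because this document does not name the built-in default directory.

```typescript
// Hypothetical sketch of archive-directory resolution; not the CLI's actual code.
// FALLBACK_ARCHIVE_DIR is a placeholder for the tool's unnamed built-in default.
const FALLBACK_ARCHIVE_DIR = './archive';

type FlagValue = string | true | undefined; // path, bare flag, or flag absent

function resolveArchiveDir(flag: FlagValue, env: Record<string, string | undefined>): string | undefined {
  if (flag === undefined) return undefined;          // --archive-source not given: no archiving
  if (typeof flag === 'string') return flag;         // explicit path wins
  return env.ARCHIVE_DIR ?? FALLBACK_ARCHIVE_DIR;    // bare flag: environment, then fallback
}

console.log(resolveArchiveDir('./completed', { ARCHIVE_DIR: './processed-documents' })); // ./completed
console.log(resolveArchiveDir(true, { ARCHIVE_DIR: './processed-documents' }));         // ./processed-documents
console.log(resolveArchiveDir(true, {}));                                               // ./archive
```

In a Node.js script the env argument would typically be process.env; passing it in explicitly keeps the lookup easy to test.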

Programmatic Usage

import { CliService } from 'legal-markdown-js';

const cliService = new CliService({
  archiveSource: './archive',
});

// Smart archiving happens automatically
await cliService.processFile('template.md', 'output.md');

Advanced Configuration

# Archive with metadata export
legal-md template.md --archive-source ./archive --export-yaml --export-json

# Archive with custom CSS and formatting
legal-md template.md --archive-source ./archive --css custom.css --pdf --highlight

# Archive to a debug directory with debug output
legal-md template.md --archive-source ./debug --debug

Archive Features

Core Features

Intelligent Decision Making: Automatically determines whether to archive one or two files
Template Preservation: Keeps original templates intact for reuse
Result Preservation: Saves processed content for reference
Clear Naming: Uses .ORIGINAL and .PROCESSED suffixes for clarity
Conflict Resolution: Automatic renaming when files already exist
Error Handling: Graceful handling of archive failures
Cross-Platform: Works consistently across different operating systems

File Naming Convention

| Scenario | Original File | Archive Result |
|---|---|---|
| No changes | `document.md` | `archive/document.md` |
| With changes | `document.md` | `archive/document.ORIGINAL.md` and `archive/document.PROCESSED.md` |
| Name conflict (no changes) | `document.md` | `archive/document_1.md` |
| Name conflict (with changes) | `document.md` | `archive/document.ORIGINAL_1.md` and `archive/document.PROCESSED_1.md` |

Metadata Preservation

When archiving, the system preserves:

Original YAML frontmatter in .ORIGINAL files
Processed metadata in accompanying files
File timestamps and basic metadata
Directory structure relationships

Use Cases

Document Template Management

# Process multiple templates
for template in templates/*.md; do
  legal-md "$template" --pdf --archive-source ./completed
done
# Templates with imports → dual archiving
# Static templates → single archiving

This approach allows you to:

Maintain template libraries
Preserve processing results
Track template evolution
Support template reuse

Workflow Integration

# Process and archive in production pipeline
legal-md contract-template.md --pdf --highlight --archive-source ./production-archive
# Preserves both template (for future use) and result (for records)

Benefits for workflows:

Audit compliance: Complete processing history
Template reuse: Original templates remain available
Result tracking: Processed outputs are preserved
Quality assurance: Compare inputs and outputs

Quality Assurance

# Archive for review and compliance
legal-md legal-document.md --pdf --export-json --archive-source ./compliance
# Smart archiving helps maintain audit trail

QA applications:

Compliance audits: Complete document history
Process validation: Verify template processing
Change tracking: Monitor document evolution
Backup strategy: Automated document preservation

Batch Processing

# Process entire directories with archiving
find ./contracts -name "*.md" -exec legal-md {} --pdf --archive-source ./processed-contracts \;

Batch benefits:

Efficient processing: Handle multiple documents
Consistent archiving: Uniform organization
Storage optimization: Intelligent space usage
Scalable workflows: Handle large document sets

Development and Testing

# Archive with debug information
legal-md test-template.md --archive-source ./debug --debug --export-yaml

# Compare processing results
diff ./debug/test-template.ORIGINAL.md ./debug/test-template.PROCESSED.md

Development advantages:

Template debugging: Compare inputs and outputs
Processing verification: Validate template logic
Performance testing: Track processing changes
Regression testing: Ensure consistent results

Best Practices

1. Archive Directory Organization

Structure your archive directories logically:

archives/
├── production/       # Live document processing
├── staging/          # Testing and validation
├── development/      # Template development
└── compliance/       # Audit and regulatory

2. Archive Naming Strategy

Use descriptive archive directory names:

# ✅ Good - descriptive purposes
--archive-source ./completed-contracts
--archive-source ./processed-invoices
--archive-source ./compliance-archive

# ❌ Avoid - generic names
--archive-source ./archive
--archive-source ./files
--archive-source ./done

3. Regular Archive Maintenance

# Periodic cleanup of old archives
find ./archives -name "*.md" -mtime +365 -delete

# Compress old archives
tar -czf archives-2023.tar.gz ./archives/2023/

4. Integration with Version Control

# Archive to version-controlled directory
legal-md template.md --archive-source ./git-tracked-archives

# Exclude from git if too large
echo "large-archives/" >> .gitignore

5. Backup Strategy

# Regular archive backups
rsync -av ./archives/ ./backup/archives/

# Cloud backup integration
aws s3 sync ./archives/ s3://legal-document-archives/

6. Monitoring and Logging

# Log archiving operations
legal-md template.md --archive-source ./archive --debug 2>&1 | tee archive.log

# Monitor archive directory sizes
du -sh ./archives/*/

7. Access Control

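# Set appropriate permissions (directories 750, files 640), including subdirectories
find ./archives -type d -exec chmod 750 {} +
find ./archives -type f -exec chmod 640 {} +

# Restrict access to sensitive archives
chown -R legal-team:legal-group ./archives/compliance/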

Integration with Other Features

With Export Features

# Archive with metadata export
legal-md contract.md --archive-source ./archive --export-yaml --export-json

Result (for a document whose content changes during processing):

archive/contract.ORIGINAL.md
archive/contract.PROCESSED.md
archive/contract.yaml
archive/contract.json

With PDF Generation

# Archive with PDF output
legal-md template.md --pdf --archive-source ./archive

PDF files are also archived alongside markdown files.

With Force Commands

Templates whose force commands include --archive-source are archived with the same smart logic:

---
title: 'Auto-configured Document'
force_commands: '--pdf --highlight --archive-source ./auto-archive'
---

The force commands trigger archiving automatically.

Troubleshooting

Common Issues

Archive directory creation fails:

# Create the archive directory, including any missing parents
mkdir -p ./path/to/archive
legal-md document.md --archive-source ./path/to/archive

Permission denied:

# Check directory permissions
ls -la ./archive/
chmod 750 ./archive/

Disk space issues:

# Check available space
df -h ./archive/
# Clean old archives if needed

Debug Mode

# Enable debug output for archiving
legal-md document.md --archive-source ./archive --debug

This shows:

Content comparison results
Archive decision reasoning
File operation details
Error messages and warnings
