Automated analysis tool for comparing old and new documentation sites during migration.
npm install
# Using default URLs
node analyze.js
# Specify custom URLs
node analyze.js https://old-site.com https://new-site.com
# Specify custom sitemap URL (3rd argument)
node analyze.js https://old-site.com https://new-site.com https://old-site.com/custom-sitemap.xml
# Specify custom heading selectors (4th argument)
node analyze.js https://docs.merge.dev https://merge.ferndocs.com "" "h1,h2,h3,.custom-heading"
Custom Heading Selectors (Priority Fallback System):
The script tries selectors in order from left to right and uses the first match found.
Example: h1,h2,h3,.custom-heading
- First tries to find h1 (standard heading)
- If no h1, tries h2
- If no h2, tries h3
- If none found, falls back to .custom-heading
Default: h1,h2,h3
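The fallback order above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: pickHeadingSelector and countMatches are hypothetical names, and countMatches stands in for whatever DOM query the real script runs.

```javascript
// Hypothetical helper: `countMatches` stands in for the real DOM query
// (for example document.querySelectorAll(selector).length).
function pickHeadingSelector(selectorList, countMatches) {
  const selectors = selectorList
    .split(',')
    .map((s) => s.trim())
    .filter(Boolean);

  // Try each selector left to right and stop at the first one that matches.
  for (const selector of selectors) {
    if (countMatches(selector) > 0) {
      return selector;
    }
  }
  return null; // nothing on the page matched any selector
}

// Example: a page with no h1 or h2, two h3 elements, and one .custom-heading.
const fakePage = { h3: 2, '.custom-heading': 1 };
const chosen = pickHeadingSelector(
  'h1,h2,h3,.custom-heading',
  (selector) => fakePage[selector] || 0
);
console.log(chosen); // h3
```

Because the first match wins, .custom-heading is only consulted when the page has no h1, h2, or h3 at all.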
Sitemap Auto-Detection:
The script first tries common sitemap locations:
- /sitemap.xml
- /sitemap_index.xml
- /sitemap-index.xml
- /sitemaps/sitemap.xml
- /sitemap/sitemap.xml
- /docs/sitemap.xml
If not found, it prompts you to enter the sitemap URL manually.
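As a rough sketch of how such probing might work, the snippet below builds the candidate URLs and returns the first one that responds successfully. The function names, the injected fetchFn, and the "first 2xx response wins" rule are assumptions for illustration; only the path list comes from this README.

```javascript
// Paths taken from the list of common sitemap locations.
const SITEMAP_PATHS = [
  '/sitemap.xml',
  '/sitemap_index.xml',
  '/sitemap-index.xml',
  '/sitemaps/sitemap.xml',
  '/sitemap/sitemap.xml',
  '/docs/sitemap.xml',
];

// Hypothetical helper: resolve each path against the site origin.
function sitemapCandidates(siteUrl) {
  return SITEMAP_PATHS.map((path) => new URL(path, siteUrl).href);
}

// Assumed strategy: return the first candidate that answers with a 2xx status.
// `fetchFn` is injected so the sketch does not depend on a specific HTTP client.
async function findSitemap(siteUrl, fetchFn) {
  for (const candidate of sitemapCandidates(siteUrl)) {
    try {
      const res = await fetchFn(candidate);
      if (res.ok) return candidate;
    } catch (err) {
      // Network error: fall through to the next candidate.
    }
  }
  return null; // the caller would then prompt for a manual URL
}

const candidates = sitemapCandidates('https://old-site.com');
console.log(candidates[0]); // https://old-site.com/sitemap.xml
```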
3-Tier Analysis:
- Critical (404s): Pages that don't exist on the new site
- Warnings:
  - Wrong redirect destinations (H1 doesn't match)
  - Content mismatches (heading structure differs by >40%)
- Pass: Pages that migrated successfully
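The tiers above could be expressed as a small classifier like the one below. The page object shape, the function names, and the difference metric (the share of old-site headings missing from the new page) are assumptions; the README only states the 40% threshold, not how the difference is computed.

```javascript
const MISMATCH_THRESHOLD = 0.4; // "differs by >40%" from the tier list

// Assumed metric: share of old-site headings that are missing on the new page.
function headingDifference(oldHeadings, newHeadings) {
  if (oldHeadings.length === 0) return 0;
  const newSet = new Set(newHeadings.map((h) => h.trim().toLowerCase()));
  const missing = oldHeadings.filter(
    (h) => !newSet.has(h.trim().toLowerCase())
  );
  return missing.length / oldHeadings.length;
}

// Hypothetical page shape: { newStatus, oldH1, newH1, oldHeadings, newHeadings }.
function classifyPage(page) {
  if (page.newStatus === 404) {
    return { tier: 'critical', reason: 'missing on new site' };
  }
  const oldH1 = (page.oldH1 || '').trim().toLowerCase();
  const newH1 = (page.newH1 || '').trim().toLowerCase();
  if (oldH1 !== newH1) {
    return { tier: 'warning', reason: 'wrong redirect destination' };
  }
  if (headingDifference(page.oldHeadings, page.newHeadings) > MISMATCH_THRESHOLD) {
    return { tier: 'warning', reason: 'content mismatch' };
  }
  return { tier: 'pass', reason: 'migrated' };
}

const result = classifyPage({
  newStatus: 200,
  oldH1: 'Authentication',
  newH1: 'Authentication',
  oldHeadings: ['Authentication', 'API keys', 'OAuth', 'Scopes', 'Errors'],
  newHeadings: ['Authentication', 'API keys'],
});
console.log(result.tier, result.reason); // warning content mismatch
```

In the example, three of the five old headings are missing on the new page (60%), so the page lands in the warning tier even though the H1 matches.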
Performance:
- Adjust concurrency in analyze.js by changing the CONCURRENCY constant (line 11)
- Default: 25 concurrent pages
- Higher values = faster but more resource intensive
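To illustrate what a concurrency limit of this kind does, here is a minimal worker-pool sketch. It is not taken from analyze.js: mapWithConcurrency is a hypothetical helper, and the worker below is a stand-in for loading a page.

```javascript
const CONCURRENCY = 25; // default value named in the list above

// Hypothetical helper: runs `worker` over `items` with at most `limit` calls in flight.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function runner() {
    while (next < items.length) {
      const index = next++; // no race: this runs synchronously between awaits
      results[index] = await worker(items[index], index);
    }
  }

  // Start up to `limit` runners; each pulls the next item when it is free.
  const runners = Array.from(
    { length: Math.min(limit, items.length) },
    () => runner()
  );
  await Promise.all(runners);
  return results;
}

// Usage with a stand-in worker; the real script would load a page here.
const urls = ['/a', '/b', '/c', '/d', '/e'];
const done = mapWithConcurrency(urls, CONCURRENCY, async (path) => {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return path.toUpperCase();
});
done.then((out) => console.log(out.join(','))); // /A,/B,/C,/D,/E
```

Raising the limit starts more pages at once, which shortens the run but increases memory and network load on both the machine and the sites being crawled.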
The tool generates migration-report-[timestamp].html with:
- Summary statistics
- Critical issues to fix immediately
- Warnings to investigate
- Passed pages with match percentages
- Direct links to open any page on old/new site
- Heading comparisons for content mismatches
The report automatically opens in your browser when analysis completes.
Interactive Features:
- Dismiss False Positives: Click "Dismiss (False Positive)" on warnings that aren't real issues
- Confirm Issues: Click "Confirm Issue" to mark items that need attention
- Filter Views: Use buttons to filter (Show All, Active Only, Dismissed Only, Confirmed Only)
- Persistent State: Your dismissals/confirmations are saved in localStorage and persist across page reloads
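A minimal sketch of how dismissals and confirmations might be persisted is shown below. The storage key, the state shape, and createReviewState are hypothetical; the report's real implementation may differ. The storage object is injected so the same code works with window.localStorage in a browser and with a stand-in under Node.

```javascript
// Hypothetical storage key and state shape; the real report may differ.
const STORAGE_KEY = 'migration-report-state';

function createReviewState(storage) {
  const load = () => JSON.parse(storage.getItem(STORAGE_KEY) || '{}');
  return {
    // status is 'dismissed' or 'confirmed'
    mark(pagePath, status) {
      const state = load();
      state[pagePath] = status;
      storage.setItem(STORAGE_KEY, JSON.stringify(state));
    },
    // Pages with no saved status are treated as active.
    statusOf(pagePath) {
      return load()[pagePath] || 'active';
    },
  };
}

// In a browser you would pass window.localStorage. This in-memory stand-in
// implements the same getItem/setItem calls so the sketch runs under Node.
const memory = new Map();
const fakeStorage = {
  getItem: (key) => (memory.has(key) ? memory.get(key) : null),
  setItem: (key, value) => memory.set(key, String(value)),
};

const review = createReviewState(fakeStorage);
review.mark('/docs/auth', 'dismissed');
console.log(review.statusOf('/docs/auth')); // dismissed
console.log(review.statusOf('/docs/other')); // active
```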
Passed pages are cached to migration-cache.json and skipped on subsequent runs.
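The skip logic could look roughly like the sketch below. The cache file name comes from this README, but the cache format (a JSON array of page paths) and both function names are assumptions. The full parameter mirrors the --full flag covered next.

```javascript
const fs = require('fs');

const CACHE_FILE = 'migration-cache.json';

// Assumed cache shape: a JSON array of page paths that already passed.
function loadPassedPaths(cacheFile = CACHE_FILE) {
  try {
    return new Set(JSON.parse(fs.readFileSync(cacheFile, 'utf8')));
  } catch (err) {
    return new Set(); // missing or unreadable cache: scan everything
  }
}

// With `full` set, the cache is ignored and every page is rescanned.
function pagesToScan(allPaths, passedPaths, full = false) {
  if (full) return allPaths.slice();
  return allPaths.filter((path) => !passedPaths.has(path));
}

const passed = new Set(['/docs/intro']);
console.log(pagesToScan(['/docs/intro', '/docs/auth'], passed)); // [ '/docs/auth' ]
```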
Use --full to force a complete rescan (e.g., after deploying fixes):
node analyze.js https://old-site.com https://new-site.com "" "" --full
During analysis, a live-updating report is available:
- File: migration-report-live.html
- Auto-refreshes every 5 seconds
- Shows results as they come in
- Automatically deleted when analysis completes
The tool captures screenshots for issues:
- Directory: screenshots/
- Format: JPEG at 50% quality (small file size)
- Naming: [path]-old.jpg and [path]-new.jpg
- Captured for all warnings and critical issues
- Viewable in the HTML report via "Show/Hide Screenshots" toggle
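The naming pattern can be illustrated with a small helper. How the tool turns a page path into the [path] part is not documented here, so the slug rule below (slashes and other non-alphanumeric runs become "-", the site root becomes "index") is purely an assumption, as is the function name.

```javascript
// Assumed slug rule: strip leading/trailing slashes, replace other
// non-alphanumeric runs with "-", and use "index" for the site root.
function screenshotNames(pagePath) {
  const slug =
    pagePath
      .replace(/^\/+|\/+$/g, '')
      .replace(/[^a-zA-Z0-9]+/g, '-') || 'index';
  return {
    old: `screenshots/${slug}-old.jpg`,
    new: `screenshots/${slug}-new.jpg`,
  };
}

console.log(screenshotNames('/docs/api/auth/').old); // screenshots/docs-api-auth-old.jpg
```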
Generates a structured JSON file for AI-assisted redirect fixing:
- File: migration-data.json
- Contains all warnings and critical issues with:
  - Page paths and URLs
  - Heading comparisons
  - Screenshot file paths
- Includes instructions for an LLM to suggest corrected redirects
- Output format: YAML redirects list
Example usage with an AI:
# After running analysis, feed the JSON to an LLM
cat migration-data.json | claude "Review these issues and suggest redirects"
Generates a ready-to-use redirects configuration:
- File: redirects-[timestamp].mdx
- Contains three sections:
  - Verified Redirects: Working correctly (content matches)
  - Needs Review: Redirects with heading mismatches
  - Missing Redirects: 404s that need destinations
- Includes implementation snippets for Fern (fern.config.yml)