A lightweight, zero-dependency Node.js module for detecting text format types from strings. Accurately identifies plain text, markdown, ASCII art, code blocks, HTML, JSON, and XML.
- π― Accurate Detection: Uses multiple heuristics to identify text formats
- π Zero Dependencies: Pure JavaScript implementation
- π¦ ESM Support: Modern ES Module syntax
- π§ͺ Well Tested: Comprehensive test coverage with Mocha and Chai
- π¨ ASCII Art Detection: Recognizes box drawing, Unicode art, ANSI sequences
- π Markdown Support: Detects headings, lists, links, code blocks, and more
- π Multiple Formats: Supports plain, markdown, ascii, code, html, json, xml
pnpm add @profullstack/text-type-detectionOr with npm:
npm install @profullstack/text-type-detectionimport { detectTextFormat } from '@profullstack/text-type-detection';
const text = '# Hello World\n\nThis is a markdown document.';
const result = detectTextFormat(text);
console.log(result);
// {
// text_format: 'markdown',
// asciiArt: 0.000,
// markdown: 0.180,
// reasons: {
// ascii: [],
// markdown: ['heading'],
// codePenaltyApplied: false
// },
// stats: { lines: 3, mean: 20.67, std: 12.34, symD: 0.05, alpha: 0.75 }
// }const plainText = 'This is just a simple sentence.';
const result = detectTextFormat(plainText);
console.log(result.text_format); // 'plain'const markdown = `# Title
## Subtitle
- Item 1
- Item 2
[Link](https://example.com)`;
const result = detectTextFormat(markdown);
console.log(result.text_format); // 'markdown'
console.log(result.reasons.markdown); // ['heading', 'list', 'link']const asciiArt = `βββββββββββββββ
β Hello β
β World β
βββββββββββββββ`;
const result = detectTextFormat(asciiArt);
console.log(result.text_format); // 'ascii'
console.log(result.asciiArt); // 0.680 (high score)const code = `\`\`\`javascript
function hello() {
console.log("Hello");
}
\`\`\``;
const result = detectTextFormat(code);
console.log(result.text_format); // 'code'const html = '<div class="container"><p>Hello World</p></div>';
const result = detectTextFormat(html);
console.log(result.text_format); // 'html'const json = '{"name": "John", "age": 30}';
const result = detectTextFormat(json);
console.log(result.text_format); // 'json'const xml = '<?xml version="1.0"?><root><item>test</item></root>';
const result = detectTextFormat(xml);
console.log(result.text_format); // 'xml'Analyzes the input text and returns a detection result object.
text(string): The text to analyze
An object with the following properties:
-
text_format(string): The detected format type'plain'- Plain text'markdown'- Markdown formatted text'ascii'- ASCII art or box drawing'code'- Code block (fenced)'html'- HTML markup'json'- JSON data'xml'- XML document
-
asciiArt(number): ASCII art confidence score (0.000 - 1.000) -
markdown(number): Markdown confidence score (0.000 - 1.000) -
reasons(object): Detailed detection reasonsascii(array): List of ASCII art indicators foundmarkdown(array): List of markdown features foundcodePenaltyApplied(boolean): Whether code penalty was applied
-
stats(object): Statistical analysis of the textlines(number): Number of linesmean(number): Mean line lengthstd(number): Standard deviation of line lengthssymD(number): Symbol density ratioalpha(number): Alphanumeric character ratio
The module detects ASCII art using multiple indicators:
- Unicode Art Characters: Box drawing (U+2500-U+257F), block elements (U+2580-U+259F), Braille patterns (U+2800-U+28FF), geometric shapes (U+25A0-U+25FF)
- Border Lines: Lines composed primarily of border characters (+, -, |, _, =, etc.)
- Consistent Width: Lines with similar lengths (low standard deviation)
- Symbol Density: High ratio of symbols to alphanumeric characters
- Character Runs: Repeated sequences of the same character
- ANSI Sequences: Terminal color codes and formatting
- Trailing Spaces: Intentional spacing for alignment
Markdown features detected include:
- Headings: ATX-style (#) and Setext-style (underlined)
- Lists: Unordered (-/*/+) and ordered (1.)
- Links:
[text](url)format - Images:
format - Code: Inline backticks and fenced code blocks
- Blockquotes: Lines starting with >
- Tables: Pipe-separated columns
- Emphasis: Bold (**) and italic (*)
- Horizontal Rules: ---, ***, ___
- Front Matter: YAML metadata blocks
Code blocks are identified by:
- Fenced code blocks (``` or ~~~)
- Indentation patterns
- Programming language keywords
- Syntax characters (semicolons, braces, brackets)
# Clone the repository
git clone https://github.com/profullstack/text-type-detection.git
cd text-type-detection
# Install dependencies
pnpm install# Run all tests
pnpm test
# Run tests in watch mode
pnpm run test:watch# Run ESLint
pnpm run lint
# Fix linting issues
pnpm run lint:fix
# Format code with Prettier
pnpm run format- Node.js >= 20.0.0
- ESM support
MIT
Contributions are welcome! Please feel free to submit a Pull Request.
Profullstack, Inc. https://profullstack.com