Skip to content

A lightweight, zero-dependency Node.js module for detecting text format types from strings. Accurately identifies plain text, markdown, ASCII art, code blocks, HTML, JSON, and XML.

Notifications You must be signed in to change notification settings

profullstack/text-type-detection

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

5 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

@profullstack/text-type-detection

A lightweight, zero-dependency Node.js module for detecting text format types from strings. Accurately identifies plain text, markdown, ASCII art, code blocks, HTML, JSON, and XML.

Features

  • 🎯 Accurate Detection: Uses multiple heuristics to identify text formats
  • πŸš€ Zero Dependencies: Pure JavaScript implementation
  • πŸ“¦ ESM Support: Modern ES Module syntax
  • πŸ§ͺ Well Tested: Comprehensive test coverage with Mocha and Chai
  • 🎨 ASCII Art Detection: Recognizes box drawing, Unicode art, ANSI sequences
  • πŸ“ Markdown Support: Detects headings, lists, links, code blocks, and more
  • πŸ” Multiple Formats: Supports plain, markdown, ascii, code, html, json, xml

Installation

pnpm add @profullstack/text-type-detection

Or with npm:

npm install @profullstack/text-type-detection

Usage

Basic Example

import { detectTextFormat } from '@profullstack/text-type-detection';

const text = '# Hello World\n\nThis is a markdown document.';
const result = detectTextFormat(text);

console.log(result);
// {
//   text_format: 'markdown',
//   asciiArt: 0.000,
//   markdown: 0.180,
//   reasons: {
//     ascii: [],
//     markdown: ['heading'],
//     codePenaltyApplied: false
//   },
//   stats: { lines: 3, mean: 20.67, std: 12.34, symD: 0.05, alpha: 0.75 }
// }

Detecting Different Formats

Plain Text

const plainText = 'This is just a simple sentence.';
const result = detectTextFormat(plainText);
console.log(result.text_format); // 'plain'

Markdown

const markdown = `# Title

## Subtitle

- Item 1
- Item 2

[Link](https://example.com)`;

const result = detectTextFormat(markdown);
console.log(result.text_format); // 'markdown'
console.log(result.reasons.markdown); // ['heading', 'list', 'link']

ASCII Art

const asciiArt = `β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Hello     β”‚
β”‚   World     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜`;

const result = detectTextFormat(asciiArt);
console.log(result.text_format); // 'ascii'
console.log(result.asciiArt); // 0.680 (high score)

Code Block

const code = `\`\`\`javascript
function hello() {
  console.log("Hello");
}
\`\`\``;

const result = detectTextFormat(code);
console.log(result.text_format); // 'code'

HTML

const html = '<div class="container"><p>Hello World</p></div>';
const result = detectTextFormat(html);
console.log(result.text_format); // 'html'

JSON

const json = '{"name": "John", "age": 30}';
const result = detectTextFormat(json);
console.log(result.text_format); // 'json'

XML

const xml = '<?xml version="1.0"?><root><item>test</item></root>';
const result = detectTextFormat(xml);
console.log(result.text_format); // 'xml'

API

detectTextFormat(text: string): DetectionResult

Analyzes the input text and returns a detection result object.

Parameters

  • text (string): The text to analyze

Returns

An object with the following properties:

  • text_format (string): The detected format type

    • 'plain' - Plain text
    • 'markdown' - Markdown formatted text
    • 'ascii' - ASCII art or box drawing
    • 'code' - Code block (fenced)
    • 'html' - HTML markup
    • 'json' - JSON data
    • 'xml' - XML document
  • asciiArt (number): ASCII art confidence score (0.000 - 1.000)

  • markdown (number): Markdown confidence score (0.000 - 1.000)

  • reasons (object): Detailed detection reasons

    • ascii (array): List of ASCII art indicators found
    • markdown (array): List of markdown features found
    • codePenaltyApplied (boolean): Whether code penalty was applied
  • stats (object): Statistical analysis of the text

    • lines (number): Number of lines
    • mean (number): Mean line length
    • std (number): Standard deviation of line lengths
    • symD (number): Symbol density ratio
    • alpha (number): Alphanumeric character ratio

Detection Heuristics

ASCII Art Detection

The module detects ASCII art using multiple indicators:

  • Unicode Art Characters: Box drawing (U+2500-U+257F), block elements (U+2580-U+259F), Braille patterns (U+2800-U+28FF), geometric shapes (U+25A0-U+25FF)
  • Border Lines: Lines composed primarily of border characters (+, -, |, _, =, etc.)
  • Consistent Width: Lines with similar lengths (low standard deviation)
  • Symbol Density: High ratio of symbols to alphanumeric characters
  • Character Runs: Repeated sequences of the same character
  • ANSI Sequences: Terminal color codes and formatting
  • Trailing Spaces: Intentional spacing for alignment

Markdown Detection

Markdown features detected include:

  • Headings: ATX-style (#) and Setext-style (underlined)
  • Lists: Unordered (-/*/+) and ordered (1.)
  • Links: [text](url) format
  • Images: ![alt](url) format
  • Code: Inline backticks and fenced code blocks
  • Blockquotes: Lines starting with >
  • Tables: Pipe-separated columns
  • Emphasis: Bold (**) and italic (*)
  • Horizontal Rules: ---, ***, ___
  • Front Matter: YAML metadata blocks

Code Detection

Code blocks are identified by:

  • Fenced code blocks (``` or ~~~)
  • Indentation patterns
  • Programming language keywords
  • Syntax characters (semicolons, braces, brackets)

Development

Setup

# Clone the repository
git clone https://github.com/profullstack/text-type-detection.git
cd text-type-detection

# Install dependencies
pnpm install

Running Tests

# Run all tests
pnpm test

# Run tests in watch mode
pnpm run test:watch

Linting and Formatting

# Run ESLint
pnpm run lint

# Fix linting issues
pnpm run lint:fix

# Format code with Prettier
pnpm run format

Requirements

  • Node.js >= 20.0.0
  • ESM support

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Author

Profullstack, Inc. https://profullstack.com

Repository

https://github.com/profullstack/text-type-detection

About

A lightweight, zero-dependency Node.js module for detecting text format types from strings. Accurately identifies plain text, markdown, ASCII art, code blocks, HTML, JSON, and XML.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published