Advanced Features

Deep dive into glin-profanity's advanced detection capabilities.

Leetspeak Detection

Detect obfuscated profanity using character substitutions.

Configuration

import { Filter } from 'glin-profanity';

const filter = new Filter({
  detectLeetspeak: true,
  leetspeakLevel: 'moderate' // 'basic' | 'moderate' | 'aggressive'
});

filter.isProfane('f4ck');    // true
filter.isProfane('sh1t');    // true
filter.isProfane('a$$');     // true

Leetspeak Levels

Level	Detects	Examples
`basic`	Common substitutions	`4→a`, `3→e`, `1→i`, `0→o`
`moderate`	Extended patterns	`@→a`, `$→s`, `!→i`, `7→t`
`aggressive`	Complex obfuscation	`ph→f`, `><→x`, visual lookalikes
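
To make the levels concrete, here is a minimal sketch of how level-based substitution could work. It is an illustration only: `normalizeLeet`, `BASIC`, `MODERATE`, and `AGGRESSIVE_PATTERNS` are hypothetical names, the maps contain just the examples from the table, and none of this is taken from glin-profanity's internals.

```typescript
// Hypothetical illustration; NOT glin-profanity's actual implementation.
type LeetLevel = 'basic' | 'moderate' | 'aggressive';

// Single-character substitutions, taken from the table's examples.
const BASIC: Record<string, string> = { '4': 'a', '3': 'e', '1': 'i', '0': 'o' };
const MODERATE: Record<string, string> = { ...BASIC, '@': 'a', '$': 's', '!': 'i', '7': 't' };

// Multi-character patterns are replaced first, before single characters.
const AGGRESSIVE_PATTERNS: Array<[string, string]> = [['ph', 'f'], ['><', 'x']];

function normalizeLeet(input: string, level: LeetLevel): string {
  let text = input.toLowerCase();
  if (level === 'aggressive') {
    for (const [pattern, replacement] of AGGRESSIVE_PATTERNS) {
      text = text.split(pattern).join(replacement);
    }
  }
  // 'moderate' and 'aggressive' both use the extended single-character map.
  const map = level === 'basic' ? BASIC : MODERATE;
  return Array.from(text, (ch) => map[ch] ?? ch).join('');
}

console.log(normalizeLeet('h3ll0', 'basic'));      // "hello"
console.log(normalizeLeet('gr@$$', 'moderate'));   // "grass"
console.log(normalizeLeet('phi><', 'aggressive')); // "fix"
```

The normalized string would then go through the ordinary dictionary lookup. Each level in this sketch is a superset of the one before it, which matches how the table reads but is an assumption about the library.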

Unicode Normalization

Handle Unicode homoglyphs and lookalike characters.

Configuration

const filter = new Filter({
  normalizeUnicode: true
});

// Detects lookalike characters from other scripts (e.g. Armenian, Cyrillic)
filter.isProfane('fսck');  // true (Armenian 'ս')
filter.isProfane('shіt');  // true (Cyrillic 'і')

Normalization Process

Input: "fսck" (with Armenian 'ս')
  ↓ Unicode Normalization
Normalized: "fuck"
  ↓ Dictionary Lookup
Result: Profanity detected
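
The normalization step above can be sketched as follows. Standard Unicode normalization (NFKC) folds compatibility forms such as fullwidth letters, but it does not map cross-script lookalikes like Armenian 'ս' to Latin 'u', so a separate lookalike table is needed. The `CONFUSABLES` table and `foldHomoglyphs` function are hypothetical and deliberately tiny; they are not the library's real mapping.

```typescript
// Hypothetical sketch; the table below is a small illustrative subset.
const CONFUSABLES: Record<string, string> = {
  '\u057D': 'u', // Armenian small letter seh
  '\u0456': 'i', // Cyrillic small letter Byelorussian-Ukrainian i
  '\u0430': 'a', // Cyrillic small letter a
  '\u03BF': 'o', // Greek small letter omicron
};

function foldHomoglyphs(input: string): string {
  // NFKC folds compatibility forms, e.g. fullwidth 'h' (U+FF48) becomes 'h'.
  const compat = input.normalize('NFKC');
  // Cross-script lookalikes survive NFKC, so map them explicitly.
  return Array.from(compat, (ch) => CONFUSABLES[ch] ?? ch).join('');
}

console.log(foldHomoglyphs('m\u057Dsic'));   // "music"
console.log(foldHomoglyphs('\uFF48\u0456')); // "hi"
```

A production table would be much larger; the Unicode consortium publishes confusables data that could serve as a starting point.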

ML-Powered Toxicity Detection

TensorFlow.js integration for context-aware detection.

Setup

npm install @tensorflow/tfjs @tensorflow-models/toxicity

Usage

import { loadToxicityModel, checkToxicity } from 'glin-profanity/ml';

// Load model (one-time)
await loadToxicityModel({ threshold: 0.9 });

// Check text
const result = await checkToxicity("You're terrible at this game");

console.log(result);
// {
//   toxic: true,
//   categories: {
//     toxicity: 0.92,
//     insult: 0.85,
//     identity_attack: 0.12,
//     threat: 0.08,
//     obscene: 0.15,
//     sexual_explicit: 0.03
//   }
// }
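
As a rough illustration of what the `threshold` option does, the sketch below turns per-category scores into a list of flagged categories. It assumes that a text counts as toxic when at least one category score reaches the threshold; that rule and the `categoriesOverThreshold` helper are assumptions made for this example, not a description of how `checkToxicity` decides.

```typescript
// Hypothetical helper; the flagging rule is an assumption for illustration.
type CategoryScores = Record<string, number>;

function categoriesOverThreshold(scores: CategoryScores, threshold: number): string[] {
  return Object.entries(scores)
    .filter(([, score]) => score >= threshold)
    .map(([category]) => category);
}

// Scores copied from the example output above.
const exampleScores: CategoryScores = {
  toxicity: 0.92,
  insult: 0.85,
  identity_attack: 0.12,
  threat: 0.08,
  obscene: 0.15,
  sexual_explicit: 0.03,
};

console.log(categoriesOverThreshold(exampleScores, 0.9)); // ["toxicity"]
console.log(categoriesOverThreshold(exampleScores, 0.8)); // ["toxicity", "insult"]
```

A higher threshold trades recall for precision: at 0.9 only the `toxicity` score qualifies, while 0.8 also picks up `insult`.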

Hybrid Approach

Run the fast dictionary check first and fall back to the ML model only when the dictionary finds nothing:

import { Filter } from 'glin-profanity';
import { checkToxicity, loadToxicityModel } from 'glin-profanity/ml';

await loadToxicityModel();

// Create the filter once and reuse it across calls
const filter = new Filter({ languages: ['english'] });

async function moderateContent(text: string) {
  // Fast dictionary check first
  const dictResult = filter.checkProfanity(text);

  if (dictResult.containsProfanity) {
    return { flagged: true, source: 'dictionary', ...dictResult };
  }

  // ML check for edge cases
  const mlResult = await checkToxicity(text);

  if (mlResult.toxic) {
    return { flagged: true, source: 'ml', ...mlResult };
  }

  return { flagged: false };
}

Result Caching

LRU cache for high-throughput applications.

Configuration

const filter = new Filter({
  cacheResults: true,
  maxCacheSize: 1000  // LRU cache limit
});

// First call: computed
filter.checkProfanity("hello world");  // ~0.5ms

// Subsequent calls: cached
filter.checkProfanity("hello world");  // ~0.01ms

Cache Management

// Check cache size
console.log(filter.getCacheSize());  // 1

// Clear cache
filter.clearCache();
console.log(filter.getCacheSize());  // 0
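
For readers unfamiliar with LRU eviction, the following is a minimal, generic sketch built on `Map` insertion order. The `LruCache` class is hypothetical and only illustrates the technique; the library's own cache may be implemented differently.

```typescript
// Generic LRU cache sketch; not glin-profanity's internal cache.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly maxSize: number;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so the key becomes the most recently used entry.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxSize) {
      // Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }

  clear(): void {
    this.entries.clear();
  }
}

const cache = new LruCache<string, boolean>(2);
cache.set('a', false);
cache.set('b', false);
cache.get('a');              // 'a' is now the most recently used entry
cache.set('c', true);        // exceeds maxSize, so 'b' is evicted
console.log(cache.get('b')); // undefined
console.log(cache.size);     // 2
```

Keying such a cache on the input text means repeated messages skip the dictionary scan entirely, which is where the speedup for repeated calls would come from.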

Severity Levels

Categorize matches by detection confidence.

Configuration

import { Filter, SeverityLevel } from 'glin-profanity';

const filter = new Filter({
  severityLevels: true,
  minSeverity: SeverityLevel.EXACT  // Severity threshold for reported matches
});

const result = filter.checkProfanity("This is sh1t");

console.log(result.severityMap);
// { "sh1t": SeverityLevel.FUZZY }

Severity Filtering

Level	Value	Use Case
`EXACT`	1	Strict moderation, zero tolerance
`FUZZY`	2	Balanced detection, allows some edge cases
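
One way to picture the two levels: a match is `EXACT` when the raw token is in the dictionary, and `FUZZY` when it only matches after normalization (as with "sh1t" in the example above). The sketch below encodes that reading. It is an assumption about what the levels mean, and `Severity`, `classify`, and `deLeet` are hypothetical names; only the numeric values come from the table.

```typescript
// Hypothetical sketch; numeric values mirror the table, the logic is assumed.
const Severity = { EXACT: 1, FUZZY: 2 } as const;
type SeverityValue = (typeof Severity)[keyof typeof Severity];

function classify(
  token: string,
  dictionary: Set<string>,
  normalize: (s: string) => string,
): SeverityValue | undefined {
  const lower = token.toLowerCase();
  if (dictionary.has(lower)) return Severity.EXACT;            // literal hit
  if (dictionary.has(normalize(lower))) return Severity.FUZZY; // hit only after normalization
  return undefined;                                            // no match
}

// A mild stand-in dictionary and a one-rule normalizer keep the example small.
const sampleDictionary = new Set(['darn']);
const deLeet = (s: string): string => s.replace(/4/g, 'a');

console.log(classify('darn', sampleDictionary, deLeet));  // 1
console.log(classify('d4rn', sampleDictionary, deLeet));  // 2
console.log(classify('hello', sampleDictionary, deLeet)); // undefined
```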

Custom Word Lists

Extend or restrict the dictionary.

Adding Custom Words

const filter = new Filter({
  customWords: ['brandname', 'competitor', 'secretproject']
});

filter.isProfane('brandname');  // true

Whitelisting Words

const filter = new Filter({
  ignoreWords: ['damn', 'hell']  // Allow these words
});

filter.isProfane('damn');  // false
filter.isProfane('hell');  // false

Dynamic Updates

// Runtime modifications
filter.addWord('newbadword');
filter.removeWord('allowedword');
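
The interaction between custom words, ignored words, and runtime updates can be sketched with two sets. The `WordList` class below is a hypothetical stand-in, not the library's `Filter`; in particular, letting the ignore list take precedence over the blocked list is an assumption of this sketch.

```typescript
// Hypothetical stand-in for illustration; not glin-profanity's Filter class.
class WordList {
  private readonly blocked = new Set<string>();
  private readonly ignored = new Set<string>();

  constructor(base: string[], customWords: string[] = [], ignoreWords: string[] = []) {
    for (const w of [...base, ...customWords]) this.blocked.add(w.toLowerCase());
    for (const w of ignoreWords) this.ignored.add(w.toLowerCase());
  }

  addWord(word: string): void {
    this.blocked.add(word.toLowerCase());
  }

  removeWord(word: string): void {
    this.blocked.delete(word.toLowerCase());
  }

  isBlocked(word: string): boolean {
    const w = word.toLowerCase();
    // Assumption: the ignore list wins over the blocked list.
    return this.blocked.has(w) && !this.ignored.has(w);
  }
}

const list = new WordList(['darn', 'heck'], ['brandname'], ['heck']);
console.log(list.isBlocked('brandname')); // true  (custom word)
console.log(list.isBlocked('heck'));      // false (ignored)

list.addWord('newbadword');
list.removeWord('darn');
console.log(list.isBlocked('newbadword')); // true
console.log(list.isBlocked('darn'));       // false
```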

Multi-Language Detection

Check text across multiple language dictionaries simultaneously.

Specific Languages

const filter = new Filter({
  languages: ['english', 'spanish', 'french']
});

All Languages

const filter = new Filter({
  allLanguages: true  // Check all 23 supported languages
});

Performance Considerations

Config	Dictionary Size	Speed
Single language	~500 words	21M ops/sec
3 languages	~1,500 words	18M ops/sec
All languages	~8,000 words	12M ops/sec
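
Throughput figures like these depend heavily on hardware, runtime, and input text, so it is worth measuring in your own environment. Below is a minimal micro-benchmark sketch. `measureOpsPerSec` is a hypothetical helper, and the `Set`-based check is a stand-in for a real filter call; substitute the function you actually want to time.

```typescript
// Hypothetical micro-benchmark helper; results vary by machine and runtime.
function measureOpsPerSec(fn: () => void, durationMs: number = 200): number {
  // Warm up so the JIT has a chance to optimize the function.
  for (let i = 0; i < 1000; i++) fn();

  const start = performance.now();
  let ops = 0;
  while (performance.now() - start < durationMs) {
    // Run in batches to keep timer overhead out of the measurement.
    for (let i = 0; i < 1000; i++) fn();
    ops += 1000;
  }
  const elapsedSec = (performance.now() - start) / 1000;
  return ops / elapsedSec;
}

// Stand-in workload: a whitespace split plus Set lookups.
const benchWords = new Set(['darn', 'heck']);
const standInCheck = (text: string): boolean =>
  text.split(' ').some((w) => benchWords.has(w));

const opsPerSec = measureOpsPerSec(() => {
  standInCheck('hello world');
});
console.log(`${Math.round(opsPerSec).toLocaleString()} ops/sec`);
```

Running the same harness against single-language and all-language configurations would show how much dictionary size matters for your workload.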
