feat: v3.1 performance enhancements and bug fixes (#105)
* ⚡ Lazy load profanity dictionaries for faster startup
Refactored `DictionaryLoader` to load language dictionaries on demand.
This significantly reduces import time and memory usage when only specific languages are needed.
- Extracted language file mapping to `LANGUAGE_FILES` constant.
- Removed eager loading in `__init__`.
- Implemented `_load_dictionary` for lazy loading.
- Updated `get_words` and `get_all_words` to use lazy loading.
- Added `tests/test_dictionary_lazy.py` to verify lazy loading behavior.
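The lazy-loading idea described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `DictionaryLoader`: the class name `LazyDictionaryLoader`, the `_cache` attribute, the contents of `LANGUAGE_FILES`, and the assumption that each dictionary file is a JSON list of words are all hypothetical.

```python
import json
from pathlib import Path

# Hypothetical mapping; the real LANGUAGE_FILES contents are not shown in this commit.
LANGUAGE_FILES = {"english": "english.json", "spanish": "spanish.json"}


class LazyDictionaryLoader:
    """Illustrative loader that reads a language file only on first use."""

    def __init__(self, base_dir):
        self._base_dir = Path(base_dir)
        self._cache = {}  # language -> word list, filled on demand (no eager loading)

    def _load_dictionary(self, language):
        # Read and parse the file only if this language has not been requested yet.
        if language not in self._cache:
            path = self._base_dir / LANGUAGE_FILES[language]
            with path.open(encoding="utf-8") as fh:
                self._cache[language] = json.load(fh)  # assumes a JSON list of words
        return self._cache[language]

    def get_words(self, language):
        return self._load_dictionary(language)

    def get_all_words(self):
        # Requesting everything still goes through the lazy path, one language at a time.
        words = set()
        for language in LANGUAGE_FILES:
            words.update(self._load_dictionary(language))
        return words
```

With this shape, constructing the loader touches no files; only the languages actually requested are read and kept in memory.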
Co-authored-by: thegdsks <39922405+thegdsks@users.noreply.github.com>
* Optimize checkPhraseContext to use matchWord for targeted phrase matching
- Modified `checkPhraseContext` in `packages/js/src/nlp/contextAnalyzer.ts` to filter phrases by `matchWord`.
- Added test case `packages/js/tests/context-optimization.test.ts` to verify the fix.
- This prevents unrelated positive phrases (e.g., "the bomb") from whitelisting other profanities (e.g., "shit").
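The behavior described above might look roughly like this. The real change is in TypeScript (`contextAnalyzer.ts`); this Python sketch only illustrates the filtering idea, and the `POSITIVE_PHRASES` list and the function signature are assumptions.

```python
# Hypothetical phrase list; the real one lives in contextAnalyzer.ts.
POSITIVE_PHRASES = ["the bomb", "the shit"]


def check_phrase_context(text, match_word):
    """Return True only if a positive phrase that contains match_word occurs in text."""
    lowered = text.lower()
    word = match_word.lower()
    for phrase in POSITIVE_PHRASES:
        # Skip phrases that do not involve the matched word at all, so an
        # unrelated phrase elsewhere in the text cannot whitelist it.
        if word not in phrase.split():
            continue
        if phrase in lowered:
            return True
    return False
```

For the text "this movie is the bomb but the food was shit", the word "bomb" is whitelisted by "the bomb", while "shit" is not, because no positive phrase containing it appears in the text.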
Co-authored-by: thegdsks <39922405+thegdsks@users.noreply.github.com>
* feat(context-analyzer): use contextWords for detailed reasoning
- Update generateReason to include found positive/negative indicators in the return string.
- Remove unused variable lint suppression.
Co-authored-by: thegdsks <39922405+thegdsks@users.noreply.github.com>
* feat: cache compiled regexes in Filter class for performance optimization
Co-authored-by: thegdsks <39922405+thegdsks@users.noreply.github.com>
* feat: implement matchWord support for domain-specific filtering in ContextAnalyzer
Updates `ContextAnalyzer.isDomainWhitelisted` to use the `matchWord` argument.
Introduces `GAMING_ACCEPTABLE_WORDS` to restrict whitelisting in gaming contexts
to only specific acceptable words (e.g. 'kill', 'shoot', 'badass'), rather than
whitelisting all profanity when gaming terms are present.
Adds regression tests verifying the fix.
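A rough sketch of the restricted whitelisting, written in Python for illustration (the actual code is TypeScript). The three entries in `GAMING_ACCEPTABLE_WORDS` are the examples named above; `GAMING_TERMS`, the tokenization, and the function signature are assumptions.

```python
# Example entries taken from the commit message; the full list is not shown there.
GAMING_ACCEPTABLE_WORDS = {"kill", "shoot", "badass"}

# Hypothetical set of terms that signal a gaming context.
GAMING_TERMS = {"game", "player", "boss", "level"}


def is_domain_whitelisted(text, match_word):
    """Whitelist match_word only if the text is gaming-related AND the word is acceptable."""
    tokens = set(text.lower().split())
    in_gaming_context = bool(tokens & GAMING_TERMS)
    return in_gaming_context and match_word.lower() in GAMING_ACCEPTABLE_WORDS
```

Under these assumptions, "kill" in "kill the boss on level 3" is whitelisted, while "shit" in "this game is shit" is still flagged even though a gaming term is present.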
Co-authored-by: thegdsks <39922405+thegdsks@users.noreply.github.com>
* Optimize regex compilation in Filter class using caching
This change introduces a cache for compiled regex patterns in the `Filter` class.
Previously, `_get_regex` would re-escape and re-compile regex patterns for every word
checked, even if the word had been processed before. This optimization stores the
compiled regex in `self._regex_cache` keyed by the word, avoiding redundant computations.
Performance Benchmark (50 iterations):
- is_profane: ~14.11ms -> ~11.60ms (~17.8% improvement)
- check_profanity: ~14.44ms -> ~11.45ms (~20.7% improvement)
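The caching pattern can be sketched as below. Only `_get_regex` and `self._regex_cache` are named in this commit; the class name `CachedFilter` and the word-boundary, case-insensitive pattern are assumptions made for illustration.

```python
import re


class CachedFilter:
    """Illustrative stand-in for the Filter class, showing only the regex cache."""

    def __init__(self):
        self._regex_cache = {}  # word -> compiled pattern

    def _get_regex(self, word):
        pattern = self._regex_cache.get(word)
        if pattern is None:
            # Escape and compile once per distinct word; the pattern shape is assumed.
            pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
            self._regex_cache[word] = pattern
        return pattern
```

Repeated calls with the same word return the same compiled object, so the escape-and-compile cost is paid once per distinct word rather than once per check.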
Co-authored-by: thegdsks <39922405+thegdsks@users.noreply.github.com>
* fix: add .npmrc with legacy-peer-deps=true to fix CI build
Co-authored-by: thegdsks <39922405+thegdsks@users.noreply.github.com>
* ci: fix npm ci dependency conflict by using legacy peer deps
Updates the JS CI workflow to use `npm ci --legacy-peer-deps` to bypass the
peer dependency conflict between @tensorflow/tfjs (4.x) and @tensorflow-models/toxicity (1.2.2).
Co-authored-by: thegdsks <39922405+thegdsks@users.noreply.github.com>
* Optimize checkPhraseContext to use matchWord and fix CI dependencies
- Modified `checkPhraseContext` in `packages/js/src/nlp/contextAnalyzer.ts` to filter phrases by `matchWord`.
- Added test case `packages/js/tests/context-optimization.test.ts` to verify the fix.
- Added `overrides` to root `package.json` to resolve `@tensorflow-models/toxicity` peer dependency conflicts causing CI failures.
- Updated `package-lock.json` to reflect resolved dependencies.
Co-authored-by: thegdsks <39922405+thegdsks@users.noreply.github.com>
* feat(context-analyzer): use contextWords for detailed reasoning
- Update generateReason to include found positive/negative indicators in the return string.
- Remove unused variable lint suppression.
- Fix CI dependency conflict by adding overrides for @tensorflow/tfjs packages.
Co-authored-by: thegdsks <39922405+thegdsks@users.noreply.github.com>
* fix(python): bundle dictionaries in package for pip install
Fixes #70
- Copy language dictionaries into glin_profanity/data/dictionaries/
- Update dictionary loader to use bundled path with fallback
- Update pyproject.toml to include dictionary files in wheel
- Dictionaries now work when installed via pip
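The "bundled path with fallback" lookup could be sketched along these lines. The `data/dictionaries` subpath comes from the bullets above; the function name `resolve_dictionary_dir` and the idea that the fallback is a caller-supplied directory are assumptions, since the commit does not say where the fallback points.

```python
from pathlib import Path


def resolve_dictionary_dir(package_dir, fallback_dir):
    """Prefer dictionaries bundled inside the package; otherwise use the fallback."""
    bundled = Path(package_dir) / "data" / "dictionaries"
    if bundled.is_dir():
        return bundled
    # Hypothetical fallback, e.g. a shared directory in a source checkout.
    return Path(fallback_dir)
```

A pip-installed wheel that includes the dictionary files would take the first branch; a development checkout without the bundled copy would fall through to the second.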
* fix: apply copilot review suggestions
- Fix unreachable NEGATIVE_PHRASES branch in contextAnalyzer.ts
(phrase.includes(matchWord) was always false for prefix phrases)
- Make test assertions explicit in repro_issue.test.ts
- Use dynamic language count in test_dictionary_lazy.py
* fix: address CodeRabbit review suggestions
- pyproject.toml: Use force-include instead of shared-data for bundling
- contextAnalyzer.ts: Normalize domain whitelist entries to lowercase
- dutch.json: Remove trailing 'g' artifacts from 20+ words
- globalWhitelist.json: Remove duplicate "Analytics" entry
- italian.json: Fix encoding artifact in "fare una" entry
- japanese.json: Remove generic words causing false positives (嫌い, 女の子)
- norwegian.json: Rename from Norwegian.json for consistency
- spanish.json: Fix typo "sesinato" → "asesinato"
- turkish.json: Remove false positives (allah, ana, coca cola, cola)
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>