Merged
52 changes: 52 additions & 0 deletions .pytest_failures.txt
@@ -0,0 +1,52 @@
ACL Chinese name tests 'Tong Zhang' True Tong Zhang True Zhang Tong
ACL Chinese name tests 'Bei Yu' True Bei Yu True Yu Bei
ACL Chinese name tests 'Fei Yu' True Fei Yu True Yu Fei
ACL order preservation tests 'Hao Fei' True Hao Fei True Fei Hao
ACL order preservation tests 'Hao-Ran Wei' True Hao-Ran Wei True Wei Hao-Ran
ACL order preservation tests 'Haoran Jin' True Hao-Ran Jin True Jin Haoran
ACL order preservation tests 'Haoran Que' True Hao-Ran Que True Que Haoran
ACL order preservation tests 'Haoran Ye' True Hao-Ran Ye True Ye Haoran
ACL order preservation tests 'Junjie Fang' True Jun-Jie Fang True Fang Junjie
ACL order preservation tests 'Junjie Peng' True Jun-Jie Peng True Peng Junjie
ACL order preservation tests 'Junjie Ye' True Jun-Jie Ye True Ye Junjie
ACL order preservation tests 'Kun Kuang' True Kun Kuang True Kuang Kun
ACL order preservation tests 'Lecheng Zheng' True Lecheng Zheng True Zheng Lecheng
ACL order preservation tests 'Qianlong Du' True Qian-Long Du True Du Qianlong
ACL order preservation tests 'Qianlong Wang' True Qian-Long Wang True Wang Qianlong
ACL order preservation tests 'Xinlei Chen' True Xin-Lei Chen True Chen Xinlei
ACL order preservation tests 'Xinlei He' True Xin-Lei He True He Xinlei
ACL order preservation tests 'Yao Shu' True Yao Shu True Shu Yao
ACL order preservation tests 'Yuwen Wang' True Yuwen Wang True Wang Yuwen
ACL order preservation tests 'Yuxuan Dong' True Yuxuan Dong True Dong Yuxuan
ACL order preservation tests 'Yuxuan Gu' True Yuxuan Gu True Gu Yuxuan
Basic Chinese name tests 'Feng Cha' True Cha Feng True Feng Cha
Basic Chinese name tests 'He Cha' True Cha He True He Cha
Basic Chinese name tests 'Hu Cha' True Cha Hu True Hu Cha
Basic Chinese name tests 'Li Gong' True Gong Li True Li Gong
Basic Chinese name tests 'Gao Wei' True Wei Gao True Gao Wei
Basic Chinese name tests 'Kong Kung' True Kung Kong True Kong Kung
Basic Chinese name tests 'Lu Xun' True Xun Lu True Lu Xun
Basic Chinese name tests 'Qin Shi' True Shi Qin True Qin Shi
Basic Chinese name tests 'Xun Zhou' True Xun Zhou True Zhou Xun
Basic Chinese name tests 'Zhou Xun' True Xun Zhou True Zhou Xun
Compound name tests 'Leung Ka Fai' True Ka-Fai Leung True Leung-Ka Fai
Miscellaneous tests 'Jin Hua' True Hua Jin True Jin Hua
Miscellaneous tests 'Miao Yu' True Miao Yu True Yu Miao
Miscellaneous tests 'Yu Miao' True Miao Yu True Yu Miao
Miscellaneous tests 'Wen Jing' True Jing Wen True Wen Jing
Miscellaneous tests 'Jing Wen' True Jing Wen True Wen Jing
ML ranker test data tests 'Gui Rui' True Rui Gui True Gui Rui
ML ranker test data tests 'Shu Yao' True Yao Shu True Shu Yao
ML ranker test data tests 'Huang Yu Chang' True Yu-Chang Huang True Huang-Yu Chang
ML ranker test data tests 'Jia Jian Feng' True Jian-Feng Jia True Jia-Jian Feng
ML ranker test data tests 'Fan Jia Liang' True Jia-Liang Fan True Fan-Jia Liang
ML ranker test data tests 'Wei Wen Xing' True Wen-Xing Wei True Wei-Wen Xing
ML ranker test data tests 'Xi Zhao' True Zhao Xi True Xi Zhao
ML ranker test data tests 'Fu Meng Ting' True Meng-Ting Fu True Fu-Meng Ting
ML ranker test data tests 'ke chen' True Ke Chen True Chen Ke
ML ranker test data tests 'mi zhang' True Mi Zhang True Zhang Mi
ML ranker test data tests 'xu feng' True Feng Xu True Xu Feng
ML ranker test data tests 'yang guang' True Guang Yang True Yang Guang
Mixed scripts tests 'Zhou(Mary)Li' True Li Zhou True Zhou Li
Name formatting tests 'JinHua' True Hua Jin True Jin Hua
Name formatting tests 'LinShu' True Shu Lin True Lin Shu
40 changes: 37 additions & 3 deletions README.md
@@ -111,7 +111,7 @@ Formatted Output
### 5. Performance

* **High-Performance with Caching**
* The library is benchmarked to be very fast, capable of processing over 10,000 diverse names per second, and uses caching to significantly speed up the processing of repeated names.
* The library is benchmarked to be very fast, capable of processing over 3,000 diverse names per second, and uses caching to significantly speed up the processing of repeated names.
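
The effect of caching on repeated names can be illustrated with a stdlib sketch. The `parse_name` function below is a cheap stand-in for the real detector (it is not the sinonym pipeline), wrapped in `functools.lru_cache` to show why a batch with many repeated names processes far faster on the cached path:

```python
# Illustration of why caching helps repeated names: a stand-in
# parse function (NOT the real sinonym pipeline) wrapped in an
# LRU cache, timed on a batch with heavy repetition.
import time
from functools import lru_cache

def parse_name(name: str) -> str:
    time.sleep(0.001)  # stand-in for an expensive normalization step
    return " ".join(reversed(name.split()))

@lru_cache(maxsize=None)
def parse_name_cached(name: str) -> str:
    return parse_name(name)

names = ["Li Wei", "Zhang Ming", "Wang Fang"] * 100  # only 3 unique names

start = time.perf_counter()
uncached = [parse_name(n) for n in names]
t_uncached = time.perf_counter() - start

start = time.perf_counter()
cached = [parse_name_cached(n) for n in names]
t_cached = time.perf_counter() - start

assert uncached == cached  # caching must not change results
print(f"uncached: {t_uncached:.3f}s, cached: {t_cached:.3f}s")
```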

## How It Works

@@ -320,6 +320,36 @@ for result in results:
print(f"Processed: {result.result}")
```

### Persistent Multi-Process Processing

For high-throughput workloads, you can keep a persistent process pool alive and
reuse worker processes across multiple calls. This avoids repeated process
start-up overhead and works on Windows/macOS/Linux via `spawn`.

```python
from sinonym.detector import ChineseNameDetector

def main():
detector = ChineseNameDetector()
names_a = ["Li Wei", "Wang Weiming", "Zhang Ming"]
names_b = ["Xin Liu", "Yang Li", "Chen Huang"]

# Reuse workers across many calls
with detector.create_persistent_multiprocess_pool(max_workers=6, chunk_size=64) as pool:
results_a = pool.normalize_names(names_a)
results_b = pool.normalize_names(names_b)

# One-off convenience wrapper (creates and closes a temporary pool)
single_batch = detector.process_name_batch_multiprocess(names_a, max_workers=6, chunk_size=64)
return results_a, results_b, single_batch

if __name__ == "__main__":
main()
```

Use the `if __name__ == "__main__":` guard in scripts to ensure safe process
spawning on Windows and macOS.
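
The chunk splitting implied by `chunk_size` can be sketched as a pure helper. This is illustrative only, not part of the sinonym API: each worker receives a slice of at most `chunk_size` names, which bounds per-task pickling overhead.

```python
# Hypothetical sketch of how a batch might be split into chunks of
# `chunk_size` before being handed to worker processes; this helper
# is illustrative, not part of the sinonym API.
from collections.abc import Iterator

def chunked(names: list[str], chunk_size: int) -> Iterator[list[str]]:
    for start in range(0, len(names), chunk_size):
        yield names[start:start + chunk_size]

names = [f"name-{i}" for i in range(150)]
chunks = list(chunked(names, 64))
print([len(c) for c in chunks])  # → [64, 64, 22]
```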

### When to Use Batch Processing

* **Academic Papers**: Author lists typically follow consistent formatting
@@ -394,14 +424,14 @@ If you'd like to contribute to Sinonym, here’s how to set up your development
First, clone the repository:

```bash
git clone https://github.com/yourusername/sinonym.git
git clone https://github.com/allenai/sinonym.git
cd sinonym
```

Then, install the development dependencies:

```bash
uv sync --extra dev
uv sync --active --all-extras --dev
```

### Running Tests
@@ -422,6 +452,10 @@ uv run ruff check . --fix
uv run ruff format .
```

### Benchmarking & Profiling

See [scripts/README.md](scripts/README.md) for benchmark, profiling, and test status scripts.

## License

Sinonym is licensed under the Apache 2.0 License. See the `LICENSE` file for more details.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "sinonym"
version = "0.2.2"
version = "0.2.3"
description = "Chinese Name Detection and Normalization Module"
readme = "README.md"
requires-python = ">=3.10"
147 changes: 43 additions & 104 deletions scripts/README.md
@@ -1,126 +1,65 @@
# Sinonym Scripts Directory
# Sinonym Scripts

This directory contains utility scripts for data generation, model training, and testing of the Sinonym library.
Utility scripts for benchmarking, profiling, testing, and model training.

## Scripts Overview
## Active Scripts

### 1. `train_ml_classifier_for_chinese_vs_japanese.py` ✅ ACTIVE
**Purpose**: Train the machine learning classifier that distinguishes Chinese names from Japanese names when written in Chinese characters.
### `check_test_status.py`
Runs the full test suite and reports individual test case failures with detailed diagnostics. Performance tests run separately. Exits 0 when the failure count is at or below the expected baseline (`EXPECTED_FAILURES = 52`) and 1 otherwise, so improvements (fewer failures) pass while regressions fail.
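
The gate logic just described can be sketched in a few lines (a simplified sketch, not the actual script):

```python
# Simplified sketch of the expected-failures gate (not the actual
# check_test_status.py): improvements pass, regressions fail.
EXPECTED_FAILURES = 52

def gate_exit_code(observed_failures: int, baseline: int = EXPECTED_FAILURES) -> int:
    # 0 (pass) when at or below the baseline, 1 (fail) on regression.
    return 0 if observed_failures <= baseline else 1

print(gate_exit_code(52))  # → 0 (matches baseline)
print(gate_exit_code(40))  # → 0 (improvement)
print(gate_exit_code(60))  # → 1 (regression)
```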

**Status**: ✅ **Successfully implemented and integrated**

**What it does**:
- Downloads Chinese (1.2M) and Japanese (180K) name corpora from GitHub
- Filters names to keep only those written in Chinese/Japanese characters (kanji)
- Trains a scikit-learn Pipeline with:
- TF-IDF character n-gram features (1-3 grams, max 5000 features)
- 20 linguistic heuristic features (Japanese markers, character patterns, etc.)
- Logistic Regression classifier with balanced class weights
- Saves the trained model as `data/chinese_japanese_classifier.joblib`
- Achieves 99.5% accuracy on test data

**Dependencies**:
- scikit-learn, numpy, scipy, joblib
- `sinonym.ml_model_components.EnhancedHeuristicFlags` (custom feature extractor)

**Output**:
- `data/chinese_japanese_classifier.skops` - The trained model used in production
- `data/model_features.json` - Feature vocabulary metadata

**Usage**:
```bash
python scripts/train_ml_classifier_for_chinese_vs_japanese.py
uv run python scripts/check_test_status.py
```

---

### 2. `generate_chinese_name_corpus_data.py` ❌ ABANDONED
**Purpose**: Generate training data for an ML-based name parsing disambiguation model.

**Status**: ❌ **Historical - Abandoned effort**
### `benchmark_stable.py`
Median-based performance benchmark gate. Spawns isolated worker subprocesses (fresh process per run) with controlled `PYTHONHASHSEED` and thread environment variables. Reports mean/median/stddev/CV of throughput and supports a `--min-median-names-per-sec` gate that exits non-zero on failure.

**What it was supposed to do**:
- Download 200K Chinese names from the Chinese Names Corpus
- Romanize Chinese names to pinyin (without tones)
- Generate all possible surname/given name parse candidates
- Create ground truth labels based on Chinese name structure rules
- Extract features for each parse (log probabilities, ranks, ratios)
- Save training data for an ML model to choose the best parse

**Why it was abandoned**:
- The ML parsing model "didn't work well" (as noted in code comments)
- The rule-based parsing system in `sinonym.services.parsing` works sufficiently well
- The complexity of training data generation and feature engineering didn't justify the marginal improvements

**Output files (still present but unused)**:
- `data/ml_parsing_training_data.json` - 199K training examples with parse candidates
- `data/ml_parsing_metadata.json` - Statistics about the training data

---

### 3. `generate_acl_data.py` ❌ ABANDONED
**Purpose**: Process ACL 2025 conference authors to create additional training examples for the parsing model.
```bash
uv run python scripts/benchmark_stable.py --runs 5 --names 3000 --warmup 3000
uv run python scripts/benchmark_stable.py --runs 7 --min-median-names-per-sec 5000
```
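
The statistics the gate reports can be sketched with the stdlib `statistics` module (an illustrative sketch, not the actual benchmark harness):

```python
# Sketch of the median-based gate's statistics (illustrative, not
# the actual benchmark harness): mean/median/stdev/CV of per-run
# throughput, gated on the median.
import statistics

def summarize(throughputs: list[float]) -> dict[str, float]:
    mean = statistics.mean(throughputs)
    stdev = statistics.stdev(throughputs)
    return {
        "mean": mean,
        "median": statistics.median(throughputs),
        "stdev": stdev,
        "cv": stdev / mean,  # coefficient of variation
    }

def passes_gate(throughputs: list[float], min_median: float) -> bool:
    # Median is robust to a single slow outlier run.
    return statistics.median(throughputs) >= min_median

runs = [5100.0, 5300.0, 4900.0, 5250.0, 5050.0]  # names/sec per run
stats = summarize(runs)
print(stats["median"], passes_gate(runs, min_median=5000.0))  # → 5100.0 True
```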

**Status**: ❌ **Historical - Part of abandoned ML parsing effort**
### `profile_hotspots.py`
Hotspot time-share profiler. Warms caches on deterministic test names, runs one `cProfile` pass, then reports top functions and modules ranked by internal time (`tottime`) share. Use `--sinonym-only` to filter out third-party/stdlib noise.

**What it does**:
- Loads author names from `data/acl_2025_authors.txt`
- Uses the ChineseNameDetector to identify Chinese names
- Converts ACL format names (Given Surname) to training examples
- Generates parse candidates with features for ML training
```bash
uv run python scripts/profile_hotspots.py --names 3000 --warmup 3000 --sinonym-only
```
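
Ranking functions by internal-time share works as in this minimal `cProfile`/`pstats` sketch; the workload here is a toy stand-in for the detector:

```python
# Minimal sketch of ranking functions by internal time (tottime)
# with cProfile/pstats, as profile_hotspots.py does at larger scale.
# busy() is a toy stand-in workload, not sinonym code.
import cProfile
import io
import pstats

def busy(n: int) -> int:
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
for _ in range(50):
    busy(10_000)
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.sort_stats("tottime").print_stats(5)  # top 5 by internal time
text = out.getvalue()
print("busy" in text)  # → True
```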

**Why it exists**:
- Attempted to augment the ML parsing training data with real academic names
- ACL authors represent a different distribution (romanized, Western ordering)
- Was meant to improve the never-implemented parsing model
### `profile_run.py`
Quick single-process profiling script. Generates deterministic test names, warms caches, takes 5 pure timing measurements (no profiling overhead) for accurate throughput stats, then runs one `cProfile` pass for a top-25 function breakdown. Good for a fast sanity check during development.

**Output**:
- `data/acl_training_examples.json` - Training examples from ACL authors
- Would have updated `ml_parsing_train_split.json` (file doesn't exist)
```bash
uv run python scripts/profile_run.py
```
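
The warm-up-then-measure pattern the script uses can be sketched as follows; the workload is a stand-in, not the real detector:

```python
# Sketch of the measure-without-profiler pattern used by
# profile_run.py: warm up first, then take several pure timing
# passes and report throughput. process() is a stand-in workload.
import statistics
import time

def process(names: list[str]) -> list[str]:
    return [n.title() for n in names]  # stand-in for normalization

names = [f"li wei {i}" for i in range(10_000)]
process(names)  # warm-up pass before measuring

rates = []
for _ in range(5):  # five pure timing measurements, no profiler attached
    start = time.perf_counter()
    process(names)
    elapsed = time.perf_counter() - start
    rates.append(len(names) / elapsed)

print(f"median: {statistics.median(rates):,.0f} names/sec")
```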

---
### `profile_threaded.py`
Multi-threaded performance and thread-safety validation. Tests `normalize_name` throughput across 1/2/4/8 threads using a shared `ChineseNameDetector` instance, verifies that multi-threaded results are identical to single-threaded results, and reports speedup and CV per thread count.

## Summary
```bash
uv run python scripts/profile_threaded.py
```
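
The parity check at the heart of this script can be sketched with the stdlib thread pool; `normalize` below is a deterministic stand-in for the detector:

```python
# Sketch of the parity check profile_threaded.py performs: run the
# same workload single-threaded and across a thread pool, then
# verify the outputs are identical. normalize() is a stand-in.
from concurrent.futures import ThreadPoolExecutor

def normalize(name: str) -> str:
    return " ".join(part.capitalize() for part in name.split())

names = [f"li wei{i}" for i in range(1000)]

single = [normalize(n) for n in names]
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(normalize, names))  # preserves input order

assert threaded == single  # thread-safety parity check
print("parity OK:", len(threaded), "names")
```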

### Active Scripts
- **`train_ml_classifier_for_chinese_vs_japanese.py`** - The only actively used script that trains the Chinese vs Japanese classifier
### `profile_multiprocess.py`
Persistent multi-process throughput and parity check. Compares single-process throughput to a spawn-based persistent process pool, verifies that outputs are identical for a deterministic workload, and reports median speedup.

### Historical/Abandoned Scripts
- **`generate_chinese_name_corpus_data.py`** - Abandoned ML parsing model data generation
- **`generate_acl_data.py`** - Abandoned ACL author data processing for ML parsing
```bash
uv run python scripts/profile_multiprocess.py --names 12000 --warmup 3000 --runs 3 --workers 6 --chunk-size 64
```

## Data Flow
### `train_ml_classifier_for_chinese_vs_japanese.py`
Trains the Chinese-vs-Japanese name classifier used in production. Downloads Chinese (~1.2M) and Japanese (~180K) name corpora, trains a scikit-learn pipeline (TF-IDF character n-grams + 20 linguistic heuristic features + logistic regression), and saves the model to `data/chinese_japanese_classifier.skops`.

```bash
uv run python scripts/train_ml_classifier_for_chinese_vs_japanese.py
```
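
The pipeline's shape can be sketched with scikit-learn alone. This skeleton omits the 20 custom heuristic features and uses a placeholder name list instead of the real corpora, so the fitted model is only a structural demo, not the production classifier:

```python
# Skeleton of the classifier pipeline described above (character
# n-gram TF-IDF + logistic regression). The custom heuristic
# features and the real corpora are omitted; the tiny name lists
# are placeholders, so this is a shape demo only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char", ngram_range=(1, 3), max_features=5000)),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

X = ["王伟", "李娜", "张敏", "田中太郎", "佐藤花子", "鈴木一郎"]
y = ["chinese", "chinese", "chinese", "japanese", "japanese", "japanese"]
pipeline.fit(X, y)
print(pipeline.predict(["山田次郎"]))
```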
Chinese/Japanese Corpora (GitHub)
train_ml_classifier_for_chinese_vs_japanese.py
chinese_japanese_classifier.joblib ← [ACTIVELY USED BY LIBRARY]
+
model_features.json


Chinese Names Corpus (GitHub)
generate_chinese_name_corpus_data.py
ml_parsing_training_data.json ← [ABANDONED, NOT USED]
+
ml_parsing_metadata.json


ACL 2025 Authors
generate_acl_data.py
acl_training_examples.json ← [ABANDONED, NOT USED]
```

## Notes
## Abandoned Scripts

These remain for historical reference but are not used by the library. The rule-based parser in `sinonym.services.parsing` replaced the ML approach.

The scripts demonstrate two different ML efforts:
1. **Successful**: Chinese vs Japanese classification for names written in Chinese characters
2. **Abandoned**: ML-based parsing disambiguation to choose between multiple valid name parses
### `generate_chinese_name_corpus_data.py`
Intended to generate training data for an ML-based name parsing disambiguation model: it downloads 200K Chinese names, romanizes them, generates all possible surname/given-name parses, and creates labeled training examples. The resulting ML parsing model did not outperform the rule-based system.

The abandoned parsing model efforts remain in the codebase for historical reference but are not integrated into the library. The rule-based parsing in `sinonym.services.parsing.NameParsingService` handles name parsing instead.
### `generate_acl_data.py`
Supplementary data generator for the abandoned ML parsing effort. Processes ACL 2025 conference author names to create additional training examples in a different distribution (romanized, Western ordering).