RwandaNameGenderModel is a machine learning model that predicts gender from Rwandan names: a first name, a surname, or both in either order. It combines character-level n-gram features with a logistic regression classifier, giving fast, interpretable predictions with about 96.7% accuracy on the validation set and 96.6% on the test set.
- Type: Classic ML (Logistic Regression)
- Input: Rwandan name (flexible: single or full name)
- Vectorization: Character-level n-grams (2–3 chars)
- Framework: scikit-learn
- Training Set: 66,735 names (80% of the 83,419 names in the dataset)
- Validation/Test Accuracy: 96.72% / 96.64%
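To make the vectorization step concrete, here is a minimal sketch of how 2–3 character n-grams can be pulled out of a name with scikit-learn. The use of `CountVectorizer` and lowercasing are assumptions for illustration; the project's `train.py` may use a different vectorizer (for example TF-IDF) or different settings.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Assumption: plain counts over lowercased character 2- and 3-grams.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(2, 3), lowercase=True)

# build_analyzer() returns the function that splits a string into n-grams;
# it can be called without fitting the vectorizer first.
analyze = vectorizer.build_analyzer()
ngrams = analyze("Gabriel")

# Bigrams come first ("ga", "ab", ...), followed by trigrams ("gab", "abr", ...).
print(ngrams)
```

Each of these substrings becomes one feature column, which is what lets the classifier pick up on name prefixes and endings regardless of word order.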
RwandaNameGenderModel/
├── dataset/
│ └── rwandan_names.csv
├── model/
│ ├── logistic_model.joblib
│ └── vectorizer.joblib
├── logs/
│ └── metrics_log.txt
├── train.py
├── inference.py
├── README.md
└── requirements.txt
pip install -r requirements.txt
python train.py
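For orientation, a training script along these lines could look roughly like the sketch below. It is not the contents of `train.py`: the toy names are synthetic placeholders rather than rows from `dataset/rwandan_names.csv`, the `CountVectorizer` and `max_iter` settings are assumptions, and the artifacts are written to a temporary directory instead of `model/`.

```python
import os
import tempfile

from joblib import dump, load
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic placeholder data, NOT taken from the project's dataset.
names = ["lina", "mira", "sara", "nora", "tina", "dana",
         "marko", "pedro", "hugo", "leo", "bruno", "otto"]
labels = ["female"] * 6 + ["male"] * 6

# Hold out 20% of the names, keeping the class balance in both parts.
train_names, held_out_names, train_labels, held_out_labels = train_test_split(
    names, labels, test_size=0.2, stratify=labels, random_state=42
)

# Fit the n-gram vocabulary on the training names only.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(2, 3))
X_train = vectorizer.fit_transform(train_names)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, train_labels)

# Persist both artifacts; inference needs the fitted vectorizer as well.
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "logistic_model.joblib")
vectorizer_path = os.path.join(out_dir, "vectorizer.joblib")
dump(model, model_path)
dump(vectorizer, vectorizer_path)

# Reload and predict on the held-out names to confirm the round trip works.
reloaded_model = load(model_path)
reloaded_vectorizer = load(vectorizer_path)
predictions = reloaded_model.predict(reloaded_vectorizer.transform(held_out_names))
print(list(zip(held_out_names, predictions)))
```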
Run interactive inference with:
python inference.py
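from joblib import load

# Load the trained classifier and its fitted vectorizer
model = load("model/logistic_model.joblib")
vectorizer = load("model/vectorizer.joblib")

def predict_gender(name):
    # Vectorize the name and return the predicted label
    X = vectorizer.transform([name])
    return model.predict(X)[0]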
# Flexible input: first name, surname, or both (any order)
predict_gender("Gabriel") # Output: "male"
predict_gender("Baziramwabo") # Output: "male"
predict_gender("Baziramwabo Gabriel") # Output: "male"
predict_gender("Gabriel Baziramwabo") # Output: "male"
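Because logistic regression also exposes class probabilities, a prediction can be returned together with a confidence score. The helper below is a hypothetical addition, not part of `inference.py`, and it uses a tiny stand-in model trained on synthetic placeholder names so that the snippet runs on its own; in practice the loaded `model` and `vectorizer` would be passed in instead.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def predict_gender_with_confidence(name, model, vectorizer):
    """Return (label, probability) for the most likely class."""
    X = vectorizer.transform([name])
    probs = model.predict_proba(X)[0]
    best = probs.argmax()
    return str(model.classes_[best]), float(probs[best])

# Stand-in model fitted on synthetic placeholder names (not project data).
toy_names = ["lina", "mira", "sara", "nora", "marko", "pedro", "hugo", "bruno"]
toy_labels = ["female"] * 4 + ["male"] * 4
toy_vectorizer = CountVectorizer(analyzer="char", ngram_range=(2, 3))
toy_model = LogisticRegression(max_iter=1000)
toy_model.fit(toy_vectorizer.fit_transform(toy_names), toy_labels)

label, confidence = predict_gender_with_confidence("mina", toy_model, toy_vectorizer)
print(label, round(confidence, 3))
```

A caller could use the probability to skip or flag low-confidence predictions rather than always accepting the top label.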
Dataset | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
Validation | 96.72% | 96.90% | 96.53% | 96.72% |
Test | 96.64% | 96.94% | 96.34% | 96.64% |
Metrics are logged to logs/metrics_log.txt and in TensorBoard format.
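As a reference for how the four columns relate, the sketch below computes them with scikit-learn on a handful of made-up labels. Which class is treated as positive and how the project averages precision and recall are not stated, so `pos_label="female"` and `average="binary"` are assumptions.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Made-up labels for illustration only; not results from this project.
y_true = ["male", "female", "male", "female"]
y_pred = ["male", "female", "female", "female"]

accuracy = accuracy_score(y_true, y_pred)

# Assumption: "female" is the positive class and scores are not averaged
# across classes.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label="female"
)

print(f"accuracy={accuracy:.2%} precision={precision:.2%} "
      f"recall={recall:.2%} f1={f1:.2%}")
```

F1 is the harmonic mean of precision and recall, which is why it sits between the two values in each row of the table.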
- Demographic analysis
- Smart form processing
- Voice assistant personalization
- NLP preprocessing for Rwandan corpora
This model predicts binary gender based on patterns in names and may not reflect self-identified gender. It should not be used in sensitive contexts without consent.
This project is maintained by Gabriel Baziramwabo and is open for research and educational use. For commercial use, please contact the author.
We welcome improvements and multilingual extensions. Fork this repo, improve, and submit a PR!