A lightweight machine learning model for gender prediction based on Rwandan names using character-level n-gram features and logistic regression.

benax-rw/RwandaNameGenderModel

RwandaNameGenderModel

RwandaNameGenderModel is a machine learning model that predicts gender from Rwandan names: a first name, a surname, or both in any order. It combines character-level n-gram features with a logistic regression classifier to give fast, interpretable predictions, reaching about 96.7% accuracy on the validation set and 96.6% on the test set.


🧠 Model Overview

  • Type: Classic ML (Logistic Regression)
  • Input: Rwandan name (flexible: single or full name)
  • Vectorization: Character-level n-grams (2–3 chars)
  • Framework: scikit-learn
  • Training Set: 66,735 names (about 80% of the 83,419-name dataset)
  • Validation/Test Accuracy: 96.72% / 96.64%
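To make the overview concrete, here is a minimal sketch of character-level n-gram features feeding a logistic regression classifier in scikit-learn. The choice of `CountVectorizer` (rather than, say, `TfidfVectorizer`), the `analyzer="char"` setting, `max_iter=1000`, and the toy names and labels are all assumptions for illustration; the repository's `train.py` may be configured differently.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy, made-up training data for illustration only (not the project's dataset).
names = ["Gabriel", "Jean", "Eric", "Patrick", "Alice", "Marie", "Claudine", "Diane"]
labels = ["male", "male", "male", "male", "female", "female", "female", "female"]

# Character-level n-grams of length 2 to 3, as listed in the overview.
# analyzer="char" is an assumption; names are lowercased by default.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(2, 3))
X = vectorizer.fit_transform(names)

# Logistic regression on the sparse n-gram counts.
model = LogisticRegression(max_iter=1000)
model.fit(X, labels)

# Inspect the n-grams extracted from a single name.
analyze = vectorizer.build_analyzer()
ngrams = analyze("Gabriel")
print(ngrams)
```

Because every feature is a short character sequence, the learned coefficients can be read directly as "this n-gram pushes toward this label", which is one reason this kind of model is considered interpretable.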

📁 Project Structure

RwandaNameGenderModel/
├── dataset/
│   └── rwandan_names.csv
├── model/
│   ├── logistic_model.joblib
│   └── vectorizer.joblib
├── logs/
│   └── metrics_log.txt
├── train.py
├── inference.py
├── README.md
└── requirements.txt

🚀 Quickstart

1. Install requirements

pip install -r requirements.txt

2. Train the model

python train.py
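For readers who want a sense of what a training run along these lines involves, the following is a rough sketch, not the contents of `train.py`. The helper name `split_and_train`, the 80/10/10 split, the stratification, the random seed, and the toy data are all assumptions; only the 80% training share (66,735 of 83,419 names) and the two artifact file names come from this README.

```python
import os
import tempfile

from joblib import dump
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def split_and_train(names, labels, seed=42):
    """Hypothetical helper: 80/10/10 split, then fit vectorizer and classifier."""
    # Hold out 20% of the data, then halve it into validation and test sets.
    x_train, x_hold, y_train, y_hold = train_test_split(
        names, labels, test_size=0.2, random_state=seed, stratify=labels
    )
    x_val, x_test, y_val, y_test = train_test_split(
        x_hold, y_hold, test_size=0.5, random_state=seed, stratify=y_hold
    )

    # Fit the vectorizer on training names only so held-out data does not leak in.
    vectorizer = CountVectorizer(analyzer="char", ngram_range=(2, 3))
    model = LogisticRegression(max_iter=1000)
    model.fit(vectorizer.fit_transform(x_train), y_train)

    val_acc = model.score(vectorizer.transform(x_val), y_val)
    test_acc = model.score(vectorizer.transform(x_test), y_test)
    return model, vectorizer, val_acc, test_acc


# Toy, made-up data standing in for dataset/rwandan_names.csv.
toy_names = ["Gabriel", "Jean", "Eric", "Patrick",
             "Alice", "Marie", "Claudine", "Diane"] * 5
toy_labels = (["male"] * 4 + ["female"] * 4) * 5

model, vectorizer, val_acc, test_acc = split_and_train(toy_names, toy_labels)

# Persist both artifacts, mirroring the file names in the project structure.
out_dir = tempfile.mkdtemp()
dump(model, os.path.join(out_dir, "logistic_model.joblib"))
dump(vectorizer, os.path.join(out_dir, "vectorizer.joblib"))
```

Saving the vectorizer alongside the classifier matters: inference must map names to exactly the same n-gram columns the model was trained on.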

3. Predict gender from a name using the script

Run interactive inference with:

python inference.py

4. Predict gender from a name using Python code

from joblib import load

# Load the trained classifier and the fitted n-gram vectorizer.
model = load("model/logistic_model.joblib")
vectorizer = load("model/vectorizer.joblib")

def predict_gender(name):
    """Return the predicted gender label for a single name string."""
    X = vectorizer.transform([name])  # transform expects an iterable of strings
    return model.predict(X)[0]

# Flexible input: first name, surname, or both (any order)
predict_gender("Gabriel")                 # Output: "male"
predict_gender("Baziramwabo")             # Output: "male"
predict_gender("Baziramwabo Gabriel")     # Output: "male"
predict_gender("Gabriel Baziramwabo")     # Output: "male"
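A logistic regression classifier can also report how confident it is. The sketch below shows one possible way to return a probability alongside the label using scikit-learn's `predict_proba`. The helper `predict_with_confidence` is hypothetical, and a tiny toy model stands in for the saved artifacts so the snippet runs on its own; with the real project you would pass in the objects loaded from `model/`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression


def predict_with_confidence(name, model, vectorizer):
    """Hypothetical helper: return (label, probability of that label)."""
    X = vectorizer.transform([name.strip()])
    probs = model.predict_proba(X)[0]   # one probability per class
    best = probs.argmax()               # index of the most likely class
    return model.classes_[best], float(probs[best])


# Toy stand-in for the trained artifacts (made-up data, illustration only).
toy_names = ["Gabriel", "Jean", "Eric", "Patrick", "Alice", "Marie", "Claudine", "Diane"]
toy_labels = ["male", "male", "male", "male", "female", "female", "female", "female"]
vectorizer = CountVectorizer(analyzer="char", ngram_range=(2, 3))
model = LogisticRegression(max_iter=1000)
model.fit(vectorizer.fit_transform(toy_names), toy_labels)

label, confidence = predict_with_confidence("Gabriel Baziramwabo", model, vectorizer)
print(label, round(confidence, 3))
```

A caller could choose to treat predictions below some probability threshold as "unknown" instead of forcing a label; the right threshold, if any, would depend on the application.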

📈 Performance

Dataset      Accuracy   Precision   Recall   F1-Score
Validation   96.72%     96.90%      96.53%   96.72%
Test         96.64%     96.94%      96.34%   96.64%

Metrics are logged in both logs/metrics_log.txt and TensorBoard format.
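The four metrics in the table can be computed with scikit-learn as sketched below. The labels here are made up, and `average="macro"` is an assumption: the README does not say which averaging (or which positive class) the reported precision, recall, and F1 use.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Made-up true and predicted labels, for illustration only.
y_true = ["male", "female", "male", "female", "male", "female"]
y_pred = ["male", "female", "female", "female", "male", "female"]

accuracy = accuracy_score(y_true, y_pred)

# Macro averaging (an assumption) weights both classes equally.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```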


🌍 Use Cases

  • Demographic analysis
  • Smart form processing
  • Voice assistant personalization
  • NLP preprocessing for Rwandan corpora

🛡️ Ethical Note

This model predicts binary gender based on patterns in names and may not reflect self-identified gender. It should not be used in sensitive contexts without consent.


📄 License

This project is maintained by Gabriel Baziramwabo and is open for research and educational use. For commercial use, please contact the author.


🤝 Contributing

We welcome improvements and multilingual extensions. Fork this repo, improve, and submit a PR!


🔗 Links
