By Ali Hakim Taşkıran and Ege Aybars Bozkurt
This repository contains a compact, interpretable language model based on a deep Residual GRU architecture, trained on the Wikitext dataset. It includes a real-time interactive GUI (app/) for text generation, designed for simplicity, transparency, and local execution—no internet or cloud required after setup.
- Architecture: 12-layer Residual GRU with embedding and hidden dimensions of 512 and 1024, respectively.
- Parameters: ~120M (lightweight compared to transformer-based LLMs).
- Training Data: Wikitext-103 (Wikipedia-derived text).
- Features:
- Residual connections between GRU layers for improved gradient flow.
- Dropout and layer-wise regularization.
- Training with both log loss and a log(log()) loss to reduce unigram over-confidence and amplify gradients associated with uncommon tokens.
- Output: Autoregressive token-by-token generation with streaming inference.
- File: `model.pt` contains the final trained weights (PyTorch state dict).
- Tokenizer: Byte-level BPE via Hugging Face `tokenizers`, serialized as `wikitext_tokenizer.pkl`.
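The stacked Residual GRU summarized above can be sketched as follows. This is a hedged illustration, not the repo's actual module: the class and attribute names (`ResidualGRULM`, `in_proj`, etc.) are assumptions, and the repo's layer layout may differ.

```python
# Minimal sketch of a 12-layer residual GRU language model
# (hypothetical names; the repo's real implementation may differ).
import torch
import torch.nn as nn

class ResidualGRULM(nn.Module):
    def __init__(self, vocab_size, emb_dim=512, hidden_dim=1024,
                 num_layers=12, dropout=0.1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.in_proj = nn.Linear(emb_dim, hidden_dim)
        # One single-layer GRU per block so a residual skip can wrap each one.
        self.layers = nn.ModuleList(
            nn.GRU(hidden_dim, hidden_dim, batch_first=True)
            for _ in range(num_layers)
        )
        self.drop = nn.Dropout(dropout)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, ids):
        x = self.in_proj(self.embed(ids))
        for gru in self.layers:
            out, _ = gru(x)
            x = x + self.drop(out)  # residual connection between GRU layers
        return self.head(x)  # next-token logits, shape (batch, seq, vocab)

# Tiny smoke test with a reduced config.
logits = ResidualGRULM(vocab_size=1000, num_layers=2)(
    torch.zeros(1, 8, dtype=torch.long))
```

The residual skip around each GRU block is what keeps gradients flowing through all 12 layers.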
🔍 Why GRU?
This model prioritizes architectural simplicity, debuggability, and low-resource deployment—ideal for research, education, or edge devices where transformers are overkill.
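The log(log()) objective mentioned in the features list is underspecified here; one plausible reading is applying a second logarithm to the per-token cross-entropy (which is itself a negative log-probability). The sketch below is that reading only, with an assumed epsilon for numerical safety; the repo's exact formulation may differ.

```python
# Hedged sketch of a "log of log loss" objective: a second log applied
# to the per-token cross-entropy. One possible interpretation only.
import torch
import torch.nn.functional as F

def double_log_loss(logits, targets, eps=1e-6):
    # Per-token cross-entropy: ce_i = -log p(target_i)
    ce = F.cross_entropy(logits, targets, reduction="none")
    # Second log rescales per-token losses before averaging, changing
    # how much easy (low-ce) vs. hard (high-ce) tokens weigh in.
    return torch.log(ce + eps).mean()

logits = torch.randn(4, 10)
targets = torch.randint(0, 10, (4,))
loss = double_log_loss(logits, targets)
```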
- Python ≥ 3.8
- Packages: `torch`, `tokenizers`, `tkinter`
Install dependencies:
pip install torch tokenizers
✅ On Ubuntu/Debian, ensure Tkinter is available:
sudo apt install python3-tk
The model checkpoint is not included in the repo to avoid duplication, but you can download it from:
Save it directly into the app/ folder.
You can also download it via command line (requires gdown):
cd app
pip install gdown
gdown "1sZwhT6fII8g_-DMkG7AKkrPWlVQQ3mqX" -O model.pt
From the app/ directory:
cd app
python infer-GUI.py
The interface will launch with:
- Device selector (CPU / CUDA)
- Prompt input box
- Controls for temperature, top-k, max tokens, and seed
- Real-time streaming output
Type a prompt like "Quantum computing enables..." and click ▶ Generate.
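The temperature, top-k, and seed controls above map onto a standard sampling loop. The sketch below shows one common way such a sampler works; the function name `sample_next` and its defaults are illustrative assumptions, not the GUI's actual API.

```python
# Hedged sketch of temperature / top-k sampling as exposed by the GUI
# controls (names and defaults are assumptions, not the repo's API).
import torch

def sample_next(logits, temperature=0.8, top_k=40):
    # Temperature scales the logits: <1 sharpens, >1 flattens.
    logits = logits / max(temperature, 1e-6)
    if top_k > 0:
        # Restrict sampling to the k most likely tokens.
        k = min(top_k, logits.size(-1))
        vals, idx = torch.topk(logits, k)
        probs = torch.softmax(vals, dim=-1)
        return idx[torch.multinomial(probs, 1)].item()
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, 1).item()

torch.manual_seed(0)  # the GUI's seed control pins this RNG state
tok = sample_next(torch.randn(100))  # token id from a 100-word vocab
```

Streaming output falls out naturally: each sampled token id is decoded and appended to the display before the next forward pass.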
- Code (`infer-GUI.py`): MIT License (feel free to use/modify).
- Model weights (`model.pt`): for research and personal use only.
- Tokenizer: derived from Wikitext; redistribution follows the original dataset terms.
⚠️ Do not redistribute `model.pt` without explicit permission.
- Inspired by classical RNN language modeling (Mikolov et al.)
- Tokenizer built with Hugging Face `tokenizers`
- GUI powered by Python's built-in `tkinter` for zero-dependency deployment