hiwakurdy/CNN_PROJ_Ku_Ar_v2

# CNN_PROJ_KU_AR

## Printed Kurdish/Arabic Script Classification

This workspace refactors the original repository into a reproducible experiment layout for printed Kurdish vs. Arabic script classification.

### Environment

```powershell
conda activate spade
```

### Default synthetic dataset roots

- Kurdish: `E:\DSs\MLP\code\TRDG\data\text\350k\imgs\ku_350`
- Arabic: `E:\DSs\MLP\code\TRDG\data\text\350k\imgs\ar_350`

These are already wired into `configs/publication/data.yaml` and `experiments/configs/baseline.yaml`.

### Core commands

Train the baseline and export figures/tables:

```powershell
python scripts\train_baseline.py --config experiments\configs\baseline.yaml
```

Run the full nested CV protocol:

```powershell
python scripts\run_full_cv.py --config experiments\configs\baseline.yaml
```

Run every ablation suite:

```powershell
python scripts\run_all_ablations.py
```

Train and evaluate all model families, plus the baseline's combined CNN+KNN/SVM/RF summaries:

```powershell
python scripts\run_all_models.py --config experiments\configs\all_models.yaml
```

Run the full all-model suite and then every ablation suite:

```powershell
python scripts\run_everything.py
```

To have the baseline command run CV first and then use the recommended hyperparameters for the final held-out test:

```powershell
python scripts\train_baseline.py --config experiments\configs\baseline.yaml --run-cv-first
```

### What is implemented

- Fixed held-out test split with no train/validation/test leakage.
- Stratified 5-fold CV on the training portion only.
- Synthetic TRDG loader plus KHATT and IFN/ENIT adapter modules.
- OCR augmentations: resize/normalize, blur, rotation, perspective, affine, and optional Mixup.
- CNN baseline with `32 -> 64 -> 128` conv blocks and configurable `GroupNorm`, `BatchNorm`, or `LayerNorm`-style normalization.
- Embedding baselines on the 128-d penultimate representation: CNN head, KNN, SVM, and RandomForest.
- Automatic metrics, summaries, plots, tables, checkpoints, and paper-ready placeholders.

### Main layout

- `src/data/datasets.py`
- `src/data/splits.py`
- `src/data/samplers.py`
- `src/data/augmentations.py`
- `src/data/loaders/`
- `src/models/`
- `src/training/`
- `scripts/run_all_models.py`
- `scripts/run_everything.py`
- `src/postprocess/`
- `src/utils/`
- `experiments/configs/`
- `outputs/`
- `best/report/publication/`

### Notes

- The `synthetic_real_mix` ablation is wired for KHATT and IFN/ENIT, but you still need to replace the placeholder real dataset roots in `experiments/configs/synthetic_real_mix.yaml` if those datasets live somewhere else on your machine.
- Legacy entrypoints from the earlier refactor remain in place where practical, but the new commands above are the intended workflow.
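### Sketch: indexing the synthetic dataset roots

The two TRDG roots listed under "Default synthetic dataset roots" hold one script each, so the class label can be derived from which root an image came from. The snippet below is a minimal, hypothetical sketch of that idea; it is not the repository's `src/data/datasets.py`. It assumes images are stored as `.png`/`.jpg`/`.jpeg` files somewhere under each root, and the helper name `index_script_dataset` and the `kurdish -> 0`, `arabic -> 1` label encoding are illustrative assumptions.

```python
from pathlib import Path

# Hypothetical label encoding; the repository's own mapping may differ.
LABELS = {"kurdish": 0, "arabic": 1}
# Assumed image extensions for the TRDG output folders.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def index_script_dataset(roots: dict[str, Path]) -> list[tuple[Path, int]]:
    """Return (image_path, label) pairs for every image found under each root.

    Paths are sorted within each root so the index is deterministic, which
    keeps any downstream seeded split reproducible.
    """
    samples: list[tuple[Path, int]] = []
    for script, root in roots.items():
        label = LABELS[script]
        for path in sorted(Path(root).rglob("*")):
            # Skip directories and non-image files (e.g. label text files).
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label))
    return samples
```

On the machine described above, this would be called with `{"kurdish": Path(r"E:\DSs\MLP\code\TRDG\data\text\350k\imgs\ku_350"), "arabic": Path(r"E:\DSs\MLP\code\TRDG\data\text\350k\imgs\ar_350")}`.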
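### Sketch: held-out split plus stratified 5-fold CV

"What is implemented" describes a fixed held-out test split followed by stratified 5-fold CV on the training portion only. The stdlib-only sketch below illustrates that protocol; it is hypothetical and not the code in `src/data/splits.py`. The function names, the 20% test fraction, and the round-robin fold assignment are all assumptions made for illustration. The key property is that `stratified_kfold` only ever sees training indices, so the test set cannot leak into CV.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Split sample indices into (train, test), preserving each class's proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, test = [], []
    for y in sorted(by_class):
        idxs = by_class[y][:]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


def stratified_kfold(train_idx, labels, k=5, seed=0):
    """Yield (fit, val) index lists drawn from train_idx only.

    Each class is shuffled and dealt round-robin into k folds, so every
    validation fold has (almost) the same class balance.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in train_idx:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(k)]
    for y in sorted(by_class):
        idxs = by_class[y][:]
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    for i in range(k):
        val = sorted(folds[i])
        fit = sorted(idx for j, f in enumerate(folds) if j != i for idx in f)
        yield fit, val
```

For a nested protocol, an inner loop could call `stratified_kfold(fit, labels)` inside each outer fold to tune hyperparameters before scoring on `val`; the `test` indices are touched only once, for the final evaluation.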
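### Sketch: optional Mixup

Mixup is listed as an optional OCR augmentation. In its standard formulation, a mixing weight λ is drawn from Beta(α, α), and both the inputs and the one-hot labels are blended as λ·x_i + (1 − λ)·x_j. The sketch below shows that formulation on flattened pixel lists. The function name `mixup_pair` and the default `alpha=0.2` are hypothetical; they are not read from the repository's `src/data/augmentations.py` or its configs.

```python
import random


def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two flattened images and their one-hot labels with lam ~ Beta(alpha, alpha).

    alpha=0.2 is an illustrative default, not a value taken from this project.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

Because the labels are mixed too, the loss for a mixed sample has to accept soft targets (for example, cross-entropy against the blended one-hot vector).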
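### Sketch: KNN on the 128-d penultimate embeddings

The embedding baselines fit classical classifiers (KNN, SVM, RandomForest) on the CNN's 128-d penultimate vectors. The stdlib sketch below shows only the KNN case. It assumes the embeddings have already been extracted from the trained CNN as plain lists of floats, and the helper `knn_predict` is hypothetical. A real pipeline would more likely use a library classifier such as scikit-learn's `KNeighborsClassifier`, but this README does not say which implementation the project uses.

```python
import math
from collections import Counter


def knn_predict(train_emb, train_labels, query, k=5):
    """Majority vote among the k training embeddings closest to `query` (Euclidean)."""
    dists = sorted(
        (math.dist(e, query), y) for e, y in zip(train_emb, train_labels)
    )
    votes = Counter(y for _, y in dists[:k])
    # With binary labels, an odd k avoids tied votes.
    return votes.most_common(1)[0][0]
```

Swapping in SVM or RandomForest changes only the classifier fitted on the same 128-d vectors. That is what makes these baselines a comparison of heads on a shared representation.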
