GitHub - hiwakurdy/CNN_PROJ_Ku_Ar_v2

# CNN_PROJ_KU_AR

## Printed Kurdish/Arabic Script Classification

This workspace refactors the original repository into a reproducible experiment layout for printed Kurdish vs Arabic script classification.

### Environment

```powershell
conda activate spade
```

### Default synthetic dataset roots

- Kurdish: `E:\DSs\MLP\code\TRDG\data\text\350k\imgs\ku_350`
- Arabic: `E:\DSs\MLP\code\TRDG\data\text\350k\imgs\ar_350`

These are already wired into `configs/publication/data.yaml` and `experiments/configs/baseline.yaml`.

### Core commands

Train the baseline and export figures/tables:

```powershell
python scripts\train_baseline.py --config experiments\configs\baseline.yaml
```

Run the full nested CV protocol:

```powershell
python scripts\run_full_cv.py --config experiments\configs\baseline.yaml
```

Run every ablation suite:

```powershell
python scripts\run_all_ablations.py
```

Train and evaluate all model families plus the baseline's combined CNN+KNN/SVM/RF summaries:

```powershell
python scripts\run_all_models.py --config experiments\configs\all_models.yaml
```

Run the full all-model suite and then every ablation suite:

```powershell
python scripts\run_everything.py
```

If you want the baseline command to run CV first and then use the recommended hyperparameters for the final held-out test:

```powershell
python scripts\train_baseline.py --config experiments\configs\baseline.yaml --run-cv-first
```

### What is implemented

- Fixed held-out test split with no train/validation/test leakage.
- Stratified 5-fold CV on the training portion only.
- Synthetic TRDG loader plus KHATT and IFN/ENIT adapter modules.
- OCR augmentations: resize/normalize, blur, rotation, perspective, affine, optional Mixup.
- CNN baseline with `32 -> 64 -> 128` conv blocks and configurable `GroupNorm`, `BatchNorm`, or `LayerNorm`-style normalization.
- Embedding baselines on the 128-d penultimate representation: CNN head, KNN, SVM, and RandomForest.
- Automatic metrics, summaries, plots, tables, checkpoints, and paper-ready placeholders.

### Main layout

- `src/data/datasets.py`
- `src/data/splits.py`
- `src/data/samplers.py`
- `src/data/augmentations.py`
- `src/data/loaders/`
- `src/models/`
- `src/training/`
- `scripts/run_all_models.py`
- `scripts/run_everything.py`
- `src/postprocess/`
- `src/utils/`
- `experiments/configs/`
- `outputs/`
- `best/report/publication/`

### Notes

- The `synthetic_real_mix` ablation is wired for KHATT and IFN/ENIT, but you still need to replace the placeholder real dataset roots in `experiments/configs/synthetic_real_mix.yaml` if those datasets live somewhere else on your machine.
- Legacy entrypoints from the earlier refactor remain in place where practical, but the new commands above are the intended workflow.
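
The split protocol listed under "What is implemented" (a fixed held-out test split, then stratified 5-fold CV on the training portion only) can be illustrated with a minimal standard-library sketch. This is not the code in `src/data/splits.py`. The function names `stratified_holdout` and `stratified_kfold`, the 20% test fraction, the seed, and the toy labels are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Split sample indices into (train, test), keeping class proportions.

    Hypothetical helper; test_fraction=0.2 is an assumed value.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


def stratified_kfold(indices, labels, n_splits=5, seed=0):
    """Yield (fold_train, fold_val) index lists drawn only from `indices`.

    Each class is shuffled and dealt round-robin into the folds, so every
    fold keeps roughly the same class balance.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)

    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        for position, idx in enumerate(members):
            folds[position % n_splits].append(idx)

    for k in range(n_splits):
        val = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, val


# Toy labels standing in for the Kurdish ("ku") and Arabic ("ar") images.
labels = ["ku"] * 50 + ["ar"] * 50

# Step 1: carve out the held-out test set once and never touch it during CV.
train_idx, test_idx = stratified_holdout(labels, test_fraction=0.2, seed=42)

# Step 2: run 5-fold CV on the training indices only.
folds = list(stratified_kfold(train_idx, labels, n_splits=5, seed=42))

for fold_train, fold_val in folds:
    # No test sample may appear in any CV fold.
    assert not set(test_idx) & (set(fold_train) | set(fold_val))
    # Training and validation parts of a fold never overlap.
    assert not set(fold_train) & set(fold_val)
```

Because the test indices are fixed before any fold is built, hyperparameters chosen during CV cannot be influenced by the held-out samples, which is the leakage guarantee the list above describes.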

Top-level folders and files:

- `.tmp_tests`
- `best/report`
- `chatting`
- `configs`
- `data/smoke_pub`
- `docs`
- `experiments`
- `outputs`
- `scripts`
- `src`
- `tests`
- `.gitignore`
- `README.md`
- `TODO_PUBLICATION.md`
- `find_duplicates.py`
