Kurdish-vs-Arabic word-image classification project with CNN-based models, projection features, contextual postprocessing, and saved local/external evaluation artifacts.
If a supervisor or reviewer opens this repository, the best reading order is:
1. docs/SUPERVISOR_GUIDE.md
2. docs/model_architectures.md
3. outputs/ann_v1_2class_01/all/report_all.md
4. outputs/kparts_sweep_evarest_real_eval_fixed/ARTICLE_REPORT.md
The supervisor guide explains:
- where the project starts from annotated JSON data,
- which file is used for what,
- where the legacy K=3,4 stage ends,
- where the newer K=2..10 sweep begins,
- and where the final local and EvArEST results are stored.

Main folders:
- src/: reusable training, prediction, preprocessing, and model code
- scripts/: local dataset preparation and external dataset helpers
- experiments/: evaluation, KNN/SVM analysis, article reports, and K=2..10 sweep drivers
- data/anotatd_lines/: local annotated source images and JSON files
- outputs/checkpoints/: public best checkpoints
- outputs/training/: per-run configs, histories, summaries, and checkpoints
- outputs/ann_v1_2class_01/: legacy local annotated evaluation outputs
- outputs/kparts_sweep_evarest_real_eval_fixed/: saved EvArEST evaluation outputs

The earlier version mainly compared manually chosen projection settings,
especially K_parts=3 and K_parts=4, on the local annotated data. The main
saved outputs for this phase are in outputs/ann_v1_2class_01/.
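
To make the role of K_parts concrete, here is a minimal sketch under an explicit assumption: that "projection features" are horizontal projection profiles (ink counts per row) computed separately on K_parts vertical strips of a binarized word image. The functions split_bounds and kpart_projection_features are hypothetical illustrations, not code from src/; the actual projections model may define its features differently.

```python
from typing import List, Tuple


def split_bounds(length: int, k: int) -> List[Tuple[int, int]]:
    """Split range(length) into k contiguous [start, end) chunks of near-equal size."""
    if not 1 <= k <= length:
        raise ValueError("k must be between 1 and length")
    base, extra = divmod(length, k)
    bounds, start = [], 0
    for i in range(k):
        # The first `extra` chunks receive one extra column.
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def kpart_projection_features(img: List[List[int]], k_parts: int) -> List[float]:
    """Hypothetical K-part projection descriptor for a binarized word image.

    img[r][c] is 1 for ink and 0 for background. The image is cut into
    k_parts vertical strips; for each strip we record the horizontal
    projection profile (ink count per row) divided by the strip width.
    The result has height * k_parts values.
    """
    height, width = len(img), len(img[0])
    features: List[float] = []
    for start, end in split_bounds(width, k_parts):
        strip_width = end - start
        for r in range(height):
            features.append(sum(img[r][start:end]) / strip_width)
    return features
```

Under this assumption, raising K_parts trades a longer, more position-sensitive feature vector against fewer columns per strip, which is the kind of trade-off a K=2..10 sweep would explore.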
The newer version replaces the fixed K_parts=3/4 comparison with a
systematic search over K_parts=2..10. The main saved outputs for this phase
are the run folders in outputs/training/ and the external evaluation reports
in outputs/kparts_sweep_evarest_real_eval_fixed/.

Workflow overview:
- Local annotations start in data/anotatd_lines/outs/*.json.
- scripts/prepare_local_datasets.py converts annotations into cropped class-folder datasets.
- src/train.py trains models and writes run artifacts to outputs/training/.
- Legacy fixed-K models are evaluated on local annotated data.
- The newer K=2..10 sweep is driven by experiments/run_kparts_sweep.py.
- KNN and SVM are applied as contextual postprocessing during evaluation.
- External Arabic-only transfer is tested on EvArEST through experiments/evaluate_kparts_sweep_real_data.py.
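
The README does not spell out how the KNN/SVM contextual postprocessing works. The sketch below is one plausible design, stated as an assumption: each word in a text line gets a feature pair (its own CNN probability of being Arabic, the mean probability of its neighbours in the line), and a plain KNN vote over labeled feature pairs decides the final label. The names context_features, knn_predict, and postprocess_line, the window size, and the "arabic"/"kurdish" label strings are all hypothetical and not taken from the experiments/ scripts.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

Feature = Tuple[float, float]


def context_features(probs: Sequence[float], window: int = 2) -> List[Feature]:
    """Pair each word's P(arabic) with the mean P(arabic) of its neighbours.

    probs holds per-word CNN scores in reading order within one line.
    Neighbours are words at most `window` positions away, excluding the word itself.
    """
    feats: List[Feature] = []
    for i, p in enumerate(probs):
        lo, hi = max(0, i - window), min(len(probs), i + window + 1)
        neighbours = [probs[j] for j in range(lo, hi) if j != i]
        # A single-word line has no context, so fall back to the word's own score.
        context = sum(neighbours) / len(neighbours) if neighbours else p
        feats.append((p, context))
    return feats


def knn_predict(train_x: Sequence[Feature], train_y: Sequence[str],
                query: Feature, k: int = 3) -> str:
    """Plain k-nearest-neighbour majority vote with Euclidean distance."""
    nearest = sorted(range(len(train_x)), key=lambda j: dist(train_x[j], query))[:k]
    votes = Counter(train_y[j] for j in nearest)
    return votes.most_common(1)[0][0]


def postprocess_line(probs: Sequence[float], train_x: Sequence[Feature],
                     train_y: Sequence[str], window: int = 2, k: int = 3) -> List[str]:
    """Relabel every word in one line from its (own score, context score) feature."""
    return [knn_predict(train_x, train_y, f, k) for f in context_features(probs, window)]
```

With training pairs built the same way from labeled lines, a word whose own score is ambiguous (for example 0.45) but whose neighbours score high can be relabeled as Arabic. An SVM variant would keep context_features and swap knn_predict for a classifier such as scikit-learn's SVC.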

Key files:
- src/config.py: runtime defaults for data paths and model settings
- src/dataset.py: class-folder loading, preprocessing, and train/val/test splitting
- src/train.py: main training entry point
- scripts/prepare_local_datasets.py: converts local annotation JSON into cropped training folders
- experiments/run_kparts_sweep.py: K_parts=2..10 sweep driver
- experiments/report_kparts_sweep.py: sweep summary generator
- experiments/evaluate_ann_v1_2class.py: legacy local annotated evaluation
- experiments/evaluate_ann_v1_2class_svm_01.py: local annotated evaluation with SVM postprocessing
- experiments/evaluate_kparts_sweep_real_data.py: external real-data evaluation for sweep runs
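
The conversion from annotation JSON to class folders can be pictured as building a crop plan. This is a sketch under stated assumptions: the JSON schema (an "image" field plus "annotations" entries with "label" and an [x, y, w, h] "bbox") is hypothetical, and the output name pattern <stem>__ann_<5-digit index>.png is only inferred from the example image path used in the prediction command. The real scripts/prepare_local_datasets.py may read a different schema. The box is returned in (left, upper, right, lower) order, the order Pillow's Image.crop expects.

```python
import json
from pathlib import PurePosixPath
from typing import List, Tuple

Box = Tuple[int, int, int, int]


def crop_plan(annotation_json: str, out_root: str) -> List[Tuple[str, Box]]:
    """Map each annotated word box to a target file inside a class folder.

    Hypothetical input schema:
        {"image": "C_F0_S26_W2.png",
         "annotations": [{"label": "arabic", "bbox": [x, y, w, h]}, ...]}
    """
    record = json.loads(annotation_json)
    stem = PurePosixPath(record["image"]).stem
    plan: List[Tuple[str, Box]] = []
    # Indices follow annotation order, so a skipped box leaves a gap in numbering.
    for idx, ann in enumerate(record["annotations"], start=1):
        x, y, w, h = ann["bbox"]
        if w <= 0 or h <= 0:
            continue  # skip degenerate boxes
        target = PurePosixPath(out_root) / ann["label"] / f"{stem}__ann_{idx:05d}.png"
        plan.append((str(target), (x, y, x + w, y + h)))
    return plan
```

Keeping the plan separate from the image I/O makes it easy to inspect or unit-test before any files are written.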
This repository preserves code, saved checkpoints, saved run configurations,
saved histories, and saved evaluation outputs. However, data/prepared/ is not
currently present in this checkout, so exact retraining requires rebuilding or
restoring the prepared cropped datasets first.
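
Before retraining, it can help to confirm that the prepared datasets exist and are populated. A minimal sketch, assuming the layout data/prepared/<N>class/<class_name>/*.png suggested by the example image path in the prediction command; the helper name prepared_class_counts is hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable


def prepared_class_counts(root: Path,
                          image_exts: Iterable[str] = (".png", ".jpg", ".jpeg")) -> Dict[str, int]:
    """Count image files per class subfolder; return {} if root does not exist."""
    exts = {e.lower() for e in image_exts}
    if not root.is_dir():
        return {}
    counts: Dict[str, int] = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir() if f.is_file() and f.suffix.lower() in exts
        )
    return counts
```

An empty result for, say, Path("data/prepared/3class") means the preparation step still has to be run or the prepared data restored.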
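These commands assume PowerShell, run from the repository root. Adjust the base interpreter path in the first command to match your Python installation. Create the virtual environment, install the dependencies, and rebuild the prepared datasets:

C:\ProgramData\miniconda3\python.exe -m venv .venv
.\.venv\Scripts\python.exe -m pip install --upgrade pip
.\.venv\Scripts\python.exe -m pip install -r requirements.txt
.\.venv\Scripts\python.exe scripts\prepare_local_datasets.py

Run a prediction with a saved checkpoint (the example image path assumes data/prepared/ has been rebuilt):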
.\.venv\Scripts\python.exe src\predict.py `
--img "data\prepared\3class\arabic\C_F0_S26_W2__ann_00001.png" `
--model projections `
--num-classes 3

Run batch prediction on a folder:
.\.venv\Scripts\python.exe src\predict_batch.py `
--dir data\anotatd_lines\outs\2_classes\v1_2classes\images `
--out outputs\tst_results `
--model projections `
--num-classes 2

Train on the prepared local 2-class dataset:
.\.venv\Scripts\python.exe src\train.py --model projections

Run the newer K=2..10 sweep:
.\.venv\Scripts\python.exe experiments\run_kparts_sweep.py
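
After the sweep finishes, the per-run folders in outputs/training/ can be compared to see which K_parts did best. experiments/report_kparts_sweep.py is the project's own summary generator; the sketch below only illustrates the idea and rests on assumptions: that each run folder contains a summary.json with k_parts and best_val_acc keys. Both the file name and the keys are hypothetical, so check an actual run folder before relying on them.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def load_summaries(training_root: Path, name: str = "summary.json") -> List[Dict]:
    """Read one (hypothetical) summary file from each run folder under training_root."""
    return [
        json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(training_root.glob(f"*/{name}"))
    ]


def rank_k_parts(summaries: List[Dict], metric: str = "best_val_acc") -> List[Tuple[int, float]]:
    """Keep the best score per K_parts value and sort best-first (ties: smaller K first)."""
    best: Dict[int, float] = {}
    for s in summaries:
        k, score = int(s["k_parts"]), float(s[metric])
        best[k] = max(score, best.get(k, float("-inf")))
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
```

Keeping only the best run per K makes repeated runs of the same setting comparable; averaging across runs would be a reasonable alternative when several seeds exist.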