CNN_PROJ_KU_AR

Kurdish-vs-Arabic word-image classification project with CNN-based models, projection features, contextual postprocessing, and saved local/external evaluation artifacts.

Start Here

If a supervisor or reviewer opens this repository, the best reading order is:

docs/SUPERVISOR_GUIDE.md
docs/model_architectures.md
outputs/ann_v1_2class_01/all/report_all.md
outputs/kparts_sweep_evarest_real_eval_fixed/ARTICLE_REPORT.md

The supervisor guide explains:

where the project starts from annotated JSON data,
which file is used for what,
where the legacy K=3,4 stage ends,
where the newer K=2..10 sweep begins,
and where the final local and EvArEST results are stored.

What This Repo Contains

src/: reusable training, prediction, preprocessing, and model code
scripts/: local dataset preparation and external dataset helpers
experiments/: evaluation, KNN/SVM analysis, article reports, and K=2..10 sweep drivers
data/anotatd_lines/: local annotated source images and JSON files
outputs/checkpoints/: public best checkpoints
outputs/training/: per-run configs, histories, summaries, and checkpoints
outputs/ann_v1_2class_01/: legacy local annotated evaluation outputs
outputs/kparts_sweep_evarest_real_eval_fixed/: saved EvArEST evaluation outputs

Project Phases

Legacy phase

The earlier version mainly compared manually chosen projection settings, especially K_parts=3 and K_parts=4, on the local annotated data. The main saved outputs for this phase are in outputs/ann_v1_2class_01/.

New phase

The newer version changes the goal from fixed K=3,4 comparison to a systematic search over K_parts=2..10. The main saved outputs for this phase are the run folders in outputs/training/ and the external evaluation reports in outputs/kparts_sweep_evarest_real_eval_fixed/.
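
As a rough illustration of what a K_parts setting could control, the sketch below splits a binarized word image into K vertical strips and computes one ink-density value per strip. This is an assumption made for illustration only: the function name, the strip-wise definition, and the list-of-lists image format are hypothetical, and the actual projection features in src/ may be computed differently.

```python
from typing import List


def projection_features(image: List[List[int]], k_parts: int) -> List[float]:
    """Return one ink-density value per vertical strip of a binary image.

    ASSUMPTION: `image` is a list of rows with 1 for ink and 0 for background,
    and K_parts means "number of vertical strips". Neither is confirmed by
    the repository.
    """
    height = len(image)
    width = len(image[0])
    if not 1 <= k_parts <= width:
        raise ValueError("k_parts must be between 1 and the image width")

    features = []
    for part in range(k_parts):
        # Integer boundaries so that every column falls in exactly one strip.
        start = part * width // k_parts
        end = (part + 1) * width // k_parts
        ink = sum(row[col] for row in image for col in range(start, end))
        features.append(ink / (height * (end - start)))
    return features


if __name__ == "__main__":
    toy_image = [
        [1, 1, 0, 0],
        [1, 0, 0, 0],
    ]
    # A sweep over K would simply repeat this for each candidate value.
    for k in (2, 4):
        print(k, projection_features(toy_image, k))
```

Under this reading, a larger K gives a finer horizontal resolution at the cost of a longer feature vector, which is the trade-off a K=2..10 sweep would explore.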

Workflow In One View

Local annotations start in data/anotatd_lines/outs/*.json.
scripts/prepare_local_datasets.py converts annotations into cropped class-folder datasets.
src/train.py trains models and writes run artifacts to outputs/training/.
Legacy fixed-K models are evaluated on local annotated data.
The newer K=2..10 sweep is driven by experiments/run_kparts_sweep.py.
KNN and SVM are applied as contextual postprocessing during evaluation.
External Arabic-only transfer is tested on EvArEST through experiments/evaluate_kparts_sweep_real_data.py.
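
The KNN step in the workflow above can be pictured with a minimal nearest-neighbour vote. This is only a sketch under stated assumptions: it assumes each word is described by a numeric feature vector and that the label is taken from the majority of its k nearest reference words. The function name, the feature vectors, and the label strings are hypothetical, and the repository's actual KNN/SVM postprocessing may use different features and logic.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(
    query: Sequence[float],
    references: List[Tuple[Sequence[float], str]],
    k: int = 3,
) -> str:
    """Return the majority label among the k reference vectors nearest to query.

    ASSUMPTION: Euclidean distance on hypothetical feature vectors.
    """
    if not 1 <= k <= len(references):
        raise ValueError("k must be between 1 and the number of references")

    # Sort references by distance to the query and keep the k closest.
    nearest = sorted(references, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical two-dimensional features, e.g. per-class model scores.
    refs = [
        ((0.9, 0.1), "arabic"),
        ((0.8, 0.2), "arabic"),
        ((0.1, 0.9), "kurdish"),
        ((0.2, 0.8), "kurdish"),
        ((0.3, 0.7), "kurdish"),
    ]
    print(knn_predict((0.85, 0.15), refs, k=3))
```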

Important Files

src/config.py: runtime defaults for data paths and model settings
src/dataset.py: class-folder loading, preprocessing, and train/val/test splitting
src/train.py: main training entrypoint
scripts/prepare_local_datasets.py: local annotation JSON to cropped training folders
experiments/run_kparts_sweep.py: K_parts=2..10 sweep driver
experiments/report_kparts_sweep.py: sweep summary generator
experiments/evaluate_ann_v1_2class.py: legacy local annotated evaluation
experiments/evaluate_ann_v1_2class_svm_01.py: local annotated evaluation with SVM postprocessing
experiments/evaluate_kparts_sweep_real_data.py: external real-data evaluation for sweep runs
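
For readers unfamiliar with class-folder datasets, the following sketch shows one common way to produce a seeded, per-class train/val/test split. It is not taken from src/dataset.py: the function name, the split fractions, the seed, and the "kurdish" folder name are assumptions chosen for illustration.

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (file name, class label)


def split_by_class(
    files_by_class: Dict[str, List[str]],
    val_frac: float = 0.2,
    test_frac: float = 0.2,
    seed: int = 42,
) -> Dict[str, List[Sample]]:
    """Split each class separately so every split keeps the class balance.

    ASSUMPTION: the fractions and seed are illustrative defaults, not the
    values used by the repository.
    """
    rng = random.Random(seed)
    splits: Dict[str, List[Sample]] = {"train": [], "val": [], "test": []}

    for label in sorted(files_by_class):
        # Sort first so the result depends only on the seed, not input order.
        files = sorted(files_by_class[label])
        rng.shuffle(files)
        n_val = int(len(files) * val_frac)
        n_test = int(len(files) * test_frac)
        splits["val"] += [(name, label) for name in files[:n_val]]
        splits["test"] += [(name, label) for name in files[n_val:n_val + n_test]]
        splits["train"] += [(name, label) for name in files[n_val + n_test:]]
    return splits


if __name__ == "__main__":
    toy = {
        "arabic": [f"a{i}.png" for i in range(10)],
        "kurdish": [f"k{i}.png" for i in range(10)],
    }
    for name, samples in split_by_class(toy).items():
        print(name, len(samples))
```

Fixing the seed matters for this project because saved histories and evaluation outputs are only comparable across runs if every run sees the same split.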

Reproducibility Note

This repository preserves code, saved checkpoints, saved run configurations, saved histories, and saved evaluation outputs. However, data/prepared/ is not currently present in this checkout, so exact retraining requires rebuilding or restoring the prepared cropped datasets first.
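
Before attempting to retrain, it can help to confirm which of the directories named in this README actually exist in a checkout. The helper below is a small sketch for that purpose; the function name and the choice of paths to check are illustrative, and it is not part of the repository.

```python
from pathlib import Path
from typing import Dict

# Directories named in this README, relative to the repo root.
EXPECTED_PATHS = [
    "data/anotatd_lines/outs",
    "data/prepared",
    "outputs/checkpoints",
    "outputs/training",
]


def check_paths(repo_root: str = ".") -> Dict[str, bool]:
    """Map each expected relative path to whether it exists under repo_root."""
    root = Path(repo_root)
    return {rel: (root / rel).exists() for rel in EXPECTED_PATHS}


if __name__ == "__main__":
    for rel, present in check_paths().items():
        status = "ok     " if present else "MISSING"
        print(f"{status}  {rel}")
```

If data/prepared is reported as missing, the prepared cropped datasets need to be rebuilt or restored before exact retraining is possible.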

Minimal Setup

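These commands assume PowerShell, run from the repo root, with Miniconda installed at C:\ProgramData\miniconda3. If your Python interpreter lives elsewhere, adjust the path in the first command.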

C:\ProgramData\miniconda3\python.exe -m venv .venv
.\.venv\Scripts\python.exe -m pip install --upgrade pip
.\.venv\Scripts\python.exe -m pip install -r requirements.txt
.\.venv\Scripts\python.exe scripts\prepare_local_datasets.py

Example Commands

Run a prediction with a saved checkpoint (the example image path is under data\prepared\, which must be rebuilt or restored first):

.\.venv\Scripts\python.exe src\predict.py `
  --img "data\prepared\3class\arabic\C_F0_S26_W2__ann_00001.png" `
  --model projections `
  --num-classes 3

Run batch prediction on a folder:

.\.venv\Scripts\python.exe src\predict_batch.py `
  --dir data\anotatd_lines\outs\2_classes\v1_2classes\images `
  --out outputs\tst_results `
  --model projections `
  --num-classes 2

Train on the prepared local 2-class dataset:

.\.venv\Scripts\python.exe src\train.py --model projections

Run the newer K=2..10 sweep:

.\.venv\Scripts\python.exe experiments\run_kparts_sweep.py
