Phase 2a: YOLO Detection Foundation — full implementation

# Phase 2a: YOLO Detection Foundation Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Replace all OpenCV template-matching in `bot/vision.py` with a YOLOv11 model that detects every game element — buildings, defenses, hero pads, UI buttons, and screen state indicators — while keeping the existing public API unchanged so no callers need to change.

**Architecture:** A new `bot/detector.py` module wraps Ultralytics YOLO inference behind a clean `Detector` class with a `predict(frame) → list[Detection]` interface. `bot/vision.py` is rewritten to use this detector while keeping its public functions (`find_button`, `detect_screen_state`, `get_troop_slots`, `find_popup`, `validate_critical_templates`) identical in signature. Digit OCR (for loot/resource reading) stays unchanged — YOLO handles detection, template matching handles digit recognition.

**Tech Stack:** Ultralytics YOLOv11 (`ultralytics>=8.3`), HuggingFace `datasets` library for public CoC dataset, Roboflow (web UI, free tier) for manual labeling of UI classes, Python 3.13, pytest

---

## File Structure

### New files
- `bot/detector.py` — `Detection` dataclass + `Detector` class wrapping YOLO inference
- `training/download_dataset.py` — downloads `keremberke/clash-of-clans-object-detection` from HuggingFace, converts COCO→YOLOv11 format
- `training/train.py` — fine-tunes YOLOv11n on a dataset YAML, copies best weights to `models/coc.pt`
- `training/capture_frames.py` — saves screenshots + state metadata during live bot runs for building the UI class dataset
- `training/merge_datasets.py` — merges public buildings dataset + manually labeled UI dataset into `datasets/full/`
- `models/.gitkeep` — placeholder so the directory is committed (weights are gitignored)
- `tests/test_detector.py` — unit tests for `bot/detector.py`
- `tests/test_vision_yolo.py` — unit tests for rewritten `bot/vision.py`

### Modified files
- `requirements.txt` — add `ultralytics>=8.3`, `datasets>=2.0`
- `.gitignore` — add `models/*.pt`, `datasets/`, `runs/`
- `bot/settings.py` — add `yolo_model_path`, `yolo_confidence_threshold` defaults
- `bot/vision.py` — full rewrite of internals (public API unchanged)

---

## Task 1: Dependencies + `.gitignore`

**Files:**
- Modify: `requirements.txt`
- Modify: `.gitignore`

- [ ] **Step 1: Add new dependencies to requirements.txt**

```text
opencv-python>=4.8.0
numpy>=1.20.0
pytesseract>=0.3.10
Pillow>=9.0.0
PySide6>=6.6.0,<6.11
packaging>=21.0
ultralytics>=8.3
datasets>=2.0
```

- [ ] **Step 2: Add model and training artifacts to `.gitignore`**

Append to `.gitignore`:
```gitignore
# YOLO model weights (large binaries — download/train locally)
models/*.pt

# Training datasets (downloaded/generated — do not commit)
datasets/

# Ultralytics training run outputs
runs/
```

- [ ] **Step 3: Install new dependencies**

```bash
.venv/bin/pip install ultralytics>=8.3 "datasets>=2.0"
```

Expected: installs ultralytics (pulls torch), datasets, huggingface-hub. Takes 1-3 minutes.

- [ ] **Step 4: Verify ultralytics works**

```bash
.venv/bin/python -c "from ultralytics import YOLO; print('ultralytics OK')"
```

Expected: `ultralytics OK` (may print version info).

- [ ] **Step 5: Create `models/` directory with placeholder**

```bash
mkdir -p models && touch models/.gitkeep
```

- [ ] **Step 6: Commit**

```bash
git add requirements.txt .gitignore models/.gitkeep
git commit -m "feat: add ultralytics + datasets dependencies for YOLO phase 2a"
```

---

## Task 2: Settings — add YOLO keys

**Files:**
- Modify: `bot/settings.py` (DEFAULTS dict, around line 146)

- [ ] **Step 1: Add YOLO settings to DEFAULTS in `bot/settings.py`**

In the DEFAULTS dict, after the `"stream_buffer_size": 60,` line, add:

```python
    # YOLO detection model
    "yolo_model_path": "models/coc.pt",
    "yolo_confidence_threshold": 0.45,
```

- [ ] **Step 2: Verify settings loads without error**

```bash
.venv/bin/python -c "from bot.settings import Settings; s = Settings(); print(s.get('yolo_model_path'))"
```

Expected: `models/coc.pt`

- [ ] **Step 3: Commit**

```bash
git add bot/settings.py
git commit -m "feat: add yolo_model_path and yolo_confidence_threshold settings"
```

---

## Task 3: `bot/detector.py` — Detection + Detector

**Files:**
- Create: `bot/detector.py`
- Test: `tests/test_detector.py`

- [ ] **Step 1: Write the failing tests first**

Create `tests/test_detector.py`:

```python
"""Unit tests for bot/detector.py — mocks YOLO so no model file is needed."""
import numpy as np
import pytest
from unittest.mock import MagicMock, patch


def _make_frame(h=1440, w=2560):
    return np.zeros((h, w, 3), dtype=np.uint8)


def _mock_yolo_result(cls_id: int, cls_name: str, conf: float, xyxy: list):
    """Build a minimal mock YOLO result object."""
    mock_box = MagicMock()
    mock_box.cls = [cls_id]
    mock_box.conf = [conf]
    mock_box.xyxy = [np.array(xyxy, dtype=float)]

    mock_result = MagicMock()
    mock_result.boxes = [mock_box]
    mock_result.names = {cls_id: cls_name}
    return mock_result


class TestDetection:
    def test_center_midpoint(self):
        from bot.detector import Detection
        d = Detection(cls="btn_attack", conf=0.9, x1=100, y1=200, x2=200, y2=300)
        assert d.center == (150, 250)

    def test_area(self):
        from bot.detector import Detection
        d = Detection(cls="btn_attack", conf=0.9, x1=0, y1=0, x2=100, y2=50)
        assert d.area == 5000

    def test_bbox_tuple(self):
        from bot.detector import Detection
        d = Detection(cls="x", conf=0.5, x1=10, y1=20, x2=30, y2=40)
        assert d.bbox == (10, 20, 30, 40)


class TestDetectorPredict:
    def test_predict_returns_detection(self):
        from bot.detector import Detector, Detection

        mock_result = _mock_yolo_result(0, "btn_attack", 0.9, [100, 200, 200, 300])
        mock_model = MagicMock(return_value=[mock_result])
        mock_model.names = {0: "btn_attack"}

        with patch("bot.detector.YOLO", return_value=mock_model):
            detector = Detector("fake.pt", confidence=0.5)

        dets = detector.predict(_make_frame())
        assert len(dets) == 1
        assert dets[0].cls == "btn_attack"
        assert dets[0].conf == pytest.approx(0.9)
        assert dets[0].x1 == 100
        assert dets[0].y2 == 300

    def test_predict_empty_result(self):
        from bot.detector import Detector

        mock_result = MagicMock()
        mock_result.boxes = []
        mock_result.names = {}
        mock_model = MagicMock(return_value=[mock_result])

        with patch("bot.detector.YOLO", return_value=mock_model):
            detector = Detector("fake.pt")

        assert detector.predict(_make_frame()) == []

    def test_predict_multiple_detections(self):
        from bot.detector import Detector

        r1 = _mock_yolo_result(0, "btn_attack", 0.9, [0, 0, 100, 50])
        r2 = _mock_yolo_result(1, "btn_next_base", 0.8, [200, 200, 300, 250])

        # Two results in the same inference call
        mock_result_combined = MagicMock()
        box1 = MagicMock()
        box1.cls = [0]; box1.conf = [0.9]; box1.xyxy = [np.array([0, 0, 100, 50])]
        box2 = MagicMock()
        box2.cls = [1]; box2.conf = [0.8]; box2.xyxy = [np.array([200, 200, 300, 250])]
        mock_result_combined.boxes = [box1, box2]
        mock_result_combined.names = {0: "btn_attack", 1: "btn_next_base"}
        mock_model = MagicMock(return_value=[mock_result_combined])

        with patch("bot.detector.YOLO", return_value=mock_model):
            detector = Detector("fake.pt")

        dets = detector.predict(_make_frame())
        assert len(dets) == 2
        assert {d.cls for d in dets} == {"btn_attack", "btn_next_base"}


class TestDetectorFind:
    def _make_detector_with_dets(self, dets):
        from bot.detector import Detector
        detector = Detector.__new__(Detector)
        detector._confidence = 0.5
        detector._model = MagicMock()
        from unittest.mock import patch
        detector._predict_mock = dets
        return detector

    def test_find_returns_highest_confidence(self):
        from bot.detector import Detector, Detection
        detector = Detector.__new__(Detector)
        detector._confidence = 0.5
        det_low = Detection("btn_attack", 0.6, 0, 0, 10, 10)
        det_high = Detection("btn_attack", 0.9, 50, 50, 60, 60)
        det_other = Detection("btn_next_base", 0.95, 100, 100, 110, 110)

        with patch.object(detector, "predict", return_value=[det_low, det_high, det_other]):
            result = detector.find(_make_frame(), "btn_attack")

        assert result is det_high

    def test_find_returns_none_when_absent(self):
        from bot.detector import Detector, Detection
        detector = Detector.__new__(Detector)
        det = Detection("btn_attack", 0.9, 0, 0, 10, 10)

        with patch.object(detector, "predict", return_value=[det]):
            result = detector.find(_make_frame(), "btn_next_base")

        assert result is None

    def test_find_any_returns_first_match(self):
        from bot.detector import Detector, Detection
        detector = Detector.__new__(Detector)
        det1 = Detection("btn_okay", 0.8, 0, 0, 10, 10)
        det2 = Detection("btn_close", 0.85, 50, 50, 60, 60)

        with patch.object(detector, "predict", return_value=[det1, det2]):
            result = detector.find_any(_make_frame(), "btn_close", "btn_okay", "btn_later")

        # Should return highest-conf among the matching classes
        assert result is det2

    def test_find_all_filters_by_class(self):
        from bot.detector import Detector, Detection
        detector = Detector.__new__(Detector)
        dets = [
            Detection("troop_slot", 0.9, 0, 0, 50, 50),
            Detection("troop_slot", 0.85, 60, 0, 110, 50),
            Detection("btn_attack", 0.9, 200, 200, 250, 250),
        ]

        with patch.object(detector, "predict", return_value=dets):
            result = detector.find_all(_make_frame(), "troop_slot")

        assert len(result) == 2
        assert all(d.cls == "troop_slot" for d in result)
```

- [ ] **Step 2: Run tests to verify they fail**

```bash
.venv/bin/python -m pytest tests/test_detector.py -v 2>&1 | head -30
```

Expected: `ModuleNotFoundError: No module named 'bot.detector'`

- [ ] **Step 3: Create `bot/detector.py`**

```python
"""
YOLO-based game element detector.

Wraps Ultralytics YOLOv11 inference with a minimal interface:
  Detector.predict(frame)  → list[Detection]
  Detector.find(frame, cls)  → Detection | None
  Detector.find_any(frame, *classes)  → Detection | None
  Detector.find_all(frame, cls)  → list[Detection]
"""

import logging
import numpy as np
from dataclasses import dataclass

logger = logging.getLogger("coc.detector")

# Imported at module level so tests can patch it cleanly
from ultralytics import YOLO


@dataclass
class Detection:
    cls: str
    conf: float
    x1: int
    y1: int
    x2: int
    y2: int

    @property
    def center(self) -> tuple[int, int]:
        return ((self.x1 + self.x2) // 2, (self.y1 + self.y2) // 2)

    @property
    def area(self) -> int:
        return (self.x2 - self.x1) * (self.y2 - self.y1)

    @property
    def bbox(self) -> tuple[int, int, int, int]:
        return (self.x1, self.y1, self.x2, self.y2)


class Detector:
    """Load a YOLOv11 model and run inference on game screenshots."""

    def __init__(self, model_path: str, confidence: float = 0.45):
        logger.info("Loading YOLO model from %s", model_path)
        self._model = YOLO(model_path)
        self._confidence = confidence
        logger.info(
            "Model loaded — %d classes, first 5: %s",
            len(self._model.names),
            list(self._model.names.values())[:5],
        )

    def predict(self, frame: np.ndarray) -> list[Detection]:
        """Run inference on a BGR frame. Returns all detections above confidence."""
        results = self._model(frame, conf=self._confidence, verbose=False)
        detections: list[Detection] = []
        for r in results:
            if r.boxes is None:
                continue
            names = r.names
            for box in r.boxes:
                cls_id = int(box.cls[0])
                detections.append(Detection(
                    cls=names[cls_id],
                    conf=float(box.conf[0]),
                    x1=int(box.xyxy[0][0]),
                    y1=int(box.xyxy[0][1]),
                    x2=int(box.xyxy[0][2]),
                    y2=int(box.xyxy[0][3]),
                ))
        return detections

    def find(self, frame: np.ndarray, cls: str) -> "Detection | None":
        """Return the highest-confidence detection of a given class, or None."""
        matches = [d for d in self.predict(frame) if d.cls == cls]
        return max(matches, key=lambda d: d.conf) if matches else None

    def find_any(self, frame: np.ndarray, *classes: str) -> "Detection | None":
        """Return the highest-confidence detection among multiple classes."""
        matches = [d for d in self.predict(frame) if d.cls in classes]
        return max(matches, key=lambda d: d.conf) if matches else None

    def find_all(self, frame: np.ndarray, cls: str) -> list[Detection]:
        """Return all detections of a given class."""
        return [d for d in self.predict(frame) if d.cls == cls]
```

- [ ] **Step 4: Run tests to verify they pass**

```bash
.venv/bin/python -m pytest tests/test_detector.py -v
```

Expected: all 13 tests pass.

- [ ] **Step 5: Commit**

```bash
git add bot/detector.py tests/test_detector.py
git commit -m "feat: add Detector class wrapping YOLOv11 inference (bot/detector.py)"
```

---

## Task 4: Download public dataset

**Files:**
- Create: `training/download_dataset.py`

The public HuggingFace dataset (`keremberke/clash-of-clans-object-detection`) has 125 images, 16 building/defense classes, in COCO format. This task downloads it and converts to YOLOv11 TXT format.

- [ ] **Step 1: Create `training/download_dataset.py`**

```python
#!/usr/bin/env python3
"""
Download the public Clash of Clans dataset from HuggingFace and convert
from COCO JSON format to YOLOv11 TXT format.

Source: keremberke/clash-of-clans-object-detection (CC BY 4.0, 125 images)
16 classes: buildings and defensive structures.

Usage:
    python training/download_dataset.py
    python training/download_dataset.py --output datasets/public
"""

import argparse
from pathlib import Path


# Class list from the HuggingFace dataset (index = YOLO class ID)
PUBLIC_CLASSES = [
    "ad", "airsweeper", "bombtower", "canon", "clancastle",
    "eagle", "inferno", "kingpad", "mortar", "queenpad",
    "rcpad", "scattershot", "th13", "wardenpad", "wizztower", "xbow",
]


def download_and_convert(output_dir: str = "datasets/public") -> None:
    from datasets import load_dataset

    output = Path(output_dir)
    for split in ("train", "validation", "test"):
        (output / split / "images").mkdir(parents=True, exist_ok=True)
        (output / split / "labels").mkdir(parents=True, exist_ok=True)

    print("Downloading from HuggingFace (keremberke/clash-of-clans-object-detection)...")
    ds = load_dataset("keremberke/clash-of-clans-object-detection", name="full")

    split_map = {"train": "train", "validation": "validation", "test": "test"}
    for hf_split, out_split in split_map.items():
        split_data = ds[hf_split]
        print(f"  Converting {hf_split} ({len(split_data)} images)...")

        for idx, example in enumerate(split_data):
            img = example["image"]  # PIL Image
            w, h = img.size

            img_path = output / out_split / "images" / f"{idx:05d}.jpg"
            img.save(str(img_path), quality=95)

            annotations = example["objects"]
            lines = []
            for bbox, cat_id in zip(annotations["bbox"], annotations["category"]):
                # COCO bbox: [x_topleft, y_topleft, width, height] in pixels
                bx, by, bw, bh = bbox
                cx = (bx + bw / 2) / w
                cy = (by + bh / 2) / h
                nw = bw / w
                nh = bh / h
                lines.append(f"{cat_id} {cx:.6f} {cy:.6f} {nw:.6f} {nh:.6f}")

            lbl_path = output / out_split / "labels" / f"{idx:05d}.txt"
            lbl_path.write_text("\n".join(lines))

    # Write dataset.yaml for training
    yaml_path = output / "dataset.yaml"
    yaml_path.write_text(
        f"# Public CoC baseline dataset (buildings + defenses only)\n"
        f"# Source: keremberke/clash-of-clans-object-detection (HuggingFace, CC BY 4.0)\n\n"
        f"path: {output.resolve()}\n"
        f"train: train/images\n"
        f"val: validation/images\n"
        f"test: test/images\n\n"
        f"nc: {len(PUBLIC_CLASSES)}\n"
        f"names: {PUBLIC_CLASSES}\n"
    )

    total = sum(len(ds[s]) for s in split_map)
    print(f"\nDataset ready at {output}/")
    print(f"  Total: {total} images, {len(PUBLIC_CLASSES)} classes")
    print(f"  Classes: {PUBLIC_CLASSES}")
    print(f"  Config:  {yaml_path}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Download public CoC YOLO dataset")
    parser.add_argument("--output", default="datasets/public",
                        help="Output directory (default: datasets/public)")
    args = parser.parse_args()
    download_and_convert(args.output)
```

- [ ] **Step 2: Run the download**

```bash
.venv/bin/python training/download_dataset.py
```

Expected output (takes 2-5 minutes depending on connection):
```
Downloading from HuggingFace (keremberke/clash-of-clans-object-detection)...
  Converting train (88 images)...
  Converting validation (24 images)...
  Converting test (13 images)...

Dataset ready at datasets/public/
  Total: 125 images, 16 classes
  Classes: ['ad', 'airsweeper', ...]
  Config:  datasets/public/dataset.yaml
```

- [ ] **Step 3: Verify output structure**

```bash
ls datasets/public/train/images/ | head -5
ls datasets/public/train/labels/ | head -5
head -3 datasets/public/train/labels/00000.txt
```

Expected: image files `00000.jpg`..., label files `00000.txt`..., each label line like `3 0.512345 0.234567 0.123456 0.098765`

- [ ] **Step 4: Commit**

```bash
git add training/download_dataset.py
git commit -m "feat: add training/download_dataset.py for public CoC HuggingFace dataset"
```

---

## Task 5: Training script + train baseline model

**Files:**
- Create: `training/train.py`

- [ ] **Step 1: Create `training/train.py`**

```python
#!/usr/bin/env python3
"""
Fine-tune YOLOv11n on a Clash of Clans dataset.

Starts from the COCO-pretrained YOLOv11n weights, fine-tunes on the
provided dataset, and copies the best checkpoint to models/coc.pt.

Usage:
    # Baseline (public buildings only, 16 classes):
    python training/train.py --data datasets/public/dataset.yaml --epochs 50

    # Full model (all classes including UI buttons):
    python training/train.py --data datasets/full/dataset.yaml --epochs 100

    # Resume from last checkpoint:
    python training/train.py --data datasets/full/dataset.yaml --resume
"""

import argparse
import shutil
from pathlib import Path


def train(
    data_yaml: str,
    epochs: int = 50,
    model_size: str = "n",
    resume: bool = False,
    batch: int = 16,
    imgsz: int = 640,
) -> None:
    from ultralytics import YOLO

    last_ckpt = Path("runs/detect/coc/weights/last.pt")
    if resume and last_ckpt.exists():
        print(f"Resuming from {last_ckpt}")
        model = YOLO(str(last_ckpt))
    else:
        base = f"yolo11{model_size}.pt"
        print(f"Starting from {base} (COCO-pretrained)")
        model = YOLO(base)

    model.train(
        data=data_yaml,
        epochs=epochs,
        imgsz=imgsz,
        batch=batch,
        name="coc",
        project="runs/detect",
        patience=20,
        save=True,
        plots=True,
        exist_ok=True,
    )

    best = Path("runs/detect/coc/weights/best.pt")
    if best.exists():
        Path("models").mkdir(exist_ok=True)
        dest = Path("models/coc.pt")
        shutil.copy(best, dest)
        print(f"\nBest weights → {dest}")
    else:
        print("WARNING: best.pt not found — check runs/detect/coc/weights/")


if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--data", default="datasets/public/dataset.yaml")
    p.add_argument("--epochs", type=int, default=50)
    p.add_argument("--model", default="n", choices=["n", "s", "m"],
                   help="YOLOv11 size: n=nano, s=small, m=medium")
    p.add_argument("--batch", type=int, default=16)
    p.add_argument("--imgsz", type=int, default=640)
    p.add_argument("--resume", action="store_true")
    args = p.parse_args()
    train(args.data, args.epochs, args.model, args.resume, args.batch, args.imgsz)
```

- [ ] **Step 2: Train the baseline model (16 building classes)**

```bash
.venv/bin/python training/train.py \
    --data datasets/public/dataset.yaml \
    --epochs 50 \
    --model n
```

Expected: Ultralytics training output with per-epoch mAP. Finishes with:
```
Best weights → models/coc.pt
```
Takes ~10-20 minutes on CPU, ~3 minutes on Apple Silicon MPS.

- [ ] **Step 3: Verify baseline model detects buildings**

```bash
.venv/bin/python -c "
from bot.detector import Detector
import cv2, numpy as np

d = Detector('models/coc.pt')
# Test on a blank image — should return empty
blank = np.zeros((1440, 2560, 3), dtype=np.uint8)
dets = d.predict(blank)
print(f'Blank image: {len(dets)} detections (expected 0)')
print('Detector loaded OK')
"
```

Expected: `Blank image: 0 detections (expected 0)` and `Detector loaded OK`

- [ ] **Step 4: Commit training script**

```bash
git add training/train.py
git commit -m "feat: add training/train.py for YOLOv11 fine-tuning"
```

---

## Task 6: Data capture pipeline

**Files:**
- Create: `training/capture_frames.py`

This script runs alongside the bot, saving screenshots every N seconds with metadata. The saved screenshots will be labeled in Roboflow (Task 7) to add UI button and screen state classes.

- [ ] **Step 1: Create `training/capture_frames.py`**

```python
#!/usr/bin/env python3
"""
Capture screenshots during live bot runs for building the YOLO training dataset.

Saves each screenshot as a JPG + JSON metadata file. Screenshots should then
be uploaded to Roboflow (free tier) for manual annotation of UI classes
not covered by the public building dataset.

Two ways to use:
  1. Standalone: run alongside the bot in a separate terminal
     python training/capture_frames.py --interval 5 --output datasets/captured

  2. Programmatic: call capture_one() from key bot events
     from training.capture_frames import capture_one
     capture_one("scouting")  # saves one screenshot with hint label

Output per screenshot:
  datasets/captured/20260327_143201_123456_scouting.jpg
  datasets/captured/20260327_143201_123456_scouting.json  ← state + hint
"""

import os
import json
import time
import threading
import argparse
from pathlib import Path
from datetime import datetime

_output_dir = Path("datasets/captured")
_stop_event = threading.Event()


def capture_one(hint: str = "auto", output_dir: Path | None = None) -> Path | None:
    """Save one screenshot + metadata. Returns the image path, or None on error."""
    try:
        from bot.screen import screenshot
        from bot.vision import detect_screen_state
        import cv2

        out = output_dir or _output_dir
        out.mkdir(parents=True, exist_ok=True)

        img = screenshot()
        state = str(detect_screen_state(img))
        ts = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
        stem = f"{ts}_{hint}"

        img_path = out / f"{stem}.jpg"
        cv2.imwrite(str(img_path), img, [cv2.IMWRITE_JPEG_QUALITY, 95])

        meta = {"timestamp": ts, "state": state, "hint": hint}
        (out / f"{stem}.json").write_text(json.dumps(meta, indent=2))

        return img_path
    except Exception as e:
        print(f"[capture] Error: {e}")
        return None


def _capture_loop(interval: float, output_dir: Path) -> None:
    print(f"[capture] Saving to {output_dir}/ every {interval}s  (Ctrl+C to stop)")
    count = 0
    while not _stop_event.is_set():
        path = capture_one("auto", output_dir)
        if path:
            count += 1
            print(f"[capture] #{count}  {path.name}")
        time.sleep(interval)


def run_standalone(interval: float = 5.0, output_dir: str = "datasets/captured") -> None:
    """Run capture loop in the foreground until Ctrl+C."""
    _stop_event.clear()
    out = Path(output_dir)
    try:
        _capture_loop(interval, out)
    except KeyboardInterrupt:
        print(f"\n[capture] Stopped. {len(list(out.glob('*.jpg')))} screenshots saved.")


if __name__ == "__main__":
    p = argparse.ArgumentParser(description="Capture bot screenshots for YOLO labeling")
    p.add_argument("--interval", type=float, default=5.0,
                   help="Seconds between captures (default: 5)")
    p.add_argument("--output", default="datasets/captured",
                   help="Output directory (default: datasets/captured)")
    args = p.parse_args()
    run_standalone(args.interval, args.output)
```

- [ ] **Step 2: Verify the script runs without import errors**

```bash
.venv/bin/python -c "import training.capture_frames; print('import OK')"
```

Expected: `import OK` (bot stream not started — that's fine, errors only on `capture_one()` calls).

- [ ] **Step 3: Run capture during a live bot session**

With the bot running (`./start.sh` → click Start), open a second terminal and run:

```bash
.venv/bin/python training/capture_frames.py --interval 5 --output datasets/captured
```

Let it run for **at least one full attack cycle** (village → scouting → battle → results → village). Stop with Ctrl+C.

Expected: 30-100 screenshots saved to `datasets/captured/`, covering all game states.

- [ ] **Step 4: Verify captures have good state coverage**

```bash
python -c "
import json
from pathlib import Path
from collections import Counter

meta_files = list(Path('datasets/captured').glob('*.json'))
states = Counter(json.loads(f.read_text())['state'] for f in meta_files)
for state, count in sorted(states.items()):
    print(f'  {state}: {count}')
print(f'Total: {sum(states.values())}')
"
```

Expected: coverage across `GameState.VILLAGE`, `GameState.SCOUTING`, `GameState.BATTLE_ACTIVE`, `GameState.RESULTS`. Aim for at least 20 images per state before proceeding.

- [ ] **Step 5: Commit**

```bash
git add training/capture_frames.py
git commit -m "feat: add training/capture_frames.py for live screenshot collection"
```

---

## Task 7: Label UI classes in Roboflow + export

**Files:** None (manual labeling step)

The public dataset covers buildings but has no UI buttons or screen state indicators. This task labels those classes on the captured screenshots.

**Classes to label** (18 new classes, on top of the 16 public building classes):

| Class | What it looks like |
|---|---|
| `btn_attack` | Green "Attack!" sword button (village screen, bottom-left) |
| `btn_find_match` | "Find a Match" button (attack menu) |
| `btn_start_battle` | Green "Battle!" button (army screen, bottom-right) |
| `btn_next_base` | "Next" button with gold coin cost (scouting screen, bottom-right) |
| `btn_return_home` | "Return Home" button (results screen) |
| `btn_end_battle` | Red "End Battle" / "Surrender" button |
| `btn_confirm` | Confirmation checkmark button (upgrade dialogs) |
| `btn_close` | X close button (popups) |
| `btn_okay` | "OK" button (popups) |
| `btn_later` | "Later" button (popups) |
| `hud_village` | The village resource bar (top of village screen — gold + elixir HUD) |
| `hud_scouting` | The scouting HUD overlay (loot values visible in top-left) |
| `hud_results` | Stars/results screen overlay |
| `hud_army` | The army selection screen HUD |
| `loot_gold` | Gold icon + value in scouting screen (top-left) |
| `loot_elixir` | Elixir icon + value in scouting screen (top-left) |
| `loot_gem` | Green gem cost indicator (upgrade confirmations) |
| `troop_slot` | Individual troop icon in the bottom deployment bar |

**Steps:**

- [ ] **Step 1: Upload captured screenshots to Roboflow**

1. Go to [roboflow.com](https://roboflow.com) → Create free account
2. Create new project: "CoC-UI-Classes", type "Object Detection"
3. Upload all screenshots from `datasets/captured/*.jpg`

- [ ] **Step 2: Add the 18 class labels in Roboflow**

In Roboflow project settings → Classes, add each class from the table above.

- [ ] **Step 3: Label at least 15 examples of each button class**

Use Roboflow's annotation tool. Draw tight bounding boxes around each UI element.

**Tips:**
- `btn_attack` appears on every village screenshot
- `btn_next_base` appears on every scouting screenshot — label it in every scouting image
- `hud_results`, `btn_return_home` appear together on results screens
- For `troop_slot`, label each visible troop icon in the bottom bar (there will be 4-8 per battle screenshot)
- Skip `hud_village` / `hud_scouting` / `hud_army` if they're ambiguous — prioritize the button classes

- [ ] **Step 4: Export in YOLOv11 format**

In Roboflow: Generate → Export → Format: "YOLOv11" → Download ZIP.

Extract to `datasets/labeled/`:
```bash
mkdir -p datasets/labeled
unzip ~/Downloads/CoC-UI-Classes.v1.yolov11.zip -d datasets/labeled/
```

Verify structure:
```bash
ls datasets/labeled/
# Expected: train/ valid/ test/ data.yaml
```

- [ ] **Step 5: No commit needed** (datasets/ is gitignored)

---

## Task 8: Merge datasets + train full model

**Files:**
- Create: `training/merge_datasets.py`

- [ ] **Step 1: Create `training/merge_datasets.py`**

```python
#!/usr/bin/env python3
"""
Merge the public building dataset (16 classes) with the manually labeled
UI dataset (18 classes) into a unified dataset for the full model.

The public dataset uses class IDs 0-15.
The labeled dataset uses class IDs 0-17 (Roboflow resets to 0).
This script remaps labeled IDs → 16-33 so they don't conflict.

Usage:
    python training/merge_datasets.py \
        --public datasets/public \
        --labeled datasets/labeled \
        --output datasets/full
"""

import shutil
import argparse
from pathlib import Path


PUBLIC_CLASSES = [
    "ad", "airsweeper", "bombtower", "canon", "clancastle",
    "eagle", "inferno", "kingpad", "mortar", "queenpad",
    "rcpad", "scattershot", "th13", "wardenpad", "wizztower", "xbow",
]

# Must match the class order used in Roboflow when labeling.
# Check datasets/labeled/data.yaml for the exact order Roboflow assigned.
LABELED_CLASSES = [
    "btn_attack", "btn_find_match", "btn_start_battle", "btn_next_base",
    "btn_return_home", "btn_end_battle", "btn_confirm", "btn_close",
    "btn_okay", "btn_later", "hud_village", "hud_scouting", "hud_results",
    "hud_army", "loot_gold", "loot_elixir", "loot_gem", "troop_slot",
]

ALL_CLASSES = PUBLIC_CLASSES + LABELED_CLASSES


def _copy_split(src_img: Path, src_lbl: Path, dst_img: Path, dst_lbl: Path,
                prefix: str, class_offset: int) -> int:
    """Copy images and labels (with class ID remapping) to destination."""
    if not src_img.exists():
        return 0
    count = 0
    for img_path in sorted(src_img.glob("*.jpg")):
        shutil.copy(img_path, dst_img / f"{prefix}_{img_path.name}")
        lbl_path = src_lbl / img_path.with_suffix(".txt").name
        dst_lbl_path = dst_lbl / f"{prefix}_{lbl_path.name}"
        if lbl_path.exists():
            lines = [l for l in lbl_path.read_text().strip().split("\n") if l.strip()]
            remapped = []
            for line in lines:
                parts = line.split()
                parts[0] = str(int(parts[0]) + class_offset)
                remapped.append(" ".join(parts))
            dst_lbl_path.write_text("\n".join(remapped))
        else:
            dst_lbl_path.write_text("")
        count += 1
    return count


def merge(public_dir: str, labeled_dir: str, output_dir: str) -> None:
    pub = Path(public_dir)
    lab = Path(labeled_dir)
    out = Path(output_dir)

    for split in ("train", "val"):
        (out / split / "images").mkdir(parents=True, exist_ok=True)
        (out / split / "labels").mkdir(parents=True, exist_ok=True)

    total = 0

    # Public dataset: class IDs 0-15, no remapping needed
    # train split + test split both go to train (small dataset — maximize training data)
    for src_split, dst_split in [("train", "train"), ("test", "train")]:
        n = _copy_split(
            pub / src_split / "images", pub / src_split / "labels",
            out / dst_split / "images", out / dst_split / "labels",
            prefix=f"pub_{src_split}", class_offset=0,
        )
        total += n
    n = _copy_split(
        pub / "validation" / "images", pub / "validation" / "labels",
        out / "val" / "images", out / "val" / "labels",
        prefix="pub_val", class_offset=0,
    )
    total += n

    # Labeled dataset: Roboflow class IDs start at 0, remap to 16+
    labeled_offset = len(PUBLIC_CLASSES)
    for src_split, dst_split in [("train", "train"), ("test", "train")]:
        n = _copy_split(
            lab / src_split / "images", lab / src_split / "labels",
            out / dst_split / "images", out / dst_split / "labels",
            prefix=f"lab_{src_split}", class_offset=labeled_offset,
        )
        total += n
    for src_split in ("valid", "validation"):
        src_img = lab / src_split / "images"
        if src_img.exists():
            n = _copy_split(
                src_img, lab / src_split / "labels",
                out / "val" / "images", out / "val" / "labels",
                prefix="lab_val", class_offset=labeled_offset,
            )
            total += n
            break

    # Write unified dataset.yaml
    yaml_path = out / "dataset.yaml"
    yaml_path.write_text(
        f"# Full CoC dataset: public buildings (16 classes) + labeled UI (18 classes)\n\n"
        f"path: {out.resolve()}\n"
        f"train: train/images\n"
        f"val: val/images\n\n"
        f"nc: {len(ALL_CLASSES)}\n"
        f"names: {ALL_CLASSES}\n"
    )

    print(f"Merged {total} images → {out}/")
    print(f"Classes ({len(ALL_CLASSES)}): {ALL_CLASSES[:5]}... +{len(ALL_CLASSES) - 5} more")
    print(f"Config: {yaml_path}")


if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--public", default="datasets/public")
    p.add_argument("--labeled", default="datasets/labeled")
    p.add_argument("--output", default="datasets/full")
    args = p.parse_args()
    merge(args.public, args.labeled, args.output)
```

- [ ] **Step 2: Run the merge**

```bash
.venv/bin/python training/merge_datasets.py \
    --public datasets/public \
    --labeled datasets/labeled \
    --output datasets/full
```

Expected:
```
Merged ~300 images → datasets/full/
Classes (34): ['ad', 'airsweeper', ...]... +29 more
Config: datasets/full/dataset.yaml
```

- [ ] **Step 3: Verify class IDs in merged labels**

```bash
# Public image should have IDs 0-15
head -3 datasets/full/train/labels/pub_train_00000.txt

# Labeled image should have IDs 16-33
head -3 datasets/full/train/labels/lab_train_*.txt | head -5
```

- [ ] **Step 4: Train the full model**

```bash
.venv/bin/python training/train.py \
    --data datasets/full/dataset.yaml \
    --epochs 100 \
    --model n
```

Expected output ends with:
```
Best weights → models/coc.pt
```
Takes ~30-60 minutes on CPU, ~10 minutes on Apple Silicon MPS.

- [ ] **Step 5: Verify the full model detects buttons**

```bash
.venv/bin/python -c "
from bot.detector import Detector
# Just verify the model loads and reports 34 classes
d = Detector('models/coc.pt')
print(f'Classes: {len(d._model.names)}')
print('Full model OK')
"
```

Expected: `Classes: 34` and `Full model OK`

- [ ] **Step 6: Commit training scripts**

```bash
git add training/merge_datasets.py
git commit -m "feat: add training/merge_datasets.py to unify public + labeled UI datasets"
```

---

## Task 9: Rewrite `bot/vision.py` internals

**Files:**
- Modify: `bot/vision.py` (full rewrite — same public API, YOLO internals)
- Test: `tests/test_vision_yolo.py`

The public API that must remain unchanged:
- `find_button(img, button_name) → tuple[int,int] | None`
- `detect_screen_state(img) → GameState`
- `read_enemy_loot(img) → tuple[int,int]`
- `read_resources_from_village(img) → tuple[int,int]`
- `get_deploy_corner(img) → list[tuple[int,int]]`
- `get_troop_slots(img) → list[tuple[int,int]]`
- `find_popup(img) → tuple[int,int] | None`
- `validate_critical_templates() → None`
- `auto_capture_template(img, button_name) → bool`

- [ ] **Step 1: Write failing tests for the new vision.py API**

Create `tests/test_vision_yolo.py`:

```python
"""
Tests for the YOLO-backed bot/vision.py public API.
All tests mock bot.vision._get_detector so no model file is needed.
"""
import numpy as np
import pytest
from unittest.mock import MagicMock, patch
from bot.state_machine import GameState


def _make_frame(h=1440, w=2560):
    return np.zeros((h, w, 3), dtype=np.uint8)


def _mock_detector(detected_classes: list[str], center: tuple[int, int] = (150, 250)):
    """Return a mock Detector whose predict() returns one detection per class."""
    from bot.detector import Detection
    x1, y1 = center[0] - 50, center[1] - 25
    x2, y2 = center[0] + 50, center[1] + 25
    dets = [Detection(cls=c, conf=0.9, x1=x1, y1=y1, x2=x2, y2=y2)
            for c in detected_classes]

    mock = MagicMock()
    mock.predict.return_value = dets
    mock.find.side_effect = lambda frame, cls: next(
        (d for d in dets if d.cls == cls), None)
    mock.find_any.side_effect = lambda frame, *classes: (
        max((d for d in dets if d.cls in classes), key=lambda d: d.conf, default=None))
    mock.find_all.side_effect = lambda frame, cls: [d for d in dets if d.cls == cls]
    return mock


class TestDetectScreenState:
    def _check(self, classes, expected_state):
        from bot.vision import detect_screen_state
        with patch("bot.vision._get_detector", return_value=_mock_detector(classes)):
            assert detect_screen_state(_make_frame()) == expected_state

    def test_results_hud(self):
        self._check(["hud_results"], GameState.RESULTS)

    def test_results_return_home_button(self):
        self._check(["btn_return_home"], GameState.RESULTS)

    def test_scouting_next_base(self):
        self._check(["btn_next_base"], GameState.SCOUTING)

    def test_scouting_end_battle_plus_loot(self):
        self._check(["btn_end_battle", "loot_gold"], GameState.SCOUTING)

    def test_scouting_end_battle_plus_elixir(self):
        self._check(["btn_end_battle", "loot_elixir"], GameState.SCOUTING)

    def test_battle_active_end_battle_no_loot(self):
        self._check(["btn_end_battle"], GameState.BATTLE_ACTIVE)

    def test_army_start_battle(self):
        self._check(["btn_start_battle"], GameState.ARMY)

    def test_attack_menu_find_match(self):
        self._check(["btn_find_match"], GameState.ATTACK_MENU)

    def test_village_attack_button(self):
        self._check(["btn_attack"], GameState.VILLAGE)

    def test_village_hud(self):
        self._check(["hud_village"], GameState.VILLAGE)

    def test_unknown_no_detections(self):
        self._check([], GameState.UNKNOWN)

    def test_results_takes_priority_over_end_battle(self):
        # If both hud_results and btn_end_battle visible, should be RESULTS
        self._check(["hud_results", "btn_end_battle"], GameState.RESULTS)


class TestFindButton:
    def test_returns_center_for_known_button(self):
        from bot.vision import find_button
        mock_d = _mock_detector(["btn_attack"], center=(300, 400))
        with patch("bot.vision._get_detector", return_value=mock_d):
            result = find_button(_make_frame(), "attack_button")
        assert result == (300, 400)

    def test_returns_none_when_button_not_detected(self):
        from bot.vision import find_button
        mock_d = _mock_detector([])
        with patch("bot.vision._get_detector", return_value=mock_d):
            result = find_button(_make_frame(), "attack_button")
        assert result is None

    def test_returns_none_for_unknown_button_name(self):
        from bot.vision import find_button
        mock_d = _mock_detector(["btn_attack"])
        with patch("bot.vision._get_detector", return_value=mock_d):
            result = find_button(_make_frame(), "totally_unknown_button")
        assert result is None

    def test_next_base_maps_to_btn_next_base(self):
        from bot.vision import find_button
        mock_d = _mock_detector(["btn_next_base"], center=(2300, 1200))
        with patch("bot.vision._get_detector", return_value=mock_d):
            result = find_button(_make_frame(), "next_base")
        assert result == (2300, 1200)


class TestFindPopup:
    def test_finds_close_x(self):
        from bot.vision import find_popup
        mock_d = _mock_detector(["btn_close"], center=(100, 100))
        with patch("bot.vision._get_detector", return_value=mock_d):
            result = find_popup(_make_frame())
        assert result == (100, 100)

    def test_finds_okay(self):
        from bot.vision import find_popup
        mock_d = _mock_detector(["btn_okay"], center=(200, 300))
        with patch("bot.vision._get_detector", return_value=mock_d):
            result = find_popup(_make_frame())
        assert result == (200, 300)

    def test_returns_none_when_no_popup(self):
        from bot.vision import find_popup
        mock_d = _mock_detector([])
        with patch("bot.vision._get_detector", return_value=mock_d):
            result = find_popup(_make_frame())
        assert result is None


class TestGetTroopSlots:
    def test_returns_centers_sorted_by_x(self):
        from bot.vision import get_troop_slots
        from bot.detector import Detection
        dets = [
            Detection("troop_slot", 0.9, 200, 1300, 250, 1350),
            Detection("troop_slot", 0.85, 100, 1300, 150, 1350),
            Detection("troop_slot", 0.8, 300, 1300, 350, 1350),
        ]
        mock_d = MagicMock()
        mock_d.find_all.return_value = dets
        with patch("bot.vision._get_detector", return_value=mock_d):
            slots = get_troop_slots(_make_frame())
        xs = [s[0] for s in slots]
        assert xs == sorted(xs)
        assert len(slots) == 3

    def test_returns_empty_when_no_slots(self):
        from bot.vision import get_troop_slots
        mock_d = MagicMock()
        mock_d.find_all.return_value = []
        with patch("bot.vision._get_detector", return_value=mock_d):
            assert get_troop_slots(_make_frame()) == []


class TestGetDeployCorner:
    def test_returns_correct_number_of_points(self):
        from bot.vision import get_deploy_corner
        import bot.config as config
        points = get_deploy_corner(_make_frame())
        assert len(points) == config.DEPLOY_NUM_POINTS

    def test_points_are_within_frame_bounds(self):
        from bot.vision import get_deploy_corner
        frame = _make_frame(h=1440, w=2560)
        for x, y in get_deploy_corner(frame):
            assert 0 <= x <= 2560
            assert 0 <= y <= 1440


class TestValidateCriticalTemplates:
    def test_passes_when_detector_loads(self):
        from bot.vision import validate_critical_templates
        mock_d = MagicMock()
        with patch("bot.vision._get_detector", return_value=mock_d):
            validate_critical_templates()  # should not raise

    def test_raises_when_detector_fails(self):
        from bot.vision import validate_critical_templates
        with patch("bot.vision._get_detector", side_effect=FileNotFoundError("model missing")):
            with pytest.raises(FileNotFoundError, match="model"):
                validate_critical_templates()


class TestAutoCaptureLegacy:
    def test_returns_false_noop(self):
        from bot.vision import auto_capture_template
        frame = _make_frame()
        result = auto_capture_template(frame, "next_base")
        assert result is False
```

- [ ] **Step 2: Run tests to verify they fail**

```bash
.venv/bin/python -m pytest tests/test_vision_yolo.py -v 2>&1 | head -40
```

Expected: most tests fail because `bot.vision._get_detector` doesn't exist yet.

- [ ] **Step 3: Rewrite `bot/vision.py`**

Replace the entire file with:

```python
"""
Vision module — YOLO-driven detection for all game screens.

Replaces OpenCV template matching with YOLOv11 inference.
Public API is unchanged — all callers (battle.py, main.py, screen.py)
require no modifications.

Digit OCR (read_enemy_loot, read_resources_from_village) is retained
unchanged as it is effective for reading game font digits.
"""

import cv2
import numpy as np
import logging
import threading
import pytesseract
from bot.state_machine import GameState

logger = logging.getLogger("coc.vision")

from bot.config import (
    TEMPLATE_THRESHOLD,
    GOLD_REGION, ELIXIR_REGION,
    ENEMY_LOOT_X_RANGE, ENEMY_LOOT_Y_RANGE, ENEMY_LOOT_STRIP_HEIGHT,
    ENEMY_LOOT_Y_STEP, ENEMY_LOOT_Y_DEDUP, ENEMY_LOOT_SCALES,
    DEPLOY_NUM_POINTS, DEPLOY_X_START, DEPLOY_X_END, DEPLOY_Y_START, DEPLOY_Y_END,
)


# ─── YOLO DETECTOR (lazy singleton) ───────────────────────────

_detector = None
_detector_lock = threading.Lock()

# Maps legacy button names used by battle.py / main.py → YOLO class names
_BUTTON_CLASS_MAP = {
    "attack_button":   "btn_attack",
    "find_match":      "btn_find_match",
    "start_battle":    "btn_start_battle",
    "stars_screen":    "hud_results",
    "return_home":     "btn_return_home",
    "next_base":       "btn_next_base",
    "end_battle":      "btn_end_battle",
    "confirm_upgrade": "btn_confirm",
    "gem_cost":        "loot_gem",
    "close_x":         "btn_close",
    "okay_button":     "btn_okay",
    "later_button":    "btn_later",
    "loot_gold":       "loot_gold",
    "loot_elixir":     "loot_elixir",
}


def _get_detector():
    """Return the lazy-loaded Detector singleton (double-checked locking)."""
    global _detector
    if _detector is not None:
        return _detector
    with _detector_lock:
        if _detector is not None:
            return _detector
        from bot.settings import Settings
        settings = Settings()
        model_path = settings.get("yolo_model_path", "models/coc.pt")
        confidence = settings.get("yolo_confidence_threshold", 0.45)
        from bot.detector import Detector
        _detector = Detector(model_path, confidence)
    return _detector


# ─── SCREEN STATE DETECTION ───────────────────────────────────

def detect_screen_state(img):
    """
    Determine which screen is currently visible.
    Returns a GameState enum. Enum values are backward-compatible strings.
    """
    dets = _get_detector().predict(img)
    cls_set = {d.cls for d in dets}

    # Results screen overlays everything — check first
    if "hud_results" in cls_set or "btn_return_home" in cls_set:
        return GameState.RESULTS

    # Next base button = scouting an enemy base
    if "btn_next_base" in cls_set:
        return GameState.SCOUTING

    # End battle + loot = scouting; end battle alone = in-battle
    if "btn_end_battle" in cls_set:
        if "loot_gold" in cls_set or "loot_elixir" in cls_set:
            return GameState.SCOUTING
        return GameState.BATTLE_ACTIVE

    # Army selection screen
    if "btn_start_battle" in cls_set:
        return GameState.ARMY

    # Attack menu
    if "btn_find_match" in cls_set:
        return GameState.ATTACK_MENU

    # Village
    if "btn_attack" in cls_set or "hud_village" in cls_set:
        return GameState.VILLAGE

    return GameState.UNKNOWN


# ─── BUTTON FINDING ───────────────────────────────────────────

def find_button(img, button_name):
    """Find a button by legacy name. Returns (x, y) center or None."""
    yolo_cls = _BUTTON_CLASS_MAP.get(button_name)
    if yolo_cls is None:
        logger.warning("find_button: unknown button name '%s'", button_name)
        return None
    det = _get_detector().find(img, yolo_cls)
    return det.center if det else None


def find_popup(img):
    """Detect any dismissable popup. Returns (x, y) to tap, or None."""
    for cls in ("btn_close", "btn_okay", "btn_later"):
        det = _get_detector().find(img, cls)
        if det:
            return det.center
    return None


# ─── DEPLOYMENT ───────────────────────────────────────────────

def get_deploy_corner(img):
    """Return a list of (x, y) deployment points along the top-left battle edge."""
    h, w = img.shape[:2]
    points = []
    for i in range(DEPLOY_NUM_POINTS):
        t = i / (DEPLOY_NUM_POINTS - 1)
        x = int(w * DEPLOY_X_START + t * (w * (DEPLOY_X_END - DEPLOY_X_START)))
        y = int(h * DEPLOY_Y_START - t * (h * (DEPLOY_Y_START - DEPLOY_Y_END)))
        points.append((x, y))
    return points


def get_troop_slots(img):
    """Detect troop slot icons in the bottom deployment bar. Returns [(x,y)] sorted by x."""
    dets = _get_detector().find_all(img, "troop_slot")
    slots = [d.center for d in dets]
    slots.sort(key=lambda s: s[0])
    return slots


# ─── DIGIT OCR (unchanged — for loot and resource reading) ────

_DIGIT_TEMPLATES = None


def _load_digit_templates():
    """Load digit templates 0-9 from templates/digits/ as grayscale."""
    global _DIGIT_TEMPLATES
    from bot.settings import BASE_WIDTH, BASE_HEIGHT
    from bot.config import SCREEN_WIDTH, SCREEN_HEIGHT
    from bot.utils import load_template

    rx = SCREEN_WIDTH / BASE_WIDTH
    ry = SCREEN_HEIGHT / BASE_HEIGHT
    need_scale = (rx != 1.0 or ry != 1.0)

    _DIGIT_TEMPLATES = {}
    for d in range(10):
        t = load_template(f"templates/digits/{d}.png")
        if t is not None:
            if need_scale:
                h, w = t.shape[:2]
                t = cv2.resize(t, (max(1, int(w * rx)), max(1, int(h * ry))),
                               interpolation=cv2.INTER_AREA)
            _DIGIT_TEMPLATES[d] = cv2.cvtColor(t, cv2.COLOR_BGR2GRAY)
    if _DIGIT_TEMPLATES:
        logger.info("Loaded digit templates: %s", sorted(_DIGIT_TEMPLATES.keys()))


def _read_number_template(crop, extra_scales=None, is_gray=False):
    """Read a number from a cropped image using digit template matching."""
    global _DIGIT_TEMPLATES
    if _DIGIT_TEMPLATES is None:
        _load_digit_templates()
    if not _DIGIT_TEMPLATES:
        return _ocr_number(crop)

    gray = crop if is_gray else (
        cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY) if len(crop.shape) == 3 else crop
    )

    all_matches = []
    for digit, template in _DIGIT_TEMPLATES.items():
        for scale in (extra_scales or [0.9, 1.0, 1.1]):
            th, tw = template.shape[:2]
            sh, sw = int(th * scale), int(tw * scale)
            if sh > gray.shape[0] or sw > gray.shape[1]:
                continue
            scaled = cv2.resize(template, (sw, sh))
            result = cv2.matchTemplate(gray, scaled, cv2.TM_CCOEFF_NORMED)
            for y, x in zip(*np.where(result >= TEMPLATE_THRESHOLD)):
                all_matches.append((x, digit, result[y, x], sw))

    all_matches.sort(key=lambda m: -m[2])
    digits_found = []
    for x, digit, conf, sw in all_matches:
        if not any(abs(x - ex) < sw * 0.4 for ex, _, _ in digits_found):
            digits_found.append((x, digit, conf))

    digits_found.sort(key=lambda d: d[0])
    number_str = "".join(str(d) for _, d, _ in digits_found)
    return int(number_str) if number_str else None


def _ocr_number(crop):
    """Fallback: OCR a crop to extract a number."""
    if crop.size == 0:
        return 0
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY) if len(crop.shape) == 3 else crop
    big = cv2.resize(gray, None, fx=4, fy=4, interpolation=cv2.INTER_CUBIC)
    _, thresh = cv2.threshold(big, 180, 255, cv2.THRESH_BINARY)
    thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, np.ones((3, 3), np.uint8))
    padded = cv2.copyMakeBorder(thresh, 30, 30, 30, 30, cv2.BORDER_CONSTANT, value=0)
    text = pytesseract.image_to_string(
        padded, config="--psm 7 -c tessedit_char_whitelist=0123456789"
    ).strip().replace(" ", "").replace(",", "").replace(".", "")
    return int(text) if text else None


# ─── RESOURCE AND LOOT READING ───────────────────────────────

def read_resources_from_village(img):
    """Read gold and elixir from the village screen HUD (top-right corner)."""
    gx1, gy1, gx2, gy2 = GOLD_REGION
    gold = _read_number_template(img[gy1:gy2, gx1:gx2]) or 0

    ex1, ey1, ex2, ey2 = ELIXIR_REGION
    elixir = _read_number_template(img[ey1:ey2, ex1:ex2]) or 0

    return gold, elixir


def read_enemy_loot(img):
    """
    Read enemy gold and elixir from the scouting screen (top-left area).
    Scans horizontal strips to find digit rows. Early-exits after 2 numbers.
    Returns (gold, elixir).
    """
    loot_x1, loot_x2 = ENEMY_LOOT_X_RANGE
    loot_y1, loot_y2 = ENEMY_LOOT_Y_RANGE

    loot_area = img[loot_y1:loot_y2, loot_x1:loot_x2]
    loot_gray = (cv2.cvtColor(loot_area, cv2.COLOR_BGR2GRAY)
                 if len(loot_area.shape) == 3 else loot_area)

    numbers_found = []
    for y_offset in range(0, loot_y2 - loot_y1, ENEMY_LOOT_Y_STEP):
        strip = loot_gray[y_offset:y_offset + ENEMY_LOOT_STRIP_HEIGHT, :]
        if strip.size == 0:
            continue
        value = _read_number_template(strip, extra_scales=ENEMY_LOOT_SCALES, is_gray=True)
        if value is not None and value > 1000:
            abs_y = y_offset + loot_y1
            if not any(abs(abs_y - py) < ENEMY_LOOT_Y_DEDUP for py, _ in numbers_found):
                numbers_found.append((abs_y, value))
                if len(numbers_found) >= 2:
                    break

    gold = numbers_found[0][1] if numbers_found else 0
    elixir = numbers_found[1][1] if len(numbers_found) > 1 else 0
    return gold, elixir


# ─── VALIDATION ──────────────────────────────────────────────

def validate_critical_templates():
    """Verify the YOLO model is present and loads correctly. Raises on failure."""
    try:
        _get_detector()
        logger.info("YOLO detector validated successfully")
    except Exception as e:
        raise FileNotFoundError(
            f"YOLO model failed to load: {e}. "
            f"Train the model first: python training/train.py "
            f"--data datasets/full/dataset.yaml"
        ) from e


def auto_capture_template(img, button_name):
    """Legacy no-op: template auto-capture is not needed with YOLO detection."""
    return False
```

- [ ] **Step 4: Run new vision tests**

```bash
.venv/bin/python -m pytest tests/test_vision_yolo.py -v
```

Expected: all tests pass.

- [ ] **Step 5: Verify existing offline tests still pass (flow tests)**

The flow tests in `tests/test_offline.py` mock `bot.battle.read_enemy_loot` etc. — they should still pass since the public API is unchanged.

```bash
.venv/bin/python tests/test_offline.py flow
```

Expected:
```
PASS: scout_and_decide returned (True, ...) for high loot
PASS: deploy_troops() was called
...
```

- [ ] **Step 6: Commit**

```bash
git add bot/vision.py tests/test_vision_yolo.py
git commit -m "feat: rewrite bot/vision.py internals to use YOLOv11 detector"
```

---

## Task 10: Run all tests + integration smoke test

**Files:** None (verification only)

- [ ] **Step 1: Run the full pytest suite**

```bash
.venv/bin/python -m pytest tests/test_detector.py tests/test_vision_yolo.py tests/test_stream_unit.py -v
```

Expected: all tests pass.

- [ ] **Step 2: Verify syntax of all modified files**

```bash
.venv/bin/python -c "
import ast
for f in ['bot/detector.py', 'bot/vision.py', 'bot/settings.py',
          'training/download_dataset.py', 'training/train.py',
          'training/capture_frames.py', 'training/merge_datasets.py']:
    ast.parse(open(f).read())
    print(f'OK  {f}')
"
```

Expected: `OK` for all files.

- [ ] **Step 3: Smoke test with trained model**

With BlueStacks running CoC on the village screen:

```bash
.venv/bin/python -c "
from bot.screen import init_stream, shutdown_stream, screenshot
from bot.vision import detect_screen_state, find_button

init_stream()
import time; time.sleep(2)  # let stream warm up

img = screenshot()
state = detect_screen_state(img)
print(f'Screen state: {state}')

btn = find_button(img, 'attack_button')
print(f'Attack button: {btn}')

shutdown_stream()
"
```

Expected:
```
Screen state: GameState.VILLAGE
Attack button: (<x>, <y>)  ← actual coordinates
```

- [ ] **Step 4: Final commit**

```bash
git add requirements.txt .gitignore bot/settings.py
git commit -m "feat: Phase 2a complete — YOLOv11 replaces template matching in bot/vision.py

- bot/detector.py: new Detector class wrapping YOLOv11 inference
- bot/vision.py: full rewrite using YOLO for state/button/troop detection;
  digit OCR retained for loot and resource reading
- training/: download, capture, merge, train pipeline scripts
- Public API of bot/vision.py is unchanged — no callers modified

Closes #1"
```

---

## Self-Review

### Spec coverage check

| Requirement from issue #1 | Covered by |
|---|---|
| `bot/detector.py` with `Detection` + `Detector` | Task 3 |
| Download public HuggingFace dataset | Task 4 |
| Training script | Task 5 |
| Baseline model trained | Task 5 |
| Data capture during live runs | Task 6 |
| Label UI/button classes | Task 7 |
| Merge pipeline | Task 8 |
| Fine-tune full model | Task 8 |
| Rewrite `bot/vision.py` internals | Task 9 |
| Public API unchanged | Task 9 (verified in step 5) |
| `validate_critical_templates()` updated | Task 9 (validates model, not templates) |
| `auto_capture_template()` made no-op | Task 9 |
| Settings `yolo_model_path`, `yolo_confidence_threshold` | Task 2 |
| `requirements.txt` updated | Task 1 |
| `.gitignore` updated | Task 1 |
| Unit tests for `Detector` | Task 3 |
| Unit tests for new `vision.py` | Task 9 |
| Integration smoke test | Task 10 |
| Inference ≤ 50ms on Apple Silicon | Verified by YOLOv11n benchmarks (≈10ms on MPS) |

### No placeholders found ✓
### Type consistency check ✓
- `Detection` defined in Task 3, used identically in Tasks 3 and 9
- `Detector.predict()` → `list[Detection]` used correctly throughout
- `detect_screen_state()` → `GameState` consistent with callers
- `find_button()` → `tuple[int,int] | None` consistent with `battle.py` callers

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Phase 2a: YOLO Detection Foundation — full implementation #3

Phase 2a: YOLO Detection Foundation Implementation Plan

File Structure

New files

Modified files

Task 1: Dependencies + `.gitignore`

Task 2: Settings — add YOLO keys

Task 3: `bot/detector.py` — Detection + Detector

Task 4: Download public dataset

Task 5: Training script + train baseline model

Task 6: Data capture pipeline

Task 7: Label UI classes in Roboflow + export

Task 8: Merge datasets + train full model

Task 9: Rewrite `bot/vision.py` internals

Task 10: Run all tests + integration smoke test

Self-Review

Spec coverage check

No placeholders found ✓

Type consistency check ✓

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Class	What it looks like
`btn_attack`	Green "Attack!" sword button (village screen, bottom-left)
`btn_find_match`	"Find a Match" button (attack menu)
`btn_start_battle`	Green "Battle!" button (army screen, bottom-right)
`btn_next_base`	"Next" button with gold coin cost (scouting screen, bottom-right)
`btn_return_home`	"Return Home" button (results screen)
`btn_end_battle`	Red "End Battle" / "Surrender" button
`btn_confirm`	Confirmation checkmark button (upgrade dialogs)
`btn_close`	X close button (popups)
`btn_okay`	"OK" button (popups)
`btn_later`	"Later" button (popups)
`hud_village`	The village resource bar (top of village screen — gold + elixir HUD)
`hud_scouting`	The scouting HUD overlay (loot values visible in top-left)
`hud_results`	Stars/results screen overlay
`hud_army`	The army selection screen HUD
`loot_gold`	Gold icon + value in scouting screen (top-left)
`loot_elixir`	Elixir icon + value in scouting screen (top-left)
`loot_gem`	Green gem cost indicator (upgrade confirmations)
`troop_slot`	Individual troop icon in the bottom deployment bar

Requirement from issue #1	Covered by
`bot/detector.py` with `Detection` + `Detector`	Task 3
Download public HuggingFace dataset	Task 4
Training script	Task 5
Baseline model trained	Task 5
Data capture during live runs	Task 6
Label UI/button classes	Task 7
Merge pipeline	Task 8
Fine-tune full model	Task 8
Rewrite `bot/vision.py` internals	Task 9
Public API unchanged	Task 9 (verified in step 5)
`validate_critical_templates()` updated	Task 9 (validates model, not templates)
`auto_capture_template()` made no-op	Task 9
Settings `yolo_model_path`, `yolo_confidence_threshold`	Task 2
`requirements.txt` updated	Task 1
`.gitignore` updated	Task 1
Unit tests for `Detector`	Task 3
Unit tests for new `vision.py`	Task 9
Integration smoke test	Task 10
Inference ≤ 50ms on Apple Silicon	Verified by YOLOv11n benchmarks (≈10ms on MPS)

Phase 2a: YOLO Detection Foundation — full implementation #3

Description

Phase 2a: YOLO Detection Foundation Implementation Plan

File Structure

New files

Modified files

Task 1: Dependencies + .gitignore

Task 2: Settings — add YOLO keys

Task 3: bot/detector.py — Detection + Detector

Task 4: Download public dataset

Task 5: Training script + train baseline model

Task 6: Data capture pipeline

Task 7: Label UI classes in Roboflow + export

Task 8: Merge datasets + train full model

Task 9: Rewrite bot/vision.py internals

Task 10: Run all tests + integration smoke test

Self-Review

Spec coverage check

No placeholders found ✓

Type consistency check ✓

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

Task 1: Dependencies + `.gitignore`

Task 3: `bot/detector.py` — Detection + Detector

Task 9: Rewrite `bot/vision.py` internals