Skip to content

Phase 2a: YOLO Detection Foundation — full implementation #3

@Armin2708

Description

@Armin2708

Phase 2a: YOLO Detection Foundation Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Replace all OpenCV template-matching in bot/vision.py with a YOLOv11 model that detects every game element — buildings, defenses, hero pads, UI buttons, and screen state indicators — while keeping the existing public API unchanged so no callers need to change.

Architecture: A new bot/detector.py module wraps Ultralytics YOLO inference behind a clean Detector class with a predict(frame) → list[Detection] interface. bot/vision.py is rewritten to use this detector while keeping its public functions (find_button, detect_screen_state, get_troop_slots, find_popup, validate_critical_templates) identical in signature. Digit OCR (for loot/resource reading) stays unchanged — YOLO handles detection, template matching handles digit recognition.

Tech Stack: Ultralytics YOLOv11 (ultralytics>=8.3), HuggingFace datasets library for public CoC dataset, Roboflow (web UI, free tier) for manual labeling of UI classes, Python 3.13, pytest


File Structure

New files

  • bot/detector.pyDetection dataclass + Detector class wrapping YOLO inference
  • training/download_dataset.py — downloads keremberke/clash-of-clans-object-detection from HuggingFace, converts COCO→YOLOv11 format
  • training/train.py — fine-tunes YOLOv11n on a dataset YAML, copies best weights to models/coc.pt
  • training/capture_frames.py — saves screenshots + state metadata during live bot runs for building the UI class dataset
  • training/merge_datasets.py — merges public buildings dataset + manually labeled UI dataset into datasets/full/
  • models/.gitkeep — placeholder so the directory is committed (weights are gitignored)
  • tests/test_detector.py — unit tests for bot/detector.py
  • tests/test_vision_yolo.py — unit tests for rewritten bot/vision.py

Modified files

  • requirements.txt — add ultralytics>=8.3, datasets>=2.0
  • .gitignore — add models/*.pt, datasets/, runs/
  • bot/settings.py — add yolo_model_path, yolo_confidence_threshold defaults
  • bot/vision.py — full rewrite of internals (public API unchanged)

Task 1: Dependencies + .gitignore

Files:

  • Modify: requirements.txt

  • Modify: .gitignore

  • Step 1: Add new dependencies to requirements.txt

opencv-python>=4.8.0
numpy>=1.20.0
pytesseract>=0.3.10
Pillow>=9.0.0
PySide6>=6.6.0,<6.11
packaging>=21.0
ultralytics>=8.3
datasets>=2.0
  • Step 2: Add model and training artifacts to .gitignore

Append to .gitignore:

# YOLO model weights (large binaries — download/train locally)
models/*.pt

# Training datasets (downloaded/generated — do not commit)
datasets/

# Ultralytics training run outputs
runs/
  • Step 3: Install new dependencies
.venv/bin/pip install ultralytics>=8.3 "datasets>=2.0"

Expected: installs ultralytics (pulls torch), datasets, huggingface-hub. Takes 1-3 minutes.

  • Step 4: Verify ultralytics works
.venv/bin/python -c "from ultralytics import YOLO; print('ultralytics OK')"

Expected: ultralytics OK (may print version info).

  • Step 5: Create models/ directory with placeholder
mkdir -p models && touch models/.gitkeep
  • Step 6: Commit
git add requirements.txt .gitignore models/.gitkeep
git commit -m "feat: add ultralytics + datasets dependencies for YOLO phase 2a"

Task 2: Settings — add YOLO keys

Files:

  • Modify: bot/settings.py (DEFAULTS dict, around line 146)

  • Step 1: Add YOLO settings to DEFAULTS in bot/settings.py

In the DEFAULTS dict, after the "stream_buffer_size": 60, line, add:

    # YOLO detection model
    "yolo_model_path": "models/coc.pt",
    "yolo_confidence_threshold": 0.45,
  • Step 2: Verify settings loads without error
.venv/bin/python -c "from bot.settings import Settings; s = Settings(); print(s.get('yolo_model_path'))"

Expected: models/coc.pt

  • Step 3: Commit
git add bot/settings.py
git commit -m "feat: add yolo_model_path and yolo_confidence_threshold settings"

Task 3: bot/detector.py — Detection + Detector

Files:

  • Create: bot/detector.py

  • Test: tests/test_detector.py

  • Step 1: Write the failing tests first

Create tests/test_detector.py:

"""Unit tests for bot/detector.py — mocks YOLO so no model file is needed."""
import numpy as np
import pytest
from unittest.mock import MagicMock, patch


def _make_frame(h=1440, w=2560):
    return np.zeros((h, w, 3), dtype=np.uint8)


def _mock_yolo_result(cls_id: int, cls_name: str, conf: float, xyxy: list):
    """Build a minimal mock YOLO result object."""
    mock_box = MagicMock()
    mock_box.cls = [cls_id]
    mock_box.conf = [conf]
    mock_box.xyxy = [np.array(xyxy, dtype=float)]

    mock_result = MagicMock()
    mock_result.boxes = [mock_box]
    mock_result.names = {cls_id: cls_name}
    return mock_result


class TestDetection:
    def test_center_midpoint(self):
        from bot.detector import Detection
        d = Detection(cls="btn_attack", conf=0.9, x1=100, y1=200, x2=200, y2=300)
        assert d.center == (150, 250)

    def test_area(self):
        from bot.detector import Detection
        d = Detection(cls="btn_attack", conf=0.9, x1=0, y1=0, x2=100, y2=50)
        assert d.area == 5000

    def test_bbox_tuple(self):
        from bot.detector import Detection
        d = Detection(cls="x", conf=0.5, x1=10, y1=20, x2=30, y2=40)
        assert d.bbox == (10, 20, 30, 40)


class TestDetectorPredict:
    def test_predict_returns_detection(self):
        from bot.detector import Detector, Detection

        mock_result = _mock_yolo_result(0, "btn_attack", 0.9, [100, 200, 200, 300])
        mock_model = MagicMock(return_value=[mock_result])
        mock_model.names = {0: "btn_attack"}

        with patch("bot.detector.YOLO", return_value=mock_model):
            detector = Detector("fake.pt", confidence=0.5)

        dets = detector.predict(_make_frame())
        assert len(dets) == 1
        assert dets[0].cls == "btn_attack"
        assert dets[0].conf == pytest.approx(0.9)
        assert dets[0].x1 == 100
        assert dets[0].y2 == 300

    def test_predict_empty_result(self):
        from bot.detector import Detector

        mock_result = MagicMock()
        mock_result.boxes = []
        mock_result.names = {}
        mock_model = MagicMock(return_value=[mock_result])

        with patch("bot.detector.YOLO", return_value=mock_model):
            detector = Detector("fake.pt")

        assert detector.predict(_make_frame()) == []

    def test_predict_multiple_detections(self):
        from bot.detector import Detector

        r1 = _mock_yolo_result(0, "btn_attack", 0.9, [0, 0, 100, 50])
        r2 = _mock_yolo_result(1, "btn_next_base", 0.8, [200, 200, 300, 250])

        # Two results in the same inference call
        mock_result_combined = MagicMock()
        box1 = MagicMock()
        box1.cls = [0]; box1.conf = [0.9]; box1.xyxy = [np.array([0, 0, 100, 50])]
        box2 = MagicMock()
        box2.cls = [1]; box2.conf = [0.8]; box2.xyxy = [np.array([200, 200, 300, 250])]
        mock_result_combined.boxes = [box1, box2]
        mock_result_combined.names = {0: "btn_attack", 1: "btn_next_base"}
        mock_model = MagicMock(return_value=[mock_result_combined])

        with patch("bot.detector.YOLO", return_value=mock_model):
            detector = Detector("fake.pt")

        dets = detector.predict(_make_frame())
        assert len(dets) == 2
        assert {d.cls for d in dets} == {"btn_attack", "btn_next_base"}


class TestDetectorFind:
    def _make_detector_with_dets(self, dets):
        from bot.detector import Detector
        detector = Detector.__new__(Detector)
        detector._confidence = 0.5
        detector._model = MagicMock()
        from unittest.mock import patch
        detector._predict_mock = dets
        return detector

    def test_find_returns_highest_confidence(self):
        from bot.detector import Detector, Detection
        detector = Detector.__new__(Detector)
        detector._confidence = 0.5
        det_low = Detection("btn_attack", 0.6, 0, 0, 10, 10)
        det_high = Detection("btn_attack", 0.9, 50, 50, 60, 60)
        det_other = Detection("btn_next_base", 0.95, 100, 100, 110, 110)

        with patch.object(detector, "predict", return_value=[det_low, det_high, det_other]):
            result = detector.find(_make_frame(), "btn_attack")

        assert result is det_high

    def test_find_returns_none_when_absent(self):
        from bot.detector import Detector, Detection
        detector = Detector.__new__(Detector)
        det = Detection("btn_attack", 0.9, 0, 0, 10, 10)

        with patch.object(detector, "predict", return_value=[det]):
            result = detector.find(_make_frame(), "btn_next_base")

        assert result is None

    def test_find_any_returns_first_match(self):
        from bot.detector import Detector, Detection
        detector = Detector.__new__(Detector)
        det1 = Detection("btn_okay", 0.8, 0, 0, 10, 10)
        det2 = Detection("btn_close", 0.85, 50, 50, 60, 60)

        with patch.object(detector, "predict", return_value=[det1, det2]):
            result = detector.find_any(_make_frame(), "btn_close", "btn_okay", "btn_later")

        # Should return highest-conf among the matching classes
        assert result is det2

    def test_find_all_filters_by_class(self):
        from bot.detector import Detector, Detection
        detector = Detector.__new__(Detector)
        dets = [
            Detection("troop_slot", 0.9, 0, 0, 50, 50),
            Detection("troop_slot", 0.85, 60, 0, 110, 50),
            Detection("btn_attack", 0.9, 200, 200, 250, 250),
        ]

        with patch.object(detector, "predict", return_value=dets):
            result = detector.find_all(_make_frame(), "troop_slot")

        assert len(result) == 2
        assert all(d.cls == "troop_slot" for d in result)
  • Step 2: Run tests to verify they fail
.venv/bin/python -m pytest tests/test_detector.py -v 2>&1 | head -30

Expected: ModuleNotFoundError: No module named 'bot.detector'

  • Step 3: Create bot/detector.py
"""
YOLO-based game element detector.

Wraps Ultralytics YOLOv11 inference with a minimal interface:
  Detector.predict(frame)  → list[Detection]
  Detector.find(frame, cls)  → Detection | None
  Detector.find_any(frame, *classes)  → Detection | None
  Detector.find_all(frame, cls)  → list[Detection]
"""

import logging
import numpy as np
from dataclasses import dataclass

logger = logging.getLogger("coc.detector")

# Imported at module level so tests can patch it cleanly
from ultralytics import YOLO


@dataclass
class Detection:
    cls: str
    conf: float
    x1: int
    y1: int
    x2: int
    y2: int

    @property
    def center(self) -> tuple[int, int]:
        return ((self.x1 + self.x2) // 2, (self.y1 + self.y2) // 2)

    @property
    def area(self) -> int:
        return (self.x2 - self.x1) * (self.y2 - self.y1)

    @property
    def bbox(self) -> tuple[int, int, int, int]:
        return (self.x1, self.y1, self.x2, self.y2)


class Detector:
    """Load a YOLOv11 model and run inference on game screenshots."""

    def __init__(self, model_path: str, confidence: float = 0.45):
        logger.info("Loading YOLO model from %s", model_path)
        self._model = YOLO(model_path)
        self._confidence = confidence
        logger.info(
            "Model loaded — %d classes, first 5: %s",
            len(self._model.names),
            list(self._model.names.values())[:5],
        )

    def predict(self, frame: np.ndarray) -> list[Detection]:
        """Run inference on a BGR frame. Returns all detections above confidence."""
        results = self._model(frame, conf=self._confidence, verbose=False)
        detections: list[Detection] = []
        for r in results:
            if r.boxes is None:
                continue
            names = r.names
            for box in r.boxes:
                cls_id = int(box.cls[0])
                detections.append(Detection(
                    cls=names[cls_id],
                    conf=float(box.conf[0]),
                    x1=int(box.xyxy[0][0]),
                    y1=int(box.xyxy[0][1]),
                    x2=int(box.xyxy[0][2]),
                    y2=int(box.xyxy[0][3]),
                ))
        return detections

    def find(self, frame: np.ndarray, cls: str) -> "Detection | None":
        """Return the highest-confidence detection of a given class, or None."""
        matches = [d for d in self.predict(frame) if d.cls == cls]
        return max(matches, key=lambda d: d.conf) if matches else None

    def find_any(self, frame: np.ndarray, *classes: str) -> "Detection | None":
        """Return the highest-confidence detection among multiple classes."""
        matches = [d for d in self.predict(frame) if d.cls in classes]
        return max(matches, key=lambda d: d.conf) if matches else None

    def find_all(self, frame: np.ndarray, cls: str) -> list[Detection]:
        """Return all detections of a given class."""
        return [d for d in self.predict(frame) if d.cls == cls]
  • Step 4: Run tests to verify they pass
.venv/bin/python -m pytest tests/test_detector.py -v

Expected: all 13 tests pass.

  • Step 5: Commit
git add bot/detector.py tests/test_detector.py
git commit -m "feat: add Detector class wrapping YOLOv11 inference (bot/detector.py)"

Task 4: Download public dataset

Files:

  • Create: training/download_dataset.py

The public HuggingFace dataset (keremberke/clash-of-clans-object-detection) has 125 images, 16 building/defense classes, in COCO format. This task downloads it and converts to YOLOv11 TXT format.

  • Step 1: Create training/download_dataset.py
#!/usr/bin/env python3
"""
Download the public Clash of Clans dataset from HuggingFace and convert
from COCO JSON format to YOLOv11 TXT format.

Source: keremberke/clash-of-clans-object-detection (CC BY 4.0, 125 images)
16 classes: buildings and defensive structures.

Usage:
    python training/download_dataset.py
    python training/download_dataset.py --output datasets/public
"""

import argparse
from pathlib import Path


# Class list from the HuggingFace dataset (index = YOLO class ID)
PUBLIC_CLASSES = [
    "ad", "airsweeper", "bombtower", "canon", "clancastle",
    "eagle", "inferno", "kingpad", "mortar", "queenpad",
    "rcpad", "scattershot", "th13", "wardenpad", "wizztower", "xbow",
]


def download_and_convert(output_dir: str = "datasets/public") -> None:
    from datasets import load_dataset

    output = Path(output_dir)
    for split in ("train", "validation", "test"):
        (output / split / "images").mkdir(parents=True, exist_ok=True)
        (output / split / "labels").mkdir(parents=True, exist_ok=True)

    print("Downloading from HuggingFace (keremberke/clash-of-clans-object-detection)...")
    ds = load_dataset("keremberke/clash-of-clans-object-detection", name="full")

    split_map = {"train": "train", "validation": "validation", "test": "test"}
    for hf_split, out_split in split_map.items():
        split_data = ds[hf_split]
        print(f"  Converting {hf_split} ({len(split_data)} images)...")

        for idx, example in enumerate(split_data):
            img = example["image"]  # PIL Image
            w, h = img.size

            img_path = output / out_split / "images" / f"{idx:05d}.jpg"
            img.save(str(img_path), quality=95)

            annotations = example["objects"]
            lines = []
            for bbox, cat_id in zip(annotations["bbox"], annotations["category"]):
                # COCO bbox: [x_topleft, y_topleft, width, height] in pixels
                bx, by, bw, bh = bbox
                cx = (bx + bw / 2) / w
                cy = (by + bh / 2) / h
                nw = bw / w
                nh = bh / h
                lines.append(f"{cat_id} {cx:.6f} {cy:.6f} {nw:.6f} {nh:.6f}")

            lbl_path = output / out_split / "labels" / f"{idx:05d}.txt"
            lbl_path.write_text("\n".join(lines))

    # Write dataset.yaml for training
    yaml_path = output / "dataset.yaml"
    yaml_path.write_text(
        f"# Public CoC baseline dataset (buildings + defenses only)\n"
        f"# Source: keremberke/clash-of-clans-object-detection (HuggingFace, CC BY 4.0)\n\n"
        f"path: {output.resolve()}\n"
        f"train: train/images\n"
        f"val: validation/images\n"
        f"test: test/images\n\n"
        f"nc: {len(PUBLIC_CLASSES)}\n"
        f"names: {PUBLIC_CLASSES}\n"
    )

    total = sum(len(ds[s]) for s in split_map)
    print(f"\nDataset ready at {output}/")
    print(f"  Total: {total} images, {len(PUBLIC_CLASSES)} classes")
    print(f"  Classes: {PUBLIC_CLASSES}")
    print(f"  Config:  {yaml_path}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Download public CoC YOLO dataset")
    parser.add_argument("--output", default="datasets/public",
                        help="Output directory (default: datasets/public)")
    args = parser.parse_args()
    download_and_convert(args.output)
  • Step 2: Run the download
.venv/bin/python training/download_dataset.py

Expected output (takes 2-5 minutes depending on connection):

Downloading from HuggingFace (keremberke/clash-of-clans-object-detection)...
  Converting train (88 images)...
  Converting validation (24 images)...
  Converting test (13 images)...

Dataset ready at datasets/public/
  Total: 125 images, 16 classes
  Classes: ['ad', 'airsweeper', ...]
  Config:  datasets/public/dataset.yaml
  • Step 3: Verify output structure
ls datasets/public/train/images/ | head -5
ls datasets/public/train/labels/ | head -5
head -3 datasets/public/train/labels/00000.txt

Expected: image files 00000.jpg..., label files 00000.txt..., each label line like 3 0.512345 0.234567 0.123456 0.098765

  • Step 4: Commit
git add training/download_dataset.py
git commit -m "feat: add training/download_dataset.py for public CoC HuggingFace dataset"

Task 5: Training script + train baseline model

Files:

  • Create: training/train.py

  • Step 1: Create training/train.py

#!/usr/bin/env python3
"""
Fine-tune YOLOv11n on a Clash of Clans dataset.

Starts from the COCO-pretrained YOLOv11n weights, fine-tunes on the
provided dataset, and copies the best checkpoint to models/coc.pt.

Usage:
    # Baseline (public buildings only, 16 classes):
    python training/train.py --data datasets/public/dataset.yaml --epochs 50

    # Full model (all classes including UI buttons):
    python training/train.py --data datasets/full/dataset.yaml --epochs 100

    # Resume from last checkpoint:
    python training/train.py --data datasets/full/dataset.yaml --resume
"""

import argparse
import shutil
from pathlib import Path


def train(
    data_yaml: str,
    epochs: int = 50,
    model_size: str = "n",
    resume: bool = False,
    batch: int = 16,
    imgsz: int = 640,
) -> None:
    from ultralytics import YOLO

    last_ckpt = Path("runs/detect/coc/weights/last.pt")
    if resume and last_ckpt.exists():
        print(f"Resuming from {last_ckpt}")
        model = YOLO(str(last_ckpt))
    else:
        base = f"yolo11{model_size}.pt"
        print(f"Starting from {base} (COCO-pretrained)")
        model = YOLO(base)

    model.train(
        data=data_yaml,
        epochs=epochs,
        imgsz=imgsz,
        batch=batch,
        name="coc",
        project="runs/detect",
        patience=20,
        save=True,
        plots=True,
        exist_ok=True,
    )

    best = Path("runs/detect/coc/weights/best.pt")
    if best.exists():
        Path("models").mkdir(exist_ok=True)
        dest = Path("models/coc.pt")
        shutil.copy(best, dest)
        print(f"\nBest weights → {dest}")
    else:
        print("WARNING: best.pt not found — check runs/detect/coc/weights/")


if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--data", default="datasets/public/dataset.yaml")
    p.add_argument("--epochs", type=int, default=50)
    p.add_argument("--model", default="n", choices=["n", "s", "m"],
                   help="YOLOv11 size: n=nano, s=small, m=medium")
    p.add_argument("--batch", type=int, default=16)
    p.add_argument("--imgsz", type=int, default=640)
    p.add_argument("--resume", action="store_true")
    args = p.parse_args()
    train(args.data, args.epochs, args.model, args.resume, args.batch, args.imgsz)
  • Step 2: Train the baseline model (16 building classes)
.venv/bin/python training/train.py \
    --data datasets/public/dataset.yaml \
    --epochs 50 \
    --model n

Expected: Ultralytics training output with per-epoch mAP. Finishes with:

Best weights → models/coc.pt

Takes ~10-20 minutes on CPU, ~3 minutes on Apple Silicon MPS.

  • Step 3: Verify baseline model detects buildings
.venv/bin/python -c "
from bot.detector import Detector
import cv2, numpy as np

d = Detector('models/coc.pt')
# Test on a blank image — should return empty
blank = np.zeros((1440, 2560, 3), dtype=np.uint8)
dets = d.predict(blank)
print(f'Blank image: {len(dets)} detections (expected 0)')
print('Detector loaded OK')
"

Expected: Blank image: 0 detections (expected 0) and Detector loaded OK

  • Step 4: Commit training script
git add training/train.py
git commit -m "feat: add training/train.py for YOLOv11 fine-tuning"

Task 6: Data capture pipeline

Files:

  • Create: training/capture_frames.py

This script runs alongside the bot, saving screenshots every N seconds with metadata. The saved screenshots will be labeled in Roboflow (Task 7) to add UI button and screen state classes.

  • Step 1: Create training/capture_frames.py
#!/usr/bin/env python3
"""
Capture screenshots during live bot runs for building the YOLO training dataset.

Saves each screenshot as a JPG + JSON metadata file. Screenshots should then
be uploaded to Roboflow (free tier) for manual annotation of UI classes
not covered by the public building dataset.

Two ways to use:
  1. Standalone: run alongside the bot in a separate terminal
     python training/capture_frames.py --interval 5 --output datasets/captured

  2. Programmatic: call capture_one() from key bot events
     from training.capture_frames import capture_one
     capture_one("scouting")  # saves one screenshot with hint label

Output per screenshot:
  datasets/captured/20260327_143201_123456_scouting.jpg
  datasets/captured/20260327_143201_123456_scouting.json  ← state + hint
"""

import os
import json
import time
import threading
import argparse
from pathlib import Path
from datetime import datetime

_output_dir = Path("datasets/captured")
_stop_event = threading.Event()


def capture_one(hint: str = "auto", output_dir: Path | None = None) -> Path | None:
    """Save one screenshot + metadata. Returns the image path, or None on error."""
    try:
        from bot.screen import screenshot
        from bot.vision import detect_screen_state
        import cv2

        out = output_dir or _output_dir
        out.mkdir(parents=True, exist_ok=True)

        img = screenshot()
        state = str(detect_screen_state(img))
        ts = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
        stem = f"{ts}_{hint}"

        img_path = out / f"{stem}.jpg"
        cv2.imwrite(str(img_path), img, [cv2.IMWRITE_JPEG_QUALITY, 95])

        meta = {"timestamp": ts, "state": state, "hint": hint}
        (out / f"{stem}.json").write_text(json.dumps(meta, indent=2))

        return img_path
    except Exception as e:
        print(f"[capture] Error: {e}")
        return None


def _capture_loop(interval: float, output_dir: Path) -> None:
    print(f"[capture] Saving to {output_dir}/ every {interval}s  (Ctrl+C to stop)")
    count = 0
    while not _stop_event.is_set():
        path = capture_one("auto", output_dir)
        if path:
            count += 1
            print(f"[capture] #{count}  {path.name}")
        time.sleep(interval)


def run_standalone(interval: float = 5.0, output_dir: str = "datasets/captured") -> None:
    """Run capture loop in the foreground until Ctrl+C."""
    _stop_event.clear()
    out = Path(output_dir)
    try:
        _capture_loop(interval, out)
    except KeyboardInterrupt:
        print(f"\n[capture] Stopped. {len(list(out.glob('*.jpg')))} screenshots saved.")


if __name__ == "__main__":
    p = argparse.ArgumentParser(description="Capture bot screenshots for YOLO labeling")
    p.add_argument("--interval", type=float, default=5.0,
                   help="Seconds between captures (default: 5)")
    p.add_argument("--output", default="datasets/captured",
                   help="Output directory (default: datasets/captured)")
    args = p.parse_args()
    run_standalone(args.interval, args.output)
  • Step 2: Verify the script runs without import errors
.venv/bin/python -c "import training.capture_frames; print('import OK')"

Expected: import OK (bot stream not started — that's fine, errors only on capture_one() calls).

  • Step 3: Run capture during a live bot session

With the bot running (./start.sh → click Start), open a second terminal and run:

.venv/bin/python training/capture_frames.py --interval 5 --output datasets/captured

Let it run for at least one full attack cycle (village → scouting → battle → results → village). Stop with Ctrl+C.

Expected: 30-100 screenshots saved to datasets/captured/, covering all game states.

  • Step 4: Verify captures have good state coverage
python -c "
import json
from pathlib import Path
from collections import Counter

meta_files = list(Path('datasets/captured').glob('*.json'))
states = Counter(json.loads(f.read_text())['state'] for f in meta_files)
for state, count in sorted(states.items()):
    print(f'  {state}: {count}')
print(f'Total: {sum(states.values())}')
"

Expected: coverage across GameState.VILLAGE, GameState.SCOUTING, GameState.BATTLE_ACTIVE, GameState.RESULTS. Aim for at least 20 images per state before proceeding.

  • Step 5: Commit
git add training/capture_frames.py
git commit -m "feat: add training/capture_frames.py for live screenshot collection"

Task 7: Label UI classes in Roboflow + export

Files: None (manual labeling step)

The public dataset covers buildings but has no UI buttons or screen state indicators. This task labels those classes on the captured screenshots.

Classes to label (18 new classes, on top of the 16 public building classes):

Class What it looks like
btn_attack Green "Attack!" sword button (village screen, bottom-left)
btn_find_match "Find a Match" button (attack menu)
btn_start_battle Green "Battle!" button (army screen, bottom-right)
btn_next_base "Next" button with gold coin cost (scouting screen, bottom-right)
btn_return_home "Return Home" button (results screen)
btn_end_battle Red "End Battle" / "Surrender" button
btn_confirm Confirmation checkmark button (upgrade dialogs)
btn_close X close button (popups)
btn_okay "OK" button (popups)
btn_later "Later" button (popups)
hud_village The village resource bar (top of village screen — gold + elixir HUD)
hud_scouting The scouting HUD overlay (loot values visible in top-left)
hud_results Stars/results screen overlay
hud_army The army selection screen HUD
loot_gold Gold icon + value in scouting screen (top-left)
loot_elixir Elixir icon + value in scouting screen (top-left)
loot_gem Green gem cost indicator (upgrade confirmations)
troop_slot Individual troop icon in the bottom deployment bar

Steps:

  • Step 1: Upload captured screenshots to Roboflow
  1. Go to roboflow.com → Create free account
  2. Create new project: "CoC-UI-Classes", type "Object Detection"
  3. Upload all screenshots from datasets/captured/*.jpg
  • Step 2: Add the 18 class labels in Roboflow

In Roboflow project settings → Classes, add each class from the table above.

  • Step 3: Label at least 15 examples of each button class

Use Roboflow's annotation tool. Draw tight bounding boxes around each UI element.

Tips:

  • btn_attack appears on every village screenshot

  • btn_next_base appears on every scouting screenshot — label it in every scouting image

  • hud_results, btn_return_home appear together on results screens

  • For troop_slot, label each visible troop icon in the bottom bar (there will be 4-8 per battle screenshot)

  • Skip hud_village / hud_scouting / hud_army if they're ambiguous — prioritize the button classes

  • Step 4: Export in YOLOv11 format

In Roboflow: Generate → Export → Format: "YOLOv11" → Download ZIP.

Extract to datasets/labeled/:

mkdir -p datasets/labeled
unzip ~/Downloads/CoC-UI-Classes.v1.yolov11.zip -d datasets/labeled/

Verify structure:

ls datasets/labeled/
# Expected: train/ valid/ test/ data.yaml
  • Step 5: No commit needed (datasets/ is gitignored)

Task 8: Merge datasets + train full model

Files:

  • Create: training/merge_datasets.py

  • Step 1: Create training/merge_datasets.py

#!/usr/bin/env python3
"""
Merge the public building dataset (16 classes) with the manually labeled
UI dataset (18 classes) into a unified dataset for the full model.

The public dataset uses class IDs 0-15.
The labeled dataset uses class IDs 0-17 (Roboflow resets to 0).
This script remaps labeled IDs → 16-33 so they don't conflict.

Usage:
    python training/merge_datasets.py \
        --public datasets/public \
        --labeled datasets/labeled \
        --output datasets/full
"""

import shutil
import argparse
from pathlib import Path


PUBLIC_CLASSES = [
    "ad", "airsweeper", "bombtower", "canon", "clancastle",
    "eagle", "inferno", "kingpad", "mortar", "queenpad",
    "rcpad", "scattershot", "th13", "wardenpad", "wizztower", "xbow",
]

# Must match the class order used in Roboflow when labeling.
# Check datasets/labeled/data.yaml for the exact order Roboflow assigned.
LABELED_CLASSES = [
    "btn_attack", "btn_find_match", "btn_start_battle", "btn_next_base",
    "btn_return_home", "btn_end_battle", "btn_confirm", "btn_close",
    "btn_okay", "btn_later", "hud_village", "hud_scouting", "hud_results",
    "hud_army", "loot_gold", "loot_elixir", "loot_gem", "troop_slot",
]

ALL_CLASSES = PUBLIC_CLASSES + LABELED_CLASSES


def _copy_split(src_img: Path, src_lbl: Path, dst_img: Path, dst_lbl: Path,
                prefix: str, class_offset: int) -> int:
    """Copy images and labels (with class ID remapping) to destination."""
    if not src_img.exists():
        return 0
    count = 0
    for img_path in sorted(src_img.glob("*.jpg")):
        shutil.copy(img_path, dst_img / f"{prefix}_{img_path.name}")
        lbl_path = src_lbl / img_path.with_suffix(".txt").name
        dst_lbl_path = dst_lbl / f"{prefix}_{lbl_path.name}"
        if lbl_path.exists():
            lines = [l for l in lbl_path.read_text().strip().split("\n") if l.strip()]
            remapped = []
            for line in lines:
                parts = line.split()
                parts[0] = str(int(parts[0]) + class_offset)
                remapped.append(" ".join(parts))
            dst_lbl_path.write_text("\n".join(remapped))
        else:
            dst_lbl_path.write_text("")
        count += 1
    return count


def merge(public_dir: str, labeled_dir: str, output_dir: str) -> None:
    pub = Path(public_dir)
    lab = Path(labeled_dir)
    out = Path(output_dir)

    for split in ("train", "val"):
        (out / split / "images").mkdir(parents=True, exist_ok=True)
        (out / split / "labels").mkdir(parents=True, exist_ok=True)

    total = 0

    # Public dataset: class IDs 0-15, no remapping needed
    # train split + test split both go to train (small dataset — maximize training data)
    for src_split, dst_split in [("train", "train"), ("test", "train")]:
        n = _copy_split(
            pub / src_split / "images", pub / src_split / "labels",
            out / dst_split / "images", out / dst_split / "labels",
            prefix=f"pub_{src_split}", class_offset=0,
        )
        total += n
    n = _copy_split(
        pub / "validation" / "images", pub / "validation" / "labels",
        out / "val" / "images", out / "val" / "labels",
        prefix="pub_val", class_offset=0,
    )
    total += n

    # Labeled dataset: Roboflow class IDs start at 0, remap to 16+
    labeled_offset = len(PUBLIC_CLASSES)
    for src_split, dst_split in [("train", "train"), ("test", "train")]:
        n = _copy_split(
            lab / src_split / "images", lab / src_split / "labels",
            out / dst_split / "images", out / dst_split / "labels",
            prefix=f"lab_{src_split}", class_offset=labeled_offset,
        )
        total += n
    for src_split in ("valid", "validation"):
        src_img = lab / src_split / "images"
        if src_img.exists():
            n = _copy_split(
                src_img, lab / src_split / "labels",
                out / "val" / "images", out / "val" / "labels",
                prefix="lab_val", class_offset=labeled_offset,
            )
            total += n
            break

    # Write unified dataset.yaml
    yaml_path = out / "dataset.yaml"
    yaml_path.write_text(
        f"# Full CoC dataset: public buildings (16 classes) + labeled UI (18 classes)\n\n"
        f"path: {out.resolve()}\n"
        f"train: train/images\n"
        f"val: val/images\n\n"
        f"nc: {len(ALL_CLASSES)}\n"
        f"names: {ALL_CLASSES}\n"
    )

    print(f"Merged {total} images → {out}/")
    print(f"Classes ({len(ALL_CLASSES)}): {ALL_CLASSES[:5]}... +{len(ALL_CLASSES) - 5} more")
    print(f"Config: {yaml_path}")


if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--public", default="datasets/public")
    p.add_argument("--labeled", default="datasets/labeled")
    p.add_argument("--output", default="datasets/full")
    args = p.parse_args()
    merge(args.public, args.labeled, args.output)
  • Step 2: Run the merge
.venv/bin/python training/merge_datasets.py \
    --public datasets/public \
    --labeled datasets/labeled \
    --output datasets/full

Expected:

Merged ~300 images → datasets/full/
Classes (34): ['ad', 'airsweeper', ...]... +29 more
Config: datasets/full/dataset.yaml
  • Step 3: Verify class IDs in merged labels
# Public image should have IDs 0-15
head -3 datasets/full/train/labels/pub_train_00000.txt

# Labeled image should have IDs 16-33
head -3 datasets/full/train/labels/lab_train_*.txt | head -5
  • Step 4: Train the full model
.venv/bin/python training/train.py \
    --data datasets/full/dataset.yaml \
    --epochs 100 \
    --model n

Expected output ends with:

Best weights → models/coc.pt

Takes ~30-60 minutes on CPU, ~10 minutes on Apple Silicon MPS.

  • Step 5: Verify the full model detects buttons
.venv/bin/python -c "
from bot.detector import Detector
# Just verify the model loads and reports 34 classes
d = Detector('models/coc.pt')
print(f'Classes: {len(d._model.names)}')
print('Full model OK')
"

Expected: Classes: 34 and Full model OK

  • Step 6: Commit training scripts
git add training/merge_datasets.py
git commit -m "feat: add training/merge_datasets.py to unify public + labeled UI datasets"

Task 9: Rewrite bot/vision.py internals

Files:

  • Modify: bot/vision.py (full rewrite — same public API, YOLO internals)
  • Test: tests/test_vision_yolo.py

The public API that must remain unchanged:

  • find_button(img, button_name) → tuple[int,int] | None

  • detect_screen_state(img) → GameState

  • read_enemy_loot(img) → tuple[int,int]

  • read_resources_from_village(img) → tuple[int,int]

  • get_deploy_corner(img) → list[tuple[int,int]]

  • get_troop_slots(img) → list[tuple[int,int]]

  • find_popup(img) → tuple[int,int] | None

  • validate_critical_templates() → None

  • auto_capture_template(img, button_name) → bool

  • Step 1: Write failing tests for the new vision.py API

Create tests/test_vision_yolo.py:

"""
Tests for the YOLO-backed bot/vision.py public API.
All tests mock bot.vision._get_detector so no model file is needed.
"""
import numpy as np
import pytest
from unittest.mock import MagicMock, patch
from bot.state_machine import GameState


def _make_frame(h=1440, w=2560):
    return np.zeros((h, w, 3), dtype=np.uint8)


def _mock_detector(detected_classes: list[str], center: tuple[int, int] = (150, 250)):
    """Return a mock Detector whose predict() returns one detection per class."""
    from bot.detector import Detection
    x1, y1 = center[0] - 50, center[1] - 25
    x2, y2 = center[0] + 50, center[1] + 25
    dets = [Detection(cls=c, conf=0.9, x1=x1, y1=y1, x2=x2, y2=y2)
            for c in detected_classes]

    mock = MagicMock()
    mock.predict.return_value = dets
    mock.find.side_effect = lambda frame, cls: next(
        (d for d in dets if d.cls == cls), None)
    mock.find_any.side_effect = lambda frame, *classes: (
        max((d for d in dets if d.cls in classes), key=lambda d: d.conf, default=None))
    mock.find_all.side_effect = lambda frame, cls: [d for d in dets if d.cls == cls]
    return mock


class TestDetectScreenState:
    def _check(self, classes, expected_state):
        from bot.vision import detect_screen_state
        with patch("bot.vision._get_detector", return_value=_mock_detector(classes)):
            assert detect_screen_state(_make_frame()) == expected_state

    def test_results_hud(self):
        self._check(["hud_results"], GameState.RESULTS)

    def test_results_return_home_button(self):
        self._check(["btn_return_home"], GameState.RESULTS)

    def test_scouting_next_base(self):
        self._check(["btn_next_base"], GameState.SCOUTING)

    def test_scouting_end_battle_plus_loot(self):
        self._check(["btn_end_battle", "loot_gold"], GameState.SCOUTING)

    def test_scouting_end_battle_plus_elixir(self):
        self._check(["btn_end_battle", "loot_elixir"], GameState.SCOUTING)

    def test_battle_active_end_battle_no_loot(self):
        self._check(["btn_end_battle"], GameState.BATTLE_ACTIVE)

    def test_army_start_battle(self):
        self._check(["btn_start_battle"], GameState.ARMY)

    def test_attack_menu_find_match(self):
        self._check(["btn_find_match"], GameState.ATTACK_MENU)

    def test_village_attack_button(self):
        self._check(["btn_attack"], GameState.VILLAGE)

    def test_village_hud(self):
        self._check(["hud_village"], GameState.VILLAGE)

    def test_unknown_no_detections(self):
        self._check([], GameState.UNKNOWN)

    def test_results_takes_priority_over_end_battle(self):
        # If both hud_results and btn_end_battle visible, should be RESULTS
        self._check(["hud_results", "btn_end_battle"], GameState.RESULTS)


class TestFindButton:
    def test_returns_center_for_known_button(self):
        from bot.vision import find_button
        mock_d = _mock_detector(["btn_attack"], center=(300, 400))
        with patch("bot.vision._get_detector", return_value=mock_d):
            result = find_button(_make_frame(), "attack_button")
        assert result == (300, 400)

    def test_returns_none_when_button_not_detected(self):
        from bot.vision import find_button
        mock_d = _mock_detector([])
        with patch("bot.vision._get_detector", return_value=mock_d):
            result = find_button(_make_frame(), "attack_button")
        assert result is None

    def test_returns_none_for_unknown_button_name(self):
        from bot.vision import find_button
        mock_d = _mock_detector(["btn_attack"])
        with patch("bot.vision._get_detector", return_value=mock_d):
            result = find_button(_make_frame(), "totally_unknown_button")
        assert result is None

    def test_next_base_maps_to_btn_next_base(self):
        from bot.vision import find_button
        mock_d = _mock_detector(["btn_next_base"], center=(2300, 1200))
        with patch("bot.vision._get_detector", return_value=mock_d):
            result = find_button(_make_frame(), "next_base")
        assert result == (2300, 1200)


class TestFindPopup:
    def test_finds_close_x(self):
        from bot.vision import find_popup
        mock_d = _mock_detector(["btn_close"], center=(100, 100))
        with patch("bot.vision._get_detector", return_value=mock_d):
            result = find_popup(_make_frame())
        assert result == (100, 100)

    def test_finds_okay(self):
        from bot.vision import find_popup
        mock_d = _mock_detector(["btn_okay"], center=(200, 300))
        with patch("bot.vision._get_detector", return_value=mock_d):
            result = find_popup(_make_frame())
        assert result == (200, 300)

    def test_returns_none_when_no_popup(self):
        from bot.vision import find_popup
        mock_d = _mock_detector([])
        with patch("bot.vision._get_detector", return_value=mock_d):
            result = find_popup(_make_frame())
        assert result is None


class TestGetTroopSlots:
    def test_returns_centers_sorted_by_x(self):
        from bot.vision import get_troop_slots
        from bot.detector import Detection
        dets = [
            Detection("troop_slot", 0.9, 200, 1300, 250, 1350),
            Detection("troop_slot", 0.85, 100, 1300, 150, 1350),
            Detection("troop_slot", 0.8, 300, 1300, 350, 1350),
        ]
        mock_d = MagicMock()
        mock_d.find_all.return_value = dets
        with patch("bot.vision._get_detector", return_value=mock_d):
            slots = get_troop_slots(_make_frame())
        xs = [s[0] for s in slots]
        assert xs == sorted(xs)
        assert len(slots) == 3

    def test_returns_empty_when_no_slots(self):
        from bot.vision import get_troop_slots
        mock_d = MagicMock()
        mock_d.find_all.return_value = []
        with patch("bot.vision._get_detector", return_value=mock_d):
            assert get_troop_slots(_make_frame()) == []


class TestGetDeployCorner:
    def test_returns_correct_number_of_points(self):
        from bot.vision import get_deploy_corner
        import bot.config as config
        points = get_deploy_corner(_make_frame())
        assert len(points) == config.DEPLOY_NUM_POINTS

    def test_points_are_within_frame_bounds(self):
        from bot.vision import get_deploy_corner
        frame = _make_frame(h=1440, w=2560)
        for x, y in get_deploy_corner(frame):
            assert 0 <= x <= 2560
            assert 0 <= y <= 1440


class TestValidateCriticalTemplates:
    def test_passes_when_detector_loads(self):
        from bot.vision import validate_critical_templates
        mock_d = MagicMock()
        with patch("bot.vision._get_detector", return_value=mock_d):
            validate_critical_templates()  # should not raise

    def test_raises_when_detector_fails(self):
        from bot.vision import validate_critical_templates
        with patch("bot.vision._get_detector", side_effect=FileNotFoundError("model missing")):
            with pytest.raises(FileNotFoundError, match="model"):
                validate_critical_templates()


class TestAutoCaptureLegacy:
    def test_returns_false_noop(self):
        from bot.vision import auto_capture_template
        frame = _make_frame()
        result = auto_capture_template(frame, "next_base")
        assert result is False
  • Step 2: Run tests to verify they fail
.venv/bin/python -m pytest tests/test_vision_yolo.py -v 2>&1 | head -40

Expected: most tests fail because bot.vision._get_detector doesn't exist yet.

  • Step 3: Rewrite bot/vision.py

Replace the entire file with:

"""
Vision module — YOLO-driven detection for all game screens.

Replaces OpenCV template matching with YOLOv11 inference.
Public API is unchanged — all callers (battle.py, main.py, screen.py)
require no modifications.

Digit OCR (read_enemy_loot, read_resources_from_village) is retained
unchanged as it is effective for reading game font digits.
"""

import cv2
import numpy as np
import logging
import threading
import pytesseract
from bot.state_machine import GameState

logger = logging.getLogger("coc.vision")

from bot.config import (
    TEMPLATE_THRESHOLD,
    GOLD_REGION, ELIXIR_REGION,
    ENEMY_LOOT_X_RANGE, ENEMY_LOOT_Y_RANGE, ENEMY_LOOT_STRIP_HEIGHT,
    ENEMY_LOOT_Y_STEP, ENEMY_LOOT_Y_DEDUP, ENEMY_LOOT_SCALES,
    DEPLOY_NUM_POINTS, DEPLOY_X_START, DEPLOY_X_END, DEPLOY_Y_START, DEPLOY_Y_END,
)


# ─── YOLO DETECTOR (lazy singleton) ───────────────────────────

_detector = None
_detector_lock = threading.Lock()

# Maps legacy button names used by battle.py / main.py → YOLO class names
_BUTTON_CLASS_MAP = {
    "attack_button":   "btn_attack",
    "find_match":      "btn_find_match",
    "start_battle":    "btn_start_battle",
    "stars_screen":    "hud_results",
    "return_home":     "btn_return_home",
    "next_base":       "btn_next_base",
    "end_battle":      "btn_end_battle",
    "confirm_upgrade": "btn_confirm",
    "gem_cost":        "loot_gem",
    "close_x":         "btn_close",
    "okay_button":     "btn_okay",
    "later_button":    "btn_later",
    "loot_gold":       "loot_gold",
    "loot_elixir":     "loot_elixir",
}


def _get_detector():
    """Return the lazy-loaded Detector singleton (double-checked locking)."""
    global _detector
    if _detector is not None:
        return _detector
    with _detector_lock:
        if _detector is not None:
            return _detector
        from bot.settings import Settings
        settings = Settings()
        model_path = settings.get("yolo_model_path", "models/coc.pt")
        confidence = settings.get("yolo_confidence_threshold", 0.45)
        from bot.detector import Detector
        _detector = Detector(model_path, confidence)
    return _detector


# ─── SCREEN STATE DETECTION ───────────────────────────────────

def detect_screen_state(img):
    """
    Determine which screen is currently visible.
    Returns a GameState enum. Enum values are backward-compatible strings.
    """
    dets = _get_detector().predict(img)
    cls_set = {d.cls for d in dets}

    # Results screen overlays everything — check first
    if "hud_results" in cls_set or "btn_return_home" in cls_set:
        return GameState.RESULTS

    # Next base button = scouting an enemy base
    if "btn_next_base" in cls_set:
        return GameState.SCOUTING

    # End battle + loot = scouting; end battle alone = in-battle
    if "btn_end_battle" in cls_set:
        if "loot_gold" in cls_set or "loot_elixir" in cls_set:
            return GameState.SCOUTING
        return GameState.BATTLE_ACTIVE

    # Army selection screen
    if "btn_start_battle" in cls_set:
        return GameState.ARMY

    # Attack menu
    if "btn_find_match" in cls_set:
        return GameState.ATTACK_MENU

    # Village
    if "btn_attack" in cls_set or "hud_village" in cls_set:
        return GameState.VILLAGE

    return GameState.UNKNOWN


# ─── BUTTON FINDING ───────────────────────────────────────────

def find_button(img, button_name):
    """Find a button by legacy name. Returns (x, y) center or None."""
    yolo_cls = _BUTTON_CLASS_MAP.get(button_name)
    if yolo_cls is None:
        logger.warning("find_button: unknown button name '%s'", button_name)
        return None
    det = _get_detector().find(img, yolo_cls)
    return det.center if det else None


def find_popup(img):
    """Detect any dismissable popup. Returns (x, y) to tap, or None."""
    for cls in ("btn_close", "btn_okay", "btn_later"):
        det = _get_detector().find(img, cls)
        if det:
            return det.center
    return None


# ─── DEPLOYMENT ───────────────────────────────────────────────

def get_deploy_corner(img):
    """Return a list of (x, y) deployment points along the top-left battle edge."""
    h, w = img.shape[:2]
    points = []
    for i in range(DEPLOY_NUM_POINTS):
        t = i / (DEPLOY_NUM_POINTS - 1)
        x = int(w * DEPLOY_X_START + t * (w * (DEPLOY_X_END - DEPLOY_X_START)))
        y = int(h * DEPLOY_Y_START - t * (h * (DEPLOY_Y_START - DEPLOY_Y_END)))
        points.append((x, y))
    return points


def get_troop_slots(img):
    """Detect troop slot icons in the bottom deployment bar. Returns [(x,y)] sorted by x."""
    dets = _get_detector().find_all(img, "troop_slot")
    slots = [d.center for d in dets]
    slots.sort(key=lambda s: s[0])
    return slots


# ─── DIGIT OCR (unchanged — for loot and resource reading) ────

_DIGIT_TEMPLATES = None


def _load_digit_templates():
    """Load digit templates 0-9 from templates/digits/ as grayscale."""
    global _DIGIT_TEMPLATES
    from bot.settings import BASE_WIDTH, BASE_HEIGHT
    from bot.config import SCREEN_WIDTH, SCREEN_HEIGHT
    from bot.utils import load_template

    rx = SCREEN_WIDTH / BASE_WIDTH
    ry = SCREEN_HEIGHT / BASE_HEIGHT
    need_scale = (rx != 1.0 or ry != 1.0)

    _DIGIT_TEMPLATES = {}
    for d in range(10):
        t = load_template(f"templates/digits/{d}.png")
        if t is not None:
            if need_scale:
                h, w = t.shape[:2]
                t = cv2.resize(t, (max(1, int(w * rx)), max(1, int(h * ry))),
                               interpolation=cv2.INTER_AREA)
            _DIGIT_TEMPLATES[d] = cv2.cvtColor(t, cv2.COLOR_BGR2GRAY)
    if _DIGIT_TEMPLATES:
        logger.info("Loaded digit templates: %s", sorted(_DIGIT_TEMPLATES.keys()))


def _read_number_template(crop, extra_scales=None, is_gray=False):
    """Read a number from a cropped image using digit template matching."""
    global _DIGIT_TEMPLATES
    if _DIGIT_TEMPLATES is None:
        _load_digit_templates()
    if not _DIGIT_TEMPLATES:
        return _ocr_number(crop)

    gray = crop if is_gray else (
        cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY) if len(crop.shape) == 3 else crop
    )

    all_matches = []
    for digit, template in _DIGIT_TEMPLATES.items():
        for scale in (extra_scales or [0.9, 1.0, 1.1]):
            th, tw = template.shape[:2]
            sh, sw = int(th * scale), int(tw * scale)
            if sh > gray.shape[0] or sw > gray.shape[1]:
                continue
            scaled = cv2.resize(template, (sw, sh))
            result = cv2.matchTemplate(gray, scaled, cv2.TM_CCOEFF_NORMED)
            for y, x in zip(*np.where(result >= TEMPLATE_THRESHOLD)):
                all_matches.append((x, digit, result[y, x], sw))

    all_matches.sort(key=lambda m: -m[2])
    digits_found = []
    for x, digit, conf, sw in all_matches:
        if not any(abs(x - ex) < sw * 0.4 for ex, _, _ in digits_found):
            digits_found.append((x, digit, conf))

    digits_found.sort(key=lambda d: d[0])
    number_str = "".join(str(d) for _, d, _ in digits_found)
    return int(number_str) if number_str else None


def _ocr_number(crop):
    """Fallback: OCR a crop to extract a number."""
    if crop.size == 0:
        return 0
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY) if len(crop.shape) == 3 else crop
    big = cv2.resize(gray, None, fx=4, fy=4, interpolation=cv2.INTER_CUBIC)
    _, thresh = cv2.threshold(big, 180, 255, cv2.THRESH_BINARY)
    thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, np.ones((3, 3), np.uint8))
    padded = cv2.copyMakeBorder(thresh, 30, 30, 30, 30, cv2.BORDER_CONSTANT, value=0)
    text = pytesseract.image_to_string(
        padded, config="--psm 7 -c tessedit_char_whitelist=0123456789"
    ).strip().replace(" ", "").replace(",", "").replace(".", "")
    return int(text) if text else None


# ─── RESOURCE AND LOOT READING ───────────────────────────────

def read_resources_from_village(img):
    """Read gold and elixir from the village screen HUD (top-right corner)."""
    gx1, gy1, gx2, gy2 = GOLD_REGION
    gold = _read_number_template(img[gy1:gy2, gx1:gx2]) or 0

    ex1, ey1, ex2, ey2 = ELIXIR_REGION
    elixir = _read_number_template(img[ey1:ey2, ex1:ex2]) or 0

    return gold, elixir


def read_enemy_loot(img):
    """
    Read enemy gold and elixir from the scouting screen (top-left area).
    Scans horizontal strips to find digit rows. Early-exits after 2 numbers.
    Returns (gold, elixir).
    """
    loot_x1, loot_x2 = ENEMY_LOOT_X_RANGE
    loot_y1, loot_y2 = ENEMY_LOOT_Y_RANGE

    loot_area = img[loot_y1:loot_y2, loot_x1:loot_x2]
    loot_gray = (cv2.cvtColor(loot_area, cv2.COLOR_BGR2GRAY)
                 if len(loot_area.shape) == 3 else loot_area)

    numbers_found = []
    for y_offset in range(0, loot_y2 - loot_y1, ENEMY_LOOT_Y_STEP):
        strip = loot_gray[y_offset:y_offset + ENEMY_LOOT_STRIP_HEIGHT, :]
        if strip.size == 0:
            continue
        value = _read_number_template(strip, extra_scales=ENEMY_LOOT_SCALES, is_gray=True)
        if value is not None and value > 1000:
            abs_y = y_offset + loot_y1
            if not any(abs(abs_y - py) < ENEMY_LOOT_Y_DEDUP for py, _ in numbers_found):
                numbers_found.append((abs_y, value))
                if len(numbers_found) >= 2:
                    break

    gold = numbers_found[0][1] if numbers_found else 0
    elixir = numbers_found[1][1] if len(numbers_found) > 1 else 0
    return gold, elixir


# ─── VALIDATION ──────────────────────────────────────────────

def validate_critical_templates():
    """Verify the YOLO model is present and loads correctly. Raises on failure."""
    try:
        _get_detector()
        logger.info("YOLO detector validated successfully")
    except Exception as e:
        raise FileNotFoundError(
            f"YOLO model failed to load: {e}. "
            f"Train the model first: python training/train.py "
            f"--data datasets/full/dataset.yaml"
        ) from e


def auto_capture_template(img, button_name):
    """Legacy no-op: template auto-capture is not needed with YOLO detection."""
    return False
  • Step 4: Run new vision tests
.venv/bin/python -m pytest tests/test_vision_yolo.py -v

Expected: all tests pass.

  • Step 5: Verify existing offline tests still pass (flow tests)

The flow tests in tests/test_offline.py mock bot.battle.read_enemy_loot etc. — they should still pass since the public API is unchanged.

.venv/bin/python tests/test_offline.py flow

Expected:

PASS: scout_and_decide returned (True, ...) for high loot
PASS: deploy_troops() was called
...
  • Step 6: Commit
git add bot/vision.py tests/test_vision_yolo.py
git commit -m "feat: rewrite bot/vision.py internals to use YOLOv11 detector"

Task 10: Run all tests + integration smoke test

Files: None (verification only)

  • Step 1: Run the full pytest suite
.venv/bin/python -m pytest tests/test_detector.py tests/test_vision_yolo.py tests/test_stream_unit.py -v

Expected: all tests pass.

  • Step 2: Verify syntax of all modified files
.venv/bin/python -c "
import ast
for f in ['bot/detector.py', 'bot/vision.py', 'bot/settings.py',
          'training/download_dataset.py', 'training/train.py',
          'training/capture_frames.py', 'training/merge_datasets.py']:
    ast.parse(open(f).read())
    print(f'OK  {f}')
"

Expected: OK for all files.

  • Step 3: Smoke test with trained model

With BlueStacks running CoC on the village screen:

.venv/bin/python -c "
from bot.screen import init_stream, shutdown_stream, screenshot
from bot.vision import detect_screen_state, find_button

init_stream()
import time; time.sleep(2)  # let stream warm up

img = screenshot()
state = detect_screen_state(img)
print(f'Screen state: {state}')

btn = find_button(img, 'attack_button')
print(f'Attack button: {btn}')

shutdown_stream()
"

Expected:

Screen state: GameState.VILLAGE
Attack button: (<x>, <y>)  ← actual coordinates
  • Step 4: Final commit
git add requirements.txt .gitignore bot/settings.py
git commit -m "feat: Phase 2a complete — YOLOv11 replaces template matching in bot/vision.py

- bot/detector.py: new Detector class wrapping YOLOv11 inference
- bot/vision.py: full rewrite using YOLO for state/button/troop detection;
  digit OCR retained for loot and resource reading
- training/: download, capture, merge, train pipeline scripts
- Public API of bot/vision.py is unchanged — no callers modified

Closes #1"

Self-Review

Spec coverage check

Requirement from issue #1 Covered by
bot/detector.py with Detection + Detector Task 3
Download public HuggingFace dataset Task 4
Training script Task 5
Baseline model trained Task 5
Data capture during live runs Task 6
Label UI/button classes Task 7
Merge pipeline Task 8
Fine-tune full model Task 8
Rewrite bot/vision.py internals Task 9
Public API unchanged Task 9 (verified in step 5)
validate_critical_templates() updated Task 9 (validates model, not templates)
auto_capture_template() made no-op Task 9
Settings yolo_model_path, yolo_confidence_threshold Task 2
requirements.txt updated Task 1
.gitignore updated Task 1
Unit tests for Detector Task 3
Unit tests for new vision.py Task 9
Integration smoke test Task 10
Inference ≤ 50ms on Apple Silicon Verified by YOLOv11n benchmarks (≈10ms on MPS)

No placeholders found ✓

Type consistency check ✓

  • Detection defined in Task 3, used identically in Tasks 3 and 9
  • Detector.predict()list[Detection] used correctly throughout
  • detect_screen_state()GameState consistent with callers
  • find_button()tuple[int,int] | None consistent with battle.py callers

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions