Fine-tuned model runs on CPU instead of GPU with CoreML on Apple Silicon #427

@valentinweyer

Description

Search before asking

  • I have searched the RF-DETR issues and found no similar bug report.

Bug

Description:
When running inference with my fine-tuned RF-DETR model (player-and-handball-detection-3z9xf/1) on Apple Silicon, the model runs on the CPU only, even though the CoreML execution provider is configured. The base RF-DETR model (e.g., rfdetr-small) runs on the GPU as expected when loaded via RFDETRSmall(), but not when loaded via get_model("rfdetr-small").


Key Observations:

  • Fine-tuned model (get_model("player-and-handball-detection-3z9xf/1")) → CPU only.
  • Base model (RFDETRSmall()) → GPU works.
  • Base model via get_model("rfdetr-small") → CPU only (same as fine-tuned model).
  • GPU utilization remains at 0% during inference for get_model() cases.

Environment

  • Roboflow Inference Version: 0.59.0
  • OS: macOS 26.0.1
  • Python Version: 3.12.11
  • ONNX Runtime Version: 1.21.1
  • Hardware: MacBook Pro M1 Max (64GB RAM)

Minimal Reproducible Example

import os
import time
import cv2
import supervision as sv
from inference import get_model
import onnxruntime as ort

# Configure CoreML
os.environ["INFERENCE_LOG_LEVEL"] = "DEBUG"
os.environ["ONNXRUNTIME_EXECUTION_PROVIDERS"] = "[CoreMLExecutionProvider]"
os.environ["ORT_LOG_SEVERITY_LEVEL"] = "0"      # VERBOSE
os.environ["ORT_LOG_VERBOSITY_LEVEL"] = "1"     # Extra detail
print("[ONNX Runtime] Available providers:", ort.get_available_providers())

# Load API key and model
API_KEY = os.getenv("ROBOFLOW_API_KEY")
MODEL_ID = "player-and-handball-detection-3z9xf/1"
VIDEO_IN = "https://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4"
VIDEO_OUT = "Clip_annotated_coreml.mp4"

print("[Inference] Loading model…")
model = get_model(MODEL_ID, api_key=API_KEY)

# Process video
cap = cv2.VideoCapture(VIDEO_IN)
assert cap.isOpened(), f"Cannot open {VIDEO_IN}"
w, h = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps_in = cap.get(cv2.CAP_PROP_FPS) or 25.0
vw = cv2.VideoWriter(VIDEO_OUT, cv2.VideoWriter_fourcc(*"mp4v"), fps_in, (w, h))
box_annot = sv.BoxAnnotator(thickness=2)
font = cv2.FONT_HERSHEY_SIMPLEX
t0_all = time.time()
n = 0

print("[Inference] Started…")
while True:
    ok, frame = cap.read()
    if not ok:
        break

    t0 = time.time()
    n += 1
    res = model.infer(frame, confidence=0.25, iou=0.5)[0]
    dets = sv.Detections.from_inference(res)
    annotated = box_annot.annotate(scene=frame.copy(), detections=dets)

    # Draw labels
    names = dets.data.get("class_name", [str(i) for i in dets.class_id])
    labels = [f"{n_} {c:.2f}" for n_, c in zip(names, dets.confidence)]
    for (x1, y1, x2, y2), text in zip(dets.xyxy, labels):
        cv2.putText(annotated, text, (int(x1), max(0, int(y1) - 4)), font, 0.5, (0, 255, 0), 1, cv2.LINE_AA)

    # Calculate FPS
    dt = time.time() - t0
    fps_inst = 1.0 / dt if dt > 0 else 0.0
    fps_avg = n / (time.time() - t0_all + 1e-9)
    txt = f"FPS: {fps_inst:.1f} (avg {fps_avg:.1f})"
    cv2.putText(annotated, txt, (12, 28), font, 0.8, (0, 0, 0), 4, cv2.LINE_AA)
    cv2.putText(annotated, txt, (12, 28), font, 0.8, (255, 255, 0), 2, cv2.LINE_AA)

    if n % 30 == 0:
        print(f"[{n}] inst={fps_inst:.2f} | avg={fps_avg:.2f} | preds={len(dets)}")

    vw.write(annotated)

cap.release()
vw.release()
print(f"[Done] Saved: {VIDEO_OUT}")

Additional

Terminal Output:

[ONNX Runtime] Available providers: ['CoreMLExecutionProvider', 'AzureExecutionProvider', 'CPUExecutionProvider']
[Inference] Loading model…
UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names. Available providers: 'CoreMLExecutionProvider, AzureExecutionProvider, CPUExecutionProvider'
[Inference] Started…
[30] inst=2.73 | avg=2.59 | preds=19
...
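The UserWarning above suggests that the get_model() path requests CUDAExecutionProvider (unavailable on Apple Silicon) and then silently falls back to CPU, instead of honoring the ONNXRUNTIME_EXECUTION_PROVIDERS setting. As a minimal sketch of the fallback behavior I would expect instead (the function name and logic here are hypothetical, not the inference package's actual code):

```python
def pick_providers(available, preferred=("CoreMLExecutionProvider",)):
    """Keep the preferred providers that are actually available,
    then append the CPU provider as the guaranteed fallback."""
    chosen = [p for p in preferred if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen

# On Apple Silicon, where CUDA is unavailable, this would select CoreML:
available = ["CoreMLExecutionProvider", "AzureExecutionProvider", "CPUExecutionProvider"]
print(pick_providers(available))  # ['CoreMLExecutionProvider', 'CPUExecutionProvider']
```

With logic like this, an unavailable preferred provider would simply be skipped rather than triggering a warning and dropping the CoreML preference altogether.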

Expected Behavior:

  • The fine-tuned model should utilize the Apple Silicon GPU for inference, similar to the base RFDETRSmall model.
  • GPU utilization should be visible in Activity Monitor during inference.

Actual Behavior:

  • The fine-tuned model runs entirely on the CPU, resulting in low throughput (~2.59 FPS) and no GPU utilization.
  • The same behavior occurs with the base model when loaded via get_model("rfdetr-small").
  • The base RFDETRSmall model runs on GPU with the same script and environment.

Diagnostic Information:

  • Activity Monitor confirms 0% GPU usage during inference.
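The script sets ONNXRUNTIME_EXECUTION_PROVIDERS to the bracketed string "[CoreMLExecutionProvider]". For reference, a small stdlib-only helper (hypothetical, not the inference package's parser) showing how that string would expand into a provider list:

```python
def parse_providers(value):
    """Parse a bracketed, comma-separated provider string such as
    "[CoreMLExecutionProvider,CPUExecutionProvider]" into a list."""
    stripped = value.strip().strip("[]")
    return [p.strip() for p in stripped.split(",") if p.strip()]

print(parse_providers("[CoreMLExecutionProvider]"))  # ['CoreMLExecutionProvider']
```

If the env var is parsed along these lines, the CoreML preference should reach the session constructor, which makes the CPU-only behavior of the get_model() path look like a provider-selection issue further downstream.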

Are you willing to submit a PR?

  • Yes, I'd like to help by submitting a PR!
