Status: Open
Labels: bug
Description
Search before asking
- I have searched the RF-DETR issues and found no similar bug report.
 
Bug
Description:
When running inference with my fine-tuned RF-DETR model (`player-and-handball-detection-3z9xf/1`) on Apple Silicon, the model only uses the CPU even though CoreML is configured. The base RF-DETR model (e.g., `rfdetr-small`) runs on the GPU as expected when loaded via `RFDETRSmall()`, but not when loaded via `get_model("rfdetr-small")`.
Key Observations:
- Fine-tuned model (`get_model("player-and-handball-detection-3z9xf/1")`) → CPU only.
- Base model (`RFDETRSmall()`) → GPU works.
- Base model via `get_model("rfdetr-small")` → CPU only (same as the fine-tuned model).
- GPU utilization remains at 0% during inference in the `get_model()` cases.
Environment
- Roboflow Inference Version: 0.59.0
- OS: macOS 26.0.1
- Python Version: 3.12.11
- ONNX Runtime Version: 1.21.1
- Hardware: MacBook Pro M1 Max (64GB RAM)
 
Minimal Reproducible Example

```python
import os
import time

import cv2
import supervision as sv
from inference import get_model
import onnxruntime as ort

# Configure CoreML
os.environ["INFERENCE_LOG_LEVEL"] = "DEBUG"
os.environ["ONNXRUNTIME_EXECUTION_PROVIDERS"] = "[CoreMLExecutionProvider]"
os.environ["ORT_LOG_SEVERITY_LEVEL"] = "0"      # VERBOSE
os.environ["ORT_LOG_VERBOSITY_LEVEL"] = "1"     # Extra detail

print("[ONNX Runtime] Available providers:", ort.get_available_providers())

# Load API key and model
API_KEY = os.getenv("ROBOFLOW_API_KEY")
MODEL_ID = "player-and-handball-detection-3z9xf/1"
VIDEO_IN = "https://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4"
VIDEO_OUT = "Clip_annotated_coreml.mp4"

print("[Inference] Loading model…")
model = get_model(MODEL_ID, api_key=API_KEY)

# Process video
cap = cv2.VideoCapture(VIDEO_IN)
assert cap.isOpened(), f"Cannot open {VIDEO_IN}"
w, h = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps_in = cap.get(cv2.CAP_PROP_FPS) or 25.0
vw = cv2.VideoWriter(VIDEO_OUT, cv2.VideoWriter_fourcc(*"mp4v"), fps_in, (w, h))
box_annot = sv.BoxAnnotator(thickness=2)
font = cv2.FONT_HERSHEY_SIMPLEX

t0_all = time.time()
n = 0
print("[Inference] Started…")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    t0 = time.time()
    n += 1
    res = model.infer(frame, confidence=0.25, iou=0.5)[0]
    dets = sv.Detections.from_inference(res)
    annotated = box_annot.annotate(scene=frame.copy(), detections=dets)
    # Draw labels
    names = dets.data.get("class_name", [str(i) for i in dets.class_id])
    labels = [f"{n_} {c:.2f}" for n_, c in zip(names, dets.confidence)]
    for (x1, y1, x2, y2), text in zip(dets.xyxy, labels):
        cv2.putText(annotated, text, (int(x1), max(0, int(y1) - 4)), font, 0.5, (0, 255, 0), 1, cv2.LINE_AA)
    # Calculate FPS
    dt = time.time() - t0
    fps_inst = 1.0 / dt if dt > 0 else 0.0
    fps_avg = n / (time.time() - t0_all + 1e-9)
    txt = f"FPS: {fps_inst:.1f} (avg {fps_avg:.1f})"
    cv2.putText(annotated, txt, (12, 28), font, 0.8, (0, 0, 0), 4, cv2.LINE_AA)
    cv2.putText(annotated, txt, (12, 28), font, 0.8, (255, 255, 0), 2, cv2.LINE_AA)
    if n % 30 == 0:
        print(f"[{n}] inst={fps_inst:.2f} | avg={fps_avg:.2f} | preds={len(dets)}")
    vw.write(annotated)

cap.release()
vw.release()
print(f"[Done] Saved: {VIDEO_OUT}")
```

Additional Terminal Output:

```
[ONNX Runtime] Available providers: ['CoreMLExecutionProvider', 'AzureExecutionProvider', 'CPUExecutionProvider']
[Inference] Loading model…
UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names. Available providers: 'CoreMLExecutionProvider, AzureExecutionProvider, CPUExecutionProvider'
[Inference] Started…
[30] inst=2.73 | avg=2.59 | preds=19
...
```
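The `UserWarning` in the output above seems telling: the session is apparently created with `CUDAExecutionProvider` requested, and since CUDA is unavailable on Apple Silicon, onnxruntime drops it and falls back to CPU. A rough sketch of that selection behavior (this is not the actual onnxruntime or inference code, just an illustration of why an unavailable requested provider degrades to CPU):

```python
def select_providers(requested: list[str], available: list[str]) -> list[str]:
    """Illustrative provider selection: keep only requested providers that
    are actually available, always ending with the CPU fallback."""
    chosen = [p for p in requested if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen

available = ["CoreMLExecutionProvider", "AzureExecutionProvider", "CPUExecutionProvider"]

# Requesting CUDA on Apple Silicon leaves only the CPU fallback.
print(select_providers(["CUDAExecutionProvider"], available))
# → ['CPUExecutionProvider']

# Requesting CoreML would keep it ahead of the CPU fallback.
print(select_providers(["CoreMLExecutionProvider"], available))
# → ['CoreMLExecutionProvider', 'CPUExecutionProvider']
```

If the `get_model()` path hard-codes `CUDAExecutionProvider` in its requested list instead of honoring `ONNXRUNTIME_EXECUTION_PROVIDERS`, that would match every observation above.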
Expected Behavior:
- The fine-tuned model should utilize the Apple Silicon GPU for inference, just like the base `RFDETRSmall` model.
- GPU utilization should be visible in Activity Monitor during inference.

Actual Behavior:
- The fine-tuned model runs entirely on CPU, resulting in low FPS (~2.59) and no GPU utilization.
- The same behavior occurs with the base model when loaded via `get_model("rfdetr-small")`.
- The base `RFDETRSmall` model runs on the GPU with the same script and environment.

Diagnostic Information:
- Activity Monitor confirms 0% GPU usage during inference.
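One more thing that may be worth ruling out: how `ONNXRUNTIME_EXECUTION_PROVIDERS` is parsed on the `get_model()` path. The value `[CoreMLExecutionProvider]` set in the repro script is not valid JSON (the element is unquoted), so if the variable were ever parsed with `json.loads`, parsing would fail and the configured providers could be silently ignored. A hypothetical parser sketch (not the actual inference implementation) showing the difference between the two forms:

```python
import json

def parse_providers(value: str) -> list[str]:
    """Hypothetical env-var parser: try strict JSON first, then fall back
    to stripping brackets and splitting on commas."""
    try:
        parsed = json.loads(value)
        if isinstance(parsed, list):
            return [str(p) for p in parsed]
    except json.JSONDecodeError:
        pass
    return [p.strip() for p in value.strip("[]").split(",") if p.strip()]

# Unquoted form (as set in the repro script): only the lenient fallback works.
print(parse_providers("[CoreMLExecutionProvider]"))
# → ['CoreMLExecutionProvider']

# Quoted JSON form: parses directly via json.loads.
print(parse_providers('["CoreMLExecutionProvider"]'))
# → ['CoreMLExecutionProvider']
```

If inference uses a strict parser internally, trying the quoted JSON form of the variable would be a quick experiment.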
 
Are you willing to submit a PR?
- Yes, I'd like to help by submitting a PR!
 