Search before asking
Bug
I am currently comparing RF-DETR Nano models with YOLOv11 models. In particular, I am trying to validate the claim that RF-DETR Nano is faster and higher quality than YOLOv11-medium. That does appear to be the case on my football players dataset:
YOLO RESULTS:

RF-DETR NANO RESULTS:

However, I see significant performance issues when running RF-DETR Nano on the ball dataset:
Performance on the ball dataset dropped by more than 2x at lower resolutions.
Link to models: here
Links to datasets:
ball: here
players: here
Environment
RF-DETR version: 1.6.3
OS: Windows-10-10.0.26200-SP0
Python version: 3.11.9 (tags/v3.11.9:de54cf5, Apr 2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)]
PyTorch version: 2.11.0+cu128
PyTorch CUDA version: 12.8
GPU 0: NVIDIA GeForce RTX 3070 Laptop GPU
Minimal Reproducible Example
The RF-DETR models are run with optimize_for_inference(). The following code measures adapter_ms:
def rfdetr_predict_to_predet(
    model,
    pil_img: Image.Image,
    threshold: float,
    active_cids: List[int],
) -> U.PredictResult:
    U.cuda_sync_if_needed()
    adapter_t0 = time.perf_counter()
    # One-time debug dump on the first predict call.
    if not hasattr(model, "_debug_predict_printed"):
        setattr(model, "_debug_predict_printed", True)
        print("=" * 80)
        print("[FIRST PREDICT]")
        print("model detected device:", inspect_model_device(model))
        print("image size:", pil_img.size)
        print("threshold:", threshold)
        print_gpu_memory("[CUDA BEFORE PREDICT]")
    dets = None
    model_ms = None
    extra_timings: Dict[str, Any] = {}
    try:
        out = model.predict(pil_img, threshold=threshold, return_timings=True)
        if isinstance(out, tuple) and len(out) == 2 and isinstance(out[1], dict):
            dets, timing = out
            if timing.get("forward_ms") is not None:
                model_ms = float(timing["forward_ms"])
            for key in ["preprocess_ms", "forward_ms", "postprocess_ms", "convert_ms", "total_ms"]:
                if timing.get(key) is not None:
                    extra_timings[key] = float(timing[key])
        else:
            dets = out
    except TypeError:
        # Fallback for versions without return_timings: time the whole call.
        dets, model_ms = U.timed_call_ms(
            lambda: model.predict(pil_img, threshold=threshold)
        )
    if getattr(model, "_debug_predict_printed", False) and not hasattr(model, "_debug_predict_done_printed"):
        setattr(model, "_debug_predict_done_printed", True)
        print_gpu_memory("[CUDA AFTER PREDICT]")
        print("timings:", extra_timings if extra_timings else {"forward_ms": model_ms})
        print("=" * 80)
    empty_pred = U.PredDet(
        np.zeros((0, 4), float),
        np.zeros((0,), float),
        np.zeros((0,), int),
    )
    if dets is None or len(dets) == 0:
        U.cuda_sync_if_needed()
        adapter_t1 = time.perf_counter()
        return U.PredictResult(
            pred=empty_pred,
            adapter_ms=(adapter_t1 - adapter_t0) * 1000.0,
            model_ms=model_ms,
            extra_timings=extra_timings or None,
        )
    xyxy = np.asarray(dets.xyxy, dtype=float)
    scores = np.asarray(dets.confidence, dtype=float)
    cls = np.asarray(dets.class_id, dtype=int)
    # Keep only detections whose class id is active and not skipped.
    keep = []
    out_cids = []
    for i in range(len(cls)):
        cid = int(cls[i])
        if cid in SKIP_COCO_CATEGORY_IDS:
            continue
        if cid not in active_cids:
            continue
        keep.append(i)
        out_cids.append(cid)
    if not keep:
        U.cuda_sync_if_needed()
        adapter_t1 = time.perf_counter()
        return U.PredictResult(
            pred=empty_pred,
            adapter_ms=(adapter_t1 - adapter_t0) * 1000.0,
            model_ms=model_ms,
            extra_timings=extra_timings or None,
        )
    idx = np.array(keep, dtype=int)
    pred = U.PredDet(
        boxes_xyxy=xyxy[idx],
        scores=scores[idx],
        class_ids=np.array(out_cids, dtype=int),
    )
    U.cuda_sync_if_needed()
    adapter_t1 = time.perf_counter()
    return U.PredictResult(
        pred=pred,
        adapter_ms=(adapter_t1 - adapter_t0) * 1000.0,
        model_ms=model_ms,
        extra_timings=extra_timings or None,
    )
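The U.* helpers (cuda_sync_if_needed, timed_call_ms, PredDet, PredictResult) are not included in the snippet. For anyone trying to reproduce the numbers, here is a minimal sketch of what they might look like; the names and signatures below are inferred from the call sites above, not taken from my actual implementation:

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple

import numpy as np

try:
    import torch
except ImportError:  # allow running on machines without PyTorch
    torch = None


def cuda_sync_if_needed() -> None:
    # Synchronize CUDA so perf_counter measures completed GPU work,
    # not just kernel launch time.
    if torch is not None and torch.cuda.is_available():
        torch.cuda.synchronize()


def timed_call_ms(fn: Callable[[], Any]) -> Tuple[Any, float]:
    # Run fn and return (result, wall-clock milliseconds).
    cuda_sync_if_needed()
    t0 = time.perf_counter()
    result = fn()
    cuda_sync_if_needed()
    return result, (time.perf_counter() - t0) * 1000.0


@dataclass
class PredDet:
    boxes_xyxy: np.ndarray   # (N, 4) boxes in xyxy format
    scores: np.ndarray       # (N,) confidences
    class_ids: np.ndarray    # (N,) integer class ids


@dataclass
class PredictResult:
    pred: PredDet
    adapter_ms: float
    model_ms: Optional[float] = None
    extra_timings: Optional[Dict[str, Any]] = None
```

The CUDA sync before and after each timed region matters: without it, the adapter_ms numbers on GPU would mostly reflect asynchronous kernel launches rather than actual inference time.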
Additional
No response
Are you willing to submit a PR?