[Bug]: Jetson Nano Slow Performance With GPU/CUDA #2766

@marcjasner

What Operating System(s) are you seeing this problem on?

Other (please specify in the Steps to Reproduce)

dlib version

19.24

Python version

3.6

Compiler

gcc 7.5

Expected Behavior

I am attempting to create a facial detection/recognition component for a system I'm working on, and I am unable to get dlib/face_recognition to perform at better than 2 FPS under any circumstances.

The system is running on a Jetson Nano 4GB with Ubuntu 18.04 and JetPack 4.6 installed.

I built dlib from scratch (using this helper script: https://github.com/JpnTr/Jetson-Nano-Install-Dlib-Library) and verified that the suggested Jetson-specific patches were applied (as per https://medium.com/@ageitgey/build-a-hardware-based-face-recognition-system-for-150-with-the-nvidia-jetson-nano-and-python-a25cb8c891fd).
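One way to double-check that a from-source build actually picked up CUDA is to ask dlib itself. The snippet below is a minimal sketch, not part of the original report: `dlib_cuda_status` is a hypothetical helper name, and it relies on the `DLIB_USE_CUDA` flag and `dlib.cuda.get_num_devices()` exposed by dlib's Python bindings.

```python
import importlib


def dlib_cuda_status():
    """Report whether the installed dlib was compiled with CUDA support.

    Hypothetical helper for illustration; returns a plain dict so it can
    be printed or logged alongside benchmark results.
    """
    try:
        dlib = importlib.import_module("dlib")
    except ImportError:
        # dlib is not importable in this environment at all.
        return {"installed": False, "use_cuda": False, "num_devices": 0}

    use_cuda = bool(getattr(dlib, "DLIB_USE_CUDA", False))
    # Only query the CUDA runtime when the build claims CUDA support.
    num_devices = dlib.cuda.get_num_devices() if use_cuda else 0
    return {"installed": True, "use_cuda": use_cuda, "num_devices": num_devices}


if __name__ == "__main__":
    print(dlib_cuda_status())
```

If `use_cuda` comes back `False`, the CNN detector would be running on the CPU regardless of the patches applied during the build.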

The test code I am running (against a single picture at 585x388 resolution with 5 people in it) looks like:

```python
#!/usr/bin/python3.6

import face_recognition
import time


def current_milli_time():
    # Wall-clock time in whole milliseconds.
    return round(time.time() * 1000)


for i in range(0, 30):
    t1 = current_milli_time()
    image = face_recognition.load_image_file("humans_1.jpg")
    t2 = current_milli_time()
    # model="cnn" selects the CNN detector instead of the default one.
    face_locations = face_recognition.face_locations(image, model="cnn")
    t3 = current_milli_time()

    print(face_locations)
    print("load: ", t2 - t1)
    print("detect: ", t3 - t2)
    print("Total: ", t3 - t1)
```

With no model specified (so the CPU is being used, I believe), the typical face detection time is about 500 ms, give or take. When I specify `model="cnn"`, that number actually INCREASES to over 800 ms.
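To compare the two detectors on equal footing, it can help to report the first call separately from the steady-state calls, since the first GPU call may include one-time setup cost. The following is a rough sketch under that assumption: `time_call_ms` and `compare_detectors` are hypothetical helpers, and only `face_recognition.load_image_file` and `face_recognition.face_locations` come from the library.

```python
import statistics
import time


def time_call_ms(fn, warmup=3, repeats=10):
    """Return (first_call_ms, median_ms) for a zero-argument callable.

    The first call is always made and timed on its own; it counts as one
    of the `warmup` calls. Only the `repeats` calls after warm-up feed
    the median.
    """
    start = time.perf_counter()
    fn()
    first_ms = (time.perf_counter() - start) * 1000.0

    # Remaining warm-up calls are executed but not recorded.
    for _ in range(max(warmup - 1, 0)):
        fn()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return first_ms, statistics.median(samples)


def compare_detectors(image_path):
    """Time the "hog" and "cnn" detectors on one image.

    Requires face_recognition; imported lazily so time_call_ms can be
    used without it.
    """
    import face_recognition

    image = face_recognition.load_image_file(image_path)
    results = {}
    for model in ("hog", "cnn"):
        results[model] = time_call_ms(
            lambda: face_recognition.face_locations(image, model=model)
        )
    return results
```

Calling `compare_detectors("humans_1.jpg")` would return a `(first_call_ms, median_ms)` pair per model, which makes it easier to see whether the 800 ms figure is a steady-state cost or is skewed by start-up overhead.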

tegrastats verifies that my GPU utilization is 99%.

I've seen this issue reported by other people, but I have yet to see a solution. Shouldn't this be a reasonably fast operation (under 100 ms) on a GPU? I've seen other (C/C++-based) face detection methods that suggest detection can take as little as 20-50 ms.

Current Behavior

Current behavior is that face detection takes about 500 ms on the CPU and even longer (800+ ms) when using CUDA/GPU.
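For reference, the per-frame latencies above translate to frame rates as follows. This is just the arithmetic (FPS = 1000 / latency in ms), counting detection time only and ignoring image loading; `fps_from_latency_ms` is a hypothetical helper name.

```python
def fps_from_latency_ms(latency_ms):
    """Frames per second achievable if each frame takes latency_ms to process."""
    return 1000.0 / latency_ms


measurements = [
    ("CPU, default model", 500),
    ('GPU, model="cnn"', 800),
    ("hoped-for GPU latency", 100),
]
for label, ms in measurements:
    print("{}: {} ms per frame is {:.2f} FPS".format(label, ms, fps_from_latency_ms(ms)))
```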

Steps to Reproduce

Nothing fancy, just run the code I provided.

Anything else?

No response
