[Bug]: Jetson Nano Slow Performance With GPU/CUDA #2766

### What Operating System(s) are you seeing this problem on?

Other (please specify in the Steps to Reproduce)

### dlib version

19.24

### Python version

3.6

### Compiler

gcc 7.5

### Expected Behavior

I am building a face detection/recognition component for a system I'm working on, and I cannot get dlib/face_recognition to run faster than 2 FPS under any circumstances.

The system is a Jetson Nano 4GB running Ubuntu 18.04 with JetPack 4.6 installed.

I built dlib from scratch (using this helper script: https://github.com/JpnTr/Jetson-Nano-Install-Dlib-Library) and verified that the suggested Jetson-specific patches were applied (as per https://medium.com/@ageitgey/build-a-hardware-based-face-recognition-system-for-150-with-the-nvidia-jetson-nano-and-python-a25cb8c891fd).

The test code I am running (against a single picture at 585x388 resolution with 5 people in it) looks like:

```python
#!/usr/bin/python3.6

import face_recognition
import time


def current_milli_time():
    """Return the current wall-clock time in whole milliseconds."""
    return round(time.time() * 1000)


for i in range(30):
    t1 = current_milli_time()
    image = face_recognition.load_image_file("humans_1.jpg")
    t2 = current_milli_time()
    # model="cnn" selects the CNN detector instead of the default HOG detector
    face_locations = face_recognition.face_locations(image, model="cnn")
    t3 = current_milli_time()

    print(face_locations)
    print("load: ", t2 - t1)
    print("detect: ", t3 - t2)
    print("Total: ", t3 - t1)
```
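For anyone trying to reproduce these timings, a minimal sketch of a timing helper that separates warm-up calls from steady-state calls may be useful, since the first calls into a CUDA-backed library often include one-time initialization. The `benchmark` helper and its `warmup`/`runs` parameters are hypothetical names introduced here, not part of dlib or face_recognition, and the workload in the demo is a stand-in so the snippet runs anywhere. Warm-up alone is unlikely to explain the numbers in this report, because the loop above already runs 30 iterations.

```python
import statistics
import time


def benchmark(fn, warmup=3, runs=10):
    """Time fn() in milliseconds, ignoring the first `warmup` calls."""
    for _ in range(warmup):
        fn()  # warm-up calls absorb one-time costs (e.g. GPU context setup)

    samples = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution timer
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workload. On the Jetson this would be something like:
    #   lambda: face_recognition.face_locations(image, model="cnn")
    stats = benchmark(lambda: sum(range(100_000)))
    print(stats)
```

Reporting the median rather than a single run makes it easier to compare the HOG and CNN paths when individual iterations are noisy.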

With no model specified (the default is the HOG detector, which runs on the CPU), face detection takes about 500 ms, give or take. When I specify model="cnn", that number actually INCREASES to over 800 ms.
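As a sanity check that the dlib build being imported really was compiled with CUDA, something like the following sketch can be run on the device. It relies on `dlib.DLIB_USE_CUDA` and `dlib.cuda.get_num_devices()` from dlib's Python bindings; the `dlib_cuda_status` wrapper is a hypothetical helper added for illustration, and it falls back to a "not installed" result so it can also run on a machine without dlib.

```python
def dlib_cuda_status():
    """Report whether the importable dlib build has CUDA enabled."""
    try:
        import dlib
    except ImportError:
        # dlib is not importable in this environment
        return {"dlib_installed": False, "cuda_compiled": False, "cuda_devices": 0}

    # DLIB_USE_CUDA is True only when dlib was compiled with CUDA support
    cuda_compiled = bool(getattr(dlib, "DLIB_USE_CUDA", False))

    # Only query the device count when CUDA support is compiled in
    devices = int(dlib.cuda.get_num_devices()) if cuda_compiled else 0

    return {
        "dlib_installed": True,
        "cuda_compiled": cuda_compiled,
        "cuda_devices": devices,
    }


if __name__ == "__main__":
    print(dlib_cuda_status())
```

If `cuda_compiled` comes back `False`, the CNN detector would be running on the CPU, which would be one possible reason for it being slower than HOG.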

tegrastats verifies that my GPU utilization is 99%.

I've seen this issue reported by other people, but I have yet to see a solution. Shouldn't this be a reasonably fast operation (under 100 ms) on a GPU? I've seen other (C/C++-based) face detection methods that suggest detection can take as little as 20-50 ms.
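To put the latencies quoted above into frames per second, here is a small arithmetic sketch. The `fps_from_latency_ms` helper is a hypothetical name, and the figures are the ones from this report (500 ms, 800 ms, the 100 ms target, and the 20-50 ms range), assuming one detection per frame and no other per-frame cost.

```python
def fps_from_latency_ms(latency_ms):
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    cases = [
        ("HOG on CPU (observed)", 500),
        ("CNN on GPU (observed)", 800),
        ("target", 100),
        ("other detectors, slow end", 50),
        ("other detectors, fast end", 20),
    ]
    for label, ms in cases:
        print(f"{label}: {ms} ms -> {fps_from_latency_ms(ms):.2f} FPS")
```

The observed 500 ms and 800 ms work out to 2 FPS and 1.25 FPS, while the 100 ms target corresponds to 10 FPS.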

### Current Behavior

Face detection takes about 500 ms on the CPU and even longer (800+ ms) when using CUDA/GPU.

### Steps to Reproduce

Nothing fancy, just run the code I provided.

### Anything else?

_No response_
