Reducing Latency for Hand-Tracking Solution in Python #5789

@MarcKr3

Description

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

Yes

OS Platform and Distribution

Windows 10 (amd64)

MediaPipe Tasks SDK version

Mediapipe Version: 0.10.20

Task name (e.g. Image classification, Gesture recognition etc.)

Hand landmark detection

Programming Language and version (e.g. C++, Python, Java)

Python

Describe the actual behavior

Using the Lite model (model_complexity=0), I measure a latency of 27-35 ms per frame.

Describe the expected behaviour

According to https://ai.google.dev/edge/mediapipe/solutions/vision/hand_landmarker, the Full model (model_complexity=1) has a latency of 17 ms on CPU and 12 ms on GPU, so I expected the Lite model to be at least that fast.

Standalone code/steps you may have used to try to get what you need

The way I measure latency is as follows:

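import time

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
cap = cv2.VideoCapture(0)  # assumption: default webcam at index 0

def log_latency(start_time, event):
    # perf_counter() is a monotonic, high-resolution clock meant for short intervals
    elapsed_time = time.perf_counter() - start_time
    print(f"[{event}] Elapsed Time: {elapsed_time:.4f} seconds")
    return elapsed_time

with mp_hands.Hands(model_complexity=0, min_detection_confidence=0.3, min_tracking_confidence=0.5) as hands:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break  # no frame available (camera disconnected or stream ended)

        # BGR 2 RGB
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

        # Flip on horizontal
        image = cv2.flip(image, 1)

        # Mark the frame read-only while MediaPipe processes it
        image.flags.writeable = False

        # Detections (only the hands.process() call is timed)
        det_time = time.perf_counter()
        results = hands.process(image)
        log_latency(det_time, "Landmark Detection")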

        # Set flag to true
        image.flags.writeable = True
        
        # RGB 2 BGR
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

        # Debug: print the raw detection results
        # print(results)
        
        # Rendering results
        if results.multi_hand_landmarks:
            for num, hand in enumerate(results.multi_hand_landmarks):
                mp_drawing.draw_landmarks(image, hand, mp_hands.HAND_CONNECTIONS, 
                                        mp_drawing.DrawingSpec(color=(51, 102, 0), thickness=1, circle_radius=2),
                                        mp_drawing.DrawingSpec(color=(33, 165, 205), thickness=2, circle_radius=0),
                                         )
                    
            # Draw Finger distances to image from point list
            draw_tip_distances(image, results, point_list)
            
            # Draw Hand distances to image from tip list
            draw_hand_distances(num, image, results, tip_list)

        # Check for countdown trigger
        key = cv2.waitKey(10) & 0xFF
        if key == ord('c'):
            cal_distances(image, hands)

        # Show the image
        cv2.imshow('Hand Tracking', image)

        if key == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()
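
A single per-frame timing can be noisy, so a more robust way to report the number is to discard a few warm-up calls and summarize many samples. The following is only a minimal sketch of that idea: `measure_latency_ms` and its parameters are hypothetical names, and the stand-in workload would be replaced by `lambda: hands.process(image)` in the loop above.

```python
import statistics
import time


def measure_latency_ms(fn, n_warmup=5, n_runs=50):
    """Call fn() repeatedly and return (median, p95, max) latency in milliseconds."""
    # Warm-up calls are excluded: the first few calls often include
    # one-time costs such as lazy initialization or cold caches.
    for _ in range(n_warmup):
        fn()

    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()  # monotonic, high-resolution timer
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return statistics.median(samples), samples[p95_index], samples[-1]


if __name__ == "__main__":
    # Stand-in workload (assumption); swap in the real inference call.
    median, p95, worst = measure_latency_ms(lambda: sum(range(10_000)))
    print(f"median={median:.3f} ms  p95={p95:.3f} ms  max={worst:.3f} ms")
```

Reporting the median alongside a high percentile makes it easier to tell a consistently slow pipeline from one with occasional spikes.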

Other info / Complete Logs

I'm trying to get the Hand Tracking as close to real-time as possible.
I apologize if I'm misinterpreting the "expected" latency values.

I'm considering:
* Hardware acceleration: Unfortunately, I don't have a CUDA GPU, and as far as I know GPU acceleration isn't supported in the Python solution anyway.
* Tuning the detection and tracking confidence thresholds, which yielded improvements of ~2 ms.
* Tracking only the necessary landmarks: For my task I only need the wrist and fingertip landmarks, but the model tracks all 21 landmarks. Would it be possible to create a custom model like this, and would it reduce latency?
* Switching to C++: I had problems setting up the MediaPipe Framework. The Hello World example runs successfully, but the hand_tracking_cpu example fails to build.
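
For reference, the wrist and fingertip subset mentioned above corresponds to indices 0, 4, 8, 12, 16 and 20 in MediaPipe's 21-point hand layout. The sketch below only filters the output after inference, so it would not by itself reduce model latency; `Point` and `select_wrist_and_tips` are hypothetical names used for illustration.

```python
from typing import NamedTuple, Sequence

# Indices follow MediaPipe's 21-point hand landmark layout.
WRIST = 0
FINGERTIPS = (4, 8, 12, 16, 20)  # thumb, index, middle, ring, pinky tips


class Point(NamedTuple):
    """Hypothetical stand-in for a MediaPipe landmark (x, y, z fields)."""
    x: float
    y: float
    z: float


def select_wrist_and_tips(landmarks: Sequence) -> list:
    """Return only the wrist and fingertip landmarks from a 21-point hand."""
    if len(landmarks) != 21:
        raise ValueError(f"expected 21 landmarks, got {len(landmarks)}")
    return [landmarks[i] for i in (WRIST, *FINGERTIPS)]


if __name__ == "__main__":
    # Dummy hand whose x coordinate encodes the landmark index.
    hand = [Point(float(i), 0.0, 0.0) for i in range(21)]
    subset = select_wrist_and_tips(hand)
    print([int(p.x) for p in subset])
```

With the legacy solution API, this would be called as `select_wrist_and_tips(hand.landmark)` for each `hand` in `results.multi_hand_landmarks`.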

I'll gladly specify further if necessary! Thanks!
