Description
Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
Yes
OS Platform and Distribution
Windows 10, amd64 (x64)
MediaPipe Tasks SDK version
MediaPipe version: 0.10.20
Task name (e.g. Image classification, Gesture recognition etc.)
Hand landmark detection
Programming Language and version (e.g. C++, Python, Java)
Python
Describe the actual behavior
Using the LITE model (model_complexity=0), I'm measuring a latency of 27-35 ms per frame for hands.process().
Describe the expected behaviour
According to https://ai.google.dev/edge/mediapipe/solutions/vision/hand_landmarker, the FULL model (model_complexity=1) has a latency of 17 ms on CPU and 12 ms on GPU, so I expected the LITE model to be at least as fast.
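The linked page documents the Tasks-API HandLandmarker, whereas the measurement loop in this issue uses the legacy mp.solutions.hands.Hands solution. The two may not run an identical pipeline, and it may also be worth checking which device the documented figures were measured on, so timing the Tasks API directly gives a closer comparison. The following is a minimal sketch, assuming the model bundle from that page has been downloaded locally (the path `hand_landmarker.task` is an assumption); `make_hand_landmarker`, `MonotonicMillis` and `detect_and_time` are hypothetical helper names. It uses VIDEO running mode, which requires strictly increasing timestamps, hence the small timestamp helper.

```python
import time
from typing import Any, Optional, Tuple


def make_hand_landmarker(model_path: str, num_hands: int = 2) -> Any:
    """Create a Tasks-API HandLandmarker in VIDEO mode (CPU by default)."""
    # Imported lazily so this module can be loaded without MediaPipe installed.
    import mediapipe as mp

    vision = mp.tasks.vision
    options = vision.HandLandmarkerOptions(
        base_options=mp.tasks.BaseOptions(model_asset_path=model_path),
        running_mode=vision.RunningMode.VIDEO,
        num_hands=num_hands,
        # Same thresholds as the legacy loop in this issue.
        min_hand_detection_confidence=0.3,
        min_tracking_confidence=0.5,
    )
    return vision.HandLandmarker.create_from_options(options)


class MonotonicMillis:
    """Produce strictly increasing millisecond timestamps for VIDEO mode."""

    def __init__(self) -> None:
        self._last: Optional[int] = None

    def next_ms(self) -> int:
        now = int(time.perf_counter() * 1000)
        # Two frames within the same millisecond would repeat a timestamp,
        # which VIDEO mode rejects, so bump by one in that case.
        if self._last is not None and now <= self._last:
            now = self._last + 1
        self._last = now
        return now


def detect_and_time(landmarker: Any, rgb_frame: Any, clock: MonotonicMillis) -> Tuple[Any, float]:
    """Run detection on one RGB uint8 frame; return (result, latency in ms)."""
    import mediapipe as mp

    mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb_frame)
    start = time.perf_counter()
    result = landmarker.detect_for_video(mp_image, clock.next_ms())
    return result, (time.perf_counter() - start) * 1000.0
```

Usage would look roughly like `with make_hand_landmarker("hand_landmarker.task") as landmarker:` followed by `result, ms = detect_and_time(landmarker, image, clock)` for each RGB frame, where `result.hand_landmarks` holds one landmark list per detected hand.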
Standalone code/steps you may have used to try to get what you need
The way I measure latency is as follows:
import time

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

# draw_tip_distances, draw_hand_distances, cal_distances, point_list and tip_list
# are my own helpers/data, defined elsewhere in the script.

cap = cv2.VideoCapture(0)  # assumed: default webcam


def log_latency(start_time, event):
    elapsed_time = time.time() - start_time
    print(f"[{event}] Elapsed Time: {elapsed_time:.4f} seconds")
    return elapsed_time


with mp_hands.Hands(model_complexity=0, min_detection_confidence=0.3, min_tracking_confidence=0.5) as hands:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break  # no frame from the camera
        # BGR 2 RGB
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        # Flip on horizontal
        image = cv2.flip(image, 1)
        # Mark as read-only so MediaPipe can pass the frame by reference
        image.flags.writeable = False
        # Detections (only hands.process() is timed)
        det_time = time.time()
        results = hands.process(image)
        log_latency(det_time, "Landmark Detection")
        # Set flag to true
        image.flags.writeable = True
        # RGB 2 BGR
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        # print(results)
        # Rendering results
        if results.multi_hand_landmarks:
            for num, hand in enumerate(results.multi_hand_landmarks):
                mp_drawing.draw_landmarks(
                    image, hand, mp_hands.HAND_CONNECTIONS,
                    mp_drawing.DrawingSpec(color=(51, 102, 0), thickness=1, circle_radius=2),
                    mp_drawing.DrawingSpec(color=(33, 165, 205), thickness=2, circle_radius=0),
                )
                # Draw finger distances to image from point list
                draw_tip_distances(image, results, point_list)
                # Draw hand distances to image from tip list
                draw_hand_distances(num, image, results, tip_list)
        # Check for countdown trigger
        key = cv2.waitKey(10) & 0xFF
        if key == ord('c'):
            cal_distances(image, hands)
        # Show the image
        cv2.imshow('Hand Tracking', image)
        if key == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()
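The loop above records one time.time() reading per frame. On Windows, time.time() can be as coarse as about 15.6 ms in Python versions before 3.13, which is a large fraction of a 27-35 ms measurement, whereas time.perf_counter() is the clock Python intends for timing short intervals. A minimal sketch of a more robust measurement, assuming the frames are captured beforehand: it discards a few warm-up calls (start-up costs) and reports summary statistics over many frames instead of single readings. `benchmark` is a hypothetical helper, not a MediaPipe API.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List, TypeVar

T = TypeVar("T")


def benchmark(process: Callable[[T], object], frames: Iterable[T], warmup: int = 10) -> Dict[str, float]:
    """Time process(frame) for every frame and summarise the latencies in milliseconds.

    The first `warmup` calls are executed but not recorded, so one-off
    start-up costs do not skew the statistics.
    """
    samples_ms: List[float] = []
    for i, frame in enumerate(frames):
        start = time.perf_counter()  # high-resolution monotonic clock
        process(frame)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if i >= warmup:
            samples_ms.append(elapsed_ms)

    if not samples_ms:
        raise ValueError("need more frames than warm-up iterations")

    samples_ms.sort()
    # Simple 90th-percentile estimate (index clamped to the last sample).
    p90 = samples_ms[min(len(samples_ms) - 1, int(0.9 * len(samples_ms)))]
    return {
        "n": float(len(samples_ms)),
        "mean_ms": statistics.mean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "p90_ms": p90,
        "max_ms": samples_ms[-1],
    }
```

For example, `stats = benchmark(hands.process, rgb_frames)` inside the `with mp_hands.Hands(...)` block, where `rgb_frames` is a hypothetical list of frames captured and converted to RGB beforehand, so camera waits and cv2.waitKey are not part of the measurement.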
Other info / Complete Logs
I'm trying to get the Hand Tracking as close to real-time as possible.
I apologize if I'm misinterpreting the "expected" latency values.
I'm considering:
* Hardware acceleration: Unfortunately, I don't have a CUDA GPU, and as far as I know GPU acceleration isn't supported in the Python solution anyway.
* Confidence thresholds: Tuning min_detection_confidence and min_tracking_confidence only improved latency by about 2 ms.
* Tracking only the landmarks I need: My task only needs the wrist and fingertip landmarks, but the model always predicts all 21. Would it be possible to create a custom model that outputs only these, and would that reduce latency?
* Switching to C++: I had problems setting up the MediaPipe framework. The Hello World example ran successfully, but the hand_tracking_cpu example failed to build.
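On the wrist-and-fingertip point, the selection can at least be done in post-processing. The sketch below assumes the standard 21-point hand landmark layout (wrist = 0, fingertips = 4, 8, 12, 16, 20, matching the values of `mp.solutions.hands.HandLandmark`) and takes landmarks as plain (x, y, z) tuples; `select_wrist_and_tips` is a hypothetical helper. It does not reduce inference time, since the model still predicts all 21 points; it only limits downstream work such as distance calculations and drawing. Whether a retrained model with fewer outputs would be measurably faster is not addressed here.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]

# Indices in MediaPipe's 21-point hand landmark layout.
WRIST = 0
FINGERTIPS = {"thumb": 4, "index": 8, "middle": 12, "ring": 16, "pinky": 20}


def select_wrist_and_tips(landmarks: Sequence[Point]) -> Dict[str, Point]:
    """Keep only the wrist and the five fingertip landmarks of one hand."""
    if len(landmarks) != 21:
        raise ValueError(f"expected 21 landmarks, got {len(landmarks)}")
    selected = {"wrist": landmarks[WRIST]}
    for name, idx in FINGERTIPS.items():
        selected[name] = landmarks[idx]
    return selected
```

With the legacy results, the input for one hand can be built as `[(lm.x, lm.y, lm.z) for lm in hand.landmark]` inside the existing `for num, hand in enumerate(results.multi_hand_landmarks)` loop.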
I'll gladly specify further if necessary! Thanks!