
Pose detect_for_video with GPU delegate crashes when segmentation enabled on Mac #5788

@demirhere

Description


Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

Yes

OS Platform and Distribution

macOS Sequoia 15.1.1 (Apple M3)

MediaPipe Tasks SDK version

3.10.0

Task name (e.g. Image classification, Gesture recognition etc.)

Pose landmarker

Programming Language and version (e.g. C++, Python, Java)

Python 3.10

Describe the actual behavior

Detection crashes the application when segmentation is enabled. If I turn off the segmentation mask, detect_for_video runs fine. The CPU delegate runs fine in all cases.

Describe the expected behaviour

It should not crash.

Standalone code/steps you may have used to try to get what you need

Basic pose detection code using the latest API, with GPU as the delegate on a Mac M3; a minimal repro sketch follows.
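
A minimal sketch of the repro (the model and frame paths are placeholders, not from my actual setup):

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

base_options = python.BaseOptions(
    model_asset_path="pose_landmarker.task",  # placeholder model path
    delegate=python.BaseOptions.Delegate.GPU,
)
options = vision.PoseLandmarkerOptions(
    base_options=base_options,
    output_segmentation_masks=True,  # the crash only occurs when this is True
    running_mode=vision.RunningMode.VIDEO,
)
landmarker = vision.PoseLandmarker.create_from_options(options)

mp_image = mp.Image.create_from_file("frame.png")  # placeholder frame
landmarker.detect_for_video(mp_image, 0)  # crashes here with the log below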

Other info / Complete Logs

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1734586092.124030 9722600 gl_context.cc:369] GL version: 2.1 (2.1 Metal - 89.3), renderer: Apple M3
INFO: Created TensorFlow Lite delegate for Metal.
2024-12-18 21:28:12 - root - INFO - Model loaded successfully
W0000 00:00:1734586092.236510 9722705 landmark_projection_calculator.cc:186] Using NORM_RECT without IMAGE_DIMENSIONS is only supported for the square ROI. Provide IMAGE_DIMENSIONS or use PROJECTION_MATRIX.
E0000 00:00:1734586092.239868 9722700 shader_util.cc:99] Failed to compile shader:
 1 #version 330 
 2 #ifdef GL_ES 
 3 #define DEFAULT_PRECISION(p, t) precision p t; 
 4 #else 
 5 #define DEFAULT_PRECISION(p, t) 
 6 #define lowp 
 7 #define mediump 
 8 #define highp 
 9 #endif  // defined(GL_ES) 
10 #if __VERSION__ < 130
11 #define in attribute
12 #define out varying
13 #endif  // __VERSION__ < 130
14 in vec4 position; in mediump vec4 texture_coordinate; out mediump vec2 sample_coordinate; void main() { gl_Position = position; sample_coordinate = texture_coordinate.xy; }
E0000 00:00:1734586092.239882 9722700 shader_util.cc:106] Error message: ERROR: 0:1: '' :  version '330' is not supported

E0000 00:00:1734586092.239922 9722600 calculator_graph.cc:928] INTERNAL: CalculatorGraph::Run() failed: 
Calculator::Process() for node "mediapipe_tasks_vision_pose_landmarker_poselandmarkergraph__mediapipe_tasks_vision_pose_landmarker_multipleposelandmarksdetectorgraph__mediapipe_tasks_vision_pose_landmarker_singleposelandmarksdetectorgraph__TensorsToSegmentationCalculator" failed: ; RET_CHECK failure (mediapipe/calculators/tensor/tensors_to_segmentation_converter_metal.cc:217) upsample_program_Problem initializing the program.

This happens when detect_for_video is called.
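
Per the log, the segmentation path's Metal converter (tensors_to_segmentation_converter_metal.cc) tries to compile a #version 330 upsample shader inside a GL context that only reports version 2.1, so the program fails to initialize. Until that is fixed, a stopgap (my assumption, not a confirmed fix) is to re-enable the commented-out check in load_model below, so the GPU delegate is only used on macOS when segmentation is off:

# Stopgap sketch: prefer CPU whenever segmentation masks are requested on
# macOS; use the GPU delegate only when they are not (mirrors the
# commented-out condition in load_model below).
delegate = python.BaseOptions.Delegate.CPU
if platform.system() == "Darwin" and not self.enable_segmentation:
    delegate = python.BaseOptions.Delegate.GPU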

    # Imports assumed by this excerpt (the enclosing detector class and its
    # __init__ are omitted from the report): logging, platform, cv2,
    # numpy as np, mediapipe as mp, `from mediapipe.tasks import python`,
    # and `from mediapipe.tasks.python import vision`.
    def load_model(self):
        logging.info(f"Loading MediaPipe Pose model from {self.model_path}")

        # No GPU support on Windows yet.
        # Why no GPU with segmentation on Mac?
        delegate = python.BaseOptions.Delegate.CPU

        # if platform.system() == "Darwin" and not self.enable_segmentation:
        if platform.system() == "Darwin":
            delegate = python.BaseOptions.Delegate.GPU

        base_options = python.BaseOptions(
            model_asset_path=self.model_path,
            delegate=delegate
        )

        options = vision.PoseLandmarkerOptions(
            base_options=base_options,
            num_poses=self.num_poses,
            min_pose_detection_confidence=self.min_pose_detection_confidence,
            min_pose_presence_confidence=self.min_pose_presence_confidence,
            min_tracking_confidence=self.min_tracking_confidence,
            output_segmentation_masks=self.enable_segmentation,
            running_mode=vision.RunningMode.VIDEO
        )
        self.pose_detector = vision.PoseLandmarker.create_from_options(options)
        logging.info("Model loaded successfully")

    def unload_model(self):
        """Unload the pose detection model and free resources."""
        logging.info("Unloading MediaPipe Pose model")
        if self.pose_detector is not None:
            self.pose_detector.close()
            self.pose_detector = None
        logging.info("Model unloaded successfully")


    def pose(self, image, timestamp_ms):
        if self.pose_detector is None:
            self.load_model()

        mp_image = mp.Image(image_format=mp.ImageFormat.SRGBA, data=cv2.cvtColor(image, cv2.COLOR_BGR2RGBA))
        detection_result = self.pose_detector.detect_for_video(mp_image, int(round(timestamp_ms)))
        result = {
            'poses': [],
            'world_poses': [],
            'segmentation_mask': None
        }
        
        if detection_result.pose_landmarks:
            for idx, (pose_landmarks, world_landmarks) in enumerate(zip(
                detection_result.pose_landmarks, 
                detection_result.pose_world_landmarks)):
                
                keypoint_dict = {}
                world_keypoint_dict = {}
                
                for i, landmark_name in enumerate(self.KEYPOINT_NAMES):
                    # Normalized coordinates (x and y are mirrored here)
                    point = pose_landmarks[i]
                    keypoint_dict[landmark_name] = {
                        'x': 1.0 - point.x,
                        'y': 1.0 - point.y,
                        'z': point.z,
                        'confidence': point.visibility
                    }
                    
                    # World coordinates
                    world_point = world_landmarks[i]
                    world_keypoint_dict[landmark_name] = {
                        'x': world_point.x,
                        'y': world_point.y,
                        'z': world_point.z,
                        'confidence': world_point.visibility
                    }
                
                result['poses'].append(keypoint_dict)
                result['world_poses'].append(world_keypoint_dict)
            
            if self.enable_segmentation and detection_result.segmentation_masks:
                # Initialize a combined mask with zeros
                combined_mask = np.zeros_like(detection_result.segmentation_masks[0].numpy_view(), dtype=np.float32)

                for mask in detection_result.segmentation_masks:
                    combined_mask += mask.numpy_view()

                # Normalize the combined mask to [0, 65535], guarding
                # against division by zero when no pixels are segmented
                peak = combined_mask.max()
                if peak > 0:
                    combined_mask = combined_mask / peak * 65535
                result['segmentation_mask'] = combined_mask.astype(np.uint16)
        
        return result
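
For reference, a sketch of how these methods are driven over a video (the class name and file path here are assumptions for illustration):

import cv2

detector = PoseDetector()  # hypothetical name for the enclosing class
cap = cv2.VideoCapture("input.mp4")  # placeholder video

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # detect_for_video expects monotonically increasing timestamps
    result = detector.pose(frame, cap.get(cv2.CAP_PROP_POS_MSEC))

cap.release()
detector.unload_model()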

Labels

gpu (MediaPipe GPU related issues), os:macOS (Issues on MacOS), platform:python (MediaPipe Python issues), task:pose landmarker (Issues related to Pose Landmarker)
