
Add Ham2Pose metrics exact re-implementations #31

@cleong110

Description


Following up on #4, it should now be possible to re-implement the Ham2Pose metrics very closely.

Regarding the implementation of 'nMSE', 'nAPE', 'DTW', and 'nDTW' from Ham2Pose: first of all, they are OpenPose-based. However, Zifan implemented Holistic versions: https://github.com/J22Melody/iict-eval-private/blob/text2pose/metrics/metrics.py

Preprocessing

All of them use the following preprocessing: https://github.com/J22Melody/iict-eval-private/blob/text2pose/metrics/metrics.py#L15C1-L33C16

def normalize_pose(pose: Pose) -> Pose:
    return pose.normalize(pose.header.normalization_info(
        p1=("POSE_LANDMARKS", "RIGHT_SHOULDER"),
        p2=("POSE_LANDMARKS", "LEFT_SHOULDER")
    ))


def get_pose(pose_path: str):
    with open(pose_path, 'rb') as f:
        pose = Pose.read(f.read())

    if "WORLD_LANDMARKS" in [c.name for c in pose.header.components]:
        pose = pose.get_components(["POSE_LANDMARKS", "FACE_LANDMARKS", "LEFT_HAND_LANDMARKS", "RIGHT_HAND_LANDMARKS"])
    if "FACE_LANDMARKS" in [c.name for c in pose.header.components]:
        pose = reduce_holistic(pose)

    pose = normalize_pose(pose)

    return pose
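For intuition, the shoulder-based normalization step can be sketched in plain NumPy (a hypothetical re-derivation of what pose.normalize does conceptually, not the pose-format implementation, which also handles masking and offsets): all keypoints are rescaled so the shoulder distance becomes 1.

```python
import numpy as np

def normalize_by_shoulders(frames: np.ndarray, right_idx: int, left_idx: int) -> np.ndarray:
    """Rescale (T, K, 3) keypoints so the mean shoulder distance is 1.

    Hypothetical sketch of the normalization_info(p1, p2) idea.
    """
    shoulder_dist = np.linalg.norm(frames[:, right_idx] - frames[:, left_idx], axis=-1)
    return frames / shoulder_dist.mean()

# Toy pose: 2 frames, 3 keypoints; shoulders at indices 0 and 1, 2 units apart
frames = np.zeros((2, 3, 3))
frames[:, 1, 0] = 2.0
normalized = normalize_by_shoulders(frames, right_idx=0, left_idx=1)
```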

In our library this is accomplished by chaining the following Preprocessors

  • RemoveWorldLandmarksProcessor()
  • ReduceHolisticProcessor()
  • NormalizePosesProcessor()
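The processor class names above are real, but as a minimal sketch of how such a chain composes, here are two hypothetical stand-in processors (the `process` interface below is an assumption for illustration, not the pose-evaluation API):

```python
import numpy as np

class ZeroFillMaskedProcessor:
    """Hypothetical stand-in: replace NaN (missing) keypoints with zeros."""
    def process(self, frames: np.ndarray) -> np.ndarray:
        return np.nan_to_num(frames, nan=0.0)

class ScaleProcessor:
    """Hypothetical stand-in: rescale all coordinates by a constant."""
    def __init__(self, factor: float):
        self.factor = factor

    def process(self, frames: np.ndarray) -> np.ndarray:
        return frames * self.factor

def run_pipeline(frames: np.ndarray, processors) -> np.ndarray:
    # Apply each preprocessor in order, feeding the output forward
    for p in processors:
        frames = p.process(frames)
    return frames

frames = np.array([[1.0, np.nan], [2.0, 4.0]])
out = run_pipeline(frames, [ZeroFillMaskedProcessor(), ScaleProcessor(0.5)])
```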

But then the distance measures would need some tweaking; those are at https://github.com/J22Melody/iict-eval-private/blob/text2pose/metrics/ham2pose.py. Some of our metrics might already be equivalent, and we could mimic many of the rest by adding

  • ZeroPadShorterPosesProcessor

And then calling PowerDistance.

Which metrics are defined?

There are 4:

                entry['nMSE'] = compare_poses(pose_hyp, pose_ref, distance_function=mse)
                entry['nAPE'] = compare_poses(pose_hyp, pose_ref, distance_function=APE)
                entry['DTW'] = compare_poses(pose_hyp, pose_ref, distance_function=fastdtw)
                entry['nDTW'] = compare_poses(pose_hyp, pose_ref, distance_function='nfastdtw')

The critical function is compare_poses, here: https://github.com/J22Melody/iict-eval-private/blob/text2pose/metrics/ham2pose.py#L163, which defines them all as aggregated trajectory-based distances. It essentially does:

for each keypoint index:
   pick a method for calculating trajectory distances
   calculate a distance for the two trajectories with that index (e.g. right shoulder)
take the average of all the trajectory distances
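That loop can be sketched directly in NumPy (toy distance function for illustration; the real compare_poses dispatches on distance_function):

```python
import numpy as np

def aggregate_trajectory_distances(pose1, pose2, trajectory_distance):
    """pose1, pose2: (T, K, 3) arrays. Compute one distance per keypoint
    trajectory, then average over keypoints, mirroring compare_poses."""
    num_keypoints = pose1.shape[1]
    distances = [
        trajectory_distance(pose1[:, k], pose2[:, k])  # (T, 3) trajectories
        for k in range(num_keypoints)
    ]
    return float(np.mean(distances))

def mean_l2(traj1, traj2):
    # Toy per-trajectory distance: mean per-frame Euclidean distance
    return np.linalg.norm(traj1 - traj2, axis=-1).mean()

pose_a = np.zeros((4, 2, 3))
pose_b = np.ones((4, 2, 3))  # every keypoint offset by (1, 1, 1)
score = aggregate_trajectory_distances(pose_a, pose_b, mean_l2)
```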

We can mimic this with AggregatedDistanceMeasure https://github.com/sign-language-processing/pose-evaluation/blob/main/pose_evaluation/metrics/distance_measure.py#L68

DistanceMeasure also has code for iterating over keypoint trajectories (https://github.com/sign-language-processing/pose-evaluation/blob/main/pose_evaluation/metrics/distance_measure.py#L38), which is used in https://github.com/sign-language-processing/pose-evaluation/blob/main/pose_evaluation/metrics/dtw_metric.py#L27-L40

Some of the metrics are defined pointwise though, not trajectory-wise.

Distance Measures:

mse:

Annotated with ChatGPT, this is

import numpy as np

def mse(trajectory1, trajectory2):
    # Ensure both trajectories have the same length by padding the shorter one with zeros
    if len(trajectory1) < len(trajectory2):
        diff = len(trajectory2) - len(trajectory1)
        trajectory1 = np.concatenate((trajectory1, np.zeros((diff, 3))))
    elif len(trajectory2) < len(trajectory1):
        diff = len(trajectory1) - len(trajectory2)
        trajectory2 = np.concatenate((trajectory2, np.zeros((diff, 3))))

    # Extract the boolean masks indicating which values are masked (invalid/missing) in each trajectory
    pose1_mask = np.ma.getmask(trajectory1)  # shape: (T, 3) or (T,)
    pose2_mask = np.ma.getmask(trajectory2)

    # Replace masked elements with zeros in both trajectories
    # This zero-fills all positions where either trajectory is masked
    trajectory1[pose1_mask] = 0
    trajectory1[pose2_mask] = 0
    trajectory2[pose1_mask] = 0
    trajectory2[pose2_mask] = 0

    # Compute squared error for each time step: (x1 - x2)^2 + (y1 - y2)^2 + (z1 - z2)^2
    sq_error = np.power(trajectory1 - trajectory2, 2).sum(-1)  # shape: (T,)

    # Return the mean of the squared errors across all time steps
    return sq_error.mean()

So we only need to add zero-padding and masked-fill preprocessors first, then do this:

# collect trajectory distances
for each trajectory:
    sq_error = np.power(trajectory1 - trajectory2, 2).sum(-1)
    trajectory_distances.append(sq_error.mean())
self._aggregate(trajectory_distances)
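Put together, here is a self-contained sketch of that per-trajectory pipeline on plain arrays, with NaN standing in for masked values (hypothetical, mirroring the mse function above):

```python
import numpy as np

def zero_pad(traj, length):
    # Pad a (T, 3) trajectory with zero rows up to `length` frames
    if len(traj) < length:
        traj = np.concatenate((traj, np.zeros((length - len(traj), 3))))
    return traj

def mse_trajectory(traj1, traj2):
    length = max(len(traj1), len(traj2))
    traj1, traj2 = zero_pad(traj1, length), zero_pad(traj2, length)
    # MaskedFill step: zero out positions missing in either trajectory
    missing = np.isnan(traj1) | np.isnan(traj2)
    traj1 = np.where(missing, 0.0, traj1)
    traj2 = np.where(missing, 0.0, traj2)
    sq_error = np.power(traj1 - traj2, 2).sum(-1)  # shape: (T,)
    return sq_error.mean()

hyp = np.array([[1.0, 0.0, 0.0], [np.nan, np.nan, np.nan]])
ref = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0], [1.0, 0.0, 0.0]])
score = mse_trajectory(hyp, ref)
```

Frame 0 contributes a squared error of 1, frame 1 is zeroed out on both sides because the hypothesis is missing, and the padded frame 2 contributes 1 against the reference, so the mean is 2/3.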

APE

Annotated with ChatGPT again,

import numpy as np

def APE(trajectory1, trajectory2):
    # Ensure both trajectories have the same length by padding the shorter one with zeros
    if len(trajectory1) < len(trajectory2):
        diff = len(trajectory2) - len(trajectory1)
        trajectory1 = np.concatenate((trajectory1, np.zeros((diff, 3))))
    elif len(trajectory2) < len(trajectory1):
        diff = len(trajectory1) - len(trajectory2)
        trajectory2 = np.concatenate((trajectory2, np.zeros((diff, 3))))

    # Extract boolean masks indicating masked (invalid or missing) entries
    pose1_mask = np.ma.getmask(trajectory1)
    pose2_mask = np.ma.getmask(trajectory2)

    # Replace masked values with zeros in both trajectories
    # Zero out entries in both arrays where either has a mask
    trajectory1[pose1_mask] = 0
    trajectory1[pose2_mask] = 0
    trajectory2[pose1_mask] = 0
    trajectory2[pose2_mask] = 0

    # Compute squared Euclidean distance for each frame
    sq_error = np.power(trajectory1 - trajectory2, 2).sum(-1)  # shape: (T,)

    # Return the mean of the per-frame Euclidean distances (L2 norm)
    return np.sqrt(sq_error).mean()

The only difference from mse is the final line:

return np.sqrt(sq_error).mean()
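A quick numeric check of that difference (mean of per-frame squared errors vs. mean of per-frame Euclidean distances):

```python
import numpy as np

# Two frames with per-frame squared errors of 9 and 16
diff = np.array([[3.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
sq_error = np.power(diff, 2).sum(-1)   # [9.0, 16.0]

mse_value = sq_error.mean()            # (9 + 16) / 2 = 12.5
ape_value = np.sqrt(sq_error).mean()   # (3 + 4) / 2 = 3.5
```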

"fastdtw"

Basically, this passes each trajectory pair to fastdtw, but instructs it to use unmasked_euclidean.

This would be easy to recreate, we already have it mostly done here: https://github.com/sign-language-processing/pose-evaluation/blob/main/pose_evaluation/metrics/dtw_metric.py#L14-L43, it is just... really slow.

unmasked_euclidean:

from scipy.spatial.distance import euclidean
import numpy as np

def unmasked_euclidean(point1, point2):
    if np.ma.is_masked(point2):  # reference label keypoint is missing
        return euclidean((0, 0, 0), point1)
    elif np.ma.is_masked(point1):  # reference label keypoint is not missing, other label keypoint is missing
        return euclidean((0, 0, 0), point2)
    d = euclidean(point1, point2)
    return d

This is trivial to implement via:

  • MaskedFill with 0
  • pass scipy euclidean distance to fastdtw
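For a dependency-free illustration of what fastdtw is approximating, here is a minimal exact-DTW sketch with a pluggable pointwise distance (this is the classic O(n·m) recurrence, not the fastdtw algorithm, which approximates it much faster):

```python
import numpy as np

def dtw(traj1, traj2, point_distance):
    """Classic dynamic-programming DTW over two (T, 3) trajectories."""
    n, m = len(traj1), len(traj2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = point_distance(traj1[i - 1], traj2[j - 1])
            # Extend the cheapest of the three predecessor alignments
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def euclidean(p1, p2):
    return float(np.linalg.norm(np.asarray(p1) - np.asarray(p2)))

traj_a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
traj_b = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
score = dtw(traj_a, traj_b, euclidean)
```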

"nfastdtw"

Exactly the same as "fastdtw", except that pointwise distances are calculated as follows (again annotated with ChatGPT):

from scipy.spatial.distance import euclidean
import numpy as np

def masked_euclidean(point1, point2):
    # Case 1: point2 (the reference label keypoint) is missing
    # Treat this as "ignore the frame" — return 0 distance
    if np.ma.is_masked(point2):
        return 0

    # Case 2: point1 (the predicted keypoint) is missing but reference is not
    # Penalize missing prediction by returning half the magnitude of the reference point
    elif np.ma.is_masked(point1):
        return euclidean((0, 0, 0), point2) / 2

    # Case 3: both points are present
    # Return the regular Euclidean distance
    return euclidean(point1, point2)

We can pretty easily recreate this by

  • MaskedFill with zeros
  • custom pointwise distance: return "defaultdistance" if point2 is masked, scipy euclidean otherwise
  • pass to fastdtw
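The pointwise rule can be sketched with NaN as the missing-value marker (a hypothetical mirror of masked_euclidean above, without numpy masked arrays):

```python
import numpy as np

def nfastdtw_point_distance(point1, point2):
    """Mirror of masked_euclidean using NaN instead of np.ma masks.
    point2 is the reference keypoint, point1 the hypothesis keypoint."""
    if np.isnan(point2).any():      # reference missing: ignore the frame
        return 0.0
    if np.isnan(point1).any():      # hypothesis missing: half-magnitude penalty
        return float(np.linalg.norm(point2)) / 2
    return float(np.linalg.norm(np.asarray(point1) - np.asarray(point2)))

ref_missing = nfastdtw_point_distance(np.array([1.0, 2.0, 2.0]), np.array([np.nan] * 3))
hyp_missing = nfastdtw_point_distance(np.array([np.nan] * 3), np.array([0.0, 3.0, 4.0]))
both_present = nfastdtw_point_distance(np.array([1.0, 2.0, 2.0]), np.array([1.0, 2.0, 4.0]))
```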
