Following up on #4, it should now be possible to re-implement the Ham2Pose metrics very closely.
Regarding the implementation of 'nMSE', 'nAPE', 'DTW', and 'nDTW' from Ham2Pose: first of all, they are OpenPose-based. However, Zifan implemented Holistic versions: https://github.com/J22Melody/iict-eval-private/blob/text2pose/metrics/metrics.py
Preprocessing
All of them use the following preprocessing: https://github.com/J22Melody/iict-eval-private/blob/text2pose/metrics/metrics.py#L15C1-L33C16
```python
def normalize_pose(pose: Pose) -> Pose:
    return pose.normalize(pose.header.normalization_info(
        p1=("POSE_LANDMARKS", "RIGHT_SHOULDER"),
        p2=("POSE_LANDMARKS", "LEFT_SHOULDER")
    ))

def get_pose(pose_path: str):
    with open(pose_path, 'rb') as f:
        pose = Pose.read(f.read())
    if "WORLD_LANDMARKS" in [c.name for c in pose.header.components]:
        pose = pose.get_components(["POSE_LANDMARKS", "FACE_LANDMARKS", "LEFT_HAND_LANDMARKS", "RIGHT_HAND_LANDMARKS"])
    if "FACE_LANDMARKS" in [c.name for c in pose.header.components]:
        pose = reduce_holistic(pose)
    pose = normalize_pose(pose)
    return pose
```
In our library this is accomplished by chaining the following Preprocessors (a sketch follows the list):
- RemoveWorldLandmarksProcessor()
- ReduceHolisticProcessor()
- NormalizePosesProcessor()
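A minimal sketch of that chaining, assuming each processor exposes a `process_poses` method that maps a list of `Pose` objects to a processed list (the import path and method name here are assumptions, not the library's confirmed API):

```python
from pose_format import Pose

# NOTE: the import path and the process_poses() interface below are assumptions
# for illustration; check the pose_evaluation preprocessing module for the real API.
from pose_evaluation.metrics.pose_processors import (
    RemoveWorldLandmarksProcessor,
    ReduceHolisticProcessor,
    NormalizePosesProcessor,
)

def ham2pose_preprocess(poses: list[Pose]) -> list[Pose]:
    # Same steps, in the same order, as Ham2Pose's get_pose().
    for processor in [
        RemoveWorldLandmarksProcessor(),
        ReduceHolisticProcessor(),
        NormalizePosesProcessor(),
    ]:
        poses = processor.process_poses(poses)
    return poses
```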
But then the Distance Measures would need some tweaking; those are at https://github.com/J22Melody/iict-eval-private/blob/text2pose/metrics/ham2pose.py. Some of our metrics might be equivalent; for example, we could mimic a lot of them by adding
- ZeroPadShorterPosesProcessor
And then calling PowerDistance.
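Very loosely, and purely as a hypothetical sketch (the `DistanceMetric` wiring and constructor arguments below are assumptions; only the class names mentioned above come from this issue), that might look like:

```python
# Hypothetical wiring; import paths, argument names, and the score() call are assumptions.
from pose_evaluation.metrics.distance_measure import PowerDistance
from pose_evaluation.metrics.distance_metric import DistanceMetric
from pose_evaluation.metrics.pose_processors import ZeroPadShorterPosesProcessor

metric = DistanceMetric(
    name="padded_power_distance",
    distance_measure=PowerDistance(power=2),
    pose_preprocessors=[ZeroPadShorterPosesProcessor()],
)
score = metric.score(pose_hypothesis, pose_reference)
```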
Which metrics are defined?
There are 4:
```python
entry['nMSE'] = compare_poses(pose_hyp, pose_ref, distance_function=mse)
entry['nAPE'] = compare_poses(pose_hyp, pose_ref, distance_function=APE)
entry['DTW'] = compare_poses(pose_hyp, pose_ref, distance_function=fastdtw)
entry['nDTW'] = compare_poses(pose_hyp, pose_ref, distance_function='nfastdtw')
```
The critical function is `compare_poses` (https://github.com/J22Melody/iict-eval-private/blob/text2pose/metrics/ham2pose.py#L163), which defines them all as aggregated, trajectory-based distances. It essentially does:
```
for each keypoint index:
    pick a method for calculating trajectory distances
    calculate a distance for the two trajectories with that index (e.g. right shoulder)
take the average of all the trajectory distances
```
We can mimic this with AggregatedDistanceMeasure: https://github.com/sign-language-processing/pose-evaluation/blob/main/pose_evaluation/metrics/distance_measure.py#L68
DistanceMeasure also has code for iterating over keypoint trajectories (https://github.com/sign-language-processing/pose-evaluation/blob/main/pose_evaluation/metrics/distance_measure.py#L38), used in https://github.com/sign-language-processing/pose-evaluation/blob/main/pose_evaluation/metrics/dtw_metric.py#L27-L40.
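For reference, in plain NumPy terms (a standalone sketch of the same aggregation pattern, not the library's code), the `compare_poses` loop is roughly:

```python
import numpy as np

def aggregated_trajectory_distance(hyp, ref, trajectory_distance):
    # hyp, ref: (possibly masked) arrays of shape (frames, keypoints, 3)
    distances = []
    for keypoint_index in range(hyp.shape[1]):
        trajectory1 = hyp[:, keypoint_index, :]
        trajectory2 = ref[:, keypoint_index, :]
        distances.append(trajectory_distance(trajectory1, trajectory2))
    # Ham2Pose aggregates by averaging over all keypoint trajectories.
    return float(np.mean(distances))
```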
Some of the metrics are defined pointwise though, not trajectory-wise.
Distance Measures:
mse:
Annotated with ChatGPT, this is:
```python
import numpy as np

def mse(trajectory1, trajectory2):
    # Ensure both trajectories have the same length by padding the shorter one with zeros
    if len(trajectory1) < len(trajectory2):
        diff = len(trajectory2) - len(trajectory1)
        trajectory1 = np.concatenate((trajectory1, np.zeros((diff, 3))))
    elif len(trajectory2) < len(trajectory1):
        diff = len(trajectory1) - len(trajectory2)
        trajectory2 = np.concatenate((trajectory2, np.zeros((diff, 3))))
    # Extract the boolean masks indicating which values are masked (invalid/missing) in each trajectory
    pose1_mask = np.ma.getmask(trajectory1)  # shape: (T, 3) or (T,)
    pose2_mask = np.ma.getmask(trajectory2)
    # Replace masked elements with zeros in both trajectories
    # This zero-fills all positions where either trajectory is masked
    trajectory1[pose1_mask] = 0
    trajectory1[pose2_mask] = 0
    trajectory2[pose1_mask] = 0
    trajectory2[pose2_mask] = 0
    # Compute squared error for each time step: (x1 - x2)^2 + (y1 - y2)^2 + (z1 - z2)^2
    sq_error = np.power(trajectory1 - trajectory2, 2).sum(-1)  # shape: (T,)
    # Return the mean of the squared errors across all time steps
    return sq_error.mean()
```
So we only need to add Zero-padding and MaskedFill Preprocessors first, then do this:
```
# collect trajectory distances
for each trajectory:
    sq_error = np.power(trajectory1 - trajectory2, 2).sum(-1)
    trajectory_distances.append(sq_error)
self._aggregate(trajectory_distances)
```
APE:
Annotated with ChatGPT again:
```python
import numpy as np

def APE(trajectory1, trajectory2):
    # Ensure both trajectories have the same length by padding the shorter one with zeros
    if len(trajectory1) < len(trajectory2):
        diff = len(trajectory2) - len(trajectory1)
        trajectory1 = np.concatenate((trajectory1, np.zeros((diff, 3))))
    elif len(trajectory2) < len(trajectory1):
        diff = len(trajectory1) - len(trajectory2)
        trajectory2 = np.concatenate((trajectory2, np.zeros((diff, 3))))
    # Extract boolean masks indicating masked (invalid or missing) entries
    pose1_mask = np.ma.getmask(trajectory1)
    pose2_mask = np.ma.getmask(trajectory2)
    # Replace masked values with zeros in both trajectories
    # Zero out entries in both arrays where either has a mask
    trajectory1[pose1_mask] = 0
    trajectory1[pose2_mask] = 0
    trajectory2[pose1_mask] = 0
    trajectory2[pose2_mask] = 0
    # Compute squared Euclidean distance for each frame
    sq_error = np.power(trajectory1 - trajectory2, 2).sum(-1)  # shape: (T,)
    # Return the mean of the per-frame Euclidean distances (L2 norm)
    return np.sqrt(sq_error).mean()
```
The only difference with mse is this:
```python
return np.sqrt(sq_error).mean()
```
"fastdtw"
Basically, this passes each trajectory pair to fastdtw, but instructs it to use unmasked_euclidean.
This would be easy to recreate; we already have it mostly done here: https://github.com/sign-language-processing/pose-evaluation/blob/main/pose_evaluation/metrics/dtw_metric.py#L14-L43. It is just... really slow.
unmasked euclidean:
```python
from scipy.spatial.distance import euclidean
import numpy as np

def unmasked_euclidean(point1, point2):
    if np.ma.is_masked(point2):  # reference label keypoint is missing
        return euclidean((0, 0, 0), point1)
    elif np.ma.is_masked(point1):  # reference label keypoint is not missing, other label keypoint is missing
        return euclidean((0, 0, 0), point2)
    d = euclidean(point1, point2)
    return d
```
This is trivial to implement via the following (a sketch follows the list):
- MaskedFill with 0
- pass scipy euclidean distance to fastdtw
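A minimal sketch of those two steps, assuming masked NumPy trajectories as input (the zero-fill line stands in for a MaskedFill-with-0 preprocessor; only the fastdtw and scipy calls are confirmed APIs):

```python
import numpy as np
from fastdtw import fastdtw
from scipy.spatial.distance import euclidean

def dtw_trajectory_distance(trajectory1, trajectory2):
    # Fill masked (missing) keypoints with zeros, mimicking a MaskedFill-with-0 preprocessor.
    traj1 = np.ma.filled(trajectory1, 0)
    traj2 = np.ma.filled(trajectory2, 0)
    # fastdtw returns (total_distance, warping_path); only the distance is needed here.
    distance, _path = fastdtw(traj1, traj2, dist=euclidean)
    return distance
```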
"nfastdtw"
Exactly the same as "fastdtw", except that pointwise distances are calculated as follows (again annotated with ChatGPT):
```python
from scipy.spatial.distance import euclidean
import numpy as np

def masked_euclidean(point1, point2):
    # Case 1: point2 (the reference label keypoint) is missing
    # Treat this as "ignore the frame": return 0 distance
    if np.ma.is_masked(point2):
        return 0
    # Case 2: point1 (the predicted keypoint) is missing but the reference is not
    # Penalize the missing prediction by returning half the magnitude of the reference point
    elif np.ma.is_masked(point1):
        return euclidean((0, 0, 0), point2) / 2
    # Case 3: both points are present
    # Return the regular Euclidean distance
    return euclidean(point1, point2)
```
We can pretty easily recreate this with the following (a sketch follows the list):
- MaskedFill with zeros
- custom pointwise distance: return "defaultdistance" if point 2 is masked, scipy euclidean otherwise
- pass to fastdtw
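A rough sketch of those steps (hedged: `masked_pointwise_distance` and its `default_distance` parameter are hypothetical names, the MaskedFill step is omitted because the mask checks stand in for it, and masks are assumed to still be present on the trajectories when the distance function is called):

```python
import numpy as np
from fastdtw import fastdtw
from scipy.spatial.distance import euclidean

def masked_pointwise_distance(point1, point2, default_distance=0.0):
    # default_distance is hypothetical: the value returned when the reference keypoint
    # (point2) is masked, matching masked_euclidean's "ignore the frame" case.
    if np.ma.is_masked(point2):
        return default_distance
    if np.ma.is_masked(point1):
        # Penalize a missing prediction by half the magnitude of the reference point.
        return euclidean((0, 0, 0), point2) / 2
    return euclidean(point1, point2)

def ndtw_trajectory_distance(trajectory1, trajectory2):
    # Pass the custom pointwise distance to fastdtw for each trajectory pair.
    distance, _path = fastdtw(trajectory1, trajectory2, dist=masked_pointwise_distance)
    return distance
```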