Skip to content

Pose distance metrics from Ham2Pose #4

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

cleong110
Copy link
Contributor

@cleong110 cleong110 commented Nov 18, 2024

Initial commit is just to copy code. Next we need to rewrite to subclass DistanceMetric, etc.

Tasks include:

  • add scipy and fastdtw as requirements.
  • Create a DistanceMetric class for DTW-MJE and Normalized DTW-MJE, from Ham2Pose.
  • add in proper tests
  • add in utilities for pose
  • add in tests for those utilities

@cleong110
Copy link
Contributor Author

Oh yes, when accepted this will fix #3

@cleong110 cleong110 changed the title CDL: copying code by @j22melody, as requested. Pose distance metrics from Ham2Pose Nov 25, 2024
@cleong110
Copy link
Contributor Author

Spent the day messing around with this, thinking about how to implement this.

Essentially the Ham2Pose Metrics https://github.com/J22Melody/iict-eval-private/blob/text2pose/metrics/metrics.py#L94 have the following variables, various combinations of which constitute different metrics

  • normalize or not?
  • which distance metric function to use for (xyz point 1, xyz point 2)? euclidean or cityblock or etc.
  • apply dynamic time warping? Or pad the shorter one?
  • how to deal with missing points?

Also, a number of these work along joint-point trajectories. For example sequence of thumb positions from hyp and ref. For this, the pose.body.points_perspective() (https://github.com/sign-language-processing/pose/blob/master/src/python/pose_format/pose_body.py) function seems useful. For example it makes APE easier to implement: https://github.com/rotem-shalev/Ham2Pose/blob/25a5cd7221dfb81a24088e4f38bca868d8e896fc/metrics.py#L101

A couple conundrums to solve:

  • Do we do things like normalizing, padding, etc as part of the metric, or part of the preprocessing? If it's part of the preprocessing, then DTW-MJE and DTW-nMJE are are not separate DistanceMetric classes, just the same class with different options. And even dynamic time warping itself could be thought of as an option for how to handle unequal sequences.
  • In general, do we make subclasses for APE, MSE, and so on, or do we just add more options to DistanceMetric?
  • Specifically: what about DistanceMetric.kind? Right now we have a string variable which determines which distance function to use between two spatial points. Would it be better to have a DistanceMetric.spatial_distance(hyp_xyz_point, ref_xyz_point), which is overridden by EuclideanDistanceMetric?
  • Also DistanceMetric doesn't operate joint-wise at the moment.

@J22Melody
Copy link

A base class could contain many optional basic pose operations (normalize, pad, etc.), but at the moment, a DistanceMetric class with different options is also fine to me.

In general, I lean towards whatever can be quickly built. Then, we can optimize as we use it and develop other metrics. We will know better what should be shared among classes and what not.

Let's see @AmitMY's opinions.

@AmitMY
Copy link
Contributor

AmitMY commented Jan 17, 2025

I think a base metric, with options, is a good idea.
hide_legs=True for example as default.
Then, we could create aliases for metrics, such as:

class DTW_MJE(DistanceMetric):
  def __init__():
     super().__init__(normalize=False, hide_legs=True, etc...)

That way, we can have a generic approach, but still "keep" the old names through basic inheritance. It will also allow creating all possible setups of DistanceMetric, such that we will then be able to ask "what are the best settings for this task?" and find them.

Finally, maybe it will be a really good section of writing: Everyone is inventing metrics left and right, giving them names, there are so many metrics with the same name, etc. We go with the sacrebleu approach, that all the distance metrics are DistanceMetric with a different setup. It has a signature (when printing, __str__) that says what metric it is, so it is easily reproducible. (e.g. "nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.0.0")

@cleong110
Copy link
Contributor Author

OK, it's still very WIP, but I pushed some changes along the lines of making this more like sacrebleu:

  • DistanceMetric now takes a list of PoseProcessor objects, which it will use when preprocess is called. Things like PosesNormalizer
  • Also you can add to the list by flags. Various flags such as normalize_pose will cause the instantiation of a PoseProcessor and it gets added to the list.
  • Each PoseProcessor object has a str function with relevant details. When we want to calcualte the signature, the DistanceMetric can use them.
  • unless overriden, a DistanceMetric will go keypoint by keypoint, get the full trajectory, and then call a function to get a trajectory-to-trajectory distance.
  • Unless overridden the trajectory distances go along each each pair of spatial coordinates and will calculate a spatial distance.
  • Thus I can implement DTW-MJE by just overriding the trajectory and point-to-point spatial distance functions, and setting the normalize_poses to true. The resulting DistanceMetric should print something like "DynamicTimeWarping[normalize,.....][masked_euclidean]"

TODO: something similar for distancecalculators, so they also have a str

@cleong110
Copy link
Contributor Author

cleong110 commented Jan 21, 2025

I'm trying to think about how to organize and implement the Distance Calculations in a good way so that we can generate the signatures consistently. And I'm trying to understand the implementations from ham2pose.

It seems there's two main distance categories:

  1. element-wise, taking in two coordinates. For example euclidean, cityblock, or just straight subtraction/difference.
  2. trajectory-wise, taking in two sequences of coordinates. Mostly these consist of "mean element-wise", though not always.

For example, Average Position Error simply subtracts the two trajectories from each other (absolute difference), and then takes the mean. It could have a signature like: point_dist:squared_difference|traj_distance:mean_point. There's some preprocessing included as well, so the full signature here would be something like |APE|zero_pad_shorter|point_dist:squared_difference|traj_dist:mean_point`

def APE(trajectory1, trajectory2):
    if len(trajectory1) < len(trajectory2):
        diff = len(trajectory2) - len(trajectory1)
        trajectory1 = np.concatenate((trajectory1, np.zeros((diff, 3))))
    elif len(trajectory2) < len(trajectory1):
        trajectory2 = np.concatenate((trajectory2, np.zeros((len(trajectory1) - len(trajectory2), 3))))
    pose1_mask = np.ma.getmask(trajectory1)
    pose2_mask = np.ma.getmask(trajectory2)
    trajectory1[pose1_mask] = 0
    trajectory1[pose2_mask] = 0
    trajectory2[pose1_mask] = 0
    trajectory2[pose2_mask] = 0
    sq_error = np.power(trajectory1 - trajectory2, 2).sum(-1)
    return np.sqrt(sq_error).mean()

But notionally you could have a "distance" that didn't do a point-distance. For example you could imagine a (terrible) Distance which would just be DifferenceOfSums

def DifferenceOfSums(trajectory1, trajectory2):
    return np.sum(trajectory1, -1) - np.sum(trajectory2, -1)

In which case I would want the signature to be something like traj_dist:diff_of_trajectory_sums with no point_dist

I don't think we need to worry about a case where there's no traj distance. The whole point here is that we put in two pose trajectories and get one float out.

I think the answer here is to have a separate DistanceCalculator class that knows its own signature, so that we can have an APEDistance which gives us |point_dist:squared_difference| . But that also seems kind of redundant as I have much of that implemented within DistanceMetric?

"distances_aggregator": args.get("distances_aggregator", None),
"mask_strategy":args.get("mask_strategy", None),
}
)


Copy link
Contributor

@AmitMY AmitMY Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class PowerDistance(PointwiseDistance):
    def __init__(self, power: int = 2, default_distance=0):
        self.power = power
        self.default_distance = default_distance

    def __call__(self, p1: MaskedArray, p2: MaskedArray):
        return (p1 - p2).pow(self.power).abs().filled(self.default_distance).mean()

L2Distance = PowerDistance(power=2)
L1Distance = PowerDistance(power=1)

class APEDistance(PointwiseDistance):
    def __call__(self, p1: MaskedArray, p2: MaskedArray):
        return (p1 - p2).pow(2).sum(-1).sqrt().filled(default_distance).mean()

# Example usage
DistanceMetric(distance=L2Distance, alignment_strategy="pad"|"truncate"|"by-reference")

Copy link
Contributor

@AmitMY AmitMY Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class DTWMetric(BaseMetric):
  def __init__(self, distance: Distance, trajectory:"keypoints"|"frames", processors=List[Processor])
     self.__super__...

  def score(self, pose1: Pose, pose2: Pose):
    tensor1 = self.process(pose1)
    tensor2 = self.process(pose2)
    if self. trajectory == "keypoints":
       tensor1 = points_perspective(tensor1)
       tensor2 = points_perspective(tensor2)
       distance = 0
       for trajectory1, trajectory2 in zip(tensor1, tensor2): 
          distance += fastdtw(metric=self.distance, seq1=trajectory1, seq2=trajectory2)
       return distance
     return fastdtw(metric=self.distance, seq1=tensor1, seq2=tensor2)

def PoseSetMaskedToOrigin(pose):
    return pose.filled(0)

dtw_metric = DTWMetric(
    distance=L2Distance(),
    trajectory="keypoints",
    processors=[
        PoseNormalize(),
        HideLegs(),
        PoseSetMaskedToOrigin(),
        ReduceToCommonKeypoints(),
    ],
)
# Usage signature: dtw(distance=str(distance), processors="|".join(processors))

@cleong110
Copy link
Contributor Author

cleong110 commented Jan 27, 2025

Had a design discussion with Amit.

  • Do some cleanup: breakup the PR into pieces. Specifically...
  • all the pose_utils things. Pose utils and some tests.  #15
  • code for signatures
  • THEN the metrics
  • Base class does processing. No need for a bunch of options like normalize_pose, just standardize on the processors list.
  • DTW and DistanceMetric can both subclass from some third class, they're different enough that subclassing one from the other doesn't make sense.
  • DistanceMetric can have various alignment strategies like truncate, pad, etc. We don't have to include dtw. padding or truncating to match the reference is the default.
  • Make a Distance class, which is NOT a metric, which handles tensors/arrays in/out.
  • I don't think we need the AggregationStrategies. That's something for each Distance to decide.

@cleong110
Copy link
Contributor Author

One issue I'm running into is: if we calculate a separate distance for every keypoint's trajectory, we need to figure out how to aggregate those.

@cleong110
Copy link
Contributor Author

As of PR #19, this one is rendered moot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants