
Further thoughts on library architecture and design: composable pipeline with constructed signatures #14

@cleong110

Jotting down some notes based on slack discussions.

Composable pipeline, composed signature

We would like to be able to compose pieces together into metrics, and we would like the metrics to have signatures like sacrebleu's, which fully describe the settings needed to reproduce the same results.

So we start with subclasses of DistanceMetric with their various options, then add more details, and compose like so:

DTW_MJE = DTWMetric(DistanceMetric(normalize=True, ignore_mask=False, kind='euclidean'))
nDTW_MJE = DTWMetric(DistanceMetric(normalize=True, ignore_mask=True, kind='euclidean'))

And then the signatures could be constructed as well:

dtw(example_scale=1|metric=(distance|normalize|remove_legs|...))

Or, for example, our EmbeddingMetric currently works for single signs. By composing it with SegmentedMetric we can create SegmentedMetric(EmbeddingMetric())

Or for another example, we could do something like SegmentedMetric(DTWMetric(DistanceMetric(kind="euclidean")))
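As a rough sketch of how composition and constructed signatures could fit together: each metric below stores its own settings plus an optional wrapped metric, and builds its signature recursively. The base class `ComposableMetric`, the `name` attributes, the `inner` argument, and the exact `name(key=value|...)` layout are all assumptions made for illustration, not the library's actual API.

```python
from typing import Any, Optional


class ComposableMetric:
    """Hypothetical base: a name, some settings, and an optional wrapped metric."""

    name = "metric"

    def __init__(self, inner: Optional["ComposableMetric"] = None, **settings: Any):
        self.inner = inner
        self.settings = settings

    def get_signature(self) -> str:
        # Render this metric's own settings as key=value, sorted for stable output.
        parts = [f"{key}={value}" for key, value in sorted(self.settings.items())]
        # A wrapped metric contributes its own signature, nested in parentheses.
        if self.inner is not None:
            parts.append(f"metric={self.inner.get_signature()}")
        return f"{self.name}({'|'.join(parts)})"


class DistanceMetric(ComposableMetric):
    name = "distance"


class DTWMetric(ComposableMetric):
    name = "dtw"


class SegmentedMetric(ComposableMetric):
    name = "segmented"


dtw_mje = DTWMetric(DistanceMetric(kind="euclidean", normalize=True), example_scale=1)
print(dtw_mje.get_signature())
# dtw(example_scale=1|metric=distance(kind=euclidean|normalize=True))

segmented = SegmentedMetric(dtw_mje)
print(segmented.get_signature())
# segmented(metric=dtw(example_scale=1|metric=distance(kind=euclidean|normalize=True)))
```

The sketch only covers signatures; how each wrapper would actually compute a score is left out on purpose.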

Signature format

Just for reference, this is what sacrebleu signatures look like:

 - BLEU       nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.0.0
 - chrF2      nrefs:1|case:mixed|eff:yes|nc:6|nw:0|space:no|version:2.0.0
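To make the flat format concrete, here is a small sketch that parses and re-formats the `key:value|key:value` part of such a signature. The helper names `parse_signature` and `format_signature` are made up for this note and are not part of sacrebleu.

```python
from typing import Dict


def parse_signature(body: str) -> Dict[str, str]:
    """Split 'k1:v1|k2:v2' into a dict, splitting each pair on its first ':'."""
    return dict(pair.split(":", 1) for pair in body.split("|"))


def format_signature(settings: Dict[str, str]) -> str:
    """Join key:value pairs with '|', preserving insertion order."""
    return "|".join(f"{key}:{value}" for key, value in settings.items())


bleu_body = "nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.0.0"
settings = parse_signature(bleu_body)
print(settings["tok"])                          # 13a
print(format_signature(settings) == bleu_body)  # True
```

Note that this flat layout has no obvious place for a nested component, which is the part a composed signature would need to add.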

Presumably we'd like something similar, but reflecting the transformations, processing, etc. that occur.

How does SacreBleu do it?

Relevant code is at https://github.com/mjpost/sacrebleu/blob/master/sacrebleu/metrics/base.py

There's Signature in there, but also Metric and Score.

  • Basically Signature is a dict with a class wrapped around it, with methods that help you update it and output it
  • Scores are numbers, but with lots of information and convenience methods so that you can know what that number means, how it was calculated, what the confidence is, etc. Also a way to output verbose/pretty.
  • Metric is the thing that calculates and outputs Scores, apparently. And each one has a corresponding Score and Signature.
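To illustrate the three roles described above, here is a toy version of the pattern. This is not sacrebleu's code: the `Toy*` class names, their methods, and the `MeanAbsoluteError` example are all invented for this sketch, and only mirror the general shape (a dict-backed signature, a score object wrapping a number, and an abstract metric that produces both).

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class ToySignature:
    """A dict with a class wrapped around it, plus update and output helpers."""

    def __init__(self, info: Dict[str, Any]):
        self.info = dict(info)

    def update(self, key: str, value: Any) -> None:
        self.info[key] = value

    def format(self) -> str:
        return "|".join(f"{key}:{value}" for key, value in self.info.items())


class ToyScore:
    """A number that also remembers which metric produced it."""

    def __init__(self, name: str, score: float):
        self.name = name
        self.score = score

    def format(self, width: int = 2) -> str:
        return f"{self.name} = {self.score:.{width}f}"


class ToyMetric(ABC):
    """Computes ToyScore objects and can describe itself with a ToySignature."""

    def __init__(self, **settings: Any):
        self.settings = settings

    @abstractmethod
    def score(self, hypothesis: List[float], reference: List[float]) -> ToyScore:
        ...

    def get_signature(self) -> ToySignature:
        return ToySignature(self.settings)


class MeanAbsoluteError(ToyMetric):
    def score(self, hypothesis: List[float], reference: List[float]) -> ToyScore:
        errors = [abs(h - r) for h, r in zip(hypothesis, reference)]
        return ToyScore("MAE", sum(errors) / len(errors))


metric = MeanAbsoluteError(version="0.0.1")
result = metric.score([1.0, 2.0], [1.0, 4.0])
print(result.format())                  # MAE = 1.00
print(metric.get_signature().format())  # version:0.0.1
```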

Pros and cons

There are some nice ideas here.

Pros:

  • I like the _SIGNATURE_TYPE member of a class.
  • I like the use of ABCs (abstract base classes).
  • I like get_signature.

Cons:

  • Not exactly composable the way we're envisioning.

How to design this?

To have a composable thing like this, each building block needs to take the output of the other building blocks.
When we compose pieces together, we want the resulting signature to represent that composition.

What do the building blocks need to do?

  • signature: Fundamentally each building block needs to store or generate on demand some sort of signature information, which describes the relevant name and settings, which can be used to construct the signature output.
  • Have some way to define what inputs/outputs it needs.
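One possible shape for these two requirements, as a sketch only: an abstract base class where each block carries signature information and declares its input and output types, so that a chain can be checked before it runs. `BuildingBlock`, `signature_info`, `check_chain`, and the toy `Scale` and `Describe` blocks are hypothetical names, and plain `float`/`str` stand in for real types like Pose.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BuildingBlock(ABC):
    """Hypothetical pipeline step: declares its types and describes itself."""

    name: str = "block"
    input_type: type = object
    output_type: type = object

    def __init__(self, **settings: Any):
        self.settings = settings

    @abstractmethod
    def __call__(self, value: Any) -> Any:
        ...

    def signature_info(self) -> Dict[str, Any]:
        # Name plus settings: the raw material for a signature string.
        return {"name": self.name, **self.settings}


def check_chain(blocks: List[BuildingBlock]) -> None:
    """Raise if a block's output type does not fit the next block's input type."""
    for first, second in zip(blocks, blocks[1:]):
        if not issubclass(first.output_type, second.input_type):
            raise TypeError(f"{first.name} -> {second.name}: incompatible types")


class Scale(BuildingBlock):
    name = "scale"
    input_type = float
    output_type = float

    def __call__(self, value: float) -> float:
        return value * self.settings.get("factor", 1.0)


class Describe(BuildingBlock):
    name = "describe"
    input_type = float
    output_type = str

    def __call__(self, value: float) -> str:
        return f"{value:.1f}"


scale, describe = Scale(factor=2.0), Describe()
check_chain([scale, describe])     # passes: float output feeds float input
print(describe(scale(1.5)))        # 3.0
print(scale.signature_info())      # {'name': 'scale', 'factor': 2.0}
```

Reversing the chain (`[describe, scale]`) would raise a `TypeError`, since a `str` output cannot feed a `float` input.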

What are some potential building blocks?

So first of all, here are some base pipeline building blocks, continuing the thoughts from #13

  • Pose (Pre-)Processors. They take in a Pose and apply some sort of transformation to it, with the output also being a Pose. Things like masking and padding, but also format changes like dropping keypoints. Theoretically Dynamic Time Warping counts.
  • Pose Conversion/Transformation: take the pose and output something that is NOT a pose. Embedding or encoding is an example.
  • Coordinate Pair Processor: Things that take two coordinates (x0, y0, [z0]) and (x1, y1, [z1]), and do something. Euclidean distance is an example.
  • Coordinate Pair Distance: Takes two coordinates and outputs a float.
  • Trajectory Pair Processor: takes two 1D arrays of Coordinates and does something. Mostly this means outputting a float distance, but it could theoretically include Dynamic Time Warping as well.
  • Trajectory Pair Distance: takes two 1D arrays of Coordinates and outputs a float.
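The coordinate-level and trajectory-level blocks above could be written down as plain type aliases, with one kind of block built from another. The sketch below is an assumption about how that might look: the alias names follow the list, while `euclidean`, `mean_pointwise`, and the choice of a per-frame mean (rather than, say, DTW) are illustrative only.

```python
import math
from typing import Callable, Sequence, Tuple

Coordinate = Tuple[float, ...]       # (x, y) or (x, y, z)
Trajectory = Sequence[Coordinate]    # one keypoint followed over time
CoordinatePairDistance = Callable[[Coordinate, Coordinate], float]
TrajectoryPairDistance = Callable[[Trajectory, Trajectory], float]


def euclidean(a: Coordinate, b: Coordinate) -> float:
    """Coordinate Pair Distance: straight-line distance between two points."""
    return math.dist(a, b)


def mean_pointwise(distance: CoordinatePairDistance) -> TrajectoryPairDistance:
    """Lift a coordinate distance to equal-length trajectories by averaging per frame."""

    def trajectory_distance(first: Trajectory, second: Trajectory) -> float:
        if len(first) != len(second) or len(first) == 0:
            raise ValueError("trajectories must be non-empty and of equal length")
        return sum(distance(a, b) for a, b in zip(first, second)) / len(first)

    return trajectory_distance


mean_euclidean = mean_pointwise(euclidean)
print(mean_euclidean([(0.0, 0.0), (1.0, 1.0)], [(3.0, 4.0), (1.0, 1.0)]))  # 2.5
```

Requiring equal lengths here is exactly the restriction that a Dynamic Time Warping block would lift.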
