Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: make FrechetDistance generic over Metric Spaces #1274

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

grevgeny
Copy link
Contributor

@grevgeny grevgeny commented Nov 21, 2024

  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

Part of #1181

@grevgeny
Copy link
Contributor Author

@michaelkirk What do you think about the following approach to FrechetDistance with new metric space setup for lines?

@grevgeny grevgeny marked this pull request as ready for review November 22, 2024 14:22
@lnicola
Copy link
Member

lnicola commented Dec 3, 2024

This is only a drive-by comment, but wouldn't it make sense to pass the metric as a value, e.g. g1.frechet_distance(&Euclidean, &g2)? I think some metrics can be parameterized. An example is Mahalanobis, which is somewhat like a skewed Euclidean, and takes a point and a covariance matrix.

@michaelkirk
Copy link
Member

Hi @grevgeny - this approach nicely extends frechet to other metric spaces. Thank you!

I have a preference for the API to be like this:

Euclidean::frechet_distance(&line_string_1, &line_string_2)

rather than

line_string.frechet_distance::<Euclidean>(&line_string_2)

They are equivalent at this point, but I have it in mind to one day make the metric spaces customizable (e.g. custom earth radius for Haversine) and I think that change will be a little cleaner if we approach it as suggested.

@lnicola
Copy link
Member

lnicola commented Dec 3, 2024

We could go directly for Euclidean.frechet_distance(&line_string_1, &line_string), then.

@michaelkirk
Copy link
Member

michaelkirk commented Dec 3, 2024

We could go directly for Euclidean.frechet_distance(&line_string_1, &line_string, then.

I'm not sure what you're talking about — could you be more complete in your thought?

If you're talking about (someday) customizing metric spaces, I was imaging something like:

struct Haversine { meters_radius: f64 };

impl Default for Haversine {
  fn default() -> Self {
    Self { meters_radius: 6_378_137 }
  }
}

trait FrechetDistance: Default , Distance<Point, Point> {
  fn frechet_distance(lhs, rhs) -> F {
    Self::default()::custom_frechet_distance(lhs, rhs)
  }
  fn custom_frechet_distance(&self, lhs: &LineString, rhs: &LineString) -> F {
     // more or less the same impl as it exists today
  }
}

// stay concise for the vast majority (I think?) of people who don't want to customize anything
Haversine:::frechet_distance(&line_string_1, &line_string)

// Offer a more verbose API for the 0.5% of users who want to customize the metric space
let mars_haversine = Haversine::with_radius(3_389_500);
mars_haversine.custom_frechet_distance(&line_string_1, &line_string)

But probably more thought is due.

@lnicola
Copy link
Member

lnicola commented Dec 3, 2024

Yeah, something like that. In my example, Euclidean is an empty struct, and there's a single trait method, which takes &self. I imagine we could also make constants for the more common parameterizations.

@michaelkirk
Copy link
Member

michaelkirk commented Dec 3, 2024

Euclidean isn't a very illuminating example, because I don't think you'd ever want to customize it, so let's talk about Haversine.

... maybe this would be better:

trait FrechetDistance: Distance<Point, Point> {
  fn frechet_distance(&self, lhs: &LineString, rhs: &LineString) -> F {
     // more or less the same impl as it exists today
  }
}

pub struct Haversine;
pub struct CustomHaversine { radius_meters: f64 };

impl Haversine {
  pub fn with_radius(radius_meters: f64) -> CustomHaversine {
    // Is it weird that this returns something other than `Self`?
    CustomHaversine { radius_meters }
  }
  pub fn wgs84() -> CustomHaversine {
    CustomHaversine { radius_meters: 6_378_137 }
  }
  pub fn earth_mean_radius() -> CustomHaversine {
    CustomHaversine { radius_meters: 6_371_000 }
  }
}

impl Distance<Point, Point> for CustomHaversine {
  fn to_radians(degrees: f64) -> f64 {
    degrees * PI / 180.0
  }
  fn distance(&self, lhs: &Point, rhs: &Point) -> F {
     let lat1 = Self::to_radians(lat1);
     let lon1 = Self::to_radians(lon1);
     let lat2 = Self::to_radians(lat2);
     let lon2 = Self::to_radians(lon2);

     let dlat = lat2 - lat1;
     let dlon = lon2 - lon1;

     let a = (dlat / 2.0).sin().powi(2) + lat1.cos() * lat2.cos() * (dlon / 2.0).sin().powi(2);
     let c = 2.0 * a.sqrt().atan2((1.0 - a).sqrt());

    // uses custom radius
     self.radius_meters * c  
  }
}

impl Distance<Point, Point> for Haversine {
 fn distance(&self, lhs: &Point, rhs: &Point) -> F {
    // assume Earth default radius
    CustomHaversine::wgs84().distance(lhs, rhs)
  }
}

// stays concise for the vast majority (I think?) of people who don't want to customize anything
Haversine.frechet_distance(&line_string_1, &line_string_2)

// Offer a slightly more verbose API for the 0.5% of users who want to customize the metric space
let mars_haversine = Haversine::with_radius(3_389_500);
mars_haversine.frechet_distance(&line_string_1, &line_string_2)

vs. my previous suggestion, this would avoid the weird distinction between frechet_distance and custom_frechet_distance. For something like Euclidean this distinction probably wouldn't even make sense.

Sorry to derail your PR review @grevgeny - it's kind of related though. 😅

@grevgeny grevgeny force-pushed the feat/frechet-in-various-spaces branch from f5dce87 to dbdafdf Compare January 7, 2025 16:29
@grevgeny
Copy link
Contributor Author

grevgeny commented Jan 7, 2025

@michaelkirk @lnicola Thanks for your review!

Should I then adjust my code based on your latest comment @michaelkirk ?

@michaelkirk
Copy link
Member

I'm not sure. I'm also looking for some design feedback on my proposal.

(/cc @lnicola too)

Do you think the approach I laid out above makes sense - I think it's what you were getting at @lnicola, but I wanted to use a non-euclidean example.

It would be a breaking change to move all the recently introduced LineMeasure implementations from (e.g.) Haversine::distance to Haversine.distance, but maybe worth it for the functionality of being able to customize.

@michaelkirk
Copy link
Member

Sorry to keep you in a holding pattern @grevgeny.

The reason I'm holding up the train is because I wanted to make some breaking changes to the line measures stuff to support customizable metric spaces.

Rather than release this, and then immediately break it, I'd prefer to make those API-breaking changes first, then adjust this PR.

I've just now opened the PR with those breaking changes here: #1298 - if/when that gets merged, I'd like to see this PR updated to look like:

Haversine.frechet_distance(line_string_1, line_string_2)
Euclidean.frechet_distance(line_string_1, line_string_2)
CustomHaversine::new(3_389_500.0).frechet_distance(line_string_1, line_string_2)

But maybe just hold off for now - I'm not sure how controversial #1298 will be. Feel free to weigh in over there.

@lnicola
Copy link
Member

lnicola commented Jan 11, 2025

Do you think the approach I laid out above makes sense - I think it's what you were getting at @lnicola, but I wanted to use a non-euclidean example.

Haversine.frechet_distance(&line_string_1, &line_string_2);
Haversine::with_radius(3_389_500).frechet_distance(&line_string_1, &line_string_2);

I.. haven't given this much thought, but the following feel lighter to me compared to having both Haversine and CustomHaversine:

HAVERSINE_EARTH.frechet_distance(&line_string_1, &line_string_2);
// or
Haversine::default().frechet_distance(&line_string_1, &line_string_2);
// or
Haversine::earth().frechet_distance(&line_string_1, &line_string_2);

Haversine::with_radius(3_389_500.0).frechet_distance(&line_string_1, &line_string_2);

@michaelkirk
Copy link
Member

michaelkirk commented Jan 13, 2025

I.. haven't given this much thought, but the following feel lighter to me compared to having both Haversine and CustomHaversine:

HAVERSINE_EARTH.frechet_distance(&line_string_1, &line_string_2);
// or
Haversine::default().frechet_distance(&line_string_1, &line_string_2);
// or
Haversine::earth().frechet_distance(&line_string_1, &line_string_2);

Haversine::with_radius(3_389_500.0).frechet_distance(&line_string_1, &line_string_2);

Yes, we can remove the distinction between Haversine and CustomHaversine, and say that every Haversine is customizable. This would be simpler because it removes a concept, which is generally good.

However, almost everyone wants the non-configurable Haversine, and furthermore most people would probably be slightly hindered rather than helped by the knowledge that Haversine even can be configured. That's why I thought it was worth having the second, mostly redundant concept.

Your example of capturing this in a constant e.g. HAVERSINE_EARTH is one approach, but for matters of taste I think:

// I prefer this
Haversine.frechet_distance(&line_string_1, &line_string_2);

// over this
HAVERSINE_EARTH.frechet_distance(&line_string_1, &line_string_2);

I prefer it because it prioritizes removing (a little bit) of friction from the much more common case.

@lnicola
Copy link
Member

lnicola commented Jan 13, 2025

I agree, that makes sense. How about HAVERSINE.frechet_distance(&line_string_1, &line_string_2);, then? Do you feel that the extra caps are worse than the Haversine / CustomHaversine split?

@michaelkirk
Copy link
Member

I'm... bordering on indifferent, which maybe means it's a good compromise! 😆

@michaelkirk
Copy link
Member

michaelkirk commented Jan 13, 2025

To be clear, I think this implies also:

HAVERSINE.distance(point_1, point_2)
GEODESIC.distance(point_1, point_2)
RHUMB.distance(point_1, point_2)

//  Euclidean is not customizable (right????), so it could just be:
Euclidean.distance(point_1, point_2)
// But maybe for consistency it should redundantly be defined as:
EUCLIDEAN.distance(point_1, point_2)

@lnicola
Copy link
Member

lnicola commented Jan 13, 2025

Euclidean doesn't need to be customizable, but I'd argue for EUCLIDEAN, for consistency. If we ever consider adding Mahalanobis (from my previous example), it could be argued it's more efficient to just transform the data. But the other three could, in principle, be customized.

While Metric / CustomMetric is a bit annoying to implement, it's not the end of the world either, and I guess we're unlikely to add a lot of other metrics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants