Skip to content

Conversation

@lmmx
Copy link
Contributor

@lmmx lmmx commented Mar 6, 2025

This one is ready for review @ion-elgreco ! 🎉

Overview

  • Float32 support is provided (no longer implicitly converted to Float64 before distance computation)
  • UInt64 behaviour is preserved (previously they were implicitly cast to Float64, now we do this clearly)

Background - Squashed branch carried over from #31 renamed to reflect that it provides initial float32 and uint64 support for the array distance methods, build and tests are passing

Status I'd be happy to merge this and submit the further method coverage in a fresh PR. I think what's left is:

  • list expressions / list.rs
  • elementwise str f32? / string.rs

I expect the rest will be less work now, but I also note that there are no tests for those parts so perhaps it might turn out to be more due to having to introduce testing before making the changes.

More details

  • The type casting behaviour was previously: input uint64/float32/float64 all get converted to float64 output dtype. Now the input types go through a condition which essentially says "if both columns of the distance computation are float32 then preserve the float32 dtype, otherwise a mix of float32/float64 or just float64 will still give float64 output dtype".
    • We would expect that to be faster as a result for float32 (TODO: check)
  • I thought it made more sense to move the distance_calc_uint_inp and distance_calc_numeric_inp functions out of the expressions module and into the array module as it is only used for array dtype inputs
  • these were then refactored into a single vector_distance_calc which gets reused for everything (I'd like to do some further testing to ensure this does indeed retain the same behaviour)
  • previously the array module was just being used for the cosine, euclidean, and Minkowski distances: it is now used for all of them as the new refactored structure requires the distance functions to all go in the same 'slot' (specifically it requires a function that's generic over f32/f64/uint64 array dtypes)

@lmmx

This comment was marked as resolved.

@lmmx
Copy link
Contributor Author

lmmx commented Mar 7, 2025

OK wonderful, that should do it :-) I added a little more detail to the opening PR comment

This PR now looks sound to me, but it'd be good to have test coverage that can confirm that the behaviour is indeed preserved from the original implementation, for all the array functions

edit - test coverage supplied in 8128ba9

compute_array_distance(
x,
y,
"bray_curtis",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could maybe use an enum for this in the future

@ion-elgreco
Copy link
Owner

@lmmx great work!

@ion-elgreco ion-elgreco merged commit 9c5d0c2 into ion-elgreco:main Mar 7, 2025
11 checks passed
@lmmx lmmx deleted the feat-f32-uint64-support branch March 7, 2025 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants