feat(array): `f32` and `uint64` support #34

lmmx · 2025-03-06T18:59:45Z

This one is ready for review @ion-elgreco ! 🎉

Overview

- Float32 is now supported natively (inputs are no longer implicitly converted to Float64 before the distance computation)
- UInt64 behaviour is preserved (inputs were previously cast to Float64 implicitly; the cast is now explicit)

Background: this is the squashed branch carried over from #31, renamed to reflect that it provides initial float32 and uint64 support for the array distance methods. Build and tests are passing.

Status: I'd be happy to merge this and submit the remaining method coverage in a fresh PR. I think what's left is:

- list expressions (list.rs)
- elementwise str f32? (string.rs)

I expect the rest will be less work now. That said, those parts have no tests, so it may turn out to be more work if testing has to be introduced before making the changes.

More details

The type casting behaviour was previously: uint64, float32 and float64 inputs were all converted to a float64 output dtype. Now the input types go through a condition which essentially says "if both columns of the distance computation are float32 then preserve the float32 dtype; otherwise (a mix of float32/float64, or just float64) the output dtype is still float64".
- We would expect float32 to be faster as a result (TODO: check)
- I thought it made more sense to move the distance_calc_uint_inp and distance_calc_numeric_inp functions out of the expressions module and into the array module, as they are only used for array dtype inputs
- These were then refactored into a single vector_distance_calc, which gets reused for everything (I'd like to do some further testing to ensure this does indeed retain the same behaviour)
- Previously the array module was only used for the cosine, Euclidean, and Minkowski distances. It is now used for all of them, as the refactored structure requires the distance functions to all go in the same 'slot' (specifically, it requires a function that's generic over f32/f64/uint64 array dtypes)
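To make the dtype rule and the "generic slot" idea concrete, here is a minimal sketch in plain Rust. It does not use the polars API and is not the code from this PR: `Dtype`, `output_dtype`, `Real`, `u64_to_f64` and `euclidean` are hypothetical names chosen for illustration, and the small `Real` trait stands in for whatever float abstraction the crate actually uses.

```rust
/// Hypothetical stand-in for the column dtypes discussed in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Dtype {
    Float32,
    Float64,
    UInt64,
}

/// Output dtype rule as described above: Float32 is preserved only when
/// both inputs are Float32; every other combination yields Float64.
pub fn output_dtype(left: Dtype, right: Dtype) -> Dtype {
    match (left, right) {
        (Dtype::Float32, Dtype::Float32) => Dtype::Float32,
        _ => Dtype::Float64,
    }
}

/// Minimal float abstraction (an assumption for this sketch) so that a
/// single distance function can serve both f32 and f64 inputs.
pub trait Real:
    Copy
    + std::ops::Add<Output = Self>
    + std::ops::Sub<Output = Self>
    + std::ops::Mul<Output = Self>
{
    fn zero() -> Self;
    fn sqrt(self) -> Self;
}

impl Real for f32 {
    fn zero() -> Self { 0.0 }
    fn sqrt(self) -> Self { f32::sqrt(self) }
}

impl Real for f64 {
    fn zero() -> Self { 0.0 }
    fn sqrt(self) -> Self { f64::sqrt(self) }
}

/// UInt64 inputs are cast to f64 explicitly before the computation.
pub fn u64_to_f64(values: &[u64]) -> Vec<f64> {
    values.iter().map(|&v| v as f64).collect()
}

/// Euclidean distance, generic over the float type, so f32 inputs stay f32.
pub fn euclidean<T: Real>(a: &[T], b: &[T]) -> T {
    a.iter()
        .zip(b)
        .fold(T::zero(), |acc, (&x, &y)| {
            let d = x - y;
            acc + d * d
        })
        .sqrt()
}
```

With this shape, `euclidean(&[0.0f32, 0.0], &[3.0f32, 4.0])` returns an `f32`, while two uint64 columns would first go through `u64_to_f64` and produce an `f64` result, matching the rule encoded in `output_dtype`.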

lmmx · 2025-03-07T15:16:50Z

OK wonderful, that should do it :-) I added a little more detail to the opening PR comment

This PR now looks sound to me, but it'd be good to have test coverage confirming that the behaviour of the original implementation is preserved for all the array functions

edit - test coverage supplied in 8128ba9

ion-elgreco · 2025-03-07T15:39:13Z

polars_distance/src/expressions.rs

+    compute_array_distance(
+        x, 
+        y, 
+        "bray_curtis",


Could maybe use an enum for this in the future
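A rough sketch of what that enum could look like, in plain Rust. None of this is in the PR: `Metric`, `compute` and the two distance helpers are hypothetical, and the real `compute_array_distance` signature is not shown in the snippet above, so this only illustrates replacing the string argument with a typed one.

```rust
use std::str::FromStr;

/// Hypothetical metric enum replacing string names such as "bray_curtis".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Metric {
    BrayCurtis,
    Euclidean,
}

impl FromStr for Metric {
    type Err = String;

    /// Parse the existing string names so callers can migrate gradually.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "bray_curtis" => Ok(Metric::BrayCurtis),
            "euclidean" => Ok(Metric::Euclidean),
            other => Err(format!("unknown metric: {other}")),
        }
    }
}

/// Bray-Curtis dissimilarity: sum(|x - y|) / sum(|x + y|).
fn bray_curtis(a: &[f64], b: &[f64]) -> f64 {
    let num: f64 = a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum();
    let den: f64 = a.iter().zip(b).map(|(x, y)| (x + y).abs()).sum();
    num / den
}

/// Euclidean distance: sqrt(sum((x - y)^2)).
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f64>()
        .sqrt()
}

/// Dispatch on the enum; the compiler checks that every variant is handled,
/// which a string match cannot guarantee.
pub fn compute(metric: Metric, a: &[f64], b: &[f64]) -> f64 {
    match metric {
        Metric::BrayCurtis => bray_curtis(a, b),
        Metric::Euclidean => euclidean(a, b),
    }
}
```

The main benefit over a string is that a typo in the metric name becomes a compile error at the call site, rather than a runtime failure in the match.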

ion-elgreco · 2025-03-07T15:39:40Z

@lmmx great work!

feat: float32 and uint64 support for array functions

e61fc46

This was referenced Mar 6, 2025

Float64 faster than Float32 #17

Closed

feat: support float32 datatype #31

Closed

ion-elgreco reviewed Mar 6, 2025

polars_distance/src/expressions.rs (outdated, resolved)

polars_distance/src/expressions.rs (outdated, resolved)

lmmx added 2 commits March 6, 2025 21:00

refactor(expressions): simplify type inference and casting in distance functions

ab3a043

fix: lifetimes corrected

4dfee8d

This comment was marked as resolved.

lmmx added 9 commits March 7, 2025 00:14

fix: correct minkowski

c3b42e0

fix(minkowski_dist): missed Float trait

23c4734

refactor: rely on array crate imports

862c19f

fix: add back more imports

cd20059

fix: array crate imports

848aadf

chore: delete dead code (superseded by vector_distance_calc)

621268e

chore: drop now-unused polars_arrow::array::Array import

ea50dee

refactor(distances): use vector_distance_calc function for euclidean and cosine dists

c0b82a3

revert(distances): restore accelerated distance metric functions in array crate

c52db03

test(array): cover float32 and uint64 variants

8128ba9

ion-elgreco reviewed Mar 7, 2025

ion-elgreco approved these changes Mar 7, 2025

ion-elgreco merged commit 9c5d0c2 into ion-elgreco:main Mar 7, 2025
11 checks passed

lmmx deleted the feat-f32-uint64-support branch March 7, 2025 17:07

This was referenced Mar 18, 2025

[Feature Request] float32 support for list distances #35

Open

[Feature Request] float32 support for string distances #36

Open
