Similarity trait Rust crate: compare input values, such as two or more items, then return an output value, such as a measure of similarity, correlation, or overlap.

SixArm/similarity-trait-rust-crate

Similarity trait Rust crate

documentation | source | llms.txt | crate | email

The Similarity trait defines one function with one input and one output, so you can compare any kind of input value and return any kind of output value.

We use this trait in our programs to create multiple kinds of similarity functionality, for example to try out several similarity algorithms that share the same input type and the same output type.
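To illustrate trying several algorithms behind the same input and output types, here is a minimal sketch. It declares a local stand-in trait whose shape is inferred from the examples in this README (an assumption, not copied from the crate source). The names `AbsoluteDifference`, `RelativeDifference`, and `compare` are hypothetical and exist only for this illustration.

```rust
/// Local stand-in for the crate's trait, with the shape implied by the
/// examples in this README (an assumption, not the crate's own source).
pub trait SimilarityIO<Input, Output> {
    fn similarity(input: Input) -> Output;
}

/// Hypothetical algorithm 1: absolute difference of two numbers.
pub struct AbsoluteDifference;

impl SimilarityIO<(f64, f64), f64> for AbsoluteDifference {
    fn similarity(input: (f64, f64)) -> f64 {
        (input.1 - input.0).abs()
    }
}

/// Hypothetical algorithm 2: absolute difference scaled by the larger magnitude.
pub struct RelativeDifference;

impl SimilarityIO<(f64, f64), f64> for RelativeDifference {
    fn similarity(input: (f64, f64)) -> f64 {
        let scale = input.0.abs().max(input.1.abs());
        if scale == 0.0 {
            0.0 // both inputs are zero, so there is no difference
        } else {
            (input.1 - input.0).abs() / scale
        }
    }
}

/// Generic caller: accepts any algorithm with the same input and output types.
pub fn compare<S: SimilarityIO<(f64, f64), f64>>(a: f64, b: f64) -> f64 {
    S::similarity((a, b))
}
```

With this shape, switching algorithms is a one-token change at the call site, such as `compare::<AbsoluteDifference>(100.0, 120.0)` versus `compare::<RelativeDifference>(100.0, 120.0)`.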

For examples, please see the directory examples.

Similarity of a pair

One way to use this trait is to calculate the similarity of a pair of values, such as two numbers, or two strings, or two images.

This is sometimes known as pairwise similarity or pair matching.

Example: given two numbers, then return the absolute difference.

use similarity_trait::*;
struct MyStruct;

impl SimilarityIO<(i32, i32), Option<i32>> for MyStruct {
    fn similarity(input: (i32, i32)) -> Option<i32> {
        // checked_abs avoids overflow when the difference is i32::MIN
        input.1.checked_sub(input.0)?.checked_abs()
    }
}

let absolute_difference = MyStruct::similarity((100, 120));
assert_eq!(absolute_difference, Some(20));
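The same pairwise shape can express other measures, such as percent change. The sketch below is one possible version and makes several assumptions: it uses a local stand-in trait (shape inferred from the examples in this README), a hypothetical `PercentChange` struct, `f64` values, and `None` when the starting value is zero.

```rust
/// Local stand-in for the crate's trait (assumed shape, not the crate source).
pub trait SimilarityIO<Input, Output> {
    fn similarity(input: Input) -> Output;
}

/// Hypothetical struct for this illustration.
pub struct PercentChange;

impl SimilarityIO<(f64, f64), Option<f64>> for PercentChange {
    /// Percent change from the first value to the second.
    /// Returns None when the first value is zero, because the ratio is undefined.
    fn similarity(input: (f64, f64)) -> Option<f64> {
        let (old, new) = input;
        if old == 0.0 {
            return None;
        }
        Some((new - old) * 100.0 / old.abs())
    }
}
```

Under these assumptions, `PercentChange::similarity((100.0, 120.0))` yields a value of about 20 percent, and a starting value of zero yields `None`.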

Similarity of a collection

Another way to use this trait is to calculate the similarity of a collection of values, such as an array of numbers, a vector of strings, or a set of images.

This is sometimes called intra-group similarity or statistical correlation.

Example: given numbers, then return the population standard deviation.

use similarity_trait::SimilarityIO;
struct MyStruct;

impl SimilarityIO<&Vec<f64>, Option<f64>> for MyStruct {
    /// Similarity of numbers via population standard deviation
    fn similarity(numbers: &Vec<f64>) -> Option<f64> {
        if numbers.is_empty() { return None }
        let mean = numbers.iter().sum::<f64>() / numbers.len() as f64;
        let variance = numbers.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / numbers.len() as f64;
        Some(variance.sqrt())
    }
}

let numbers = vec![2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0];
let population_standard_deviation = MyStruct::similarity(&numbers).expect("similarity");
assert!(population_standard_deviation > 1.999 && population_standard_deviation < 2.001);

Similarity of a pair or a collection

You can choose whether to calculate the similarity of a pair (such as two strings) or of a collection (such as a vector of strings).

Example: given a pair of strings, then return the Hamming distance.

use similarity_trait::SimilarityIO;
struct MyStruct;

impl SimilarityIO<(&str, &str), usize> for MyStruct {
    /// Similarity of a pair of strings via Hamming distance.
    fn similarity(pair: (&str, &str)) -> usize {
        pair.0.chars().zip(pair.1.chars()).filter(|(c1, c2)| c1 != c2).count()
    }
}

let pair = ("information", "informatics");
let hamming_distance = MyStruct::similarity(pair);
assert_eq!(hamming_distance, 2);

Example: given a collection of strings, then return the maximum Hamming distance.

use similarity_trait::SimilarityIO;
struct MyStruct;

impl SimilarityIO<Vec<&str>, usize> for MyStruct {
    /// Similarity of a collection of strings via maximum Hamming distance.
    fn similarity(collection: Vec<&str>) -> usize {
        let mut max = 0;
        for i in 0..collection.len() {
            for j in (i + 1)..collection.len() {
                max = std::cmp::max(max, collection[i].chars().zip(collection[j].chars()).filter(|(c1, c2)| c1 != c2).count())
            }
        }
        max
    }
}

let collection = vec!["information", "informatics", "affirmation"];
let maximum_hamming_distance = MyStruct::similarity(collection);
assert_eq!(maximum_hamming_distance, 5);
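Because the trait is generic over its input type, a single struct can plausibly implement both the pair form and the collection form. The sketch below is one way to do that, using a local stand-in trait (shape inferred from the examples above) and a hypothetical `Hamming` struct. As a further assumption, it returns `None` when strings differ in character count, since Hamming distance is conventionally defined for equal-length strings. This differs from the examples above, which compare only the overlapping prefix.

```rust
/// Local stand-in for the crate's trait (assumed shape, not the crate source).
pub trait SimilarityIO<Input, Output> {
    fn similarity(input: Input) -> Output;
}

/// Hypothetical struct that implements both the pair and collection forms.
pub struct Hamming;

/// Pair form: Hamming distance, or None when the character counts differ.
impl<'a> SimilarityIO<(&'a str, &'a str), Option<usize>> for Hamming {
    fn similarity(pair: (&'a str, &'a str)) -> Option<usize> {
        if pair.0.chars().count() != pair.1.chars().count() {
            return None;
        }
        Some(pair.0.chars().zip(pair.1.chars()).filter(|(a, b)| a != b).count())
    }
}

/// Collection form: maximum pairwise distance, reusing the pair form.
/// Returns None if any two strings differ in character count.
impl<'a> SimilarityIO<&'a [&'a str], Option<usize>> for Hamming {
    fn similarity(collection: &'a [&'a str]) -> Option<usize> {
        let mut max = 0usize;
        for (i, a) in collection.iter().enumerate() {
            for b in &collection[i + 1..] {
                // Fully qualified call selects the pair implementation explicitly.
                let distance =
                    <Hamming as SimilarityIO<(&str, &str), Option<usize>>>::similarity((*a, *b))?;
                max = max.max(distance);
            }
        }
        Some(max)
    }
}
```

Passing a slice such as `&words[..]` rather than `&words` keeps the argument type an exact match for the collection implementation, because no automatic coercion happens while the input type is still being inferred.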

How to learn more

Wikipedia links:

Similarity research papers about patient matching:
