@stephaniereinders stephaniereinders commented Mar 27, 2025

Update functions to accept an optional second data frame of writer profiles. This lets users train and run models that compare handwriting samples from short and long writing prompts.

…aframes

Distance functions now accept an optional second dataframe parameter. When it is provided, the functions calculate distances between every pair formed from one row of the first dataframe and one row of the second, rather than between pairs of rows within a single dataframe. The primary use case is to calculate distances between pairs of handwriting samples where one sample is short and the other is long, without also calculating the distances between two long samples or two short samples.
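The cross-dataframe pairing described above can be sketched in base R as follows. This is only an illustration and not the package's implementation: `cross_abs_distances()`, the `docname` and `cluster1`/`cluster2` column names, and the toy values are all hypothetical, and a sum of absolute differences stands in for whichever distance measures the package actually computes.

```r
# Hypothetical helper (not part of handwriterRF): distance between every pair
# made of one row from df1 and one row from df2.
cross_abs_distances <- function(df1, df2, cols) {
  # One row per (i, j) pair: i indexes rows of df1, j indexes rows of df2.
  pairs <- expand.grid(i = seq_len(nrow(df1)), j = seq_len(nrow(df2)))
  a <- as.matrix(df1[pairs$i, cols, drop = FALSE])
  b <- as.matrix(df2[pairs$j, cols, drop = FALSE])
  data.frame(
    docname1 = df1$docname[pairs$i],
    docname2 = df2$docname[pairs$j],
    # Sum of absolute differences across the chosen numeric columns.
    abs_dist = unname(rowSums(abs(a - b))),
    stringsAsFactors = FALSE
  )
}

# Toy writer profiles (made-up values): two short samples, three long samples.
short <- data.frame(docname = c("s1", "s2"),
                    cluster1 = c(0.2, 0.5), cluster2 = c(0.8, 0.5))
long <- data.frame(docname = c("l1", "l2", "l3"),
                   cluster1 = c(0.1, 0.4, 0.9), cluster2 = c(0.9, 0.6, 0.1))

d <- cross_abs_distances(short, long, c("cluster1", "cluster2"))
nrow(d)  # 2 * 3 = 6 short-long pairs
```

Because every pair takes its first member from `short` and its second from `long`, no short-short or long-long pairs are produced, which is the behaviour the commit message describes.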
Fixed an issue where the ntrees parameter was defined but not used in the train_rf() function. The function was previously using a hardcoded number of trees regardless of the parameter value passed by the user.
…een two dataframes

The train_rf() function now accepts an optional second dataframe parameter. When it is provided, train_rf() calculates distances between every pair formed from one row of the first dataframe and one row of the second, rather than between pairs of rows within a single dataframe, and then trains a random forest on those distances. The primary use case is to train a random forest on distances between pairs of handwriting samples where one sample is short and the other is long, without also calculating the distances between two long samples or two short samples.
The get_ref_scores() function now accepts an optional second dataframe parameter. When it is provided, the function calculates similarity scores between every pair formed from one row of the first dataframe and one row of the second, rather than between pairs of rows within a single dataframe. The primary use case is to calculate scores between pairs of handwriting samples where one sample is short and the other is long, without also calculating the scores between two long samples or two short samples.
…dataframes

compare_writer_profiles() now accepts an optional second dataframe parameter. When provided, the function compares all pairs of writer profiles from the first and second dataframes, rather than within a single dataframe. The primary use case for this is to compare handwriting samples where one sample is short and the other is long, without also comparing two long samples or two short samples.
More tests need to be performed before a random forest for short vs long comparisons is added to the handwriterRF package. The short_vs_long.R script is now in the handwriterRF_short_vs_long repo for those tests.
Merge branch 'main' into 37-short-vs-long

# Conflicts:
#	tests/testthat/test-compare.R
@stephaniereinders stephaniereinders merged commit be0d812 into main Mar 27, 2025
1 of 6 checks passed
@stephaniereinders stephaniereinders deleted the 37-short-vs-long branch March 27, 2025 13:31
Development

Successfully merging this pull request may close these issues.

Does training a random forest on short vs long samples improve the results on short vs long comparisons?
