
@mariannaparzych

This ticket makes the CCT metric insensitive to differences in whitespace (including newlines).
All whitespace in a string is changed to a single space `" "` in both the ground truth (GT) and the prediction (PRED) before the metric is computed.

The additional changes in CHANGELOG are due to auto-formatting.
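A minimal Python sketch of the normalization step, assuming that each run of whitespace collapses to one space and that leading and trailing whitespace is dropped (the description does not state either detail explicitly). `normalize_whitespace` and `similarity` are hypothetical names, and `difflib.SequenceMatcher.ratio()` is only a stand-in score, not the CCT implementation from the repository.

```python
import difflib


def normalize_whitespace(text: str) -> str:
    """Hypothetical helper: collapse every run of whitespace to one space.

    str.split() with no arguments splits on any whitespace (spaces, tabs,
    carriage returns, newlines, ...) and discards leading/trailing
    whitespace, so joining the pieces with " " gives single-spaced text.
    """
    return " ".join(text.split())


def similarity(gt: str, pred: str) -> float:
    """Stand-in score (NOT the repository's CCT metric): difflib ratio in [0, 1]."""
    return difflib.SequenceMatcher(None, gt, pred).ratio()


gt = "The dog loved the cat, but the cat loved the cow"
pred = "The dog\rloved the cat, but\t\n the cat\tloved the\n cow"

# Without normalization, whitespace differences alone lower the score.
print(similarity(gt, pred) < 1.0)  # True
# After normalizing both GT and PRED, the two texts are identical.
print(similarity(normalize_whitespace(gt), normalize_whitespace(pred)))  # 1.0
```

Normalizing both sides, rather than only the prediction, keeps the comparison symmetric: stray tabs or newlines in the ground truth are treated the same way as those in the prediction.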


@pawel-kmiecik left a comment


LGTM

```python
("text", "expected"),
[
    (
        "The dog\rloved the cat, but\t\n the cat\tloved the\n cow",
```

I'd add a case where some whitespace is at the beginning and end of the string.
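A sketch of the suggested extra cases, written as plain asserts over `(text, expected)` pairs rather than with `pytest.mark.parametrize`, so it runs without dependencies. `normalize_whitespace` is a hypothetical helper that assumes whitespace runs collapse to one space and the ends are stripped; the expected strings follow from that assumption, not from the merged test.

```python
def normalize_whitespace(text: str) -> str:
    # Hypothetical helper: split on any whitespace run (dropping leading and
    # trailing whitespace), then rejoin the words with single spaces.
    return " ".join(text.split())


# (text, expected) pairs, in the same shape as the parametrized test hunk.
CASES = [
    (
        "The dog\rloved the cat, but\t\n the cat\tloved the\n cow",
        "The dog loved the cat, but the cat loved the cow",
    ),
    # Suggested case: whitespace at the beginning and end of the string.
    ("\n\t  The dog loved the cat \r\n", "The dog loved the cat"),
    # Edge case: a string made only of whitespace normalizes to "".
    ("   ", ""),
]


def check_cases() -> None:
    for text, expected in CASES:
        assert normalize_whitespace(text) == expected, (text, expected)


check_cases()
print("all cases passed")
```

Under a variant that only replaces runs (for example `re.sub(r"\s+", " ", text)` without stripping), the second case would keep a leading and trailing space, which is exactly the behavior this extra case would pin down.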

@mariannaparzych added this pull request to the merge queue on Oct 24, 2024
Merged via the queue into main with commit aa5935b Oct 24, 2024
41 checks passed
@mariannaparzych deleted the ml_384/whitespaces_in_cct branch on October 24, 2024 at 13:40
temp-adelyn pushed a commit to temp-adelyn/unstructured that referenced this pull request Mar 3, 2025