Commit 6d914cb

Merge pull request #215 from TomSteer1/main
Fix CER calculation in chapter 5
2 parents 7789171 + 7cd2c94 commit 6d914cb

1 file changed, +5 -5 lines changed

chapters/en/chapter5/evaluation.mdx

Lines changed: 5 additions & 5 deletions
@@ -116,22 +116,22 @@ individual characters, and annotate errors on a character-by-character basis:
 | Reference: | t | h | e | | c | a | t | | s | a | t | | o | n | | t | h | e | | m | a | t |
 |-------------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
 | Prediction: | t | h | e | | c | a | t | | s | **i** | t | | o | n | | t | h | e | | | | |
-| Label: | ✅ | ✅ | ✅ | | ✅ | ✅ | ✅ | | ✅ | S | ✅ | | ✅ | ✅ | | ✅ | ✅ | ✅ | | D | D | D |
+| Label: | ✅ | ✅ | ✅ | | ✅ | ✅ | ✅ | | ✅ | S | ✅ | | ✅ | ✅ | | ✅ | ✅ | ✅ | D | D | D | D |
 
 We can see now that for the word "sit", the "s" and "t" are marked as correct. It's only the "i" which is labelled as a
 substitution error (S). Thus, we reward our system for the partially correct prediction 🤝
 
-In our example, we have 1 character substitution, 0 insertions, and 3 deletions. In total, we have 14 characters. So, our CER is:
+In our example, we have 1 character substitution, 0 insertions, and 4 deletions. In total, we have 22 characters. So, our CER is:
 
 $$
 \begin{aligned}
 CER &= \frac{S + I + D}{N} \\
-&= \frac{1 + 0 + 3}{17} \\
-&= 0.235
+&= \frac{1 + 0 + 4}{22} \\
+&= 0.227
 \end{aligned}
 $$
 
-Right! We have a CER of 0.235, or 23.5%. Notice how this is lower than our WER - we penalised the spelling error much less.
+Right! We have a CER of 0.227, or 22.7%. Notice how this is lower than our WER - we penalised the spelling error much less.
 
 ## Which metric should I use?
 
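The corrected arithmetic in this hunk can be double-checked with a minimum-edit-distance (Levenshtein) alignment. This is a from-scratch sketch, not the course's own code and not any library's API: `edit_ops` and `error_rate` are hypothetical helper names, every character (spaces included) is assumed to count as one token for CER, and substitutions, insertions and deletions are all assumed to cost 1.

```python
from typing import List, Sequence, Tuple


def edit_ops(reference: Sequence[str], prediction: Sequence[str]) -> Tuple[int, int, int]:
    """Hypothetical helper: count (substitutions, insertions, deletions)
    along one minimum-cost alignment, assuming unit edit costs."""
    n, m = len(reference), len(prediction)
    # dist[i][j] = minimum edits to turn reference[:i] into prediction[:j]
    dist: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every reference token
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every predicted token
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == prediction[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )

    # Backtrace from the bottom-right corner, preferring the diagonal
    # (match/substitution), then deletion, then insertion.
    subs = ins = dels = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if reference[i - 1] == prediction[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                subs += cost
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, ins, dels


def error_rate(reference: Sequence[str], prediction: Sequence[str]) -> float:
    """(S + I + D) / N, where N is the number of reference tokens."""
    s, i, d = edit_ops(reference, prediction)
    return (s + i + d) / len(reference)


reference = "the cat sat on the mat"
prediction = "the cat sit on the"

# CER: every character, including spaces, is one token (N = 22).
print(edit_ops(list(reference), list(prediction)))                  # (1, 0, 4)
print(round(error_rate(list(reference), list(prediction)), 3))      # 0.227

# WER: split on whitespace, so N = 6 words.
print(edit_ops(reference.split(), prediction.split()))              # (1, 0, 1)
print(round(error_rate(reference.split(), prediction.split()), 3))  # 0.333
```

Because the prediction is 4 characters shorter than the reference and the reference contains no "i" to match, every minimum-cost alignment here has exactly 1 substitution, 0 insertions and 4 deletions, giving 5 / 22 ≈ 0.227. Splitting on whitespace instead yields 1 substitution and 1 deletion over 6 words, 2 / 6 ≈ 0.333, which is higher than the CER.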
