docs/baseline.md: 1 addition & 1 deletion
@@ -65,7 +65,7 @@ Below is the loss on the training set. One can observe that the multi-task model
This is not surprising as they contain two orders of magnitude more datapoints and pose a significant challenge for the relatively small models used in this analysis. This favors the Single dataset setup (which uses a model of the same size) and we conjecture larger models to bridge this gap moving forward.
-||| CE or BCE loss in single-task $\downarrow$ | CE or BCE loss in multi-task $\downarrow$ |
+||| CE or MSE loss in single-task $\downarrow$ | CE or MSE loss in multi-task $\downarrow$ |