Here we document the main changes of the various models.
The indicated inference speed was calculated by averaging 100 inferences (within one invocation) on an AMD Ryzen 9 7950X 16-Core Processor CPU.
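The measurement methodology above (averaging many inferences within one invocation, so model-loading cost is excluded) can be sketched as follows; `run_inference` is a hypothetical stand-in for the actual model call, not part of any real API:

```python
import time

def run_inference(data: bytes) -> str:
    # Hypothetical stand-in for a single model inference.
    return "text/plain"

def average_inference_ms(data: bytes, n: int = 100) -> float:
    # Time n back-to-back inferences within one invocation and
    # report the mean latency in milliseconds.
    start = time.perf_counter()
    for _ in range(n):
        run_inference(data)
    return (time.perf_counter() - start) / n * 1000.0
```

Averaging within a single invocation means one-time costs (loading the model, warming caches) are amortized away, so the number reflects steady-state per-inference latency.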
## `standard_v3_2`

- 216 possible tool outputs, ~99% average accuracy, ~2ms inference speed.
- Difference with respect to `standard_v3_1`: trained on a new (synthetic) dataset of CSV files to address a regression with CSV files (#983); model selection now uses minimal test loss instead of other heuristics.
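The model-selection change mentioned above (picking the checkpoint with the minimal test loss) amounts to a simple argmin over candidate checkpoints. A minimal sketch, with invented checkpoint names and loss values for illustration:

```python
# Hypothetical (checkpoint name, test loss) pairs from a training run.
checkpoints = [
    ("epoch_01", 0.052),
    ("epoch_02", 0.041),
    ("epoch_03", 0.044),
]

# Select the checkpoint with the minimal test loss,
# rather than relying on other heuristics.
best_name, best_loss = min(checkpoints, key=lambda pair: pair[1])
```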
## `standard_v3_1`

- 216 possible tool outputs.
- Overall same average accuracy as `standard_v3_0`, ~99%, but more robust detection of short textual inputs and improved detection of JavaScript.
- Inference speed: ~2ms (similar to `standard_v3_0`).
- Augmentation techniques used during training: CutMix, which was used for `v1` but not for `v2_1`; and "Random Snippet Selection", with which we train the model on random snippets extracted from samples in our dataset (this is only enabled for key textual content types).
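As an illustration only (this is not the actual training code), the two augmentations can be sketched on raw byte sequences: CutMix splices a contiguous region from one sample into another, and random snippet selection trains on a random window of a sample instead of its prefix:

```python
import random

def cutmix(a: bytes, b: bytes, rng: random.Random) -> bytes:
    # Replace a contiguous region of sample `a` with the aligned
    # region from sample `b`. (Full CutMix also mixes the labels in
    # proportion to the replaced region; omitted in this sketch.)
    n = min(len(a), len(b))
    start = rng.randrange(n)
    end = rng.randrange(start, n) + 1
    return a[:start] + b[start:end] + a[end:]

def random_snippet(sample: bytes, snippet_len: int, rng: random.Random) -> bytes:
    # Extract a random fixed-length window from the sample, so the
    # model does not only see the beginning of each file.
    if len(sample) <= snippet_len:
        return sample
    start = rng.randrange(len(sample) - snippet_len + 1)
    return sample[start : start + snippet_len]
```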
## `standard_v3_0`

- 216 possible tool outputs.
- Overall same average accuracy as `standard_v2_1`, ~99%.
- Inference speed: ~2ms (~3x faster than `standard_v2_1`, ~20% faster than `standard_v1`).
## `standard_v2_1`

- Support for 200+ content types, almost double what is supported in `standard_v1`.
- Overall average accuracy of ~99%.
- Inference speed: ~6.2ms, which is slower than `standard_v1`; see `fast_v2_1` if you need something faster (at the price of lower accuracy).
## `fast_v2_1`

- Similar to `standard_v2_1`, but significantly faster (about 4x faster).
- Overall average accuracy of ~98.5%.
## `standard_v1`
- Initial release.
- Support for about 100 content types.
- Average accuracy 99%+.
- Inference speed: ~2.6ms.