Hi all -
I was wondering if you could publish the fixed per-language toxicity score thresholds used by the ToxicityBinaryClassifierFilter in the Apertus pipeline.
The Apertus paper (Section 3.1.3) states: "We filter the 5% of documents per language with the highest predicted toxicity scores from the pretraining corpus." The ToxicityBinaryClassifierFilter class in this repo accepts a threshold parameter that corresponds to the 95th percentile pre-computed on the full FineWeb/FineWeb-2 corpus.
However, I couldn't find the actual threshold values published anywhere.
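For context, here is a minimal sketch of how I understand such thresholds would be derived and applied. It assumes a nearest-rank 95th percentile computed separately for each language, and that documents scoring strictly above their language's threshold are dropped. The function names (`per_language_thresholds`, `keep_document`) are hypothetical and are not the `ToxicityBinaryClassifierFilter` API. The pipeline may also use a different percentile interpolation or a different tie rule.

```python
import math
from collections import defaultdict


def percentile_nearest_rank(values, q):
    """Nearest-rank percentile: the smallest value v such that at least q% of values are <= v.

    Assumption: the real pipeline may use a different interpolation method.
    """
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def per_language_thresholds(docs, q=95.0):
    """Hypothetical helper: map each language to the q-th percentile of its toxicity scores.

    `docs` is an iterable of (language, toxicity_score) pairs.
    """
    scores = defaultdict(list)
    for lang, score in docs:
        scores[lang].append(score)
    return {lang: percentile_nearest_rank(s, q) for lang, s in scores.items()}


def keep_document(lang, score, thresholds):
    """Hypothetical filter rule: drop documents scoring strictly above their language's threshold."""
    return score <= thresholds[lang]
```

With thresholds fixed per language in this way, roughly the top 5% of documents in each language would be removed, which is why the concrete per-language values are needed to reproduce the filtering.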
Thanks in advance and great work on this project!