Publish per-language toxicity thresholds for reproducibility

Hi all,
Could you publish the fixed per-language toxicity score thresholds used by the [`ToxicityBinaryClassifierFilter`](https://github.com/swiss-ai/pretrain-data/blob/8af990b9401101cf95acd02b066ed0c449789126/src/data_pipeline_pretrain/pipeline/filters/toxic_filter.py#L157) in the Apertus pipeline?

The Apertus paper (Section 3.1.3) states: "We filter the 5% of documents per language with the highest predicted toxicity scores from the pretraining corpus." The `ToxicityBinaryClassifierFilter` class in this repo accepts a threshold parameter that corresponds to the 95th percentile pre-computed on the full FineWeb/FineWeb-2 corpus.
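To make the request concrete, here is a minimal sketch of how I understand such thresholds could be derived. It is not the pipeline's actual code: the function names `per_language_thresholds` and `keep_document`, the nearest-rank percentile definition, and the "drop anything scoring above the threshold" rule are all my assumptions.

```python
from collections import defaultdict


def per_language_thresholds(scored_docs, percentile=95):
    """Map each language to the score at the given percentile (nearest-rank).

    `scored_docs` is an iterable of (language, toxicity_score) pairs.
    Hypothetical helper, not part of the Apertus pipeline.
    """
    scores_by_lang = defaultdict(list)
    for lang, score in scored_docs:
        scores_by_lang[lang].append(score)

    thresholds = {}
    for lang, scores in scores_by_lang.items():
        scores.sort()
        # Nearest-rank: smallest rank covering `percentile` percent of the
        # documents, computed with integer arithmetic to avoid float rounding.
        rank = max(1, (percentile * len(scores) + 99) // 100)
        thresholds[lang] = scores[rank - 1]
    return thresholds


def keep_document(lang, score, thresholds):
    """Assumed filter rule: keep a document unless it scores above its
    language's threshold."""
    return score <= thresholds[lang]


# Toy data: 100 English documents with scores 0.01 .. 1.00, 4 German ones.
docs = [("en", i / 100) for i in range(1, 101)]
docs += [("de", s) for s in (0.1, 0.2, 0.3, 0.9)]

thresholds = per_language_thresholds(docs)
kept_en = [s for lang, s in docs if lang == "en" and keep_document(lang, s, thresholds)]
```

On this toy data the English threshold lands at the 95th of 100 sorted scores, so 5 of the 100 English documents are dropped. The real thresholds depend on the classifier outputs over the full corpus, so a sketch like this cannot recover them from a subset.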

However, I couldn't find the actual threshold values published anywhere, which makes the filtering step hard to reproduce exactly.

Thanks in advance and great work on this project!
