ARABIC: Wrong language or dialect or script

Hey team,
As part of the Argilla FineWeb-C sprint, we are annotating Arabic and its dialects: MSA, ARY, and ARZ.
The problem with all of them is that samples are misclassified most of the time.
For example, annotators for Arabic report that most of the data is not Arabic but rather dialects with a lot of Arabizi (use of Latin script). For the dialects, annotators report that most of the samples are in fact MSA!
This mismatch leads to most of the data being labeled as problematic.
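
In case it helps with triage, here is a rough sketch of how the script side of the mismatch could be measured on a sample. It is only a heuristic under my own assumptions: the function names and the 0.5 threshold are hypothetical, it cannot tell Arabizi apart from English or French, and it says nothing about MSA versus dialect for text written in Arabic script.

```python
import unicodedata


def script_shares(text):
    """Return (arabic_share, latin_share) among the alphabetic characters of text."""
    arabic = latin = letters = 0
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, whitespace, diacritics
        letters += 1
        name = unicodedata.name(ch, "")
        if name.startswith("ARABIC"):
            arabic += 1
        elif name.startswith("LATIN"):
            latin += 1
    if letters == 0:
        return 0.0, 0.0
    return arabic / letters, latin / letters


def looks_latin_script(text, latin_threshold=0.5):
    """Flag samples dominated by Latin letters (hypothetical threshold)."""
    arabic_share, latin_share = script_shares(text)
    return latin_share >= latin_threshold and latin_share > arabic_share


if __name__ == "__main__":
    for sample in ["مرحبا بالعالم", "salam 3likom, labas?"]:
        print(sample, script_shares(sample), looks_latin_script(sample))
```

Counting how many samples in each split get flagged would at least show how much of the "wrong script" problem is Latin-script text.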
cc: @nataliaElv
