Speed up DALIA export #770

Hi @lea-33 ,

I was wondering how we could speed up the DALIA export. Am I right in assuming that the slow part of the conversion is the language detection in this part?

```python
from transformers import pipeline

model_ckpt = "papluca/xlm-roberta-base-language-detection"
pipe = pipeline("text-classification", model=model_ckpt)
...
```

Do you think it would be possible to move this part to a separate notebook that we run manually from time to time? It could add the `language` to entries in our yml file where the language is not yet defined. That way the language is cached in the file, and the export does not have to run the model again and again.
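To make the idea concrete, here is a rough sketch of what the notebook could do. It is only an illustration under some assumptions: I assume the entries are a list of dicts, the field names `name` and `description` and the helpers `fill_missing_languages` and `make_detector` are made up, and only the `language` key and the pipeline call come from the snippet above.

```python
from typing import Callable, Iterable


def fill_missing_languages(
    entries: Iterable[dict],
    detect: Callable[[str], str],
    text_fields: tuple = ("name", "description"),  # hypothetical field names
) -> int:
    """Add a `language` key to entries that lack one; return how many were updated."""
    updated = 0
    for entry in entries:
        if entry.get("language"):
            continue  # language already cached, skip the expensive model call
        text = " ".join(str(entry[f]) for f in text_fields if entry.get(f))
        if not text:
            continue  # nothing to classify for this entry
        entry["language"] = detect(text)
        updated += 1
    return updated


def make_detector() -> Callable[[str], str]:
    """Build a detector from the pipeline quoted above (loads the model once)."""
    from transformers import pipeline  # lazy import: only the notebook needs it

    pipe = pipeline(
        "text-classification",
        model="papluca/xlm-roberta-base-language-detection",
    )
    # For a single string the pipeline returns a list with one
    # {"label": ..., "score": ...} dict; we keep only the label.
    return lambda text: pipe(text, truncation=True)[0]["label"]
```

In the notebook we would load the yml file (for example with PyYAML's `yaml.safe_load`), call `fill_missing_languages(entries, make_detector())`, and write the result back with `yaml.safe_dump`. The export itself would then only read `language` and would not need `transformers` at all.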

Let me know what you think!
