
Speed up DALIA export #770

@haesleinhuepf


Hi @lea-33 ,

I was wondering how we could speed up the DALIA export. Am I right in assuming that the slow part of the conversion is this one?

```python
from transformers import pipeline

# Multilingual language-detection model (the suspected slow part of the export)
model_ckpt = "papluca/xlm-roberta-base-language-detection"
pipe = pipeline("text-classification", model=model_ckpt)
...
```

Do you think we could move this part into a separate notebook that we run manually from time to time? It could add the language to those entries in our yml file that do not define one yet. That way, the detected language is cached in the yml file and we do not have to rerun detection on every export.
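A minimal sketch of what such a manually run notebook step could look like. The core function only touches entries without a `language` field and calls the slow detector once, in a single batch, on those entries. The YAML layout (a top-level `resources` list whose entries have `name`, `description` and an optional `language`) and the helper names `fill_missing_languages`, `make_detector` and `update_yaml_file` are hypothetical assumptions, not the actual structure of our yml file; only the model checkpoint and the `pipeline("text-classification", ...)` call come from the export code quoted above.

```python
# Hypothetical cache-filling step for a manually run notebook.
# ASSUMPTION: the yml file has a top-level "resources" list; each entry may
# have "name", "description" and an optional "language" field.
from typing import Callable, Iterable


def fill_missing_languages(entries: list,
                           detect: Callable[[list], list],
                           text_fields: Iterable[str] = ("name", "description")) -> int:
    """Set entry["language"] where it is missing; return how many were filled."""
    fields = tuple(text_fields)
    todo = [e for e in entries if not e.get("language")]
    if not todo:
        return 0  # everything is cached already, the model is never called
    # One text per entry, built from the chosen fields.
    texts = [" ".join(str(e.get(f, "")) for f in fields).strip() for e in todo]
    # Run the slow detector once, as a batch, only on uncached entries.
    for entry, lang in zip(todo, detect(texts)):
        entry["language"] = lang
    return len(todo)


def make_detector(model_ckpt: str = "papluca/xlm-roberta-base-language-detection"):
    """Wrap the transformers pipeline as a texts -> language labels function."""
    from transformers import pipeline  # heavy import, only needed in the notebook
    pipe = pipeline("text-classification", model=model_ckpt)

    def detect(texts: list) -> list:
        # Each result is a dict with "label" and "score"; keep the label.
        return [r["label"] for r in pipe(texts, truncation=True)]

    return detect


def update_yaml_file(path: str, detect) -> int:
    """Load the yml file, fill missing languages, and write it back if changed."""
    import yaml  # PyYAML
    with open(path, encoding="utf-8") as f:
        data = yaml.safe_load(f)
    filled = fill_missing_languages(data["resources"], detect)  # "resources" is assumed
    if filled:
        with open(path, "w", encoding="utf-8") as f:
            yaml.safe_dump(data, f, allow_unicode=True, sort_keys=False)
    return filled
```

In the notebook this would be a single call such as `update_yaml_file("resources.yml", make_detector())`, with the file name again a placeholder. Note that `yaml.safe_dump` does not preserve comments or the original formatting of the file; if that matters, a round-trip YAML library would be needed instead. The export itself would then just read the cached `language` field instead of loading the model.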

Let me know what you think!
