NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?id_news_dataset
| Dataset | id_news_dataset |
|---|---|
| Description | The dataset compiles articles from seven prominent Indonesian news platforms: Tempo, CNN Indonesia, CNBC Indonesia, Okezone, Suara, Kumparan, and JawaPos. Each source contributes a diverse range of articles, together forming a broad collection of Indonesian news content. The dataset includes two special columns: 'embedding', which holds text embeddings extracted with the OpenAI text-embedding-ada-002 model, and 'summary', which contains a concise article summary generated via the ChatGPT API. |
| License | CC-BY-NC-4.0 |
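One common use of a precomputed 'embedding' column is similarity search: rank articles by cosine similarity between their embedding and a query embedding. The sketch below assumes each row has already been loaded as a Python dict whose 'embedding' value is a list of floats (the card does not say how the column is stored, so parsing is left out). The helper names `cosine_similarity` and `top_k_similar` and the toy rows are hypothetical. Real text-embedding-ada-002 vectors have 1536 dimensions, while these toy vectors use 3 for readability.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query: Sequence[float], rows: List[Dict], k: int = 3) -> List[Dict]:
    """Return the k rows whose 'embedding' is most similar to the query vector."""
    ranked = sorted(
        rows,
        key=lambda row: cosine_similarity(query, row["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy rows: only the 'embedding' and 'summary' column names come from
# the dataset card; the values are made up and far shorter than real 1536-dim vectors.
rows = [
    {"summary": "toy summary A", "embedding": [0.9, 0.1, 0.0]},
    {"summary": "toy summary B", "embedding": [0.0, 1.0, 0.0]},
    {"summary": "toy summary C", "embedding": [0.5, 0.5, 0.7]},
]

if __name__ == "__main__":
    for row in top_k_similar([1.0, 0.0, 0.0], rows, k=2):
        print(row["summary"])
```

For the full dataset, a linear scan like this is fine for exploration. Larger workloads would typically stack the vectors into a matrix or use a vector index instead.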