Add KantaHayashiAI/ClimbLab-Ja as a Datakit source #5741

Merged: Helw150 merged 4 commits into main from agent/20260514-fix-5740 on May 15, 2026

@claude claude Bot commented May 14, 2026

Registers climblab-ja (~300B Japanese tokens, 201M rows / 480 GB parquet) as a single-source datakit entry. The dataset is derived from LLM-jp Corpus v4 using Nemotron-ClimbLab-style semantic clustering and carries per-document quality and value scores. Those score columns pass through normalize unchanged, so downstream consumers can re-filter without re-deriving them. License: ODC-BY.
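
As a rough illustration of the pass-through idea (not taken from the PR diff), the sketch below keeps score columns on each normalized record so that a later stage can apply its own threshold. The column names `cluster_id`, `quality_score`, and `value_score`, and the functions `normalize` and `refilter`, are hypothetical; the actual ClimbLab-Ja schema and Datakit API are not shown on this page.

```python
from typing import Any, Iterable, Iterator

# Hypothetical column names: the real ClimbLab-Ja schema is not shown in this PR.
PASSTHROUGH_COLUMNS = ("cluster_id", "quality_score", "value_score")


def normalize(row: dict[str, Any]) -> dict[str, Any]:
    """Build a minimal normalized record, copying score columns through unchanged."""
    record: dict[str, Any] = {"text": row["text"].strip()}
    for name in PASSTHROUGH_COLUMNS:
        if name in row:
            record[name] = row[name]
    return record


def refilter(records: Iterable[dict[str, Any]], min_quality: float) -> Iterator[dict[str, Any]]:
    """Keep records whose passed-through quality score meets the threshold."""
    for record in records:
        if record.get("quality_score", 0.0) >= min_quality:
            yield record


# Toy rows standing in for parquet records (made-up values).
raw_rows = [
    {"text": " 高品質な文書 ", "cluster_id": 3, "quality_score": 0.9, "value_score": 0.7},
    {"text": "低品質な文書", "cluster_id": 8, "quality_score": 0.2, "value_score": 0.1},
]

normalized = [normalize(row) for row in raw_rows]
kept = list(refilter(normalized, min_quality=0.5))
```

Because the scores survive normalization, a consumer can tighten or loosen `min_quality` later without recomputing clusters or scores.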

Fixes #5740

@claude claude Bot added the agent-generated label (Created by automation/agent) May 14, 2026
@Helw150 Helw150 merged commit e0d23b3 into main May 15, 2026
29 checks passed
@Helw150 Helw150 deleted the agent/20260514-fix-5740 branch May 15, 2026 20:39
Development

Successfully merging this pull request may close these issues.

Ingest KantaHayashiAI/ClimbLab-Ja in Datakit
