Add KantaHayashiAI/ClimbLab-Ja as a Datakit source by claude[bot] · Pull Request #5741 · marin-community/marin

claude · 2026-05-14T17:39:43Z

Registers climblab-ja (~300B Japanese tokens, 201M rows / 480 GB parquet) as a single-source datakit entry. The dataset is derived from LLM-jp Corpus v4 with Nemotron-ClimbLab-style semantic clustering and per-document quality / value scores; those columns pass through normalize so downstream consumers can re-filter without re-deriving them. License: ODC-BY.

Fixes #5740
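Because the cluster and score columns survive normalization, a downstream consumer can apply its own thresholds directly. The sketch below illustrates that kind of re-filtering. It is not code from this PR: the `Doc` record, its field names (`cluster_id`, `quality_score`, `value_score`), the score scale, and the `refilter` helper are all hypothetical stand-ins for whatever the normalized ClimbLab-Ja output actually exposes.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Set


# Hypothetical record shape: the real column names and score ranges in the
# normalized ClimbLab-Ja output may differ.
@dataclass
class Doc:
    text: str
    cluster_id: int
    quality_score: float
    value_score: float


def refilter(
    docs: Iterable[Doc],
    min_quality: float,
    min_value: float,
    keep_clusters: Optional[Set[int]] = None,
) -> Iterator[Doc]:
    """Yield documents whose pass-through scores meet the given thresholds.

    If keep_clusters is given, only documents from those semantic clusters
    are kept; otherwise all clusters are allowed.
    """
    for doc in docs:
        # Drop documents below either score threshold.
        if doc.quality_score < min_quality or doc.value_score < min_value:
            continue
        # Optionally restrict to a subset of semantic clusters.
        if keep_clusters is not None and doc.cluster_id not in keep_clusters:
            continue
        yield doc


docs = [
    Doc("a", cluster_id=0, quality_score=0.9, value_score=0.8),
    Doc("b", cluster_id=1, quality_score=0.4, value_score=0.9),
    Doc("c", cluster_id=2, quality_score=0.95, value_score=0.2),
]
kept = [d.text for d in refilter(docs, min_quality=0.5, min_value=0.5)]
print(kept)  # ['a']
```

Using a generator keeps the filter lazy, which matters at 201M rows: a consumer can stream parquet batches through it without materializing the whole corpus.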


claude[bot] added the agent-generated label ("Created by automation/agent") on May 14, 2026.

claude[bot] mentioned this pull request on May 14, 2026 in #5740, "Ingest KantaHayashiAI/ClimbLab-Ja in Datakit" (closed).

Helw150 and others added 3 commits on May 15, 2026 at 12:39:

c875434  Add ClimbLab-Ja tokenize step and update token count to 371.92B
efb23e5  ruff: reorder imports in climblab_ja tokenize module
9b2a47b  Merge branch 'main' into agent/20260514-fix-5740

Helw150 approved these changes May 15, 2026


Helw150 merged commit e0d23b3 into main on May 15, 2026 (29 checks passed).

Helw150 deleted the agent/20260514-fix-5740 branch May 15, 2026 20:39

Helw150 merged 4 commits into main from agent/20260514-fix-5740.
