ChainTrain — Verifiable Dataset & Model Lineage on Sui

ChainTrain is a full-stack framework that enables verifiable dataset and model lineage using the Sui blockchain, Walrus distributed storage, and Nautilus zero-knowledge proofs.

Problem

Today, AI models are built on datasets we cannot verify.
We don’t know whether the dataset used for training has been tampered with.
We cannot prove if the published model was actually trained on the dataset the creator claims.
There is no transparent lineage showing how datasets and models evolve over time.

This creates huge risks in safety, governance, compliance, and trust. The world is moving toward regulation and auditability, but AI data pipelines are still opaque and unverifiable.

In short, ChainTrain makes AI datasets and models provably trustworthy.

We ensure that what you train is what you claim — and what you publish is what you actually trained.

Core Verifiable Infrastructure:

- Dataset Registry Module
- Merkle tree hashing
- Walrus upload & download
- Off-chain → on-chain submission flow
- Simple lineage tracking
- zk-proof generation via Nautilus
- Proof submission to Sui
- On-chain full verification
- zk-circuit enhancements
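
To illustrate the Merkle tree hashing item, here is a minimal sketch of how a dataset's Merkle root could be computed off-chain. It is not taken from the ChainTrain backend: the chunk size, the use of SHA-256, the leaf/node prefixes, and the function names `split_chunks` and `merkle_root` are all assumptions made for the example.

```python
import hashlib

# Assumed chunk size; this README does not state the value ChainTrain uses.
CHUNK_SIZE = 1024 * 1024


def sha256(data: bytes) -> bytes:
    """Return the raw SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a dataset blob into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def merkle_root(chunks: list[bytes]) -> bytes:
    """Compute a Merkle root over the chunks.

    Leaves are hashed with a 0x00 prefix and internal nodes with a 0x01
    prefix so that a leaf can never be confused with an internal node.
    When a level has an odd number of nodes, the last node is duplicated.
    """
    if not chunks:
        return sha256(b"")
    level = [sha256(b"\x00" + chunk) for chunk in chunks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [
            sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


if __name__ == "__main__":
    dataset = b"example dataset contents"
    print(merkle_root(split_chunks(dataset, size=8)).hex())
```

Under these assumptions, anyone who downloads the dataset can recompute the root and compare it with the value committed on-chain. Changing a single byte of the dataset produces a different root.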

UI & Governance:

- Dataset explorer
- Lineage viewer
- Models dashboard
- Governance

Architecture:

Flow

User uploads/downloads datasets through ChainTrain.
ChainTrain backend:
- Computes Merkle root for dataset integrity.
- Uploads dataset to Walrus → receives blobId.
- Generates a zk-proof (Nautilus) for dataset correctness.
ChainTrain commits metadata to Sui blockchain:
- Merkle root
- Walrus blobId
- zk-proof reference
- Lineage (parent dataset/model)
User triggers model training:
- Training algorithm is committed to GitHub.
- Training code is deployed to a secure enclave (Nautilus on AWS).
- Enclave downloads dataset from Walrus.
- Enclave performs /process_data and trains the LLM.
Training enclave returns:
- Trained model
- Certification/proof artifact
ChainTrain registers trained model on Sui:
- Links it to dataset version (lineage)
- Stores proof metadata
Frontend displays:
- Dataset lineage
- Model lineage
- Governance
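
The metadata and lineage links described in the flow above can be pictured with a small data structure. This is a hedged sketch, not ChainTrain's on-chain schema: the `LineageRecord` class, its field names, and the `lineage_chain` helper are hypothetical, and an in-memory dict stands in for the on-chain registry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical off-chain mirror of the metadata committed to Sui."""
    record_id: str
    kind: str                        # "dataset" or "model"
    merkle_root: str                 # hex-encoded Merkle root
    walrus_blob_id: str              # blobId returned by Walrus
    proof_ref: str                   # reference to the proof artifact
    parent_id: Optional[str] = None  # parent dataset/model, if any


def lineage_chain(records: dict[str, LineageRecord], record_id: str) -> list[str]:
    """Walk parent links from record_id back to the root record."""
    chain: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = record_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        if current not in records:
            raise KeyError(f"unknown record {current}")
        seen.add(current)
        chain.append(current)
        current = records[current].parent_id
    return chain


if __name__ == "__main__":
    # Placeholder values only; none of these identifiers are real.
    registry = {
        "ds-v1": LineageRecord("ds-v1", "dataset", "aa11", "blob-1", "proof-1"),
        "ds-v2": LineageRecord("ds-v2", "dataset", "bb22", "blob-2", "proof-2", "ds-v1"),
        "model-1": LineageRecord("model-1", "model", "cc33", "blob-3", "proof-3", "ds-v2"),
    }
    print(lineage_chain(registry, "model-1"))
```

Walking the parent links from a model back to its root dataset gives the kind of chain a lineage viewer would display.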

Model provenance:

Data provenance:

How to run this locally:

Backend

cd backend/dataset_registry/offchain
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn server:app --reload

Frontend

cd frontend
npm install
npm run dev

License

MIT License.
