ChainTrain is a full-stack framework that enables verifiable dataset and model lineage using the Sui blockchain, Walrus distributed storage, and Nautilus zero-knowledge proofs.
- Today, AI models are built on datasets we cannot verify.
- We don’t know whether the dataset used for training has been tampered with.
- We cannot prove if the published model was actually trained on the dataset the creator claims.
- There is no transparent lineage showing how datasets and models evolve over time.
This opacity creates serious risks for safety, governance, compliance, and trust. Regulation increasingly demands auditability, yet AI data pipelines remain opaque and unverifiable.
In short, ChainTrain makes AI datasets and models provably trustworthy.
We ensure that what you train is what you claim, and that what you publish is what you actually trained.
Core Verifiable Infrastructure:
- Dataset Registry Module
- Merkle tree hashing
- Walrus upload & download
- Off-chain → on-chain submission flow
- Simple lineage tracking
- zk-proof generation via Nautilus
- Proof submission to Sui
- On-chain full verification
- zk-circuit enhancements
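To make the Merkle tree hashing item above concrete, here is a minimal sketch of how a dataset could be reduced to a single root hash. It is not the ChainTrain implementation: the function name `merkle_root`, the 1024-byte `CHUNK_SIZE`, the choice of SHA-256, and the `0x00`/`0x01` leaf and node prefixes (a domain-separation convention in the style of RFC 6962) are all assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 1024  # hypothetical chunk size; the real registry may chunk differently


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(data: bytes, chunk_size: int = CHUNK_SIZE) -> str:
    """Return a hex Merkle root over fixed-size chunks of `data`."""
    # Split the dataset into chunks; an empty dataset still gets one (empty) leaf.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]

    # Leaves and internal nodes use different prefixes so that a leaf
    # can never be confused with an internal node.
    level = [_sha256(b"\x00" + chunk) for chunk in chunks]

    while len(level) > 1:
        next_level = []
        # Hash adjacent pairs into their parent node.
        for i in range(0, len(level) - 1, 2):
            next_level.append(_sha256(b"\x01" + level[i] + level[i + 1]))
        # An unpaired last node is promoted unchanged to the next level.
        if len(level) % 2 == 1:
            next_level.append(level[-1])
        level = next_level

    return level[0].hex()
```

Changing any byte of the dataset changes the root, so comparing a recomputed root against the one stored on-chain is enough to detect tampering under these assumptions.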
UI & Governance:
- Dataset explorer
- Lineage viewer
- Models dashboard
- Governance
- User uploads/downloads datasets through ChainTrain.
- ChainTrain backend:
  - Computes Merkle root for dataset integrity.
  - Uploads dataset to Walrus → receives blobId.
  - Generates a zk-proof (Nautilus) for dataset correctness.
- ChainTrain commits metadata to Sui blockchain:
  - Merkle root
  - Walrus blobId
  - zk-proof reference
  - Lineage (parent dataset/model)
- User triggers model training:
  - Training algorithm is committed to GitHub.
  - Training code is deployed to a secure enclave (Nautilus on AWS).
  - Enclave downloads dataset from Walrus.
  - Enclave performs /process_data and trains the LLM.
- Training enclave returns:
  - Trained model
  - Certification/proof artifact
- ChainTrain registers trained model on Sui:
  - Links it to dataset version (lineage)
  - Stores proof metadata
- Frontend displays:
  - Dataset lineage
  - Model lineage
  - Governance
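The registration and lineage steps above can be sketched as records that carry a parent link. This is only an in-memory illustration, not the on-chain module: the `LineageRecord` class, its field names, the `register` and `lineage` helpers, and the use of a SHA-256 digest as the record id are all hypothetical, and no Sui or Walrus API is called.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical shape of the metadata committed for a dataset or model."""
    kind: str                        # "dataset" or "model"
    merkle_root: str                 # integrity hash (for models this is an assumption)
    walrus_blob_id: str              # blobId returned by the Walrus upload
    proof_ref: str                   # reference to the proof artifact
    parent_id: Optional[str] = None  # id of the parent dataset/model, if any

    def record_id(self) -> str:
        # Derive a deterministic id from the canonical JSON form of the record.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def register(registry: Dict[str, LineageRecord], record: LineageRecord) -> str:
    """Store a record, refusing parents that were never registered."""
    if record.parent_id is not None and record.parent_id not in registry:
        raise KeyError(f"unknown parent: {record.parent_id}")
    rid = record.record_id()
    registry[rid] = record
    return rid


def lineage(registry: Dict[str, LineageRecord], record_id: str) -> List[str]:
    """Walk parent links from a record back to its root ancestor."""
    chain = []
    current: Optional[str] = record_id
    while current is not None:
        chain.append(current)
        current = registry[current].parent_id
    return chain


# Example: two dataset versions and a model trained on the second one.
registry: Dict[str, LineageRecord] = {}
ds_v1 = register(registry, LineageRecord("dataset", "aa" * 32, "blob-1", "proof-1"))
ds_v2 = register(registry, LineageRecord("dataset", "bb" * 32, "blob-2", "proof-2", parent_id=ds_v1))
model = register(registry, LineageRecord("model", "cc" * 32, "blob-3", "proof-3", parent_id=ds_v2))
print(lineage(registry, model))  # model id first, then its dataset ancestors
```

Because each id is derived from the record's contents, including its parent id, a record cannot be silently re-pointed at a different parent without its own id changing.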
cd backend/dataset_registry/offchain
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn server:app --reload

cd frontend
npm install
npm run dev

License
MIT License.