CLAUDE.md - AI Context for distro-train

Project Overview

distro-train is a decentralized federated learning platform inspired by karankoder/Federated-Learning. It creates a peer-to-peer marketplace connecting ML users (who have data/models but lack compute) with trainers (who have idle GPUs/CPUs).

Core Architecture

Technology Stack

P2P Networking: py-libp2p for decentralized communication
Storage: Akave O3 decentralized storage with presigned URLs
Blockchain: Hedera (smart contracts + consensus service)
Frontend: Node.js, React.js, Yarn
Backend: Python with virtual environment

Key Components

ML Users: Upload datasets/models, receive trained weights
Trainers: Provide compute power, earn by completing training jobs
Bootstrap Node: P2P network entry point
Client Nodes: Bridge between frontend and P2P swarm

Workflow

ML user uploads dataset → chunked and stored on Akave O3
Presigned URLs generated for each chunk (not raw data over P2P)
URLs shared across P2P network
Trainers fetch chunks, train locally, upload weights
Encrypted presigned URLs for weights published on Hedera
Final trained weights returned to ML user
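
The first two workflow steps (chunking, then one presigned URL per chunk) can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: `chunk_bytes`, `build_manifest`, the object-key layout, and the `presign` callable are all hypothetical names introduced here. The SHA-256 digest per chunk is an added assumption so a trainer could check what it downloaded.

```python
import hashlib
from typing import Callable, Dict, List


def chunk_bytes(data: bytes, chunk_size: int) -> List[bytes]:
    """Split a dataset blob into fixed-size chunks (the last one may be shorter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def build_manifest(
    dataset_id: str,
    chunks: List[bytes],
    presign: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Build the small records that would travel over P2P instead of raw data.

    `presign` is a hypothetical callable mapping an object key to a presigned
    URL; the key layout below is an assumption, not the project's scheme.
    """
    manifest = []
    for index, chunk in enumerate(chunks):
        key = f"{dataset_id}/chunk-{index:05d}"
        manifest.append(
            {
                "key": key,
                # Digest lets a trainer verify the chunk after fetching it.
                "sha256": hashlib.sha256(chunk).hexdigest(),
                "url": presign(key),
            }
        )
    return manifest
```

If Akave O3 is accessed through an S3-compatible client such as boto3 (an assumption suggested by the `AWS_*` variables), `presign` could wrap `client.generate_presigned_url("get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=seconds)`.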

Key Innovations

Presigned URLs: Lightweight data distribution (no massive payloads over P2P)
Trustless: Smart contracts handle escrow/payments
Fault Tolerant: Consensus service logs preserve state
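Privacy: Raw data never travels over the P2P network; peers exchange only presigned URLs, and weight URLs are encrypted before publication on Hedera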

Environment Variables Required

AWS_ACCESS_KEY_ID          # Akave O3
AWS_SECRET_ACCESS_KEY      # Akave O3
OPERATOR_ID                # Hedera
OPERATOR_KEY               # Hedera
API_KEY, API_SECRET        # Auth
JWT_TOKEN                  # Auth
BOOTSTRAP_ADDR             # P2P bootstrap node
CONTRACT_ID, TOPIC_ID      # Hedera
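
A backend process could check these variables at startup and fail fast if any are missing. The sketch below is only an illustration: `REQUIRED_VARS` mirrors the list above, while `load_config` and its error message are hypothetical names, not part of the project.

```python
import os
from typing import Dict, Mapping, Optional

# Names copied from the list above.
REQUIRED_VARS = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "OPERATOR_ID",
    "OPERATOR_KEY",
    "API_KEY",
    "API_SECRET",
    "JWT_TOKEN",
    "BOOTSTRAP_ADDR",
    "CONTRACT_ID",
    "TOPIC_ID",
]


def load_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required settings, or raise listing every missing name."""
    source = os.environ if env is None else env
    # Treat unset and empty values the same way.
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: source[name] for name in REQUIRED_VARS}
```

Accepting an explicit mapping keeps the check testable without touching the real process environment.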

Development Focus Areas

Federated learning aggregation algorithms (FedAvg, FedProx)
Byzantine fault tolerance for malicious nodes
Trainer selection and assignment logic
Quality verification for trained weights
Dynamic pricing mechanism
Model encryption schemes

When Helping with This Project

Focus on decentralized, trustless design patterns
Consider P2P network reliability and fault tolerance
Prioritize data privacy and security
Consider how the presigned URL approach scales as datasets and trainer counts grow
Reference py-libp2p, Akave O3, and Hedera documentation
