🤖 Akash Trainer

An openclaw-powered agent that lets you train any ML/AI model on Akash GPUs with just a message. The agent automates the whole training process. No setup. Just deploy.

Point it at a GitHub repo → it clones, installs, trains on a GPU, pushes the results back to your repo, and closes the deployment automatically.

Quick Start

Go to Akash Console → New Deployment → paste the deploy.yaml
Set three env vars:

Variable	Example
`REPO_URL`	`https://github.com/you/your-project.git`
`TRAIN_CMD`	`python3 train.py`
`GITHUB_TOKEN`	A GitHub personal access token with `repo` scope (create one under GitHub Settings → Developer settings → Personal access tokens)

Deploy → accept a bid → done

Your trained model and outputs appear on GitHub as a new branch: trained-output/YYYYMMDD-HHMMSS
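
To make the branch naming concrete, here is a minimal sketch of how a name in the `trained-output/YYYYMMDD-HHMMSS` format could be built. The function name `output_branch_name` is hypothetical, and the use of UTC is an assumption; the trainer may use a different timezone.

```python
from datetime import datetime, timezone


def output_branch_name(now=None):
    """Build a branch name such as trained-output/20250131-142507."""
    # Assumption: timestamps are taken in UTC. The README does not say
    # which timezone the trainer actually uses.
    if now is None:
        now = datetime.now(timezone.utc)
    return "trained-output/" + now.strftime("%Y%m%d-%H%M%S")


print(output_branch_name())
```

Because the timestamp sorts lexicographically, listing branches by name also lists runs in chronological order.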

How It Works

You deploy on Akash
        ↓
  Clone your repo
        ↓
  Install all dependencies (auto-detected)
        ↓
  Verify GPU + CUDA works
        ↓
  Run your training command
        ↓
  Collect all new files (models, logs, metrics)
        ↓
  Push to GitHub as a new branch
        ↓
  Done. Close deployment.
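
The "collect all new files" step above can be pictured as a before/after comparison of the repo directory. This is only a sketch under that assumption, not the trainer's actual code; `snapshot` and `new_files` are hypothetical helper names.

```python
import os


def snapshot(root):
    """Return the set of file paths under root, relative to root."""
    files = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git metadata so it never shows up as a "new" file.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            files.add(os.path.relpath(os.path.join(dirpath, name), root))
    return files


def new_files(before, after):
    """Files present after training that were absent before it."""
    return sorted(after - before)
```

Usage: call `snapshot(repo_dir)` right after cloning, run the training command, call `snapshot(repo_dir)` again, and pass both sets to `new_files`. Note that this only detects newly created files, not files that training modified in place.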

Supports: PyTorch, TensorFlow, HuggingFace, JAX, and any Python ML framework. Private repos via GITHUB_TOKEN. Custom setup commands for anything else.

Environment Variables

Variable	Required	Description
`REPO_URL`	✅	GitHub repo URL
`TRAIN_CMD`	✅	Training command (e.g. `python3 train.py --epochs 50`)
`GITHUB_TOKEN`	Recommended	For auto-pushing results to GitHub and for cloning private repos
`REPO_BRANCH`	No	Branch to clone (default: `main`, falls back to `master`)
`SETUP_CMD`	No	Run before training (e.g. `pip3 install -e . && python3 download_data.py`)
`REQUIREMENTS`	No	Custom requirements file (default: `requirements.txt`)
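
As an illustration of how the variables in the table fit together, the sketch below reads them with the documented defaults. `TrainerConfig` and `load_config` are hypothetical names, and treating an explicit `REPO_BRANCH` as disabling the `main` → `master` fallback is an assumption.

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrainerConfig:
    repo_url: str
    train_cmd: str
    github_token: Optional[str]
    branch_candidates: List[str]
    setup_cmd: Optional[str]
    requirements: str


def load_config(env=None):
    """Read trainer settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [key for key in ("REPO_URL", "TRAIN_CMD") if not env.get(key)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    branch = env.get("REPO_BRANCH")
    return TrainerConfig(
        repo_url=env["REPO_URL"],
        train_cmd=env["TRAIN_CMD"],
        github_token=env.get("GITHUB_TOKEN") or None,
        # Assumption: with no explicit branch, try main first, then master.
        branch_candidates=[branch] if branch else ["main", "master"],
        setup_cmd=env.get("SETUP_CMD") or None,
        requirements=env.get("REQUIREMENTS") or "requirements.txt",
    )
```

Failing early on a missing `REPO_URL` or `TRAIN_CMD` avoids paying for GPU time on a deployment that cannot train.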

Resource Guide

Pick the right size for your project:

Project	CPU	RAM	GPU	Storage	Est. Cost
MNIST / CIFAR demo	4	8Gi	1x any	20Gi	~$0.20-0.50
ResNet / YOLO / standard CV	8	16Gi	1x RTX4090+	50Gi	~$1-5
LLM fine-tuning (LoRA)	16	32Gi	1x A100/H100	100Gi	~$5-20

Edit the profiles.compute section in deploy.yaml to adjust.

Optional: `akash.yaml`

Drop an akash.yaml in your repo root to customize training without changing the deployment:

# All fields optional — only set what you need
train_cmd: python3 scripts/train.py --epochs 50 --lr 0.001
setup_cmd: pip3 install -e . && python3 scripts/download_data.py
requirements: requirements-gpu.txt
exclude:
  - data/raw/
  - "*.gz"
env:
  WANDB_API_KEY: "your-key"
  HF_TOKEN: "hf_xxx"

Without akash.yaml, the trainer auto-detects everything from your repo.
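
The exact matching rules for the `exclude` list are not documented here. As a rough sketch, assuming entries ending in `/` mean "everything under this directory" and other entries are globs matched against the file name, the check could look like this (`is_excluded` is a hypothetical helper):

```python
import posixpath
from fnmatch import fnmatchcase


def is_excluded(path, patterns):
    """Check a repo-relative path (forward slashes) against exclude entries."""
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory entry: exclude everything beneath it.
            if path.startswith(pattern):
                return True
        elif fnmatchcase(posixpath.basename(path), pattern):
            # Glob entry: match against the file name only.
            return True
    return False


patterns = ["data/raw/", "*.gz"]
print(is_excluded("data/raw/train.csv", patterns))
print(is_excluded("models/model.pt", patterns))
```

With the example patterns above, files under `data/raw/` and any `.gz` archive are skipped, while `models/model.pt` is kept.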

What Gets Pushed to GitHub

A new branch trained-output/YYYYMMDD-HHMMSS containing:

TRAINING_REPORT.md — duration, GPU used, files collected
models/ — saved model files
output/ — training outputs
All new files created during training (except downloaded datasets and caches)

If training fails → a TRAINING_FAILED.md containing the last 20 lines of the error log is pushed instead, so you can debug without opening the Akash dashboard.
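
For reference, pulling the last 20 lines of a log is cheap even for large files. The sketch below shows one way to do it; `tail_lines` and `failure_report` are hypothetical names, and the report layout is an assumption rather than the trainer's real TRAINING_FAILED.md format.

```python
from collections import deque


def tail_lines(log_path, n=20):
    """Return the last n lines of a text file without loading it all."""
    with open(log_path, "r", errors="replace") as fh:
        # A bounded deque keeps only the most recent n lines in memory.
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


def failure_report(log_path, exit_code):
    """Build a small Markdown report (hypothetical layout)."""
    lines = tail_lines(log_path)
    # Indent log lines by four spaces so Markdown renders them verbatim.
    body = "\n".join("    " + line for line in lines)
    return (
        "# Training failed\n\n"
        f"Exit code: {exit_code}\n\n"
        f"Last {len(lines)} log lines:\n\n{body}\n"
    )
```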

Files over 100MB are auto-excluded (GitHub limit).
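
A size filter like the one described above might look as follows. This is a sketch, not the trainer's code: `split_by_size` is a hypothetical helper, and the threshold assumes GitHub's 100 MiB per-file limit.

```python
import os

# GitHub blocks pushes containing files larger than 100 MiB.
GITHUB_FILE_LIMIT = 100 * 1024 * 1024


def split_by_size(paths, limit=GITHUB_FILE_LIMIT):
    """Split paths into (keep, skipped) by file size in bytes."""
    keep, skipped = [], []
    for path in paths:
        if os.path.getsize(path) > limit:
            skipped.append(path)
        else:
            keep.append(path)
    return keep, skipped
```

Returning the skipped paths as well makes it easy to list them in the training report, so a missing checkpoint is not a surprise.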

Pre-installed

PyTorch 2.5.1 (CUDA 12.4) · TensorFlow · NumPy · Pandas · scikit-learn · matplotlib · OpenCV · W&B · TensorBoard · tqdm · rich

Your requirements.txt can add or override anything.
