seedspirit/nano-backend.ai

Nano Backend.AI

A small Go backend for an agent-native fine-tuning ledger.

Phase 0 is intentionally narrow: it targets a single machine with 2x RTX 3090 GPUs and runs single-GPU LoRA fine-tuning jobs from validated presets. The goal is not to expose a generic job runner but to make training runs submittable, inspectable, and reproducible by AI agents.

See SPEC.md for the full MVP contract.

Agent Guidance

CLAUDE.md is the canonical agent instruction file for this repository. Agents that start from AGENTS.md should treat it as a pointer to CLAUDE.md and then follow the same shared rules.

Use .claude/skills/README.md for the available project workflows and skills.

MVP Goals

  • Accept declarative run drafts built from preset refs and option parameters
  • Validate presets and option policies before consuming queue or GPU capacity
  • Persist every run in a local SQLite ledger
  • Execute at most two single-GPU runs concurrently
  • Keep Docker behind an agent-side workload backend
  • Preserve logs, config, metrics, and artifacts for every terminal run
  • Make failures machine-readable through explicit run states and failure_reason
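The preflight goal above (reject a bad draft before it consumes queue or GPU capacity) could be sketched as follows. This is only an illustration: the type names, field names, error values, and the `lora-small` preset are hypothetical and are not taken from SPEC.md, which defines the real contract.

```go
package main

import (
	"errors"
	"fmt"
)

// All names below are hypothetical; SPEC.md defines the real contract.

// OptionPolicy bounds one numeric option exposed by a preset.
type OptionPolicy struct {
	Min, Max float64
}

// Preset is a validated recipe plus the options a draft may override.
type Preset struct {
	Ref      string
	Policies map[string]OptionPolicy
}

// RunDraft is a declarative submission: a preset ref and option parameters.
type RunDraft struct {
	PresetRef string
	Options   map[string]float64
}

var (
	ErrUnknownPreset = errors.New("unknown preset")
	ErrUnknownOption = errors.New("option not allowed by preset")
	ErrOutOfRange    = errors.New("option value out of range")
)

// ValidateDraft checks a draft against the preset registry. It performs no
// I/O and claims no resources, so a rejected draft costs nothing.
func ValidateDraft(registry map[string]Preset, d RunDraft) error {
	p, ok := registry[d.PresetRef]
	if !ok {
		return fmt.Errorf("%w: %q", ErrUnknownPreset, d.PresetRef)
	}
	for name, v := range d.Options {
		pol, ok := p.Policies[name]
		if !ok {
			return fmt.Errorf("%w: %q", ErrUnknownOption, name)
		}
		if v < pol.Min || v > pol.Max {
			return fmt.Errorf("%w: %s=%v not in [%v, %v]", ErrOutOfRange, name, v, pol.Min, pol.Max)
		}
	}
	return nil
}

func main() {
	registry := map[string]Preset{
		"lora-small": {
			Ref:      "lora-small",
			Policies: map[string]OptionPolicy{"learning_rate": {Min: 1e-6, Max: 1e-3}},
		},
	}
	draft := RunDraft{PresetRef: "lora-small", Options: map[string]float64{"learning_rate": 0.5}}
	if err := ValidateDraft(registry, draft); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Wrapping sentinel errors with `%w` lets a caller map each failure to a stable code via `errors.Is`. When a draft has several invalid options, the one reported first depends on map iteration order.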

API Design Philosophy

AI agents are the primary consumer. Responses should be machine-readable first: structured JSON envelopes, endpoint-specific data payloads, explicit statuses, stable error.code values, and clear next-step hints where useful.
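One possible shape for such an envelope is sketched below. The field names (`status`, `data`, `error`, `next_step`), the error code `preset_not_found`, and the helper functions are assumptions made for illustration; the actual wire format is whatever SPEC.md specifies.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// APIError carries a stable, machine-matchable code plus a human message.
// Field names here are assumed, not taken from SPEC.md.
type APIError struct {
	Code    string `json:"code"`
	Message string `json:"message"`
}

// Envelope wraps every response so agents can branch on Status first.
type Envelope struct {
	Status   string    `json:"status"`
	Data     any       `json:"data,omitempty"`
	Error    *APIError `json:"error,omitempty"`
	NextStep string    `json:"next_step,omitempty"`
}

// OK builds a success envelope around an endpoint-specific payload.
func OK(data any, nextStep string) Envelope {
	return Envelope{Status: "ok", Data: data, NextStep: nextStep}
}

// Fail builds an error envelope with a stable code and an optional hint.
func Fail(code, message, nextStep string) Envelope {
	return Envelope{
		Status:   "error",
		Error:    &APIError{Code: code, Message: message},
		NextStep: nextStep,
	}
}

// Encode renders an envelope as compact JSON.
func Encode(e Envelope) string {
	b, err := json.Marshal(e)
	if err != nil {
		// Fall back to a fixed envelope so the caller still receives valid JSON.
		return `{"status":"error","error":{"code":"internal","message":"encode failed"}}`
	}
	return string(b)
}

func main() {
	fmt.Println(Encode(OK(map[string]string{"run_id": "run-1"}, "poll the run until it is terminal")))
	fmt.Println(Encode(Fail("preset_not_found", "no such preset", "list presets and resubmit")))
}
```

Keeping `error.code` separate from the free-form message means an agent can match on the code while the wording of the message remains free to change.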

Long-running operations expose pollable resources. For Phase 0, logs use cursor-based polling rather than WebSockets.
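Cursor-based polling can be illustrated with a small in-memory sketch. It assumes the cursor is a line offset and that the server signals completion with a `done` flag; `LogBuffer`, `LogPage`, and their fields are hypothetical names, and the real log endpoint may use a different cursor encoding.

```go
package main

import (
	"fmt"
	"sync"
)

// LogPage is one poll result. The client passes NextCursor back on its next call.
type LogPage struct {
	Lines      []string `json:"lines"`
	NextCursor int      `json:"next_cursor"`
	Done       bool     `json:"done"` // true once the log is closed and fully read
}

// LogBuffer is an append-only, in-memory stand-in for a run's log file.
type LogBuffer struct {
	mu     sync.Mutex
	lines  []string
	closed bool
}

// Append adds one log line.
func (b *LogBuffer) Append(line string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.lines = append(b.lines, line)
}

// Close marks the log as complete, e.g. when the run reaches a terminal state.
func (b *LogBuffer) Close() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.closed = true
}

// Read returns up to limit lines starting at cursor. Out-of-range cursors are
// clamped, so a stale or malformed cursor never causes a panic.
func (b *LogBuffer) Read(cursor, limit int) LogPage {
	b.mu.Lock()
	defer b.mu.Unlock()
	if limit <= 0 {
		limit = 100 // assumed default page size
	}
	if cursor < 0 {
		cursor = 0
	}
	if cursor > len(b.lines) {
		cursor = len(b.lines)
	}
	end := cursor + limit
	if end > len(b.lines) {
		end = len(b.lines)
	}
	page := make([]string, end-cursor)
	copy(page, b.lines[cursor:end])
	return LogPage{Lines: page, NextCursor: end, Done: b.closed && end == len(b.lines)}
}

func main() {
	buf := &LogBuffer{}
	buf.Append("step 1 loss=2.31")
	buf.Append("step 2 loss=2.07")
	buf.Close()

	// Client loop: keep polling with the returned cursor until Done.
	cursor := 0
	for {
		page := buf.Read(cursor, 1)
		for _, line := range page.Lines {
			fmt.Println(line)
		}
		cursor = page.NextCursor
		if page.Done {
			break
		}
	}
}
```

Because the client owns the cursor, a poll that fails or times out can simply be retried with the same value, which is one reason polling is simpler to make robust than a streaming connection.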

Phase 0 Architecture

RunDraft
  -> API preflight validation
  -> preset registry / spec builder
  -> immutable spec.Spec
  -> SQLite run ledger
  -> ScheduleCoordinator
  -> WorkloadProvisioner / GPU claim
  -> WorkloadPlan
  -> WorkloadLauncher
  -> DockerWorkloadBackend
  -> local artifact store

Component              Role
HTTP API               Submit and inspect runs, logs, and artifacts
Spec builder           Resolve preset refs, validate option parameters, and finalize immutable specs
ScheduleCoordinator    Own run lifecycle transitions and terminal reconciliation
WorkloadProvisioner    FIFO scheduling, 2-GPU assignment, and workload plan construction
WorkloadLauncher       Manager-side port for prepare/start/cleanup calls
DockerWorkloadBackend  Agent-side Docker container materialization and observation
SQLite                 Durable source of truth for projects, runs, and artifact metadata
Local artifact store   Stores specs, resolved configs, logs, metrics, reports, adapters
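The FIFO scheduling and 2-GPU assignment attributed to WorkloadProvisioner can be sketched in a few lines. This is an assumption-laden illustration, not the repository's implementation: `Scheduler`, `Assignment`, and their methods are hypothetical names, state is kept in memory rather than in the SQLite ledger, and the sketch is not safe for concurrent use.

```go
package main

import "fmt"

// gpuCount reflects the Phase 0 target machine (two GPUs, one run per GPU).
const gpuCount = 2

// Assignment pairs a run with the GPU index it was given.
type Assignment struct {
	RunID string
	GPU   int
}

// Scheduler holds a FIFO queue of run IDs and the current GPU occupancy.
type Scheduler struct {
	queue []string         // run IDs in submission order
	busy  [gpuCount]string // busy[i] is the run on GPU i, or "" if free
}

// Submit appends a run to the back of the queue.
func (s *Scheduler) Submit(runID string) {
	s.queue = append(s.queue, runID)
}

// Dispatch assigns queued runs to free GPUs in submission order and returns
// the new assignments. It never exceeds one run per GPU.
func (s *Scheduler) Dispatch() []Assignment {
	var out []Assignment
	for gpu := 0; gpu < gpuCount && len(s.queue) > 0; gpu++ {
		if s.busy[gpu] != "" {
			continue
		}
		runID := s.queue[0]
		s.queue = s.queue[1:]
		s.busy[gpu] = runID
		out = append(out, Assignment{RunID: runID, GPU: gpu})
	}
	return out
}

// Release frees a GPU once its run has reached a terminal state.
func (s *Scheduler) Release(gpu int) {
	if gpu >= 0 && gpu < gpuCount {
		s.busy[gpu] = ""
	}
}

func main() {
	s := &Scheduler{}
	s.Submit("run-a")
	s.Submit("run-b")
	s.Submit("run-c")
	fmt.Println(s.Dispatch()) // run-a and run-b each claim a GPU
	s.Release(0)
	fmt.Println(s.Dispatch()) // run-c takes the freed GPU
}
```

With only two devices and one run per device, a plain queue is enough, which matches the non-goal of advanced scheduling or bin-packing.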

Tech Stack

  • Language: Go
  • External API: HTTP + JSON REST
  • Database: SQLite for Phase 0
  • Workload substrate: Agent-side Docker backend behind REST/HTTP and a Go port
  • Storage: Local filesystem artifact store

Future Architecture Notes

Postgres, Redis/Valkey hints, alternate manager-agent transports, multi-node scheduling, and richer cancellation/orphan cleanup semantics are future architecture directions, not Phase 0 requirements.

Non-Goals (MVP)

  • Multi-tenant quota or policy enforcement
  • Distributed training
  • Kubernetes-native integration
  • Real-time serving orchestration
  • Web UI or dashboard
  • Advanced scheduling or bin-packing
  • Webhook or notification system
  • W&B SaaS integration
  • Cancel API implementation, deferred to Phase 2

Project Layout

├── CLAUDE.md          # Canonical AI agent guidelines
├── AGENTS.md          # Pointer for agents that read AGENTS.md first
├── cmd/               # Binary entry points
├── internal/          # Private packages
├── docs/              # Design, education, and learning notes
├── SPEC.md            # Phase 0 MVP specification
└── Makefile           # Build, test, lint, fmt targets

License

TBD
