Minimal CUDA inference engine for LLMs on consumer GPUs.
Built for OpenCLAW — robotics inference on a single card.
```bash
pip install -e .

# Run inference (BF16, greedy decode)
python main.py \
  --model-path ~/models/Llama-3.2-1B \
  --model-type llama-3.2-1b \
  --prompt "Hello, world!"
```

Dense:
| Model | Params | VRAM (BF16) |
|---|---|---|
| Llama 3.2 1B | 1.2B | ~2.4 GB |
| Llama 3.2 3B | 3.2B | ~6.4 GB |
| Qwen3 0.6B | 0.6B | ~1.2 GB |
| Qwen3 1.7B | 1.7B | ~3.4 GB |
| Qwen3 4B | 4B | ~8 GB |
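The VRAM column follows the usual rule of thumb of 2 bytes per parameter for BF16 weights (activations and KV cache excluded). A tiny helper, hypothetical and not part of picolm, reproduces the table's estimates:

```python
def bf16_vram_gb(params_billion: float) -> float:
    """Rough BF16 weight footprint in GB: 2 bytes per parameter.

    Excludes activations and KV cache, so treat it as a lower bound.
    """
    return params_billion * 2


# Llama 3.2 1B (1.2B params) -> ~2.4 GB, matching the table
print(bf16_vram_gb(1.2))
```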

Precision:
| Precision | GPU | Status |
|---|---|---|
| BF16 | sm_80+ | MVP |
| FP8 (E4M3) | sm_89+ (RTX 40x0) | planned |
| FP4 (E2M1) | sm_120 (RTX 50x0) | planned |
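The sm_80/89/120 thresholds in the table translate directly into a dispatch check. A minimal sketch (hypothetical helper, not picolm's actual API) maps a CUDA compute capability to the precisions above:

```python
def supported_precisions(major: int, minor: int) -> list[str]:
    """Map a CUDA compute capability (e.g. 8, 9 for sm_89) to the
    precision tiers in the table above. FP8/FP4 are planned, not shipped."""
    sm = major * 10 + minor
    precisions = []
    if sm >= 80:
        precisions.append("bf16")
    if sm >= 89:
        precisions.append("fp8_e4m3")  # planned
    if sm >= 120:
        precisions.append("fp4_e2m1")  # planned
    return precisions


# RTX 30x0 (sm_86): BF16 only
print(supported_precisions(8, 6))
```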
No quantization pipeline — loads pre-quantized checkpoints from HuggingFace (NVFP4 format).
Python + CUDA hybrid — Python controls flow, CUDA does compute:

```
picolm/
├── model.py            # ModelConfig + Transformer forward pass
├── weights.py          # safetensors loader + HF weight mapping
├── tokenizer.py        # HuggingFace AutoTokenizer wrapper
├── generate.py         # generation loop (greedy/sampling)
└── kernels/
    ├── gemm_sm12x.cuh  # TMA warp-specialized GEMM (from PTX-Forge)
    ├── gemm_sm8x.cuh   # cooperative ldgsts GEMM (from PTX-Forge)
    └── gemm_api.cuh    # GEMM dispatch API
main.py                 # CLI entry point
```
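The control flow Python owns is easiest to see in the decode loop. This is a minimal greedy-decode sketch, not generate.py's actual code: `forward` stands in for the CUDA Transformer forward pass, and all names are illustrative.

```python
def greedy_decode(forward, prompt_ids, eos_id, max_new_tokens=32):
    """Greedy decoding: each step, append the argmax of the final-position
    logits and stop at EOS.

    `forward(ids)` stands in for the Transformer forward pass (the CUDA
    side); it returns one row of logits per token position.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        last_logits = forward(ids)[-1]          # logits for the next token
        next_id = max(range(len(last_logits)), key=last_logits.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids
```

Sampling (temperature / top-k) would replace the `max` with a draw from the softmaxed logits; the surrounding loop is unchanged.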
Non-goals:
- General-purpose serving (use vLLM/ollama for that)
- Quantization (use TensorRT Model Optimizer, then load the checkpoint here)
- Multi-GPU / tensor parallelism
License: MIT