PicoLM-Zig Ultra

Run a 1-billion-parameter LLM on consumer hardware with zero dependencies. Pure Zig. High performance. Cross-platform.

echo "Explain gravity" | .\picolm.exe model.gguf --q8k -n 100 -j 8

PicoLM-Zig Ultra is a high-performance, minimalist inference engine for GGUF models, migrated to Zig 0.13.0. It delivers "Zigsteroids" performance with native support for Windows and Linux.

🚀 Key Features

⚡ Blazing Fast: Heavy use of SIMD (AVX2, FMA) and hardware-specific optimizations.
✨ Pure Zig: Minimal codebase, zero dependencies, lightning-fast compilation.
🪟 Windows & Linux: Native support with high-performance memory mapping fallbacks for Windows.
🔢 Multi-Quant Support: Full support for Q2_K, Q4_K, Q8_0, and Q8_K quantization modes.
🧠 Hardcore Optimizations:
- CCX Pinning: Optimized for Zen 2/3 architectures.
- NUMA Interleaving: Efficient memory access for dual-socket servers.
- Thermal Throttling: Intelligent monitoring to prevent hardware overheating.
🛠️ Self-Contained: No Python, no CUDA (unless explicitly enabled), just one binary.

🛠️ Installation & Build

Requires Zig 0.13.0+.
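
Before building, it can help to confirm that the Zig on your PATH meets that minimum. The following is a minimal sketch, not part of the project: the helper name `version_ge` is made up for illustration, and it relies on `sort -V`, which is available in GNU coreutils and BusyBox but is not strictly POSIX. Development builds with suffixes such as `-dev` may not compare precisely.

```shell
#!/bin/sh
# Sketch: check that the installed Zig meets the README's stated minimum.
REQUIRED="0.13.0"

# version_ge A B: succeeds when version A is greater than or equal to B.
# The smaller of the two versions sorts first under `sort -V`.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v zig >/dev/null 2>&1; then
    found="$(zig version)"
    if version_ge "$found" "$REQUIRED"; then
        echo "zig $found: OK"
    else
        echo "zig $found is older than $REQUIRED"
    fi
else
    echo "zig not found on PATH"
fi
```

A plain string comparison would rank 0.9.0 above 0.13.0, which is why the sketch uses a version-aware sort.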

Build from source

# Clone the repository
git clone https://github.com/Cristian/picolm-on-zigsteroids.git
cd picolm-on-zigsteroids

# Build for your architecture (Optimized)
zig build -Doptimize=ReleaseFast

The binary will be available at zig-out/bin/picolm (zig-out/bin/picolm.exe on Windows).

📖 Usage

Usage: picolm <model.gguf> [options]

Options:
  -p <prompt>      Input prompt
  -n <int>         Max tokens (default: 256)
  -t <float>       Temperature (default: 0.8)
  -k <float>       Top-p (default: 0.9)
  -s <int>         Seed (default: 42)
  -j <int>         Threads (default: auto)
  -c <int>         Context length
  --q2k, --q4k, --q80, --q8k  Quantization selection
  --info           Show model info
  --memory         Show memory breakdown
  --ccx-optimize   Zen 2/3 CCX pinning (AMD Ryzen)
  --numa-interleave Dual-socket NUMA optimization
  --thermal-limit  Enable thermal monitoring
  -h, --help       Show help
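
As an illustration of how these options combine, here is a small sketch that assembles a command line using only flags from the list above. The `PICOLM` and `MODEL` variables are placeholders introduced for this example, and the specific values are arbitrary. If the binary or the model file is missing, the script prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: assemble a picolm command line from the documented options.
# PICOLM and MODEL are placeholder paths; adjust them to your setup.
PICOLM="${PICOLM:-./zig-out/bin/picolm}"
MODEL="${MODEL:-model.gguf}"

# Model path first, then options taken from the usage text.
set -- "$MODEL" -p "Explain gravity" -n 100 -t 0.8 -s 42 -j 8 --q8k

if [ -x "$PICOLM" ] && [ -f "$MODEL" ]; then
    "$PICOLM" "$@"                 # real run
else
    echo "dry run: $PICOLM $*"     # binary or model missing: only print
fi
```

On Linux, the piped form shown at the top of this README would presumably look like `echo "Explain gravity" | ./zig-out/bin/picolm model.gguf --q8k -n 100 -j 8`.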

📈 Performance Tip

For maximum performance on AMD Ryzen (Zen 2/3) processors, use the --ccx-optimize flag to pin threads to a single Core Complex (CCX), so that they share one L3 cache and avoid cross-CCX latency.
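
One way to apply this tip is to keep the thread count within a single CCX. The sketch below rests on an assumption that this README does not state: that `-j` should not exceed the number of cores in one CCX (4 on Zen 2, 8 on Zen 3). The `ccx_cores` helper and the `ARCH` variable are hypothetical, and the script only prints a suggested command.

```shell
#!/bin/sh
# Sketch: choose a thread count that fits in one CCX, then add --ccx-optimize.
# Assumption: keeping -j within one CCX is what lets pinning keep all
# threads on a shared L3 cache; the README does not spell this out.
ccx_cores() {
    case "$1" in
        zen2) echo 4 ;;   # Zen 2: 4 cores per CCX
        zen3) echo 8 ;;   # Zen 3: 8 cores per CCX
        *)    return 1 ;;
    esac
}

ARCH="${ARCH:-zen3}"      # hypothetical variable, set by hand
if THREADS="$(ccx_cores "$ARCH")"; then
    echo "picolm model.gguf --ccx-optimize -j $THREADS -p \"Explain gravity\""
else
    echo "unknown arch '$ARCH': omit --ccx-optimize or pick -j manually"
fi
```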

PicoLM-Zig Ultra — Intelligence shouldn't require a data center.
