Topo CPU AI Chat

This project is a Topo template and follows the Topo Template Format Specification.

Complete LLM chat application optimized for Arm CPU inference.

Features: SVE, NEON

Overview

This project demonstrates running large language models on the CPU using llama.cpp, built with baseline Arm optimizations and accelerated with NEON SIMD, plus SVE when the hardware supports it and it is enabled at build time.

The stack includes:

llama.cpp server with Arm NEON optimizations (SVE optional)
Quantized SmolLM2-135M-Instruct model bundled in the image
Simple web-based chat interface
No GPU required - pure CPU inference

Prerequisites

Arm Hardware: An Arm system (physical or virtual). Note that SVE support in llama.cpp requires an Armv8.2-A (or newer) CPU with the SVE extension.
Docker: For container orchestration with Topo
LLM Model: Optional. To override the bundled default, provide a supported single-file GGUF model (e.g., Llama 3.1 or Mistral).
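
Before enabling SVE, it can help to confirm that the target CPU actually reports it. The sketch below is not part of the template; it assumes a Linux target that exposes a `Features` line in `/proc/cpuinfo` (where NEON is listed as `asimd`), and `has_cpu_feature` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a CPU feature flag (e.g. sve, asimd) is
# listed in a cpuinfo-style file. The file defaults to /proc/cpuinfo.
has_cpu_feature() {
    feature="$1"
    cpuinfo="${2:-/proc/cpuinfo}"
    # -w matches the flag as a whole word, so "sve" does not match "sve2".
    grep -i '^Features' "$cpuinfo" 2>/dev/null | grep -qw -- "$feature"
}

if has_cpu_feature sve; then
    echo "SVE reported by the CPU: ENABLE_SVE can be considered"
else
    echo "SVE not reported: keep ENABLE_SVE at its default (OFF)"
fi
```

Run it on the target device rather than on the machine you deploy from, since the two may have different CPUs.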

Note: MODEL must point to a supported single-file .gguf model artifact. Use a Hugging Face repo ID to auto-select a CPU-friendly quantization (preferring Q4_K_M), a Hugging Face repo plus exact filename as <repo>:<filename>, or a direct .gguf URL. Sharded GGUFs and multimodal projector files (mmproj) are rejected with a clear error because this template only supports single-file text model GGUFs today. Not all model repos include GGUF quantizations — look for repos with -GGUF in the name. The selected model is baked into the image at /models/model.gguf.
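
To make the accepted MODEL forms concrete, here is a minimal sketch that classifies a value before you pass it to the build. It is not the template's own validation logic: `classify_model` is a hypothetical helper, and the filename patterns used to spot sharded and mmproj files are assumptions based on common GGUF naming.

```shell
#!/bin/sh
# Hypothetical helper: print which MODEL form a value looks like.
# Output is one of: repo, repo-file, url, rejected, invalid.
classify_model() {
    case "$1" in
        # Assumed naming for sharded GGUFs: name-00001-of-00003.gguf
        *-[0-9][0-9][0-9][0-9][0-9]-of-[0-9][0-9][0-9][0-9][0-9].gguf)
            echo "rejected" ;;
        # Assumed naming for multimodal projector files
        *mmproj*)
            echo "rejected" ;;
        # Direct URL to a .gguf file
        http://*.gguf|https://*.gguf)
            echo "url" ;;
        # Any other URL does not point at a .gguf artifact
        *://*)
            echo "invalid" ;;
        # <repo>:<filename> with an exact GGUF filename
        */*:*.gguf)
            echo "repo-file" ;;
        # A colon without a .gguf filename after it
        *:*)
            echo "invalid" ;;
        # Plain Hugging Face repo ID such as owner/name
        */*)
            echo "repo" ;;
        *)
            echo "invalid" ;;
    esac
}

classify_model "unsloth/SmolLM2-135M-Instruct-GGUF"
classify_model "unsloth/SmolLM2-135M-Instruct-GGUF:SmolLM2-135M-Instruct-Q4_K_M.gguf"
```

The check is purely syntactic. It does not contact Hugging Face, so a value classified as `repo` can still fail at build time if the repository has no GGUF files.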

Build-Time Parameters

| Parameter | Description | Default |
|---|---|---|
| `MODEL` | Hugging Face GGUF repo, `<repo>:<filename>`, or direct `.gguf` URL | `unsloth/SmolLM2-135M-Instruct-GGUF` |
| `ENABLE_SVE` | Enable SVE optimizations | `OFF` |
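
As an illustration of how these parameters might be combined, the sketch below prints a deploy command that sets both. It makes two assumptions not confirmed by this README: that `topo deploy` accepts `--arg` more than once, and that `ON` is the value that enables `ENABLE_SVE` (inferred from the `OFF` default). `deploy_cmd` is a hypothetical helper, and `192.0.2.10` is a documentation placeholder address.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a topo deploy command for a target.
# Assumptions: --arg may be repeated, and ON is the opposite of the OFF default.
deploy_cmd() {
    target="$1"
    model="${2:-unsloth/SmolLM2-135M-Instruct-GGUF}"
    enable_sve="${3:-ON}"
    echo "topo deploy --target $target --arg MODEL=$model --arg ENABLE_SVE=$enable_sve"
}

# Review the command before running it yourself.
deploy_cmd 192.0.2.10
```

Printing the command rather than executing it keeps the sketch safe to run anywhere and lets you verify the arguments first.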

Usage

The easiest way to deploy is with topo. Install topo before continuing.

Clone the project:

topo clone git@github.com:Arm-Examples/topo-v9-cpu-chat.git

Build and Deploy the project:

cd topo-v9-cpu-chat
topo deploy --target <ip-address-of-target>

Common Model Selection Examples

Use a different model:

topo deploy --target <ip-address-of-target> \
  --arg MODEL=bartowski/Qwen_Qwen3.5-0.8B-GGUF

Force an exact GGUF file:

topo deploy --target <ip-address-of-target> \
  --arg MODEL=unsloth/SmolLM2-135M-Instruct-GGUF:SmolLM2-135M-Instruct-Q4_K_M.gguf

Access the Chat Interface

Open your browser to http://<ip-address-of-target>:3000 to start chatting!
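
If you deploy from a script, you may want to confirm that the interface responds before opening a browser. The following is a minimal sketch using `curl`; `chat_url` and `wait_for_chat` are hypothetical helper names, and the retry count and delay are arbitrary choices rather than values taken from this project.

```shell
#!/bin/sh
# Hypothetical helper: build the chat interface URL for a target address.
chat_url() {
    printf 'http://%s:3000\n' "$1"
}

# Hypothetical helper: poll the URL until it answers with an HTTP success
# status, or give up after a number of tries (default 30, 2 seconds apart).
wait_for_chat() {
    url="$(chat_url "$1")"
    tries="${2:-30}"
    while [ "$tries" -gt 0 ]; do
        # -f fails on HTTP errors, -sS hides progress but keeps error messages.
        if curl -fsS -o /dev/null --max-time 5 "$url"; then
            echo "Chat interface is up at $url"
            return 0
        fi
        tries=$((tries - 1))
        sleep 2
    done
    echo "Chat interface did not respond at $url" >&2
    return 1
}
```

Call it as `wait_for_chat <ip-address-of-target>` once `topo deploy` has finished.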
