llama.app : website + unified `llama` binary #23875

ggerganov · 2026-05-29T16:20:21Z

ggerganov
May 29, 2026
Maintainer

Overview

We are launching an official website for llama.cpp: https://llama.app/

The main goal of the website is to provide a simple way for new users to install and run llama.cpp on their machines. The page has an installation command (one-liner) and links/instructions to popular GGUF models on the hub. The current version is a first iteration of many.

During installation, the cross-platform installer ships a single binary called llama. The binary packs all the user-facing tooling of llama.cpp (i.e. llama-server, llama-cli, etc.) behind a single CLI entry point. This mostly follows the example of git. Currently we ship binaries for the major operating systems, and we plan to iterate and improve the packaging pipeline.

The webpage will also provide helpful instructions for running llama in common use cases: chat, agentic coding, etc. These will be combined with current FOTM models from various quantization providers (e.g. Unsloth, Bartowski, etc.). There will be guidelines for integration with 3rd-party agents, creating or finding the best configuration for your device and tips for utilizing advanced llama.cpp features. We will be iterating on this and would love to hear any feedback from the community on how to improve.

The contents of the webpage are hosted at https://github.com/ggml-org/llama.pages
The unified llama app is here: https://github.com/ggml-org/llama.cpp/tree/master/app
The binaries are built here: https://github.com/ggml-org/llama-install.sh
The binaries are hosted here: https://huggingface.co/buckets/ggml-org/install.sh

HF Team: @ggml-org/hf

donohara · 2026-05-29T17:12:13Z

donohara
May 29, 2026

This is terrific! Thanks to everyone who contributes to this great tool!

0 replies

MerijnHendriks · 2026-05-29T18:00:47Z

MerijnHendriks
May 29, 2026

Congrats on the achievement and the launch of the website!

If llama-server can continue to exist as a single binary and function independently of the llama binary, then I have no complaints.

1 reply

ggerganov May 29, 2026
Maintainer Author

Yes, llama-server remains as is. Simply put, llama serve and llama-server are the same thing. Also llama cli and llama-cli, etc. Same idea as git commit and git-commit.
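To illustrate the git-style equivalence described above, here is a minimal sketch of how a unified entry point could map a subcommand to the standalone tool name. It is not the actual implementation in the llama.cpp `app` directory: the `SUBCOMMANDS` table and the `resolve` function are hypothetical names, and only the two pairs mentioned in this reply are included.

```python
# Hypothetical lookup table. Only these two pairs are named in the thread;
# the real binary may expose more subcommands and dispatch differently.
SUBCOMMANDS = {
    "serve": "llama-server",
    "cli": "llama-cli",
}


def resolve(argv):
    """Translate ['llama', '<sub>', args...] into ['llama-<tool>', args...]."""
    if len(argv) < 2:
        raise SystemExit("usage: llama <subcommand> [args...]")
    sub, rest = argv[1], argv[2:]
    if sub not in SUBCOMMANDS:
        raise SystemExit(f"llama: unknown subcommand '{sub}'")
    # Arguments after the subcommand are passed through unchanged.
    return [SUBCOMMANDS[sub], *rest]


if __name__ == "__main__":
    print(resolve(["llama", "serve", "--port", "8080"]))
```

Under this sketch, `llama serve --port 8080` and `llama-server --port 8080` resolve to the same argument list, which is the property the reply describes.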

Jipok · 2026-05-29T18:23:12Z

Jipok
May 29, 2026

When choosing a model, the size is missing. In my opinion, it would be better to either remove the selector and leave the links to HF, or add a size indication.

1 reply

cphlipot May 29, 2026

Aside from the size, I think we could also try to select more sensible defaults. Right now it seems we just select whichever quant is first in the list, which results in either the largest or the smallest quant depending on the provider. Neither is likely to be what most users want.
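The two suggestions in this thread (show the size, pick a sensible default) could be sketched roughly as follows. This is only an illustration, not how the website works: the preference order in `PREFERRED_QUANTS` is an assumption, and `pick_default` and `size_label` are hypothetical helper names.

```python
# Assumed preference order, for illustration only: mid-size quants first.
PREFERRED_QUANTS = ["Q4_K_M", "Q4_K_S", "Q5_K_M", "Q4_0"]


def pick_default(quants):
    """Return the first preferred quant that is available.

    Falls back to the first listed quant (the behavior described above)
    when none of the preferred ones exist, and to None for an empty list.
    """
    for name in PREFERRED_QUANTS:
        if name in quants:
            return name
    return quants[0] if quants else None


def size_label(name, size_bytes):
    """Format a selector entry with its file size in decimal gigabytes."""
    return f"{name} ({size_bytes / 1e9:.1f} GB)"
```

With this approach a provider that lists its largest quant first would still default to a mid-size one when it is available, and each selector entry would carry its download size.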

cphlipot · 2026-05-29T19:18:55Z

cphlipot
May 29, 2026

This looks like a great step forward in terms of usability.

I was wondering about the CLI: are there any plans to adjust the arguments for better UX, or is the aim simply to expose the existing tools as subcommands with all existing arguments left exactly as-is?

0 replies

aviallon · 2026-05-30T13:17:36Z

aviallon
May 30, 2026

Missed opportunity to register llama.cpp

0 replies

Kangaroux · 2026-05-30T16:52:18Z

Kangaroux
May 30, 2026

Site looks good, two critiques:

Add a brief callout explaining what llama.cpp is in the hero section. If I stumble on the site, the first thing I see is a "C++" logo.
Nit: would be nice if someone with image editing tools could clean up the llama image to fix the perspective on the computer case https://llama.app/local-ai.png

1 reply

allozaur May 30, 2026
Collaborator

hey, thank you for the feedback :) improvements are coming

h8f1z · 2026-05-30T20:14:23Z

h8f1z
May 30, 2026

Congrats 👏 🎉

0 replies

luck-tar-gz · 2026-05-31T12:19:41Z

luck-tar-gz
May 31, 2026

Congrats! I saw a little inconsistency though:

While Qwen's, Gemma's and Step's model tags show "XB MoE · YB active", GPT-OSS's model tag does not show that, making it sound more like the Qwen3.6-27B dense model. Converting it to show that it is MoE would be more accurate.

Great work as always!
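For reference, the tag convention described above ("XB MoE · YB active" for MoE models, a plain parameter count for dense ones) can be expressed as a small formatter. This is a hedged sketch: `model_tag` is a hypothetical name, the parameter counts are supplied by the caller, and the site's real rendering code may differ.

```python
def model_tag(total_b, active_b=None):
    """Build a size tag from parameter counts given in billions.

    A dense model has no separate active count, so only the total is shown.
    A MoE model shows both the total and the active parameter count.
    """
    if active_b is None:
        return f"{total_b:g}B"
    return f"{total_b:g}B MoE · {active_b:g}B active"
```

Passing an active count for every MoE entry is what keeps a MoE model from reading like a dense one in the listing.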

2 replies

julien-c Jun 1, 2026

do you want to open a quick PR on https://github.com/ggml-org/llama.pages maybe?

luck-tar-gz Jun 1, 2026

Yeah, should be there now.
