mozilla-ai
diff --git a/.gitignore b/.gitignore
Lines changed: 2 additions & 0 deletions
diff --git a/Makefile b/Makefile
Lines changed: 4 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/creating_llamafiles.md b/docs/creating_llamafiles.md
Lines changed: 62 additions & 68 deletions
diff --git a/docs/index.md b/docs/index.md
Lines changed: 5 additions & 3 deletions
diff --git a/docs/quickstart.md b/docs/quickstart.md
Lines changed: 6 additions & 4 deletions
@@ -10,6 +10,8 @@
 /trace.json
 
 /*.log
+/*.bin
+/*.mp3
 
 .claude
 CLAUDE.md
@@ -15,7 +15,9 @@ include build/rules.mk
 
 include third_party/BUILD.mk
 include llama.cpp/BUILD.mk
+include whisper.cpp/BUILD.mk
 include llamafile/BUILD.mk
+include whisperfile/BUILD.mk
 include tests/BUILD.mk
 endif
 
@@ -24,6 +26,8 @@ endif
 .PHONY: o/$(MODE)/
 o/$(MODE)/:	o/$(MODE)/llamafile	\
 		o/$(MODE)/llama.cpp \
+		o/$(MODE)/whisper.cpp \
+		o/$(MODE)/whisperfile \
 		o/$(MODE)/third_party/zipalign
 
 .PHONY: install
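
Note on the `o/$(MODE)/` targets in this hunk: the `o//` paths that appear in the documentation changes of this PR are what this pattern produces when `MODE` is empty. The sketch below only illustrates that string expansion. It assumes that an empty `MODE` is the default build, and `opt` is a hypothetical non-default mode name used for contrast. `target_for_mode` is an illustrative helper, not part of the build system.

```bash
#!/usr/bin/env bash
set -eu

# Illustrative helper (not part of the repository): mirrors the
# o/$(MODE)/... pattern used by the Makefile targets above.
target_for_mode() {
  printf 'o/%s/third_party/zipalign\n' "$1"
}

# Assumption: MODE is empty for a default build.
default_target="$(target_for_mode "")"
# "opt" is a hypothetical mode name, used only for contrast.
opt_target="$(target_for_mode "opt")"

echo "MODE=''    -> $default_target"
echo "MODE='opt' -> $opt_target"
```

On disk, `o//third_party/zipalign` and `o/third_party/zipalign` resolve to the same file, but make compares target names as text, so the double slash matters when invoking `make o//third_party/zipalign`.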
 
@@ -4,7 +4,6 @@
 Mozilla.ai recently adopted the llamafile project, and we're planning an approach for codebase modernization. Please share what you find most valuable about llamafile and what would make it more useful for your work.
 [Read more via the blog](https://blog.mozilla.ai/llamafile-returns/) and add your voice to the discussion [here](https://github.com/mozilla-ai/llamafile/discussions/809).
 
-
 [![ci status](https://github.com/Mozilla-Ocho/llamafile/actions/workflows/ci.yml/badge.svg)](https://github.com/Mozilla-Ocho/llamafile/actions/workflows/ci.yml)<br/>
 [![](https://dcbadge.vercel.app/api/server/YuMNeuKStr)](https://discord.gg/YuMNeuKStr)<br/><br/>
 
@@ -52,6 +51,7 @@ Check the full documentation in the [docs/](docs/) folder or online at [mozilla-
 - [Technical details](https://mozilla-ai.github.io/llamafile/technical_details/)
 - [Security](https://mozilla-ai.github.io/llamafile/security/)
 - [Troubleshooting](https://mozilla-ai.github.io/llamafile/troubleshooting/)
+- [Whisperfile](https://mozilla-ai.github.io/llamafile/whisperfile/) — speech-to-text
 
 
 ## Licensing
 
@@ -1,57 +1,56 @@
-llamafile uses [zipalign](https://github.com/jart/zipalign) to bundle its main
-executable together with model weights and a set of default arguments.
+# Creating a llamafile
+
+A llamafile bundles the llamafile executable, model weights, and a set of
+default arguments into a single self-contained file using the
+[APE](https://justine.lol/ape.html) (Actually Portable Executable) format,
+which supports ZIP as a container for extra data. If you have already
+downloaded a llamafile, you can inspect its contents with
+`unzip -vl <filename.llamafile>` (or on Windows, rename it to `.zip` and
+open it in your ZIP GUI).
+
+## Prerequisites
 
-We are including zipalign as a git submodule and building it together with
-llamafile, so if you managed to successfully compile llamafile you also have
-the `zipalign` executable in the `o/third_party/zipalign` folder. If you want
-to build zipalign alone, just run
+llamafile uses [zipalign](https://github.com/jart/zipalign) to bundle files
+into the executable. It is included as a git submodule and built alongside
+llamafile, so if you have already compiled llamafile you have the `zipalign`
+executable in the `o//third_party/zipalign` folder. To build it on its own:
 
 ```sh
 make o//third_party/zipalign
 ```
 
-> **NOTE:**
-The zipalign tool we are referring to here is not the
-[Android](https://developer.android.com/tools/zipalign) one! Please refer
-to the GitHub repo above for an in-depth description and up-to-date code.
+> [!NOTE]
+> The zipalign tool referenced here is **not** the
+> [Android zipalign](https://developer.android.com/tools/zipalign). See the
+> GitHub repo above for an in-depth description and up-to-date code.
 
-# Creating a llamafile
+## What you need
+
+- **The llamafile executable** — download a prebuilt binary from the
+  [releases page](https://github.com/mozilla-ai/llamafile/releases), or build
+  from source following
+  [these instructions](https://mozilla-ai.github.io/llamafile/source_installation/).
+
+- **Model weights in GGUF format** — download from Hugging Face
+  ([search here](https://huggingface.co/models?library=gguf)), or use weights
+  already on disk from
+  [another application](https://mozilla-ai.github.io/llamafile/quickstart/#running-llamafile-with-models-downloaded-by-third-party-applications).
+
+- **A `.args` file** — specifies default arguments (at minimum, the model
+  path so it loads automatically).
+
+## Examples
+
+### TUI, text-only
 
-All files using the `.llamafile` extension follow the APE
-([Actually Portable Executable](https://justine.lol/ape.html)) format,
-which supports ZIP as a container format for extra data files. In the case
-of llamafiles, this is used to package the main executable (the program
-actually serving the models) together with model weights and a set
-of arguments that are passed by default to the executable when it is run.
-If you have already downloaded a llamafile, you can run
-`unzip -vl <filename.llamafile>` to see its contents (or, if you are
-running Windows, you can change the file extension to `.zip` and open
-it in your default ZIP GUI).
-
-If you want to create a llamafile from scratch, the things you need are:
-
-- the llamafile executable, which you can either download as a binary
-([here](https://huggingface.co/mozilla-ai/llamafile_0.10.0_alpha) is a
-the repository holding the most recent version, 0.10.0 alpha) or build
-from source following
-[these instructions](https://mozilla-ai.github.io/llamafile/source_installation/);
-
-- model weights in GGUF format, which you can download from huggingface
-(you can start your search [here](https://huggingface.co/models?library=gguf)),
-or you can find on your disk if you have already downloaded models using
-[another application](https://mozilla-ai.github.io/llamafile/quickstart/#running-llamafile-with-models-downloaded-by-third-party-applications);
-
-- a `.args` file containing some default arguments (typically at least the model name so it is automatically loaded).
-
-## TUI, text-only
 Let's see how this works in practice with a simple, text-only language
 model, e.g. Qwen3-0.6B:
 
-- [search](https://huggingface.co/models?library=gguf&sort=trending&search=qwen3-0.6b) for the model weights in GGUF format
+- [Search](https://huggingface.co/models?library=gguf&sort=trending&search=qwen3-0.6b) for the model weights in GGUF format
 (for the sake of this example we'll download [these](https://huggingface.co/Qwen/Qwen3-0.6B-GGUF) with Q8 quantization)
-- create a file named `.args` with the following content:
+- Create a file named `.args` with the following content:
 
-```
+```text
 -m
 /zip/Qwen3-0.6B-Q8_0.gguf
 -fa
@@ -75,23 +74,20 @@ on
 ...
 ```
 
-> NOTE: there is one argument per line.
-Most of the arguments are
-optional, except the model name (in this case we are replicating the
-parameters suggested [here](https://huggingface.co/Qwen/Qwen3-0.6B-GGUF)).
-The `/zip/` path is always necessary when one refers to a file packaged
-within the llamafile.
-The `...` argument optionally specifies where any additional CLI arguments
-passed by the user are to be inserted.
+> [!NOTE]
+> There is one argument per line. Most arguments are optional — the model
+> name is the only required one (the above replicates the parameters suggested
+> [here](https://huggingface.co/Qwen/Qwen3-0.6B-GGUF)). The `/zip/` path
+> prefix is required whenever referencing a file packaged inside the llamafile.
+> The `...` token is replaced with any additional CLI arguments the user passes
+> at runtime.
 
-- copy the llamafile executable to the current directory and run zipalign
-to add weights and args. Assuming both llamafile and zipalign have just
-been built:
+- Copy the llamafile executable and run zipalign to embed the weights and args:
 
-```
-cp ./o/llamafile/llamafile Qwen3-0.6B-Q8.llamafile
+```bash
+cp o//llamafile/llamafile Qwen3-0.6B-Q8.llamafile
 
-./o/third_party/zipalign/zipalign -j0 \
+o//third_party/zipalign/zipalign -j0 \
   Qwen3-0.6B-Q8.llamafile \
   Qwen3-0.6B-Q8_0.gguf \
   .args
@@ -102,26 +98,27 @@ cp ./o/llamafile/llamafile Qwen3-0.6B-Q8.llamafile
 Congratulations, you've just made your own LLM executable that's easy to
 share with your friends!
 
-Your new llamafile will start loading the Qwen model in the TUI. Note that
-you can still run it as a web server if you want, with:
+Your new llamafile will start loading the Qwen model in the TUI. You can also
+run it as a web server with:
 
-```
+```bash
 ./Qwen3-0.6B-Q8.llamafile --server
 ```
 
-## Server, multimodal
+### Server, multimodal
+
 Now, let us build another llamafile running a multimodal model served
 via HTTP. If you want to be able to just say:
 
-```sh
+```bash
 ./llava.llamafile
 ```
 
 ...and have it run the web server without having to specify arguments,
-then you can embed both the weights and the following `.args` file
+embed both the weights and the following `.args` file
 (weights used in this example are downloaded from [here](https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf)):
 
-```sh
+```text
 -m
 /zip/llava-v1.6-mistral-7b.Q8_0.gguf
 --mmproj
@@ -135,13 +132,12 @@ then you can embed both the weights and the following `.args` file
 ...
 ```
 
-
 Next, add both the weights and the argument file to the executable:
 
-```sh
-cp ./o/llamafile/llamafile llava.llamafile
+```bash
+cp o//llamafile/llamafile llava.llamafile
 
-./o/third_party/zipalign/zipalign -j0 \
+o//third_party/zipalign/zipalign -j0 \
   llava.llamafile \
   llava-v1.6-mistral-7b.Q8_0.gguf \
   mmproj-model-f16.gguf \
@@ -150,8 +146,6 @@ cp ./o/llamafile/llamafile llava.llamafile
 ./llava.llamafile
 ```
 
-
-
 ## Distribution
 
 One good way to share a llamafile with your friends is by posting it on
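
Note on the `.args` rules documented in this file: the following is a minimal shell sketch, assuming a scratch directory and only the three entries the note treats as essential (the `-m` flag, a `/zip/`-prefixed model path, and the `...` placeholder). The optional tuning flags from the example are left out on purpose, and `workdir` and `args_file` are hypothetical names.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical scratch directory, so an existing .args file is not overwritten.
workdir="$(mktemp -d)"
args_file="$workdir/.args"

# One argument per line. The model path carries the /zip/ prefix because the
# weights are read from inside the llamafile's ZIP container.
printf '%s\n' \
  '-m' \
  '/zip/Qwen3-0.6B-Q8_0.gguf' \
  '...' > "$args_file"

# Basic sanity checks before handing the file to zipalign.
line_count="$(wc -l < "$args_file" | tr -d ' ')"
model_path="$(sed -n '2p' "$args_file")"
case "$model_path" in
  /zip/*) zip_prefix_ok=yes ;;
  *)      zip_prefix_ok=no ;;
esac

echo "lines=$line_count model=$model_path zip_prefix_ok=$zip_prefix_ok"
```

Checking the file this way before embedding it can catch a missing `/zip/` prefix early, since a repackaging round trip with multi-gigabyte weights is slow.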
 
@@ -5,7 +5,7 @@ Mozilla.ai recently adopted the llamafile project, and we're planning an approac
 [Read more via the blog](https://blog.mozilla.ai/llamafile-returns/) and add your voice to the discussion [here](https://github.com/mozilla-ai/llamafile/discussions/809).
 
 
-[![ci status](https://github.com/Mozilla-Ocho/llamafile/actions/workflows/ci.yml/badge.svg)](https://github.com/Mozilla-Ocho/llamafile/actions/workflows/ci.yml)<br/>
+[![ci status](https://github.com/mozilla-ai/llamafile/actions/workflows/ci.yml/badge.svg)](https://github.com/mozilla-ai/llamafile/actions/workflows/ci.yml)<br/>
 [![](https://dcbadge.vercel.app/api/server/YuMNeuKStr)](https://discord.gg/YuMNeuKStr)<br/><br/>
 
 <img src="images/llamafile-640x640.png" width="320" height="320"
@@ -18,7 +18,9 @@ accessible to both developers and end users. We're doing that by
 combining [llama.cpp](https://github.com/ggerganov/llama.cpp) with [Cosmopolitan Libc](https://github.com/jart/cosmopolitan) into one
 framework that collapses all the complexity of LLMs down to
 a single-file executable (called a "llamafile") that runs
-locally on most computers, with no installation.<br/><br/>
+locally on most computers, with no installation.
+
+llamafile also includes **[whisperfile](whisperfile/index.md)**, a single-file speech-to-text tool built on [whisper.cpp](https://github.com/ggerganov/whisper.cpp) and the same Cosmopolitan packaging. It supports transcription and translation of audio files across all the same platforms, with no installation required.<br/><br/>
 
 <a href="https://builders.mozilla.org/"><img src="images/mozilla-logo-bw-rgb.png" width="150"></a><br/>
 llamafile is a <a href="https://builders.mozilla.org/">Mozilla Builders</a> project.<br/><br/>
@@ -76,4 +78,4 @@ should that be desired.
 The llamafile logo on this page was generated with the assistance of DALL·E 3.
 
 
-[![Star History Chart](https://api.star-history.com/svg?repos=Mozilla-Ocho/llamafile&type=Date)](https://star-history.com/#Mozilla-Ocho/llamafile&Date)
+[![Star History Chart](https://api.star-history.com/svg?repos=mozilla-ai/llamafile&type=Date)](https://star-history.com/#mozilla-ai/llamafile&Date)
@@ -1,3 +1,5 @@
+# Getting Started with llamafile
+
 The easiest way to try it for yourself is to download our example
 llamafile for the [LLaVA](https://llava-vl.github.io/) model (license: [LLaMA 2](https://ai.meta.com/resources/models-and-libraries/llama-downloads/),
 [OpenAI](https://openai.com/policies/terms-of-use)). LLaVA is a new LLM that can do more
@@ -33,7 +35,7 @@ chmod +x llava-v1.5-7b-q4.llamafile
 
 **Having trouble? See the [Troubleshooting](troubleshooting.md) page.**
 
-### JSON API Quickstart
+## JSON API Quickstart
 
 When llamafile is started, in addition to hosting a web
 UI chat server at <http://127.0.0.1:8080/>, an [OpenAI
@@ -150,7 +152,7 @@ OpenAI API compatible endpoints, including embeddings. It's designed to
 be more reliable. It's better able to recycle context windows across
 multiple slots. To try it, run:
 
-```
+```bash
 llamafile --server --v2 --help
 llamafile --server --v2
 ```
@@ -188,7 +190,7 @@ This section answers the question *"I already have a model downloaded locally by
 
  So if you have downloaded e.g. the `llama-2-7b.Q2_K.gguf` file for `TheBloke/Llama-2-7B-GGUF`, you can run llamafile as follows:
 
-```
+```bash
 cd ~/.cache/lm-studio/models/TheBloke/Llama-2-7B-GGUF
 llamafile -m llama-2-7b.Q2_K.gguf
 ```
@@ -201,7 +203,7 @@ The manifest maps each file related to the model (e.g. GGUF weights, license, pr
 
 Each sha256 digest is also used as a filename in the `~/.ollama/models/blobs` directory (if you look into that directory you'll see *only* those sha256-* filenames). This means you can directly run llamafile by passing the sha256 digest as the model filename. So if e.g. the `llama3:latest` GGUF file digest is `sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29`, you can run llamafile as follows:
 
-```
+```bash
 cd ~/.ollama/models/blobs
 llamafile -m sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
 ```
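
Note on the Ollama example in the last hunk: a small sketch of the same lookup without changing directory, assuming the default blob location quoted above and reusing the digest from the example. It only builds the path and reports whether the blob exists; `blob_dir`, `blob_path`, and `status` are hypothetical names.

```bash
#!/usr/bin/env bash
set -eu

# Digest copied from the example above; replace it with the one listed in
# your own manifest.
digest="sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29"

# Assumption: Ollama stores blobs in its default location.
blob_dir=~/.ollama/models/blobs
blob_path="$blob_dir/$digest"

if [ -f "$blob_path" ]; then
  status="found"
  echo "Run: llamafile -m $blob_path"
else
  status="missing"
  echo "No blob at $blob_path; check the digest against the manifest." >&2
fi
```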