
Commit 7d85352

aittalam and angpt authored
Add whisper (#880)
* Updated whisper.cpp submodule from v1.6.2-168 (6739eb83) to v1.8.3 (2eeeba56).
* Updated patches scripts + removed old patches
* Added whisperfile + extra tools (mic2raw, mic2txt, stream, whisper-server)
* Added slurp
* Updated docs and man pages

---------

Co-authored-by: angpt <anushrigupta@gmail.com>
1 parent 808ed0a commit 7d85352

64 files changed

Lines changed: 1876 additions & 96224 deletions

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -10,6 +10,8 @@
 /trace.json
 
 /*.log
+/*.bin
+/*.mp3
 
 .claude
 CLAUDE.md

Makefile

Lines changed: 4 additions & 0 deletions
@@ -15,7 +15,9 @@ include build/rules.mk
 
 include third_party/BUILD.mk
 include llama.cpp/BUILD.mk
+include whisper.cpp/BUILD.mk
 include llamafile/BUILD.mk
+include whisperfile/BUILD.mk
 include tests/BUILD.mk
 endif
 
@@ -24,6 +26,8 @@ endif
 .PHONY: o/$(MODE)/
 o/$(MODE)/: o/$(MODE)/llamafile \
 o/$(MODE)/llama.cpp \
+o/$(MODE)/whisper.cpp \
+o/$(MODE)/whisperfile \
 o/$(MODE)/third_party/zipalign
 
 .PHONY: install

README.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,6 @@
 Mozilla.ai recently adopted the llamafile project, and we're planning an approach for codebase modernization. Please share what you find most valuable about llamafile and what would make it more useful for your work.
 [Read more via the blog](https://blog.mozilla.ai/llamafile-returns/) and add your voice to the discussion [here](https://github.com/mozilla-ai/llamafile/discussions/809).
 
-
 [![ci status](https://github.com/Mozilla-Ocho/llamafile/actions/workflows/ci.yml/badge.svg)](https://github.com/Mozilla-Ocho/llamafile/actions/workflows/ci.yml)<br/>
 [![](https://dcbadge.vercel.app/api/server/YuMNeuKStr)](https://discord.gg/YuMNeuKStr)<br/><br/>
 
@@ -52,6 +51,7 @@ Check the full documentation in the [docs/](docs/) folder or online at [mozilla-
 - [Technical details](https://mozilla-ai.github.io/llamafile/technical_details/)
 - [Security](https://mozilla-ai.github.io/llamafile/security/)
 - [Troubleshooting](https://mozilla-ai.github.io/llamafile/troubleshooting/)
+- [Whisperfile](https://mozilla-ai.github.io/llamafile/whisperfile/) — speech-to-text
 
 
 ## Licensing

docs/creating_llamafiles.md

Lines changed: 62 additions & 68 deletions
@@ -1,57 +1,56 @@
-llamafile uses [zipalign](https://github.com/jart/zipalign) to bundle its main
-executable together with model weights and a set of default arguments.
+# Creating a llamafile
+
+A llamafile bundles the llamafile executable, model weights, and a set of
+default arguments into a single self-contained file using the
+[APE](https://justine.lol/ape.html) (Actually Portable Executable) format,
+which supports ZIP as a container for extra data. If you have already
+downloaded a llamafile, you can inspect its contents with
+`unzip -vl <filename.llamafile>` (or on Windows, rename it to `.zip` and
+open it in your ZIP GUI).
+
+## Prerequisites
 
-We are including zipalign as a git submodule and building it together with
-llamafile, so if you managed to successfully compile llamafile you also have
-the `zipalign` executable in the `o/third_party/zipalign` folder. If you want
-to build zipalign alone, just run
+llamafile uses [zipalign](https://github.com/jart/zipalign) to bundle files
+into the executable. It is included as a git submodule and built alongside
+llamafile, so if you have already compiled llamafile you have the `zipalign`
+executable in the `o//third_party/zipalign` folder. To build it on its own:
 
 ```sh
 make o//third_party/zipalign
 ```
 
-> **NOTE:**
-The zipalign tool we are referring to here is not the
-[Android](https://developer.android.com/tools/zipalign) one! Please refer
-to the GitHub repo above for an in-depth description and up-to-date code.
+> [!NOTE]
+> The zipalign tool referenced here is **not** the
+> [Android zipalign](https://developer.android.com/tools/zipalign). See the
+> GitHub repo above for an in-depth description and up-to-date code.
 
-# Creating a llamafile
+## What you need
+
+- **The llamafile executable** — download a prebuilt binary from the
+[releases page](https://github.com/mozilla-ai/llamafile/releases), or build
+from source following
+[these instructions](https://mozilla-ai.github.io/llamafile/source_installation/).
+
+- **Model weights in GGUF format** — download from Hugging Face
+([search here](https://huggingface.co/models?library=gguf)), or use weights
+already on disk from
+[another application](https://mozilla-ai.github.io/llamafile/quickstart/#running-llamafile-with-models-downloaded-by-third-party-applications).
+
+- **A `.args` file** — specifies default arguments (at minimum, the model
+path so it loads automatically).
+
+## Examples
+
+### TUI, text-only
 
-All files using the `.llamafile` extension follow the APE
-([Actually Portable Executable](https://justine.lol/ape.html)) format,
-which supports ZIP as a container format for extra data files. In the case
-of llamafiles, this is used to package the main executable (the program
-actually serving the models) together with model weights and a set
-of arguments that are passed by default to the executable when it is run.
-If you have already downloaded a llamafile, you can run
-`unzip -vl <filename.llamafile>` to see its contents (or, if you are
-running Windows, you can change the file extension to `.zip` and open
-it in your default ZIP GUI).
-
-If you want to create a llamafile from scratch, the things you need are:
-
-- the llamafile executable, which you can either download as a binary
-([here](https://huggingface.co/mozilla-ai/llamafile_0.10.0_alpha) is a
-the repository holding the most recent version, 0.10.0 alpha) or build
-from source following
-[these instructions](https://mozilla-ai.github.io/llamafile/source_installation/);
-
-- model weights in GGUF format, which you can download from huggingface
-(you can start your search [here](https://huggingface.co/models?library=gguf)),
-or you can find on your disk if you have already downloaded models using
-[another application](https://mozilla-ai.github.io/llamafile/quickstart/#running-llamafile-with-models-downloaded-by-third-party-applications);
-
-- a `.args` file containing some default arguments (typically at least the model name so it is automatically loaded).
-
-## TUI, text-only
 Let's see how this works in practice with a simple, text-only language
 model, e.g. Qwen3-0.6B:
 
-- [search](https://huggingface.co/models?library=gguf&sort=trending&search=qwen3-0.6b) for the model weights in GGUF format
+- [Search](https://huggingface.co/models?library=gguf&sort=trending&search=qwen3-0.6b) for the model weights in GGUF format
 (for the sake of this example we'll download [these](https://huggingface.co/Qwen/Qwen3-0.6B-GGUF) with Q8 quantization)
-- create a file named `.args` with the following content:
+- Create a file named `.args` with the following content:
 
-```
+```text
 -m
 /zip/Qwen3-0.6B-Q8_0.gguf
 -fa
@@ -75,23 +74,20 @@ on
 ...
 ```
 
-> NOTE: there is one argument per line.
-Most of the arguments are
-optional, except the model name (in this case we are replicating the
-parameters suggested [here](https://huggingface.co/Qwen/Qwen3-0.6B-GGUF)).
-The `/zip/` path is always necessary when one refers to a file packaged
-within the llamafile.
-The `...` argument optionally specifies where any additional CLI arguments
-passed by the user are to be inserted.
+> [!NOTE]
+> There is one argument per line. Most arguments are optional — the model
+> name is the only required one (the above replicates the parameters suggested
+> [here](https://huggingface.co/Qwen/Qwen3-0.6B-GGUF)). The `/zip/` path
+> prefix is required whenever referencing a file packaged inside the llamafile.
+> The `...` token is replaced with any additional CLI arguments the user passes
+> at runtime.
 
-- copy the llamafile executable to the current directory and run zipalign
-to add weights and args. Assuming both llamafile and zipalign have just
-been built:
+- Copy the llamafile executable and run zipalign to embed the weights and args:
 
-```
-cp ./o/llamafile/llamafile Qwen3-0.6B-Q8.llamafile
+```bash
+cp o//llamafile/llamafile Qwen3-0.6B-Q8.llamafile
 
-./o/third_party/zipalign/zipalign -j0 \
+o//third_party/zipalign/zipalign -j0 \
 Qwen3-0.6B-Q8.llamafile \
 Qwen3-0.6B-Q8_0.gguf \
 .args
@@ -102,26 +98,27 @@ cp ./o/llamafile/llamafile Qwen3-0.6B-Q8.llamafile
 Congratulations, you've just made your own LLM executable that's easy to
 share with your friends!
 
-Your new llamafile will start loading the Qwen model in the TUI. Note that
-you can still run it as a web server if you want, with:
+Your new llamafile will start loading the Qwen model in the TUI. You can also
+run it as a web server with:
 
-```
+```bash
 ./Qwen3-0.6B-Q8.llamafile --server
 ```
 
-## Server, multimodal
+### Server, multimodal
+
 Now, let us build another llamafile running a multimodal model served
 via HTTP. If you want to be able to just say:
 
-```sh
+```bash
 ./llava.llamafile
 ```
 
 ...and have it run the web server without having to specify arguments,
-then you can embed both the weights and the following `.args` file
+embed both the weights and the following `.args` file
 (weights used in this example are downloaded from [here](https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf)):
 
-```sh
+```text
 -m
 /zip/llava-v1.6-mistral-7b.Q8_0.gguf
 --mmproj
@@ -135,13 +132,12 @@ then you can embed both the weights and the following `.args` file
 ...
 ```
 
-
 Next, add both the weights and the argument file to the executable:
 
-```sh
-cp ./o/llamafile/llamafile llava.llamafile
+```bash
+cp o//llamafile/llamafile llava.llamafile
 
-./o/third_party/zipalign/zipalign -j0 \
+o//third_party/zipalign/zipalign -j0 \
 llava.llamafile \
 llava-v1.6-mistral-7b.Q8_0.gguf \
 mmproj-model-f16.gguf \
@@ -150,8 +146,6 @@ cp ./o/llamafile/llamafile llava.llamafile
 ./llava.llamafile
 ```
 
-
-
 ## Distribution
 
 One good way to share a llamafile with your friends is by posting it on
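
A note on the `.args` rules stated in the diff above: the file holds one argument per line, and the `...` token marks where user-supplied CLI arguments are spliced in. The Python sketch below only illustrates that rule. It is not llamafile's actual loader, the names `parse_args_file` and `expand_args` are hypothetical, and both the skipping of blank lines and the handling of a file without `...` (defaults left unchanged) are assumptions, since the text does not describe them.

```python
from typing import List


def parse_args_file(text: str) -> List[str]:
    """Split the contents of a .args file into arguments, one per line."""
    # Assumption of this sketch: blank lines are ignored and surrounding
    # whitespace is trimmed.
    return [line.strip() for line in text.splitlines() if line.strip()]


def expand_args(defaults: List[str], user_args: List[str]) -> List[str]:
    """Replace the `...` token with the arguments given at runtime."""
    expanded: List[str] = []
    for arg in defaults:
        if arg == "...":
            # Splice the user's arguments in at the position of the token.
            expanded.extend(user_args)
        else:
            expanded.append(arg)
    # Assumption: without a `...` token the defaults are returned unchanged.
    return expanded


if __name__ == "__main__":
    defaults = parse_args_file("-m\n/zip/Qwen3-0.6B-Q8_0.gguf\n...\n")
    print(expand_args(defaults, ["--server"]))
    # prints ['-m', '/zip/Qwen3-0.6B-Q8_0.gguf', '--server']
```

Under these assumptions, running the packaged file with `--server` would be equivalent to passing `-m /zip/Qwen3-0.6B-Q8_0.gguf --server` explicitly.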

docs/index.md

Lines changed: 5 additions & 3 deletions
@@ -5,7 +5,7 @@ Mozilla.ai recently adopted the llamafile project, and we're planning an approac
 [Read more via the blog](https://blog.mozilla.ai/llamafile-returns/) and add your voice to the discussion [here](https://github.com/mozilla-ai/llamafile/discussions/809).
 
 
-[![ci status](https://github.com/Mozilla-Ocho/llamafile/actions/workflows/ci.yml/badge.svg)](https://github.com/Mozilla-Ocho/llamafile/actions/workflows/ci.yml)<br/>
+[![ci status](https://github.com/mozilla-ai/llamafile/actions/workflows/ci.yml/badge.svg)](https://github.com/mozilla-ai/llamafile/actions/workflows/ci.yml)<br/>
 [![](https://dcbadge.vercel.app/api/server/YuMNeuKStr)](https://discord.gg/YuMNeuKStr)<br/><br/>
 
 <img src="images/llamafile-640x640.png" width="320" height="320"
@@ -18,7 +18,9 @@ accessible to both developers and end users. We're doing that by
 combining [llama.cpp](https://github.com/ggerganov/llama.cpp) with [Cosmopolitan Libc](https://github.com/jart/cosmopolitan) into one
 framework that collapses all the complexity of LLMs down to
 a single-file executable (called a "llamafile") that runs
-locally on most computers, with no installation.<br/><br/>
+locally on most computers, with no installation.
+
+llamafile also includes **[whisperfile](whisperfile/index.md)**, a single-file speech-to-text tool built on [whisper.cpp](https://github.com/ggerganov/whisper.cpp) and the same Cosmopolitan packaging. It supports transcription and translation of audio files across all the same platforms, with no installation required.<br/><br/>
 
 <a href="https://builders.mozilla.org/"><img src="images/mozilla-logo-bw-rgb.png" width="150"></a><br/>
 llamafile is a <a href="https://builders.mozilla.org/">Mozilla Builders</a> project.<br/><br/>
@@ -76,4 +78,4 @@ should that be desired.
 The llamafile logo on this page was generated with the assistance of DALL·E 3.
 
 
-[![Star History Chart](https://api.star-history.com/svg?repos=Mozilla-Ocho/llamafile&type=Date)](https://star-history.com/#Mozilla-Ocho/llamafile&Date)
+[![Star History Chart](https://api.star-history.com/svg?repos=mozilla-ai/llamafile&type=Date)](https://star-history.com/#mozilla-ai/llamafile&Date)

docs/quickstart.md

Lines changed: 6 additions & 4 deletions
@@ -1,3 +1,5 @@
+# Getting Started with llamafile
+
 The easiest way to try it for yourself is to download our example
 llamafile for the [LLaVA](https://llava-vl.github.io/) model (license: [LLaMA 2](https://ai.meta.com/resources/models-and-libraries/llama-downloads/),
 [OpenAI](https://openai.com/policies/terms-of-use)). LLaVA is a new LLM that can do more
@@ -33,7 +35,7 @@ chmod +x llava-v1.5-7b-q4.llamafile
 
 **Having trouble? See the [Troubleshooting](troubleshooting.md) page.**
 
-### JSON API Quickstart
+## JSON API Quickstart
 
 When llamafile is started, in addition to hosting a web
 UI chat server at <http://127.0.0.1:8080/>, an [OpenAI
@@ -150,7 +152,7 @@ OpenAI API compatible endpoints, including embeddings. It's designed to
 be more reliable. It's better able to recycle context windows across
 multiple slots. To try it, run:
 
-```
+```bash
 llamafile --server --v2 --help
 llamafile --server --v2
 ```
@@ -188,7 +190,7 @@ This section answers the question *"I already have a model downloaded locally by
 
 So if you have downloaded e.g. the `llama-2-7b.Q2_K.gguf` file for `TheBloke/Llama-2-7B-GGUF`, you can run llamafile as follows:
 
-```
+```bash
 cd ~/.cache/lm-studio/models/TheBloke/Llama-2-7B-GGUF
 llamafile -m llama-2-7b.Q2_K.gguf
 ```
@@ -201,7 +203,7 @@ The manifest maps each file related to the model (e.g. GGUF weights, license, pr
 
 Each sha256 digest is also used as a filename in the `~/.ollama/models/blobs` directory (if you look into that directory you'll see *only* those sha256-* filenames). This means you can directly run llamafile by passing the sha256 digest as the model filename. So if e.g. the `llama3:latest` GGUF file digest is `sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29`, you can run llamafile as follows:
 
-```
+```bash
 cd ~/.ollama/models/blobs
 llamafile -m sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
 ```
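
The digest-to-filename mapping described in the quickstart text can be sketched in Python as follows. This is a minimal illustration, not part of llamafile or Ollama: the helper names `blob_path` and `llamafile_command` are hypothetical, and accepting a `sha256:` spelling in addition to the `sha256-` filename form is an assumption about how digests may be written in a manifest.

```python
from pathlib import Path
from typing import List

# Directory named in the quickstart text.
BLOBS_DIR = Path.home() / ".ollama" / "models" / "blobs"


def blob_path(digest: str, blobs_dir: Path = BLOBS_DIR) -> Path:
    """Map a sha256 digest to its file under the blobs directory."""
    # Strip either prefix; "sha256:" is an assumed alternative spelling.
    for prefix in ("sha256:", "sha256-"):
        if digest.startswith(prefix):
            digest = digest[len(prefix):]
            break
    if len(digest) != 64 or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    return blobs_dir / f"sha256-{digest}"


def llamafile_command(digest: str) -> List[str]:
    """Build the `llamafile -m <blob>` command line for a digest."""
    return ["llamafile", "-m", str(blob_path(digest))]


if __name__ == "__main__":
    digest = "sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29"
    print(" ".join(llamafile_command(digest)))
```

The resulting list could be handed to `subprocess.run` to start llamafile without changing into the blobs directory first.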
