GitHub - bentoml/BentoLlamaCpp: BentoML + llama.cpp

Installation

uv venv -p 3.11

# For M1 Mac
CMAKE_ARGS="-DCMAKE_OSX_ARCHITECTURES=arm64 -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 -DGGML_METAL=on" pip install --upgrade --verbose --force-reinstall --no-cache-dir llama-cpp-python
# For Mac
CMAKE_ARGS="-DGGML_METAL=on" pip install --upgrade --verbose --force-reinstall --no-cache-dir llama-cpp-python
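
The two commands above differ only in the CMAKE_ARGS value, so the choice can be scripted. The following is a minimal sketch, not part of this repository: the function name llama_cmake_args is hypothetical, and it assumes that "For M1 Mac" means a macOS host where uname -m reports arm64, and that "For Mac" covers every other macOS host. On non-macOS systems it returns an empty string, since the README gives no flags for them.

```shell
# Hypothetical helper: choose CMAKE_ARGS for llama-cpp-python from the OS and CPU.
# Both arguments default to the current machine; pass them explicitly to preview
# what another platform would get.
llama_cmake_args() {
  os="${1:-$(uname -s)}"
  arch="${2:-$(uname -m)}"
  if [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]; then
    # Assumed to correspond to the "For M1 Mac" command in the README.
    printf '%s\n' "-DCMAKE_OSX_ARCHITECTURES=arm64 -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 -DGGML_METAL=on"
  elif [ "$os" = "Darwin" ]; then
    # Assumed to correspond to the "For Mac" command in the README.
    printf '%s\n' "-DGGML_METAL=on"
  else
    # The README lists no flags for other platforms, so none are set here.
    printf '%s\n' ""
  fi
}

# Show the value that would be used on this machine.
CMAKE_ARGS="$(llama_cmake_args)"
printf 'CMAKE_ARGS=%s\n' "$CMAKE_ARGS"
```

The printed value can then be supplied as CMAKE_ARGS to the same pip install command shown above.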

uv pip install -r pyproject.toml

bentoml serve

To use a different model:

bentoml serve -f qwq.yaml

By default, the service uses Gemma 3.

Deploy

bentoml deploy

Folders and files

.bentoignore
.gitignore
README.md
pyproject.toml
qwq.yaml
service.py
