A tiny version of vllm inspired by nano-vllm.
This project aims to build a lightweight inference engine for LLMs, keeping all of the code in a single file, `tinyvllm.py`, for simplicity and educational purposes.
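At its core, any such engine runs a token-by-token generation loop: feed the current sequence to the model, pick the next token, append it, and repeat until an end-of-sequence token or a length limit. As a rough illustration of that loop (a toy sketch, not tinyvllm's actual API; `toy_model` is a stand-in for a transformer forward pass):

```python
# Toy greedy-decoding loop illustrating the core of an LLM inference engine.
# "toy_model" is a hypothetical stand-in that maps a token sequence to
# next-token scores; a real engine runs a transformer forward pass here.

def toy_model(tokens):
    # Deterministic toy scoring: favors the token (last + 1) mod vocab_size.
    vocab_size = 8
    nxt = (tokens[-1] + 1) % vocab_size
    return [1.0 if t == nxt else 0.0 for t in range(vocab_size)]

def generate(prompt, max_new_tokens, eos=7):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = toy_model(tokens)
        # Greedy decoding: take the argmax over next-token scores.
        next_token = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_token)
        if next_token == eos:  # stop at end-of-sequence
            break
    return tokens

print(generate([0], max_new_tokens=10))  # [0, 1, 2, 3, 4, 5, 6, 7]
```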
You can install tinyvllm directly from GitHub:

```bash
pip install git+https://github.com/yaoshengzhe/tinyvllm.git
```

Or install dependencies for development:

```bash
pip install .
```

Before running the example, download the model from Hugging Face:

```bash
huggingface-cli download Qwen/Qwen3-0.6B --local-dir models/Qwen3-0.6B --local-dir-use-symlinks False
```

Then run the example:

```bash
python example.py
```

tinyvllm is designed to be compatible with macOS and CPU-only environments, although performance will be slower than on GPU.
Ensure you have Python 3.8+ installed. It is recommended to use a virtual environment:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install .
```

- Download the model (if not already done):

  ```bash
  huggingface-cli download Qwen/Qwen3-0.6B --local-dir models/Qwen3-0.6B --local-dir-use-symlinks False
  ```

- Run the example:

  ```bash
  python example.py
  ```
On a laptop, it might take a minute to load the model and generate text.
You can run the benchmark on your laptop with reduced parameters for testing:

```bash
python benchmark.py --backend tinyvllm --num-seqs 2 --max-input-len 10 --max-output-len 10
```

To benchmark vLLM performance (requires vllm installed):

```bash
python benchmark.py --backend vllm
```

To benchmark nano-vLLM performance (requires nanovllm installed):

```bash
python benchmark.py --backend nanovllm
```

To benchmark tinyvllm performance:

```bash
python benchmark.py --backend tinyvllm
```

Hardware: a2-ultragpu-1g (NVIDIA A100 80GB x 1) on Google Compute Engine (GCE).
| Metric | vLLM | nano-vLLM | tinyvllm |
|---|---|---|---|
| Total Sequences | 256 | 256 | TBD |
| Total Input Tokens | 142,827 | 142,827 | TBD |
| Total Gen Tokens | 133,966 | 133,966 | TBD |
| Total Duration | 16.47 s | 16.88 s | TBD |
| Gen Throughput | 8,135.63 tok/s | 7,938.19 tok/s | TBD |
| Total Throughput | 16,809.38 tok/s | 16,401.45 tok/s | TBD |
Observation: nano-vLLM achieves ~97.5% of vLLM's generation throughput in this test, demonstrating its efficiency as a lightweight alternative.
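The throughput figures follow directly from the token counts and durations in the table. Recomputing them (small gaps versus the reported values come from the durations being rounded to two decimals):

```python
# Recompute the vLLM throughput figures from the table values above.
total_input = 142_827   # Total Input Tokens
total_gen = 133_966     # Total Gen Tokens
duration_s = 16.47      # Total Duration (rounded in the table)

gen_throughput = total_gen / duration_s                     # ~8134 tok/s (table: 8135.63)
total_throughput = (total_input + total_gen) / duration_s   # ~16806 tok/s (table: 16809.38)

# Ratio behind the observation: nano-vLLM vs. vLLM generation throughput.
ratio = 7938.19 / 8135.63                                   # ~0.976

print(round(gen_throughput, 2), round(total_throughput, 2), round(ratio, 3))
```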
<details>
<summary>Click to see detailed vLLM Benchmark Logs</summary>

```
==================================================
Starting Benchmark...
==================================================
Adding requests: 100% 256/256 [00:00<00:00, 2912.32it/s]
Processed prompts: 100% 256/256 [00:16<00:00, 48.64it/s, est. speed input: 8721.47 toks/s, output: 8180.38 toks/s]
##################################################
BENCHMARK RESULTS
##################################################
Metric             | Value
-------------------------------------------
Total Sequences    | 256
Total Input Tokens | 142827
Total Gen Tokens   | 133966
Total Duration     | 16.47 s
-------------------------------------------
Gen Throughput     | 8135.63 tok/s
Total Throughput   | 16809.38 tok/s
##################################################
```
</details>
<details>
<summary>Click to see detailed nano-vLLM Benchmark Logs</summary>

```
==================================================
Initializing nanovllm Engine...
==================================================
...
##################################################
BENCHMARK RESULTS (nanovllm)
##################################################
Metric             | Value
-------------------------------------------
Total Sequences    | 256
Total Input Tokens | 142827
Total Gen Tokens   | 133966
Total Duration     | 16.88 s
-------------------------------------------
Gen Throughput     | 7938.19 tok/s
Total Throughput   | 16401.45 tok/s
##################################################
```
</details>
To run the unit tests:

```bash
python -m unittest discover tests
```
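Discovery picks up any `test*.py` module under `tests/` whose test cases subclass `unittest.TestCase`. A hypothetical test module might look like this (the `greedy_pick` helper is a stand-in for illustration, not tinyvllm's real API):

```python
# Hypothetical shape of a module under tests/ (e.g. tests/test_sampling.py).
import unittest

def greedy_pick(scores):
    # Stand-in for a sampling helper: returns the index of the highest score.
    return max(range(len(scores)), key=scores.__getitem__)

class TestGreedyPick(unittest.TestCase):
    def test_argmax(self):
        self.assertEqual(greedy_pick([0.1, 0.7, 0.2]), 1)

if __name__ == "__main__":
    # exit=False so the interpreter keeps running after the tests finish.
    unittest.main(exit=False, verbosity=0)
```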