LLM KV Cache Paging Simulator

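An LLM inference simulator that demonstrates how paging and prefix sharing reduce KV cache memory waste for concurrent decode operations. The sample run below compares a monolithic allocation against a paged allocation with prefix sharing.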
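To illustrate why paging helps, here is a minimal sketch of the two reservation policies for a single sequence. It is not taken from the repository sources: the function names, the idea that the monolithic layout reserves a full maximum-length buffer per sequence, and the page size are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a monolithic layout reserves room for the maximum
 * sequence length up front, no matter how many tokens are in use. */
size_t mono_reserved_tokens(size_t max_seq_len) {
    return max_seq_len;
}

/* Hypothetical: a paged layout reserves whole pages, so the only
 * slack is the unused tail of the last page. */
size_t paged_reserved_tokens(size_t used_tokens, size_t page_tokens) {
    assert(page_tokens > 0);
    size_t pages = (used_tokens + page_tokens - 1) / page_tokens; /* round up */
    return pages * page_tokens;
}
```

Under these assumptions, a sequence holding 100 tokens with a 16-token page reserves 112 token slots when paged, whereas a monolithic buffer sized for 2048 tokens reserves all 2048. Prefix sharing would reduce the paged figure further by letting sequences with a common prompt reference the same pages.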

> make clean
rm -f llm_sim src/main.o src/sim.o src/mono_kv.o src/page_kv.o src/page_alloc.o src/workload.o
> make llm_sim
cc -O2 -Wall -std=c11 -pthread -Iinclude   -c -o src/main.o src/main.c
cc -O2 -Wall -std=c11 -pthread -Iinclude   -c -o src/sim.o src/sim.c
cc -O2 -Wall -std=c11 -pthread -Iinclude   -c -o src/mono_kv.o src/mono_kv.c
cc -O2 -Wall -std=c11 -pthread -Iinclude   -c -o src/page_kv.o src/page_kv.c
cc -O2 -Wall -std=c11 -pthread -Iinclude   -c -o src/page_alloc.o src/page_alloc.c
cc -O2 -Wall -std=c11 -pthread -Iinclude   -c -o src/workload.o src/workload.c
cc -O2 -Wall -std=c11 -pthread -Iinclude -o llm_sim src/main.o src/sim.o src/mono_kv.o src/page_kv.o src/page_alloc.o src/workload.o -pthread
> ./llm_sim
bytes_per_token = 8192
Monolithic:
  logical_bytes  = 621674496
  physical_bytes = 4294967296
  waste_bytes    = 3673292800 (85.53%)
Paged+Prefix:
  logical_bytes  = 621674496
  physical_bytes = 629800960
  waste_bytes    = 8126464 (1.29%)
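
The waste figures in the output above are consistent with waste being the physical bytes minus the logical bytes, expressed as a percentage of the physical bytes. The sketch below reproduces that arithmetic; the helper names are hypothetical and are not claimed to match the functions in `src/`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes allocated but not holding live KV data. */
uint64_t waste_bytes(uint64_t logical, uint64_t physical) {
    assert(physical >= logical);
    return physical - logical;
}

/* Hypothetical helper: waste as a percentage of physical bytes. */
double waste_percent(uint64_t logical, uint64_t physical) {
    if (physical == 0) {
        return 0.0; /* nothing allocated, nothing wasted */
    }
    return 100.0 * (double)waste_bytes(logical, physical) / (double)physical;
}
```

Calling `waste_bytes(621674496, 4294967296)` gives 3673292800, and `waste_bytes(621674496, 629800960)` gives 8126464, matching the monolithic and paged rows respectively. At 8192 bytes per token, the paged waste corresponds to 992 token slots.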
