Running an LLM on the ESP32

Optimizing Llama2.c for the ESP32

With the following changes to llama2.c, I am able to achieve 19.13 tok/s:

Using both cores of the ESP32 during math-heavy operations.
Using the dot-product functions from the ESP-DSP library that are optimized for the ESP32-S3. These functions use some of the few SIMD instructions the ESP32-S3 has.
Raising the CPU clock to its 240 MHz maximum and the PSRAM clock to 80 MHz, and increasing the instruction cache size.
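
To illustrate the first two changes, here is a minimal portable sketch, not the code from this repository. It assumes the same layout as llama2.c's matmul (W is d rows by n columns, row-major) and expresses each output element as one dot product over a row. The names dot_f32, matmul_rows and matmul_split are hypothetical. On an ESP32-S3, the plain loop in dot_f32 could presumably be swapped for an ESP-DSP routine such as dsps_dotprod_f32, and the two row ranges would run concurrently, one per core; here they simply run one after the other so the sketch compiles anywhere.

```c
#include <stddef.h>

/* Portable stand-in for a SIMD dot product (hypothetical helper).
   On an ESP32-S3 an ESP-DSP routine could take its place (assumption). */
float dot_f32(const float *a, const float *b, int len) {
    float acc = 0.0f;
    for (int i = 0; i < len; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Compute out[r] = dot(w[r, :], x) for rows in [row_begin, row_end).
   w is row-major with n columns, as in llama2.c's matmul.
   This is the unit of work one core would handle. */
void matmul_rows(float *out, const float *x, const float *w,
                 int n, int row_begin, int row_end) {
    for (int r = row_begin; r < row_end; r++) {
        out[r] = dot_f32(w + (size_t)r * n, x, n);
    }
}

/* Split the d output rows into two halves. The halves write disjoint
   parts of out, so they need no locking. On the device each call would
   run on its own core; in this sketch they run sequentially. */
void matmul_split(float *out, const float *x, const float *w, int n, int d) {
    int mid = d / 2;
    matmul_rows(out, x, w, n, 0, mid);   /* would run on core 0 */
    matmul_rows(out, x, w, n, mid, d);   /* would run on core 1 */
}
```

Calling matmul_split(out, x, w, n, d) gives the same result as a single-loop matmul. The point of the split is that each half reads the shared input vector but writes only its own rows, which is what makes the work safe to hand to two cores.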

Building and flashing the project requires the ESP-IDF toolchain to be installed:

idf.py build
idf.py -p /dev/{DEVICE_PORT} flash
