Hi TFLite Micro folks,
I wanted to share a small MCU language-runtime experiment and ask whether people here would see it as adjacent to the kinds of systems TFLite Micro is meant to support.
We built a public demo called Engram and deployed it on a commodity ESP32-C3.
Current public numbers:
- Host-side benchmark capability:
  - LogiQA = 0.392523
  - IFEval = 0.780037
- Published board proof:
  - LogiQA 642 = 249 / 642 = 0.3878504672897196
  - host_full_match = 642 / 642
  - runtime artifact size = 1,380,771 bytes
Important scope note:
This is not presented as unrestricted, open-input, native LLM generation on an MCU.
The board-side path is closer to a flash-resident, table-driven runtime with:
- packed token weights
- hashed lookup structures
- fixed compiled probe batches
- streaming fold / checksum style execution over precompiled structures
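To make the shape of that execution path concrete, here is a minimal C sketch of a hashed lookup table with packed weights and a streaming fold over a probe batch. All names, the FNV-1a hash choice, the entry layout, and the fold constant are illustrative assumptions on my part, not Engram's actual format (in a real build the table would be generated offline and placed in flash as `const` data; here it is populated at init purely so the example runs):

```c
#include <stdint.h>
#include <stddef.h>

/* FNV-1a 32-bit hash over a key string (assumed hash; not Engram's). */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    while (*s) { h ^= (uint8_t)*s++; h *= 16777619u; }
    return h;
}

#define SLOTS 16  /* power of two so (hash & (SLOTS - 1)) indexes a slot */

typedef struct {
    uint32_t key_hash;  /* 0 marks an empty slot */
    uint16_t weight;    /* packed token weight */
} entry_t;

/* In a real deployment this table would be a const, flash-resident
 * structure emitted by an offline compiler; filled at init here. */
static entry_t table[SLOTS];

static void put(const char *key, uint16_t w) {
    uint32_t h = fnv1a(key);
    size_t i = h & (SLOTS - 1);
    while (table[i].key_hash != 0)          /* linear probing */
        i = (i + 1) & (SLOTS - 1);
    table[i].key_hash = h;
    table[i].weight = w;
}

uint16_t lookup_weight(const char *key) {
    uint32_t h = fnv1a(key);
    size_t i = h & (SLOTS - 1);
    while (table[i].key_hash != 0) {
        if (table[i].key_hash == h) return table[i].weight;
        i = (i + 1) & (SLOTS - 1);
    }
    return 0;  /* key not present */
}

/* Streaming fold: accumulate a checksum-style value over a fixed probe
 * batch without materialising any intermediate tensors. */
uint32_t fold_batch(const char *const *tokens, size_t n) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = acc * 31u + lookup_weight(tokens[i]);
    return acc;
}
```

The point of the sketch is the execution model: per-token work is a hash, a probe loop over flash-resident entries, and a constant-space accumulator update, rather than a dense matmul graph.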
So this is not a standard dense neural-network graph executed by a familiar micro inference runtime. It is closer to a task-specialized language runtime whose behavior has been compiled down into a compact executable form.
Repo:
https://github.com/Alpha-Guardian/Engram
The question I’m interested in is whether systems like this should be thought of as:
- outside the intended scope of TFLite Micro
- adjacent to TFLM-style model deployment
- or potentially part of a broader future category of MCU-scale language inference/runtime systems
If people here have thoughts on where this boundary should be drawn, I’d be very interested.