I specialize in low-level systems programming, hardware-accelerated AI inference, and building fully local applications. My work bypasses high-level abstractions to reach native compute performance through careful C++ architecture and memory optimization.
A local, privacy-first AI search engine built from scratch in C++.
- Implements custom inference pipelines with an architecture designed for AIPC and Cloud hybrid offloading.
- Bypasses the overhead of Python-based stacks to achieve native latency.
A local transcription studio built with whisper.cpp and C++.
- Works directly with the raw C/C++ implementation of OpenAI's Whisper model to understand and optimize local speech-to-text end to end.
- Optimized around explicit, contiguous memory allocation and multi-threaded matrix operations.


