Distributed Systems • ML Infrastructure • Database Internals • Video AI
I enjoy building things that are hard to build:
storage engines, distributed systems, multimodal inference pipelines,
GPU-optimized systems, and low-latency backend architectures.
I'm an engineer obsessed with understanding systems from first principles.
I spend most of my time building, breaking, optimizing, and understanding:
- Storage Engines & Database Internals
- Distributed Systems & Event-Driven Architectures
- ML Infrastructure & Multimodal Models
- Video AI & GPU-Optimized Pipelines
- Rust Systems Programming
- High-Performance Backend Systems
I like understanding how things actually work under the hood, from database WALs and scheduling engines to GPU memory bottlenecks in inference systems.
A timestamp-triggered pub/sub database built in Go, designed to deliver events precisely when scheduled.
Current Focus
- At-least-once delivery semantics
- gRPC-based subscriber architecture
- Timestamp-based scheduling engine
- File-backed storage engine
- Event reliability & persistence
- Distributed event delivery
Tech
Go gRPC Kafka PostgreSQL SQLC
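
A minimal sketch of how a timestamp-based scheduling engine like the one listed above might pick out due events, assuming an in-memory min-heap ordered by delivery time. The `Event`, `Scheduler`, `Schedule`, and `Due` names are hypothetical and chosen for illustration; they are not taken from the project's code, and the real engine may work differently (for example, by reading from the file-backed store).

```go
package main

import (
	"container/heap"
	"fmt"
	"time"
)

// Event is a hypothetical scheduled message; the field names are illustrative.
type Event struct {
	ID        string
	DeliverAt time.Time
	Payload   []byte
}

// eventHeap implements heap.Interface, ordering events by DeliverAt (earliest first).
type eventHeap []*Event

func (h eventHeap) Len() int           { return len(h) }
func (h eventHeap) Less(i, j int) bool { return h[i].DeliverAt.Before(h[j].DeliverAt) }
func (h eventHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *eventHeap) Push(x any)        { *h = append(*h, x.(*Event)) }
func (h *eventHeap) Pop() any {
	old := *h
	n := len(old)
	e := old[n-1]
	old[n-1] = nil // drop the reference so the event can be collected
	*h = old[:n-1]
	return e
}

// Scheduler keeps pending events in a min-heap. It is not safe for
// concurrent use; a real engine would add locking or a single owner goroutine.
type Scheduler struct{ pending eventHeap }

// Schedule registers an event for later delivery.
func (s *Scheduler) Schedule(e *Event) { heap.Push(&s.pending, e) }

// Due removes and returns every event whose DeliverAt is at or before now,
// in timestamp order.
func (s *Scheduler) Due(now time.Time) []*Event {
	var due []*Event
	for s.pending.Len() > 0 && !s.pending[0].DeliverAt.After(now) {
		due = append(due, heap.Pop(&s.pending).(*Event))
	}
	return due
}

func main() {
	base := time.Unix(1_700_000_000, 0)
	s := &Scheduler{}
	s.Schedule(&Event{ID: "late", DeliverAt: base.Add(10 * time.Second)})
	s.Schedule(&Event{ID: "early", DeliverAt: base.Add(1 * time.Second)})
	for _, e := range s.Due(base.Add(5 * time.Second)) {
		fmt.Println(e.ID) // prints "early"
	}
}
```

A heap keeps the next event to fire at the root, so checking for due work costs O(1) and each insert or removal costs O(log n). A ticker or a timer reset to the root's `DeliverAt` could drive calls to `Due`.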
Fine-tuning and experimenting with Med-GEMMA models for medical image understanding.
Current Work
- Medical image-text understanding
- Pneumothorax detection
- Attention heatmap visualization
- Explainable multimodal AI
- GPU memory optimization
Models
Med-GEMMA-3 Med-GEMMA-4B-IT
I enjoy understanding systems deeply rather than treating them like black boxes.
- Database internals
- Storage engines
- WAL & compaction systems
- Distributed messaging guarantees
- Kafka internals
- Rust systems programming
- LLM inference optimization
- GPU scheduling & memory bottlenecks
- Multimodal model architectures
- Video understanding systems
- Triton & scalable serving systems
Backend
- Go
- Gin
- gRPC
- SQLC
- PostgreSQL
- Kafka (Confluent Stack)
Architecture
- Event-driven systems
- Distributed services
- Modular backend design
- Low-latency pipelines
ML / Inference
- PyTorch
- Triton Inference Server
- vLLM
- SGLang
- Ollama
- llama.cpp
Focus
- Multimodal models
- Efficient inference
- GPU optimization
- Memory-constrained deployment
- Medical AI
- FFmpeg
- NVIDIA NVENC / NVDEC
- Video preprocessing pipelines
- High-throughput decoding
- GPU-accelerated video workflows
I enjoy engineering systems that need to balance:
- Performance
- Reliability
- Scalability
- Developer Experience

I care deeply about:
- Concurrency
- Latency
- Throughput
- Fault tolerance
- Storage efficiency
- System observability

flowchart LR
A[Publisher Service]
B[gRPC Gateway]
C[Storage Engine]
D[Scheduling Engine]
E[Delivery Pipeline]
F[Subscriber Service]
A --> B
B --> C
D --> C
D --> E
E --> F
Deliver events at or near scheduled timestamps with:
- Reliable persistence
- Event replay capability
- At-least-once guarantees
- Low operational complexity
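
At-least-once guarantees usually mean redelivering until the subscriber acknowledges, which in turn means subscribers can see duplicates. The sketch below illustrates that trade-off under stated assumptions: `Subscriber`, `DeliverAtLeastOnce`, and `dedupSubscriber` are hypothetical names, the interface stands in for a gRPC subscriber stub, and retry backoff and persistence are omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// Subscriber is a hypothetical stand-in for a gRPC subscriber stub.
// A nil error from Deliver counts as an acknowledgement.
type Subscriber interface {
	Deliver(eventID string, payload []byte) error
}

var errAckLost = errors.New("ack lost")

// DeliverAtLeastOnce redelivers until the subscriber acknowledges or
// maxAttempts is exhausted. It returns the number of attempts made. On error
// the caller should keep the event pending so it can be retried later.
func DeliverAtLeastOnce(sub Subscriber, eventID string, payload []byte, maxAttempts int) (int, error) {
	if maxAttempts < 1 {
		return 0, errors.New("maxAttempts must be at least 1")
	}
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if lastErr = sub.Deliver(eventID, payload); lastErr == nil {
			return attempt, nil
		}
	}
	return maxAttempts, fmt.Errorf("event %s not acknowledged after %d attempts: %w",
		eventID, maxAttempts, lastErr)
}

// dedupSubscriber simulates a subscriber whose acks are sometimes lost.
// It deduplicates on event ID so redeliveries do not repeat side effects.
type dedupSubscriber struct {
	seen      map[string]bool
	received  int // deliveries seen, including duplicates
	processed int // times an event was actually applied
	dropAcks  int // number of acks to "lose" before replying successfully
}

func (d *dedupSubscriber) Deliver(eventID string, payload []byte) error {
	d.received++
	if !d.seen[eventID] {
		d.seen[eventID] = true
		d.processed++
	}
	if d.dropAcks > 0 {
		d.dropAcks--
		return errAckLost // work was done, but the sender never learns of it
	}
	return nil
}

func main() {
	sub := &dedupSubscriber{seen: map[string]bool{}, dropAcks: 2}
	attempts, err := DeliverAtLeastOnce(sub, "evt-1", []byte("hello"), 5)
	fmt.Println(attempts, err, sub.received, sub.processed) // prints "3 <nil> 3 1"
}
```

The event arrives three times but is applied once, which is why at-least-once delivery is normally paired with idempotent or deduplicating subscribers.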
Currently exploring:
Going from:
Syntax → Systems Programming → Advanced Problem Solving

Learning through building:
- Storage engines
- Scheduling systems
- File-backed persistence
- WAL patterns
- LSM-tree concepts
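
As an illustration of the WAL pattern mentioned above, here is a minimal sketch of record framing and replay, assuming a `[length][CRC32][payload]` layout. The layout and the `AppendRecord` and `Replay` names are assumptions made for this example, not a description of any particular engine. File I/O and fsync are left out, so the log is just a byte slice.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// headerSize is a 4-byte payload length followed by a 4-byte CRC32 of the payload.
const headerSize = 8

// AppendRecord frames payload as [length][crc32][payload] and appends it to buf.
func AppendRecord(buf, payload []byte) []byte {
	var hdr [headerSize]byte
	binary.LittleEndian.PutUint32(hdr[0:4], uint32(len(payload)))
	binary.LittleEndian.PutUint32(hdr[4:8], crc32.ChecksumIEEE(payload))
	buf = append(buf, hdr[:]...)
	return append(buf, payload...)
}

// Replay returns every intact record in order, plus the number of leading
// bytes that decoded cleanly. It stops at the first torn or corrupt record,
// so a crash in the middle of an append loses only the unfinished tail.
func Replay(buf []byte) (records [][]byte, validLen int) {
	off := 0
	for len(buf)-off >= headerSize {
		n := binary.LittleEndian.Uint32(buf[off : off+4])
		sum := binary.LittleEndian.Uint32(buf[off+4 : off+8])
		start := off + headerSize
		if uint64(n) > uint64(len(buf)-start) {
			break // torn write: the payload was not fully persisted
		}
		payload := buf[start : start+int(n)]
		if crc32.ChecksumIEEE(payload) != sum {
			break // corrupt record: the checksum does not match
		}
		records = append(records, payload)
		off = start + int(n)
	}
	return records, off
}

func main() {
	wal := AppendRecord(nil, []byte("a"))
	wal = AppendRecord(wal, []byte("bc"))
	torn := wal[:len(wal)-1] // simulate a crash during the second append
	records, valid := Replay(torn)
	fmt.Println(len(records), valid) // prints "1 9"
}
```

On recovery, a file-backed version could truncate the file to `validLen` before accepting new appends, so the damaged tail is never replayed.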
Deep diving into:
- Triton Inference Server
- vLLM internals
- Scalable multimodal inference
- GPU scheduling
- Efficient serving architectures
Understand systems deeply.
Build things from first principles.
Optimize for reliability.
Learn publicly.
Stay curious.

"Build things that teach you how systems actually work."



