I am an AI/ML engineer and researcher focused on building efficient, scalable, and practical AI systems. My interests lie at the intersection of Large Language Models (LLMs), transformer optimization, retrieval systems, and real-world AI infrastructure.
Recently, I have been exploring how modern AI systems can become more compute-efficient and accessible on constrained hardware. My research project, NeuroCache, focuses on memory-efficient transformer training through budget-constrained activation offloading and system-level optimization techniques.
Beyond research, I build applied AI systems involving:
- Retrieval-Augmented Generation (RAG)
- LangChain & Agentic Workflows
- AI Automation Pipelines
- Speech & Multimodal AI Systems
- Scalable FastAPI-based AI backends
Portfolio website: Link
- PyTorch
- TensorFlow
- Scikit-learn
- Hugging Face Transformers
- LangChain
- LangGraph
- RAG Pipelines
- FastAPI
- Docker
- REST APIs
- Async Python
- MongoDB
- Vector Databases
- Efficient Transformer Training
- LLM Optimization
- Memory-Aware AI Systems
- Quantization & PEFT
- Scalable AI Infrastructure
- Multimodal AI
Research project focused on memory-efficient transformer training on low-VRAM GPUs using controlled activation offloading strategies.
Research paper: DOI 10.13140/RG.2.2.11793.39526
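
To give a feel for what budget-constrained activation offloading means, here is a minimal sketch in plain Python. It is not the method from the paper: the `Activation` type, the `plan_offload` function, the layer names, and the greedy "keep the latest layers on the GPU" heuristic are all assumptions made for illustration. It only decides which activations stay on the GPU and which go to CPU memory; no tensors are moved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activation:
    layer: str    # hypothetical layer name
    size_mb: int  # memory the saved activation would occupy


def plan_offload(activations, budget_mb):
    """Split activations into (keep_on_gpu, offload_to_cpu) under a budget.

    Walks the layers in reverse forward order, because the backward pass
    consumes the last activations first, and keeps each one on the GPU
    while it still fits in the remaining budget. Everything else is
    marked for offloading. This is an illustrative heuristic only.
    """
    if budget_mb < 0:
        raise ValueError("budget_mb must be non-negative")
    keep, offload = [], []
    remaining = budget_mb
    for act in reversed(activations):
        if act.size_mb <= remaining:
            keep.append(act.layer)
            remaining -= act.size_mb
        else:
            offload.append(act.layer)
    # Return both lists in forward order for readability.
    return keep[::-1], offload[::-1]


if __name__ == "__main__":
    acts = [
        Activation("block0", 300),
        Activation("block1", 300),
        Activation("block2", 300),
        Activation("block3", 300),
    ]
    keep, offload = plan_offload(acts, budget_mb=700)
    print("keep on GPU:", keep)
    print("offload to CPU:", offload)
```

With a 700 MB budget and four 300 MB activations, the two latest blocks stay on the GPU and the two earliest are marked for offloading. In a real PyTorch training loop, the tensor movement itself could be handled by saved-tensor hooks such as `torch.autograd.graph.save_on_cpu`; that wiring is left out of this sketch.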
Production-oriented AI trading infrastructure with modular strategy execution, risk-aware automation, and Railway deployment support.
LLM-powered interview platform capable of question generation, response evaluation, and structured AI-driven feedback workflows.
Live demo: https://interview-ai-frontend-e5o26ydjv-aayush-kumars-projects-1ae66488.vercel.app/
Multilingual speech emotion detection and empathetic response system integrating speech processing and LLM workflows.
- Efficient training systems for Large Language Models
- Transformer optimization under constrained hardware
- Research in scalable and practical AI infrastructure
- Exploring next-generation AI system architectures
