MS student in Applied Data Intelligence at San José State University (2025–2027), specializing in Generative AI, LLM fine-tuning, and machine learning model development. I architect intelligent systems that combine cutting-edge AI capabilities with production-grade software engineering, from fine-tuning large language models and training deep learning networks to building agentic workflows and distributed architectures.
My expertise spans the full machine learning lifecycle: designing and training neural networks from scratch, fine-tuning foundation models for domain-specific tasks, architecting LLM-based agentic systems with LangChain and LangGraph, and deploying scalable microservices. As a former Software Development Engineer at HP Inc., I bring hands-on experience building production AI systems, distributed data pipelines, and intelligent automation solutions.
Background
- Former Software Development Engineer at HP Inc. (Jul 2023 – Aug 2024)
- B.E. in Information Science and Engineering from Visvesvaraya Technological University (2019–2023)
- Based in San Jose, California | Originally from Bangalore, India
- Building production-ready agentic AI systems with LangChain, LangGraph, and multi-tool orchestration
- Exploring MLOps workflows, LLM fine-tuning techniques, and distributed machine learning training
- Deepening knowledge in scalable data platforms and real-time streaming architectures
- Contributing to open-source AI/ML projects and sharing insights through technical writing
- Creating educational content on Gen AI and Data Engineering on Medium and YouTube
| Project Name | Details | Repository Link |
|---|---|---|
| **RAG Tax Advisory System for International Students**<br>Python · LangChain · ChromaDB · Elasticsearch · BM25 · PyMuPDF · sentence-transformers · LLaMA · Google Gemini | RAG-based chatbot that answers U.S. tax questions for international students, grounded in 41 real tax documents (IRS publications and forms, tax treaties, university guides) extracted page by page with PyMuPDF and split into 2,247 chunks. Hybrid retrieval, which fuses vector search and BM25 results via Reciprocal Rank Fusion, raised the retrieval hit rate from 70% to 100%. Includes dual safety guards, answers personalized on 7 student-profile attributes, and a 5-metric evaluation framework (final LLM-as-a-Judge score: 0.770). | GitHub · Live |
| **Multi-Agent Blog System**<br>Ollama · Docker · AWS ECS · HTML/CSS/JS | Multi-agent workflow (Planner, Reviewer, and Finalizer agents) running on Ollama-served LLMs for automated blog content creation, with an HTML, CSS, and JavaScript front end for blog submission. Deployed with Docker on AWS ECS using lightweight local models (smollm:1.7b, Phi3:mini). Automated outputs include tags, summaries, and a publishable content package. | GitHub |
| **AI Memory Chatbot Agent**<br>FastAPI · MongoDB · Google Gemini · Motor | Intelligent chatbot with a multi-tiered memory architecture: short-term conversational memory, session-based summaries, lifetime user-context condensation, and episodic memory retrieval with vector embeddings. Features automatic memory consolidation, importance-weighted fact extraction, and context-aware responses using Google Generative AI. | GitHub |
| **Bike-Share Pass Optimizer**<br>ReAct · MRKL · DuckDB · Express | Single-agent ReAct + MRKL workflow that analyzes Divvy bike-share trip data to recommend membership or pay-per-ride pricing. The agent uses custom tools (SQL over CSV via DuckDB, policy retrieval via web scraping, a calculator) and emits transparent Thought → Action → Observation traces with policy citations to justify its recommendation. | GitHub |
| **Career Counseling Agent**<br>Streamlit · Gemini · LangChain | AI-powered career planning assistant for personalized career guidance, featuring a Skills Gap Analyzer, a Resume Scorer (0–10 scale) with improvement suggestions, a Salary Estimator, and an Interview Question Generator. | GitHub |
| **Airbnb Prototype with Agentic AI**<br>LangChain · FastAPI · React · MySQL | Full-stack Airbnb-style platform with property listings, bookings, and secure authentication. An agentic AI concierge built with LangChain generates personalized travel plans and recommendations, and its LLM-driven workflows integrate with the backend APIs for context-aware, goal-oriented user interactions. | GitHub |
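The RAG Tax Advisory System merges vector-search and BM25 rankings with Reciprocal Rank Fusion. Below is a minimal sketch of the fusion step, assuming each retriever returns an ordered list of chunk IDs. The chunk IDs and the `k = 60` constant (the value commonly used in the RRF literature) are illustrative assumptions, not taken from the project's code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. k=60 is an assumed default, not the project's setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense (vector) retriever and a BM25 retriever.
vector_hits = ["chunk_12", "chunk_7", "chunk_3"]
bm25_hits = ["chunk_7", "chunk_44", "chunk_12"]

print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
# ['chunk_7', 'chunk_12', 'chunk_44', 'chunk_3']
```

RRF uses only rank positions, so the cosine similarities and BM25 scores never need to be put on a common scale. That property makes it a common choice for hybrid search.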
| Project Name | Details | Repository Link |
|---|---|---|
| **Stock Data ETL & Data Warehouse Pipeline**<br>PostgreSQL · Docker · ETL · Dimensional Modeling · Apache Airflow · Kafka · TimescaleDB | ETL pipeline for stock market data that integrates multiple sources and automates ingestion of 50,567 records with 9 daily Airflow DAGs. Star-schema dimensional data warehouse in TimescaleDB for structured financial analysis, with Dockerized workflows for reproducible deployments. 13 analytics visualizations in Tableau and Apache Superset cover YoY trends, volatility analysis, ROE rankings, and correlation heatmaps. | GitHub |
| **Real-Time Flight Delay Prediction Pipeline**<br>Apache Kafka · Apache Spark · PySpark · HDFS · Apache Airflow · Docker | End-to-end big data ML pipeline that ingests 19M+ flight records into HDFS and trains GBT and Logistic Regression classifiers with Spark MLlib (3-fold CrossValidator), reaching AUC-ROC 0.94 and F1 0.90. Real-time streaming inference with Kafka and Spark Structured Streaming sustains 11,648 events/sec (about 23× the target rate). Fitted PipelineModels are serialized to HDFS and reused by both the batch and streaming paths, eliminating training/serving feature skew. | GitHub |
| **Comprehensive Public Health Analytics Dashboard**<br>Python · SQL · Tableau · Pandas · CDC Socrata API | Multi-source pipeline aggregating CDC PLACES, CDC BRFSS, SAMHSA, and WHO data across 721 US counties. Statistical analysis identified a significant obesity–diabetes correlation (Pearson r = 0.79, R² = 0.63, p = 1.56e-137) across 630 counties. Tableau dashboards with county-level choropleth maps, regional bar charts, and scatter analytics serve non-technical stakeholders. | GitHub |
| **Spotify Data Analysis**<br>AWS Glue · Snowflake · Power BI | ETL pipeline built on the Spotify API, AWS Glue, and Snowflake, with interactive Power BI dashboards showing peak listening hours, weekend patterns, and top artists and tracks. | GitHub |
| **Retail Orders Analytics Project**<br>Python · Pandas · SQL Server | End-to-end data pipeline that processes a retail orders dataset with Python and Pandas and loads it into SQL Server. Advanced analytics identify top-performing products, regional sales patterns, monthly trends, and year-over-year growth metrics. | GitHub |
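In the public health project, R² equals the square of Pearson's r when the fit is a linear regression with a single predictor. The reported R² = 0.63 is therefore consistent with r ≈ 0.79 (√0.63 ≈ 0.794). The small sketch below checks this identity numerically. It assumes that the project's R² comes from a one-predictor ordinary least-squares fit, and the county values are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def r_squared_ols(xs, ys):
    """R^2 of the least-squares line y = a + b*x, computed from residuals."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical county-level prevalence values (percent of adults), not project data.
obesity = [30.1, 34.5, 28.2, 38.0, 32.7, 36.4]
diabetes = [9.8, 11.9, 9.1, 13.5, 10.2, 12.0]

r = pearson_r(obesity, diabetes)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, R^2 = {r_squared_ols(obesity, diabetes):.3f}")
```

The two printed values, r² and R², agree for any single-predictor linear fit. They diverge once more predictors are added to the regression.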
| Project Name | Details | Repository Link |
|---|---|---|
| **EgoHomes: Egocentric Household Activity Dataset**<br>Python · MediaPipe · YOLOv8 · SAM 3 · Whisper · OpenCV · FFmpeg · MLX | Fully automated multimodal annotation pipeline for egocentric household activity video. It produces synchronized per-frame hand poses, segmentation masks, depth maps, and narration transcripts for robotics foundation model pretraining. Integrates MediaPipe HandLandmarker (21-point hand skeleton), YOLOv8 pose estimation, SAM 3 for segmentation, and Whisper for narration transcription, with frame-level JSON annotation schemas and automated quality control. Planned for open release alongside a vision-language-action (VLA) research paper. | GitHub · Live |
| **4DX Movie Technology Using ML**<br>TensorFlow · CNN · Python · OpenCV · Audio Processing | CNN-based system that processes synchronized audio-visual streams to detect dynamic movie events in real time and trigger physical theater effects (water, wind, seat motion) with millisecond-level precision for immersive 4DX experiences. | — |
| **Face Mask Detection Using ML**<br>MobileNetV2 · OpenCV · TensorFlow · Python | Real-time face mask detection using transfer learning with MobileNetV2, achieving 95%+ accuracy at 30+ FPS. OpenCV-based face detection with multi-face classification, optimized for edge deployment. | — |
| **Credit Card Fraud Detection**<br>PCA · Random Forest · Isolation Forest · Python · scikit-learn | Anomaly detection pipeline for fraudulent transactions in highly imbalanced datasets, using PCA dimensionality reduction and ensemble methods (Isolation Forest + Random Forest) with SMOTE oversampling and precision-recall optimization. | — |
| Project Name | Details | Repository Link |
|---|---|---|
| **CheckMyGrade OOP Python**<br>Python · OOP · CSV · Encryption | Console-based student grade management system built with OOP and CSV persistence. Supports CRUD operations, search, and sorting with timing analysis, plus data encryption, academic reports, and statistical analytics. Offers array and linked-list backends, role-based menus, and comprehensive unit tests for performance validation. | GitHub |
| **Stock Analysis Application**<br>Python · OOP · GUI · SQLite | Object-oriented stock tracking application with console and GUI interfaces. It uses an embedded SQLite database to save and retrieve stock data, tracks historical prices from web APIs and CSV imports, generates profit/loss reports, and provides interactive chart visualization. | GitHub |
| **Distributed Kayak Travel Booking System**<br>FastAPI · Kafka · MySQL · MongoDB · Redis | Distributed travel booking system supporting search, booking, billing, and analytics for flights, hotels, and cars. Built as FastAPI microservices with Kafka messaging, backed by relational (MySQL) and NoSQL (MongoDB) databases plus Redis. An AI-powered recommendation service delivers personalized travel deals and real-time updates on resilient, high-throughput infrastructure. | GitHub |
Software Development Engineer, HP Inc. | Jul 2023 – Aug 2024
- Implemented rule-based chatbots for printer customer support to guide users through common troubleshooting steps
- Prepared and organized data from customer support transcripts and internal troubleshooting documents
- Performed basic text cleaning and keyword extraction to map user queries to predefined intents
- Built decision-based conversation flows using simple rules, conditional logic, and fallback responses
- Integrated chatbot logic with backend support APIs to fetch device status and recommended actions
- Conducted limited exploration with early LLM tools to assess potential improvements in response quality and coverage
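The HP bullets describe four steps: basic text cleaning, keyword extraction that maps queries to predefined intents, rule-based responses, and a fallback. Below is a minimal sketch of that pattern. The intents, keywords, and replies are hypothetical placeholders, not HP's actual rules. The backend device-status API call is left out.

```python
import re

# Hypothetical intents, keywords, and replies for illustration; not HP's actual rules.
INTENT_KEYWORDS = {
    "paper_jam": {"jam", "jammed", "stuck"},
    "offline": {"offline", "disconnected", "network"},
    "low_ink": {"ink", "cartridge", "toner"},
}

RESPONSES = {
    "paper_jam": "Open the rear access door and gently remove any jammed paper.",
    "offline": "Restart the printer and confirm it is on the same network as your computer.",
    "low_ink": "Check the ink levels and replace any empty cartridges.",
}

FALLBACK = "Sorry, I could not identify the issue. Connecting you to a support agent."


def extract_keywords(text):
    """Basic cleaning: lowercase the query and keep only alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def match_intent(text):
    """Return the intent with the most keyword hits, or None if nothing matches."""
    tokens = extract_keywords(text)
    best_intent, best_hits = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        hits = len(tokens & keywords)
        if hits > best_hits:  # ties keep the earlier-listed intent
            best_intent, best_hits = intent, hits
    return best_intent


def respond(text):
    """Rule-based reply; unmatched queries (intent None) get the fallback response."""
    return RESPONSES.get(match_intent(text), FALLBACK)


print(respond("My printer says the paper is JAMMED!"))
print(respond("hello there"))
```

Keyword overlap is transparent and easy to audit. However, it misses paraphrases that share no keywords, which is the kind of coverage gap the early LLM exploration was meant to assess.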
Software Engineer Intern | May 2022 – Aug 2022
- Designed RESTful APIs using Node.js and Express with Sequelize ORM for real-time event-driven systems
- Implemented Kafka producers and consumers for distributed message processing
- Built a cross-platform mobile dashboard using React and Ionic for real-time monitoring
San José State University | San Jose, CA
Master of Science in Applied Data Intelligence | Jan 2025 – May 2027 | GPA: 3.5/4.0
Relevant Coursework: Gen AI LLMs, Agentic AI, Machine Learning, Deep Learning, Big Data Algorithms, Distributed Systems, Scalable Data Platforms
Visvesvaraya Technological University | Karnataka, India
Bachelor of Engineering in Information Science and Engineering | Aug 2019 – Jun 2023 | GPA: 7.9/10.0
Relevant Coursework: Data Structures and Algorithms, Database Systems, Software Engineering
