aneessaheba/README.md

Hi, I'm Anees Saheba Guddi

Generative AI & LLMs | Distributed Systems | Data Engineering

MS in Applied Data Intelligence @ San José State University

Email · LinkedIn · Portfolio · GitHub · YouTube · LeetCode · Medium · Tableau


About Me

MS in Applied Data Intelligence student at San José State University (2025–2027) specializing in Generative AI, LLM fine-tuning, and machine learning model development. I build systems that pair current AI capabilities with production-grade software engineering: fine-tuning large language models, training deep learning networks, and building agentic workflows and distributed architectures.

My work spans the full machine learning lifecycle: designing and training neural networks from scratch, fine-tuning foundation models for domain-specific tasks, building LLM-based agentic systems with LangChain and LangGraph, and deploying scalable microservices. As a former Software Development Engineer at HP Inc., I bring hands-on experience building production AI systems, distributed data pipelines, and intelligent automation solutions.

Background

  • Former Software Development Engineer at HP Inc. (Jul 2023 – Aug 2024)
  • B.E. in Information Science and Engineering from Visvesvaraya Technological University (2019–2023)
  • Based in San Jose, California | Originally from Bangalore, India

Currently Working On

  • Building production-ready agentic AI systems with LangChain, LangGraph, and multi-tool orchestration
  • Exploring MLOps workflows, LLM fine-tuning techniques, and distributed machine learning training
  • Deepening knowledge in scalable data platforms and real-time streaming architectures
  • Contributing to open-source AI/ML projects and sharing insights through technical writing
  • Creating educational content on Gen AI and Data Engineering on Medium and YouTube

Projects

Generative AI & Agentic Systems

Project Name Details Repository Link
RAG Tax Advisory System for International Students
Python · LangChain · ChromaDB · Elasticsearch · BM25 · PyMuPDF · sentence-transformers · LLaMA · Google Gemini
RAG-based chatbot answering U.S. tax questions for international students, grounded in 41 real IRS documents (publications, forms, tax treaties, university guides) extracted page-by-page with PyMuPDF and split into 2,247 chunks. Hybrid retrieval (vector search + BM25 via Reciprocal Rank Fusion) boosted hit rate from 70% to 100%. Dual safety guards, personalized answers conditioned on 7 student profile attributes, and a 5-metric evaluation framework achieving a final LLM-as-a-Judge score of 0.770. GitHub · Live
Multi-Agent Blog System
Ollama · Docker · AWS ECS · HTML/CSS/JS
Multi-agent workflow using Ollama LLMs (Planner, Reviewer, Finalizer) for automated blog content creation. Web front-end for blog submission with HTML, CSS, and JavaScript. Deployed on Docker + AWS ECS integrating lightweight local LLMs (smollm:1.7b, Phi3:mini). Automated outputs include tags, summaries, and a publishable content package. GitHub
AI Memory Chatbot Agent
FastAPI · MongoDB · Google Gemini · Motor
Intelligent chatbot with multi-tiered memory architecture: short-term conversational memory, session-based summaries, lifetime user context condensation, and episodic memory retrieval with vector embeddings. Automatic memory consolidation, importance-weighted fact extraction, and context-aware responses using Google Generative AI. GitHub
Bike-Share Pass Optimizer
ReAct · MRKL · DuckDB · Express
Single-agent ReAct + MRKL workflow analyzing Divvy bike-share trip data to recommend membership vs pay-per-ride pricing. Custom tools (CSV SQL via DuckDB, policy retrieval with web scraping, calculator) with transparent Thought → Action → Observation traces and policy citations for decision justification. GitHub
Career Counseling Agent
Streamlit · Gemini · LangChain
AI-powered career planning assistant with Skills Gap Analyzer, Resume Scorer with improvement suggestions (0–10 scale), Salary Estimator, and Interview Question Generator for personalized career guidance. GitHub
Airbnb Prototype with Agentic AI
LangChain · FastAPI · React · MySQL
Full-stack Airbnb-style platform with property listings, bookings, and secure authentication. Agentic AI Concierge using LangChain to generate personalized travel plans and recommendations. LLM-driven workflows integrated with backend APIs for context-aware, goal-oriented user interactions. GitHub
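
The hybrid retrieval mentioned for the RAG Tax Advisory System fuses a vector-search ranking with a BM25 ranking through Reciprocal Rank Fusion (RRF). The sketch below shows the general technique only and is not taken from the project's code: the document IDs are made up for illustration, and the smoothing constant k=60 is a commonly used default assumed here, not a value stated above.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k=60 is an assumed default, not a project setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical chunk IDs returned by the two retrievers for one query.
vector_hits = ["pub519_p12", "form8843_p1", "treaty_in_p3"]
bm25_hits = ["form8843_p1", "pub901_p7", "pub519_p12"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

Because RRF uses only rank positions, it needs no score normalization between the dense and lexical retrievers, which is one reason it is a common choice for hybrid search.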

Data Engineering & Analytics

Project Name Details Repository Link
Stock Data ETL & Data Warehouse Pipeline
PostgreSQL · Docker · ETL · Dimensional Modeling · Apache Airflow · Kafka · TimescaleDB
ETL pipeline for stock market data integrating multiple sources and automating ingestion across 50,567 records with 9 daily Airflow DAGs. Star-schema dimensional data warehouse in TimescaleDB for structured financial analysis. Dockerized workflows for reproducible deployments. 13 analytics visualizations in Tableau and Apache Superset covering YoY trends, volatility analysis, ROE rankings, and correlation heatmaps. GitHub
Real-Time Flight Delay Prediction Pipeline
Apache Kafka · Apache Spark · PySpark · HDFS · Apache Airflow · Docker
End-to-end big data ML pipeline ingesting 19M+ flight records into HDFS, training GBT and Logistic Regression classifiers with Spark MLlib (3-fold CrossValidator), achieving AUC-ROC 0.94 and F1 0.90. Real-time streaming inference with Kafka + Spark Structured Streaming at 11,648 events/sec (23× above target). Serialized PipelineModels to HDFS eliminating training/serving feature skew across batch and streaming paths. GitHub
Comprehensive Public Health Analytics Dashboard
Python · SQL · Tableau · Pandas · CDC Socrata API
Multi-source pipeline aggregating CDC PLACES, CDC BRFSS, SAMHSA, and WHO data across 721 US counties. Statistical analysis identifying a significant obesity–diabetes correlation (Pearson r=0.79, R²=0.63, p=1.56e-137) across 630 counties. Tableau dashboards with county-level choropleth maps, regional bar charts, and scatter analytics for non-technical stakeholders. GitHub
Spotify Data Analysis
AWS Glue · Snowflake · Power BI
ETL pipeline with Spotify API, AWS Glue, and Snowflake. Interactive Power BI dashboards delivering insights on peak listening hours, weekend patterns, and top artists/tracks. GitHub
Retail Orders Analytics Project
Python · Pandas · SQL Server
End-to-end data pipeline processing a retail orders dataset with Python and Pandas, loaded into SQL Server. Advanced analytics identifying top-performing products, regional sales patterns, monthly trends, and year-over-year growth metrics. GitHub
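
The Public Health Analytics entry reports both a Pearson r and an R². For a simple one-predictor linear fit, R² equals r squared, which the sketch below checks numerically. The data points are made-up illustrative values, not the project's county data, and the function names are hypothetical. On the reported figures, 0.79² ≈ 0.62, so an R² of 0.63 would be consistent with an unrounded r of roughly 0.794.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def r_squared_linear_fit(xs, ys):
    """R² of the least-squares line y = a + b*x fitted to the data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up example values (percent prevalence), for illustration only.
obesity = [25.0, 28.0, 31.0, 34.0, 37.0]
diabetes = [8.0, 9.5, 9.0, 12.0, 11.5]

r = pearson_r(obesity, diabetes)
r2 = r_squared_linear_fit(obesity, diabetes)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, R^2 = {r2:.3f}")
```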

Machine Learning & Computer Vision

Project Name Details Repository Link
EgoHomes: Egocentric Household Activity Dataset
Python · MediaPipe · YOLOv8 · SAM 3 · Whisper · OpenCV · FFmpeg · MLX
Fully automated multimodal annotation pipeline for egocentric household activity video, producing synchronized hand pose, segmentation masks, depth maps, and narration transcripts per frame for robotics foundation model pretraining. Integrates MediaPipe HandLandmarker (21-point hand skeleton), YOLOv8 pose estimation, SAM 3 for segmentation, and Whisper for narration. JSON frame-level annotation schemas with automated quality control. Targeting open release alongside a VLA research paper. GitHub · Live
4DX Movie Technology Using ML
TensorFlow · CNN · Python · OpenCV · Audio Processing
CNN-based system processing synchronized audio-visual streams to detect dynamic movie events in real-time and trigger physical theater effects (water, wind, seat motion) with millisecond-level precision for immersive 4DX experiences.
Face Mask Detection Using ML
MobileNetV2 · OpenCV · TensorFlow · Python
Real-time face mask detection using transfer learning with MobileNetV2, achieving 95%+ accuracy at 30+ FPS. OpenCV-based face detection with multi-face classification, optimized for edge deployment.
Credit Card Fraud Detection
PCA · Random Forest · Isolation Forest · Python · scikit-learn
Anomaly detection pipeline for fraudulent transactions in highly imbalanced datasets using PCA dimensionality reduction and ensemble methods (Isolation Forest + Random Forest) with SMOTE oversampling and precision-recall optimization.

Software Engineering & Data Structures

Project Name Details Repository Link
CheckMyGrade OOP Python
Python · OOP · CSV · Encryption
Console-based student grade management using OOP and CSV persistence. CRUD, search, sort with timing analysis, data encryption, academic reports, and statistical analytics. Array and linked-list backends with role-based menus and comprehensive unit tests for performance validation. GitHub
Stock Analysis Application
Python · OOP · GUI · SQLite
Object-oriented stock tracking application with console and GUI interfaces. Embedded SQLite database for saving and retrieving stock data, historical price tracking from web APIs and CSV imports, profit/loss report generation, and interactive chart visualization. GitHub
Distributed Kayak Travel Booking System
FastAPI · Kafka · MySQL · MongoDB · Redis
Distributed travel booking system supporting search, booking, billing, and analytics for flights, hotels, and cars. FastAPI microservices with Kafka and relational + NoSQL databases. AI-powered recommendation service for personalized travel deals and real-time updates with resilient, high-throughput infrastructure. GitHub

Professional Experience

Hewlett Packard (HP) | Bengaluru, India

Software Development Engineer | Jul 2023 – Aug 2024

  • Implemented rule-based chatbots for Printer Customer Support to guide users through common troubleshooting
  • Prepared and organized data from customer support transcripts and internal troubleshooting documents
  • Performed basic text cleaning and keyword extraction to map user queries to predefined intents
  • Built decision-based conversation flows using simple rules, conditional logic, and fallback responses
  • Integrated chatbot logic with backend support APIs to fetch device status and recommended actions
  • Conducted limited exploration with early LLM tools to assess potential improvements in response quality and coverage

Pheuna Technology | Bengaluru, India

Software Engineer Intern | May 2022 – Aug 2022

  • Designed RESTful APIs using Node.js and Express with Sequelize ORM for real-time event-driven systems
  • Implemented Kafka producers and consumers for distributed message processing
  • Built a cross-platform mobile dashboard using React and Ionic for real-time monitoring

Technical Skills

Generative AI & LLMs

Gen AI APIs · Prompt Engineering · LangChain · LangGraph · Tool Calling · RAG · Vector Databases · OpenAI · Google Gemini · HuggingFace · Fine-tuning · LoRA · Ollama

Machine Learning & Deep Learning

Model Training · Supervised Learning · Unsupervised Learning · Neural Networks · CNN · RNN · Transfer Learning · Feature Engineering · PyTorch · TensorFlow · scikit-learn · Keras

Programming & Frameworks

Python · SQL · Go · TypeScript · NumPy · Pandas · FastAPI · Streamlit · Flask · Node.js · Express · React · HTML5 · CSS3 · JavaScript

Data & Cloud Systems

PostgreSQL · MySQL · MongoDB · Redis · SQLite · DuckDB · Elasticsearch · Docker · AWS · AWS SageMaker · Amazon S3 · Amazon EC2 · AWS ECS · Google Cloud · Kubernetes · Terraform

Data Engineering & Big Data

Apache Kafka · Apache Spark · Apache Airflow · Apache Hadoop · ETL · Snowflake · AWS Glue · TimescaleDB

Data Analysis & Visualization

Matplotlib · Seaborn · Plotly · Tableau · Power BI · Apache Superset

Tools & Development

Git · GitHub · VS Code · Jupyter · Google Colab · Postman · Jira


Education

San José State University | San Jose, CA
Master of Science in Applied Data Intelligence | Jan 2025 – May 2027 | GPA: 3.5/4.0

Relevant Coursework: Gen AI LLMs, Agentic AI, Machine Learning, Deep Learning, Big Data Algorithms, Distributed Systems, Scalable Data Platforms

Visvesvaraya Technological University | Karnataka, India
Bachelor of Engineering in Information Science and Engineering | Aug 2019 – Jun 2023 | GPA: 7.9/10.0

Relevant Coursework: Data Structures and Algorithms, Database Systems, Software Engineering


Connect With Me

Email · LinkedIn · Portfolio · GitHub · YouTube · LeetCode · Medium · Tableau

Popular repositories

  1. airbnb-agentic-ai (Public)

    Full-stack Airbnb-style rental platform with a LangGraph multi-agent AI concierge powered by Google Gemini. Features SSE streaming, tool calling, async pipelines, and Kubernetes deployment on AWS E…

    JavaScript 1

  2. distributed-kayak-booking-system (Public)

    A distributed Kayak-inspired travel booking system with microservices, Kafka event streaming, Redis caching, MySQL, MongoDB, and an AI concierge agent powered by Gemini 2.5, RAG pipeline, and QLoRA…

    JavaScript 1

  3. RAG-Tax-Advisory-System-for-Intl-Students (Public)

    Production RAG system for international student tax advisory. Hybrid Elasticsearch+BM25 retrieval, LangChain LCEL, LLaMA/Gemini routing, RAGAS evaluation (GPT-4 judge), Prometheus/Grafana observabi…

    Python 1

  4. hadoop-news-analytics (Public)

    Distributed word frequency analysis on 5,000 HuffPost news headlines using Apache Hadoop MapReduce and mrjob. Single-node cluster on Docker with HDFS and YARN configured from scratch. Top 50 keywor…

    Python 1

  5. realtime-market-analytics-kafka-spark-hive (Public)

    Real-time stock market analytics pipeline using Apache Kafka, Spark Structured Streaming, and Hive. Simulates live OHLC bar data, computes windowed trend signals (BULLISH/BEARISH/NEUTRAL), and visu…

    Python 1

  6. checkmygrade_python_application_UI (Public)

    Python LAB

    Python