dokababa/README.md



📊 · Daily Classic ML Dataset Spotlight  · 

Heart Disease (Cleveland)
📁 Samples 303
🏷️ Features 13 (age, cholesterol, ECG…)
🎯 Classes 2 (presence/absence)
📅 Introduced 1988 · UCI · Detrano et al.
💡 Fun fact Despite having only 303 samples, it has been cited 3,000+ times: proof that a tiny but well-curated dataset can outlast massive ones
No disease     ███████████░░░░░░░░░ 54%
Disease        █████████░░░░░░░░░░░ 46%
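The text bars in the card can be reproduced from the class percentages alone. Here is a minimal sketch, assuming a 20-cell bar where each cell stands for 5% and the filled count is rounded to the nearest cell (this matches the 11/9 split shown for 54%/46%). `render_bar` is a hypothetical helper, not the profile's actual updater.

```python
def render_bar(label: str, percent: float, width: int = 20) -> str:
    """Render a fixed-width text bar: filled cells ('█') then empty cells ('░')."""
    # Scale the percentage to the bar width and round to the nearest whole cell.
    filled = round(percent / 100 * width)
    filled = max(0, min(width, filled))  # clamp so the bar never over/underflows
    bar = "█" * filled + "░" * (width - filled)
    # Pad the label to 15 characters so the bars line up in a column.
    return f"{label:<15}{bar} {percent:.0f}%"


if __name__ == "__main__":
    # Class balance from the Heart Disease (Cleveland) card.
    for label, pct in [("No disease", 54), ("Disease", 46)]:
        print(render_bar(label, pct))
```

Rounding to whole cells means the bar is only accurate to about ±2.5%, so the printed percentage stays the authoritative number.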

⚙️ Auto-updates daily to spotlight a new dataset. Check back tomorrow!
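One way a daily rotation like this could work is to index a fixed list of datasets by the calendar date. Every run on the same day then picks the same entry, and the choice advances by one each day. This is only a sketch: the `DATASETS` list and `pick_dataset` function are hypothetical, and the profile's real updater (for example, a scheduled job) may work differently.

```python
from datetime import date

# Hypothetical catalogue; the real spotlight list is not shown in the README.
DATASETS = [
    "Heart Disease (Cleveland)",
    "Iris",
    "Wine",
    "Breast Cancer Wisconsin (Diagnostic)",
]


def pick_dataset(day: date, catalogue: list[str] = DATASETS) -> str:
    """Deterministically choose one dataset per calendar day."""
    # toordinal() increases by exactly 1 per day, so consecutive days walk
    # through the catalogue in order and wrap around at the end.
    return catalogue[day.toordinal() % len(catalogue)]


if __name__ == "__main__":
    print(pick_dataset(date.today()))
```

Because the choice depends only on the date, re-running the job on the same day is idempotent.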


· About Me ·

MS Data Analytics student at UIS and ex-Data Analyst at Tredence Analytics (2+ yrs, Fortune 500 client). I build end-to-end ML pipelines, predictive models, and data visualizations — from churn models driving $4M/month revenue to fine-tuning transformers with QLoRA on a single GPU.

  • 💼 Ex-Data Analyst @ Tredence Analytics · Bangalore, Karnataka (India)
  • 🥽 Graduate Research Assistant (VR Data Visualization Research | Meta Quest 3 + Unity)
  • 🎓 Master of Science in Data Analytics · University of Illinois at Springfield (2024–2025)   
  • 🎓 Bachelor of Technology in Computer Science & Engineering · National Institute of Technology, Hamirpur (2018–2022)   
  • 📍 Based out of Springfield, IL (USA)   

· Tech Stack ·

ML & Deep Learning

TensorFlow · PyTorch · Keras · scikit-learn · HuggingFace · W&B

Languages & Scripting

Python · SQL · R · C# · Bash

Data & Visualization

Pandas · NumPy · Tableau · Power BI · Plotly · Matplotlib

Engineering & Cloud

Apache Airflow · AWS · GCP · Azure · Hadoop · SparkSQL

Design & XR

Unity · Figma · Adobe XD


· Projects ·

| Project | Description | Stack | Result |
|---|---|---|---|
| 🌍 VasuDev | Global news intelligence platform: a real-time pulse map of every country with multi-perspective AI briefings, freeform chat, and a historical Time Machine | React · Leaflet · FastAPI · Supabase · Groq | PWA · 195 countries |
| 📋 OpenReturn | Free AI-powered conversational assistant for US federal tax guidance. Uses RAG over IRS documents (2023–2025) and Llama 3.3 70B via Groq to give plain-English, line-by-line help and generate a personalized PDF roadmap | Streamlit · LangGraph · LangChain · ChromaDB · Groq (Llama 3.3 70B) · ReportLab | Live on Hugging Face Spaces |
| 🤙 Formal → Gen Z Translator | Fine-tuned FLAN-T5 with QLoRA on 28k formal → Gen Z pairs; added 210 emoji/slang tokens to the vocabulary | HuggingFace · QLoRA · W&B | BLEU 42 → 58 · ROUGE-L 0.54 → 0.76 |
| 🧮 GSM8K Math Reasoning | Fine-tuned FLAN-T5 with 4-bit QLoRA on 8.5k math word problems | HuggingFace · QLoRA · BitsAndBytes | Exact-match evaluation on GSM8K |
| 📈 Time Series Forecasting | Oil temperature prediction on ETTh1, comparing LSTM, 1D CNN, and N-BEATS | TensorFlow · N-BEATS | MAE 0.76 °C vs 1.99 °C for the naive baseline (62% lower) |
| 🖼️ Image Segmentation U-Net | Custom U-Net with a VGG16 encoder for multi-class semantic segmentation | TensorFlow · U-Net · VGG16 | MeanIoU across 8 classes |
| ✂️ RPS Classifier (CNN) | Five CNN architectures plus DenseNet121 transfer learning for rock-paper-scissors (RPS) image classification | TensorFlow · DenseNet121 | ~99% val accuracy |
| ✈️ ANN from Scratch: Regression | Two-layer → multi-layer ANN built from TensorFlow primitives for flight price regression | TensorFlow · GradientTape | RMSE compared against a linear regression baseline |
| 🔢 MNIST ANN from Scratch | Multi-class ANN using raw TF ops: GradientTape, matmul, He initialization, softmax | TensorFlow · GradientTape | ~91% val accuracy |
| ⚡ MNIST ANN: Improved | The same ANN upgraded with Adam, mini-batch gradient descent, BatchNorm, dropout, L2 regularization, and reduce-on-plateau learning-rate scheduling | TensorFlow · Adam · BatchNorm | ~96% val accuracy |
| 🧪 MNIST with Keras | Sequential API progression: baseline → BatchNorm → dropout → Hyperband tuning | Keras · Keras Tuner · Hyperband | 98.3% val accuracy |
| ⚖️ Recidivism Prediction | Lasso, Ridge, RF, GBM, SVM, and NN models with Shapley values and fairness metrics | R · caret · glmnet · keras3 | RF best at 79.6% · fairness analysis |
| 🐘 Guess the Animal Game | Interactive ML game: the model identifies your animal through yes/no questions (AWA2, R + Python) | Scikit-learn · Decision Tree · R | Clean tree-traversal game loop |
| 📰 Fake News Detection | Naive Bayes vs ANN comparison for fake news classification | R · Keras · NLP | Naive Bayes 91.1% |
| 📣 Marketing Campaign KNN | KNN and Naive Bayes for campaign outcome prediction | R · Scikit-learn | KNN 75% · NB 72% |
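For the GSM8K project, "exact match" usually means comparing final numeric answers. GSM8K reference solutions end with a line of the form `#### <answer>`. The sketch below assumes the model's answer is the last number it generates. That is a common convention, but not necessarily the one this project used. All helper names are hypothetical.

```python
import re
from typing import Optional

# Integers or decimals, optionally signed, allowing thousands separators.
NUMBER_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def normalize(num: str) -> str:
    """Drop thousands separators and trailing zeros: '1,000.0' -> '1000'."""
    num = num.replace(",", "")
    if "." in num:
        num = num.rstrip("0").rstrip(".")
    return num


def gold_answer(reference: str) -> str:
    """GSM8K reference solutions end with '#### <answer>'."""
    return normalize(reference.split("####")[-1].strip())


def predicted_answer(generation: str) -> Optional[str]:
    """Assumed convention: the last number the model writes is its answer."""
    matches = NUMBER_RE.findall(generation)
    return normalize(matches[-1]) if matches else None


def exact_match(generations: list[str], references: list[str]) -> float:
    """Share of examples whose predicted number equals the gold number."""
    if not references:
        return 0.0
    hits = sum(
        predicted_answer(gen) == gold_answer(ref)
        for gen, ref in zip(generations, references)
    )
    return hits / len(references)
```

Normalizing both sides keeps formatting differences such as `1,000` versus `1000` from being counted as wrong answers.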
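The 62% in the time-series row is the relative reduction in MAE against the naive baseline: (1.99 - 0.76) / 1.99 ≈ 0.618. The sketch below computes MAE, a naive forecast, and that relative improvement. Treating "naive" as one-step persistence (predict the previous observation) is an assumption, because the README does not say which naive baseline was used.

```python
def mae(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def persistence_forecast(series: list[float]) -> tuple[list[float], list[float]]:
    """Naive one-step forecast: predict each value as the previous observation.

    Returns (targets, predictions) aligned so they can be passed to mae().
    """
    return series[1:], series[:-1]


def relative_improvement(model_mae: float, baseline_mae: float) -> float:
    """Fractional reduction in error versus the baseline (higher is better)."""
    return (baseline_mae - model_mae) / baseline_mae


if __name__ == "__main__":
    # Figures reported in the table: model MAE 0.76 °C, naive MAE 1.99 °C.
    print(f"{relative_improvement(0.76, 1.99):.1%}")  # 61.8%
```

Reporting the gain as a relative error reduction makes it clear that lower MAE is better, which an upward arrow alone can obscure.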
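MeanIoU, the metric reported for the U-Net row, averages per-class intersection-over-union, where IoU for class c is TP / (TP + FP + FN). Below is a pure-Python sketch over flattened label maps that follows that standard definition. It is not the project's code. Skipping classes that appear in neither the prediction nor the ground truth is an assumption, and implementations differ on this point.

```python
def mean_iou(y_true: list[int], y_pred: list[int], num_classes: int) -> float:
    """Mean intersection-over-union across classes for flattened label maps."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion matrix: rows = true class, columns = predicted class.
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                 # true c, predicted other
        fp = sum(cm[r][c] for r in range(num_classes)) - tp  # predicted c, true other
        union = tp + fp + fn
        if union == 0:
            continue  # class absent from both maps: skipped (an assumption)
        ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For real segmentation masks you would flatten each H×W label map (or a whole batch) into these two lists before calling the function.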
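At its core, the Guess the Animal loop walks a binary decision tree. Each internal node asks a yes/no question about an attribute, and each leaf names an animal. The minimal sketch below uses a tiny hand-written tree with made-up questions rather than a tree learned from AWA2. `Node`, `play`, and the `answer` callback are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """Internal nodes hold a question; leaves hold an animal name."""
    question: Optional[str] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    animal: Optional[str] = None


def play(root: Node, answer: Callable[[str], bool]) -> str:
    """Follow yes/no answers from the root until a leaf is reached."""
    node = root
    while node.animal is None:
        node = node.yes if answer(node.question) else node.no
    return node.animal


# Toy tree for illustration only (not learned from AWA2).
TREE = Node(
    question="Does it have stripes?",
    yes=Node(animal="zebra"),
    no=Node(
        question="Does it live in water?",
        yes=Node(animal="dolphin"),
        no=Node(animal="elephant"),
    ),
)
```

To play interactively, pass a callback that reads from the keyboard, such as `play(TREE, lambda q: input(q + " (y/n) ").strip().lower().startswith("y"))`. Keeping input behind a callback also makes the traversal easy to test.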
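In a RAG setup like OpenReturn's, the retrieval step ranks stored document chunks by embedding similarity to the user's question, then passes the best few to the LLM. The sketch below shows that ranking with cosine similarity in pure Python. The three-dimensional vectors and chunk labels are toy stand-ins for real embeddings. The code does not reproduce ChromaDB's API or the project's actual retrieval settings.

```python
import math

# Toy "embeddings": real systems use vectors with hundreds of dimensions.
CHUNKS: list[tuple[str, list[float]]] = [
    ("chunk about the standard deduction", [1.0, 0.0, 0.0]),
    ("chunk about filing status", [0.0, 1.0, 0.0]),
    ("chunk about dependents", [0.0, 0.0, 1.0]),
]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: list[float], chunks: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k chunks most similar to the query embedding."""
    ranked = sorted(chunks, key=lambda c: cosine(query, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    # A query vector pointing mostly along the first axis.
    print(top_k([0.9, 0.1, 0.0], CHUNKS, k=2))
```

A vector store replaces the linear scan with an index, but the ranking idea stays the same.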

"You are one observation in an infinite dataset. Yet the universe only knows itself through observations like you, which makes you, somehow, all of it. ॐ "

Pinned

  1. flan-t5-genz-translator (Public)

    Fine-tuning Google's FLAN-T5 to translate formal speech into Gen Z meme language

    Jupyter Notebook

  2. guess-the-animal (Public)

    An interactive machine learning game in which the model predicts the animal you're thinking of

    Jupyter Notebook

  3. OpenReturn (Public)

    Free AI-powered federal tax guidance

    Python 1

  4. VasuDev (Public)

    Global news intelligence platform

    JavaScript