dankit/README.md

Modern applied AI/ML
Projects grounded in practicality for today's world: multi-modal models, fine-tuning, agents, RAG, and other useful AI/ML tooling.

Currently applying reinforcement learning to unusual and challenging domains. Ideas include vision-language models and CAPTCHA solving.

lambda_cloud MCP — View Lambda Cloud GPU capacity in near real time: stock, alerts, optional auto-launch (“Snipe”), and instance listing and termination. Built because GPU availability was the main bottleneck for months. Supports MCP, so users can agentically orchestrate the entire ML training lifecycle via text message and stay productive even when they are away from the computer.

discord_style_sft — Multi-turn supervised fine-tuning (SFT) on raw, noisy personal conversational data, with a focus on high-quality dialogue data that instills writing style, tone, and behavior. Uses Unsloth's training library for fused MoE kernels and LoRA on qwen3.5-35b-a3b, with training choices based on model interpretability research papers and my own layer/module probing via gradient analysis, plus vLLM for efficient inference and parallelized multi-modal evals, and more.

Legal Retrieval Augmented Generation — Hosts models locally for embeddings and ranking, and uses Chroma and Elasticsearch with reciprocal rank fusion for hybrid retrieval, plus agentic search and a chatbot layer. Reranker fine-tuning runs on Google Cloud Platform (Kubernetes Engine and Compute Engine) with ephemeral/spot GPU considerations, and the training infrastructure code is included. High performance on evals across the board, at a scale of millions of embeddings and 250,000+ PDF pages of real-world data such as the US Code of Federal Regulations. 3,000 downloads on Hugging Face and growing.
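For readers unfamiliar with reciprocal rank fusion, here is a minimal sketch of the general technique. It is not the project's code: the function name, the document ids, and the constant `k=60` (a commonly used default) are assumptions for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a project setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists, e.g. one from a vector store and one from a keyword index.
dense_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is why this kind of fusion is a common way to combine dense and keyword retrieval without calibrating their raw scores against each other.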

LLM vram calculator — Heavily vibe coded, but useful for telling whether training or inference will OOM. Contains interactive visuals showing how different parameters affect VRAM: sequence length, batch size, gradient checkpointing, LoRA, optimizers, float precisions, and model weights. A refactor is planned.
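As a rough illustration of the kind of arithmetic such a calculator performs, the sketch below estimates only the static memory of training: weights, gradients, and optimizer state. It is not the calculator's actual formula. The function name and defaults are assumptions (16-bit weights and gradients, an Adam-style optimizer keeping two 32-bit states per trainable parameter), and activation memory, which is what sequence length, batch size, and gradient checkpointing mainly affect, is deliberately left out.

```python
def estimate_static_training_vram_gb(
    n_params,
    weight_bytes=2,               # assumed 16-bit weights
    grad_bytes=2,                 # assumed 16-bit gradients
    optimizer_bytes_per_param=8,  # assumed Adam-style: two 32-bit states
    trainable_fraction=1.0,       # below 1.0 approximates adapter-style training such as LoRA
):
    """Rough static VRAM in GB (1e9 bytes), ignoring activations and overhead."""
    weights = n_params * weight_bytes
    trainable = n_params * trainable_fraction
    grads = trainable * grad_bytes
    optimizer = trainable * optimizer_bytes_per_param
    return (weights + grads + optimizer) / 1e9


# Hypothetical 1B-parameter model: full fine-tuning vs. training 1% of the parameters.
full = estimate_static_training_vram_gb(1_000_000_000)
adapter = estimate_static_training_vram_gb(1_000_000_000, trainable_fraction=0.01)
print(f"full: {full:.1f} GB, adapter-style: {adapter:.1f} GB")
```

Under these assumptions most of the full fine-tuning cost comes from gradients and optimizer state, which is why shrinking the trainable fraction reduces the estimate so sharply.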

Foundations: applied ML theory & history

I spent a long stretch going deep on fundamentals, from math and classical ML to modern deep learning, with hands-on implementations. The main idea was to apply what I learned from research papers along the way.

language-model-pretraining — Roughly 450M-parameter modern dense transformer trained on the order of 10B tokens. Includes distributed data parallel training on 8×A100 GPUs, my own training loops with checkpointing and model loading, and a PyTorch transformer implementation with SwiGLU, RMSNorm, Grouped Query Attention (GQA), RoPE, etc.
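To make one of the listed components concrete, here is a minimal pure-Python sketch of RMSNorm on a single vector. It illustrates the general technique rather than the repository's PyTorch implementation; the function name, the `eps` value, and the default gain of all ones are assumptions.

```python
import math


def rms_norm(x, gain=None, eps=1e-6):
    """Scale a vector by the reciprocal of its root mean square.

    Unlike LayerNorm, no mean is subtracted and no bias is added.
    gain is a per-element scale (learned in practice); it defaults to ones here.
    """
    if gain is None:
        gain = [1.0] * len(x)
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [g * v * scale for v, g in zip(x, gain)]


y = rms_norm([3.0, 4.0])
print(y)  # the mean of the squared outputs is approximately 1
```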

llama_3.1_8b_base_sft — Instruction tuning of the Llama 3.1 8B base checkpoint with LoRA, plus quantization exploration, complete with evals (tinyMMLU, IFEval).

Attention is all you need — After learning classical ML theory, I worked up to understanding and implementing the original transformer paper in PyTorch: sinusoidal positional encodings, LayerNorm, and different flavors of transformers (encoder-only, decoder-only, encoder-decoder). This involved some statistics for LayerNorm/RMSNorm and some geometry/trigonometry for positional encodings, first sinusoidal and eventually RoPE.
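The trigonometry behind sinusoidal positional encodings can be sketched in a few lines. This is a plain-Python illustration of the formulation from the original transformer paper, not the repository's PyTorch code; the function name and the small table size are assumptions.

```python
import math


def sinusoidal_positional_encoding(num_positions, d_model):
    """Build a table where even columns hold sin and odd columns hold cos.

    For even column index i: angle = pos / 10000 ** (i / d_model).
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(num_positions):
        row = [0.0] * d_model
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row[i] = math.sin(angle)
            row[i + 1] = math.cos(angle)
        table.append(row)
    return table


pe = sinusoidal_positional_encoding(num_positions=4, d_model=8)
print(pe[0])  # position 0 alternates sin(0) = 0.0 and cos(0) = 1.0
```

Each column pair oscillates at a different frequency, so low columns change quickly with position and high columns change slowly.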

Other — The RAG project above touches on encoder-only transformers and their applications: from the original BERT model to variants (e.g. XLM-RoBERTa) for embeddings and reranking, plus how they're trained.

Pinned

  1. lambda_cloud_MCP Public

    GPU availability tracker with alerting & auto-provisioning right as GPUs become available. MCP for agentic handling of environment setups. Fully orchestrated via text message with Poke.

    TypeScript

  2. discord_style_sft Public

    Create high quality data from raw, messy noise. Includes gradient probing for efficient LoRA parameter targeting, unsloth fused MoE training, and evaluation pipeline with vLLM/lmms-harness/custom d…

    Python

  3. Legal-RAG Public

    Hosts models locally for embeddings and ranking, with Chroma + Elasticsearch for storage and reciprocal rank fusion, along with agentic search and a chatbot implementation

    Python 1

  4. language-model-pretraining Public

    450M-parameter dense transformer (before weight tying), inspired by the GPT and Llama papers and trained on 10B tokens of web data

    Python 1