|
15 | 15 | sections: |
16 | 16 | intro: |
17 | 17 | - "Data and infrastructure engineer with 4 years of experience bridging traditional data engineering with modern AI systems, specializing in Python, SQL, and cloud platforms (AWS/Azure/GCP). Expertise in data architecture, warehousing, data lakes, and governance." |
18 | | - - "Proven track record of exceptional performance optimization—including **reducing runtime from 27.5-hours to under 5 seconds**, building scalable data pipelines, and production **Agentic LLM systems (>95% accuracy) on H100 clusters**." |
19 | | - - "Delivered measurable business outcomes: data infrastructure enabling a startup's **$6M Series A**, GDPR-compliant extracts for Advent International investor review, enterprise compliance for a Fortune 500 healthcare client, and real-time **data pipelines saving 10-15 FTEs**." |
| 18 | + - "Proven track record of exceptional performance optimization (reducing a financial procedure **from 27.5h to under 5s**), building production **Agentic LLM systems (>95% accuracy) on H100 clusters**, and authoring **open-source tools with 27,000+ PyPI downloads**." |
| 19 | + - "Delivered measurable business outcomes: data infrastructure enabling a startup's **$6M Series A**, unblocking **investor audits for Advent International**, ensuring enterprise compliance for a Fortune 500 client, and automating workflows to save **10-15 FTEs**." |
20 | 20 | technologies: |
21 | 21 | - label: "Data Engineering" |
22 | 22 | details: "SQL, Databricks, BigQuery, MS Fabric, PySpark, DuckDB, DBT, Airflow, Kafka, ADF, Delta Lake, Pandas, Presto/Trino" |
23 | 23 | - label: "AI & LLM Ops" |
24 | | - details: "OpenAI, Gemini, LangChain, RAG, Agentic AI (MCP, A2A), Fine-tuning (SFT/DPO), Vector DBs (Chroma/FAISS), Neo4j" |
| 24 | + details: "OpenAI, Gemini, LangChain, RAG, Agentic AI (MCP, A2A), Fine-tuning (SFT/DPO), Vector DBs (Chroma/Qdrant), Neo4j" |
25 | 25 | - label: "Backend, DevOps & Viz" |
26 | 26 | details: "Python, BASH, Git, FastAPI, PostgreSQL, MongoDB, AWS, Azure, GCP, Docker, K8s, Terraform, CI/CD, Power BI" |
27 | 27 | experience: |
|
36 | 36 | - "**Johnson & Johnson (Healthcare CDP)**: Architected scalable Treasure Data pipelines for regulated healthcare markets (JP/ANZ). Engineered complex transformation logic, including a 12-scenario SQL truth table (validated via TDD) to resolve hierarchical consent conflicts and a custom Python engine to overcome SFMC platform limitations, ensuring strict **GDPR/PII compliance**." |
37 | 37 | - "**Wade Insight (Cloud Migration)**: Led enterprise migration from Azure Data Factory to **Microsoft Fabric Data Factory**, managing ARM template adaptation. Enhanced the SaaS platform with advanced continue-on-failure orchestration and automated health-check processes, reducing manual troubleshooting by 80%." |
38 | 38 | - "**Prospexs (GenAI Product Engineering)**: Built an AI outreach platform (Python/FastAPI/MongoDB) backed by a **700M+ contact database**. Engineered a complex 4-entity personalization engine (Sender/Receiver × Human/Company) using OpenAI/Perplexity to generate bilingual, context-aware communications, **improving response rates by 45%**." |
39 | | - - "**QxLab (LLM Infrastructure)**: Led a 4-person team fine-tuning Llama/Mistral models using Axolotl on **H100 clusters** to achieve **\\>95% accuracy** in a production DAG-based agentic system. Separately, architected a Terabyte-scale cascading deduplication pipeline (MinHash LSH + FAISS) to standardize datasets (SFT/DPO/RLHF). Later open-sourced as DatasetPipeline." |
| 39 | + - "**QxLab (LLM Infrastructure)**: Led a 4-person team fine-tuning Llama/Mistral models using Axolotl on **H100 clusters** to achieve **\\>95% accuracy** in a production DAG-based agentic system. Separately, architected a terabyte-scale cascading deduplication pipeline (MinHash LSH + FAISS) to standardize datasets (SFT/DPO/RLHF). Later open-sourced as **DatasetPipeline (6.7k+ downloads)**." |
40 | 40 | - "**CV Advisors (Performance Engineering)**: Rewrote a 1900-line MSSQL procedure (iterative cursors) as vectorized logic (**DuckDB**/Pandas), processing 80M+ records across 150 clients. **Reduced runtime from 27.5h to \\<5s (\\>99.9% gain)**. Engineered a future-proof architecture capable of scaling beyond RAM limits via memory-optimized categorical typing and DuckDB’s out-of-core processing." |
41 | 41 | # - "**Logical Contract (Legal Tech)**: Implemented an AI-powered legal tech system for generating tailored employment agreements and a legal chatbot for startup inquiries." |
42 | 42 | - "**SlideNinja (GenAI RAG Architecture)**: Developed a GenAI RAG platform (LangChain/ChromaDB) for a **McKinsey partner** during the early GenAI boom, featuring a 'Self-Healing' AI orchestration layer and proprietary 'Geometric Layout Analysis' to map unstructured content into rigid corporate PowerPoint templates." |
|
59 | 59 | start_date: "2018-07" |
60 | 60 | end_date: "2022-06" |
61 | 61 | highlights: |
62 | | - - "**CGPA:** 8.57/10 ([View Certificate](https://drive.google.com/file/d/1xX8XtAlEEnXSgkcJ3jYmYSJOQWRylA6S/view))" |
| 62 | + - "**CGPA:** 8.57/10 ([B.Tech Certificate](https://drive.google.com/file/d/1xX8XtAlEEnXSgkcJ3jYmYSJOQWRylA6S/view))" |
63 | 63 | - "**Google DSC Lead (2020-2021):** Led developer community, organized workshops and tech talks. [Google Dev Profile](https://g.dev/subhayu99)" |
64 | | - - "**CodeChef Chapter Lead (2020-2021):** Hosted 4 competitive programming events (4 to 21 students). [View Certificate](https://drive.google.com/file/d/1NYokFweEGTAfgL3sZO6qeEiyC6HD9pQ7/view)" |
| 64 | + - "**CodeChef Chapter Lead (2020-2021):** Hosted 4 competitive programming events (4 to 21 students). [CodeChef Certificate](https://drive.google.com/file/d/1NYokFweEGTAfgL3sZO6qeEiyC6HD9pQ7/view)" |
65 | 65 | - "**Linux Foundation Scholarship Recipient:** Merit selection for LFS201 Linux System Administration. [Credly Profile](https://www.credly.com/users/subhayu99/badges)" |
66 | 66 | professional_projects: |
67 | 67 | # Website-only projects (show_on_resume: false) |
@@ -172,52 +172,55 @@ cv: |
172 | 172 | - "Created and managed CI/CD pipelines on Azure DevOps, impacting over 50k daily users." |
173 | 173 |
|
174 | 174 | personal_projects: |
175 | | - # TODO: Need to update the downloads count for each as they have grown quite a bit now |
176 | 175 | # Resume + Website projects |
177 | 176 | - name: "SQLStream" |
178 | 177 | date: "Nov 2025" |
179 | 178 | show_on_resume: true |
180 | 179 | highlights: |
181 | | - - "Created a **zero-setup SQL tool** to quickly analyze data files (CSV, Parquet, Markdown, HTML, etc.) via a CLI or Python library, streamlining repetitive data quality checks without database overhead." |
| 180 | + - "Created a **zero-setup SQL tool** to quickly analyze data files (CSV, Parquet, Markdown, HTML) via CLI or Python, streamlining repetitive data quality checks without DB overhead." |
182 | 181 | - "Built a powerful interactive shell with syntax highlighting and history for data exploration, alongside a standard query mode for piping results into other terminal tools." |
183 | | - - "Engineered a flexible system that auto-switches between **DuckDB**, Pandas, or native Python to execute queries, implementing smart optimizations to filter data before loading; published on [PyPI](https://pypi.org/project/sqlstream) with full [documentation](https://subhayu99.github.io/sqlstream)." |
184 | | - - name: "DatasetPipeline" |
185 | | - date: "May 2025" |
186 | | - show_on_resume: true |
187 | | - highlights: |
188 | | - - "Developed a **production-ready CLI tool** for transforming messy datasets into ML-ready formats; supports **SFT, DPO, semantic deduplication**, and quality analysis with plugin architecture." |
189 | | - - "Features smart role mapping, auto-formatting for OpenAI-style training, and reproducible workflows via YAML/JSON configuration for enterprise ML pipelines." |
190 | | - - "Published on [PyPI](https://pypi.org/project/datasetpipeline) and open-sourced on [GitHub](https://github.com/subhayu99/datasetpipeline) with extensible architecture for custom loaders, formatters, and analyzers." |
| 182 | + - "Engineered a flexible engine that auto-switches between **DuckDB**, Pandas, or native Python to execute queries; published on [PyPI](https://pypi.org/project/sqlstream) (**3.4k+ downloads**) with full [documentation](https://subhayu99.github.io/sqlstream)." |
| 183 | + |
191 | 184 | - name: "Smart Commit" |
192 | 185 | date: "May 2025" |
193 | 186 | show_on_resume: true |
194 | 187 | highlights: |
195 | | - - "Built an AI-powered CLI tool using Python/Typer that generates context-aware git commits via OpenAI or Anthropic models, enhancing developer productivity and project history quality." |
196 | | - - "Intelligently adapts to different projects by analyzing the existing tech stack, commit history, and file changes, ensuring contextually relevant and consistent messages." |
197 | | - - "Published on [PyPI](https://pypi.org/project/smart-commit-ai) and [GitHub](https://github.com/subhayu99/smart-commit) with **Model Context Protocol (MCP) server** for direct AI assistant integration." |
| 188 | + - "Built an AI-powered CLI tool using Python/Typer that generates context-aware git commits via OpenAI or Anthropic models, enhancing developer productivity." |
| 189 | + - "Intelligently adapts to different projects by analyzing the existing tech stack, commit history, and file changes, ensuring contextually relevant messages." |
| 190 | + - "Published on [PyPI](https://pypi.org/project/smart-commit-ai) (**5k+ downloads**) and [GitHub](https://github.com/subhayu99/smart-commit) with **Model Context Protocol (MCP)** integration." |
| 191 | + |
198 | 192 | - name: "DocumentAccessPOC" |
199 | 193 | date: "Jan 2025" |
200 | 194 | show_on_resume: true |
201 | 195 | highlights: |
202 | | - - "Designed and **built a zero-trust secure document system** to solve granular access control challenges where traditional RBAC/ACLs fail, ensuring data confidentiality even from system administrators." |
203 | | - - "Implemented a robust cryptographic model featuring end-to-end encryption (AES-GCM) and secure key exchange (RSA) to enforce permissions at a data level, not just application logic." |
| 196 | + - "Designed and **built a zero-trust secure document system** to solve granular access control challenges where traditional RBAC/ACLs fail." |
| 197 | + - "Implemented a robust cryptographic model featuring **end-to-end encryption (AES-GCM)** and secure key exchange (RSA) to enforce permissions at a data level." |
204 | 198 | - "Built a FastAPI interface for secure document sharing and revocation; project is on [GitHub](https://github.com/subhayu99/DocumentAccessPOC) with detailed [documentation](https://subhayu99.github.io/DocumentAccessPOC/)." |
205 | 199 |
|
206 | 200 | # Website-only personal projects |
| 201 | + - name: "DatasetPipeline" |
| 202 | + date: "May 2025" |
| 203 | + show_on_resume: false # <--- FALSE because it's already in QxLab Experience |
| 204 | + highlights: |
| 205 | + - "Developed a **production-ready CLI tool** for transforming messy datasets into ML-ready formats; supports **SFT, DPO, semantic deduplication**, and quality analysis." |
| 206 | + - "Features smart role mapping, auto-formatting for OpenAI-style training, and reproducible workflows via YAML/JSON configuration for enterprise ML pipelines." |
| 207 | + - "Published on [PyPI](https://pypi.org/project/datasetpipeline) (**6.7k+ downloads**) and open-sourced on [GitHub](https://github.com/subhayu99/datasetpipeline) with extensible architecture for custom loaders, formatters, and analyzers." |
| 208 | + |
207 | 209 | - name: "creatree" |
208 | 210 | date: "Feb 2025" |
209 | 211 | show_on_resume: false |
210 | 212 | highlights: |
211 | | - - "Developed a Python CLI tool and library for automating directory structure creation from tree-like strings, solving the repetitive manual setup of project scaffolding and templates." |
212 | | - - "Designed for maximum flexibility with both a Python library for programmatic integration and a CLI supporting stdin/pipes, including a unique feature to inject setup comments into new files." |
213 | | - - "Published on [PyPI](https://pypi.org/project/creatree) and open-sourced on [GitHub](https://github.com/subhayu99/creatree) with both programmatic API and CLI interfaces supporting stdin for flexible scripting workflows." |
| 213 | + - "Developed a Python CLI tool and library for automating directory structure creation from tree-like strings, solving repetitive project scaffolding." |
| 214 | + - "Designed for maximum flexibility with both a Python library for programmatic integration and a CLI supporting stdin/pipes." |
| 215 | + - "Published on [PyPI](https://pypi.org/project/creatree) (**4.3k+ downloads**) and open-sourced on [GitHub](https://github.com/subhayu99/creatree)." |
| 216 | + |
214 | 217 | - name: "BetterPassphrase" |
215 | 218 | date: "Jan 2024" |
216 | 219 | show_on_resume: false |
217 | 220 | highlights: |
218 | | - - "Built a Python CLI tool and library for generating secure, memorable passphrases using grammatically correct phrases with customizable word counts, separators, and capitalization options." |
219 | | - - "Implemented probabilistic security analysis with entropy calculations and parts-of-speech parsing to balance memorability with cryptographic strength for authentication systems." |
220 | | - - "Published on [PyPI](https://pypi.org/project/BetterPassphrase) and open-sourced on [GitHub](https://github.com/subhayu99/BetterPassphrase) with comprehensive CLI interface supporting batch generation and file output for enterprise workflows." |
| 221 | + - "Built a Python CLI tool/library for generating secure, memorable passphrases using grammatically correct phrases with customizable word counts." |
| 222 | + - "Implemented probabilistic security analysis with entropy calculations and parts-of-speech parsing to balance memorability with cryptographic strength." |
| 223 | + - "Published on [PyPI](https://pypi.org/project/BetterPassphrase) (**7.6k+ downloads**) and open-sourced on [GitHub](https://github.com/subhayu99/BetterPassphrase) supporting batch generation." |
221 | 224 | - name: "FINADICT - Financial Prediction App" |
222 | 225 | date: "Sep 2021" |
223 | 226 | show_on_resume: false |
|