Commit ab429a9

refactor(resume): update project metrics and descriptions
- Update download counts for various open-source projects
- Refine project descriptions for clarity and impact
- Adjust terminology for consistency and modern relevance
- Move DatasetPipeline to website-only projects for clarity

These changes enhance the resume's accuracy and presentation by providing up-to-date metrics and clearer project descriptions, ensuring consistency across sections.
1 parent 04ca4cc commit ab429a9

1 file changed

Lines changed: 30 additions & 27 deletions

resume.yaml

@@ -15,13 +15,13 @@ cv:
  sections:
  intro:
  - "Data and infrastructure engineer with 4 years of experience bridging traditional data engineering with modern AI systems, specializing in Python, SQL, and cloud platforms (AWS/Azure/GCP). Expertise in data architecture, warehousing, data lakes, and governance."
- - "Proven track record of exceptional performance optimization—including **reducing runtime from 27.5-hours to under 5 seconds**, building scalable data pipelines, and production **Agentic LLM systems (>95% accuracy) on H100 clusters**."
- - "Delivered measurable business outcomes: data infrastructure enabling a startup's **$6M Series A**, GDPR-compliant extracts for Advent International investor review, enterprise compliance for a Fortune 500 healthcare client, and real-time **data pipelines saving 10-15 FTEs**."
+ - "Proven track record of exceptional performance optimization (reducing a financial procedure **from 27.5h to under 5s**), building production **Agentic LLM systems (>95% accuracy) on H100 clusters**, and authoring **open-source tools with 27,000+ PyPI downloads**."
+ - "Delivered measurable business outcomes: data infrastructure enabling a startup's **$6M Series A**, unblocking **investor audits for Advent International**, ensuring enterprise compliance for a Fortune 500 client, and automating workflows to save **10-15 FTEs**."
  technologies:
  - label: "Data Engineering"
  details: "SQL, Databricks, BigQuery, MS Fabric, PySpark, DuckDB, DBT, Airflow, Kafka, ADF, Delta Lake, Pandas, Presto/Trino"
  - label: "AI & LLM Ops"
- details: "OpenAI, Gemini, LangChain, RAG, Agentic AI (MCP, A2A), Fine-tuning (SFT/DPO), Vector DBs (Chroma/FAISS), Neo4j"
+ details: "OpenAI, Gemini, LangChain, RAG, Agentic AI (MCP, A2A), Fine-tuning (SFT/DPO), Vector DBs (Chroma/Qdrant), Neo4j"
  - label: "Backend, DevOps & Viz"
  details: "Python, BASH, Git, FastAPI, PostgreSQL, MongoDB, AWS, Azure, GCP, Docker, K8s, Terraform, CI/CD, Power BI"
  experience:
@@ -36,7 +36,7 @@ cv:
  - "**Johnson & Johnson (Healthcare CDP)**: Architected scalable Treasure Data pipelines for regulated healthcare markets (JP/ANZ). Engineered complex transformation logic, including a 12-scenario SQL truth table (validated via TDD) to resolve hierarchical consent conflicts and a custom Python engine to overcome SFMC platform limitations, ensuring strict **GDPR/PII compliance**."
  - "**Wade Insight (Cloud Migration)**: Led enterprise migration from Azure Data Factory to **Microsoft Fabric Data Factory**, managing ARM template adaptation. Enhanced the SaaS platform with advanced continue-on-failure orchestration and automated health-check processes, reducing manual troubleshooting by 80%."
  - "**Prospexs (GenAI Product Engineering)**: Built an AI outreach platform (Python/FastAPI/MongoDB) backed by a **700M+ contact database**. Engineered a complex 4-entity personalization engine (Sender/Receiver × Human/Company) using OpenAI/Perplexity to generate bilingual, context-aware communications, **improving response rates by 45%**."
- - "**QxLab (LLM Infrastructure)**: Led a 4-person team fine-tuning Llama/Mistral models using Axolotl on **H100 clusters** to achieve **\\>95% accuracy** in a production DAG-based agentic system. Separately, architected a Terabyte-scale cascading deduplication pipeline (MinHash LSH + FAISS) to standardize datasets (SFT/DPO/RLHF). Later open-sourced as DatasetPipeline."
+ - "**QxLab (LLM Infrastructure)**: Led a 4-person team fine-tuning Llama/Mistral models using Axolotl on **H100 clusters** to achieve **\\>95% accuracy** in a production DAG-based agentic system. Separately, architected a Terabyte-scale cascading deduplication pipeline (MinHash LSH + FAISS) to standardize datasets (SFT/DPO/RLHF). Later open-sourced as **DatasetPipeline (6.7k+ downloads)**."
  - "**CV Advisors (Performance Engineering)**: Replaced a 1900-line MSSQL procedure (iterative cursors) into vectorized logic (**DuckDB**/Pandas), processing 80M+ records across 150 clients. **Reduced runtime from 27.5h to \\<5s (\\>99.9% gain)**. Engineered a future-proof architecture capable of scaling beyond RAM limits via memory-optimized categorical typing and DuckDB’s out-of-core processing."
  # - "**Logical Contract (Legal Tech)**: Implemented an AI-powered legal tech system for generating tailored employment agreements and a legal chatbot for startup inquiries."
  - "**SlideNinja (GenAI RAG Architecture)**: Developed a GenAI RAG platform (LangChain/ChromaDB) for a **McKinsey partner** during the early GenAI boom, featuring a 'Self-Healing' AI orchestration layer and proprietary 'Geometric Layout Analysis' to map unstructured content into rigid corporate PowerPoint templates."
@@ -59,9 +59,9 @@ cv:
  start_date: "2018-07"
  end_date: "2022-06"
  highlights:
- - "**CGPA:** 8.57/10 ([View Certificate](https://drive.google.com/file/d/1xX8XtAlEEnXSgkcJ3jYmYSJOQWRylA6S/view))"
+ - "**CGPA:** 8.57/10 ([B.Tech Certificate](https://drive.google.com/file/d/1xX8XtAlEEnXSgkcJ3jYmYSJOQWRylA6S/view))"
  - "**Google DSC Lead (2020-2021):** Led developer community, organized workshops and tech talks. [Google Dev Profile](https://g.dev/subhayu99)"
- - "**CodeChef Chapter Lead (2020-2021):** Hosted 4 competitive programming events (4 to 21 students). [View Certificate](https://drive.google.com/file/d/1NYokFweEGTAfgL3sZO6qeEiyC6HD9pQ7/view)"
+ - "**CodeChef Chapter Lead (2020-2021):** Hosted 4 competitive programming events (4 to 21 students). [CodeChef Certificate](https://drive.google.com/file/d/1NYokFweEGTAfgL3sZO6qeEiyC6HD9pQ7/view)"
  - "**Linux Foundation Scholarship Recipient:** Merit selection for LFS201 Linux System Administration. [Credly Profile](https://www.credly.com/users/subhayu99/badges)"
  professional_projects:
  # Website-only projects (show_on_resume: false)
@@ -172,52 +172,55 @@ cv:
  - "Created and managed CI/CD pipelines on Azure DevOps, impacting over 50k daily users."

  personal_projects:
- # TODO: Need to update the downloads count for each as they have grown quite a bit now
  # Resume + Website projects
  - name: "SQLStream"
  date: "Nov 2025"
  show_on_resume: true
  highlights:
- - "Created a **zero-setup SQL tool** to quickly analyze data files (CSV, Parquet, Markdown, HTML, etc.) via a CLI or Python library, streamlining repetitive data quality checks without database overhead."
+ - "Created a **zero-setup SQL tool** to quickly analyze data files (CSV, Parquet, Markdown, HTML) via CLI or Python, streamlining repetitive data quality checks without DB overhead."
  - "Built a powerful interactive shell with syntax highlighting and history for data exploration, alongside a standard query mode for piping results into other terminal tools."
- - "Engineered a flexible system that auto-switches between **DuckDB**, Pandas, or native Python to execute queries, implementing smart optimizations to filter data before loading; published on [PyPI](https://pypi.org/project/sqlstream) with full [documentation](https://subhayu99.github.io/sqlstream)."
- - name: "DatasetPipeline"
- date: "May 2025"
- show_on_resume: true
- highlights:
- - "Developed a **production-ready CLI tool** for transforming messy datasets into ML-ready formats; supports **SFT, DPO, semantic deduplication**, and quality analysis with plugin architecture."
- - "Features smart role mapping, auto-formatting for OpenAI-style training, and reproducible workflows via YAML/JSON configuration for enterprise ML pipelines."
- - "Published on [PyPI](https://pypi.org/project/datasetpipeline) and open-sourced on [GitHub](https://github.com/subhayu99/datasetpipeline) with extensible architecture for custom loaders, formatters, and analyzers."
+ - "Engineered a flexible engine that auto-switches between **DuckDB**, Pandas, or native Python to execute queries; published on [PyPI](https://pypi.org/project/sqlstream) (**3.4k+ downloads**) with full [documentation](https://subhayu99.github.io/sqlstream)."
+
  - name: "Smart Commit"
  date: "May 2025"
  show_on_resume: true
  highlights:
- - "Built an AI-powered CLI tool using Python/Typer that generates context-aware git commits via OpenAI or Anthropic models, enhancing developer productivity and project history quality."
- - "Intelligently adapts to different projects by analyzing the existing tech stack, commit history, and file changes, ensuring contextually relevant and consistent messages."
- - "Published on [PyPI](https://pypi.org/project/smart-commit-ai) and [GitHub](https://github.com/subhayu99/smart-commit) with **Model Context Protocol (MCP) server** for direct AI assistant integration."
+ - "Built an AI-powered CLI tool using Python/Typer that generates context-aware git commits via OpenAI or Anthropic models, enhancing developer productivity."
+ - "Intelligently adapts to different projects by analyzing the existing tech stack, commit history, and file changes, ensuring contextually relevant messages."
+ - "Published on [PyPI](https://pypi.org/project/smart-commit-ai) (**5k+ downloads**) and [GitHub](https://github.com/subhayu99/smart-commit) with **Model Context Protocol (MCP)** integration."
+
  - name: "DocumentAccessPOC"
  date: "Jan 2025"
  show_on_resume: true
  highlights:
- - "Designed and **built a zero-trust secure document system** to solve granular access control challenges where traditional RBAC/ACLs fail, ensuring data confidentiality even from system administrators."
- - "Implemented a robust cryptographic model featuring end-to-end encryption (AES-GCM) and secure key exchange (RSA) to enforce permissions at a data level, not just application logic."
+ - "Designed and **built a zero-trust secure document system** to solve granular access control challenges where traditional RBAC/ACLs fail."
+ - "Implemented a robust cryptographic model featuring **end-to-end encryption (AES-GCM)** and secure key exchange (RSA) to enforce permissions at a data level."
  - "Built a FastAPI interface for secure document sharing and revocation; project is on [GitHub](https://github.com/subhayu99/DocumentAccessPOC) with detailed [documentation](https://subhayu99.github.io/DocumentAccessPOC/)."

  # Website-only personal projects
+ - name: "DatasetPipeline"
+ date: "May 2025"
+ show_on_resume: false # <--- FALSE because it's already in QxLab Experience
+ highlights:
+ - "Developed a **production-ready CLI tool** for transforming messy datasets into ML-ready formats; supports **SFT, DPO, semantic deduplication**, and quality analysis."
+ - "Features smart role mapping, auto-formatting for OpenAI-style training, and reproducible workflows via YAML/JSON configuration for enterprise ML pipelines."
+ - "Published on [PyPI](https://pypi.org/project/datasetpipeline) (**6.7k+ downloads**) and open-sourced on [GitHub](https://github.com/subhayu99/datasetpipeline) with extensible architecture for custom loaders, formatters, and analyzers."
+
  - name: "creatree"
  date: "Feb 2025"
  show_on_resume: false
  highlights:
- - "Developed a Python CLI tool and library for automating directory structure creation from tree-like strings, solving the repetitive manual setup of project scaffolding and templates."
- - "Designed for maximum flexibility with both a Python library for programmatic integration and a CLI supporting stdin/pipes, including a unique feature to inject setup comments into new files."
- - "Published on [PyPI](https://pypi.org/project/creatree) and open-sourced on [GitHub](https://github.com/subhayu99/creatree) with both programmatic API and CLI interfaces supporting stdin for flexible scripting workflows."
+ - "Developed a Python CLI tool and library for automating directory structure creation from tree-like strings, solving repetitive project scaffolding."
+ - "Designed for maximum flexibility with both a Python library for programmatic integration and a CLI supporting stdin/pipes."
+ - "Published on [PyPI](https://pypi.org/project/creatree) (**4.3k+ downloads**) and open-sourced on [GitHub](https://github.com/subhayu99/creatree)."
+
  - name: "BetterPassphrase"
  date: "Jan 2024"
  show_on_resume: false
  highlights:
- - "Built a Python CLI tool and library for generating secure, memorable passphrases using grammatically correct phrases with customizable word counts, separators, and capitalization options."
- - "Implemented probabilistic security analysis with entropy calculations and parts-of-speech parsing to balance memorability with cryptographic strength for authentication systems."
- - "Published on [PyPI](https://pypi.org/project/BetterPassphrase) and open-sourced on [GitHub](https://github.com/subhayu99/BetterPassphrase) with comprehensive CLI interface supporting batch generation and file output for enterprise workflows."
+ - "Built a Python CLI tool/library for generating secure, memorable passphrases using grammatically correct phrases with customizable word counts."
+ - "Implemented probabilistic security analysis with entropy calculations and parts-of-speech parsing to balance memorability with cryptographic strength."
+ - "Published on [PyPI](https://pypi.org/project/BetterPassphrase) (**7.6k+ downloads**) and open-sourced on [GitHub](https://github.com/subhayu99/BetterPassphrase) supporting batch generation."
  - name: "FINADICT - Financial Prediction App"
  date: "Sep 2021"
  show_on_resume: false
