|
41 | 41 | start_date: "2022-06" |
42 | 42 | summary: "Delivered end-to-end data and AI solutions for 10+ enterprise clients across healthcare, finance, logistics, and tech sectors. Responsibilities extended beyond technical delivery: conducted 100+ technical interviews, led client onboarding and scoping calls, created PoCs/demos, mentored teammates, and drove adoption of emerging technologies (DuckDB, MCP) across the organization.\\ **Client Engagements:**" |
43 | 43 | highlights: |
44 | | - - "**SBGC Group (Retail Analytics)**: Took ownership of a multi-region Azure/Databricks platform integrating SAP/Shopify/Klaviyo/GA4 to feed Hightouch CDP. Unblocked Advent International due diligence by reverse-engineering Shopify's logic to cut revenue variance from ~10% to \\<0.22%. Root-caused a critical legacy data explosion (slashing volume 99%) and delivered an inventory dashboard worth ~$250k." |
45 | | - - "**AGY Logistics (IoT & Automation)**: Sole architect of a GDP-compliant cold chain system on GCP (BigQuery, Cloud Run). Automated complex manual workflows previously requiring 10-15 US-based operations staff, resulting in massive OpEx savings. Engineered smart time-series merging and statistical analysis on a **4-layer medallion architecture (including a 'Diamond Layer' for sub-second alerts)** to reduce false positives by 70%." |
46 | | - - "**Johnson & Johnson (Healthcare CDP)**: Architected scalable Treasure Data pipelines for regulated healthcare markets (JP/ANZ). Engineered complex transformation logic, including a **12-scenario SQL truth table to resolve hierarchical consent conflicts** and a custom Python engine to overcome SFMC platform limitations, ensuring strict GDPR/PII compliance." |
47 | | - - "**Wade Insight (Cloud Migration)**: Led enterprise migration from ADF to Microsoft Fabric Data Factory, managing ARM template adaptation. Enhanced the SaaS platform with advanced `continue_on_failure` orchestration and automated health-check processes, reducing manual troubleshooting by 80%." |
48 | | - - "**Prospexs (GenAI Product Engineering)**: Built an AI outreach platform (Python/FastAPI/MongoDB) featuring a **complex 4-entity personalization engine** (Sender/Receiver × Human/Company). Integrated OpenAI/Perplexity APIs to generate bilingual, context-aware communications, improving response rates by 45%." |
49 | | - - "**QxLab (LLM Infrastructure)**: Led a 4-person team fine-tuning Llama/Mistral models using Axolotl on H100 clusters to achieve \\>95% accuracy in production agentic system. Separately, architected a Terabyte-scale **cascading deduplication pipeline (MinHash LSH + FAISS)** to standardize datasets (SFT/DPO/RLHF), solving critical bottlenecks. Later released as the open-source project (DatasetPipeline)." |
| 44 | + - "**SBGC Group (Retail Analytics)**: Took ownership of a multi-region Azure/Databricks platform integrating SAP/Shopify/Klaviyo/GA4 to feed Hightouch CDP. Unblocked Advent International due diligence by reverse-engineering Shopify's logic to cut revenue variance from ~10% to \\<0.22%. Root-caused a critical legacy data explosion (slashing volume by 99%) and delivered an inventory dashboard worth ~$250k." |
| 45 | + - "**AGY Logistics (IoT & Automation)**: Sole architect of a GDP-compliant cold chain system on GCP (BigQuery, Cloud Run). Automated complex manual workflows previously requiring 10-15 US-based operations staff, resulting in massive OpEx savings. Engineered smart time-series merging and statistical analysis on a 4-layer medallion architecture (including a 'Diamond Layer' for sub-second alerts) to reduce false positives by 70%." |
| 46 | + - "**Johnson & Johnson (Healthcare CDP)**: Architected scalable Treasure Data pipelines for regulated healthcare markets (JP/ANZ). Engineered complex transformation logic, including a 12-scenario SQL truth table to resolve hierarchical consent conflicts and a custom Python engine to overcome SFMC platform limitations, ensuring strict GDPR/PII compliance." |
| 47 | + - "**Wade Insight (Cloud Migration)**: Led enterprise migration from ADF to Microsoft Fabric Data Factory, managing ARM template adaptation. Enhanced the SaaS platform with advanced continue-on-failure orchestration and automated health-check processes, reducing manual troubleshooting by 80%." |
| 48 | + - "**Prospexs (GenAI Product Engineering)**: Built an AI outreach platform (Python/FastAPI/MongoDB) featuring a complex 4-entity personalization engine (Sender/Receiver × Human/Company). Integrated OpenAI/Perplexity APIs to generate bilingual, context-aware communications, improving response rates by 45%." |
| 49 | + - "**QxLab (LLM Infrastructure)**: Led a 4-person team fine-tuning Llama/Mistral models using Axolotl on H100 clusters to achieve \\>95% accuracy in a production agentic system. Separately, architected a Terabyte-scale cascading deduplication pipeline (MinHash LSH + FAISS) to standardize datasets (SFT/DPO/RLHF), solving critical bottlenecks. Later released as the open-source project (DatasetPipeline)." |
50 | 50 | - "**CV Advisors (Performance Engineering)**: Rewrote a 1900-line legacy SQL procedure (iterative cursors) as vectorized logic (DuckDB/Pandas), processing 80M+ records across 150 clients. Reduced runtime from 27.5h to \\<5s (\\>99.9% gain). Engineered a future-proof architecture capable of scaling beyond RAM limits via memory-optimized categorical typing and DuckDB’s out-of-core processing." |
51 | 51 | # - "**Logical Contract (Legal Tech)**: Implemented an AI-powered legal tech system for generating tailored employment agreements and a legal chatbot for startup inquiries." |
52 | | - - "**SlideNinja (GenAI RAG Architecture)**: Developed a GenAI RAG platform **(LangChain/ChromaDB)** for a McKinsey partner during the early GenAI boom, featuring a 'Self-Healing' AI orchestration layer and proprietary 'Geometric Layout Analysis' to map unstructured content into rigid corporate PowerPoint templates." |
| 52 | + - "**SlideNinja (GenAI RAG Architecture)**: Developed a GenAI RAG platform (LangChain/ChromaDB) for a McKinsey partner during the early GenAI boom, featuring a 'Self-Healing' AI orchestration layer and proprietary 'Geometric Layout Analysis' to map unstructured content into rigid corporate PowerPoint templates." |
53 | 53 | - "**LoopKitchen (Real-time Data Ingestion)**: Built core data architecture (GCP/BigQuery/FastAPI) enabling a $6M Series A. Migrated fragile legacy scrapers to robust official API integrations (UberEats/DoorDash/Grubhub) for millions of orders and engineered an automated dispute resolution system to directly recover lost revenue." |
54 | 54 |
|
55 | 55 | - company: "FiftyFive Technologies" |
|
186 | 186 | date: "Nov 2025" |
187 | 187 | show_on_resume: true |
188 | 188 | highlights: |
189 | | - - "Created a zero-setup SQL tool to quickly analyze data files (CSV, Parquet, Markdown, HTML, etc.) via a **CLI** or **Python library**, streamlining repetitive data quality checks without database overhead." |
190 | | - - "Built a powerful **interactive shell** with syntax highlighting and history for data exploration, alongside a standard query mode for piping results into other terminal tools." |
191 | | - - "Engineered a flexible system that auto-switches between **DuckDB**, **Pandas**, or native Python to execute queries, implementing smart optimizations to filter data before loading; published on [PyPI](https://pypi.org/project/sqlstream) with full [documentation](https://subhayu99.github.io/sqlstream)." |
| 189 | + - "Created a zero-setup SQL tool to quickly analyze data files (CSV, Parquet, Markdown, HTML, etc.) via a CLI or Python library, streamlining repetitive data quality checks without database overhead." |
| 190 | + - "Built a powerful interactive shell with syntax highlighting and history for data exploration, alongside a standard query mode for piping results into other terminal tools." |
| 191 | + - "Engineered a flexible system that auto-switches between DuckDB, Pandas, or native Python to execute queries, implementing smart optimizations to filter data before loading; published on [PyPI](https://pypi.org/project/sqlstream) with full [documentation](https://subhayu99.github.io/sqlstream)." |
192 | 192 | - name: "DatasetPipeline" |
193 | 193 | date: "May 2025" |
194 | 194 | show_on_resume: true |
|