class VamsiKrishna:
    """
    Senior Data Engineer · 5+ years · India 🇮🇳
    Building the pipelines that move the world's data.
    """

    stack = {
        "cloud": ["Azure (primary)", "GCP (secondary)", "AWS (tertiary)"],
        "processing": ["PySpark", "Apache Beam", "Kafka", "Delta Lake"],
        "azure": ["ADF", "Databricks", "Synapse", "ADLS Gen2"],
        "gcp": ["BigQuery", "Bigtable", "Cloud Composer"],
        "devops": ["GitHub Actions", "Docker", "Terraform"],
        "languages": ["Python", "SQL", "Scala"],
    }
    certifications = [
        "DP-203 · Azure Data Engineer Associate ✅",
        "AZ-900 · Azure Fundamentals ✅",
    ]
    currently = "Building enterprise-grade data platforms & open-source DE tooling"
    targeting = ["Microsoft", "Google", "Databricks", "Walmart Global Tech", "PhonePe", "Flipkart"]
    available = True  # 🍍 Open to senior roles at product-based companies
    signature = "🍍"  # Because every great pipeline deserves a mark
Microsoft Certified: Azure Data Engineer Associate · ✅ Active
Microsoft Certified: Azure Fundamentals · ✅ Active
Each pipeline is a statement. Each architecture, a craft.
Enterprise ADF Pipeline Regression Framework: Automated regression testing for Azure Data Factory pipelines with GitHub Actions CI/CD, Markdown summary generation, and Databricks notebook validation. Cuts manual QA effort by 80%.
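As a rough illustration of the Markdown summary step, the sketch below compares expected and actual results per pipeline and renders a summary table. It is a minimal sketch, not the framework's code: the row-count check, the `render_summary` function, and the result field names are all hypothetical. The only real interface it touches is the `GITHUB_STEP_SUMMARY` file that GitHub Actions exposes for job summaries.

```python
import os

def render_summary(results):
    """Render regression results as Markdown and count failures.

    `results` is a list of dicts with hypothetical keys:
    pipeline, expected_rows, actual_rows.
    """
    lines = [
        "| Pipeline | Expected rows | Actual rows | Result |",
        "|---|---|---|---|",
    ]
    failures = 0
    for r in results:
        ok = r["expected_rows"] == r["actual_rows"]
        failures += not ok
        status = "PASS" if ok else "FAIL"
        lines.append(
            f"| {r['pipeline']} | {r['expected_rows']} | {r['actual_rows']} | {status} |"
        )
    header = f"{len(results) - failures} of {len(results)} checks passed"
    return "\n".join([header, ""] + lines), failures

# Illustrative data only; pipeline names are made up.
results = [
    {"pipeline": "pl_ingest_orders", "expected_rows": 100, "actual_rows": 100},
    {"pipeline": "pl_load_customers", "expected_rows": 50, "actual_rows": 48},
]
markdown, failures = render_summary(results)

# Inside a GitHub Actions job, append the report to the job summary file.
summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
if summary_path:
    with open(summary_path, "a", encoding="utf-8") as fh:
        fh.write(markdown + "\n")

print(markdown)
```

A CI step could then exit non-zero when `failures` is greater than zero so that the workflow run is marked as failed.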
High-Throughput BQ → Bigtable at Scale: Multi-column-family pipeline that builds composite row keys from a JSON mapping. Built first in PySpark, then migrated to Apache Beam on Dataflow for serverless scale.
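One way to picture the JSON mapping is as a small spec that names the columns forming the row key and assigns the remaining columns to column families. The sketch below is plain Python with no Spark or Beam dependency. The mapping schema (`row_key`, `delimiter`, `families`) and the function names are hypothetical, chosen for illustration rather than taken from the project.

```python
import json

# Hypothetical mapping format; every field name here is an assumption.
MAPPING_JSON = """
{
  "row_key": {"columns": ["tenant_id", "event_date", "user_id"], "delimiter": "#"},
  "families": {
    "profile": ["name", "country"],
    "metrics": ["clicks", "spend"]
  }
}
"""

def build_row_key(row, key_spec):
    """Join the configured columns, in order, into one composite row key."""
    parts = []
    for col in key_spec["columns"]:
        value = row.get(col)
        if value is None:
            raise ValueError(f"row key column {col!r} is missing or null")
        parts.append(str(value))
    return key_spec["delimiter"].join(parts)

def to_bigtable_cells(row, mapping):
    """Return (row_key, {family: {qualifier: bytes}}) for one source row."""
    row_key = build_row_key(row, mapping["row_key"])
    cells = {}
    for family, columns in mapping["families"].items():
        cells[family] = {
            col: str(row[col]).encode("utf-8")
            for col in columns
            if row.get(col) is not None  # skip nulls instead of writing empty cells
        }
    return row_key, cells

mapping = json.loads(MAPPING_JSON)
sample = {
    "tenant_id": "t1", "event_date": "2024-01-31", "user_id": 42,
    "name": "Asha", "country": "IN", "clicks": 7, "spend": None,
}
row_key, cells = to_bigtable_cells(sample, mapping)
print(row_key)  # t1#2024-01-31#42
print(cells)
```

Keeping the key layout in the mapping rather than in code means the same transform can serve several tables, and a function of this shape could plausibly be reused inside a PySpark UDF or a Beam `DoFn`.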
Production SCD Type 1/2 Framework: SHA-256-based change detection engine with full CDC pattern support. Relies on Delta Lake ACID guarantees and optimized merge strategies for petabyte-scale warehouses.
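The idea behind hash-based change detection can be shown without Spark: hash the tracked attributes of each incoming row, compare with the hash stored on the current dimension row, and open a new version only when the two differ. The following is a minimal in-memory sketch of the SCD Type 2 case. The column names (`row_hash`, `valid_from`, `valid_to`, `is_current`), the tracked attributes, and both functions are hypothetical. On Delta Lake the same comparison would typically be expressed as a `MERGE`.

```python
import hashlib

TRACKED = ["name", "city"]  # hypothetical attributes whose changes create history

def row_hash(row, columns=TRACKED):
    """SHA-256 over the tracked columns.

    A unit-separator character keeps ('ab', 'c') distinct from ('a', 'bc').
    Nulls are hashed as empty strings, which is a simplification.
    """
    payload = "\x1f".join("" if row.get(c) is None else str(row[c]) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def apply_scd2(dimension, incoming, key, load_date):
    """Return a new dimension list with SCD Type 2 history applied."""
    current = {r[key]: r for r in dimension if r["is_current"]}
    result = [dict(r) for r in dimension]
    for row in incoming:
        new_hash = row_hash(row)
        existing = current.get(row[key])
        if existing is not None and existing["row_hash"] == new_hash:
            continue  # unchanged: nothing to write
        if existing is not None:
            # Changed: close out the currently active version.
            for r in result:
                if r[key] == row[key] and r["is_current"]:
                    r["is_current"] = False
                    r["valid_to"] = load_date
        # New key or changed row: open a new current version.
        result.append({**row, "row_hash": new_hash, "valid_from": load_date,
                       "valid_to": None, "is_current": True})
    return result

dim = [{"id": 1, "name": "Ravi", "city": "Pune",
        "row_hash": row_hash({"name": "Ravi", "city": "Pune"}),
        "valid_from": "2024-01-01", "valid_to": None, "is_current": True}]
updates = [{"id": 1, "name": "Ravi", "city": "Hyderabad"},
           {"id": 2, "name": "Meera", "city": "Chennai"}]

dim = apply_scd2(dim, updates, key="id", load_date="2024-02-01")
for r in dim:
    print(r["id"], r["city"], r["valid_from"], r["valid_to"], r["is_current"])
```

Replaying the same batch leaves the dimension unchanged, because every incoming hash now matches the current version. For SCD Type 1 the changed row would be overwritten in place instead of versioned.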
YAML-Driven Enterprise ETL Framework: Zero-code pipeline configuration. Converts hardcoded PySpark SQL into a declarative, YAML-driven execution engine. Drop in a config, get a production pipeline.
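To make the "config in, pipeline out" idea concrete, here is a small sketch that turns a declarative config into a SQL statement. To stay dependency-free, the config is shown as an already-parsed dictionary, which is the shape a YAML loader such as PyYAML's `yaml.safe_load` would return. The config schema (`source`, `target`, `select`, `filters`, `group_by`) and the `render_sql` function are assumptions made for illustration, not the framework's actual format.

```python
# Equivalent YAML for the dictionary below (hypothetical schema):
#
#   source: raw.orders
#   target: curated.orders_daily
#   select:
#     - order_date
#     - SUM(amount) AS total_amount
#   filters:
#     - status = 'COMPLETE'
#   group_by:
#     - order_date
CONFIG = {
    "source": "raw.orders",
    "target": "curated.orders_daily",
    "select": ["order_date", "SUM(amount) AS total_amount"],
    "filters": ["status = 'COMPLETE'"],
    "group_by": ["order_date"],
}

REQUIRED_KEYS = ("source", "target", "select")

def render_sql(config):
    """Build an INSERT ... SELECT statement from a declarative config."""
    missing = [k for k in REQUIRED_KEYS if k not in config]
    if missing:
        raise ValueError(f"config is missing required keys: {missing}")
    sql = f"SELECT {', '.join(config['select'])} FROM {config['source']}"
    if config.get("filters"):
        # Parenthesize each filter so AND-joining cannot change its meaning.
        sql += " WHERE " + " AND ".join(f"({f})" for f in config["filters"])
    if config.get("group_by"):
        sql += " GROUP BY " + ", ".join(config["group_by"])
    return f"INSERT INTO {config['target']} {sql}"

statement = render_sql(CONFIG)
print(statement)
```

In a PySpark job the resulting string could be handed to `spark.sql(statement)`. Because config values are interpolated directly into SQL, this approach only suits trusted, version-controlled config files.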
The Data Engineer's Interview Bible: Curated interview questions, system design breakdowns, PySpark internals, SQL patterns, and real question sets from Infosys and product companies. Used by engineers preparing for Microsoft, Google, and Databricks.
| Goal | Status |
|---|---|
| 🏗️ Build open-source PySpark Data Quality Framework | 🔄 In Progress |
| 🌿 Merge first PR into Apache Spark / Delta Lake | 📋 Planned |
| 🤖 Release AI-powered ETL orchestration tooling | 📋 Planned |
| 🏆 Earn Google Professional Data Engineer certification | 📋 Planned |
| 📝 Publish 12+ technical posts on LinkedIn & Dev.to | 🔄 In Progress |
| ⭐ Reach 500+ GitHub stars across repositories | 🔄 11 → 500 |
"Every great data platform starts with one well-designed pipeline."
— V.K. 🍍
Open to senior Data Engineering roles at product-based companies. Building in public. One pipeline at a time. 🍍
