vamsikrishna-dev thevamsikrishna

🍍 The Engineer

class VamsiKrishna:
    """
    Senior Data Engineer · 5+ years · India 🇮🇳
    Building the pipelines that move the world's data.
    """

    stack = {
        "cloud"      : ["Azure (primary)", "GCP (secondary)", "AWS (tertiary)"],
        "processing" : ["PySpark", "Apache Beam", "Kafka", "Delta Lake"],
        "azure"      : ["ADF", "Databricks", "Synapse", "ADLS Gen2"],
        "gcp"        : ["BigQuery", "Bigtable", "Cloud Composer"],
        "devops"     : ["GitHub Actions", "Docker", "Terraform"],
        "languages"  : ["Python", "SQL", "Scala"],
    }

    certifications = [
        "DP-203 · Azure Data Engineer Associate ✅",
        "AZ-900 · Azure Fundamentals ✅",
    ]

    currently    = "Building enterprise-grade data platforms & open-source DE tooling"
    targeting    = ["Microsoft", "Google", "Databricks", "Walmart Global Tech", "PhonePe", "Flipkart"]
    available    = True   # 🍍 Open to senior roles at product-based companies
    signature    = "🍍"   # Because every great pipeline deserves a mark

🏆 Certifications

Microsoft Certified: Azure Data Engineer Associate (DP-203) ✅ Active

Microsoft Certified: Azure Fundamentals (AZ-900) ✅ Active

⚙️ Technical Arsenal

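☁️ Cloud Platforms

Azure (primary) · GCP (secondary) · AWS (tertiary)

🔥 Processing & Streaming

PySpark · Apache Beam · Kafka · Delta Lake

🏗️ Azure Ecosystem

ADF · Databricks · Synapse · ADLS Gen2

🌐 GCP Ecosystem

BigQuery · Bigtable · Cloud Composer

💻 Languages & Storage

Python · SQL · Scala

🔧 DevOps & Orchestration

GitHub Actions · Docker · Terraform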

🍍 Signature Projects

Each pipeline is a statement. Each architecture, a craft.

🔁 adf-regression-automation

Enterprise ADF Pipeline Regression Framework

Automated regression testing for Azure Data Factory pipelines with GitHub Actions CI/CD, Markdown summary generation, and Databricks notebook validation. Cuts manual QA effort by 80%.

ADF · Databricks · Python · GitHub Actions
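The Markdown summary step can be illustrated with a minimal sketch. It assumes a hypothetical `PipelineResult` record (pipeline name, run status, duration, expected vs. actual row counts) and renders it as a Markdown table. The pass rule used here (run succeeded and row counts match) is an assumption, not necessarily what the framework checks. `GITHUB_STEP_SUMMARY` is the file path GitHub Actions exposes for job summaries; everything else is illustrative.

```python
import os
from dataclasses import dataclass


@dataclass
class PipelineResult:
    """Outcome of one regression run (hypothetical shape, not the framework's API)."""
    pipeline: str
    status: str          # e.g. "Succeeded" or "Failed", as reported by the run
    duration_s: float
    rows_expected: int
    rows_actual: int


def render_summary(results: list[PipelineResult]) -> str:
    """Render regression results as a Markdown table plus a pass/fail tally."""
    lines = [
        "| Pipeline | Status | Duration (s) | Rows expected | Rows actual | Verdict |",
        "|---|---|---|---|---|---|",
    ]
    passed = 0
    for r in results:
        # Assumed rule: a run passes only if it succeeded AND row counts match.
        ok = r.status == "Succeeded" and r.rows_expected == r.rows_actual
        passed += ok
        verdict = "✅ pass" if ok else "❌ fail"
        lines.append(
            f"| {r.pipeline} | {r.status} | {r.duration_s:.1f} | "
            f"{r.rows_expected} | {r.rows_actual} | {verdict} |"
        )
    lines.append("")
    lines.append(f"**{passed}/{len(results)} pipelines passed**")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        PipelineResult("pl_ingest_orders", "Succeeded", 42.7, 1200, 1200),
        PipelineResult("pl_load_customers", "Failed", 5.2, 300, 0),
    ]
    summary = render_summary(demo)
    # Inside a GitHub Actions job, append to the job summary; locally, just print.
    target = os.environ.get("GITHUB_STEP_SUMMARY")
    if target:
        with open(target, "a", encoding="utf-8") as fh:
            fh.write(summary + "\n")
    else:
        print(summary)
```

Appending rather than overwriting lets several regression steps in the same job contribute to one summary page.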

⚡ bigquery-bigtable-pipeline

High-Throughput BQ → Bigtable at Scale

Multi-column-family pipeline with composite row key support via JSON mapping. Built first in PySpark, then migrated to Apache Beam / Dataflow for serverless scale.

BigQuery · Bigtable · Apache Beam · PySpark
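A composite row key driven by a JSON mapping might look roughly like the plain-Python sketch below. The mapping schema (`row_key.columns`, `separator`, `column_families`) and the example columns are hypothetical; the real pipeline's mapping format and its PySpark or Beam/Dataflow write step are not shown.

```python
import json

# Hypothetical mapping format: ordered row-key columns plus a separator,
# and which source columns land in which Bigtable column family.
MAPPING_JSON = """
{
  "row_key": {"columns": ["region", "customer_id", "event_date"], "separator": "#"},
  "column_families": {
    "profile": ["name", "tier"],
    "metrics": ["orders", "revenue"]
  }
}
"""


def build_row_key(row: dict, key_spec: dict) -> str:
    """Join the key columns in mapping order; fail loudly on a missing part."""
    parts = []
    for col in key_spec["columns"]:
        value = row.get(col)
        if value is None:
            raise ValueError(f"row key column {col!r} is missing or null")
        parts.append(str(value))
    return key_spec["separator"].join(parts)


def to_bigtable_cells(row: dict, mapping: dict) -> tuple[str, dict]:
    """Return (row_key, {family: {qualifier: bytes}}) for one source row."""
    key = build_row_key(row, mapping["row_key"])
    cells = {}
    for family, columns in mapping["column_families"].items():
        cells[family] = {
            col: str(row[col]).encode("utf-8")
            for col in columns
            if row.get(col) is not None  # skip nulls instead of writing empty cells
        }
    return key, cells


if __name__ == "__main__":
    mapping = json.loads(MAPPING_JSON)
    sample = {"region": "apac", "customer_id": 42, "event_date": "2026-01-15",
              "name": "Asha", "tier": "gold", "orders": 3, "revenue": None}
    print(to_bigtable_cells(sample, mapping))
```

Because Bigtable sorts rows lexicographically by key, the column order in the mapping decides which prefix scans are cheap; leading with a field such as region, rather than a monotonically increasing timestamp, also helps spread writes across nodes.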

🏛️ pyspark-scd-framework

Production SCD Type 1/2 Framework

SHA-256-based change detection engine with full CDC pattern support, backed by Delta Lake ACID guarantees and optimized MERGE strategies for petabyte-scale warehouses.

PySpark · Delta Lake · SCD · CDC · SQL
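The SHA-256 change-detection idea behind SCD Type 2 can be shown without Spark. This is a plain-Python sketch under assumed names (`TRACKED`, `id` as the business key, `valid_from`/`valid_to`/`is_current` as history columns), not the framework's code. Each incoming record is hashed over its tracked attributes and compared with the hash stored on the active dimension row: an equal hash means no change, while a different hash expires the old version and inserts a new one.

```python
import hashlib
from datetime import date

# Hypothetical tracked attributes; a real framework would read these from config.
TRACKED = ["name", "city", "tier"]


def row_hash(row: dict, columns: list[str]) -> str:
    """SHA-256 over the tracked columns. The \\x1f delimiter and explicit null
    marker make ("ab", "c") vs ("a", "bc"), and None vs "", hash differently
    (assuming values never contain the delimiter themselves)."""
    payload = "\x1f".join(
        "\x00NULL" if row.get(c) is None else str(row[c]) for c in columns
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def scd2_merge(current: dict, incoming: list[dict], as_of: date) -> tuple[list, list]:
    """current maps business key -> active dimension row (with a 'hash' field).
    Returns (rows_to_expire, rows_to_insert) for an SCD Type 2 load."""
    expire, insert = [], []
    for rec in incoming:
        h = row_hash(rec, TRACKED)
        active = current.get(rec["id"])
        if active is not None and active["hash"] == h:
            continue  # unchanged: keep the active version as-is
        if active is not None:
            # changed: close the old version as of this load
            expire.append({**active, "valid_to": as_of, "is_current": False})
        # new key or changed attributes: open a fresh current version
        insert.append({**rec, "hash": h, "valid_from": as_of,
                       "valid_to": None, "is_current": True})
    return expire, insert
```

For SCD Type 1 the changed branch would overwrite the active row instead of expiring it. On Delta Lake, the same hash comparison would typically become part of the match condition of a `MERGE`; that step is not shown here.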

🎯 dynamic-pyspark-etl

YAML-Driven Enterprise ETL Framework

Zero-code pipeline configuration: replaces hardcoded PySpark SQL with a declarative, YAML-driven execution engine. Drop in a config, get a production pipeline.

PySpark · YAML · Delta Lake · ADF
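The config-to-execution idea can be sketched as a registry of step handlers keyed by the `type` field of each config entry. The schema (`pipeline`, `steps`, and the `filter`/`select`/`rename` step types) is hypothetical, and the config is shown as the dict `yaml.safe_load` would return so the example runs without PyYAML. Handlers here transform lists of dicts; in a PySpark version each would take and return a DataFrame instead (for example via `DataFrame.filter`, `select`, and `withColumnRenamed`).

```python
from typing import Callable

# Equivalent YAML (hypothetical schema), shown as the dict yaml.safe_load returns:
#   pipeline: daily_orders
#   steps:
#     - {type: filter, column: status, equals: shipped}
#     - {type: select, columns: [order_id, amount]}
#     - {type: rename, mapping: {amount: amount_inr}}
CONFIG = {
    "pipeline": "daily_orders",
    "steps": [
        {"type": "filter", "column": "status", "equals": "shipped"},
        {"type": "select", "columns": ["order_id", "amount"]},
        {"type": "rename", "mapping": {"amount": "amount_inr"}},
    ],
}

Rows = list[dict]
STEP_HANDLERS: dict[str, Callable[[Rows, dict], Rows]] = {}


def step(name: str):
    """Register a handler so configs can refer to it by name."""
    def register(fn):
        STEP_HANDLERS[name] = fn
        return fn
    return register


@step("filter")
def _filter(rows: Rows, cfg: dict) -> Rows:
    return [r for r in rows if r.get(cfg["column"]) == cfg["equals"]]


@step("select")
def _select(rows: Rows, cfg: dict) -> Rows:
    return [{c: r[c] for c in cfg["columns"]} for r in rows]


@step("rename")
def _rename(rows: Rows, cfg: dict) -> Rows:
    return [{cfg["mapping"].get(k, k): v for k, v in r.items()} for r in rows]


def run_pipeline(config: dict, rows: Rows) -> Rows:
    """Apply each configured step in order; unknown step types raise an error."""
    for i, s in enumerate(config["steps"]):
        handler = STEP_HANDLERS.get(s["type"])
        if handler is None:
            raise ValueError(f"step {i}: unknown type {s['type']!r}")
        rows = handler(rows, s)
    return rows
```

Adding a new transformation means registering one more handler, and existing configs keep working; an unknown `type` raises an error naming the step index rather than being silently skipped.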

⭐ awesome-data-engineer-prep

The Data Engineer's Interview Bible

Curated interview questions, system design breakdowns, PySpark internals, SQL patterns, and real question sets from Infosys and product companies. Used by engineers preparing for interviews at Microsoft, Google, and Databricks.

PySpark · SQL · System Design · Azure · GCP · Interview Prep


🎯 2026 — The Roadmap

Goal	Status
🏗️ Build open-source PySpark Data Quality Framework	🔄 In Progress
🌿 Merge first PR into Apache Spark / Delta Lake	📋 Planned
🤖 Release AI-powered ETL orchestration tooling	📋 Planned
🏆 Earn Google Professional Data Engineer certification	📋 Planned
📝 Publish 12+ technical posts on LinkedIn & Dev.to	🔄 In Progress
⭐ Reach 500+ GitHub stars across repositories	🔄 In Progress (11 / 500)

🤝 Let's Build Something

"Every great data platform starts with one well-designed pipeline."

— V.K. 🍍

Open to senior Data Engineering roles at product-based companies. Building in public. One pipeline at a time. 🍍
