thevamsikrishna/README.md



LinkedIn   Portfolio   GitHub



🍍  The Engineer

class VamsiKrishna:
    """
    Senior Data Engineer · 5+ years · India 🇮🇳
    Building the pipelines that move the world's data.
    """

    stack = {
        "cloud"      : ["Azure (primary)", "GCP (secondary)", "AWS (tertiary)"],
        "processing" : ["PySpark", "Apache Beam", "Kafka", "Delta Lake"],
        "azure"      : ["ADF", "Databricks", "Synapse", "ADLS Gen2"],
        "gcp"        : ["BigQuery", "Bigtable", "Cloud Composer"],
        "devops"     : ["GitHub Actions", "Docker", "Terraform"],
        "languages"  : ["Python", "SQL", "Scala"],
    }

    certifications = [
        "DP-203 · Azure Data Engineer Associate ✅",
        "AZ-900 · Azure Fundamentals ✅",
    ]

    currently    = "Building enterprise-grade data platforms & open-source DE tooling"
    targeting    = ["Microsoft", "Google", "Databricks", "Walmart Global Tech", "PhonePe", "Flipkart"]
    available    = True   # 🍍 Open to senior roles at product-based companies
    signature    = "🍍"   # Because every great pipeline deserves a mark


🏆  Certifications

Microsoft Certified Azure Data Engineer Associate ✅  Active

Microsoft Certified Azure Fundamentals ✅  Active



⚙️  Technical Arsenal

☁️  Cloud Platforms

Azure GCP AWS

🔥  Processing & Streaming

Apache Spark PySpark Apache Kafka Apache Beam

🏗️  Azure Ecosystem

Azure Data Factory Azure Databricks Delta Lake Azure Synapse ADLS Gen2

🌐  GCP Ecosystem

BigQuery Bigtable Cloud Composer Dataflow

💻  Languages & Storage

Python SQL Scala Delta Lake

🔧  DevOps & Orchestration

GitHub Actions Docker Terraform Apache Airflow



🍍  Signature Projects

Each pipeline is a statement. Each architecture, a craft.


Enterprise ADF Pipeline Regression Framework

Automated regression testing for Azure Data Factory pipelines with GitHub Actions CI/CD, Markdown summary generation, and Databricks notebook validation. Cuts manual QA effort by 80%.

ADF  Databricks  Python  GitHub Actions
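The Markdown summary step could look something like this. It is a minimal sketch, not the framework's actual code. It assumes each regression check reduces to comparing a baseline row count with the current row count for one ADF pipeline, and it renders the results as a Markdown table. `CheckResult`, `render_markdown_summary` and the pipeline names are hypothetical. `GITHUB_STEP_SUMMARY` is the environment variable GitHub Actions sets to a file whose Markdown content appears on the workflow run's summary page.

```python
import os
from dataclasses import dataclass


@dataclass
class CheckResult:
    """One regression check: baseline vs. current row count (hypothetical shape)."""
    pipeline: str
    expected_rows: int
    actual_rows: int

    @property
    def passed(self) -> bool:
        return self.expected_rows == self.actual_rows


def render_markdown_summary(results: list[CheckResult]) -> str:
    """Render one Markdown table row per check, followed by a pass-count line."""
    lines = [
        "| Pipeline | Expected rows | Actual rows | Result |",
        "|---|---:|---:|---|",
    ]
    for r in results:
        status = "✅ pass" if r.passed else "❌ fail"
        lines.append(f"| {r.pipeline} | {r.expected_rows} | {r.actual_rows} | {status} |")
    passed = sum(r.passed for r in results)  # True counts as 1
    lines += ["", f"**{passed}/{len(results)} pipelines passed**"]
    return "\n".join(lines)


if __name__ == "__main__":
    md = render_markdown_summary([
        CheckResult("pl_load_customers", 1000, 1000),  # illustrative names and counts
        CheckResult("pl_load_orders", 5000, 4998),
    ])
    print(md)
    # Inside a GitHub Actions step this variable names the job-summary file;
    # when run locally it is unset, so the write is skipped.
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path:
        with open(summary_path, "a", encoding="utf-8") as fh:
            fh.write(md + "\n")
```

Row counts are the cheapest check. A fuller reconciliation would also compare column-level aggregates or checksums, and that is where a PySpark job earns its place.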


High-Throughput BQ → Bigtable at Scale

Multi-column-family pipeline with composite row key support via JSON mapping. Built first in PySpark, then migrated to Apache Beam / Dataflow for serverless scale.

BigQuery  Bigtable  Apache Beam  PySpark
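The composite row key and the multi-column-family mapping can be sketched in plain Python. This is an illustration under assumptions, not the pipeline's real config format. The JSON keys (`row_key`, `columns`, `separator`, `column_families`) and the helpers `build_row_key` and `to_cells` are hypothetical. The output is plain dicts of bytes rather than calls to the Bigtable client or a Beam sink.

```python
import json

# Hypothetical JSON mapping: which BigQuery columns form the row key, and
# which column family each remaining column is written to.
MAPPING_JSON = """
{
  "row_key": {"columns": ["region", "customer_id"], "separator": "#"},
  "column_families": {
    "profile": ["name", "email"],
    "stats": ["order_count"]
  }
}
"""


def build_row_key(row: dict, key_cfg: dict) -> bytes:
    """Join the configured key columns, in order, into one composite key."""
    parts = []
    for col in key_cfg["columns"]:
        value = row.get(col)
        if value is None:
            raise ValueError(f"row key column {col!r} is null")
        parts.append(str(value))
    return key_cfg["separator"].join(parts).encode("utf-8")


def to_cells(row: dict, families: dict) -> dict:
    """Group non-null columns as {family: {qualifier_bytes: value_bytes}}."""
    cells = {}
    for family, columns in families.items():
        for col in columns:
            if row.get(col) is None:
                continue  # skip nulls instead of writing empty cells
            qualifier = col.encode("utf-8")
            cells.setdefault(family, {})[qualifier] = str(row[col]).encode("utf-8")
    return cells


mapping = json.loads(MAPPING_JSON)
bq_row = {"region": "IN", "customer_id": 42, "name": "Asha", "email": None, "order_count": 7}
key = build_row_key(bq_row, mapping["row_key"])
cells = to_cells(bq_row, mapping["column_families"])
print(key, cells)
```

Bigtable stores rows sorted lexicographically by row key, so the order of `columns` in the mapping decides which rows sit next to each other. Leading with a timestamp or another monotonically increasing value tends to concentrate writes on a single node.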

Production SCD Type 1/2 Framework

SHA-256-based change detection engine with full CDC pattern support, built on Delta Lake ACID guarantees and merge strategies optimized for petabyte-scale warehouses.

PySpark  Delta Lake  SCD  CDC  SQL
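The SHA-256 change-detection idea can be shown with a minimal in-memory SCD Type 2 sketch over lists of dicts. It assumes a hypothetical dimension with business key `customer_id` and tracked attributes `name` and `city`. The sketch hashes only the tracked columns and skips rows whose hash is unchanged. Otherwise it expires the current version and appends a new one. Column names such as `row_hash`, `valid_from`, `valid_to` and `is_current` are illustrative conventions, not the framework's schema. On a Delta Lake table the same comparison would typically be expressed as a MERGE on the business key. SCD Type 1 would instead overwrite the matched row in place.

```python
import hashlib
from datetime import date

KEY = "customer_id"          # hypothetical business key
TRACKED = ("name", "city")   # hypothetical attributes that trigger a new version
NULL_MARKER = "\x00"         # keeps NULL distinct from an empty string


def row_hash(row: dict) -> str:
    """SHA-256 over the tracked columns, joined with the ASCII unit separator."""
    payload = "\x1f".join(
        NULL_MARKER if row.get(c) is None else str(row[c]) for c in TRACKED
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def apply_scd2(dim: list[dict], incoming: list[dict], as_of: date) -> list[dict]:
    """Return a new dimension list with changed keys versioned (SCD Type 2)."""
    result = [dict(r) for r in dim]  # copy so the caller's rows are never mutated
    current = {r[KEY]: r for r in result if r["is_current"]}
    for src in incoming:
        new_hash = row_hash(src)
        old = current.get(src[KEY])
        if old is not None and old["row_hash"] == new_hash:
            continue  # tracked attributes unchanged: nothing to write
        if old is not None:
            old["is_current"] = False  # close out the previous version
            old["valid_to"] = as_of
        new_row = {KEY: src[KEY], **{c: src.get(c) for c in TRACKED},
                   "row_hash": new_hash, "valid_from": as_of,
                   "valid_to": None, "is_current": True}
        result.append(new_row)
        current[src[KEY]] = new_row
    return result


dim = apply_scd2([], [{"customer_id": 1, "name": "Asha", "city": "Pune"}], date(2024, 1, 1))
dim = apply_scd2(dim, [{"customer_id": 1, "name": "Asha", "city": "Hyderabad"}], date(2024, 2, 1))
for row in dim:
    print(row["city"], row["valid_from"], row["valid_to"], row["is_current"])
```

Comparing one digest instead of every tracked column keeps the change condition to a single equality check. The delimiter and the explicit null marker stop distinct rows, such as `("ab", "c")` and `("a", "bc")`, from producing the same hash input.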

YAML-Driven Enterprise ETL Framework

Zero-code pipeline configuration. Converts hardcoded PySpark SQL into a declarative, YAML-driven execution engine. Drop in a config, get a production pipeline.

PySpark  YAML  Delta Lake  ADF
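The declarative idea can be sketched as a small renderer that turns one step of a config into a Spark SQL statement. The config is shown as the Python dict that `yaml.safe_load` (PyYAML) would return, so the sketch runs without extra dependencies. Every key in it, and the functions `render_sql` and `run_step`, are hypothetical rather than the framework's real schema. `run_step` takes the executor as a parameter. In a Spark job that could be `spark.sql`.

```python
# What yaml.safe_load() might return for one step of a pipeline config.
# Every key here is illustrative, not the framework's actual schema.
STEP = {
    "name": "daily_sales",
    "source": "bronze.orders",
    "columns": ["order_date", "region", "SUM(amount) AS total_amount"],
    "filters": ["status = 'COMPLETED'", "amount > 0"],
    "group_by": ["order_date", "region"],
    "target": "silver.daily_sales",
}

REQUIRED = ("name", "source", "columns", "target")


def render_sql(step: dict) -> str:
    """Build a SELECT statement from a declarative step definition."""
    missing = [k for k in REQUIRED if k not in step]
    if missing:
        raise KeyError(f"step is missing required keys: {missing}")
    sql = f"SELECT {', '.join(step['columns'])} FROM {step['source']}"
    if step.get("filters"):
        # Parenthesize each filter so OR inside one filter cannot leak out.
        sql += " WHERE " + " AND ".join(f"({f})" for f in step["filters"])
    if step.get("group_by"):
        sql += " GROUP BY " + ", ".join(step["group_by"])
    return sql


def run_step(step: dict, execute):
    """Write the step's result to its target via the injected executor."""
    query = render_sql(step)
    return execute(f"INSERT OVERWRITE TABLE {step['target']} {query}")


print(render_sql(STEP))
```

Because the SQL is assembled from strings, this pattern assumes the config files are trusted and reviewed like code.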

The Data Engineer's Interview Bible

Curated interview questions, system design breakdowns, PySpark internals, SQL patterns, and real question sets from Infosys and product companies. Used by engineers preparing for interviews at Microsoft, Google, and Databricks.

PySpark  SQL  System Design  Azure  GCP  Interview Prep







🎯  2026 — The Roadmap

| Goal | Status |
|---|---|
| 🏗️  Build open-source PySpark Data Quality Framework | 🔄 In Progress |
| 🌿  Merge first PR into Apache Spark / Delta Lake | 📋 Planned |
| 🤖  Release AI-powered ETL orchestration tooling | 📋 Planned |
| 🏆  Earn Google Professional Data Engineer certification | 📋 Planned |
| 📝  Publish 12+ technical posts on LinkedIn & Dev.to | 🔄 In Progress |
| ⭐  Reach 500+ GitHub stars across repositories | 🔄 11 → 500 |


🤝  Let's Build Something


"Every great data platform starts with one well-designed pipeline."

— V.K. 🍍


LinkedIn   Portfolio   Email


Open to senior Data Engineering roles at product-based companies. Building in public. One pipeline at a time. 🍍



Pinned

  1. awesome-data-engineer-prep (Public)

    End-to-end AI-powered Data Engineering preparation repository covering SQL, PySpark, Azure, AWS, GCP, Data Modeling, Pipelines, and real-world projects with interview-focused explanations.

    Python 12 8

  2. adf-regression-automation (Public)

    Enterprise-grade Azure Data Factory regression testing framework with automated validation, PySpark-based reconciliation, Databricks execution, and GitHub Actions CI/CD integration.

    Python 1

  3. bigquery-bigtable-pipeline (Public)

    A production-grade, TB-scale data pipeline that moves data from BigQuery to Cloud Bigtable using either Apache Beam (Dataflow) or PySpark (Dataproc), with configurable row key strategies, dead lett…

    Python 1

  4. dynamic-pyspark-etl (Public)

    Metadata-driven PySpark ETL framework that uses YAML-based configurations to build scalable, reusable, and enterprise-ready data pipelines with minimal code changes.

    Python 1

  5. portfolios (Public)

    Creative and interactive portfolio projects featuring cinematic UI, animations, AI-assisted designs, 3D assets, and modern responsive web experiences for developers and data engineers.

    HTML 1

  6. pyspark-scd-framework (Public)

    🔧 Built a production-grade PySpark SCD Framework from scratch — the kind that runs in real banking and retail data platforms.

    Python 1