Vishal Gunjal (@vishalgunjalSWE)

Site Reliability Engineer | Platform Engineering | Distributed Systems

Early-career engineer with senior-level systems thinking, building production-grade cloud platforms that apply the reliability, observability, and automation principles used at companies like Google, Netflix, and Uber.

Current: Production-grade Kubernetes platforms, SRE observability, GitOps at scale
Approach: Build real systems that demonstrate industry patterns, rather than following tutorials
Philosophy: Infrastructure should be boring (reliable), not exciting (breaking)

💼 Open to opportunities: DevOps Engineer | SRE | Platform Engineer | Cloud Engineer
📍 Location: Pune, India (Open to Remote & Relocation)

🚀 What I Do

I specialize in cloud-native infrastructure and platform engineering, with hands-on experience building:

Microservices platforms on AWS EKS with event-driven architecture (RabbitMQ, Kafka)
Infrastructure as Code using Terraform with modular, reusable patterns
GitOps workflows with ArgoCD for declarative, drift-free deployments
Full observability stacks (Prometheus, Grafana, ELK, Jaeger)
DevSecOps pipelines with automated security scanning (SonarQube, Trivy)

💼 Current Focus

🔨 Building production-grade DevOps projects demonstrating enterprise patterns
📚 Deep-diving into Kubernetes (RBAC, Network Policies, Security, Operators)
🔐 Implementing DevSecOps practices (shift-left security, policy-as-code)
📊 Designing SRE observability systems (SLIs, SLOs, error budgets)
🤖 Exploring MLOps and AI-driven infrastructure automation

🧠 Engineering Philosophy (Borrowed from Google SRE)

Principle	What It Means	How I Apply It
Everything Fails	Design for failure, not success	Multi-AZ, circuit breakers, graceful degradation
Toil is the Enemy	Automate repetitive work	GitOps, drift detection, self-healing
Observability ≠ Monitoring	Debug unknown unknowns, not just known failures	Distributed tracing, correlation IDs, SLOs
Security by Default	Zero trust	RBAC, Network Policies, no hardcoded secrets
Error Budgets	Balance velocity and reliability	SLI/SLO tracking, controlled risk
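To make the Error Budgets row concrete, here is a minimal sketch that assumes a time-based availability SLO measured over a 30-day window. The helper names `error_budget_minutes` and `budget_remaining` are hypothetical illustrations and do not come from any particular SRE tool.

```python
from datetime import timedelta


def error_budget_minutes(slo: float, window: timedelta = timedelta(days=30)) -> float:
    """Minutes of 'bad' time the SLO allows in the window (hypothetical helper)."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    window_minutes = window.total_seconds() / 60
    return (1 - slo) * window_minutes


def budget_remaining(slo: float, bad_minutes: float,
                     window: timedelta = timedelta(days=30)) -> float:
    """Fraction of the error budget still unspent; negative means overspent."""
    budget = error_budget_minutes(slo, window)
    return (budget - bad_minutes) / budget


if __name__ == "__main__":
    budget = error_budget_minutes(0.999)
    print(f"99.9% over 30 days allows {budget:.1f} minutes of downtime")
    print(f"After a 30-minute outage, {budget_remaining(0.999, 30):.0%} of the budget remains")
```

A 99.9% SLO over 30 days leaves roughly 43 minutes of budget. One common policy is to keep shipping features while budget remains and to shift toward reliability work as it approaches zero.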

I don't believe in:

Manual deployments ("works on my machine" syndrome)
Infrastructure without monitoring
Code without tests or automation without guardrails

🛠️ Tech Stack

AWS	Azure	GCP	Kubernetes	Docker	Terraform
Ansible	Helm	Linux	Jenkins	GitHub Actions	ArgoCD
Grafana	Prometheus	ELK	Go	Python	Git

🌟 What Sets Me Apart (Early Career with Senior Thinking)

1. I Think in Systems, Not Tools

Most engineers: "I know Docker, Kubernetes, Terraform"

Me: "I understand distributed systems failure modes and design infrastructure that degrades gracefully. I use Kubernetes for declarative state reconciliation and self-healing, not because it's trendy."

2. I Design for Failure

Most engineers: "My app works in testing"

Me: "I've tested:

What happens when RabbitMQ goes down? (DLQ prevents message loss)
What if Redis crashes? (Cache-aside handles misses)
What if AWS loses an AZ? (Multi-AZ with auto-failover)"
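The Redis scenario above relies on the cache-aside pattern. The application reads from the cache, falls back to the database on a miss, and then populates the cache. The following is a minimal sketch that assumes a hypothetical in-memory `FlakyCache` standing in for Redis, so it runs without a server, and a plain function standing in for the database. A cache outage is treated like a miss, so reads degrade to slower database lookups instead of failing.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheUnavailable(Exception):
    """Hypothetical stand-in for a Redis connection error."""


class FlakyCache:
    """In-memory Redis stand-in with TTLs and an `up` switch to simulate a crash."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}
        self.up = True

    def get(self, key: str) -> Optional[str]:
        if not self.up:
            raise CacheUnavailable(key)
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:  # expired entries count as misses
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        if not self.up:
            raise CacheUnavailable(key)
        self._data[key] = (value, time.monotonic() + ttl_seconds)


def cache_aside_get(key: str, cache: FlakyCache, load_from_db: Callable[[str], str],
                    ttl_seconds: float = 60.0) -> str:
    """Cache-aside read: try the cache, fall back to the DB, then repopulate."""
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached  # cache hit
    except CacheUnavailable:
        pass  # cache is down: degrade to the database instead of failing

    value = load_from_db(key)  # miss or outage: go to the source of truth
    try:
        cache.set(key, value, ttl_seconds)  # best-effort repopulation
    except CacheUnavailable:
        pass  # serving the value matters more than caching it
    return value


if __name__ == "__main__":
    db = {"user:1": "alice"}
    cache = FlakyCache()
    print(cache_aside_get("user:1", cache, db.__getitem__))  # miss, read from DB
    print(cache_aside_get("user:1", cache, db.__getitem__))  # hit, served from cache
    cache.up = False
    print(cache_aside_get("user:1", cache, db.__getitem__))  # cache down, read from DB
```

The trade-off is that a cache outage shifts all read load onto the database. Real deployments usually pair this with connection timeouts on the cache client so a dead Redis fails fast.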

3. I Document Decisions

Most engineers: "I built it"

Me: "I documented:

WHY I chose RabbitMQ over Kafka (trade-off analysis)
Architecture diagrams (system design)
Runbooks (production operations)
What I learned from failures"

🧠 Systems Thinking & Engineering

I explore the trade-offs in distributed systems, documenting my journey from "how it works" to "why it breaks."

"I'm fascinated by systems that scale, self-heal, and never go down."

📝 Part 10/10: The SRE Mindset - Engineering Systems That Do Not Depend on You — 16 Jan 2026
📝 DevSecOps: Engineering Security as a Non-Negotiable Quality Gate — 14 Jan 2026
📝 FinOps — How SREs Turn “Cost Centers” into “Efficiency Engines” — 10 Jan 2026
📝 GitOps at Scale: Why “Sync” is the New “Apply” — Architecting a Self-Healing Multi-Cluster Platform — 07 Jan 2026
📝 Networking — The SRE’s Guide to the 504 Gateway Timeout — 04 Jan 2026

I document the "why" behind my code: deep dives into systems engineering, FinOps, scalability, and SRE practices.

🌐 Community Involvement

Active Participation:

Google Developer Group (GDG) Pune - Cloud-native discussions, hands-on labs
CNCF Community - Kubernetes, service mesh, observability
AWS User Group Pune - Best practices, architecture patterns
Atlassian Community Pune - CI/CD, DevOps automation

🎯 2026 Goals

Technical:

✅ Build 4 production-grade cloud platforms (end-to-end)
🔄 Contribute to CNCF projects (Kubernetes, Prometheus, ArgoCD)
📚 Deep-dive into Kubernetes operators and CRDs
🔐 Master service mesh (Istio/Linkerd) and zero-trust networking
🤖 Explore MLOps and infrastructure for ML workloads

Professional:

📝 Publish 25+ in-depth technical articles
🎤 Present at CNCF Pune and AWS User Group
💼 Land my first DevOps/SRE role as an early-career engineer
🌟 Contribute to open-source (Kubernetes, Terraform providers, Helm charts)

Learning:

📖 Complete the AWS Certified DevOps Engineer - Professional certification
📖 Complete CKA and CKS certifications before 2027
📖 Study distributed systems papers (Raft, Paxos, CAP theorem)

💡 Questions That Keep Me Up at Night

→ How does Kubernetes handle split-brain in etcd?
→ What's the optimal error budget for a new service?
→ How do you design alerts that don't cause fatigue?
→ What's the CAP theorem trade-off in my architecture?
→ How would Netflix design this system?
→ What's the failure mode I haven't considered?

I don't just want to use tools. I want to understand the engineering decisions behind them.

Currently Reading:

📖 Site Reliability Engineering (Google SRE Book)
📖 Designing Data-Intensive Applications (Martin Kleppmann)
📖 Kubernetes Patterns (Bilgin Ibryam & Roland Huß)
📖 Raft Consensus Paper (understanding distributed systems)


⚡ "Automation is not about replacing humans, it's about freeing them to do what they do best."
