|
4 | 4 |
|
5 | 5 |
|
6 | 6 | <!-- Introduction --> |
7 | | - |
8 | | -**DevOps Engineer | Site Reliability Engineering | Cloud-Native Platform Architecture** |
9 | | - |
10 | | -Building production-grade cloud platforms on AWS with Kubernetes, Terraform, and GitOps. Passionate about automation, observability, and system reliability. |
11 | | - |
| 7 | +**Site Reliability Engineer | Platform Engineering | Distributed Systems** |
| 8 | + |
| 9 | +> Early-career engineer with senior-level systems thinking. I build production-grade cloud platforms that apply the reliability, observability, and automation principles popularized by Google, Netflix, and Uber.
| 10 | +``` |
| 11 | +Current: Production-grade Kubernetes platforms, SRE observability, GitOps at scale |
| 12 | +Approach: Build systems that teach industry patterns, not tutorials |
| 13 | +Philosophy: Infrastructure should be boring (reliable), not exciting (breaking) |
| 14 | +``` |
12 | 15 | --- |
13 | 16 | **💼 Open to opportunities:** DevOps Engineer | SRE | Platform Engineer | Cloud Engineer |
14 | 17 | **📍 Location:** Pune, India (Open to Remote & Relocation) |
@@ -60,20 +63,20 @@ I specialize in **cloud-native infrastructure** and **platform engineering**, wi |
60 | 63 |
|
61 | 64 | --- |
62 | 65 |
|
63 | | -## π Engineering Philosophy |
| 66 | +## 🧠 Engineering Philosophy (Borrowed from Google SRE)
64 | 67 |
|
65 | | -**I believe in:** |
66 | | -- **Infrastructure as Code** - Everything reproducible, version-controlled, tested |
67 | | -- **GitOps Over ClickOps** - Declarative state, automated reconciliation |
68 | | -- **Observability First** - Metrics, logs, traces before production |
69 | | -- **Security by Default** - RBAC, Network Policies, zero-trust networking |
70 | | -- **SRE Principles** - SLIs/SLOs, error budgets, toil reduction |
| 68 | +| Principle | What It Means | How I Apply It | |
| 69 | +|-----------|---------------|----------------| |
| 70 | +| **Everything Fails** | Design for failure, not success | Multi-AZ, circuit breakers, graceful degradation | |
| 71 | +| **Toil is the Enemy** | Automate repetitive work | GitOps, drift detection, self-healing | |
| 72 | +| **Observability ≠ Monitoring** | Understand unknown unknowns | Distributed tracing, correlation IDs, SLOs |
| 73 | +| **Security by Default** | Zero trust | RBAC, Network Policies, no hardcoded secrets | |
| 74 | +| **Error Budgets** | Balance velocity and reliability | SLI/SLO tracking, controlled risk (sketch below) |
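
A back-of-the-envelope sketch of that last row, since "error budget" sounds abstract until you compute one (illustrative numbers only; nothing here comes from a real service):

```python
# Illustrative sketch of how an error budget falls out of an SLO target.
# All numbers are made up for the example, not taken from a real service.

SLO_TARGET = 0.999                      # 99.9% of requests should succeed
WINDOW_MINUTES = 30 * 24 * 60           # 30-day rolling window

error_budget = 1 - SLO_TARGET                    # 0.1% of requests may fail
budget_minutes = WINDOW_MINUTES * error_budget   # ~43 minutes of downtime per 30 days

def budget_remaining(total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative means it's blown)."""
    allowed_failures = total_requests * error_budget
    return 1 - failed_requests / allowed_failures

print(f"Allowed downtime per 30 days: {budget_minutes:.1f} minutes")                 # 43.2
print(f"Budget left after 600 failures in 1M requests: {budget_remaining(1_000_000, 600):.0%}")  # 40%
```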
71 | 75 |
|
72 | 76 | **I don't believe in:** |
73 | | -- Manual deployments or "works on my machine" syndrome |
| 77 | +- Manual deployments ("works on my machine" syndrome) |
74 | 78 | - Infrastructure without monitoring |
75 | 79 | - Code without tests or automation without guardrails |
76 | | - |
77 | 80 | --- |
78 | 81 |
|
79 | 82 | ## 🛠️ Tech Stack |
@@ -141,6 +144,38 @@ I specialize in **cloud-native infrastructure** and **platform engineering**, wi |
141 | 144 | </tr> |
142 | 145 | </table> |
143 | 146 |
|
| 147 | +--- |
| 148 | +## 🌟 What Sets Me Apart (Early Career with Senior Thinking)
| 149 | + |
| 150 | +### **1. I Think in Systems, Not Tools** |
| 151 | + |
| 152 | +Most engineers: "I know Docker, Kubernetes, Terraform" |
| 153 | + |
| 154 | +Me: "I understand distributed systems failure modes and design infrastructure that degrades gracefully. I use Kubernetes for declarative state reconciliation and self-healing, not because it's trendy." |
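
To make "declarative state reconciliation" concrete, the idea reduces to a control loop like this (a minimal Python sketch; `observe` and `act` are hypothetical placeholders, not a real Kubernetes client API):

```python
import time

def reconcile_forever(desired: dict, observe, act, interval: float = 10.0):
    """Minimal control loop: continuously drive actual state toward desired state."""
    while True:
        actual = observe()                    # read the current state of the world
        for name, spec in desired.items():
            if actual.get(name) != spec:      # drift between desired and actual
                act(name, spec)               # converge: create / update / replace
        time.sleep(interval)                  # real controllers use watches, not polling
```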
| 155 | + |
| 156 | +--- |
| 157 | + |
| 158 | +### **2. I Design for Failure** |
| 159 | + |
| 160 | +Most engineers: "My app works in testing" |
| 161 | + |
| 162 | +Me: "I've tested: |
| 163 | +- What happens when RabbitMQ goes down? (DLQ prevents message loss) |
| 164 | +- What if Redis crashes? (Cache-aside handles misses; see the sketch below)
| 165 | +- What if AWS loses an AZ? (Multi-AZ with auto-failover)" |
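
The Redis bullet above, as code: a minimal cache-aside sketch where the cache is an optimization rather than a dependency (the `redis-py` calls are standard; `db.fetch_user` is a hypothetical stand-in for the real data store):

```python
import json
import redis  # redis-py client

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_user(user_id: str, db) -> dict:
    """Cache-aside read: a Redis outage degrades latency, not correctness."""
    key = f"user:{user_id}"
    try:
        cached = cache.get(key)                  # 1. try the cache first
        if cached is not None:
            return json.loads(cached)
    except redis.exceptions.ConnectionError:
        pass                                     # 2. Redis down: fall through to the DB
    user = db.fetch_user(user_id)                # 3. source of truth still serves the read
    try:
        cache.setex(key, 300, json.dumps(user))  # 4. repopulate with a TTL, best effort
    except redis.exceptions.ConnectionError:
        pass
    return user
```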
| 166 | + |
| 167 | +--- |
| 168 | + |
| 169 | +### **3. I Document Decisions** |
| 170 | + |
| 171 | +Most engineers: "I built it" |
| 172 | + |
| 173 | +Me: "I documented: |
| 174 | +- WHY I chose RabbitMQ over Kafka (trade-off analysis) |
| 175 | +- Architecture diagrams (system design) |
| 176 | +- Runbooks (production operations) |
| 177 | +- What I learned from failures" |
| 178 | + |
144 | 179 | --- |
145 | 180 |
|
146 | 181 | <!-- ## 📝 Technical Writing |
@@ -213,18 +248,23 @@ I explore the trade-offs in distributed systems, documenting my journey from "ho |
213 | 248 |
|
214 | 249 | --- |
215 | 250 |
|
216 | | -## 💡 What Drives Me |
217 | | - |
218 | | -I'm fascinated by **systems that scale, self-heal, and never go down.** |
219 | | - |
220 | | -Questions that keep me up at night: |
221 | | -- How does Kubernetes reconcile desired vs actual state? |
222 | | -- What trade-offs did AWS make in EKS networking design? |
223 | | -- How do Netflix and Google achieve 99.99% uptime? |
224 | | -- What's the right balance between consistency and availability? |
225 | | -- How do you design alerts that don't cause alert fatigue? |
226 | | - |
227 | | -**I don't just want to use tools - I want to understand how they work under the hood.** |
| 251 | +## 💡 Questions That Keep Me Up at Night
| 252 | +``` |
| 253 | +❓ How does Kubernetes handle split-brain in etcd?
| 254 | +❓ What's the optimal error budget for a new service?
| 255 | +❓ How do you design alerts that don't cause fatigue?
| 256 | +❓ What's the CAP theorem trade-off in my architecture?
| 257 | +❓ How would Netflix design this system?
| 258 | +❓ What's the failure mode I haven't considered?
| 259 | +``` |
| 260 | + |
| 261 | +**I don't just want to use tools. I want to understand the engineering decisions behind them.** |
| 262 | + |
| 263 | +**Currently Reading:** |
| 264 | +- 📖 Site Reliability Engineering (Google SRE Book)
| 265 | +- 📖 Designing Data-Intensive Applications (Martin Kleppmann)
| 266 | +- 📖 Kubernetes Patterns (Bilgin Ibryam)
| 267 | +- 📖 Raft Consensus Paper (understanding distributed systems)
228 | 268 |
|
229 | 269 | --- |
230 | 270 |
|
|