A pragmatic approach to cutting through dogma and making engineering choices that deliver real value
Version: 1.0a
Author: Jean Luck Sossmeier
Date: October 2025
Repository: github.com/jeansossmeier/worth-framework
- Introduction: The Problem with Technology Dogmas
- The WORTH Principle: Core Framework
- Decision Scorecard & Methodology
- Integration Points
- Anti-Patterns & Failure Modes
- Measurement & Validation
- Operationalizing the Framework
Software engineering is drowning in principles. KISS, DRY, YAGNI, SOLID, STUPID, CUPID—the list grows monthly. Each promises clarity, yet teams still waste months on Kubernetes setups they don't need, microservices that slow them down, and clean code rules that add more ceremony than value.
The problem isn't the principles themselves. The problem is treating them as universal laws rather than context-dependent tools.
Behind every architectural debate, every technology choice, every process adoption, one question gets skipped:
Is it worth it—here, now, for us?
Not "is it best practice?" Not "what does Big Tech do?" Not "what's trending on Hacker News?"
Worth it. In your context. With your team. On your budget. For your timeline.
Consider these common scenarios:
Language Wars: Developers argue whether Python, Go, or Rust is "best"—ignoring that the answer depends entirely on your domain, team skills, hiring market, and performance requirements.
Clean Code Extremism: Teams wrap every primitive in a Value Object because a book said so—adding hundreds of lines of boilerplate to a throwaway migration script.
"You Are Not Netflix": A three-person startup burns four weeks setting up Kubernetes because "it's how you scale"—never shipping the product that needed to prove market fit first.
Process Bloat: A seed-stage company adopts SAFe with two-week sprints and full ceremonies—while the market shifts weekly and competitors move daily.
Each of these decisions might be right somewhere. They're wrong when blindly applied without asking whether they're worth it in your specific situation.
WORTH is a meta-principle that sits above all other engineering guidelines. It doesn't replace KISS, DRY, or YAGNI—it helps you decide when to apply them.
Think of it as a five-question litmus test you run before adopting any technology, pattern, or practice:
- Weigh the problem
- Outcome > overhead
- Right-sized for the team
- Time-to-value
- Horizon flexibility
If you can't answer "yes" to at least three of these five questions with clear evidence, don't adopt it yet. Revisit when your context changes.
Tech Leads & Staff Engineers: Battling over-engineering and decision fatigue while trying to ship reliably.
CTOs & Founders: Controlling cloud spend, reducing talent churn, and ensuring technology choices support business goals.
Senior Developers: Needing guard-rails to push back on dogma without seeming obstinate.
Teams of Any Size: From solo developers to hundred-person engineering orgs—context-aware decision-making scales.
This isn't a book you read cover-to-cover and shelve. It's a working framework you return to every time you face a technology decision.
- Part I gives you the core principle and scoring methodology
- Part II expands the dimensions you need to consider
- Part III applies WORTH to specific domains (code, architecture, product)
- Part IV shows you how to operationalize it in pull requests, ADRs, and team rituals
- Part V provides extended case studies and guidance for specific contexts
If you're a solo developer or indie hacker, skip directly to the Solo Developer / Indie Hacker Mode section (embedded in Section 16) for simplified guidance tailored to one-person projects.
Start with the core framework in Section 2. Use the scorecard in Section 3 on your next decision. Refer back to domain-specific chapters as needed.
WORTH is a pragmatic litmus test for every architectural, process, or tooling choice.
| Letter | Question | Signals It's NOT Worth It |
|---|---|---|
| W – Weigh the problem | Is the pain concrete, frequent, and business-critical? | Buzz-driven adoption, vague "might need it later" |
| O – Outcome > overhead | Will this clearly improve customer or developer outcomes more than it costs in time, money, or complexity? | Long payback period, marginal uplift, hidden ops costs |
| R – Right-sized for the team | Can the current team build and run it confidently within working hours? | Reliance on "future hires", hero culture, steep learning curves |
| T – Time-to-value | Does it accelerate delivery now or within the quarter you're planning? | Multi-month migrations delaying revenue or feedback |
| H – Horizon flexibility | Does it keep sensible options open without locking you into tech or vendors? | Irreversible bets, vendor lock-in, explosive surface area |
You need BOTH of these to proceed:
- Qualitative check: Crisp, evidence-based "yes" on at least three of the five letters
- Quantitative check: Total score ≥ 15/25 (see detailed scoring in Section 3)
Why both checks matter:
- The "3 of 5" check prevents lukewarm "everything is a 3" scorecards
- The total threshold ensures sufficient overall value
- Strong yes on 3 dimensions might score [5,5,5,1,1] = 17 ✅ (passes both)
- Neutral on all five might score [3,3,3,3,3] = 15 ⚠️ (passes the total but lacks clear wins)
If either check fails, don't adopt it yet. Re-evaluate when context changes.
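As a minimal sketch, the dual gate can be expressed in a few lines of Python. The ≥ 4 cutoff for a "crisp yes" is an assumption for illustration; use whatever evidence bar your team agrees on.

```python
def passes_worth_gate(scores: dict[str, int], crisp_cutoff: int = 4) -> bool:
    """Dual gate: at least three crisp 'yes' dimensions AND a total of 15+/25."""
    assert set(scores) == {"W", "O", "R", "T", "H"}, "score all five dimensions"
    crisp_yes = sum(1 for s in scores.values() if s >= crisp_cutoff)
    return crisp_yes >= 3 and sum(scores.values()) >= 15

# [5,5,5,1,1] = 17 -> three clear wins, passes both checks
print(passes_worth_gate({"W": 5, "O": 5, "R": 5, "T": 1, "H": 1}))  # True
# [3,3,3,3,3] = 15 -> passes the total but has no clear wins
print(passes_worth_gate({"W": 3, "O": 3, "R": 3, "T": 3, "H": 3}))  # False
```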
KISS tells you to keep solutions simple once you've decided to build them.
DRY avoids duplication inside the chosen solution.
YAGNI warns against speculative features.
WORTH precedes all of them: it ensures you're even solving the right problem at the right scale before optimizing its internal design.
Start by quantifying the pain you're trying to solve.
Key Questions:
- How many people or systems are affected?
- How frequently does this pain occur?
- What's the business impact? (Lost revenue, customer churn, developer hours wasted)
- Is this pain growing or stable?
Red Flags:
- "We might need this later" (speculative)
- "Everyone's talking about it" (hype-driven)
- "Best practice" without evidence
- Problem affects one person once
Good Signals:
- Daily friction costing measurable hours
- Customer-facing issue impacting retention
- Growing technical debt with clear cost trajectory
- Regulatory or security requirement
Example: A team debates adopting GraphQL. The current REST API requires frontend developers to make 5-7 calls to render a dashboard. This happens for every page load across 10,000 daily users. That's real, frequent, measurable pain. ✅
Counter-example: A team considers GraphQL because "it's flexible and modern." No evidence of current API causing problems. ❌
Calculate whether the benefit outweighs the total cost.
Key Questions:
- What metric improves? (Speed, quality, cost, developer experience)
- By how much?
- What's the implementation cost? (Calendar time, dollars, cognitive load)
- What's the ongoing cost? (Maintenance, operations, training)
Red Flags:
- Payback period > 1 year in a fast-changing environment
- Marginal improvements (5% faster build when builds take 30 seconds)
- Hidden operational costs not factored in
- "Hard to quantify" benefits without any proxy metric
Good Signals:
- Clear ROI calculation
- Benefit appears within current planning horizon
- Total cost of ownership (TCO) fully mapped
- Net improvement in DORA metrics (deployment frequency, lead time, MTTR, change fail rate)
Example: Migrating to a PaaS costs $400/month but saves 20 developer hours/month on infrastructure management. At $100/hour blended rate, that's $2,000 in value for $400 cost. Clear win. ✅
Regional Cost Note: This framework uses US market rates circa 2024-2025. Adjust for your region:
- US/Western Europe: 1.0x (baseline: $100-150/hour blended rate)
- SF Bay Area/NYC: 1.5x ($150-225/hour)
- Eastern Europe: 0.5x ($50-75/hour)
- India/South Asia: 0.3x ($30-45/hour)
- Latin America: 0.4x ($40-60/hour)
Recalculate ROI using your actual rates to ensure accurate WORTH scoring.
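A back-of-envelope sketch of that recalculation, reusing the PaaS example above. The function name is made up, and the 0.3x multiplier is simply the India/South Asia factor from this note, not a pricing model.

```python
def net_monthly_value(hours_saved: float, tool_cost: float,
                      base_rate: float = 100.0, regional_multiplier: float = 1.0) -> float:
    """Net monthly value of a tool: developer hours saved, priced at a local rate, minus its cost."""
    return hours_saved * (base_rate * regional_multiplier) - tool_cost

# PaaS example: 20 hours/month saved, $400/month cost
print(net_monthly_value(20, 400))                           # 1600.0 at US baseline rates
print(net_monthly_value(20, 400, regional_multiplier=0.3))  # 200.0 at India/South Asia rates
```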
Counter-example: Adding Kubernetes to reduce deployment time from 5 minutes to 3 minutes, costing 2 weeks of setup and ongoing complexity. Marginal gain, huge overhead. ❌
Assess whether your team can realistically build and maintain this.
Key Questions:
- Do at least two people on the team already understand this technology?
- Can the team operate it without heroic effort or on-call burnout?
- Does it fit within the team's cognitive load budget?
- Can you hire or train for gaps within your timeline?
Red Flags:
- "We'll hire someone who knows this"
- Single expert (bus factor of 1)
- Requires learning 3+ new technologies simultaneously
- Team already maxed out on complexity
Good Signals:
- Multiple team members have experience
- Technology fits existing mental models
- Clear documentation and community support
- Graceful learning curve
Example: A team of experienced Java developers considers Spring Boot vs. Micronaut. Both fit their skill set and mental models. ✅
Counter-example: Same Java team considers rewriting everything in Rust to "learn something new" despite zero Rust experience and aggressive ship dates. ❌
Determine if the investment delivers value within your planning horizon.
Key Questions:
- When do we see the first benefit?
- Does this accelerate or delay current commitments?
- What's the opportunity cost of not working on other features?
- Can we stage the rollout to get partial value sooner?
Red Flags:
- Multi-month migration blocking all feature work
- Benefit only appears after full completion (no incremental wins)
- Faster alternatives exist but are dismissed as "not best practice"
- Timeline keeps extending due to unforeseen complexity
Good Signals:
- Value delivered within current quarter
- Can be piloted with subset of users/services
- Improves team velocity starting now
- Clear milestones with measurable progress
Example: Adopting feature flags delivers value immediately—first feature uses flags for safe rollout within one sprint. ✅
Counter-example: Rewriting the entire app to microservices with no customer value until completion 8 months from now. ❌
Evaluate whether this choice keeps options open or creates lock-in.
Key Questions:
- How easy is it to reverse this decision?
- Does this vendor or technology have a healthy ecosystem?
- Can we migrate away without rewriting everything?
- Does this create an explosion of surface area (new services, configs, dependencies)?
Red Flags:
- Proprietary technology with no export path
- Architecture that requires rewrite to change
- Vendor with declining market share or community
- Creates dependencies across entire codebase
Good Signals:
- Standards-based approach (SQL, HTTP, JSON)
- Clean interfaces that allow swapping implementations
- Strong community, multiple vendors
- Reversible via feature flags or strangler pattern
Example: Using PostgreSQL (open standard, multiple hosts, dump/restore works everywhere) vs. proprietary cloud database with custom query language. PostgreSQL wins on flexibility. ✅
Counter-example: Building entire architecture around a startup vendor's beta SDK with no export functionality. ❌
Use this lightweight process for any technology, pattern, or practice decision.
Step 1: State the Issue (One Sentence)
Force clarity by writing a single sentence problem statement.
Example: "Release lead-time increased from 3 days to 9 days after adding the reporting service."
Step 2: Score Each Dimension (1-5)
| Dimension | Score | Evidence |
|---|---|---|
| W – Weigh the problem | 1-5 | How big, frequent, and critical is the pain? |
| O – Outcome > overhead | 1-5 | Does benefit clearly exceed cost? |
| R – Right-sized team | 1-5 | Can we build and run this confidently? |
| T – Time-to-value | 1-5 | Do we see value within current quarter? |
| H – Horizon flexibility | 1-5 | Can we reverse or adapt this choice? |
Scoring Guide:
- 5: Strong yes, clear evidence
- 4: Probably yes, good indicators
- 3: Neutral or mixed
- 2: Probably no, concerns outweigh benefits
- 1: Clear no, red flags present
Step 3: Calculate Total
Add the five scores:
- Total < 15: Postpone or kill
- 15-20: Prototype or pilot first
- 21-25: Proceed with confidence
Calibration Note: These thresholds are starting points based on typical team contexts. Adjust for your organization:
- Small teams (< 15 people): Consider lowering to 12+/17+/20+ as fewer concrete problems exist at small scale
- Large teams (50+ people): Consider raising to 17+/21+/24+ as more problems emerge at scale
- Risk-averse domains (fintech, healthcare): Raise thresholds; require higher confidence
- Fast-moving startups: Lower thresholds slightly; bias toward action
Re-calibrate quarterly based on decision outcomes. If >30% of "proceed" decisions fail, raise your thresholds.
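A small helper that encodes the bands and keeps the thresholds calibratable, as suggested above. The defaults are the 15/21 cut points; the stricter values in the last line are illustrative only.

```python
def worth_verdict(total: int, pilot_at: int = 15, proceed_at: int = 21) -> str:
    """Map a WORTH total (out of 25) onto the three verdict bands."""
    if total >= proceed_at:
        return "Proceed with confidence"
    if total >= pilot_at:
        return "Prototype or pilot first"
    return "Postpone or kill"

print(worth_verdict(8))    # Postpone or kill
print(worth_verdict(16))   # Prototype or pilot first
print(worth_verdict(22, pilot_at=17, proceed_at=24))  # stricter calibration -> pilot first
```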
Step 4: Document in ADR
Record your scoring in an Architecture Decision Record so the rationale is searchable and auditable when context changes.
Problem Statement: "We need to deploy our app somewhere and want it to scale."
| Dimension | Score | Evidence |
|---|---|---|
| W – Weigh | 2 | Problem is tiny—current single-server traffic is under 100 users/day |
| O – Outcome | 1 | Setup takes 2-4 weeks; benefit is handling 10x traffic we don't have |
| R – Right-sized | 1 | No Kubernetes experience on team; steep learning curve |
| T – Time-to-value | 1 | Delays MVP by 3+ weeks; could use PaaS and ship this week |
| H – Horizon | 3 | k8s is flexible but overkill creates lock-in to complexity |
Total: 8/25
Verdict: ❌ Skip. Use a PaaS (Heroku, Render, Railway) or simple VM with CI/CD. Re-evaluate at 10,000+ daily active users or when hiring experienced SRE.
Problem Statement: "Release collisions between teams slow deployment frequency."
| Dimension | Score | Evidence |
|---|---|---|
| W – Weigh | 4 | Three teams frequently blocked waiting for others' deploys |
| O – Outcome | 3 | Faster independent deploys, but ops overhead and latency increase |
| R – Right-sized | 3 | Team has two SREs with microservices experience |
| T – Time-to-value | 2 | Migration takes 4-6 months; could start with modules first |
| H – Horizon | 4 | Can phase in gradually; doesn't require full rewrite |
Total: 16/25
Verdict: ⚠️ Pilot first. Extract one well-bounded service, measure deployment frequency and lead time for a quarter, then decide whether to continue.
Problem Statement: "We duplicate the same query patterns in five different files."
| Dimension | Score | Evidence |
|---|---|---|
| W – Weigh | 2 | Annoyance but only affects 2-3 developers occasionally |
| O – Outcome | 3 | Reduces duplication, increases consistency; minimal maintenance |
| R – Right-sized | 5 | Team is SQL-savvy; this is a simple abstraction |
| T – Time-to-value | 4 | Can implement in one day; immediate benefit |
| H – Horizon | 4 | Easy to remove or replace if we change databases |
Total: 18/25
Verdict: ✅ Probably worth it. Low cost, raises consistency, reversible. Proceed if it doesn't block higher-priority work.
Add a "WORTH Check" section:
## WORTH Assessment (for non-trivial changes)
- [ ] **W** – What problem does this solve? Is it concrete and frequent?
- [ ] **O** – Does the benefit exceed the complexity added?
- [ ] **R** – Can the team maintain this without heroic effort?
- [ ] **T** – Does this accelerate or delay current goals?
- [ ] **H** – Is this decision reversible or does it create lock-in?
**Total Score**: __/25
Include WORTH scoring in every ADR:
## Context
[Describe the problem]
## WORTH Analysis
| Dimension | Score | Rationale |
|-----------|-------|-----------|
| Weigh | 4 | Daily friction costing 5 dev hours/week |
| Outcome | 4 | Saves `$10k/mo`; costs `$2k` setup + `$500/mo` |
| Right-sized | 3 | Two engineers have experience; third will train |
| Time-to-value | 4 | First value in 2 weeks; full ROI in 3 months |
| Horizon | 5 | Standards-based; multiple vendors available |
**Total**: 20/25 → **Proceed**
## Decision
[Record the choice]
## Consequences
[Expected outcomes]
## Revisit Trigger
Re-score if: team shrinks below 2 experienced operators, cost increases 2x, or migration takes >4 weeks
Rotate a "devil's advocate" who:
- Picks one live component
- Re-scores it with current context
- Proposes simplification if score drops below 15
This catches accumulating complexity and prevents zombie technologies.
The five core WORTH questions capture 80% of decisions. For high-stakes choices, expand your evaluation with these additional lenses.
A choice can be "cheap" but still catastrophic if it postpones revenue or feedback.
Why It Matters: Every week you delay shipping is lost learning, lost revenue, and lost market position. Quantifying this forces trade-offs into the open.
How to Measure:
- Express benefit in dollars per month shipped earlier
- Use Don Reinertsen's formula: Cost of Delay = (Weekly Value) × (Weeks Delayed)
- Plot cash flow over 12-24 months to visualize payback
Example: A feature migration to microservices delays shipping by 3 months. If each month of delay costs $50k in lost revenue, the migration carries a hidden $150k cost even if engineering time is "free."
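The formula in code, using the migration example above (the weeks-per-month conversion is approximate):

```python
def cost_of_delay(weekly_value: float, weeks_delayed: float) -> float:
    """Reinertsen: Cost of Delay = (Weekly Value) x (Weeks Delayed)."""
    return weekly_value * weeks_delayed

weekly_value = 50_000 / 4.33   # ~$50k/month of delayed revenue
weeks_delayed = 3 * 4.33       # 3-month migration
print(round(cost_of_delay(weekly_value, weeks_delayed)))  # ~150000 hidden cost
```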
Diagnostic Questions:
- Can we express the benefit in $/month?
- What's the opportunity cost of delaying other work?
- Is the ROI horizon longer than our funding runway?
High-skill tools are fine until cognitive load crosses what a single team can safely juggle.
Why It Matters: Teams have limited mental bandwidth. Each new technology, pattern, or service consumes attention that could go toward delivering features.
How to Measure:
- Team Topologies suggests teams handle one main responsibility + one interaction mode
- Count technologies in the stack (languages, frameworks, databases, queues, deployment platforms)
- Survey developers: "Do you feel overloaded?" (1-5 scale)
Example: A team already managing React, Node.js, PostgreSQL, Redis, and AWS considers adding Kafka. If they're already near cognitive limits, adding Kafka might tip them into constant firefighting.
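One crude way to make that budget explicit before the debate starts. The ceiling of six technologies is an invented placeholder, not a Team Topologies number.

```python
TEAM_STACK = ["React", "Node.js", "PostgreSQL", "Redis", "AWS"]
COGNITIVE_BUDGET = 6  # assumed ceiling on technologies one team can own confidently

def within_budget(stack: list[str], addition: str, budget: int = COGNITIVE_BUDGET) -> bool:
    """Does adding one more technology keep the team at or under its assumed budget?"""
    return len(stack) + 1 <= budget

print(within_budget(TEAM_STACK, "Kafka"))  # True, but it uses the last slot -- firefighting risk
```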
Diagnostic Questions:
- Would this push the team above its cognitive budget?
- Can we remove something else to make room?
- Is the team already in "survival mode" on current stack?
Further Reading:
- Team Topologies by Matthew Skelton and Manuel Pais
- Martin Fowler's cognitive load article
If a decision lowers deployment frequency or raises change-fail percentage, beware: you've traded adaptability for "best practice" theater.
Why It Matters: The four DORA metrics (deployment frequency, lead time for changes, time to restore service, change failure rate) predict organizational performance.
How to Measure:
- Run an A/B comparison: pilot branch vs. baseline
- Track metrics for 2-4 sprints before and after adoption
- Watch for regression in any of the four keys
DORA Four Key Metrics:
| Metric | What It Measures | Elite Performance |
|---|---|---|
| Deployment Frequency | How often you deploy to production | Multiple times per day |
| Lead Time for Changes | Time from commit to production | Less than one hour |
| Time to Restore Service | Time to recover from incidents | Less than one hour |
| Change Failure Rate | % of deployments causing incidents | 0-15% |
Note for Non-Continuous Teams: DORA metrics reflect continuous delivery practices. If your team operates on planned release cycles (quarterly enterprise software, regulated deployments), substitute relevant metrics:
- Release Quality: Defects per release, customer-reported issues
- Feature Velocity: Story points or features delivered per cycle
- Customer Satisfaction: NPS, support ticket volume
- Incident Rate: Production issues per release
The principle remains: measure delivery effectiveness in your context.
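A sketch of computing the four keys from a simple deployment log. The record fields are assumed, and most teams would pull these numbers from a tool such as Sleuth or LinearB rather than hand-rolled data.

```python
from datetime import datetime, timedelta
from statistics import median

# Assumed record shape: commit time, deploy time, whether the deploy caused an incident, restore time
deploys = [
    {"commit": datetime(2024, 6, 3, 9), "deployed": datetime(2024, 6, 3, 15),
     "failed": False, "restore": timedelta(0)},
    {"commit": datetime(2024, 6, 4, 10), "deployed": datetime(2024, 6, 5, 11),
     "failed": True, "restore": timedelta(hours=2)},
]

window_days = 7
deployment_frequency = len(deploys) / window_days                       # deploys per day
lead_time = median(d["deployed"] - d["commit"] for d in deploys)        # commit -> production
change_failure_rate = sum(d["failed"] for d in deploys) / len(deploys)  # share of bad deploys
restores = [d["restore"] for d in deploys if d["failed"]]
time_to_restore = median(restores) if restores else timedelta(0)

print(deployment_frequency, lead_time, change_failure_rate, time_to_restore)
```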
Example: A team adopting microservices sees deployment frequency drop from 5/week to 1/week because coordinated releases require more planning. The architecture might be "better" in theory but worse in practice.
Diagnostic Questions:
- How will this affect our DORA metrics?
- Can we pilot on one service and measure before scaling?
- Are we optimizing for scalability we don't need while sacrificing speed?
Further Reading:
- Accelerate by Nicole Forsgren, Jez Humble, and Gene Kim
- DORA State of DevOps Reports
Model upside as options, not certainties. Real Options thinking values flexibility over dogma and lets you stage investments.
Why It Matters: Technology decisions under uncertainty are options, not obligations. You can defer commitment, pilot, and learn before full investment.
How to Apply:
- Treat decisions as call options with an "exercise price" (cost to fully adopt)
- Value the right to defer until you have more information
- Stage investments: small pilot, partial rollout, full adoption
Example: Instead of rewriting the entire app in a new framework, extract one module, ship it, measure results, then decide whether to continue.
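A toy expected-value comparison showing why the option to pilot first can beat committing up front. Every number here is invented for illustration.

```python
full_cost  = 120_000  # assumed cost of adopting the new framework outright
pilot_cost = 15_000   # assumed cost of extracting and shipping one module first
p_success  = 0.5      # estimated probability the adoption pays off
payoff     = 200_000  # assumed value if it works out

commit_now  = p_success * payoff - full_cost                 # pay the full cost no matter what
pilot_first = p_success * (payoff - full_cost) - pilot_cost  # pay the full cost only if the pilot succeeds

print(commit_now, pilot_first)  # -20000.0 vs 25000.0 -- deferral has real option value
```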
Diagnostic Questions:
- Can we defer this decision via feature flags or pilot lanes?
- What would we learn from a small experiment first?
- What's the cost of waiting 3 months for more data?
Further Reading:
- "Real Options" by Tom Copeland and Vladimir Antikarov
- Chris Matts' work on Real Options in software
Wardley Mapping keeps you from reinventing commodities (e.g., bespoke logging) while spotting areas worth in-house mastery.
Why It Matters: Not all components deserve equal investment. Commodities should be bought; differentiators should be built and mastered.
How to Use:
- Map your value chain from user needs to underlying components
- Place each component on the evolution axis: Genesis → Custom → Product → Commodity
- Invest in differentiators; buy or use open-source commodities
Example: Custom logging is a commodity—use Datadog or ELK. Your unique pricing algorithm is a differentiator—build and optimize it.
Diagnostic Questions:
- Is this component a differentiator or a utility?
- Are we building what we should buy?
- Are we outsourcing what gives us competitive advantage?
Further Reading:
- Wardley Maps by Simon Wardley
- learnwardleymapping.com
Sunk-cost fallacy, resume-driven development, or "that's how FAANG does it" can skew scoring. Force a devil's-advocate review to neutralize bias.
Why It Matters: Cognitive biases make us defend past investments, chase novelty, or copy others without understanding context.
Common Biases:
- Sunk-cost fallacy: "We've already spent 2 weeks on this, can't stop now"
- Resume-driven development: "I want to learn Rust, so let's use Rust"
- Conformity bias: "Google uses it, so we should too"
- Availability bias: "I just read about this, so it must be important"
How to Counter:
- Ask: "If another team proposed this, would we still agree?"
- Rotate the "no" voice—assign someone to argue against the decision
- Separate "what I want to learn" from "what the business needs"
Diagnostic Questions:
- What bias might be influencing this decision?
- Would we make this choice if starting from scratch today?
- Are we justifying a conclusion or genuinely evaluating?
Regulated domains (health, fintech) may hard-block certain stacks. Map legal constraints early or your WORTH score is fiction.
Why It Matters: Legal requirements trump architecture preferences. Compliance failures carry fines, shutdowns, and reputation damage.
Key Regulations:
- HIPAA (health): Encryption, audit logs, breach notification
- PCI DSS (payments): Tokenization, network segmentation, logging
- GDPR/LGPD (privacy): Data minimization, consent, right to deletion
- SOC 2 (enterprise): Security controls, availability, confidentiality
Diagnostic Questions:
- Which regulations apply to our domain?
- Does this technology meet compliance requirements out-of-box?
- What's the audit burden of this choice?
TCO often dwarfs initial build effort. Factor on-call load, infra cost curves, licensing, and vendor escalations.
Why It Matters: A "free" open-source tool that requires 40 hours/month of maintenance costs $48k/year at a $100/hour blended rate.
Components of TCO:
- Capital: Initial development, training, setup
- Operational: Infrastructure, licenses, support contracts
- Maintenance: Patches, upgrades, bug fixes
- Opportunity Cost: Developer time not spent on features
How to Calculate:
- Project 24-month cash flow for "run" costs
- Include hidden costs: on-call, debugging, tribal knowledge loss
- Compare against alternatives (build vs. buy)
Example: Kubernetes costs zero to download but requires dedicated SRE time (estimate $150k/year salary + overhead). A PaaS costs $10k/year but needs minimal ops attention. True cost comparison: k8s = $150k+, PaaS = $10k.
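A 24-month TCO sketch of that comparison. The 30% salary overhead and the 5 hours/month of PaaS attention are assumptions layered on the example's figures.

```python
def tco_24_months(setup: float, monthly_run: float, monthly_people: float) -> float:
    """Total cost of ownership over 24 months: setup plus recurring run and people costs."""
    return setup + 24 * (monthly_run + monthly_people)

# Kubernetes: free to download, but roughly one dedicated SRE ($150k/yr salary + ~30% overhead)
k8s  = tco_24_months(setup=0, monthly_run=0, monthly_people=150_000 * 1.3 / 12)
# PaaS: ~$10k/yr in fees, minimal ops attention (assume ~5 hours/month at $100/hour)
paas = tco_24_months(setup=0, monthly_run=10_000 / 12, monthly_people=5 * 100)

print(round(k8s), round(paas))  # ~390000 vs ~32000 over two years
```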
Energy footprint or community support can be differentiators in hiring and PR.
Why It Matters: Green tech attracts talent and customers. Open-source contributions build community goodwill and reduce vendor lock-in.
Considerations:
- Data center energy consumption (cloud regions vary 10x in carbon intensity)
- Open-source sustainability (are we giving back or just taking?)
- Worker conditions (are gig workers treated fairly?)
Diagnostic Questions:
- What's the estimated kWh/month?
- Is the open-source community healthy?
- Does this align with company values?
Early adoption can yield moat-building knowledge or saddle you with a dying ecosystem.
Why It Matters: Bleeding-edge technology can become an asset (early expertise) or a liability (abandoned by vendors).
How to Assess:
- Check Gartner Hype Cycle or ThoughtWorks Tech Radar
- Monitor job posting volume (proxy for ecosystem health)
- Evaluate community momentum (GitHub stars, commits, issues closed)
Diagnostic Questions:
- Is this technology in the "trough of disillusionment" or climbing toward mainstream?
- Will we build competitive advantage from early mastery?
- Is the ecosystem growing or dying?
For high-stakes decisions, use this expanded canvas:
## Problem Statement
[One sentence + cost of delay]
## Options Considered
1. Baseline (do nothing or minimal change)
2. Incremental tweak
3. Ambitious leap
## Scoring Table
| Dimension | Option 1 | Option 2 | Option 3 | Weight |
|-----------|----------|----------|----------|--------|
| W – Weigh | 3 | 4 | 5 | High |
| O – Outcome | 2 | 4 | 3 | High |
| R – Right-sized | 5 | 4 | 2 | High |
| T – Time-to-value | 5 | 3 | 1 | High |
| H – Horizon | 4 | 4 | 2 | Medium |
| **Core WORTH** | **19** | **19** | **13** | - |
| Cost of Delay | 4 | 3 | 1 | High |
| Cognitive Load | 5 | 3 | 1 | High |
| DORA Impact | 3 | 4 | 2 | High |
| Risk | 5 | 3 | 2 | Medium |
| Strategic Fit | 3 | 4 | 5 | Medium |
| **Extended Total** | **39** | **36** | **24** | - |
## Bias Check
- Option 1: Low risk might mean stagnation
- Option 2: Balanced but requires discipline
- Option 3: Exciting but high failure risk
## Decision
[Chosen option + rationale]
## Kill Criteria
Revert if: [Specific measurable conditions]
## Review Date
[Auto-remind 90 days later]
This process works for a one-page ADR or a 50-team program.
Write one sentence. Make it numeric if possible.
Example: "Release lead-time ballooned from 3 days to 9 days after adding the reporting service."
Always consider baseline, incremental, and ambitious approaches.
| Label | Description | Estimated Effort |
|---|---|---|
| A – Baseline | Keep monolith, add read-replica database | 2 weeks |
| B – Incremental | Extract reports into separate process + queue | 6 weeks |
| C – Ambitious | Full microservice split + Kubernetes autoscaling | 12 weeks |
Use a 1-5 scale; mark unknowns as -1 to flag missing evidence.
| Dimension | Option A | Option B | Option C |
|---|---|---|---|
| W – Weigh | 4 | 4 | 4 |
| O – Outcome | 3 | 4 | 3 |
| R – Right-sized | 5 | 4 | 2 |
| T – Time-to-value | 5 | 3 | 1 |
| H – Horizon | 4 | 4 | 2 |
| Core Total | 21 | 19 | 12 |
Flag critical assumptions.
Example: Option C wins only if we hire two SREs in next 30 days (unlikely). Mark as red flag.
Make the call and define failure conditions.
Decision: Pick Option B (extract reports into separate process).
Kill Switch: If DORA lead time stays above 5 days after 2 sprints, rollback to Option A and revisit.
Set auto-reminder 90 days later in your ADR system to revisit the score with new data.
Context:
- €300/month budget
- No SRE on team
- Goal: Revenue in 90 days
WORTH Outcome: Not worth it
Rationale:
- W = 2: Problem is tiny (< 100 users/day)
- O = 1: Setup takes 4 weeks, blows budget
- R = 1: No k8s experience; steep learning curve
- T = 1: Delays shipping by 1 month
- H = 3: Flexible but creates complexity lock-in
Total: 8/25
Verdict: ❌ Skip. Use a PaaS (Railway, Render, Heroku). k8s adds 4+ weeks with zero customer value. Re-evaluate at 10,000+ DAU or when hiring experienced SRE.
What They Did: Deployed on Railway for $20/month. Shipped MVP in 2 weeks. Gained 500 paying customers before revisiting infrastructure.
Context:
- 40 developers across 4 teams
- PCI compliance requirements
- Release collisions weekly
- Current: Modular monolith
WORTH Outcome: Worth it in stages
Rationale:
- W = 4: Real pain—teams blocking each other
- O = 3: Faster scaling, but ops overhead increases
- R = 3: Two SREs on staff with experience
- T = 2: Full migration takes 6 months
- H = 3: Can phase in gradually
Total: 15/25
Verdict: ⚠️ Worth it in stages. Extract the highest-isolation service first, measure, and continue only if the metrics improve.
What They Did:
- Extracted payments service (high isolation, PCI boundary)
- Ran for 3 months, measured deployment frequency (+40%)
- Extracted auth service
- Kept reporting and admin in monolith (low change rate, high coupling)
Result: Hybrid architecture. Teams ship independently where it matters. Avoided microservices sprawl.
Context: In a well-documented case, Amazon Prime Video cut infrastructure costs by 90% by consolidating their video quality monitoring service from a distributed microservices architecture into a single, more efficient process.
Why It's a Key Example: This was not an abandonment of microservices. It was a targeted application of WORTH thinking to a specific component where the distributed approach had become too costly and complex. It proves that even at massive scale, architectural choices are component-specific, not all-or-nothing dogmas. The rest of Amazon's platform remains one of the largest microservice deployments in the world.
Original Problem: Needed to scale quality monitoring across thousands of streams.
Initial Distributed Approach:
- Split monitoring into 5+ separate services
- Used AWS Step Functions for orchestration
- Hit scaling limits and cost overruns for this specific use case
WORTH Re-evaluation:
- W = 3: Problem still exists but architecture didn't solve it
- O = 1: Infrastructure costs exploded relative to value
- R = 2: Operational complexity became unsustainable
- T = 1: Development velocity tanked
- H = 2: Hard to back out, but they did it anyway
Total: 9/25 (after implementation)
What They Did: Consolidated the monitoring service into a single process. Kept logical module boundaries within the service. Reduced monitoring costs 90%. Increased processing throughput 3x.
Lesson: Revisit WORTH scores yearly. Scale and cost assumptions can flip, making previously "correct" architecture wrong. Also: architectural decisions are component-specific, not all-or-nothing. You can consolidate one service while keeping others distributed.
Further Reading:
- Prime Video Tech Blog post on the migration
- Discussions on Hacker News and r/programming
WORTH is a meta-filter. Before adopting any coding rule—Uncle Bob's Clean Code, Jeff Bay's Object Calisthenics, SOLID principles—run the five questions.
| If the rule says… | Ask with WORTH… | Worth it when… | Not worth it when… |
|---|---|---|---|
| "Wrap every primitive in a Value Object" (Object Calisthenics #3) | O – Outcome > overhead: Does the extra class reveal real behavior or validation? | Domain-heavy codebases (money, dates, IDs) where bugs are expensive | One-off scripts, data processing, throwaway ETL |
| "Functions should do one thing" (Clean Code) | R – Right-sized: Do we have time to refactor, and will it help juniors reason faster? | Core modules touched daily, high change rate | Legacy subsystem slated for decommission in 3 months |
| "No 'else' statements" (Object Calisthenics #2) | H – Horizon flexibility: Does banning else keep options open, or push us into contorted polymorphism? | Code that thrives on polymorphic behavior (strategy engines) | Simple data flows where a guard clause reads clearer |
| "Depend upon abstractions, not concretions" (SOLID-D) | W – Weigh the problem: Are we actually swapping implementations? | Plugin systems, multi-cloud adapters | One cloud, one database, stable for years |
The original rule books give general guidance. WORTH makes sure they're cost-effective in your context, today.
## Which Principle Are We Invoking?
Clean Code: "Functions should do one thing"
## WORTH Score
- [ ] W – Weigh: Functions currently 200+ lines, causing 3 bugs/week
- [ ] O – Outcome: Refactor saves 5 hours/week debugging
- [ ] R – Right-sized: Team comfortable with Extract Method pattern
- [ ] T – Time-to-value: Complete in 3 days
- [ ] H – Horizon: Easier to test and modify going forward
**Total**: 4/5 → Proceed
If WORTH ≤ 3 out of 5, tag #possible-over-engineering for reviewer attention.
Quarterly, pick one principle adopted last quarter. Re-run WORTH. If score dropped (team changed, scope shifted), schedule removal or simplification.
Using WORTH with clean code rules prevents two extremes:
- Dogma-driven engineering: Blindly applying every rule because "the book says so"
- Chaos-driven shortcuts: Rejecting discipline because "we're moving fast"
Instead, every guideline must pay rent in the form of clearer code, faster delivery, or lower defects that outweigh its cognitive and calendar cost.
What It Is: Singleton, Tight-coupling, Untestability, Premature-optimization, Indescriptive naming, Duplication
WORTH Check: Outcome > overhead – Will refactoring pay back quickly in fewer bugs or faster changes?
Worth It: Core modules touched daily; every hour saved in debugging repays fast
Not Worth It: Throwaway migration script that dies next sprint
What It Is: Composability, Unix-philosophy, Predictability, Idiomatic, Domain-based
WORTH Check: Right-sized for team – Can your devs design tiny, composable units without paralysis?
Worth It: Green-field services where micro-features ship independently
Not Worth It: Legacy blob where carving out composable pieces freezes delivery
What It Is: Fast, Independent, Repeatable, Self-validating, Timely
WORTH Check: Time-to-value – Do cleaner tests shorten feedback loops enough to justify extra mocks and fixtures?
Worth It: High-change domains (pricing engines, rules) where hours matter
Not Worth It: Stable ETL job touched twice a year; quick smoke test is cheaper
What It Is: Independent, Negotiable, Valuable, Estimable, Small, Testable
WORTH Check: Weigh the problem – Is story churn causing rework or developer idle time?
Worth It: Distributed teams sprinting in parallel needing clear boundaries
Not Worth It: Solo developer on hobby app—sticky notes beat fine-grained backlog
What It Is: Specific, Measurable, Achievable, Relevant, Time-bound
WORTH Check: Horizon flexibility – Does locking target and date help focus, or stifle exploration?
Worth It: Delivery roadmaps where dates affect marketing or compliance
Not Worth It: Early R&D spikes where discovery is the whole point
What It Is: ACID (Atomicity, Consistency, Isolation, Durability) vs. BASE (Basically-Available, Soft-state, Eventually-consistent)
WORTH Check: Outcome > overhead – Which failure hurts worse: stale reads or blocked writes?
ACID Worth It: Banking ledger behind a queue
BASE Worth It: Feed counters, analytics dashboards
What It Is: Consistency, Availability, Partition tolerance; under a network partition, you must choose between Consistency and Availability
WORTH Check: Risk & uncertainty – Which guarantee keeps revenue safest under network splits?
Choose AP (Availability + Partition tolerance): Global SaaS where an up-to-date cart matters less than staying online
Choose CP (Consistency + Partition tolerance): Stock trading engine; tolerate brief downtime for correctness
- Name the acronym you're tempted to enforce
- Run WORTH questions (+ risk lens if stakes high)
- If < 3 clear "yes" answers, drop or postpone
Result: Every principle earns its keep instead of bloating rituals or code.
This is the most contentious architecture debate. WORTH cuts through the noise.
WORTH Profile:
- W = 2-3: Single team, low coordination overhead
- O = 5: Simple deployment, low ops cost
- R = 5: Fits most team skill sets
- T = 5: Ships fast, refactors easily
- H = 4: Can extract services later via strangler pattern
Signals:
- Team size < 10 developers
- Single deployment cadence acceptable
- Domain boundaries still emerging
- Database transactions simplify consistency
Example: Early-stage SaaS, internal tools, MVP validation
WORTH Profile:
- W = 4-5: Multiple teams, release collision pain
- O = 3-4: Ops overhead justified by independent scaling
- R = 3-4: Team has SRE/DevOps expertise
- T = 2-3: Migration takes months but unlocks velocity
- H = 4: Polyglot flexibility, independent evolution
Signals:
- 3+ teams needing independent deploy cadence
- Clear bounded contexts (payments, auth, catalog, etc.)
- Different scaling needs per service
- Regulatory boundaries (PCI isolation)
Example: Scale-ups post-Series B, organizations with 40+ engineers
WORTH Profile: Often scores highest (20-23/25)
Characteristics:
- Logical modules with strong boundaries
- Shared database but separate schemas
- Single deployment artifact
- Can extract services later when justified
When It's Best: Most of the time, especially for teams in the 10-40 developer range.
| Option | W (Problem) | O (Outcome) | R (Team) | T (Time) | H (Horizon) | Total |
|---|---|---|---|---|---|---|
| Bare VM | 2 (simple needs) | 4 (cheap, fast) | 5 (everyone knows) | 5 (deploy today) | 3 (harder to scale) | 19 |
| PaaS | 3 (want auto-scale) | 5 (instant value) | 5 (no ops needed) | 5 (deploy today) | 4 (vendor lock manageable) | 22 |
| Kubernetes | 4 (complex needs) | 3 (powerful but heavy) | 2 (need expertise) | 1 (weeks to setup) | 5 (max flexibility) | 15 |
Verdict for Most Teams < 50 Developers: PaaS wins (Heroku, Render, Railway, Fly.io)
When k8s Becomes Worth It:
- 50+ services needing orchestration
- Multi-cloud strategy required
- Cost optimization at massive scale (> $50k/month infra spend)
- Team includes dedicated SREs
Why People Repeat It: Tribal pride, blog benchmarks, echo chambers
Reality Check: Match runtime, ecosystem, hiring market, and performance envelope to your needs. Switching costs eclipse eloquence.
WORTH Analysis:
| Situation | Language Choice | Why |
|---|---|---|
| ML/Data Science | Python | Ecosystem (NumPy, PyTorch, scikit-learn) |
| Network Services | Go | Concurrency, deployment simplicity |
| Systems Programming | Rust/C++ | Performance, memory safety |
| Web Backends | Node.js, Python, Ruby, Go | Team skill + ecosystem fit |
When It Pays Off: Domains with clear incumbent (Python for ML, Go for network daemons)
When It Hurts: Polyglot hobby projects that quietly become production—ops chaos, talent shortage, fragmented tooling
WORTH Question: Does this language choice improve outcomes more than the cost of switching, hiring, and maintaining two stacks?
Why People Repeat It: Books sell certainty; lint rules give easy metrics
Reality Check: Each rule adds lines, indirection, review time. Measure gains in defect rate or comprehension, not aesthetic purity.
Spectrum:
- Too Rigid: 300-line class for a 10-line feature because "SRP demands it"
- Too Loose: 2000-line God class doing everything
- Middle Ground: Modules with clear responsibilities, reasonable cohesion, pragmatic extraction when pain hits
WORTH Question: Does this clean code practice reduce bugs or speed changes more than the boilerplate costs?
When It Pays Off: Core libraries touched daily where bugs are costly
When It Hurts: Short-lived scripts, migration throwaway code, experiments racing deadlines
Why People Repeat It: Copying Big Tech success stories without context
Reality Check: k8s ⇒ steep cognitive load; microservices ⇒ extra latency, ops, observability. Ask: "Will a single VM + CI keep us shipping faster for the next 12 months?"
The "You Are Not Netflix" Principle:
- Netflix: 1000+ engineers, millions of subscribers, polyglot services
- You: 3 engineers, 100 users, single product
WORTH Question: Does the complexity pay off before we run out of runway?
When It Pays Off: Multiple cross-functional teams needing independent deploys, strict uptime SLOs
When It Hurts: Three-dev startup chasing product-market fit with < $1k/month budget
What to Do Instead:
- Start with PaaS or single server
- Add modules/boundaries in monolith
- Extract 1-2 services when pain justifies it
- Scale infrastructure when revenue supports it
Why People Repeat It: Comfort for managers, audit trails for enterprises
Reality Check: The process weight you need scales with team size × coupling. Bureaucracy kills flow in small squads.
Spectrum:
- Too Rigid: SAFe for 5-person startup (planning ceremonies take longer than building)
- Too Loose: No standups, no retrospectives, chaos
- Middle Ground: Lightweight rituals scaled to team size
WORTH Question: Does this process improve delivery speed or just create overhead?
| Team Size | Appropriate Process |
|---|---|
| 1-3 devs | Kanban board, weekly sync, monthly retro |
| 4-10 devs | Scrum-lite: standups, sprints, retros |
| 10-50 devs | Scrum + architecture review + quarterly planning |
| 50+ devs | Shape Up, SAFe, or custom scaled framework |
When It Pays Off: Regulated scale-ups (fintech, health) where compliance beats velocity
When It Hurts: Seed-stage ventures where market changes faster than sprint review cycle
Add a WORTH delta checkbox for non-trivial changes:
## WORTH Delta (for changes adding complexity)
Does this change raise our cognitive load score?
- [ ] Adds new technology/framework
- [ ] Increases coupling between modules
- [ ] Requires new operational expertise
If yes to any, WORTH justification required.
**WORTH Score**: __/25 (if applicable)
Post DORA metric diffs against the feature branch so reviewers see immediate delivery impact:
# .github/workflows/worth-check.yml
name: WORTH Metrics
on: pull_request
jobs:
  dora-diff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history so the branch can be compared against main
      - name: Calculate lead time diff
        run: |
          # Compare current branch vs main
          # Post diff to PR comment
Visible Metrics:
- Build time change
- Test duration change
- Estimated deployment frequency impact
Every story > 2 days of effort must link to its parent WORTH-scored ADR.
Example:
## Story: Add GraphQL endpoint for dashboard
**Parent ADR**: #147 (GraphQL Migration - WORTH Score 18/25)
**Justification**: Reduces frontend API calls from 7 to 1, improving load time by 40%
Rotate a "devil's advocate" who:
- Selects one live component or technology
- Re-scores it with current context
- Proposes kill/simplify plan if score < 15
Example Retro:
- Q1 2024: Adopted Redis for caching (score: 19/25)
- Q4 2024: Team size halved, cache hit rate 12%, ops burden high
- Re-score: 11/25
- Action: Remove Redis, use in-memory cache, save $200/month + 5 hours/week of ops
Symptom: Technology choice justified by "industry standard" or "I want to learn this"
Why It Happens: Individual career goals override business needs
Guard-rail: Bias gate—mandatory counter-proposal from someone outside the team who argues for simpler alternative
Example:
- Proposed: Rewrite in Rust because "it's the future"
- Counter: Keep Python, invest saved time in user research
- WORTH Score: Rust = 9/25, Python = 21/25
- Decision: Stay with Python
Symptom: Dozens of tiny services, yet releases still tied together; DORA lead time unchanged or worse
Why It Happens: Copying Big Tech patterns without understanding context
Guard-rail: Check DORA lead time quarterly. If unchanged after 6 months, merge modules back
Example:
- Split monolith into 15 services
- Deployment frequency: Before 5/week → After 1/week
- Lead time: Before 2 days → After 5 days
- Action: Consolidated 10 services back into 3 larger modules
- Result: Deployment frequency back to 4/week
Symptom: One expert owns critical infrastructure (k8s, database); vacations freeze deploys
Why It Happens: Hiring/training gaps; over-reliance on individual
Guard-rail: WORTH "Right-sized" must score ≥ 3 → at least two on-call capable developers
Example:
- k8s managed by single SRE
- SRE leaves company
- 6 weeks of deployment freeze while team ramps up
- Prevention: Require pair rotation, documentation, training budget
Symptom: Extended WORTH scoring sessions lasting days; no decision made
Why It Happens: Perfectionism, fear of commitment, unclear decision-maker
Guard-rail: Time-box WORTH canvas to 2 working days maximum. Default to lowest-cost prototype if still undecided.
Example:
- Team debates database choice for 3 weeks
- Intervention: "We'll pilot PostgreSQL for 2 sprints. If it doesn't work, we switch. Decision made, moving on."
Symptom: Scores are artificially inflated or manipulated to justify a predetermined outcome. This nullifies the framework's purpose, turning it into a tool for validating biases rather than challenging them.
How Teams Game the Score:
- Inflating scores: Marking unknowns or subjective opinions as 5/5 without evidence.
- Cherry-picking evidence: Ignoring contradictory data that would lower a score.
- Skipping the bias gate: Not having a neutral party or "devil's advocate" review the scoring.
- Ignoring re-scoring: Never revisiting past decisions to validate the original score against real-world outcomes.
Guard-Rails:
- Require concrete evidence for any score of 4 or 5.
- Rotate the "devil's advocate" or "no" voice for bias checks.
- Post scorecards publicly within the engineering team for peer review.
- Calendar auto-reminders for quarterly re-scores of major decisions.
Track these to validate WORTH-driven decisions:
| Metric | How WORTH Helps |
|---|---|
| Deployment Frequency | T (Time-to-value) ensures choices accelerate shipping |
| Lead Time for Changes | O (Outcome) prevents overhead that slows delivery |
| Time to Restore | R (Right-sized) ensures team can operate without heroics |
| Change Failure Rate | W (Weigh) focuses on real problems, not speculative fixes |
How to Measure: Tools like Sleuth, LinearB, or custom scripts parsing git + deploy logs
Quarterly developer survey:
Rate 1-5 (1 = strongly disagree, 5 = strongly agree):
1. I feel overwhelmed by the number of technologies I need to know
2. I struggle to debug production issues without asking for help
3. I don't understand how most of our systems work
4. I have no time to learn new skills
Score < 12 = Healthy
Score 12-16 = Warning
Score > 16 = Overloaded (simplify stack)
Track time from "proposal" to "decision made":
- Before WORTH: Median 3 weeks of debate
- After WORTH: Median 3 days with scorecard
Goal: Reduce decision paralysis while maintaining quality
What percentage of decisions get reversed within 6 months?
- Too High (> 30%): Scoring too optimistically; need more rigor
- Too Low (< 5%): Maybe too conservative; missing opportunities
- Healthy (10-20%): Acceptable learning rate
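Both metrics fall out of a simple decision log; the field names below are assumptions, and in practice the data lives in your ADR tracker.

```python
from datetime import date
from statistics import median

# Assumed ADR log entries
decisions = [
    {"proposed": date(2024, 3, 1), "decided": date(2024, 3, 4), "reversed_within_6mo": False},
    {"proposed": date(2024, 4, 2), "decided": date(2024, 4, 5), "reversed_within_6mo": True},
    {"proposed": date(2024, 5, 6), "decided": date(2024, 5, 8), "reversed_within_6mo": False},
]

decision_latency_days = median((d["decided"] - d["proposed"]).days for d in decisions)
reversal_rate = sum(d["reversed_within_6mo"] for d in decisions) / len(decisions)

print(decision_latency_days)    # 3 -- days of debate, not weeks
print(round(reversal_rate, 2))  # 0.33 -- above the healthy 10-20% band; tighten scoring rigor
```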
Track infrastructure and operational costs quarterly:
## Q3 2024 Infra Costs
| Category | Cost | Change vs. Q2 |
|----------|------|---------------|
| Cloud (AWS) | `$8,200` | -15% (removed unused k8s cluster) |
| SaaS Tools | `$3,400` | +5% (added monitoring) |
| On-call Hours | 40 hrs | -30% (simplified stack) |
**WORTH Impact**: Killed over-engineered service mesh (score dropped to 9/25). Saved `$1,500/month` + reduced on-call burden.
Decision: Migrate from REST to GraphQL for dashboard API
Initial WORTH Score: 17/25
- W = 4 (7 API calls per page load)
- O = 4 (expect 40% faster loads)
- R = 3 (team learning GraphQL)
- T = 3 (2-sprint migration)
- H = 3 (can revert via feature flag)
Measurement Plan:
- Week 0: Baseline page load time (2.1s), API call count (7), developer velocity (8 stories/sprint)
- Week 4: Pilot on one dashboard
- Week 8: Full rollout
- Week 12: Re-measure
Results:
- Page load: 1.3s (38% faster) ✅
- API calls: 1 (86% reduction) ✅
- Developer velocity: 6 stories/sprint (25% slower) ❌
- On-call incidents: +2 (query complexity bugs) ❌
Re-score WORTH: 15/25 (dropped from 17)
- O = 3 (speed gain offset by velocity loss)
- R = 2 (team still struggling)
Action: Pause further rollout. Keep GraphQL for existing dashboards but don't expand. Re-evaluate in 6 months after team upskills.
To make WORTH a team-wide habit, it needs to be embedded in the daily rituals of software development. The goal is to make the five questions an automatic, low-friction part of decision-making.
- Pick one pending decision (technology, architecture, process).
- Fill out the WORTH scorecard with the team (15 minutes max).
- Share the result and make a call with explicit kill criteria.
- Set a 90-day reminder to re-score the decision with real data.
Repeat this process. Over time, asking "Is it worth it?" becomes muscle memory, helping your team stop chasing hype and start compounding value.
Context:
- 2 developers (full-stack)
- €300/month runway
- MVP targeting small businesses
- Goal: Revenue within 90 days
Pain Point: Need to deploy app; considering infrastructure options.
Options:
| Option | Setup Time | Monthly Cost | Ops Burden |
|---|---|---|---|
| A: DigitalOcean VM | 1 day | €20 | Medium (manual deploys) |
| B: Railway PaaS | 2 hours | €20-40 | Minimal |
| C: Kubernetes (EKS) | 2-3 weeks | €150+ | High |
WORTH Scores:
| Dimension | VM | Railway | k8s |
|---|---|---|---|
| W – Weigh | 3 | 3 | 2 |
| O – Outcome | 4 | 5 | 1 |
| R – Right-sized | 4 | 5 | 1 |
| T – Time-to-value | 5 | 5 | 1 |
| H – Horizon | 3 | 4 | 5 |
| Total | 19 | 22 | 10 |
Decision: Railway (score: 22/25)
Rationale: Maximize time-to-value. Can ship MVP this week. Budget-friendly. Easy migration path if growth demands it.
Implementation:
- Deployed in 3 hours
- Auto-deploy on git push
- First customer within 2 weeks
6-Month Outcomes:
- 500 paying customers
- $12k MRR
- Infrastructure cost: $85/month
- Zero downtime incidents
- WORTH re-score: 23/25 (H improved from 4 to 5 based on proven migration path; could now scale to k8s if needed)
Lessons Learned:
- PaaS removed all infrastructure distraction
- Delayed optimization was correct call
- Now at scale where k8s might be justified (revisiting at $50k MRR)
Context:
- 40 developers, 4 teams
- Series C funded
- PCI DSS compliance required
- Modular monolith causing deployment conflicts
Pain Point: Teams blocking each other 2-3 times/week waiting for deployment windows. Lead time increased from 3 days to 7 days.
Options:
| Option | Migration Time | Complexity | Team Impact |
|---|---|---|---|
| A: Tighten module boundaries | 1 month | Low | Minimal |
| B: Extract 2-3 services (payments, auth) | 3-4 months | Medium | 2 teams affected |
| C: Full microservices (10+ services) | 8-12 months | High | All teams disrupted |
WORTH Scores:
| Dimension | A: Modules | B: Hybrid | C: Full |
|---|---|---|---|
| W – Weigh | 3 | 4 | 4 |
| O – Outcome | 3 | 4 | 3 |
| R – Right-sized | 5 | 4 | 2 |
| T – Time-to-value | 5 | 3 | 1 |
| H – Horizon | 4 | 5 | 4 |
| Total | 20 | 20 | 14 |
Decision: Option B (Hybrid) - score tied with A, but better horizon flexibility tipped decision
Rationale:
- PCI compliance easier with separate payments service
- Auth used by all teams, high coupling risk
- Keep reporting, admin, backoffice in monolith (low change frequency)
Implementation:
- Month 1-2: Extract payments service (clear PCI boundary)
- Month 3: Measure: deployment frequency +30%, lead time -1 day
- Month 4-5: Extract auth service
- Month 6: Stabilize, measure
6-Month Outcomes:
- Deployment frequency: 2/week → 6/week
- Lead time: 7 days → 3.5 days
- Change failure rate: 12% → 15% (slight increase during transition)
- Infrastructure cost: +$800/month (acceptable)
- Team satisfaction: +25% (survey)
WORTH Re-score: 21/25 (validated)
Lessons Learned:
- Hybrid architecture was sweet spot
- Didn't extract everything—kept stable domains in monolith
- PCI isolation made compliance audit 50% faster
- Now template for future service extraction
Why It's a Dogma: "Always write tests first" gets preached as universal, yet TDD can add indirection and slow exploratory work; even its creators debate its universal value.
Reality Check: TDD shines for:
- Safety-critical code (billing engines, medical devices)
- Long-lived domains with clear requirements
- Refactoring existing code
TDD struggles for:
- Green-field spikes exploring problem space
- Data science notebooks
- One-off migration scripts
WORTH Application:
| Context | W | O | R | T | H | Total | Verdict |
|---|---|---|---|---|---|---|---|
| Billing engine | 5 | 5 | 4 | 3 | 5 | 22 | ✅ TDD worth it |
| One-off migration | 2 | 1 | 3 | 1 | 3 | 10 | ❌ Post-hoc tests sufficient |
Why It's a Dogma: "Never think about servers again!" marketing obscures trade-offs.
Reality Check:
- Wins: Spiky traffic, infrastructure cost tied to usage, zero idle costs
- Loses: Cold-start latency, opaque debugging, vendor lock-in
WORTH Application:
| Scenario | Cold Start Acceptable? | Debugging Needs | Lock-In Risk | Verdict |
|---|---|---|---|---|
| Image resize API | Yes (async) | Low | Low | ✅ Lambda |
| Real-time chat | No | High | High | ❌ PaaS/VM |
Rollback Story: A startup moved to Lambda for all APIs. Debugging production issues took 5x longer (no SSH, CloudWatch delays). After 4 months, moved critical path back to EC2, kept async jobs on Lambda. Hybrid won.
Why It's a Dogma: Loose coupling sounds great in theory.
Reality Check:
- Wins: Multiple consumers, async workflows, audit trails
- Loses: Traceability nightmares, idempotency complexity, eventual consistency headaches
WORTH Decision Matrix:
| Use Case | Consistency Needs | Traceability | Latency | Verdict |
|---|---|---|---|---|
| Order processing | Strong | High | Low | ❌ Request/response + DB transaction |
| Analytics events | Eventual OK | Medium | High OK | ✅ Event stream |
Why It's a Dogma: Flexible queries sound universally better than REST.
Reality Check:
- Wins: Rich clients querying complex object graphs, mobile apps with bandwidth constraints
- Loses: N+1 query problems, complex caching, hard to secure joins, overkill for CRUD
WORTH Sidebar:
| API Pattern | When to Use |
|---|---|
| REST | Simple CRUD, public APIs, need HTTP caching |
| gRPC | Internal services, performance-critical, typed contracts |
| GraphQL | Rich client queries, reducing round trips, mobile apps |
Problem: WORTH can score "yes" but legal requirements veto the choice.
Solution: Add compliance checklist before scoring:
## Compliance Gate (complete BEFORE WORTH scoring)
- [ ] PCI DSS requirements (if handling payments)
- [ ] HIPAA requirements (if handling health data)
- [ ] GDPR/LGPD (if handling EU/Brazil personal data)
- [ ] SOC 2 controls (if selling to enterprises)
- [ ] Threat model reviewed (if internet-facing)
If ANY are blocked, the decision is vetoed regardless of WORTH score.
Why It Matters: SQL vs. NoSQL wars rival language wars in heat and lack of nuance.
WORTH Application:
| Storage Type | When It Wins | When It Loses |
|---|---|---|
| PostgreSQL (ACID) | Relational data, strong consistency, complex queries | Massive write throughput, schemaless flexibility |
| MongoDB (BASE) | Schemaless flexibility, horizontal scaling | Complex joins, strong consistency needs |
| Redis | Caching, sessions, pub/sub | Primary data store, durability critical |
| NewSQL (CockroachDB) | Need both ACID + horizontal scale | Low-write workloads, budget-constrained |
WORTH Question: Does our data access pattern justify the operational complexity of this database?
Dogma: "Full OTEL stack or you're flying blind"
Reality: Observability has costs (infrastructure, cognitive load, vendor bills)
WORTH Spectrum:
| Stage | Logging | Metrics | Tracing | APM | Cost |
|---|---|---|---|---|---|
| MVP (< 1000 users) | Structured logs (free tier) | Basic (uptime, errors) | None | None | $0-50/mo |
| Growth (1k-50k users) | Centralized (ELK/Datadog) | RED metrics | Critical paths | Optional | $200-1k/mo |
| Scale (50k+ users) | Full OTEL | Full dashboard | Distributed | Full APM | $2k+/mo |
WORTH Question: Does the observability investment reduce MTTR enough to justify the cost?
Dogma 1: "Only hire seniors"
Reality: Balanced teams (seniors + mid-level + juniors) often outperform all-senior teams due to:
- Cost efficiency
- Diverse perspectives
- Knowledge transfer culture
- Hunger/growth mindset
WORTH Question: Does hiring exclusively seniors improve delivery more than the 2x salary cost?
Dogma 2: "10x engineer myth"
Reality: "10x teams" exist; "10x solo engineers" are rare and create bus-factor risk.
WORTH Question: Are we building for individual heroics or team resilience?
Blind Spot: Most books ignore one-person projects.
WORTH Solo Mode:
Default Answers for Solo Devs:
- W: Only solve pains you feel daily
- O: Optimize for shipping, not elegance
- R: Stick to technologies you already know
- T: MVP in days/weeks, not months
- H: Avoid vendor lock-in (you can't negotiate)
One-Page Solo Canvas:
## Solo Dev WORTH Checklist
- [ ] Do I actually need this, or is it "best practice" theater?
- [ ] Can I build and run this alone in < 1 week?
- [ ] Will this help me ship faster or just add complexity?
- [ ] Can I reverse this in a weekend if it doesn't work?
If < 3 checks pass, skip it.
# WORTH Decision Canvas
**Date**: ___________ **Decision-Maker**: ___________ **Reviewer**: ___________
## Problem Statement (one sentence)
[What pain are we solving?]
**Cost of Delay**: $______/week OR _______ hours/week
---
## Options Considered
| Option | Description | Estimated Effort |
|--------|-------------|------------------|
| A (Baseline) | | |
| B (Incremental) | | |
| C (Ambitious) | | |
---
## WORTH Scorecard (1-5 scale, unknown = -1)
| Dimension | A | B | C | Weight |
|-----------|---|---|---|--------|
| **W** – Weigh the problem | | | | High |
| **O** – Outcome > overhead | | | | High |
| **R** – Right-sized team | | | | High |
| **T** – Time-to-value | | | | High |
| **H** – Horizon flexibility | | | | Medium |
| **CORE TOTAL** | | | | |
### Extended Lenses (optional, high-stakes only)
| Dimension | A | B | C |
|-----------|---|---|---|
| Cost of Delay | | | |
| Cognitive Load | | | |
| DORA Impact | | | |
| Compliance | ✅/❌ | ✅/❌ | ✅/❌ |
---
## Bias Check
**Potential biases influencing this decision**:
- [ ] Sunk-cost fallacy
- [ ] Resume-driven development
- [ ] Conformity ("everyone's doing it")
- [ ] Availability bias
**Counter-argument for preferred option**: ___________
---
## Decision
**Chosen**: Option ____
**Rationale**: ___________
**Kill Criteria** (revert if):
1. ___________
2. ___________
**Review Date**: ___________ (90 days from decision)
---
## Signatures
**Approved**: ___________ (Tech Lead)
**Reviewed**: ___________ (Devil's Advocate)
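The canvas is deliberately pen-and-paper, but the tally is easy to automate in a script or CI check. Below is a minimal sketch of one possible scoring rule, assuming the 1-5 scale with -1 for unknowns, weights of 2 (High) and 1 (Medium), and a score of 4 or higher counted as a "clear yes" for the three-of-five gate; the weights, threshold, and names are illustrative choices, not prescribed by the framework.

```python
# Minimal scorecard-tally sketch (illustrative, not prescribed by the framework).
# Assumptions: the canvas scale of 1-5 with -1 for "unknown", High-weight
# dimensions counted double, and a score of 4 or 5 treated as a "clear yes"
# for the three-of-five adoption gate.
from dataclasses import dataclass

WEIGHTS = {"W": 2, "O": 2, "R": 2, "T": 2, "H": 1}  # High = 2, Medium = 1

@dataclass
class Verdict:
    weighted_total: int
    clear_yeses: int
    unknowns: int
    adopt: bool

def score_option(scores: dict[str, int]) -> Verdict:
    # Unknowns (-1) are excluded from the total but block adoption outright.
    weighted_total = sum(WEIGHTS[d] * s for d, s in scores.items() if s != -1)
    clear_yeses = sum(1 for s in scores.values() if s >= 4)
    unknowns = sum(1 for s in scores.values() if s == -1)
    return Verdict(weighted_total, clear_yeses, unknowns,
                   adopt=clear_yeses >= 3 and unknowns == 0)

if __name__ == "__main__":
    option_b = {"W": 5, "O": 4, "R": 4, "T": 3, "H": -1}
    print(score_option(option_b))  # adopt=False until the unknown H is resolved
```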
| Dogma | Reality Check | Pays Off When | Hurts When |
|---|---|---|---|
| "X is the best language" | Match runtime, ecosystem, hiring market to needs | Domain has clear incumbent (Python-ML, Go-network) | Polyglot chaos; ops fragmentation |
| "Follow every Clean Code rule" | Weigh defects saved vs. boilerplate added | Core libraries touched daily | Throwaway migration scripts |
| "Start with k8s & microservices" | Cognitive load check; outcome > overhead | Multi-team, strict SLO scale-ups | 3-dev startup pre-PMF |
| "Heavyweight process for all" | Time-to-value; process ∝ team size | Regulated enterprises | Seed-stage pivoting weekly |
| "TDD everywhere" | Test value vs. exploration speed | Safety-critical long-lived code | Green-field spikes, notebooks |
| "Serverless cures ops" | Cold-start vs. observability trade-off | Spiky traffic, budget tied to usage | Steady low-latency workloads |
| "EDA > request/response" | Traceability vs. loose coupling | Multi-consumer async workflows | Single-team ACID transactions |
| "GraphQL over REST" | Query flexibility vs. caching complexity | Rich client queries, mobile | Simple CRUD, public APIs |
| Acronym | Core Idea | WORTH Check | Worth It | Not Worth It |
|---|---|---|---|---|
| STUPID | Six code smells to avoid | Outcome > overhead | Daily-touched modules | One-off scripts |
| CUPID | Joyful code properties | Right-sized team | Green-field micro-features | Legacy blob refactor |
| FIRST | Fast, independent tests | Time-to-value | High-change domains | Yearly ETL jobs |
| INVEST | Story slicing criteria | Weigh the problem | Parallel distributed teams | Solo dev hobby projects |
| SMART | Goal-setting framework | Horizon flexibility | Compliance-driven roadmaps | R&D exploration spikes |
| ACID/BASE | DB guarantee trade-offs | Outcome > overhead | Banking (ACID), Social feeds (BASE) | - |
| CAP | Distributed systems trade | Risk lens | Trading (CP), SaaS cart (AP) | - |
| Pattern | When It Wins | WORTH Red Flags |
|---|---|---|
| Monolith | Single team, emerging domain, rapid iteration | Team > 15, deployment conflicts, different scaling needs |
| Modular Monolith | 10-40 devs, clear modules, shared DB acceptable | Need polyglot, PCI isolation, independent scaling |
| Microservices | 40+ devs, multiple teams, clear bounded contexts | < 10 devs, single team, shared transactions common |
| Serverless | Spiky traffic, async jobs, cost ∝ usage | Real-time requirements, deep debugging needs |
| Option | Setup Time | Monthly Cost | Ops Burden | Flexibility | Best For |
|---|---|---|---|---|---|
| Bare VM | 1 day | $20-100 | Medium | Low | Solo dev, learning, static sites |
| PaaS | 2 hours | $20-500 | Minimal | Medium | Startups, small teams, rapid iteration |
| Kubernetes | 2-4 weeks | $200+ | High | Very High | 50+ services, multi-cloud, large teams |
Dogma: "Everything must be IaC (Terraform, Pulumi) or you're unprofessional"
Reality Check: IaC shines in multi-environment estates (dev, staging, prod × 3 regions = 9 configs). For a single VM, setting up and maintaining Terraform can be heavier than the manual work it replaces.
WORTH Application:
| Infrastructure Scale | IaC Worth It? | Why |
|---|---|---|
| 1-2 VMs | ❌ No | Manual setup faster; fewer changes |
| 3-10 servers, multiple envs | ✅ Yes | Reproducibility saves time; disaster recovery |
| 50+ resources, multi-region | ✅ Definitely | Manual impossible; drift prevention critical |
Dogma: "Feature flags everywhere for continuous deployment"
Reality Check: Flags multiply configuration drift and kill readability when the product has < 100 users and a tight feedback loop.
WORTH Application:
| Stage | Flag Complexity | Worth It? | Rationale |
|---|---|---|---|
| MVP (< 100 users) | High (10+ flags) | ❌ No | Just deploy and monitor |
| Growth (1k-10k users) | Medium (targeted rollouts) | ✅ Yes | Gradual rollout reduces risk |
| Scale (100k+ users) | Managed (kill-switch only) | ✅ Definitely | Instant rollback critical |
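At the "kill-switch only" end of that spectrum, the mechanism can stay trivial. The sketch below guards one risky code path with a single environment variable so it can be switched off without shipping new code; the variable name and pricing function are hypothetical, not part of the framework.

```python
# Minimal kill-switch sketch (illustrative). One environment variable guards
# one risky code path; flipping it turns the feature off without a code change.
import os

def new_pricing_enabled() -> bool:
    # Default on; set NEW_PRICING_ENABLED=false (and reload config or restart)
    # to fall back to the known-good path.
    return os.getenv("NEW_PRICING_ENABLED", "true").lower() != "false"

def price(order_total: float) -> float:
    if new_pricing_enabled():
        return round(order_total * 0.95, 2)  # new discount logic behind the switch
    return order_total                       # old, known-good behaviour

if __name__ == "__main__":
    print(price(100.0))
```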
Dogma: "Agile = daily standups + two-week sprints, non-negotiable"
Reality Check: Async tools (Linear, Tuple, Loom) yield the same visibility for remote micro-teams without the synchronous overhead.
WORTH Application:
| Team Situation | Synchronous Standups | Async Updates |
|---|---|---|
| Co-located team, 5-10 devs | ✅ 15-min standup works | - |
| Remote team, < 5 devs, tight communication | ❌ Overhead > value | ✅ Daily Slack updates |
| Remote team, 10+ devs, coordination heavy | ✅ Yes, but 2-3x/week | ✅ + async for off-days |
Dogma: "Only merge via long-running release branches (Git Flow)"
Reality Check: Trunk-based development with small PRs often halves merge conflicts and lead time.
WORTH Application:
| Deployment Frequency | Git Flow | Trunk-Based |
|---|---|---|
| Monthly releases | ✅ Acceptable | - |
| Weekly releases | - | ✅ Better flow |
| Daily+ releases | ❌ Bottleneck | ✅ Essential |
Dogma: "DDD means full bounded-context everything with event storming and aggregates"
Reality Check: Heavy DDD artifacts pay off in large, domain-rich organizations—not in simple CRUD SaaS v1.
WORTH Application:
| Domain Complexity | Full DDD | Lite DDD (just bounded contexts) |
|---|---|---|
| Simple CRUD (admin panel, CMS) | ❌ Overkill | ✅ Basic modules enough |
| Medium (e-commerce, SaaS) | ✅ Yes, skip event sourcing | - |
| Complex (fintech, healthcare) | ✅ Justified | - |
ADR (Architecture Decision Record): Document capturing important architectural decisions with context, options, and rationale.
DORA Metrics: Four key metrics (deployment frequency, lead time, MTTR, change failure rate) predicting software delivery performance.
Cognitive Load: Mental effort required to understand and operate a system; teams have finite capacity.
Cost of Delay: Economic impact of postponing value delivery; calculated as (weekly value) × (weeks delayed).
PaaS (Platform as a Service): Cloud hosting that abstracts infrastructure (Heroku, Railway, Render).
Real Options: Financial concept applied to software: valuing the right to defer decisions until more information is available.
TCO (Total Cost of Ownership): Full lifecycle cost including development, operations, maintenance, and opportunity costs.
Wardley Mapping: Strategic planning tool showing component evolution from genesis to commodity.
- *Accelerate* by Nicole Forsgren, Jez Humble, and Gene Kim: the science behind DORA metrics and high-performing teams
- *Team Topologies* by Matthew Skelton and Manuel Pais: organizing teams for fast flow and cognitive load management
- *The Principles of Product Development Flow* by Donald Reinertsen: Cost of Delay, queuing theory, and economic decision-making
- *Wardley Maps* by Simon Wardley: strategic context and component evolution
- *Working Effectively with Legacy Code* by Michael Feathers: pragmatic refactoring without dogma
- *A Philosophy of Software Design* by John Ousterhout: complexity management and design trade-offs
- DORA Research: dora.dev
- Team Topologies: teamtopologies.com
- Wardley Mapping: learnwardleymapping.com
- ThoughtWorks Tech Radar: thoughtworks.com/radar
- Martin Fowler's Blog: martinfowler.com
This framework improves through real-world application. Share your WORTH decisions:
Template:
## Decision: [Title]
**Context**: [Team size, domain, constraints]
**WORTH Score**: __/25
**Outcome**: [What happened]
**Lesson**: [What you'd do differently]
Submit case studies and feedback to: github.com/jeansossmeier/worth-framework
Great engineering isn't about accumulating principles. It's about consistently choosing the simplest solution that delivers outsized value in your context.
WORTH turns that judgment into a repeatable, team-wide habit by asking five questions:
- Is the pain real and business-critical? (Weigh)
- Does benefit exceed cost? (Outcome)
- Can the team operate it? (Right-sized)
- Does it deliver value soon? (Time)
- Does it keep options open? (Horizon)
If you can't answer "yes" to at least three with clear evidence, don't adopt it yet.
Not another best practice: WORTH sits above principles, helping you decide which to apply when.
Context-aware: What works for Netflix doesn't work for your three-person startup. WORTH forces explicit context.
Measurable: Scorecards, DORA metrics, and TCO calculations replace gut feelings with evidence.
Iterative: Quarterly re-scoring catches when "right" becomes "wrong" as context shifts.
Actionable: Templates, checklists, and automation make it operational, not theoretical.
- Pick one pending decision (technology, architecture, process)
- Fill out the WORTH scorecard (5 minutes)
- Share with your team for bias check
- Make the call with explicit kill criteria
- Set 90-day reminder to re-score with real data
That's it. One decision. Five questions. Repeat.
Over time, asking "Is it worth it?" becomes muscle memory. Your team stops chasing hype and starts compounding value.
The best engineers don't memorize frameworks. They ask better questions.
Is it worth it?
Everything else follows from that.
Go build something worth building.