A pragmatic approach to cutting through dogma and making engineering choices that deliver real value
Version: 1.0a
Author: Jean Luck Sossmeier
Date: October 2025
Repository: github.com/jeansossmeier/worth-framework
- Introduction: The Problem with Technology Dogmas
- The WORTH Principle: Core Framework
- Decision Scorecard & Methodology
- Integration Points
- Anti-Patterns & Failure Modes
- Measurement & Validation
- Operationalizing the Framework
Software engineering is drowning in principles. KISS, DRY, YAGNI, SOLID, STUPID, CUPID—the list grows monthly. Each promises clarity, yet teams still waste months on Kubernetes setups they don't need, microservices that slow them down, and clean code rules that add more ceremony than value.
The problem isn't the principles themselves. The problem is treating them as universal laws rather than context-dependent tools.
Behind every architectural debate, every technology choice, every process adoption, one question gets skipped:
Is it worth it—here, now, for us?
Not "is it best practice?" Not "what does Big Tech do?" Not "what's trending on Hacker News?"
Worth it. In your context. With your team. On your budget. For your timeline.
Consider these common scenarios:
Language Wars: Developers argue whether Python, Go, or Rust is "best"—ignoring that the answer depends entirely on your domain, team skills, hiring market, and performance requirements.
Clean Code Extremism: Teams wrap every primitive in a Value Object because a book said so—adding hundreds of lines of boilerplate to a throwaway migration script.
"You Are Not Netflix": A three-person startup burns four weeks setting up Kubernetes because "it's how you scale"—never shipping the product that needed to prove market fit first.
Process Bloat: A seed-stage company adopts SAFe with two-week sprints and full ceremonies—while the market shifts weekly and competitors move daily.
Each of these decisions might be right somewhere. They're wrong when blindly applied without asking whether they're worth it in your specific situation.
WORTH is a meta-principle that sits above all other engineering guidelines. It doesn't replace KISS, DRY, or YAGNI—it helps you decide when to apply them.
Think of it as a five-question litmus test you run before adopting any technology, pattern, or practice:
- Weigh the problem
- Outcome > overhead
- Right-sized for the team
- Time-to-value
- Horizon flexibility
If you can't answer "yes" to at least three of these five questions with clear evidence, don't adopt it yet. Revisit when your context changes.
Tech Leads & Staff Engineers: Battling over-engineering and decision fatigue while trying to ship reliably.
CTOs & Founders: Controlling cloud spend, reducing talent churn, and ensuring technology choices support business goals.
Senior Developers: Needing guard-rails to push back on dogma without seeming obstinate.
Teams of Any Size: From solo developers to hundred-person engineering orgs—context-aware decision-making scales.
This isn't a book you read cover-to-cover and shelve. It's a working framework you return to every time you face a technology decision.
- Part I gives you the core principle and scoring methodology
- Part II expands the dimensions you need to consider
- Part III applies WORTH to specific domains (code, architecture, product)
- Part IV shows you how to operationalize it in pull requests, ADRs, and team rituals
- Part V provides extended case studies and guidance for specific contexts
If you're a solo developer or indie hacker, skip directly to the Solo Developer / Indie Hacker Mode section (embedded in Section 16) for simplified guidance tailored to one-person projects.
Start with the core framework in Section 2. Use the scorecard in Section 3 on your next decision. Refer back to domain-specific chapters as needed.
WORTH is a pragmatic litmus test for every architectural, process, or tooling choice.
| Letter | Question | Signals It's NOT Worth It |
|---|---|---|
| W – Weigh the problem | Is the pain concrete, frequent, and business-critical? | Buzz-driven adoption, vague "might need it later" |
| O – Outcome > overhead | Will this clearly improve customer or developer outcomes more than it costs in time, money, or complexity? | Long payback period, marginal uplift, hidden ops costs |
| R – Right-sized for the team | Can the current team build and run it confidently within working hours? | Reliance on "future hires", hero culture, steep learning curves |
| T – Time-to-value | Does it accelerate delivery now or within the quarter you're planning? | Multi-month migrations delaying revenue or feedback |
| H – Horizon flexibility | Does it keep sensible options open without locking you into tech or vendors? | Irreversible bets, vendor lock-in, explosive surface area |
You need BOTH of these to proceed:
- Qualitative check: Crisp, evidence-based "yes" on at least three of the five letters
- Quantitative check: Total score ≥ 15/25 (see detailed scoring in Section 3)
Why both checks matter:
- The "3 of 5" check prevents lukewarm "everything is a 3" scorecards
- The total threshold ensures sufficient overall value
- Strong yes on 3 dimensions might score [5,5,5,1,1] = 17 ✅ (passes both)
- Neutral on all five might score [3,3,3,3,3] = 15 ⚠️ (passes the total but lacks clear wins)
If either check fails, don't adopt it yet. Re-evaluate when context changes.
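As a minimal sketch, the dual gate can be expressed in a few lines of Python. The ≥ 4 cutoff for a "crisp yes" is an assumption for illustration; use whatever evidence bar your team agrees on.

```python
def passes_worth_gate(scores: dict[str, int], crisp_cutoff: int = 4) -> bool:
    """Dual gate: at least three crisp 'yes' dimensions AND a total of 15+/25."""
    assert set(scores) == {"W", "O", "R", "T", "H"}, "score all five dimensions"
    crisp_yes = sum(1 for s in scores.values() if s >= crisp_cutoff)
    return crisp_yes >= 3 and sum(scores.values()) >= 15

# [5,5,5,1,1] = 17 -> three clear wins, passes both checks
print(passes_worth_gate({"W": 5, "O": 5, "R": 5, "T": 1, "H": 1}))  # True
# [3,3,3,3,3] = 15 -> passes the total but has no clear wins
print(passes_worth_gate({"W": 3, "O": 3, "R": 3, "T": 3, "H": 3}))  # False
```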
KISS tells you to keep solutions simple once you've decided to build them.
DRY avoids duplication inside the chosen solution.
YAGNI warns against speculative features.
WORTH precedes all of them: it ensures you're even solving the right problem at the right scale before optimizing its internal design.
Start by quantifying the pain you're trying to solve.
Key Questions:
- How many people or systems are affected?
- How frequently does this pain occur?
- What's the business impact? (Lost revenue, customer churn, developer hours wasted)
- Is this pain growing or stable?
Red Flags:
- "We might need this later" (speculative)
- "Everyone's talking about it" (hype-driven)
- "Best practice" without evidence
- Problem affects one person once
Good Signals:
- Daily friction costing measurable hours
- Customer-facing issue impacting retention
- Growing technical debt with clear cost trajectory
- Regulatory or security requirement
Example: A team debates adopting GraphQL. The current REST API requires frontend developers to make 5-7 calls to render a dashboard. This happens for every page load across 10,000 daily users. That's real, frequent, measurable pain. ✅
Counter-example: A team considers GraphQL because "it's flexible and modern." No evidence of current API causing problems. ❌
Calculate whether the benefit outweighs the total cost.
Key Questions:
- What metric improves? (Speed, quality, cost, developer experience)
- By how much?
- What's the implementation cost? (Calendar time, dollars, cognitive load)
- What's the ongoing cost? (Maintenance, operations, training)
Red Flags:
- Payback period > 1 year in a fast-changing environment
- Marginal improvements (5% faster build when builds take 30 seconds)
- Hidden operational costs not factored in
- "Hard to quantify" benefits without any proxy metric
Good Signals:
- Clear ROI calculation
- Benefit appears within current planning horizon
- Total cost of ownership (TCO) fully mapped
- Net improvement in DORA metrics (deployment frequency, lead time, MTTR, change fail rate)
Example: Migrating to a PaaS costs $400/month but saves 20 developer hours/month on infrastructure management. At $100/hour blended rate, that's $2,000 in value for $400 cost. Clear win. ✅
Regional Cost Note: This framework uses US market rates circa 2024-2025. Adjust for your region:
- US/Western Europe: 1.0x (baseline: $100-150/hour blended rate)
- SF Bay Area/NYC: 1.5x ($150-225/hour)
- Eastern Europe: 0.5x ($50-75/hour)
- India/South Asia: 0.3x ($30-45/hour)
- Latin America: 0.4x ($40-60/hour)
Recalculate ROI using your actual rates to ensure accurate WORTH scoring.
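A back-of-envelope sketch of that recalculation, reusing the PaaS example above. The function name is made up, and the 0.3x multiplier is simply the India/South Asia factor from this note, not a pricing model.

```python
def net_monthly_value(hours_saved: float, tool_cost: float,
                      base_rate: float = 100.0, regional_multiplier: float = 1.0) -> float:
    """Net monthly value of a tool: developer hours saved, priced at a local rate, minus its cost."""
    return hours_saved * (base_rate * regional_multiplier) - tool_cost

# PaaS example: 20 hours/month saved, $400/month cost
print(net_monthly_value(20, 400))                           # 1600.0 at US baseline rates
print(net_monthly_value(20, 400, regional_multiplier=0.3))  # 200.0 at India/South Asia rates
```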
Counter-example: Adding Kubernetes to reduce deployment time from 5 minutes to 3 minutes, costing 2 weeks of setup and ongoing complexity. Marginal gain, huge overhead. ❌
Assess whether your team can realistically build and maintain this.
Key Questions:
- Do at least two people on the team already understand this technology?
- Can the team operate it without heroic effort or on-call burnout?
- Does it fit within the team's cognitive load budget?
- Can you hire or train for gaps within your timeline?
Red Flags:
- "We'll hire someone who knows this"
- Single expert (bus factor of 1)
- Requires learning 3+ new technologies simultaneously
- Team already maxed out on complexity
Good Signals:
- Multiple team members have experience
- Technology fits existing mental models
- Clear documentation and community support
- Graceful learning curve
Example: A team of experienced Java developers considers Spring Boot vs. Micronaut. Both fit their skill set and mental models. ✅
Counter-example: Same Java team considers rewriting everything in Rust to "learn something new" despite zero Rust experience and aggressive ship dates. ❌
Determine if the investment delivers value within your planning horizon.
Key Questions:
- When do we see the first benefit?
- Does this accelerate or delay current commitments?
- What's the opportunity cost of not working on other features?
- Can we stage the rollout to get partial value sooner?
Red Flags:
- Multi-month migration blocking all feature work
- Benefit only appears after full completion (no incremental wins)
- Faster alternatives exist but are dismissed as "not best practice"
- Timeline keeps extending due to unforeseen complexity
Good Signals:
- Value delivered within current quarter
- Can be piloted with subset of users/services
- Improves team velocity starting now
- Clear milestones with measurable progress
Example: Adopting feature flags delivers value immediately—first feature uses flags for safe rollout within one sprint. ✅
Counter-example: Rewriting the entire app to microservices with no customer value until completion 8 months from now. ❌
Evaluate whether this choice keeps options open or creates lock-in.
Key Questions:
- How easy is it to reverse this decision?
- Does this vendor or technology have a healthy ecosystem?
- Can we migrate away without rewriting everything?
- Does this create an explosion of surface area (new services, configs, dependencies)?
Red Flags:
- Proprietary technology with no export path
- Architecture that requires rewrite to change
- Vendor with declining market share or community
- Creates dependencies across entire codebase
Good Signals:
- Standards-based approach (SQL, HTTP, JSON)
- Clean interfaces that allow swapping implementations
- Strong community, multiple vendors
- Reversible via feature flags or strangler pattern
Example: Using PostgreSQL (open standard, multiple hosts, dump/restore works everywhere) vs. proprietary cloud database with custom query language. PostgreSQL wins on flexibility. ✅
Counter-example: Building entire architecture around a startup vendor's beta SDK with no export functionality. ❌
Use this lightweight process for any technology, pattern, or practice decision.
Step 1: State the Issue (One Sentence)
Force clarity by writing a single sentence problem statement.
Example: "Release lead-time increased from 3 days to 9 days after adding the reporting service."
Step 2: Score Each Dimension (1-5)
| Dimension | Score | Evidence |
|---|---|---|
| W – Weigh the problem | 1-5 | How big, frequent, and critical is the pain? |
| O – Outcome > overhead | 1-5 | Does benefit clearly exceed cost? |
| R – Right-sized team | 1-5 | Can we build and run this confidently? |
| T – Time-to-value | 1-5 | Do we see value within current quarter? |
| H – Horizon flexibility | 1-5 | Can we reverse or adapt this choice? |
Scoring Guide:
- 5: Strong yes, clear evidence
- 4: Probably yes, good indicators
- 3: Neutral or mixed
- 2: Probably no, concerns outweigh benefits
- 1: Clear no, red flags present
Step 3: Calculate Total
Add the five scores:
- Total < 15: Postpone or kill
- 15-20: Prototype or pilot first
- 21-25: Proceed with confidence
Calibration Note: These thresholds are starting points based on typical team contexts. Adjust for your organization:
- Small teams (< 15 people): Consider lowering to 12+/17+/20+ as fewer concrete problems exist at small scale
- Large teams (50+ people): Consider raising to 17+/21+/24+ as more problems emerge at scale
- Risk-averse domains (fintech, healthcare): Raise thresholds; require higher confidence
- Fast-moving startups: Lower thresholds slightly; bias toward action
Re-calibrate quarterly based on decision outcomes. If >30% of "proceed" decisions fail, raise your thresholds.
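A small helper that encodes the bands and keeps the thresholds calibratable, as suggested above. The defaults are the 15/21 cut points; the stricter values in the last line are illustrative only.

```python
def worth_verdict(total: int, pilot_at: int = 15, proceed_at: int = 21) -> str:
    """Map a WORTH total (out of 25) onto the three verdict bands."""
    if total >= proceed_at:
        return "Proceed with confidence"
    if total >= pilot_at:
        return "Prototype or pilot first"
    return "Postpone or kill"

print(worth_verdict(8))    # Postpone or kill
print(worth_verdict(16))   # Prototype or pilot first
print(worth_verdict(22, pilot_at=17, proceed_at=24))  # stricter calibration -> pilot first
```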
Step 4: Document in ADR
Record your scoring in an Architecture Decision Record so the rationale is searchable and auditable when context changes.
Problem Statement: "We need to deploy our app somewhere and want it to scale."
| Dimension | Score | Evidence |
|---|---|---|
| W – Weigh | 2 | Problem is tiny—current single-server traffic is under 100 users/day |
| O – Outcome | 1 | Setup takes 2-4 weeks; benefit is handling 10x traffic we don't have |
| R – Right-sized | 1 | No Kubernetes experience on team; steep learning curve |
| T – Time-to-value | 1 | Delays MVP by 3+ weeks; could use PaaS and ship this week |
| H – Horizon | 3 | k8s is flexible but overkill creates lock-in to complexity |
Total: 8/25
Verdict: ❌ Skip. Use a PaaS (Heroku, Render, Railway) or simple VM with CI/CD. Re-evaluate at 10,000+ daily active users or when hiring experienced SRE.
Problem Statement: "Release collisions between teams slow deployment frequency."
| Dimension | Score | Evidence |
|---|---|---|
| W – Weigh | 4 | Three teams frequently blocked waiting for others' deploys |
| O – Outcome | 3 | Faster independent deploys, but ops overhead and latency increase |
| R – Right-sized | 3 | Team has two SREs with microservices experience |
| T – Time-to-value | 2 | Migration takes 4-6 months; could start with modules first |
| H – Horizon | 4 | Can phase in gradually; doesn't require full rewrite |
Total: 16/25
Verdict: ⚠️ Pilot first. Extract one well-bounded service, measure deployment frequency and lead time for a quarter, then decide whether to continue.
Problem Statement: "We duplicate the same query patterns in five different files."
| Dimension | Score | Evidence |
|---|---|---|
| W – Weigh | 2 | Annoyance but only affects 2-3 developers occasionally |
| O – Outcome | 3 | Reduces duplication, increases consistency; minimal maintenance |
| R – Right-sized | 5 | Team is SQL-savvy; this is a simple abstraction |
| T – Time-to-value | 4 | Can implement in one day; immediate benefit |
| H – Horizon | 4 | Easy to remove or replace if we change databases |
Total: 18/25
Verdict: ✅ Probably worth it. Low cost, raises consistency, reversible. Proceed if it doesn't block higher-priority work.
Add a "WORTH Check" section:
## WORTH Assessment (for non-trivial changes)
- [ ] **W** – What problem does this solve? Is it concrete and frequent?
- [ ] **O** – Does the benefit exceed the complexity added?
- [ ] **R** – Can the team maintain this without heroic effort?
- [ ] **T** – Does this accelerate or delay current goals?
- [ ] **H** – Is this decision reversible or does it create lock-in?
**Total Score**: __/25
Include WORTH scoring in every ADR:
## Context
[Describe the problem]
## WORTH Analysis
| Dimension | Score | Rationale |
|-----------|-------|-----------|
| Weigh | 4 | Daily friction costing 5 dev hours/week |
| Outcome | 4 | Saves `$10k/mo`; costs `$2k` setup + `$500/mo` |
| Right-sized | 3 | Two engineers have experience; third will train |
| Time-to-value | 4 | First value in 2 weeks; full ROI in 3 months |
| Horizon | 5 | Standards-based; multiple vendors available |
**Total**: 20/25 → **Proceed**
## Decision
[Record the choice]
## Consequences
[Expected outcomes]
## Revisit Trigger
Re-score if: team shrinks below 2 experienced operators, cost increases 2x, or migration takes >4 weeks
Rotate a "devil's advocate" who:
- Picks one live component
- Re-scores it with current context
- Proposes simplification if score drops below 15
This catches accumulating complexity and prevents zombie technologies.
The five core WORTH questions capture 80% of decisions. For high-stakes choices, expand your evaluation with these additional lenses.
A choice can be "cheap" but still catastrophic if it postpones revenue or feedback.
Why It Matters: Every week you delay shipping is lost learning, lost revenue, and lost market position. Quantifying this forces trade-offs into the open.
How to Measure:
- Express benefit in dollars per month shipped earlier
- Use Don Reinertsen's formula: Cost of Delay = (Weekly Value) × (Weeks Delayed)
- Plot cash flow over 12-24 months to visualize payback
Example: A feature migration to microservices delays shipping by 3 months. If each month of delay costs $50k in lost revenue, the migration carries a hidden $150k cost even if engineering time is "free."
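The formula in code, using the migration example above (the weeks-per-month conversion is approximate):

```python
def cost_of_delay(weekly_value: float, weeks_delayed: float) -> float:
    """Reinertsen: Cost of Delay = (Weekly Value) x (Weeks Delayed)."""
    return weekly_value * weeks_delayed

weekly_value = 50_000 / 4.33   # ~$50k/month of delayed revenue
weeks_delayed = 3 * 4.33       # 3-month migration
print(round(cost_of_delay(weekly_value, weeks_delayed)))  # ~150000 hidden cost
```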
Diagnostic Questions:
- Can we express the benefit in $/month?
- What's the opportunity cost of delaying other work?
- Is the ROI horizon longer than our funding runway?
High-skill tools are fine until cognitive load crosses what a single team can safely juggle.
Why It Matters: Teams have limited mental bandwidth. Each new technology, pattern, or service consumes attention that could go toward delivering features.
How to Measure:
- Team Topologies suggests teams handle one main responsibility + one interaction mode
- Count technologies in the stack (languages, frameworks, databases, queues, deployment platforms)
- Survey developers: "Do you feel overloaded?" (1-5 scale)
Example: A team already managing React, Node.js, PostgreSQL, Redis, and AWS considers adding Kafka. If they're already near cognitive limits, adding Kafka might tip them into constant firefighting.
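One crude way to make that budget explicit before the debate starts. The ceiling of six technologies is an invented placeholder, not a Team Topologies number.

```python
TEAM_STACK = ["React", "Node.js", "PostgreSQL", "Redis", "AWS"]
COGNITIVE_BUDGET = 6  # assumed ceiling on technologies one team can own confidently

def within_budget(stack: list[str], addition: str, budget: int = COGNITIVE_BUDGET) -> bool:
    """Does adding one more technology keep the team at or under its assumed budget?"""
    return len(stack) + 1 <= budget

print(within_budget(TEAM_STACK, "Kafka"))  # True, but it uses the last slot -- firefighting risk
```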
Diagnostic Questions:
- Would this push the team above its cognitive budget?
- Can we remove something else to make room?
- Is the team already in "survival mode" on current stack?
Further Reading:
- Team Topologies by Matthew Skelton and Manuel Pais
- Martin Fowler's cognitive load article
If a decision lowers deployment frequency or raises change-fail percentage, beware: you've traded adaptability for "best practice" theater.
Why It Matters: The four DORA metrics (deployment frequency, lead time for changes, time to restore service, change failure rate) predict organizational performance.
How to Measure:
- Run an A/B comparison: pilot branch vs. baseline
- Track metrics for 2-4 sprints before and after adoption
- Watch for regression in any of the four keys
DORA Four Key Metrics:
| Metric | What It Measures | Elite Performance |
|---|---|---|
| Deployment Frequency | How often you deploy to production | Multiple times per day |
| Lead Time for Changes | Time from commit to production | Less than one hour |
| Time to Restore Service | Time to recover from incidents | Less than one hour |
| Change Failure Rate | % of deployments causing incidents | 0-15% |
Note for Non-Continuous Teams: DORA metrics reflect continuous delivery practices. If your team operates on planned release cycles (quarterly enterprise software, regulated deployments), substitute relevant metrics:
- Release Quality: Defects per release, customer-reported issues
- Feature Velocity: Story points or features delivered per cycle
- Customer Satisfaction: NPS, support ticket volume
- Incident Rate: Production issues per release
The principle remains: measure delivery effectiveness in your context.
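A sketch of computing the four keys from a simple deployment log. The record fields are assumed, and most teams would pull these numbers from a tool such as Sleuth or LinearB rather than hand-rolled data.

```python
from datetime import datetime, timedelta
from statistics import median

# Assumed record shape: commit time, deploy time, whether the deploy caused an incident, restore time
deploys = [
    {"commit": datetime(2024, 6, 3, 9), "deployed": datetime(2024, 6, 3, 15),
     "failed": False, "restore": timedelta(0)},
    {"commit": datetime(2024, 6, 4, 10), "deployed": datetime(2024, 6, 5, 11),
     "failed": True, "restore": timedelta(hours=2)},
]

window_days = 7
deployment_frequency = len(deploys) / window_days                       # deploys per day
lead_time = median(d["deployed"] - d["commit"] for d in deploys)        # commit -> production
change_failure_rate = sum(d["failed"] for d in deploys) / len(deploys)  # share of bad deploys
restores = [d["restore"] for d in deploys if d["failed"]]
time_to_restore = median(restores) if restores else timedelta(0)

print(deployment_frequency, lead_time, change_failure_rate, time_to_restore)
```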
Example: A team adopting microservices sees deployment frequency drop from 5/week to 1/week because coordinated releases require more planning. The architecture might be "better" in theory but worse in practice.
Diagnostic Questions:
- How will this affect our DORA metrics?
- Can we pilot on one service and measure before scaling?
- Are we optimizing for scalability we don't need while sacrificing speed?
Further Reading:
- Accelerate by Nicole Forsgren, Jez Humble, and Gene Kim
- DORA State of DevOps Reports
Model upside as options, not certainties. Real Options thinking values flexibility over dogma and lets you stage investments.
Why It Matters: Technology decisions under uncertainty are options, not obligations. You can defer commitment, pilot, and learn before full investment.
How to Apply:
- Treat decisions as call options with an "exercise price" (cost to fully adopt)
- Value the right to defer until you have more information
- Stage investments: small pilot, partial rollout, full adoption
Example: Instead of rewriting the entire app in a new framework, extract one module, ship it, measure results, then decide whether to continue.
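A toy expected-value comparison showing why the option to pilot first can beat committing up front. Every number here is invented for illustration.

```python
full_cost  = 120_000  # assumed cost of adopting the new framework outright
pilot_cost = 15_000   # assumed cost of extracting and shipping one module first
p_success  = 0.5      # estimated probability the adoption pays off
payoff     = 200_000  # assumed value if it works out

commit_now  = p_success * payoff - full_cost                 # pay the full cost no matter what
pilot_first = p_success * (payoff - full_cost) - pilot_cost  # pay the full cost only if the pilot succeeds

print(commit_now, pilot_first)  # -20000.0 vs 25000.0 -- deferral has real option value
```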
Diagnostic Questions:
- Can we defer this decision via feature flags or pilot lanes?
- What would we learn from a small experiment first?
- What's the cost of waiting 3 months for more data?
Further Reading:
- "Real Options" by Tom Copeland and Vladimir Antikarov
- Chris Matts' work on Real Options in software
Wardley Mapping keeps you from reinventing commodities (e.g., bespoke logging) while spotting areas worth in-house mastery.
Why It Matters: Not all components deserve equal investment. Commodities should be bought; differentiators should be built and mastered.
How to Use:
- Map your value chain from user needs to underlying components
- Place each component on the evolution axis: Genesis → Custom → Product → Commodity
- Invest in differentiators; buy or use open-source commodities
Example: Custom logging is a commodity—use Datadog or ELK. Your unique pricing algorithm is a differentiator—build and optimize it.
Diagnostic Questions:
- Is this component a differentiator or a utility?
- Are we building what we should buy?
- Are we outsourcing what gives us competitive advantage?
Further Reading:
- Wardley Maps by Simon Wardley
- learnwardleymapping.com
Sunk-cost fallacy, resume-driven development, or "that's how FAANG does it" can skew scoring. Force a devil's-advocate review to neutralize bias.
Why It Matters: Cognitive biases make us defend past investments, chase novelty, or copy others without understanding context.
Common Biases:
- Sunk-cost fallacy: "We've already spent 2 weeks on this, can't stop now"
- Resume-driven development: "I want to learn Rust, so let's use Rust"
- Conformity bias: "Google uses it, so we should too"
- Availability bias: "I just read about this, so it must be important"
How to Counter:
- Ask: "If another team proposed this, would we still agree?"
- Rotate the "no" voice—assign someone to argue against the decision
- Separate "what I want to learn" from "what the business needs"
Diagnostic Questions:
- What bias might be influencing this decision?
- Would we make this choice if starting from scratch today?
- Are we justifying a conclusion or genuinely evaluating?
Regulated domains (health, fintech) may hard-block certain stacks. Map legal constraints early or your WORTH score is fiction.
Why It Matters: Legal requirements trump architecture preferences. Compliance failures carry fines, shutdowns, and reputation damage.
Key Regulations:
- HIPAA (health): Encryption, audit logs, breach notification
- PCI DSS (payments): Tokenization, network segmentation, logging
- GDPR/LGPD (privacy): Data minimization, consent, right to deletion
- SOC 2 (enterprise): Security controls, availability, confidentiality
Diagnostic Questions:
- Which regulations apply to our domain?
- Does this technology meet compliance requirements out-of-box?
- What's the audit burden of this choice?
TCO often dwarfs initial build effort. Factor on-call load, infra cost curves, licensing, and vendor escalations.
Why It Matters: A "free" open-source tool that requires 40 hours/month of maintenance costs $48k/year at a $100/hour blended rate.
Components of TCO:
- Capital: Initial development, training, setup
- Operational: Infrastructure, licenses, support contracts
- Maintenance: Patches, upgrades, bug fixes
- Opportunity Cost: Developer time not spent on features
How to Calculate:
- Project 24-month cash flow for "run" costs
- Include hidden costs: on-call, debugging, tribal knowledge loss
- Compare against alternatives (build vs. buy)
Example: Kubernetes costs zero to download but requires dedicated SRE time (estimate $150k/year salary + overhead). A PaaS costs $10k/year but needs minimal ops attention. True cost comparison: k8s = $150k+, PaaS = $10k.
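A 24-month TCO sketch of that comparison. The 30% salary overhead and the 5 hours/month of PaaS attention are assumptions layered on the example's figures.

```python
def tco_24_months(setup: float, monthly_run: float, monthly_people: float) -> float:
    """Total cost of ownership over 24 months: setup plus recurring run and people costs."""
    return setup + 24 * (monthly_run + monthly_people)

# Kubernetes: free to download, but roughly one dedicated SRE ($150k/yr salary + ~30% overhead)
k8s  = tco_24_months(setup=0, monthly_run=0, monthly_people=150_000 * 1.3 / 12)
# PaaS: ~$10k/yr in fees, minimal ops attention (assume ~5 hours/month at $100/hour)
paas = tco_24_months(setup=0, monthly_run=10_000 / 12, monthly_people=5 * 100)

print(round(k8s), round(paas))  # ~390000 vs ~32000 over two years
```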
Energy footprint or community support can be differentiators in hiring and PR.
Why It Matters: Green tech attracts talent and customers. Open-source contributions build community goodwill and reduce vendor lock-in.
Considerations:
- Data center energy consumption (cloud regions vary 10x in carbon intensity)
- Open-source sustainability (are we giving back or just taking?)
- Worker conditions (are gig workers treated fairly?)
Diagnostic Questions:
- What's the estimated kWh/month?
- Is the open-source community healthy?
- Does this align with company values?
Early adoption can yield moat-building knowledge or saddle you with a dying ecosystem.
Why It Matters: Bleeding-edge technology can become an asset (early expertise) or a liability (abandoned by vendors).
How to Assess:
- Check Gartner Hype Cycle or ThoughtWorks Tech Radar
- Monitor job posting volume (proxy for ecosystem health)
- Evaluate community momentum (GitHub stars, commits, issues closed)
Diagnostic Questions:
- Is this technology in the "trough of disillusionment" or climbing toward mainstream?
- Will we build competitive advantage from early mastery?
- Is the ecosystem growing or dying?
For high-stakes decisions, use this expanded canvas:
## Problem Statement
[One sentence + cost of delay]
## Options Considered
1. Baseline (do nothing or minimal change)
2. Incremental tweak
3. Ambitious leap
## Scoring Table
| Dimension | Option 1 | Option 2 | Option 3 | Weight |
|-----------|----------|----------|----------|--------|
| W – Weigh | 3 | 4 | 5 | High |
| O – Outcome | 2 | 4 | 3 | High |
| R – Right-sized | 5 | 4 | 2 | High |
| T – Time-to-value | 5 | 3 | 1 | High |
| H – Horizon | 4 | 4 | 2 | Medium |
| **Core WORTH** | **19** | **19** | **13** | - |
| Cost of Delay | 4 | 3 | 1 | High |
| Cognitive Load | 5 | 3 | 1 | High |
| DORA Impact | 3 | 4 | 2 | High |
| Risk | 5 | 3 | 2 | Medium |
| Strategic Fit | 3 | 4 | 5 | Medium |
| **Extended Total** | **39** | **36** | **24** | - |
## Bias Check
- Option 1: Low risk might mean stagnation
- Option 2: Balanced but requires discipline
- Option 3: Exciting but high failure risk
## Decision
[Chosen option + rationale]
## Kill Criteria
Revert if: [Specific measurable conditions]
## Review Date
[Auto-remind 90 days later]
This process works for a one-page ADR or a 50-team program.
Write one sentence. Make it numeric if possible.
Example: "Release lead-time ballooned from 3 days to 9 days after adding the reporting service."
Always consider baseline, incremental, and ambitious approaches.
| Label | Description | Estimated Effort |
|---|---|---|
| A – Baseline | Keep monolith, add read-replica database | 2 weeks |
| B – Incremental | Extract reports into separate process + queue | 6 weeks |
| C – Ambitious | Full microservice split + Kubernetes autoscaling | 12 weeks |
Use a 1-5 scale; mark unknowns as -1 to flag missing evidence.
| Dimension | Option A | Option B | Option C |
|---|---|---|---|
| W – Weigh | 4 | 4 | 4 |
| O – Outcome | 3 | 4 | 3 |
| R – Right-sized | 5 | 4 | 2 |
| T – Time-to-value | 5 | 3 | 1 |
| H – Horizon | 4 | 4 | 2 |
| Core Total | 21 | 19 | 12 |
Flag critical assumptions.
Example: Option C wins only if we hire two SREs in next 30 days (unlikely). Mark as red flag.
Make the call and define failure conditions.
Decision: Pick Option B (extract reports into separate process).
Kill Switch: If DORA lead time stays above 5 days after 2 sprints, rollback to Option A and revisit.
Set auto-reminder 90 days later in your ADR system to revisit the score with new data.
Context:
- €300/month budget
- No SRE on team
- Goal: Revenue in 90 days
WORTH Outcome: Not worth it
Rationale:
- W = 2: Problem is tiny (< 100 users/day)
- O = 1: Setup takes 4 weeks, blows budget
- R = 1: No k8s experience; steep learning curve
- T = 1: Delays shipping by 1 month
- H = 3: Flexible but creates complexity lock-in
Total: 8/25
Verdict: ❌ Skip. Use a PaaS (Railway, Render, Heroku). k8s adds 4+ weeks with zero customer value. Re-evaluate at 10,000+ DAU or when hiring experienced SRE.
What They Did: Deployed on Railway for $20/month. Shipped MVP in 2 weeks. Gained 500 paying customers before revisiting infrastructure.
Context:
- 40 developers across 4 teams
- PCI compliance requirements
- Release collisions weekly
- Current: Modular monolith
WORTH Outcome: Worth it in stages
Rationale:
- W = 4: Real pain—teams blocking each other
- O = 3: Faster scaling, but ops overhead increases
- R = 3: Two SREs on staff with experience
- T = 2: Full migration takes 6 months
- H = 3: Can phase in gradually
Total: 15/25
Verdict: ⚠️ Worth it in stages. Extract the highest-isolation service first, measure, and continue only if the metrics improve.
What They Did:
- Extracted payments service (high isolation, PCI boundary)
- Ran for 3 months, measured deployment frequency (+40%)
- Extracted auth service
- Kept reporting and admin in monolith (low change rate, high coupling)
Result: Hybrid architecture. Teams ship independently where it matters. Avoided microservices sprawl.
Context: In a well-documented case, Amazon Prime Video cut infrastructure costs by 90% by consolidating their video quality monitoring service from a distributed microservices architecture into a single, more efficient process.
Why It's a Key Example: This was not an abandonment of microservices. It was a targeted application of WORTH thinking to a specific component where the distributed approach had become too costly and complex. It proves that even at massive scale, architectural choices are component-specific, not all-or-nothing dogmas. The rest of Amazon's platform remains one of the largest microservice deployments in the world.
Original Problem: Needed to scale quality monitoring across thousands of streams.
Initial Distributed Approach:
- Split monitoring into 5+ separate services
- Used AWS Step Functions for orchestration
- Hit scaling limits and cost overruns for this specific use case
WORTH Re-evaluation:
- W = 3: Problem still exists but architecture didn't solve it
- O = 1: Infrastructure costs exploded relative to value
- R = 2: Operational complexity became unsustainable
- T = 1: Development velocity tanked
- H = 2: Hard to back out, but they did it anyway
Total: 9/25 (after implementation)
What They Did: Consolidated the monitoring service into a single process. Kept logical module boundaries within the service. Reduced monitoring costs 90%. Increased processing throughput 3x.
Lesson: Revisit WORTH scores yearly. Scale and cost assumptions can flip, making previously "correct" architecture wrong. Also: architectural decisions are component-specific, not all-or-nothing. You can consolidate one service while keeping others distributed.
Further Reading:
- Prime Video Tech Blog post on the migration
- Discussions on Hacker News and r/programming
WORTH is a meta-filter. Before adopting any coding rule—Uncle Bob's Clean Code, Jeff Bay's Object Calisthenics, SOLID principles—run the five questions.
| If the rule says… | Ask with WORTH… | Worth it when… | Not worth it when… |
|---|---|---|---|
| "Wrap every primitive in a Value Object" (Object Calisthenics #3) | O – Outcome > overhead: Does the extra class reveal real behavior or validation? | Domain-heavy codebases (money, dates, IDs) where bugs are expensive | One-off scripts, data processing, throwaway ETL |
| "Functions should do one thing" (Clean Code) | R – Right-sized: Do we have time to refactor, and will it help juniors reason faster? | Core modules touched daily, high change rate | Legacy subsystem slated for decommission in 3 months |
| "No 'else' statements" (Object Calisthenics #2) | H – Horizon flexibility: Does banning else keep options open, or push us into contorted polymorphism? | Code that thrives on polymorphic behavior (strategy engines) | Simple data flows where a guard clause reads clearer |
| "Depend upon abstractions, not concretions" (SOLID-D) | W – Weigh the problem: Are we actually swapping implementations? | Plugin systems, multi-cloud adapters | One cloud, one database, stable for years |
The original rule books give general guidance. WORTH makes sure they're cost-effective in your context, today.
## Which Principle Are We Invoking?
Clean Code: "Functions should do one thing"
## WORTH Score
- [ ] W – Weigh: Functions currently 200+ lines, causing 3 bugs/week
- [ ] O – Outcome: Refactor saves 5 hours/week debugging
- [ ] R – Right-sized: Team comfortable with Extract Method pattern
- [ ] T – Time-to-value: Complete in 3 days
- [ ] H – Horizon: Easier to test and modify going forward
**Total**: 4/5 → Proceed
If WORTH ≤ 3 out of 5, tag #possible-over-engineering for reviewer attention.
Quarterly, pick one principle adopted last quarter. Re-run WORTH. If score dropped (team changed, scope shifted), schedule removal or simplification.
Using WORTH with clean code rules prevents two extremes:
- Dogma-driven engineering: Blindly applying every rule because "the book says so"
- Chaos-driven shortcuts: Rejecting discipline because "we're moving fast"
Instead, every guideline must pay rent in the form of clearer code, faster delivery, or lower defects that outweigh its cognitive and calendar cost.
What It Is: Singleton, Tight-coupling, Untestability, Premature-optimization, Indescriptive naming, Duplication
WORTH Check: Outcome > overhead – Will refactoring pay back quickly in fewer bugs or faster changes?
Worth It: Core modules touched daily; every hour saved in debugging repays fast
Not Worth It: Throwaway migration script that dies next sprint
What It Is: Composability, Unix-philosophy, Predictability, Idiomatic, Domain-based
WORTH Check: Right-sized for team – Can your devs design tiny, composable units without paralysis?
Worth It: Green-field services where micro-features ship independently
Not Worth It: Legacy blob where carving out composable pieces freezes delivery
What It Is: Fast, Independent, Repeatable, Self-validating, Timely
WORTH Check: Time-to-value – Do cleaner tests shorten feedback loops enough to justify extra mocks and fixtures?
Worth It: High-change domains (pricing engines, rules) where hours matter
Not Worth It: Stable ETL job touched twice a year; quick smoke test is cheaper
What It Is: Independent, Negotiable, Valuable, Estimable, Small, Testable
WORTH Check: Weigh the problem – Is story churn causing rework or developer idle time?
Worth It: Distributed teams sprinting in parallel needing clear boundaries
Not Worth It: Solo developer on hobby app—sticky notes beat fine-grained backlog
What It Is: Specific, Measurable, Achievable, Relevant, Time-bound
WORTH Check: Horizon flexibility – Does locking target and date help focus, or stifle exploration?
Worth It: Delivery roadmaps where dates affect marketing or compliance
Not Worth It: Early R&D spikes where discovery is the whole point
What It Is: ACID (Atomicity, Consistency, Isolation, Durability) vs. BASE (Basically-Available, Soft-state, Eventually-consistent)
WORTH Check: Outcome > overhead – Which failure hurts worse: stale reads or blocked writes?
ACID Worth It: Banking ledger behind a queue
BASE Worth It: Feed counters, analytics dashboards
What It Is: Consistency, Availability, Partition tolerance; under a network partition, you must choose between Consistency and Availability
WORTH Check: Risk & uncertainty – Which guarantee keeps revenue safest under network splits?
Choose AP (Availability + Partition tolerance): Global SaaS where an up-to-date cart matters less than staying online
Choose CP (Consistency + Partition tolerance): Stock trading engine; tolerate brief downtime for correctness
- Name the acronym you're tempted to enforce
- Run WORTH questions (+ risk lens if stakes high)
- If < 3 clear "yes" answers, drop or postpone
Result: Every principle earns its keep instead of bloating rituals or code.
This is the most contentious architecture debate. WORTH cuts through the noise.
WORTH Profile:
- W = 2-3: Single team, low coordination overhead
- O = 5: Simple deployment, low ops cost
- R = 5: Fits most team skill sets
- T = 5: Ships fast, refactors easily
- H = 4: Can extract services later via strangler pattern
Signals:
- Team size < 10 developers
- Single deployment cadence acceptable
- Domain boundaries still emerging
- Database transactions simplify consistency
Example: Early-stage SaaS, internal tools, MVP validation
WORTH Profile:
- W = 4-5: Multiple teams, release collision pain
- O = 3-4: Ops overhead justified by independent scaling
- R = 3-4: Team has SRE/DevOps expertise
- T = 2-3: Migration takes months but unlocks velocity
- H = 4: Polyglot flexibility, independent evolution
Signals:
- 3+ teams needing independent deploy cadence
- Clear bounded contexts (payments, auth, catalog, etc.)
- Different scaling needs per service
- Regulatory boundaries (PCI isolation)
Example: Scale-ups post-Series B, organizations with 40+ engineers
WORTH Profile: Often scores highest (20-23/25)
Characteristics:
- Logical modules with strong boundaries
- Shared database but separate schemas
- Single deployment artifact
- Can extract services later when justified
When It's Best: Most of the time, especially for teams in the 10-40 developer range.
| Option | W (Problem) | O (Outcome) | R (Team) | T (Time) | H (Horizon) | Total |
|---|---|---|---|---|---|---|
| Bare VM | 2 (simple needs) | 4 (cheap, fast) | 5 (everyone knows) | 5 (deploy today) | 3 (harder to scale) | 19 |
| PaaS | 3 (want auto-scale) | 5 (instant value) | 5 (no ops needed) | 5 (deploy today) | 4 (vendor lock manageable) | 22 |
| Kubernetes | 4 (complex needs) | 3 (powerful but heavy) | 2 (need expertise) | 1 (weeks to setup) | 5 (max flexibility) | 15 |
Verdict for Most Teams < 50 Developers: PaaS wins (Heroku, Render, Railway, Fly.io)
When k8s Becomes Worth It:
- 50+ services needing orchestration
- Multi-cloud strategy required
- Cost optimization at massive scale (> $50k/month infra spend)
- Team includes dedicated SREs
Why People Repeat It: Tribal pride, blog benchmarks, echo chambers
Reality Check: Match runtime, ecosystem, hiring market, and performance envelope to your needs. Switching costs eclipse eloquence.
WORTH Analysis:
| Situation | Language Choice | Why |
|---|---|---|
| ML/Data Science | Python | Ecosystem (NumPy, PyTorch, scikit-learn) |
| Network Services | Go | Concurrency, deployment simplicity |
| Systems Programming | Rust/C++ | Performance, memory safety |
| Web Backends | Node.js, Python, Ruby, Go | Team skill + ecosystem fit |
When It Pays Off: Domains with clear incumbent (Python for ML, Go for network daemons)
When It Hurts: Polyglot hobby projects that quietly become production—ops chaos, talent shortage, fragmented tooling
WORTH Question: Does this language choice improve outcomes more than the cost of switching, hiring, and maintaining two stacks?
Why People Repeat It: Books sell certainty; lint rules give easy metrics
Reality Check: Each rule adds lines, indirection, review time. Measure gains in defect rate or comprehension, not aesthetic purity.
Spectrum:
- Too Rigid: 300-line class for a 10-line feature because "SRP demands it"
- Too Loose: 2000-line God class doing everything
- Middle Ground: Modules with clear responsibilities, reasonable cohesion, pragmatic extraction when pain hits
WORTH Question: Does this clean code practice reduce bugs or speed changes more than the boilerplate costs?
When It Pays Off: Core libraries touched daily where bugs are costly
When It Hurts: Short-lived scripts, migration throwaway code, experiments racing deadlines
Why People Repeat It: Copying Big Tech success stories without context
Reality Check: k8s ⇒ steep cognitive load; microservices ⇒ extra latency, ops, observability. Ask: "Will a single VM + CI keep us shipping faster for the next 12 months?"
The "You Are Not Netflix" Principle:
- Netflix: 1000+ engineers, millions of subscribers, polyglot services
- You: 3 engineers, 100 users, single product
WORTH Question: Does the complexity pay off before we run out of runway?
When It Pays Off: Multiple cross-functional teams needing independent deploys, strict uptime SLOs
When It Hurts: Three-dev startup chasing product-market fit with < $1k/month budget
What to Do Instead:
- Start with PaaS or single server
- Add modules/boundaries in monolith
- Extract 1-2 services when pain justifies it
- Scale infrastructure when revenue supports it
Why People Repeat It: Comfort for managers, audit trails for enterprises
Reality Check: The process weight you need scales with team size × coupling. Bureaucracy kills flow in small squads.
Spectrum:
- Too Rigid: SAFe for 5-person startup (planning ceremonies take longer than building)
- Too Loose: No standups, no retrospectives, chaos
- Middle Ground: Lightweight rituals scaled to team size
WORTH Question: Does this process improve delivery speed or just create overhead?
| Team Size | Appropriate Process |
|---|---|
| 1-3 devs | Kanban board, weekly sync, monthly retro |
| 4-10 devs | Scrum-lite: standups, sprints, retros |
| 10-50 devs | Scrum + architecture review + quarterly planning |
| 50+ devs | Shape Up, SAFe, or custom scaled framework |
When It Pays Off: Regulated scale-ups (fintech, health) where compliance beats velocity
When It Hurts: Seed-stage ventures where market changes faster than sprint review cycle
Add a WORTH delta checkbox for non-trivial changes:
## WORTH Delta (for changes adding complexity)
Does this change raise our cognitive load score?
- [ ] Adds new technology/framework
- [ ] Increases coupling between modules
- [ ] Requires new operational expertise
If yes to any, WORTH justification required.
**WORTH Score**: __/25 (if applicable)
Post DORA metric diffs against the feature branch so reviewers see immediate delivery impact:
# .github/workflows/worth-check.yml
name: WORTH Metrics
on: pull_request
jobs:
  dora-diff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history so the branch can be compared against main
      - name: Calculate lead time diff
        run: |
          # Compare current branch vs main
          # Post diff to PR comment
Visible Metrics:
- Build time change
- Test duration change
- Estimated deployment frequency impact
Every story > 2 days of effort must link to its parent WORTH-scored ADR.
Example:
## Story: Add GraphQL endpoint for dashboard
**Parent ADR**: #147 (GraphQL Migration - WORTH Score 18/25)
**Justification**: Reduces frontend API calls from 7 to 1, improving load time by 40%
Rotate a "devil's advocate" who:
- Selects one live component or technology
- Re-scores it with current context
- Proposes kill/simplify plan if score < 15
Example Retro:
- Q1 2024: Adopted Redis for caching (score: 19/25)
- Q4 2024: Team size halved, cache hit rate 12%, ops burden high
- Re-score: 11/25
- Action: Remove Redis, use in-memory cache, save $200/month + 5 hours/week of ops
Symptom: Technology choice justified by "industry standard" or "I want to learn this"
Why It Happens: Individual career goals override business needs
Guard-rail: Bias gate—mandatory counter-proposal from someone outside the team who argues for simpler alternative
Example:
- Proposed: Rewrite in Rust because "it's the future"
- Counter: Keep Python, invest saved time in user research
- WORTH Score: Rust = 9/25, Python = 21/25
- Decision: Stay with Python
Symptom: Dozens of tiny services, yet releases still tied together; DORA lead time unchanged or worse
Why It Happens: Copying Big Tech patterns without understanding context
Guard-rail: Check DORA lead time quarterly. If unchanged after 6 months, merge modules back
Example:
- Split monolith into 15 services
- Deployment frequency: Before 5/week → After 1/week
- Lead time: Before 2 days → After 5 days
- Action: Consolidated 10 services back into 3 larger modules
- Result: Deployment frequency back to 4/week
Symptom: One expert owns critical infrastructure (k8s, database); vacations freeze deploys
Why It Happens: Hiring/training gaps; over-reliance on individual
Guard-rail: WORTH "Right-sized" must score ≥ 3 → at least two on-call capable developers
Example:
- k8s managed by single SRE
- SRE leaves company
- 6 weeks of deployment freeze while team ramps up
- Prevention: Require pair rotation, documentation, training budget
Symptom: Extended WORTH scoring sessions lasting days; no decision made
Why It Happens: Perfectionism, fear of commitment, unclear decision-maker
Guard-rail: Time-box WORTH canvas to 2 working days maximum. Default to lowest-cost prototype if still undecided.
Example:
- Team debates database choice for 3 weeks
- Intervention: "We'll pilot PostgreSQL for 2 sprints. If it doesn't work, we switch. Decision made, moving on."
Symptom: Scores are artificially inflated or manipulated to justify a predetermined outcome. This nullifies the framework's purpose, turning it into a tool for validating biases rather than challenging them.
How Teams Game the Score:
- Inflating scores: Marking unknowns or subjective opinions as 5/5 without evidence.
- Cherry-picking evidence: Ignoring contradictory data that would lower a score.
- Skipping the bias gate: Not having a neutral party or "devil's advocate" review the scoring.
- Ignoring re-scoring: Never revisiting past decisions to validate the original score against real-world outcomes.
Guard-Rails:
- Require concrete evidence for any score of 4 or 5.
- Rotate the "devil's advocate" or "no" voice for bias checks.
- Post scorecards publicly within the engineering team for peer review.
- Calendar auto-reminders for quarterly re-scores of major decisions.
Track these to validate WORTH-driven decisions:
| Metric | How WORTH Helps |
|---|---|
| Deployment Frequency | T (Time-to-value) ensures choices accelerate shipping |
| Lead Time for Changes | O (Outcome) prevents overhead that slows delivery |
| Time to Restore | R (Right-sized) ensures team can operate without heroics |
| Change Failure Rate | W (Weigh) focuses on real problems, not speculative fixes |
How to Measure: Tools like Sleuth, LinearB, or custom scripts parsing git + deploy logs
Quarterly developer survey:
Rate 1-5 (1 = strongly disagree, 5 = strongly agree):
1. I feel overwhelmed by the number of technologies I need to know
2. I struggle to debug production issues without asking for help
3. I don't understand how most of our systems work
4. I have no time to learn new skills
Score < 12 = Healthy
Score 12-16 = Warning
Score > 16 = Overloaded (simplify stack)
Track time from "proposal" to "decision made":
- Before WORTH: Median 3 weeks of debate
- After WORTH: Median 3 days with scorecard
Goal: Reduce decision paralysis while maintaining quality
What percentage of decisions get reversed within 6 months?
- Too High (> 30%): Scoring too optimistically; need more rigor
- Too Low (< 5%): Maybe too conservative; missing opportunities
- Healthy (10-20%): Acceptable learning rate
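Both metrics fall out of a simple decision log; the field names below are assumptions, and in practice the data lives in your ADR tracker.

```python
from datetime import date
from statistics import median

# Assumed ADR log entries
decisions = [
    {"proposed": date(2024, 3, 1), "decided": date(2024, 3, 4), "reversed_within_6mo": False},
    {"proposed": date(2024, 4, 2), "decided": date(2024, 4, 5), "reversed_within_6mo": True},
    {"proposed": date(2024, 5, 6), "decided": date(2024, 5, 8), "reversed_within_6mo": False},
]

decision_latency_days = median((d["decided"] - d["proposed"]).days for d in decisions)
reversal_rate = sum(d["reversed_within_6mo"] for d in decisions) / len(decisions)

print(decision_latency_days)    # 3 -- days of debate, not weeks
print(round(reversal_rate, 2))  # 0.33 -- above the healthy 10-20% band; tighten scoring rigor
```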
Track infrastructure and operational costs quarterly:
## Q3 2024 Infra Costs
| Category | Cost | Change vs. Q2 |
|----------|------|---------------|
| Cloud (AWS) | `$8,200` | -15% (removed unused k8s cluster) |
| SaaS Tools | `$3,400` | +5% (added monitoring) |
| On-call Hours | 40 hrs | -30% (simplified stack) |
**WORTH Impact**: Killed over-engineered service mesh (score dropped to 9/25). Saved `$1,500/month` + reduced on-call burden.
Decision: Migrate from REST to GraphQL for dashboard API
Initial WORTH Score: 17/25
- W = 4 (7 API calls per page load)
- O = 4 (expect 40% faster loads)
- R = 3 (team learning GraphQL)
- T = 3 (2-sprint migration)
- H = 3 (can revert via feature flag)
Measurement Plan:
- Week 0: Baseline page load time (2.1s), API call count (7), developer velocity (8 stories/sprint)
- Week 4: Pilot on one dashboard
- Week 8: Full rollout
- Week 12: Re-measure
Results:
- Page load: 1.3s (38% faster) ✅
- API calls: 1 (86% reduction) ✅
- Developer velocity: 6 stories/sprint (25% slower) ❌
- On-call incidents: +2 (query complexity bugs) ❌
Re-score WORTH: 15/25 (dropped from 17)
- O = 3 (speed gain offset by velocity loss)
- R = 2 (team still struggling)
Action: Pause further rollout. Keep GraphQL for existing dashboards but don't expand. Re-evaluate in 6 months after team upskills.
To make WORTH a team-wide habit, it needs to be embedded in the daily rituals of software development. The goal is to make the five questions an automatic, low-friction part of decision-making.
- Pick one pending decision (technology, architecture, process).
- Fill out the WORTH scorecard with the team (15 minutes max).
- Share the result and make a call with explicit kill criteria.
- Set a 90-day reminder to re-score the decision with real data.
Repeat this process. Over time, asking "Is it worth it?" becomes muscle memory, helping your team stop chasing hype and start compounding value.
Context:
- 2 developers (full-stack)
- €300/month runway
- MVP targeting small businesses
- Goal: Revenue within 90 days
Pain Point: Need to deploy app; considering infrastructure options.
Options:
| Option | Setup Time | Monthly Cost | Ops Burden |
|---|---|---|---|
| A: DigitalOcean VM | 1 day | €20 | Medium (manual deploys) |
| B: Railway PaaS | 2 hours | €20-40 | Minimal |
| C: Kubernetes (EKS) | 2-3 weeks | €150+ | High |
WORTH Scores:
| Dimension | VM | Railway | k8s |
|---|---|---|---|
| W – Weigh | 3 | 3 | 2 |
| O – Outcome | 4 | 5 | 1 |
| R – Right-sized | 4 | 5 | 1 |
| T – Time-to-value | 5 | 5 | 1 |
| H – Horizon | 3 | 4 | 5 |
| Total | 19 | 22 | 10 |
Decision: Railway (score: 22/25)
Rationale: Maximize time-to-value. Can ship MVP this week. Budget-friendly. Easy migration path if growth demands it.
Implementation:
- Deployed in 3 hours
- Auto-deploy on git push
- First customer within 2 weeks
6-Month Outcomes:
- 500 paying customers
- $12k MRR
- Infrastructure cost: $85/month
- Zero downtime incidents
- WORTH re-score: 23/25 (H improved from 4 to 5 based on proven migration path; could now scale to k8s if needed)
Lessons Learned:
- PaaS removed all infrastructure distraction
- Delayed optimization was correct call
- Now at scale where k8s might be justified (revisiting at $50k MRR)
Context:
- 40 developers, 4 teams
- Series C funded
- PCI DSS compliance required
- Modular monolith causing deployment conflicts
Pain Point: Teams blocking each other 2-3 times/week waiting for deployment windows. Lead time increased from 3 days to 7 days.
Options:
| Option | Migration Time | Complexity | Team Impact |
|---|---|---|---|
| A: Tighten module boundaries | 1 month | Low | Minimal |
| B: Extract 2-3 services (payments, auth) | 3-4 months | Medium | 2 teams affected |
| C: Full microservices (10+ services) | 8-12 months | High | All teams disrupted |
WORTH Scores:
| Dimension | A: Modules | B: Hybrid | C: Full |
|---|---|---|---|
| W – Weigh | 3 | 4 | 4 |
| O – Outcome | 3 | 4 | 3 |
| R – Right-sized | 5 | 4 | 2 |
| T – Time-to-value | 5 | 3 | 1 |
| H – Horizon | 4 | 5 | 4 |
| Total | 20 | 20 | 14 |
Decision: Option B (Hybrid) - score tied with A, but better horizon flexibility tipped decision
Rationale:
- PCI compliance easier with separate payments service
- Auth used by all teams, high coupling risk
- Keep reporting, admin, backoffice in monolith (low change frequency)
Implementation:
- Month 1-2: Extract payments service (clear PCI boundary)
- Month 3: Measure: deployment frequency +30%, lead time -1 day
- Month 4-5: Extract auth service
- Month 6: Stabilize, measure
6-Month Outcomes:
- Deployment frequency: 2/week → 6/week
- Lead time: 7 days → 3.5 days
- Change failure rate: 12% → 15% (slight increase during transition)
- Infrastructure cost: +$800/month (acceptable)
- Team satisfaction: +25% (survey)
WORTH Re-score: 21/25 (validated)
Lessons Learned:
- Hybrid architecture was sweet spot
- Didn't extract everything—kept stable domains in monolith
- PCI isolation made compliance audit 50% faster
- Now template for future service extraction
Why It's a Dogma: "Always write tests first" gets preached as universal, yet TDD can add indirection and slow exploratory work; even its creators debate its universal value.
Reality Check: TDD shines for:
- Safety-critical code (billing engines, medical devices)
- Long-lived domains with clear requirements
- Refactoring existing code
TDD struggles for:
- Green-field spikes exploring problem space
- Data science notebooks
- One-off migration scripts
WORTH Application:
| Context | W | O | R | T | H | Total | Verdict |
|---|---|---|---|---|---|---|---|
| Billing engine | 5 | 5 | 4 | 3 | 5 | 22 | ✅ TDD worth it |
| One-off migration | 2 | 1 | 3 | 1 | 3 | 10 | ❌ Post-hoc tests sufficient |
Why It's a Dogma: "Never think about servers again!" marketing obscures trade-offs.
Reality Check:
- Wins: Spiky traffic, infrastructure cost tied to usage, zero idle costs
- Loses: Cold-start latency, opaque debugging, vendor lock-in
WORTH Application:
| Scenario | Cold Start Acceptable? | Debugging Needs | Lock-In Risk | Verdict |
|---|---|---|---|---|
| Image resize API | Yes (async) | Low | Low | ✅ Lambda |
| Real-time chat | No | High | High | ❌ PaaS/VM |
Rollback Story: A startup moved to Lambda for all APIs. Debugging production issues took 5x longer (no SSH, CloudWatch delays). After 4 months, moved critical path back to EC2, kept async jobs on Lambda. Hybrid won.
Why It's a Dogma: Loose coupling sounds great in theory.
Reality Check:
- Wins: Multiple consumers, async workflows, audit trails
- Loses: Traceability nightmares, idempotency complexity, eventual consistency headaches
WORTH Decision Matrix:
| Use Case | Consistency Needs | Traceability | Latency | Verdict |
|---|---|---|---|---|
| Order processing | Strong | High | Low | ❌ Request/response + DB transaction |
| Analytics events | Eventual OK | Medium | High OK | ✅ Event stream |
Why It's a Dogma: Flexible queries sound universally better than REST.
Reality Check:
- Wins: Rich clients querying complex object graphs, mobile apps with bandwidth constraints
- Loses: N+1 query problems, complex caching, hard to secure joins, overkill for CRUD
WORTH Sidebar:
| API Pattern | When to Use |
|---|---|
| REST | Simple CRUD, public APIs, need HTTP caching |
| gRPC | Internal services, performance-critical, typed contracts |
| GraphQL | Rich client queries, reducing round trips, mobile apps |
Problem: WORTH can score "yes" but legal requirements veto the choice.
Solution: Add compliance checklist before scoring:
## Compliance Gate (complete BEFORE WORTH scoring)
- [ ] PCI DSS requirements (if handling payments)
- [ ] HIPAA requirements (if handling health data)
- [ ] GDPR/LGPD (if handling EU/Brazil personal data)
- [ ] SOC 2 controls (if selling to enterprises)
- [ ] Threat model reviewed (if internet-facing)
If ANY are blocked, the decision is vetoed regardless of WORTH score.
Why It Matters: SQL vs. NoSQL wars rival language wars in heat and lack of nuance.
WORTH Application:
| Storage Type | When It Wins | When It Loses |
|---|---|---|
| PostgreSQL (ACID) | Relational data, strong consistency, complex queries | Massive write throughput, schemaless flexibility |
| MongoDB (BASE) | Schemaless flexibility, horizontal scaling | Complex joins, strong consistency needs |
| Redis | Caching, sessions, pub/sub | Primary data store, durability critical |
| NewSQL (CockroachDB) | Need both ACID + horizontal scale | Low-write workloads, budget-constrained |
WORTH Question: Does our data access pattern justify the operational complexity of this database?
Dogma: "Full OTEL stack or you're flying blind"
Reality: Observability has costs (infrastructure, cognitive load, vendor bills)
WORTH Spectrum:
| Stage | Logging | Metrics | Tracing | APM | Cost |
|---|---|---|---|---|---|
| MVP (< 1000 users) | Structured logs (free tier) | Basic (uptime, errors) | None | None | $0-50/mo |
| Growth (1k-50k users) | Centralized (ELK/Datadog) | RED metrics | Critical paths | Optional | $200-1k/mo |
| Scale (50k+ users) | Full OTEL | Full dashboard | Distributed | Full APM | $2k+/mo |
WORTH Question: Does the observability investment reduce MTTR enough to justify the cost?
Dogma 1: "Only hire seniors"
Reality: Balanced teams (seniors + mid-level + juniors) often outperform all-senior teams due to:
- Cost efficiency
- Diverse perspectives
- Knowledge transfer culture
- Hunger/growth mindset
WORTH Question: Does hiring exclusively seniors improve delivery more than the 2x salary cost?
Dogma 2: "10x engineer myth"
Reality: "10x teams" exist; "10x solo engineers" are rare and create bus-factor risk.
WORTH Question: Are we building for individual heroics or team resilience?
Blind Spot: Most books ignore one-person projects.
WORTH Solo Mode:
Default Answers for Solo Devs:
- W: Only solve pains you feel daily
- O: Optimize for shipping, not elegance
- R: Stick to technologies you already know
- T: MVP in days/weeks, not months
- H: Avoid vendor lock-in (you can't negotiate)
One-Page Solo Canvas:
## Solo Dev WORTH Checklist
- [ ] Do I actually need this, or is it "best practice" theater?
- [ ] Can I build and run this alone in < 1 week?
- [ ] Will this help me ship faster or just add complexity?
- [ ] Can I reverse this in a weekend if it doesn't work?
If < 3 checks pass, skip it.
# WORTH Decision Canvas
**Date**: ___________ **Decision-Maker**: ___________ **Reviewer**: ___________
## Problem Statement (one sentence)
[What pain are we solving?]
**Cost of Delay**: $______/week OR _______ hours/week
---
## Options Considered
| Option | Description | Estimated Effort |
|--------|-------------|------------------|
| A (Baseline) | | |
| B (Incremental) | | |
| C (Ambitious) | | |
---
## WORTH Scorecard (1-5 scale, unknown = -1)
| Dimension | A | B | C | Weight |
|-----------|---|---|---|--------|
| **W** – Weigh the problem | | | | High |
| **O** – Outcome > overhead | | | | High |
| **R** – Right-sized team | | | | High |
| **T** – Time-to-value | | | | High |
| **H** – Horizon flexibility | | | | Medium |
| **CORE TOTAL** | | | | |
### Extended Lenses (optional, high-stakes only)
| Dimension | A | B | C |
|-----------|---|---|---|
| Cost of Delay | | | |
| Cognitive Load | | | |
| DORA Impact | | | |
| Compliance | ✅/❌ | ✅/❌ | ✅/❌ |
---
## Bias Check
**Potential biases influencing this decision**:
- [ ] Sunk-cost fallacy
- [ ] Resume-driven development
- [ ] Conformity ("everyone's doing it")
- [ ] Availability bias
**Counter-argument for preferred option**: ___________
---
## Decision
**Chosen**: Option ____
**Rationale**: ___________
**Kill Criteria** (revert if):
1. ___________
2. ___________
**Review Date**: ___________ (90 days from decision)
---
## Signatures
**Approved**: ___________ (Tech Lead)
**Reviewed**: ___________ (Devil's Advocate)
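The canvas is deliberately pen-and-paper, but the tally is easy to automate in a script or CI check. Below is a minimal sketch of one possible scoring rule, assuming the 1-5 scale with -1 for unknowns, weights of 2 (High) and 1 (Medium), and a score of 4 or higher counted as a "clear yes" for the three-of-five gate; the weights, threshold, and names are illustrative choices, not prescribed by the framework.

```python
# Minimal scorecard-tally sketch (illustrative, not prescribed by the framework).
# Assumptions: the canvas scale of 1-5 with -1 for "unknown", High-weight
# dimensions counted double, and a score of 4 or 5 treated as a "clear yes"
# for the three-of-five adoption gate.
from dataclasses import dataclass

WEIGHTS = {"W": 2, "O": 2, "R": 2, "T": 2, "H": 1}  # High = 2, Medium = 1

@dataclass
class Verdict:
    weighted_total: int
    clear_yeses: int
    unknowns: int
    adopt: bool

def score_option(scores: dict[str, int]) -> Verdict:
    # Unknowns (-1) are excluded from the total but block adoption outright.
    weighted_total = sum(WEIGHTS[d] * s for d, s in scores.items() if s != -1)
    clear_yeses = sum(1 for s in scores.values() if s >= 4)
    unknowns = sum(1 for s in scores.values() if s == -1)
    return Verdict(weighted_total, clear_yeses, unknowns,
                   adopt=clear_yeses >= 3 and unknowns == 0)

if __name__ == "__main__":
    option_b = {"W": 5, "O": 4, "R": 4, "T": 3, "H": -1}
    print(score_option(option_b))  # adopt=False until the unknown H is resolved
```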
| Dogma | Reality Check | Pays Off When | Hurts When |
|---|---|---|---|
| "X is the best language" | Match runtime, ecosystem, hiring market to needs | Domain has clear incumbent (Python-ML, Go-network) | Polyglot chaos; ops fragmentation |
| "Follow every Clean Code rule" | Weigh defects saved vs. boilerplate added | Core libraries touched daily | Throwaway migration scripts |
| "Start with k8s & microservices" | Cognitive load check; outcome > overhead | Multi-team, strict SLO scale-ups | 3-dev startup pre-PMF |
| "Heavyweight process for all" | Time-to-value; process ∝ team size | Regulated enterprises | Seed-stage pivoting weekly |
| "TDD everywhere" | Test value vs. exploration speed | Safety-critical long-lived code | Green-field spikes, notebooks |
| "Serverless cures ops" | Cold-start vs. observability trade-off | Spiky traffic, budget tied to usage | Steady low-latency workloads |
| "EDA > request/response" | Traceability vs. loose coupling | Multi-consumer async workflows | Single-team ACID transactions |
| "GraphQL over REST" | Query flexibility vs. caching complexity | Rich client queries, mobile | Simple CRUD, public APIs |
| Acronym | Core Idea | WORTH Check | Worth It | Not Worth It |
|---|---|---|---|---|
| STUPID | Six code smells to avoid | Outcome > overhead | Daily-touched modules | One-off scripts |
| CUPID | Joyful code properties | Right-sized team | Green-field micro-features | Legacy blob refactor |
| FIRST | Fast, independent tests | Time-to-value | High-change domains | Yearly ETL jobs |
| INVEST | Story slicing criteria | Weigh the problem | Parallel distributed teams | Solo dev hobby projects |
| SMART | Goal-setting framework | Horizon flexibility | Compliance-driven roadmaps | R&D exploration spikes |
| ACID/BASE | DB guarantee trade-offs | Outcome > overhead | Banking (ACID), Social feeds (BASE) | - |
| CAP | Distributed systems trade | Risk lens | Trading (CP), SaaS cart (AP) | - |
| Pattern | When It Wins | WORTH Red Flags |
|---|---|---|
| Monolith | Single team, emerging domain, rapid iteration | Team > 15, deployment conflicts, different scaling needs |
| Modular Monolith | 10-40 devs, clear modules, shared DB acceptable | Need polyglot, PCI isolation, independent scaling |
| Microservices | 40+ devs, multiple teams, clear bounded contexts | < 10 devs, single team, shared transactions common |
| Serverless | Spiky traffic, async jobs, cost ∝ usage | Real-time requirements, deep debugging needs |
| Option | Setup Time | Monthly Cost | Ops Burden | Flexibility | Best For |
|---|---|---|---|---|---|
| Bare VM | 1 day | $20-100 | Medium | Low | Solo dev, learning, static sites |
| PaaS | 2 hours | $20-500 | Minimal | Medium | Startups, small teams, rapid iteration |
| Kubernetes | 2-4 weeks | $200+ | High | Very High | 50+ services, multi-cloud, large teams |
Dogma: "Everything must be IaC (Terraform, Pulumi) or you're unprofessional"
Reality Check: IaC shines in multi-environment estates (dev, staging, prod × 3 regions = 9 configs). For a single VM, setting up and maintaining Terraform can be heavier than the manual work it replaces.
WORTH Application:
| Infrastructure Scale | IaC Worth It? | Why |
|---|---|---|
| 1-2 VMs | ❌ No | Manual setup faster; fewer changes |
| 3-10 servers, multiple envs | ✅ Yes | Reproducibility saves time; disaster recovery |
| 50+ resources, multi-region | ✅ Definitely | Manual impossible; drift prevention critical |
Dogma: "Feature flags everywhere for continuous deployment"
Reality Check: Flags multiply configuration drift and kill readability when the product has < 100 users and a tight feedback loop.
WORTH Application:
| Stage | Flag Complexity | Worth It? | Rationale |
|---|---|---|---|
| MVP (< 100 users) | High (10+ flags) | ❌ No | Just deploy and monitor |
| Growth (1k-10k users) | Medium (targeted rollouts) | ✅ Yes | Gradual rollout reduces risk |
| Scale (100k+ users) | Managed (kill-switch only) | ✅ Definitely | Instant rollback critical |
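At the "kill-switch only" end of that spectrum, the mechanism can stay trivial. The sketch below guards one risky code path with a single environment variable so it can be switched off without shipping new code; the variable name and pricing function are hypothetical, not part of the framework.

```python
# Minimal kill-switch sketch (illustrative). One environment variable guards
# one risky code path; flipping it turns the feature off without a code change.
import os

def new_pricing_enabled() -> bool:
    # Default on; set NEW_PRICING_ENABLED=false (and reload config or restart)
    # to fall back to the known-good path.
    return os.getenv("NEW_PRICING_ENABLED", "true").lower() != "false"

def price(order_total: float) -> float:
    if new_pricing_enabled():
        return round(order_total * 0.95, 2)  # new discount logic behind the switch
    return order_total                       # old, known-good behaviour

if __name__ == "__main__":
    print(price(100.0))
```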
Dogma: "Agile = daily standups + two-week sprints, non-negotiable"
Reality Check: Async tools (Linear, Tuple, Loom) yield the same visibility for remote micro-teams without the synchronous overhead.
WORTH Application:
| Team Situation | Synchronous Standups | Async Updates |
|---|---|---|
| Co-located team, 5-10 devs | ✅ 15-min standup works | - |
| Remote team, < 5 devs, tight communication | ❌ Overhead > value | ✅ Daily Slack updates |
| Remote team, 10+ devs, coordination heavy | ✅ Yes, but 2-3x/week | ✅ + async for off-days |
Dogma: "Only merge via long-running release branches (Git Flow)"
Reality Check: Trunk-based development with small PRs often halves merge conflicts and lead time.
WORTH Application:
| Deployment Frequency | Git Flow | Trunk-Based |
|---|---|---|
| Monthly releases | ✅ Acceptable | - |
| Weekly releases | - | ✅ Better flow |
| Daily+ releases | ❌ Bottleneck | ✅ Essential |
Dogma: "DDD means full bounded-context everything with event storming and aggregates"
Reality Check: Heavy DDD artifacts pay off in large, domain-rich organizations—not in simple CRUD SaaS v1.
WORTH Application:
| Domain Complexity | Full DDD | Lite DDD (just bounded contexts) |
|---|---|---|
| Simple CRUD (admin panel, CMS) | ❌ Overkill | ✅ Basic modules enough |
| Medium (e-commerce, SaaS) | ✅ Yes, skip event sourcing | - |
| Complex (fintech, healthcare) | ✅ Justified | - |
ADR (Architecture Decision Record): Document capturing important architectural decisions with context, options, and rationale.
DORA Metrics: Four key metrics (deployment frequency, lead time, MTTR, change failure rate) predicting software delivery performance.
Cognitive Load: Mental effort required to understand and operate a system; teams have finite capacity.
Cost of Delay: Economic impact of postponing value delivery; calculated as (weekly value) × (weeks delayed).
PaaS (Platform as a Service): Cloud hosting that abstracts infrastructure (Heroku, Railway, Render).
Real Options: Financial concept applied to software: valuing the right to defer decisions until more information is available.
TCO (Total Cost of Ownership): Full lifecycle cost including development, operations, maintenance, and opportunity costs.
Wardley Mapping: Strategic planning tool showing component evolution from genesis to commodity.
- *Accelerate* by Nicole Forsgren, Jez Humble, and Gene Kim: the science behind DORA metrics and high-performing teams
- *Team Topologies* by Matthew Skelton and Manuel Pais: organizing teams for fast flow and cognitive load management
- *The Principles of Product Development Flow* by Donald Reinertsen: Cost of Delay, queuing theory, and economic decision-making
- *Wardley Maps* by Simon Wardley: strategic context and component evolution
- *Working Effectively with Legacy Code* by Michael Feathers: pragmatic refactoring without dogma
- *A Philosophy of Software Design* by John Ousterhout: complexity management and design trade-offs
- DORA Research: dora.dev
- Team Topologies: teamtopologies.com
- Wardley Mapping: learnwardleymapping.com
- ThoughtWorks Tech Radar: thoughtworks.com/radar
- Martin Fowler's Blog: martinfowler.com
This framework improves through real-world application. Share your WORTH decisions:
Template:
## Decision: [Title]
**Context**: [Team size, domain, constraints]
**WORTH Score**: __/25
**Outcome**: [What happened]
**Lesson**: [What you'd do differently]
Submit case studies and feedback to: github.com/jeansossmeier/worth-framework
Great engineering isn't about accumulating principles. It's about consistently choosing the simplest solution that delivers outsized value in your context.
WORTH turns that judgment into a repeatable, team-wide habit by asking five questions:
- Is the pain real and business-critical? (Weigh)
- Does benefit exceed cost? (Outcome)
- Can the team operate it? (Right-sized)
- Does it deliver value soon? (Time)
- Does it keep options open? (Horizon)
If you can't answer "yes" to at least three with clear evidence, don't adopt it yet.
Not another best practice: WORTH sits above principles, helping you decide which to apply when.
Context-aware: What works for Netflix doesn't work for your three-person startup. WORTH forces explicit context.
Measurable: Scorecards, DORA metrics, and TCO calculations replace gut feelings with evidence.
Iterative: Quarterly re-scoring catches when "right" becomes "wrong" as context shifts.
Actionable: Templates, checklists, and automation make it operational, not theoretical.
- Pick one pending decision (technology, architecture, process)
- Fill out the WORTH scorecard (5 minutes)
- Share with your team for bias check
- Make the call with explicit kill criteria
- Set 90-day reminder to re-score with real data
That's it. One decision. Five questions. Repeat.
Over time, asking "Is it worth it?" becomes muscle memory. Your team stops chasing hype and starts compounding value.
The best engineers don't memorize frameworks. They ask better questions.
Is it worth it?
Everything else follows from that.
Go build something worth building.