Commit 79fe7a6

Minor text updates
1 parent b8238f2 commit 79fe7a6

1 file changed

Lines changed: 22 additions & 21 deletions

src/lib/content/posts/productivity.md

@@ -1,7 +1,7 @@
11
---
22
title: LLMs and performative productivity
33
date: '2026-06-05'
4-
updated: '2026-06-05'
4+
updated: '2026-06-08'
55
categories:
66
- opinion
77
- personal
@@ -22,7 +22,7 @@ At first, when I began to use agents day-to-day, I was blown away by their capab
2222

2323
Armed with this newfound power, I accomplished a flurry of tasks I either wasn't capable of, or didn't have the time for previously:
2424

25-
- At work, I could get up and running in new codebases without asking for help, and could contribute to them much more easily
25+
- At work, I could get up and running in new codebases without asking for help, and could contribute to them much more easily, _especially_ when they were in programming languages I wasn't familiar with
2626
- I got some projects updated, moved, or refactored, all in record time (_including a particularly gnarly Nuxt upgrade I'd been putting off for years, and that would've taken me days of work, done in about an hour_)
2727
- I added several new features to a handful of apps here and there that I wouldn't have otherwise
2828
- I scaffolded new things and built out greenfield projects in record time
@@ -33,7 +33,7 @@ That all sounds fantastic, of course. It _felt_ fantastic.
3333

3434
But when I got done with all that, I had to wonder: **could I really call any of that _productive_**?
3535

36-
- At work, I didn't understand the codebases I was working in, and though I was contributing to them, I gained no real context about them. I was opening PRs, but I couldn't really defend what was in them, or say whether or not they worked with the system. I was constantly afraid I'd messed something up without realizing it
36+
- At work, I didn't understand the codebases I was working in, and though I was contributing to them, I gained no real context about them. I was opening PRs, but I couldn't really defend what was in them, or say whether or not they worked with the system. I was constantly afraid I'd messed something up without realizing it, and learned virtually nothing about the unfamiliar languages
3737
- Most of the other updates weren't really _needed_ per se, and afterwards, the apps themselves weren't any different; the changes just made me feel good, while making little to no difference on the user side of the software
3838
- The new features I built were neat, but they weren't actually being used
3939
- The greenfield projects were quickly abandoned
@@ -42,15 +42,15 @@ But when I got done with all that, I had to wonder: **could I really call any of
4242

4343
<CalloutPlusQuote>
4444

45-
I had mainly just checked off a bunch of old to-dos, most of which hadn't gotten done previously because they never mattered that much in the first place.
45+
I had mainly just checked off a bunch of old to-dos, most of which were unfinished because they never mattered that much in the first place.
4646

4747
</CalloutPlusQuote>
4848

4949
And even where they _did_ matter, I paid a cost for doing more, faster. I added a bunch of abandoned side-projects to the old pile, but unlike before, I didn't even come away with any new skills or experience.
5050

5151
If anything, it seemed like I knew _less_ than before.
5252

53-
Maybe the code improved, but I sure didn't.
53+
Maybe the codebase improved, but I sure didn't.
5454

5555
And that was all in the _best-case_ scenarios, where the agent worked well. Other times, I'd spend so long prompting and re-prompting it would've just been faster to do the work myself in the first place—but by that point, of course, I was so deep in the hole it seemed easier to just keep digging.
5656

@@ -138,8 +138,7 @@ All of those studies take different approaches, but there are a few common threa
138138

139139
- **LLM productivity benefits are highly situational**. LLMs excel at straightforward, time-consuming tasks. They're great at boilerplate and greenfield projects. And, they help less-experienced coders a lot more than experienced ones. The more you go outside that sweet spot, the less benefit there is.
140140
- **There's a pronounced gap between perception and reality**. This reaffirms my experience. LLM users _feel_ like the tool is doing much more for them than it actually is when measured objectively.
141-
- **Even where the gains are real, they come at a cost**. Several of the studies above (and others, in other fields) have confirmed LLM output is generally lower quality, in various ways. While it may be reasonable to think that particular gap is closing, there's another, even more concerning penalty:
142-
- **LLM usage inhibits cognition and understanding**. Which makes sense, of course; you can't expect to be ready for the game if you skip practice every day. Your comprehension of the system comes mainly from all those small, everyday touchpoints. If you outsource those, you quickly lose context and develop [cognitive debt](https://www.media.mit.edu/publications/your-brain-on-chatgpt/).<footnote>Since gains are most pronounced among novice developers, this creates a concerning catch-22: juniors have the most to gain from LLM usage, but those gains threaten to keep them reliant on the technology.</footnote>
141+
- **Even where the gains are real, they come at a cost**. Several of the studies above (and others, in other fields) have confirmed LLM output is generally lower quality, in various ways. While it may be reasonable to think that particular gap is closing, there's another, even more concerning penalty: LLM usage inhibits cognition and understanding. Which makes sense, of course; your comprehension of the system comes mainly from small, everyday touchpoints. If you skip practice every day, you won't be ready for the game. And if you outsource your chance to speak the language, you quickly lose context and develop [cognitive debt](https://www.media.mit.edu/publications/your-brain-on-chatgpt/).<footnote>Since gains are most pronounced among novice developers, this creates a concerning catch-22: juniors have the most to gain from LLM usage, but those gains threaten to keep them reliant on the technology.</footnote>
143142
- **Most studies so far have only measured productivity at the individual level, and in a vacuum**. Measurement tends to begin at authoring code and end at merging a PR. Rarely is a broader view, where impact is measured across an organization and over time, even attempted. But where it is, positive impacts tend to evaporate.
144143

145144
This last point might be the biggest takeaway, in my mind.
@@ -170,41 +169,41 @@ Whether LLM code is as good as human code is partially load-bearing here. After
170169
But that's actually only a small part of the overall question. There's much more to _actual productivity_ that we're (perhaps deliberately) overlooking right now.
171170

172171

173-
### Productivity only starts with "PR"
172+
### Productivity only starts with a "PR"
174173

175174
Obviously, LLMs can write code extremely quickly; nobody denies that. If sheer volume of code is how you measure productivity, the LLM wins hands-down.
176175

177176
But I don't think anybody who's ever worked in any real production context would agree absolute lines of code is a useful proxy for productivity. (In fact, not so long ago, we mostly agreed the opposite was often true, and _fewer_ lines of code was often the superior signal.)
178177

179178
Similarly: it doesn't really matter _how many_ PRs you're opening or merging, if you're not taking into account the _quality_ of the code they contain—as any Open Source maintainer will tell you. There have probably never been more PRs opened, and yet, the average quality has likely never been lower.
180179

181-
I think it's fair to say LLM-produced code is _not_ always as good as human code, for a few reasons:
180+
At this point, it's fair to say LLM-produced code is _not_ always as good as human code, for a few reasons:
182181

183182
- For one: the studies above confirm it; they overwhelmingly point to a reduction in the quality and reliability of the code LLMs generate, relative to human control groups. Maybe that changes in the future, but it seems to be the truth for now, at least.<footnote>I have yet to encounter anyone who says LLM code is as good as human code, *and who also* reads all the code their LLM produces. Seems like mostly the people who believe it are taking it on faith.</footnote>
184183

185184
- For another: LLMs were trained on average code, and thus generally have average outputs.
186185

187186
- For a third: LLM output is non-deterministic, and while that may not matter in many cases, it means any given implementation may be different every time. You either believe all implementations are essentially equal (which seems unreasonable), or you believe that matters.
188187

189-
- But mostly: a human will inevitably have a more comprehensive understanding of the organization, the team, the problem space, the history, the users, and so on. An LLM's context window is only so wide, and it's unlikely to reliably account for all of those things that may exist entirely _outside_ the codebase and in the real world. Best-case: a human will need to actively provide all that context, and that's not a scalable approach.
188+
- But mostly: a human will inevitably have a more comprehensive understanding of the organization, the team, the problem space, the history, the users, and so on. An LLM's context window is only so wide, and it's unlikely to reliably account for all of those things that may exist entirely _outside_ the codebase and in the real world. Best-case: a human will need to actively provide all that context, and that's not a very scalable approach.
190189

191190
At this point, LLM enthusiasts might argue that humans make mistakes, too. It's not as though we've ever been perfect, either.
192191

193192
And that's fair. We've all messed up. Most of us have taken prod down at one point or another.
194193

195194
But I have two responses to that:
196195

197-
1. **Nobody treats human code with such indifference**. I've never once, in all the hundreds and hundreds of PRs I've opened, had anyone express such low expectations of me, or had my mistakes treated with such blasé detachment. So this is an obvious double standard.<footnote>Very similar to how companies will not tolerate a human support agent lying to customers in the slightest, but will happily ignore an LLM chatbot that does the same thing.</footnote>
196+
1. **Nobody treats human code with such indifference**. I've never once, in over a decade of writing code, had anyone express such low expectations of me, or had my mistakes treated with such blasé detachment (no matter how fast I made them). So this is an obvious double standard.<footnote>Very similar to how companies will not tolerate a human support agent lying to customers in the slightest, but will happily ignore an LLM chatbot that does the same thing.</footnote>
198197

199-
2. **Mistakes are how humans learn**. When something goes wrong in our code, that we wrote, there's a benefit; we discovered something about our codebase that made us wiser. We gained resilience. We leveled up. We probably helped other people learn along with us, too.
198+
2. **Mistakes are how humans learn**. When we make something go wrong, there's a benefit; we discovered something about our codebase that made us wiser. We gained resilience. We leveled up. We probably helped other people learn along with us, too.
200199

201200
<CalloutPlusQuote>
202201

203202
A junior who made a mistake is one step closer to being a senior; a junior who let an LLM make a mistake (and had the LLM fix it for them) has probably learned nothing.
204203

205204
</CalloutPlusQuote>
206205

207-
Some might also argue the reduction in quality is worth the bump in speed, which I suppose may be reasonable in some cases (but not all).
206+
Some might also argue the reduction in quality is worth the bump in speed, which I suppose may be reasonable in some cases (but certainly not all).
208207

209208
But never mind that; let's set aside code quality for a minute.
210209

@@ -219,7 +218,7 @@ Writing code generally isn't what slows teams down, and has never really been th
219218

220219
**The job is so much more than that**. There's endless judgment, communication, and discernment that goes into the work. (And it feels like we all knew that, not so long ago.)
221220

222-
It's evaluating different approaches and weighing tradeoffs. It's talking to the right people on five different teams to make sure everyone's in alignment. It's figuring out if what you're building is actually the right implementation of the right solution. And no matter how fast you can churn out code, _you can't skip past that part_.
221+
It's evaluating different approaches and weighing tradeoffs. It's talking to the right people on five different teams to make sure everyone's in alignment. It's figuring out if what you're building is actually the right implementation of the right solution. It's _design_. And no matter how fast you can churn out code, _you can't skip past that part_.
223222

224223
Besides: PRs need to be reviewed, don't they? (Please say they need to be reviewed.)
225224

@@ -255,9 +254,11 @@ This is probably why so many vibe-coded apps are abandoned or left to rot nearly
255254

256255
It's probably also why, even though it's trivial to [slop-fork](https://www.slopfork.dev/) any software you want, most people don't seem to be doing it: because the moment you do, the maintenance and updates become _your_ problem.
257256

258-
Is _your_ team prepared to shepherd the code, when it proliferates beyond human scale? Are you planning for the maintenance to increase in proportion with the throughput?
257+
Is _your_ team prepared to shepherd the code, if it proliferates by an order of magnitude beyond its current scale? Are you planning for the maintenance to increase in proportion with the throughput?
259258

260-
What bugs and unforeseen side effects are hiding in the code that you haven't found yet? What happens if and when _those_ grow exponentially along with output?
259+
What bugs and unforeseen side effects are hiding in the code that you haven't found yet?
260+
261+
What happens if (when) _those_ grow exponentially along with output?
261262

262263
You likely don't know the answer to any of those questions yet, because they can often take weeks, months, or even years to be revealed. You don't fully know how well the work will hold up over time until it's actually, well, _held up over time_. (Or not.)
263264

@@ -281,7 +282,7 @@ When you're headed in the wrong direction, speed isn't an asset; it's a liabilit
281282

282283
</CalloutPlusQuote>
283284

284-
We've seen an exponential explosion in the amount of software created over the past couple of years, but outside of AI itself, there doesn't really seem to be much change in what people are using and relying on day-to-day. Disregard the AI industry—which is largely circular, and propped up almost entirely by venture capital—and I don't really see much that's changed in software in general in the last few years.
285+
We've seen an exponential explosion in the amount of software created over the past couple of years, but outside of AI itself, there doesn't really seem to be much change in what people are using and relying on day-to-day. Disregard the AI industry itself—which is largely circular, and propped up almost entirely by venture capital—and I don't really see much that's changed in software in general in the last few years.<footnote>In fact, I think you could make a compelling case that AI has actually _stagnated_ software, rather than accelerated it. I genuinely can't think of any major new app, feature, product, or improvement from the last 3–5 years that doesn't crudely amount to shoving AI into something that existed already. In some cases this has been undeniably useful, but in many—if not _most_—it's just unwanted noise.</footnote>
285286

286287
It appears we're building more than ever, but that doesn't seem to correlate with any noticeable uptick in meaningful metrics like adoption, as far as I can tell. I have a theory why this might be:
287288

@@ -328,13 +329,13 @@ One of the most effective ways they do this is by making you *feel* productive,
328329

329330
</CalloutPlusQuote>
330331

331-
Crucially: consciously knowing this does *not* change your susceptibility, any more than knowing you're a heroin addict makes you immune to opioids. That's why objective, quantitative measurement, with a holistic definition, is so important.
332+
Crucially: consciously knowing this does *not* change your susceptibility, any more than knowing you're a heroin addict makes you immune to opioids. That's why objective, quantitative measurement, with a holistic definition of productivity, is so important.
332333

333334
You probably won't even notice all the creeping technical and cognitive debt as it weighs you down, because by that point, you're most likely not thinking of it in those terms.
334335

335336
It's so difficult to spot the downsides of LLM usage, because we're psychologically inclined to _feel_ that initial positive burst, and to ignore the dozens of tiny paper cuts that follow—even when they've bled the original gains away, drip by drip.
336337

337-
But even if you _do_ see it happening, notice **your incentives are all pointing in the wrong direction by that point**. Now that parsing the code is much harder than it would've been before (because you wrote none of it), sunk cost pushes you further down the path of least resistance.
338+
But even if you _do_ see it happening, notice **your incentives are all pointing in the wrong direction by that point**. Now that parsing the code is much harder than it would've been before (because you wrote none of it), and now that maintenance is harder (because you have much more code to maintain), sunk cost pushes you further down the path of least resistance.
338339

339340
Faced with the decision to start all over and do things a better way, or just press the button one more time to apply another layer of patch code you never read and don't understand, while staring down an ever-increasing backlog, all the inertia is pushing you further down the same path that got you here.
340341

@@ -346,7 +347,7 @@ The less you understand, the more you trust AI. But the more you trust AI, the l
346347

347348
</CalloutPlusQuote>
348349

349-
Throughout this whole process, however, you'll probably still _feel_ incredibly productive—even when the data would suggest you're lying to yourself.
350+
Throughout this whole process, however, you'll probably still _feel_ incredibly productive—even when the data would suggest you're lying to yourself—because it's tough to tell the difference between being busy and being productive if you never take a step back and measure.
350351

351352
But you probably don't measure, if you're in this deep. Because you probably trust how you _feel_ too much to believe reality could possibly contradict you.
352353

@@ -391,7 +392,7 @@ But we're not; we're outsourcing all of those to AI, too.
391392

392393
Why is anybody going to care about your company when everything about it is exactly the same homogeneous AI output every other company has?
393394

394-
The main winners in a gold rush are the ones selling pickaxes, and it sure seems to me like the token vendors are about the only ones who really stand to gain from most of this, by any coherent, holistic definition of productivity.
395+
The main winners in a gold rush are the ones selling pickaxes, and it sure seems to me like the token vendors are about the only ones who really stand to gain from most of this, if we're adhering to any coherent, holistic definition of productivity.
395396

396397
But if (when) this whole bubble comes crashing down and token costs skyrocket: will you still have any idea what's going on in your codebase?
397398
