2025 Learnings on LLMs and Software Work

3 min read

2025 suggests that the gap between expectations and measured outcomes around LLMs in software engineering is no longer subtle. Large investments and confident narratives imply broad productivity gains or partial task replacement. The evidence, however, points to something more constrained and perhaps more interesting: LLMs are changing where effort lives, not removing effort itself.

A central observation is that LLMs compress the front of software work. Scaffolding, boilerplate, pattern reuse, and early experimentation are undeniably faster. This creates rapid visible progress and a strong subjective sense of acceleration. Teams reach something that works quickly, which is valuable in itself, especially in exploratory or product-facing contexts.

What does not (yet) shrink is the long tail. The work that dominates total delivery time (verification, integration, correctness, reliability, and accountability) remains stubbornly human. In some settings it becomes heavier, because the human is now validating code they did not fully author and may not fully understand. Time saved in typing is often re-spent on inspection, debugging, and reconciliation with real-world constraints.

These observations also explain why results diverge so sharply across contexts. When tasks are bounded, reversible, and tolerant of approximation, LLMs often reduce total time. When tasks are context-heavy, invariant-rich, or high-consequence, the net effect is mixed and sometimes negative. The difference is not frontend versus backend so much as fast feedback versus delayed failure. Where failure is visible and cheap, LLMs fit naturally. Where failure is hidden and expensive, verification dominates and gains evaporate.
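To make that contrast concrete, here is a deliberately toy sketch in Python. Both functions are hypothetical illustrations, not code from any real system: the first fails loudly at the call site, the second “works” on every demo input and only surfaces later.

```python
# Toy contrast: visible-and-cheap failure vs hidden-and-expensive failure.
# Both functions are hypothetical illustrations, not real APIs.

def render_badge(count: int) -> str:
    # Fast feedback: a bad input blows up immediately, in the same run
    # where the bug was introduced, so it is cheap to catch and fix.
    if count < 0:
        raise ValueError("count must be non-negative")
    return f"({count})"

def apply_discount(price_cents: int, percent: float) -> int:
    # Delayed failure: truncating instead of rounding passes casual
    # inspection and most demos, then shows up weeks later as cents
    # drifting in financial reports. Verification, not generation,
    # is where the cost lands.
    return int(price_cents * (1 - percent / 100))  # truncation bug
```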

Another underappreciated effect is familiarity loss. A large part of “writing code” is how engineers build mental models of a system. When large portions of a system are produced by a model, that implicit understanding is naturally thinner. The cost shows up later, when changes are required under pressure. This helps explain why developers often feel faster while objective throughput does not improve.

A broader takeaway is that LLMs turn software delivery into an inspection and control problem. The bottleneck shifts from construction to trust: deciding what to ask for, proving that the output matches intent, and ensuring it does not violate system invariants over time. Most teams, however, still operate processes optimized for manual construction. Without strong automated checks, clear interfaces, and high-signal feedback loops, faster generation simply feeds more work into the same verification bottleneck.
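As one hedged illustration of what “inspection and control” can look like in practice, here is a minimal property-based test sketch using the Hypothesis library. `normalize_scores` is a hypothetical stand-in for any model-generated function; the point is that the invariants are stated by humans, independently of how the code was produced.

```python
# Minimal sketch: encoding invariants for model-generated code as a
# property-based test. `normalize_scores` is a hypothetical example.
from hypothesis import given, strategies as st

def normalize_scores(scores):
    """Hypothetical generated code: rescale scores into [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

@given(st.lists(st.floats(allow_nan=False, allow_infinity=False,
                          min_value=-1e6, max_value=1e6), min_size=1))
def test_normalize_invariants(scores):
    out = normalize_scores(scores)
    # Invariant 1: same length as the input.
    assert len(out) == len(scores)
    # Invariant 2: every value lands in [0, 1].
    assert all(0.0 <= v <= 1.0 for v in out)
    # Invariant 3: normalization never reorders values.
    for i in range(len(scores)):
        for j in range(len(scores)):
            if scores[i] <= scores[j]:
                assert out[i] <= out[j]
```

The durable artifact here is the test, not the generated body: it is the piece a reviewer can trust without re-deriving the implementation.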

We see the current tools as valuable, but their returns are uneven and depend heavily on workflow redesign rather than raw model capability. Outside vendor and investor narratives, the emerging consensus among researchers and senior practitioners is cautious: local productivity gains exist; general productivity transformation does not, yet.

Our position is not that LLMs don’t work, but that software engineering was never primarily about writing code. Engineering is about managing uncertainty, risk, and long-lived responsibility. LLMs make the easy parts easier (which is fantastic) and expose, rather than eliminate, where the real cost has always been.

Whether future systems materially change that balance remains an open question. What is already clear is that treating code generation speed as the primary metric misses the point.

And with that, we close the 2025 blog series. Thank you for reading along this year. Looking forward to what 2026 brings!