A lot is happening in AI-assisted software development, and it justifies another look at recent high-profile accounts of engineers shipping more code, delegating implementation to agents, and experiencing what feels like “a step-change” in productivity.1 2 Projects like OpenClaw3 (formerly Clawdbot/Moltbot) show how quickly always-on personal agents can be stood up and connected to real channels and credentials, expanding execution surface area far faster than teams can verify or secure it. Moltbook4, meanwhile, illustrates what happens when autonomous agents operate at scale in a shared environment: interaction and output explode, while oversight and control become the binding constraint.
At the level of personal experience, these reports are often accurate. However, we maintain that what is currently changing is not the fundamental cost structure of software engineering but its asymmetry: execution is becoming cheaper, comprehension is thinning, and verification costs may well be rising.
Large language models function as extremely capable execution engines with weak epistemology5. They perform well when success criteria are explicit, scope is local, and correctness can be checked. They do not reliably understand why systems are shaped the way they are, which constraints are essential, or when the correct action is to remove rather than add. That distinction explains both the genuine acceleration people are experiencing and why it does not automatically translate into durable, system-level productivity gains.
Bounded Execution vs. System Ownership
We agree that LLMs dramatically reduce the cost of bounded work: translating intent into code, scaffolding features, porting logic, iterating until tests pass, and exploring the solution space. When the task is well-specified and the blast radius is contained, the leverage is real.
This is the regime described in Andrej Karpathy’s widely circulated notes on his shift to Claude-driven coding, where he characterizes the change as a “phase shift” in his personal workflow1. Similar patterns appear in Boris Cherny’s public discussion of how the Claude Code team operates, including extremely high PR throughput and near-total reliance on model-generated code2. These accounts describe bounded execution, not system ownership, and they count the falling cost of production while omitting the costs of integration, verification, and maintenance.
Owning a system over time means maintaining a coherent mental model as it evolves, preserving invariants across unrelated changes, keeping APIs legible, deleting code safely, and ensuring that future changes become cheaper rather than more fragile. These are epistemic tasks: they depend on understanding, and they are exactly where costs have not fallen.
Production Costs Fall, Verification Costs Do Not
LLMs have clearly reduced production cost. Time-to-first-implementation is down. The marginal cost of adding code is lower. More ideas cross the threshold of “worth trying”.
Verification cost, however, has not declined in parallel. Review, integration, debugging, security analysis, performance validation, and long-term maintenance remain expensive. In mature systems, they dominate total cost.
Lowering production cost without lowering verification cost does not produce linear gains. It allows more work to pile up behind the same constraint. This is the core argument of our previous analysis, Agentic AI Raises the Floor More Than the Ceiling6, which observes that apparent speedups often vanish once review and ownership are included. Metrics such as PR count, lines of code generated, or “100% AI-written code” primarily measure activity rather than outcomes.
Comprehension Debt
What accumulates instead is not just traditional technical debt, but comprehension debt. Comprehension debt forms when artifacts are produced faster than humans can internalize them. Behavior may be correct, but rationale is thin, implicit, or lost in prompts:

- Diffs are locally reasonable but globally surprising.
- The “why” exists only in the head of the prompter, or nowhere at all.
- Reviews slow down because reviewers must reconstruct intent.
- Debugging becomes harder because the state space is larger and less intentional.
- Ownership weakens because understanding does not compose across changes.

LLM-heavy workflows amplify this dynamic by decoupling code creation from human comprehension at the moment of creation.
Addition as the Default Gradient
Given uncertainty, an agent will add layers rather than collapse concepts, wrap existing code instead of removing it, preserve dead paths “just in case,” and introduce abstractions that look clean locally. This behavior is locally rational for a system with weak epistemology: deletion is risky when invariants are poorly understood, while addition is safer. Simplicity is not additive. It is achieved by removing structure while preserving behavior. That requires global judgment and confidence about invariants—neither of which has become cheaper. Absent sustained human effort, complexity grows.
Evidence From Learning and Comprehension
The most direct empirical support for this picture comes from a controlled study by Anthropic researchers examining developers learning a new async Python library (Trio) with and without AI assistance7. The results are difficult to reconcile with pure acceleration narratives:
- No significant average speedup from AI assistance
- Roughly 17% worse performance on conceptual understanding, debugging, and code reading tasks among AI-assisted participants
- A clear tradeoff: participants who delegated more to AI sometimes finished faster, but learned less
This matters because onboarding, framework adoption, and architectural transitions are exactly where organizations are most fragile. Accelerating output while degrading understanding shifts cost forward into ownership and recovery.
Anecdotes Persist
Public narratives extrapolate from best-case local wins, capability demos, and subjective feelings of speed. They underweight distributed costs borne by reviewers, operators, and future maintainers. Most importantly, they conflate personal productivity with organizational productivity. An individual engineer can feel dramatically faster while the system they work on becomes harder to understand and more expensive to own. Both can be true at the same time. The anecdotes do not settle the question because they are not measuring the binding constraint.
What Matters (And Always Has)
Most recommendations for engineering teams aren’t new. The fundamentals of building durable software and organizations haven’t changed, but the cost of violating them has gone down.
We can now automate work that is high-volume and low-risk, and keep humans in the loop where errors are costly or irreversible. Systems still need clear ownership, automated or not, and an accountable human when they fail.
Invest in verification, not just production. Strong, high-level test suites, invariant checks, and observability are what make speed safe. Faster generation otherwise simply increases fragility.
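One concrete, cheap form of this is a property-style invariant check: rather than asserting a handful of hand-picked examples, assert properties that must hold over many generated inputs. A minimal stdlib-only sketch (the function under test and its invariants are invented for illustration):

```python
import random

def dedupe_keep_order(items):
    """Remove duplicates while preserving first-seen order."""
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]


def check_invariants(trials=1000):
    """Assert properties of dedupe_keep_order over random inputs.

    Invariant checks like this stay valid no matter who (or what)
    rewrites the implementation, which is what makes speed safe.
    """
    for _ in range(trials):
        xs = [random.randint(0, 9) for _ in range(random.randint(0, 20))]
        out = dedupe_keep_order(xs)
        # Invariant 1: the output contains no duplicates.
        assert len(out) == len(set(out))
        # Invariant 2: the output preserves the input's first-seen order.
        assert out == sorted(set(xs), key=xs.index)
        # Invariant 3: no elements are invented or dropped.
        assert set(out) == set(xs)


check_invariants()
```

Dedicated tools (e.g. Hypothesis for Python) generalize this pattern with shrinking and smarter input generation, but even the hand-rolled version catches regressions that example-based tests miss.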
The hardest rule of them all has always been to favor simplicity. Complexity compounds faster than output, and systems that are easy to understand are cheaper to change, review, and operate over time. Be especially careful with incidental complexity introduced by AI tools.
Design clean interfaces. Small, composable APIs and well-defined boundaries reduce coordination costs and make both humans and machines more effective.
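What a small, composable boundary looks like in practice can be sketched with a structural interface (all names here are hypothetical): callers, reviewers, and agents only need to understand the two-method contract, not any implementation behind it.

```python
from typing import Protocol


class Storage(Protocol):
    """A narrow boundary: two methods, no implementation details leak."""

    def get(self, key: str) -> "bytes | None": ...
    def put(self, key: str, value: bytes) -> None: ...


class MemoryStorage:
    """One interchangeable implementation of the Storage contract."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


def cache_result(store, key, compute):
    """Works with any Storage; the small contract is the whole coupling."""
    cached = store.get(key)
    if cached is not None:
        return cached
    value = compute()
    store.put(key, value)
    return value
```

Swapping `MemoryStorage` for a disk- or network-backed implementation requires no change to `cache_result`, which is exactly the coordination-cost reduction the recommendation is about.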
Document what is durable. Processes, decision rights, escalation paths, and risk thresholds matter more than transient specs or implementation details. Ideally in plain text and Mermaid diagrams, so LLMs can leverage them too.
Treat prototypes as disposable and production as sacred. Trying things should be cheap; shipping them can be cheap only if they’re greenfield or have strong test suites.
None of this is new. But the speed at which consequences show up when these principles are ignored has increased at a staggering rate.
The Middle Ground
LLMs are powerful execution engines. They materially expand what is feasible in bounded contexts with strong success criteria. That leverage is real and durable. Execution, however, was never the dominant constraint in mature systems. Understanding was, and remains. Until tools are biased toward simplicity and deletion rather than expansion, and toward preserving epistemic clarity rather than merely producing locally correct behavior, the fundamental economics of software engineering will not shift in a sustained way. That is the middle ground: acknowledging genuine leverage without mistaking it for systemic relief.
1. Andrej Karpathy, “A few random notes from Claude coding quite a bit last few weeks,” X (Twitter), Jan 27, 2026. https://x.com/karpathy/status/2015883857489522876
2. Boris Cherny, reply to nullmirror discussion on Claude Code usage and productivity, X (Twitter), Jan 2026. https://x.com/bcherny/status/2015979257038831967
3. OpenClaw, https://github.com/openclaw/openclaw
4. Moltbook, https://www.moltbook.com/
5. By “weak epistemology” we mean that LLMs are weak at knowing what is actually true: they lack reliable grounding, self-verification, and truth awareness, and confidently produce false or ungrounded statements.
6. nullmirror, “Agentic AI Raises the Floor More Than the Ceiling,” Jan 19, 2026.
7. Judy Hanwen Shen and Alex Tamkin (Anthropic), “How AI Impacts Skill Formation,” Jan 28, 2026. https://arxiv.org/abs/2601.20245