Judgment Was Always the Bottleneck

Mar 08, 2026

This is Article 5 in a series on building with AI tools as a non-native technologist.

There’s a popular narrative being told right now about vibe coding, and it goes roughly like this: coding used to be the bottleneck. Now it isn’t. The new bottleneck is judgment and taste. If you can think clearly about what to build, the tools will handle the rest.

It’s a compelling narrative. But it only tells half the story.

Not because judgment doesn’t matter — it does, enormously, and I’ll spend most of this article on why. But because the framing implies that judgment is new to the equation. That before vibe coding, the separating factor between great products and mediocre ones was mainly technical execution.

My reframe is simpler: judgment was always the bottleneck. Vibe coding just made it visible.

What Actually Changed

Things did change, substantially. Claude Code and similar tools collapsed the barrier to entry for certain categories of development work in a way that prior no-code tools never quite managed. The prototype-to-production gap is narrower than it was. The floor of what a non-engineer can ship has risen considerably.

But here’s the disconnect: we’re mistaking a change in what we can see for a change in what was there all along.

Before widespread vibe coding, poor product judgment was largely contained inside institutions. It existed — and produced plenty of mediocre features and failed products — but it was wrapped inside teams at companies, hidden behind proprietary systems, filtered through organizational process. You didn’t see it at scale in the open.

Now you do. Thousands of people are shipping publicly, often for free, often in the open. A much higher volume of things to look at means a much higher volume of poorly crafted things to look at. The ratio of strong products to weak ones was probably always similar. What changed is that we now have a wide-angle lens on all of it, where before we had a narrow view of mostly what large companies chose to release.

There’s a marketing layer on top of this too. “I’m a builder” has become the new “I’m a disruptor,” a buzzword following the same arc that “disruption” did a decade ago. When everyone’s shipping, people pay closer attention to what gets shipped. Scrutiny rises alongside volume. Both effects make judgment failures more visible without changing the underlying rate.

The story isn’t “judgment matters now.” It’s “judgment failures are harder to hide now.”

Judgment Isn’t New — It Was Always Load-Bearing

To understand why, it helps to break down what building something actually requires. At the highest level, three things: knowing what to build, knowing how to build it, and getting it to market.

The first category — what to build — means understanding the problem you’re solving, for whom, and conceptually how. When I built the nature access map, I was my own first user. I knew the problem in depth because I’d lived it: I wanted to understand, by neighborhood, which areas of Seattle had parks and trails within walking distance before committing to a move. That gave me a clear problem, a clear user, and a clear mental model of what a useful solution would look like.

The second category — how to build — means going from that concept to something tangible: mock-ups, prototypes, and eventually working code. This is where the LLM tools have changed things most dramatically. Historically, you couldn’t get to a working prototype at all without substantial technical fluency. Now, for many categories of work, you can describe what you want in plain English and get something functional back.

The third category — distribution and commercialization — means actually getting the product in front of users and, if you’re building something commercial, making money from it.

What changed is the second category. What didn’t change is the first and the third. And the first category — judgment about what to build, for whom, and why — was always the harder problem. It’s just that when the second category was blocking most people entirely, the first category’s difficulty was invisible. You can’t expose a judgment problem in something that never gets built.

I’ve spent a decade in legaltech, turning contracts into structured data for enterprise customers. The engineering was real — machine learning models, complex document processing pipelines, large-scale data extraction and classification, and increasingly GenAI-powered tools. But the decisions that actually determined whether the product worked were almost never purely engineering decisions. Was the model accuracy on this clause type sufficient for the risk appetite of an in-house counsel or M&A attorney? Which edge cases would appear infrequently enough that we could accept them, versus which would surface in exactly the situations that mattered most? How did you explain a 92% accuracy rate to a lawyer in a way that mapped to how they actually processed risk?

Those were judgment calls. They required domain knowledge, pattern recognition, and the ability to translate between technical realities and human contexts. No amount of better tooling would have made them easier. The people who made them well were valuable long before vibe coding existed.
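To make the translation problem concrete, here is a small illustrative calculation in Python. The 92% figure comes from the paragraph above; the review size (500 contracts with 20 extracted clauses each) is a hypothetical assumption, not data from any real product.

```python
# Illustrative only: contract volume and clauses per contract are hypothetical.

def expected_misses(accuracy: float, items: int) -> float:
    """Expected number of incorrect items at a given average per-item accuracy."""
    return (1.0 - accuracy) * items

contracts = 500              # hypothetical review size
clauses_per_contract = 20    # hypothetical
total_clauses = contracts * clauses_per_contract  # 10,000 clauses

misses = expected_misses(0.92, total_clauses)
print(f"~{misses:.0f} of {total_clauses:,} clauses expected to be wrong")
```

Under these assumptions, a headline of "92% accurate" becomes roughly 800 specific wrong answers. The judgment question a lawyer actually cares about is which 800 those are, and whether they cluster in the clauses that carry the most risk.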

The Complexity Matrix No One Is Talking About

Part of why the popular narrative feels limiting is that it collapses a genuinely complex set of distinctions into a single claim.

Whether vibe coding meaningfully changes the engineering bottleneck depends entirely on where you are across multiple dimensions at once:

Internal tool vs. external product. An internal tool built for yourself has radically different tolerances: for rough edges, for edge cases, for performance under load. When I built the nature access map, it ran locally, served one user, and if it broke I fixed it. That’s a completely different problem from a SaaS product with uptime requirements, paying customers, and users who aren’t you.

Prototype vs. production. The gap between something that works in a demo and something that works under real conditions — concurrent users, unexpected inputs, evolving requirements — remains enormous, and LLMs haven’t closed it. Prior no-code tools produced a high ratio of prototypes to production-ready products for real reasons. There’s an honest open question about whether this generation finally crosses that threshold. I think the answer is “maybe, in limited contexts, but we don’t fully know yet.”

Simple vs. complex use case. Some design patterns have been solved thoroughly enough that you can implement them at a higher level of abstraction and they’ll work reliably. Others haven’t. New interfaces (AR, voice, novel interaction paradigms) haven’t accumulated the solved patterns that web and mobile have built up over decades. What worked on desktop didn’t always translate to mobile. What works now won’t automatically translate to whatever comes next. Every new interface introduces substantial new variables and trade-offs.

Regulated vs. unregulated. Financial services, legal, healthcare: these domains introduce constraints that compound complexity in ways that aren’t just about technical difficulty. The cost of certain failure modes is qualitatively different. The judgment required to navigate those trade-offs is domain-specific in ways that general technical fluency can’t substitute for.

Scale. A few users is categorically different from a few thousand or a few million. Engineering judgments that produce acceptable outcomes at small scale often produce catastrophic ones at large scale. Latency barely noticeable to one user becomes a UX crisis under load. Data pipelines that work fine on small datasets break in ways that are hard to predict without deep experience.

The claim that “coding is no longer the bottleneck” may be true in the simplest corner of this matrix — internal tools, prototype-grade, unregulated, small scale. It becomes progressively less true as you move in any direction.

Engineering Judgment Is Still Judgment

There’s a subtler problem buried in how “judgment” is being used right now: it’s being treated as a single thing when it’s actually several.

Product judgment (understanding user problems, prioritizing correctly, knowing when good enough is good enough) is one kind. Design judgment (taste in visual hierarchy, interaction patterns, information density) is another. Engineering judgment is a third, and it doesn’t disappear because LLMs can write code.

Engineering judgment is knowing that a given API’s rate limits will create a bad user experience under realistic load, before you find out by watching it fail. It’s recognizing that a context window limitation will cap what’s possible for your use case, and structuring your approach accordingly. It’s knowing when the cost of using an LLM makes it the wrong tool for a particular subtask, even if it would technically work.
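As a rough illustration of the rate-limit point, here is a back-of-the-envelope queue model in Python. Every number is a hypothetical assumption: a quota of 600 requests per minute, three API calls per page view, and a quota enforced as a steady rate with no burst allowance. Real providers often permit bursts, so treat this as a simplification rather than a model of any specific API.

```python
# Hypothetical numbers throughout; real limits vary by provider and plan.
RATE_LIMIT_PER_MIN = 600   # assumed quota: 10 requests per second
CALLS_PER_PAGE_VIEW = 3    # assumed API calls triggered by one page view

def last_request_wait_s(concurrent_users: int) -> float:
    """Seconds the last queued request waits when every user loads a page at the
    same moment and the quota is enforced as a steady rate with no burst allowance."""
    requests = concurrent_users * CALLS_PER_PAGE_VIEW
    per_second = RATE_LIMIT_PER_MIN / 60
    # The first request goes out immediately; each later one waits its turn.
    return max(requests - 1, 0) / per_second

for users in (1, 10, 100):
    print(f"{users:>3} users -> last request waits {last_request_wait_s(users):.1f} s")
```

Under these assumed numbers, a single user waits about 0.2 seconds for the last call, while 100 simultaneous users push the last call to roughly 30 seconds. Nothing in the code changed; only the load did.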

When I was building the nature access map and hit a frontend rendering bug that sent Claude into circles, with each fix creating a new problem, what broke the loop wasn’t better prompting. It was recognizing that we needed browser developer tools to surface actual error information, rather than continuing to diagnose from assumptions. That was a judgment call, made by a human, that the LLM hadn’t arrived at on its own. And this was a simple, single-user, locally run application. Engineering complexity scales quickly from there.

You can’t decouple engineering decisions from user experience decisions, or from business decisions. The people who understood this — who made proactive choices about downstream needs before being asked, who proposed building test infrastructure before it became obviously necessary, who recognized when a technically feasible solution would create maintenance debt that wasn’t worth it — were always valuable because of those judgment behaviors. They just happened to express them through engineering work.

Vibe coding doesn’t make that less relevant. What it does is reduce the coordination cost of splitting judgment and execution across different people. That’s genuinely useful. But it doesn’t supply the judgment itself.

The Economics of It

When the cost of development decreases, the value of complementary capabilities increases. This is basic economics — cheaper peanut butter raises demand for jelly — and it explains why judgment, taste, and domain expertise are getting more attention right now.

But the implied corollary — that the previous bottleneck disappears — doesn’t follow. It becomes less constraining for simpler cases. It remains fully present for complex ones. And new bottlenecks emerge in places previously masked by the old one.

When I built the nature access map, part of what made it worth building was that the search cost of finding a tool that answered my specific question exceeded the development cost of building it myself. That math used to reliably go the other direction. Now it sometimes flips. That’s a real shift — but one that made domain knowledge more load-bearing, not less. The development cost dropped. The judgment requirements didn’t.

There’s a second economic dynamic worth naming. When anyone can build, the supply of apps proliferates rapidly. At some point supply exceeds demand. And in a buyer’s market, buyers have two levers: pay less for the same thing, or demand higher quality. These are in tension — as quality requirements increase, fewer people can meet them, which changes pricing again. But the net effect is that “good enough” stops being sufficient in the same way it was when supply was scarce. When you were one of a few people who could ship anything at all, the bar was low by necessity. When thousands of people are shipping similar things, the baseline shifts. The judgment required to clear it doesn’t go away. It goes up.

The Open Question

We’ve had “low-code” and “no-code” tools for years. Their history is consistent: useful for prototypes, insufficient for production, high ratio of demos to shipped products. They lowered the floor without raising the ceiling. People built more things, most of which didn’t make it.

The open question is whether this generation of tools — Claude Code and its successors — finally crosses that threshold. Whether the production gap will actually be closed for a meaningful range of use cases in a way it wasn’t before.

My honest answer is: maybe. Given enough time, probably, for some categories of use cases and some levels of complexity. But we don’t fully know yet, and the people making the strongest claims — in either direction — are probably overstating their case.

What I do know is that the judgment question isn’t new. It was always the hard part. The fact that more people can now see it, because more people can now build things and see the results, doesn’t mean the new tools created it.

It was always there, hidden in the gap between shipping and shipping something worth using.

This is Article 5 in a series on building with AI tools as a non-native technologist — covering my real experiences and observations building products as someone without a formal engineering background. Previous articles covered financial planning with Claude Code, a framework for city selection, using LLM tools to help with a cross-country move, and building the Seattle Nature Access Map.
