The Instruction Gap: Following AI Output Is Now the Skill That Matters Most

David Borish

How I Found the Gap

I came to this paper the way I come to most useful questions: by running into a problem I could not explain and refusing to drop it.

Earlier this year I finally started working through a backlog of projects I had been putting off for months: a few old side builds that had stalled, partly from time constraints and partly because the implementation work was tedious enough that I kept finding reasons to defer it. Claude Code changed that calculation fast. Within days I was making more progress than I had in months of earlier attempts, and the experience was close enough to Andrej Karpathy's vibe coding description from early 2025 that the term felt apt.

But something else happened alongside the progress. A pattern emerged in my workflow that I could not initially diagnose. Claude Code was producing solid output. The code was correct. The instructions were clear. The builds were failing anyway. I was introducing errors in the setup sequence that had nothing to do with the quality of what the model generated.

I was skipping steps, partially executing configuration tasks, misreading which field a value belonged in, and then spending time debugging something that was never broken to begin with. Once I started watching for it, I saw the same pattern in how developers talk about their tooling, in how enterprises describe failed AI implementations, in the forums where people troubleshoot builds that should have worked. The field had spent two years optimizing how people talk to AI systems and almost nobody was examining what happens when those systems talk back. That observation became this paper.

The Prompting Era and What It Missed

The conversation about AI skills in 2023 was almost entirely about prompting, and that made sense at the time. The gap between a well-constructed prompt and a vague one was large enough to observe directly. Better prompts produced better output, the connection was immediate and visible, and entire consulting practices, online courses, and job titles were built around the idea that prompt engineering was the defining skill of the AI era.

By 2026, that framing is incomplete. The models have improved substantially, and the floor for adequate prompting has risen with them. A reasonably clear natural language request now produces usable output most of the time. More practically: ask an AI to improve your prompt and it will produce something more effective than most people write on their own. The advantage that once required real investment to acquire has compressed considerably. You can close a large portion of the prompting gap just by asking the model to help you.

What has not compressed is the execution gap on the other side of the exchange. Claude Code, Cursor, and their peers can now generate multi-file applications, configure dependencies, and scaffold complete architectures from plain descriptions. When those agents return a sequence of steps, however, a human still has to complete them. Create an account on this platform. Generate an API key with these specific permissions. Set this environment variable in this file. Run this command in this directory, not that one. Paste this value here. Confirm the connection before proceeding to the next step. Each instruction is discrete, each depends on the one before it, and each requires complete human execution before the next one begins.

This is where builds break. The AI produced correct instructions. The human introduced an error in the execution. The build fails. The tool gets blamed. The actual cause goes unexamined.
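One way to surface this kind of slip before a build runs is a small preflight check. The sketch below is only an illustration: the function name and the variable names API_KEY and DATABASE_URL are hypothetical and do not come from any particular tool. It reports which required settings are missing or blank, so a skipped configuration step shows up as a named gap rather than as a mysterious failure later on.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_settings(
    required: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the required setting names that are absent or blank."""
    # Fall back to the real process environment when no mapping is supplied.
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Hypothetical example: the second value was pasted as whitespace only.
example_env = {"API_KEY": "abc123", "DATABASE_URL": "   "}
gaps = missing_settings(["API_KEY", "DATABASE_URL"], example_env)
```

Here gaps holds the one name that still needs attention, which points at the unfinished step instead of at the tool.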

What the Research Has Shown for Decades

The psychology of instruction-following has a substantial research base that the AI field has not engaged with, mostly because sequential AI output was not a significant mode of human-AI collaboration until recently.

A 2020 review in the American Journal of Pharmaceutical Education identified working memory capacity as the central constraint on instruction-following performance. Multi-step sequences place measurable cognitive load on working memory. When that load exceeds capacity, people start sequences and do not finish them. They execute the first several steps, lose the thread somewhere in the middle, and move on under the impression that they are done.

The failure is cognitive and predictable, not motivational. Susan Gathercole's research at Cambridge documented the specific breakdown pattern with particular clarity: instruction sequences fall apart in the middle. Early steps get executed because they feel essential. Later steps get executed because they feel conclusive. Middle steps are where the cognitive load is highest, where sequences are most likely to be truncated, and where errors are most consequential because everything subsequent depends on them.

A 2016 study by Jaroslawska, Gathercole, Allen, and Holmes identified a straightforward intervention with a strong effect size. Executing each step at the moment of instruction, rather than reading the full sequence first and then attempting to complete it in batch, significantly improved both retention and completion rates. The mechanism is enactment: doing step one encodes what comes next. When someone reads an eight-step setup guide and then attempts to execute all eight steps from memory, the full sequence load sits in working memory at once. When they read step one, execute it, and confirm it before reading step two, the load distributes across execution. Completion rates improve substantially.
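The read-one, do-one, confirm-one pattern described above can be sketched in a few lines. This is a minimal illustration, not the method used in the study: Step, run_sequence, and the sample steps are all hypothetical names, and the "human" is simulated by a function that skips the middle step. The point is only that the next instruction is never presented until the current one has been verified.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    instruction: str
    check: Callable[[], bool]  # returns True once the step is verifiably done


def run_sequence(steps: List[Step], perform: Callable[[Step], None]) -> int:
    """Present steps one at a time and stop at the first unconfirmed step.

    Returns the number of steps that were completed and confirmed.
    """
    completed = 0
    for step in steps:
        perform(step)          # the person (or a stand-in) executes this one step
        if not step.check():   # confirm before the next step is even shown
            break
        completed += 1
    return completed


# Hypothetical three-step setup, with shared state standing in for the real system.
state: Dict[str, bool] = {}
steps = [
    Step("Create config file", lambda: "config" in state),
    Step("Set API key", lambda: "api_key" in state),
    Step("Run connection test", lambda: state.get("connected", False)),
]


def careless(step: Step) -> None:
    # Simulates someone who does the first and last steps but skips the middle one.
    if step.instruction == "Create config file":
        state["config"] = True
    elif step.instruction == "Run connection test":
        state["connected"] = True


done = run_sequence(steps, careless)
```

With the simulated skip, the run halts at the second step and the third is never attempted, so the error is caught where it happened rather than several steps downstream.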

Stanley Milgram's obedience research adds a dimension that is easy to overlook in this context. Milgram showed across nineteen variations of his core experiment that instruction-following is situational. People follow at higher rates when an authority structure is present and enforcing completion. When that structure is removed, when the authority figure leaves the room or communicates by telephone rather than in person, compliance drops measurably and predictably. An AI agent returns a numbered list of steps and waits. There is no authority in the room. There is no checkpoint structure built into the process. The data predicts exactly what practitioners observe: a significant portion of people do not complete the sequence as specified.

The Competitive Window That Exists Right Now

Browser control agents are coming, and they will arrive faster than most workforce planning has accounted for. When they reach sufficient reliability and cost efficiency, a large portion of routine digital work that humans currently perform will be completable by AI systems directly. The case for human involvement in executing digital instruction sequences will narrow considerably.

We are not there yet, and the window between now and then is a real opportunity for people who recognize what is actually valuable in this moment.

The persistent assumption is that AI will leave behind people who cannot write code or construct sophisticated prompts. The research suggests a different picture. Experienced developers have domain knowledge that reduces the cognitive load of unfamiliar steps, which helps with execution. But they have also built years of habits around improvisation and shortcutting, and those habits consistently produce the partial execution that breaks AI-assisted builds. Non-technical users, by contrast, often follow sequences more completely because they have no basis for deciding which steps to skip. They execute as specified because they do not know enough to deviate, and that turns out to be an advantage in this specific context.

Instruction-following discipline is learnable, trainable, and not dependent on a technical background. For individuals watching the AI transition and trying to identify where to put their development energy, it is a more accessible and more immediately valuable skill than prompt engineering has been. It also transfers forward. As AI systems move into physical-world coordination tasks, the same discipline that produces clean software builds will produce clean execution of whatever comes back in those environments. The skill does not become obsolete when browser agents take over digital work. It shifts into the physical domain.

The Enterprise Failure Nobody Is Diagnosing

Boston Consulting Group research has estimated that fewer than 20 percent of enterprise AI projects achieve their expected return on investment. The explanations most commonly offered focus on model limitations, data quality, change management failures, and strategic misalignment. Instruction-following capacity as a failure mode does not appear in these analyses, not because it is absent, but because it has not been named or measured.

Enterprise instruction-following failures have mechanics that differ from individual failures in important ways. In a team implementation, instruction sequences are frequently distributed across multiple people. Person A completes steps one through four and hands off to Person B, who picks up at step five without knowing that step three was incompletely executed. The build fails six steps removed from the actual error. The trace is difficult, the diagnosis defaults to tool failure, and the organization repeats the same failure on the next implementation.

The organizations that have succeeded in AI implementation share a consistent characteristic: they treat it as a procedural discipline. They build verification checkpoints into integration sequences. They require confirmed step completion before advancement. They assign clear ownership over sequence execution to a single person. They debrief failed implementations by examining step-level execution rather than assuming tool failure. These are direct applications of what checklist research in aviation and surgery has demonstrated for decades. The cognitive mechanisms are the same. The intervention that works is the same.
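The handoff problem and the checkpoint remedy can be illustrated with a small ledger. This is a sketch under stated assumptions: StepLedger and the step names are hypothetical, and a real team would likely keep this record in whatever tracking system it already uses. The ledger records who confirmed each step and refuses to confirm a later step while an earlier one is still open.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StepLedger:
    steps: List[str]
    confirmed_by: Dict[int, str] = field(default_factory=dict)

    def first_unconfirmed(self) -> Optional[int]:
        """Return the index of the earliest step nobody has confirmed, if any."""
        for i in range(len(self.steps)):
            if i not in self.confirmed_by:
                return i
        return None

    def confirm(self, index: int, person: str) -> None:
        # Refuse to confirm a step while an earlier one is still open.
        gap = self.first_unconfirmed()
        if gap is not None and gap < index:
            raise ValueError(f"step {gap + 1} ({self.steps[gap]!r}) is not confirmed")
        self.confirmed_by[index] = person


# Hypothetical five-step integration split across two people.
ledger = StepLedger(
    ["create account", "generate API key", "set env var", "run migration", "deploy"]
)
ledger.confirm(0, "Person A")
ledger.confirm(1, "Person A")
# Person A hands off here, believing step three was finished.
try:
    ledger.confirm(4, "Person B")
    handoff_error = None
except ValueError as exc:
    handoff_error = str(exc)
```

Person B's attempt to pick up at step five is rejected with a message naming step three, which is the step-level trace that a debrief would otherwise have to reconstruct after the fact.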

The App, the Workshop, and What Comes Next

Publishing the paper is one part of what I am releasing today. The other is a practice-and-measurement web app built directly on the research. The app lets you practice following sequential instruction sets with the enactment model built in: one step at a time, with confirmation before advancement, and tracking of where in sequences completion discipline breaks down. It is grounded in the same working memory and checkpoint research the paper covers, and is designed for individuals who want to develop this skill deliberately and for organizations that want to assess and track instruction-following capacity across teams. A live workshop is also in development for teams that want to work through the enterprise applications directly.

The app is meant to be the actionable part of this release: a way to get better at following instructions, built on the research described above.
