Most people think slop is a prompt problem.
They read articles about magic words, XML tags, chain-of-thought. They craft elaborate instructions. They still get generic output.
The problem isn’t prompting. It’s upstream.
The Underspecification Trap
Research on underspecification in LLM prompts (Yang et al.) identifies the “underspecification trap”—the point where vague requirements force the model to guess.
When you say “write me a blog post,” the AI doesn’t know:
- What voice? (Casual? Technical? Provocative?)
- What length? (Tweet-length? 2000 words? Book chapter?)
- What audience? (Beginners? Experts? General public?)
- What purpose? (Educate? Persuade? Entertain?)
- What structure? (Narrative? List? Problem-solution?)
So it guesses. And its guess is the probabilistic average of its training data—the most likely response given the inputs. That average is, by definition, generic.
2x Regression Rate
Here’s the number that should worry you: underspecified prompts lead to 2x higher regression rates when models update.
Why? The gaps in your specification get filled by training data defaults. When the model changes, those defaults change. Same prompt, different results.
Clear specifications create a contract. The AI might implement it differently across versions, but the boundaries are fixed. Vague specifications are just vibes—and vibes drift.
The Three Gulfs
Hamel Husain’s framework names the problem precisely:
Gulf of Comprehension: You don’t understand your own data or domain well enough to specify what you want.
Gulf of Specification: You understand it, but you can’t translate that understanding into precise requirements the AI can follow.
Gulf of Generalization: The AI behaves correctly in your tests but differently in production—because your tests didn’t cover the actual distribution.
Most slop comes from the first two gulfs. You either don’t know what you want, or you can’t articulate it.
How I Work Around This
I dictate. A lot. Messy, stream-of-consciousness, full of tangents.
Why? Because my goal isn’t clean input. It’s rich context.
When I dictate, I’m not trying to be concise. I’m trying to externalize everything that might be relevant:
- Why I want this (not just what)
- What I’ve already tried
- What didn’t work and why
- Edge cases I’m worried about
- The feel I’m going for
That’s not prompt engineering. That’s intent externalization.
The Prompt Engineering Myth
Prompt engineering had its moment. 2023, early 2024—people discovered you could nudge models with specific phrases. XML tags. Few-shot examples. “Think step by step.”
It still helps. But the leverage is dropping.
Modern models do this internally. They already decompose problems, consider alternatives, structure their reasoning—though Anthropic’s research on reasoning faithfulness shows that displayed chain-of-thought doesn’t always reflect actual reasoning processes. Adding explicit prompts for what they do automatically is often neutral—sometimes negative.
Anthropic’s context engineering guidelines recommend using normal, clear language over aggressive prompting (“CRITICAL: YOU MUST”). Overly forceful instructions can cause models to produce weird compromises when trying to satisfy conflicting constraints. Clear intent beats loud demands.
Intent vs. Technique
Here’s how I think about it:
Technique is knowing XML tags structure context well for Claude.
Intent is knowing that you want a technical deep-dive for senior engineers, with contrarian takes, in a direct voice, about 1500 words, structured as problem → conventional wisdom → what actually works → practical advice.
Technique without intent produces correctly-formatted slop. Intent without technique produces rough output you can polish. I’ll take rough-but-targeted over polished-but-generic every time.
The Specificity Ladder
Not all specifications need to be surgical. It depends on how the AI will run.
| Mode | Specificity Needed | Example |
|---|---|---|
| Interactive, exploratory | Low | “Help me think through this problem” |
| Interactive, targeted | Medium | “Write a blog post about X in my usual style” |
| Autonomous, single-run | High | “Generate exactly this output format” |
| Autonomous, scaled | Surgical | “Process 100 items consistently, no drift” |
For exploratory work, underspecification is fine. You iterate.
For autonomous work, the Carbonara Rule applies: unspecified = probabilistic average. And at scale, averages become variance.
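A sketch of what “surgical” means at scale: every output gets checked against the fixed contract, so drift surfaces as a flagged item instead of passing silently. Here `process_item` is a hypothetical stand-in for a model call running a pinned spec:

```python
import re

def process_item(item: str) -> str:
    # Hypothetical stand-in for a model call with a pinned specification.
    return f"SUMMARY: {item.strip()}"

def run_batch(items: list[str]) -> tuple[list[str], list[int]]:
    """Run every item through the same spec; record contract violations."""
    contract = re.compile(r"^SUMMARY: .+")  # the fixed boundary, not vibes
    outputs, violations = [], []
    for i, item in enumerate(items):
        out = process_item(item)
        if not contract.match(out):
            violations.append(i)  # drift: this output left the contract
        outputs.append(out)
    return outputs, violations

outputs, violations = run_batch(["alpha", "beta"])
```

The regex is deliberately crude; the design choice that matters is checking each of the 100 items against the same explicit boundary, so variance shows up as data rather than as a surprise.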
How to Close the Gap
1. Start With Why
Before you prompt, answer: Why do I need this? What would success look like? What would make me disappointed?
If you can’t answer, you don’t have intent yet. Brainstorm with the AI first, then come back with a specification.
2. Show, Don’t Tell
Examples beat descriptions. If you want a particular tone:
- Bad: “Write in a conversational, engaging style”
- Better: “Write like this: [example paragraph]”
Few-shot learning works because it removes ambiguity. The model sees exactly what you mean.
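Mechanically, a few-shot prompt is just the examples laid out in a fixed pattern. A minimal sketch—the function name and formatting are my own, not a library API:

```python
def few_shot_prompt(instruction: str,
                    examples: list[tuple[str, str]],
                    query: str) -> str:
    """Show the model exactly what a right answer looks like."""
    parts = [instruction]
    for inp, out in examples:
        parts.append(f"Input: {inp}\nOutput: {out}")
    # End on an open "Output:" so the model completes the pattern.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

prompt = few_shot_prompt(
    "Rewrite the sentence in a direct, conversational voice.",
    [("The utilization of AI may be considered beneficial.",
      "AI helps. Use it."),
     ("It is recommended that users iterate on specifications.",
      "Iterate on your spec.")],
    "One should endeavor to externalize intent.",
)
```

Two examples that genuinely show the target tone remove more ambiguity than a paragraph describing it.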
3. Constrain the Negative Space
Sometimes it’s easier to say what you don’t want:
- “Not corporate buzzword soup”
- “Not an intro that restates the question”
- “Not a list of 10 generic tips”
Negative constraints bound the solution space without over-prescribing the positive.
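Negative constraints also make cheap automated checks, because “what you don’t want” is often literal enough to grep for. A sketch with illustrative banned phrases:

```python
BANNED = [
    "in today's fast-paced world",  # corporate buzzword soup
    "in this article, we will",     # intro that restates the question
    "10 tips",                      # generic listicle framing
]

def violates_negative_space(text: str) -> list[str]:
    """Return every banned phrase that appears in the draft."""
    lower = text.lower()
    return [phrase for phrase in BANNED if phrase in lower]

draft = "In today's fast-paced world, AI is everywhere."
hits = violates_negative_space(draft)
```

A hit doesn’t prove the draft is slop, but it’s a fast signal that the output drifted back toward the probabilistic average.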
4. Iterate Your Intent, Not Your Prompt
When output disappoints, ask: “What did I fail to specify?” not “How do I phrase this better?”
Usually the fix is adding information, not changing words.
The Uncomfortable Truth
If your AI output is generic, that’s diagnostic data.
It might mean:
- You don’t know what you want (Gulf of Comprehension)
- You know but haven’t articulated it (Gulf of Specification)
- Your specification has gaps (underspecification)
- You’re running autonomous without enough precision (Carbonara Rule)
The fix isn’t better prompts. It’s better thinking about what you actually need.
Sources
- Yang et al. — What Prompts Don’t Say — Underspecification in LLM prompts, 2x regression rates
- Shreya Shankar & Hamel Husain — AI Evals for Engineers — The Three Gulfs framework
- Anthropic — Reasoning Models Don’t Always Say What They Think — Displayed CoT may not reflect actual reasoning
- Anthropic — Effective Context Engineering for AI Agents — Normal language over aggressive prompting