Most people think slop is a prompt problem.
They read articles about magic words, XML tags, chain-of-thought. They craft elaborate instructions. They still get generic output.
The problem isn’t prompting. It’s upstream.
The Underspecification Trap
Research on underspecification in LLM prompts (Yang et al.) identifies the “underspecification trap”—the point where vague requirements force the model to guess.
When you say “write me a blog post,” the AI doesn’t know:
- What voice? (Casual? Technical? Provocative?)
- What length? (Tweet-length? 2000 words? Book chapter?)
- What audience? (Beginners? Experts? General public?)
- What purpose? (Educate? Persuade? Entertain?)
- What structure? (Narrative? List? Problem-solution?)
So it guesses. And its guess is the probabilistic average of its training data—the most likely response given the inputs. That average is, by definition, generic.
2x Regression Rate
Here’s the number that should worry you: underspecified prompts lead to 2x higher regression rates when models update.
Why? The gaps in your specification get filled by training data defaults. When the model changes, those defaults change. Same prompt, different results.
Clear specifications create a contract. The AI might implement it differently across versions, but the boundaries are fixed. Vague specifications are just vibes—and vibes drift.
The Three Gulfs
Hamel Husain’s framework names the problem precisely:
Gulf of Comprehension: You don’t understand your own data or domain well enough to specify what you want.
Gulf of Specification: You understand it, but you can’t translate that understanding into precise requirements the AI can follow.
Gulf of Generalization: The AI behaves correctly in your tests but differently in production—because your tests didn’t cover the actual distribution.
Most slop comes from the first two gulfs. You either don’t know what you want, or you can’t articulate it.
How I Work Around This
I dictate. A lot. Messy, stream-of-consciousness, full of tangents.
Why? Because my goal isn’t clean input. It’s rich context.
When I dictate, I’m not trying to be concise. I’m trying to externalize everything that might be relevant:
- Why I want this (not just what)
- What I’ve already tried
- What didn’t work and why
- Edge cases I’m worried about
- The feel I’m going for
That’s not prompt engineering. That’s intent externalization.
The Prompt Engineering Myth
Prompt engineering had its moment. 2023, early 2024—people discovered you could nudge models with specific phrases. XML tags. Few-shot examples. “Think step by step.”
It still helps. But the leverage is dropping.
Modern models do this internally. They already decompose problems, consider alternatives, structure their reasoning—though Anthropic’s research on reasoning faithfulness shows that displayed chain-of-thought doesn’t always reflect actual reasoning processes. Adding explicit prompts for what they do automatically is often neutral—sometimes negative.
Anthropic’s context engineering guidelines recommend using normal, clear language over aggressive prompting (“CRITICAL: YOU MUST”). Overly forceful instructions can cause models to produce weird compromises when trying to satisfy conflicting constraints. Clear intent beats loud demands.
Intent vs. Technique
Here’s how I think about it:
Technique is knowing XML tags structure context well for Claude.
Intent is knowing that you want a technical deep-dive for senior engineers, with contrarian takes, in a direct voice, about 1500 words, structured as problem → conventional wisdom → what actually works → practical advice.
Technique without intent produces correctly-formatted slop. Intent without technique produces rough output you can polish. I’ll take rough-but-targeted over polished-but-generic every time.
The Specificity Ladder
Not all specifications need to be surgical. It depends on how the AI will run.
| Mode | Specificity Needed | Example |
|---|---|---|
| Interactive, exploratory | Low | “Help me think through this problem” |
| Interactive, targeted | Medium | “Write a blog post about X in my usual style” |
| Autonomous, single-run | High | “Generate exactly this output format” |
| Autonomous, scaled | Surgical | “Process 100 items consistently, no drift” |
For exploratory work, underspecification is fine. You iterate.
For autonomous work, the Carbonara Rule applies: unspecified = probabilistic average. And at scale, averages become variance.
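A sketch of what “surgical” means at scale: every output gets checked against the fixed contract, so drift surfaces as a flagged item instead of passing silently. Here `process_item` is a hypothetical stand-in for a model call running a pinned spec:

```python
import re

def process_item(item: str) -> str:
    # Hypothetical stand-in for a model call with a pinned specification.
    return f"SUMMARY: {item.strip()}"

def run_batch(items: list[str]) -> tuple[list[str], list[int]]:
    """Run every item through the same spec; record contract violations."""
    contract = re.compile(r"^SUMMARY: .+")  # the fixed boundary, not vibes
    outputs, violations = [], []
    for i, item in enumerate(items):
        out = process_item(item)
        if not contract.match(out):
            violations.append(i)  # drift: this output left the contract
        outputs.append(out)
    return outputs, violations

outputs, violations = run_batch(["alpha", "beta"])
```

The regex is deliberately crude; the design choice that matters is checking each of the 100 items against the same explicit boundary, so variance shows up as data rather than as a surprise.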
How to Close the Gap
1. Start With Why
Before you prompt, answer: Why do I need this? What would success look like? What would make me disappointed?
If you can’t answer, you don’t have intent yet. Brainstorm with the AI first, then come back with a specification.
2. Show, Don’t Tell
Examples beat descriptions. If you want a particular tone:
- Bad: “Write in a conversational, engaging style”
- Better: “Write like this: [example paragraph]”
Few-shot learning works because it removes ambiguity. The model sees exactly what you mean.
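Mechanically, a few-shot prompt is just the examples laid out in a fixed pattern. A minimal sketch—the function name and formatting are my own, not a library API:

```python
def few_shot_prompt(instruction: str,
                    examples: list[tuple[str, str]],
                    query: str) -> str:
    """Show the model exactly what a right answer looks like."""
    parts = [instruction]
    for inp, out in examples:
        parts.append(f"Input: {inp}\nOutput: {out}")
    # End on an open "Output:" so the model completes the pattern.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

prompt = few_shot_prompt(
    "Rewrite the sentence in a direct, conversational voice.",
    [("The utilization of AI may be considered beneficial.",
      "AI helps. Use it."),
     ("It is recommended that users iterate on specifications.",
      "Iterate on your spec.")],
    "One should endeavor to externalize intent.",
)
```

Two examples that genuinely show the target tone remove more ambiguity than a paragraph describing it.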
3. Constrain the Negative Space
Sometimes it’s easier to say what you don’t want:
- “Not corporate buzzword soup”
- “Not an intro that restates the question”
- “Not a list of 10 generic tips”
Negative constraints bound the solution space without over-prescribing the positive.
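Negative constraints also make cheap automated checks, because “what you don’t want” is often literal enough to grep for. A sketch with illustrative banned phrases:

```python
BANNED = [
    "in today's fast-paced world",  # corporate buzzword soup
    "in this article, we will",     # intro that restates the question
    "10 tips",                      # generic listicle framing
]

def violates_negative_space(text: str) -> list[str]:
    """Return every banned phrase that appears in the draft."""
    lower = text.lower()
    return [phrase for phrase in BANNED if phrase in lower]

draft = "In today's fast-paced world, AI is everywhere."
hits = violates_negative_space(draft)
```

A hit doesn’t prove the draft is slop, but it’s a fast signal that the output drifted back toward the probabilistic average.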
4. Iterate Your Intent, Not Your Prompt
When output disappoints, ask: “What did I fail to specify?” not “How do I phrase this better?”
Usually the fix is adding information, not changing words.
The Uncomfortable Truth
If your AI output is generic, that’s diagnostic data.
It might mean:
- You don’t know what you want (Gulf of Comprehension)
- You know but haven’t articulated it (Gulf of Specification)
- Your specification has gaps (underspecification)
- You’re running autonomous without enough precision (Carbonara Rule)
The fix isn’t better prompts. It’s better thinking about what you actually need.
Sources
- Yang et al. — What Prompts Don’t Say — Underspecification in LLM prompts, 2x regression rates
- Shreya Shankar & Hamel Husain — AI Evals for Engineers — The Three Gulfs framework
- Anthropic — Reasoning Models Don’t Always Say What They Think — Displayed CoT may not reflect actual reasoning
- Anthropic — Effective Context Engineering for AI Agents — Normal language over aggressive prompting