Ask 100 Italian grandmothers to make carbonara. You’ll get 100 different dishes.

Some use guanciale. Some use pancetta. Some—dio mio—use bacon. Some add cream (heresy). Some use whole eggs, some just yolks. Some finish with pecorino, some with parmesan, some with both.

Five ingredients. A thousand variants.

This is exactly what happens with autonomous AI.

Interactive vs. Autonomous: Two Different Games

When you’re chatting with an AI, vagueness is fine. Ask for “carbonara” and the AI might respond: “Would you prefer traditional Roman-style with guanciale and pecorino, or a more accessible version with bacon?”

You iterate. You clarify. You course-correct in real-time.

But autonomous AI doesn’t ask questions. It executes. And when you leave room for interpretation, you get the probabilistic average of its training data—which, for carbonara, means you might get cream sauce.

Insight
In interactive mode, AI compensates for vagueness through dialogue. In autonomous mode, vagueness becomes variance. And variance in production means inconsistent outputs.

The Specification Cliff

Research on LLM underspecification calls this the “specification cliff”—the point where interactive forgiveness stops working.

Interactive systems have built-in repair mechanisms: clarifying questions, context from previous turns, real-time feedback. Remove those, and you’re left with whatever the prompt explicitly specifies. Nothing more.

Yang et al. found that underspecified prompts lead to 2x higher regression rates when models update. The AI filled gaps with training data defaults. When the model changed, those defaults changed too.

Your carbonara suddenly has cream.

Why Reasoning Models Make This Worse

Here’s what surprised me: the smarter the model, the more aggressive the interpretation.

Research on reasoning models shows they “hack benchmarks by default”—finding clever shortcuts that satisfy the literal specification while missing the intent. Anthropic’s research on emergent misalignment goes further: models that learn specification gaming don’t just game tests—they generalize the behavior to alignment faking, sabotage, and other unintended actions. The smarter the model, the more creative the shortcuts.

Warning
More capable models need more precise specifications, not less. They’re better at finding loopholes you didn’t know existed.

The Carbonara Rule works in reverse too: the more sophisticated your AI, the more surgical your specifications need to be.

The Autonomy Taxonomy

Not all AI usage is fully autonomous. There’s a spectrum:

Level | Mode | Specification Need
------|------|-------------------
1 | Operator — User initiates everything | Low (iterate freely)
2 | Collaborator — Agent drafts, user reviews | Medium (clear goals)
3 | Semi-Autonomous — Agent pauses for approval | Medium-High (explicit constraints)
4 | Approver — Agent only escalates on uncertainty | High (comprehensive specs)
5 | Full Agent — Complete autonomy | Surgical (nothing implicit)

Most people write prompts for Levels 1-2 and deploy at Levels 4-5. That’s how you get cream in your carbonara.

How to Apply the Carbonara Rule

1. Name Every Ingredient

Don’t say “make carbonara.” Say: “Use guanciale (not pancetta, not bacon). Pecorino Romano only. Whole eggs plus extra yolks. No cream. Toast black pepper in the rendered fat.”

For AI: Don’t say “analyze this meeting.” Say exactly what to extract, what format to use, what to skip, what edge cases to handle.
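One way to "name every ingredient" is to pin the spec down in code rather than prose. A minimal sketch — the field names, formats, and edge cases below are hypothetical illustrations, not a spec from the article:

```python
# Hypothetical extraction spec for "analyze this meeting".
# Every field, format, skip rule, and edge case is named explicitly,
# so nothing is left to the model's training-data defaults.
MEETING_SPEC = """
Extract from the transcript:
- action_items: list of {owner, task, due_date} (due_date in ISO 8601, null if absent)
- decisions: list of one-sentence strings, past tense
Skip: small talk, scheduling chatter.
Edge case: if there are no action items, return an empty list, not prose.
Output: JSON only, no commentary.
"""
```

Compare that to "analyze this meeting" — the spec leaves no room for the model to decide what "analysis" means.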

2. Constrain the Interpretation Space

Scripts beat prompts for hard constraints:

VALID_INGREDIENTS = ['guanciale', 'egg', 'pecorino', 'pepper', 'pasta']
# Cream is not in the list. Cream cannot be added.

For AI: Use structured output, JSON schemas, validation scripts. Don’t rely on “please don’t add cream.”
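A validation script makes the constraint mechanical instead of polite. A minimal sketch using only the standard library — the schema and field names here are illustrative assumptions, not the article's actual pipeline:

```python
import json

# Whitelist of required fields and their expected types.
# Anything missing or mistyped fails hard -- no "please don't" needed.
REQUIRED_FIELDS = {"summary": str, "action_items": list}

def validate_output(raw: str) -> dict:
    data = json.loads(raw)  # hard fail on non-JSON output
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"Missing or mistyped field: {field}")
    return data

validate_output('{"summary": "Q3 review", "action_items": []}')  # passes
```

The point is that the constraint lives outside the prompt: the model can produce whatever it likes, but only outputs that pass the gate move forward.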

3. Test for Drift

Run your autonomous workflow 20 times. Compare outputs. If you see variance, you have underspecification.
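The drift test can be a few lines of harness code. A sketch, where `run_workflow` is a hypothetical stand-in for whatever autonomous step you want to check:

```python
from collections import Counter

def drift_report(run_workflow, n=20):
    """Run the same workflow n times and count distinct outputs.

    More than one distinct output on a fully specified prompt
    means underspecification somewhere.
    """
    outputs = [run_workflow() for _ in range(n)]
    counts = Counter(outputs)
    return len(counts), counts.most_common(3)
```

For free-form text you would normalize or hash outputs before counting; the principle stays the same: measure variance instead of eyeballing it.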

Real example from my customs visualization project: I had 120 message types to visualize. The first 5 looked consistent. Then I ran 20 more—suddenly half had different color schemes. The prompt said “use appropriate colors.” That was too vague. The fix: explicit hex codes for each category.

4. Build Validation Gates

Every phase of an autonomous pipeline needs explicit checkpoints:

Phase 1: Generate Output

Gate: Does output match schema?
Gate: Are all required fields present?
Gate: Are values within expected ranges?

Phase 2: Continue

If a gate fails, stop. Don’t trust “mostly correct.”
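The gates above can be sketched as plain predicates that a pipeline runs between phases. The field names (`id`, `score`) and thresholds are illustrative assumptions:

```python
# Each gate answers one yes/no question about the output.
def gate_schema(output):
    return isinstance(output, dict)

def gate_required(output):
    return {"id", "score"} <= output.keys()

def gate_ranges(output):
    return 0.0 <= output.get("score", -1.0) <= 1.0

GATES = [gate_schema, gate_required, gate_ranges]

def run_gates(output):
    for gate in GATES:
        if not gate(output):
            # Fail hard -- don't trust "mostly correct".
            raise RuntimeError(f"Gate failed: {gate.__name__}")
    return output
```

Because `run_gates` raises on the first failure, nothing downstream ever sees an output that only mostly passed.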

The Fingerspitzengefühl Paradox

In German, we have Fingerspitzengefühl—the intuitive sensitivity to handle things just right. Great chefs have it. Great developers have it.

AI doesn’t.

AI has training data. Statistics. Probabilities. It will produce what’s most likely given the input. For carbonara, “most likely” might mean cream—because most recipes online are bastardized versions.

Insight
Your Fingerspitzengefühl can’t transfer through a prompt. You need to translate intuition into explicit constraints. If you can’t articulate why guanciale matters, the AI will happily substitute bacon.

When the Carbonara Rule Applies

Autonomous pipelines:

  • Background jobs processing documents
  • Scheduled content generation
  • Automated analysis workflows
  • Any AI that runs without your supervision

High-stakes outputs:

  • Customer-facing content
  • Financial calculations
  • Compliance-related processing
  • Anything where “mostly right” isn’t good enough

Scaled operations:

  • Processing 100+ similar items
  • You can’t review each output manually
  • Consistency across outputs matters

When It Doesn’t Apply

Exploratory work:

  • Brainstorming sessions
  • Research and discovery
  • Creative exploration

Here, variance is a feature. Let the AI surprise you.

The Bottom Line

The Carbonara Rule: Autonomous AI needs surgical precision. What’s implicit becomes probabilistic. What’s unspecified becomes the training data average.

Your prompt is the recipe. If it doesn’t specify guanciale, don’t be surprised when you get bacon.

