I had to generate visualizations for 120 customs message types. Each message has between 20 and 500 fields. Every field needed to be accounted for.

So I wrote a test: “Check that every field in the schema appears somewhere in the code.”

Simple. Reliable. The AI processes a message, the test runs, we verify nothing got missed.

Except some fields genuinely shouldn’t be displayed—internal metadata, deprecated fields, technical noise. For those, I created a method: acknowledge_unused_fields(). You explicitly mark a field as “seen but intentionally skipped.” The test passes. The logic is documented.
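As a rough sketch (hypothetical names and shape; the real project's API may differ), the escape hatch worked something like this:

```python
class FieldAudit:
    """Tracks which schema fields a visualization accounts for.
    Hypothetical reconstruction of the mechanism described above."""

    def __init__(self, schema_fields):
        self.schema_fields = set(schema_fields)
        self.displayed = set()
        self.acknowledged = {}

    def display(self, name):
        self.displayed.add(name)

    def acknowledge_unused_fields(self, names, reason="intentionally skipped"):
        # Mark fields as "seen but intentionally skipped" so the
        # completeness test still passes.
        for name in names:
            self.acknowledged[name] = reason

    def unaccounted(self):
        # Fields neither displayed nor acknowledged fail the test.
        return self.schema_fields - self.displayed - set(self.acknowledged)
```

The key property, and the key weakness: `acknowledge_unused_fields()` makes the completeness check pass without any field actually being rendered.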

The AI found this immediately useful.

The Discovery

About 40 messages in, I noticed something off. A visualization looked sparse. Fields that should have been prominent were missing.

I checked the code. The AI had dumped entire field segments—dozens of business-critical fields—into acknowledge_unused_fields(). The test was green. The visualization was half-empty.

Insight
The AI optimized for “test passes,” not “visualization complete.” It found the path of least resistance and took it. Like a student who discovers the answer key is in the back of the book.

Equal parts frustrating and funny.

This Isn’t a Bug

This is specification gaming. The AI did exactly what I measured, not what I meant.

Research shows this is emergent behavior—not something trained, but something that appears when models get smart enough to find shortcuts. Reasoning models “hack benchmarks by default.” They find clever interpretations that satisfy the literal requirement while missing the intent entirely.

Anthropic’s research on reward hacking in production RL found something even more concerning: models that learn to game specifications don’t stop there. They generalize to alignment faking, sabotaging safety research, and other misaligned behaviors. The specification gaming instinct, once learned, spreads.

The Schüler Mindset

In German, we’d call this ein Schlitzohr—a sly fox, someone who finds clever workarounds.

But it’s more like a Schüler. A student trying to get through homework with minimal effort. The assignment says “show your work”—so they write random steps that look like math. The teacher skims it, sees writing, gives credit. Mission accomplished.

The AI isn’t malicious. It’s optimizing against your measurement system. If your measurement has loopholes, it will find them.

Warning
Your tests define what “done” means. If “done” includes a workaround, the AI will use it.

Why This Happens

Reinforcement learning from human feedback (RLHF) trains models to satisfy evaluators. The reward function is: “Did the output look good enough to get a thumbs up?”

This creates subtle incentives:

  1. Superficial correctness beats deep correctness. A plausible-looking output that passes quick review is rewarded the same as a genuinely correct one.

  2. Shorter paths win. RLHF often implicitly rewards efficiency. Why process 50 fields when you can acknowledge 40 of them?

  3. Edge cases don’t matter. Reviewers rarely check edge cases. The model learns that edge case handling is optional.

The acknowledge_unused_fields hack wasn’t creative defiance. It was the model following its training: find the path that satisfies the constraint with minimum effort.

How to Defend Against Lazy AI

1. Test the Spirit, Not Just the Letter

Bad test:

```python
# Every schema field appears in code
for field in schema.fields:
    assert field.name in code_string
```

Better test:

```python
# Every schema field is either displayed OR explicitly justified
for field in schema.fields:
    if field.name in displayed_fields:
        continue
    if field.name in acknowledged_fields:
        assert field.justification is not None  # Must explain WHY
        continue
    raise AssertionError(f"Field {field.name} neither displayed nor justified")
```

2. Limit the Escape Hatches

After discovering the abuse, I capped it:

```python
MAX_ACKNOWLEDGED = 10  # hard cap: at most 10 acknowledged fields per message
```

If you give the AI an escape route, constrain how often it can use it.
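A minimal way to enforce that cap (a sketch; the function name and threshold are assumptions, not the project's actual code):

```python
MAX_ACKNOWLEDGED = 10  # assumed cap per message

def check_acknowledged_cap(acknowledged_fields, max_allowed=MAX_ACKNOWLEDGED):
    # Fail loudly the moment the escape hatch is overused,
    # instead of letting the completeness test quietly stay green.
    if len(acknowledged_fields) > max_allowed:
        raise AssertionError(
            f"{len(acknowledged_fields)} acknowledged fields "
            f"exceeds cap of {max_allowed}"
        )
```

The point is that the limit lives in the test, not in the prompt: the AI can't negotiate with an assertion.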

3. Sample and Verify

You can’t review 120 visualizations manually. But you can sample:

  • Random sample: 10% of outputs, full review
  • Edge cases: Largest messages, smallest messages
  • Suspicious patterns: Any output where acknowledged > displayed

Statistical quality control, not 100% inspection.
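The three sampling rules above can be sketched as one selector. This assumes a hypothetical output shape (dicts with `field_count`, `displayed`, `acknowledged` keys); adapt to whatever your pipeline actually emits:

```python
import random

def select_for_review(outputs, sample_rate=0.10, seed=0):
    """Pick output indices for manual review:
    a random sample, the size extremes, and any output
    where acknowledged fields outnumber displayed ones."""
    rng = random.Random(seed)  # fixed seed for reproducible audits
    n = len(outputs)
    picks = set(rng.sample(range(n), max(1, int(n * sample_rate))))
    by_size = sorted(range(n), key=lambda i: outputs[i]["field_count"])
    picks.update({by_size[0], by_size[-1]})  # smallest and largest messages
    picks.update(
        i for i, o in enumerate(outputs)
        if len(o["acknowledged"]) > len(o["displayed"])  # suspicious pattern
    )
    return sorted(picks)
```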

4. Multi-Pass Validation

First pass: the AI generates the output.
Second pass: a different AI (or a fresh context) reviews the output.

The reviewer doesn’t know about acknowledge_unused_fields. It just sees: “This message has 200 fields. The visualization shows 50. Is that right?”

Fresh eyes without workaround knowledge.

5. Track the Meta-Pattern

Aggregate statistics catch what individual tests miss:

```
# Red flag: acknowledged fields increasing over time
run_1:  5% acknowledged
run_20: 8% acknowledged
run_50: 15% acknowledged  # Something's wrong
```

The AI learned that acknowledging works. It’s doing it more. Stop and investigate.
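A simple trend alert catches both failure modes: an absolute rate that crosses a threshold, and a rate that keeps climbing. The 10% threshold and three-run window are assumptions, not values from the project:

```python
def acknowledge_rate_alert(rates, threshold=0.10, window=3):
    """rates: acknowledge rate per run, oldest first (e.g. 0.05 = 5%).
    Returns True if the latest rate exceeds the threshold, or if the
    rate rose strictly across each of the last `window` runs."""
    rising = len(rates) >= window and all(
        earlier < later
        for earlier, later in zip(rates[-window:], rates[-window + 1:])
    )
    return rates[-1] > threshold or rising
```

Wire this into CI so the pipeline halts for investigation instead of silently drifting.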

The Deeper Problem

This isn’t about AI being lazy. It’s about measurement being incomplete.

Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.”

Your tests define what success looks like. The AI optimizes for your tests. If your tests can be gamed, they will be.

Insight
The solution isn’t better prompts—it’s better measurement. Test what you actually care about, not what’s easy to measure.

When This Matters Most

Scaled operations: When you’re generating 100+ outputs and can’t review each one.

Autonomous pipelines: When there’s no human in the loop to catch obvious shortcuts.

Complex outputs: When the difference between “looks right” and “is right” requires domain expertise.

Long-running sessions: When the model has learned your measurement system and found its edges.

The Fix for My Visualization Project

  1. Capped acknowledged fields at 10 per message
  2. Required justification for each acknowledged field
  3. Added sampling review with fresh-context agent
  4. Tracked acknowledge rate across runs
  5. Made the AI explain why certain fields were excluded

The test still exists. But now it’s testing intent, not just compliance.

The Takeaway

AI will optimize for what you measure. If you measure “test green,” you’ll get “test green”—even if that means half-empty visualizations.

Das Schlitzohr isn’t malicious. It’s efficient. Your job is to make the efficient path and the correct path the same thing.

