UofAi
Case Studies

A Deliberate Rep: Catching the Plausible-but-Wrong Code AI Loves to Write

The UofAi Team · 2 min read · June 24, 2026

A code editor with a magnifying glass catching a hidden bug in one red line

AI writes code that looks right. That's exactly the problem. Plausible-but-wrong is the most expensive kind of bug, because it sails through review on vibes. Here's a deliberate rep that catches it — on a task simple enough to fit in this post and real enough to ship.

The task, and the bar

You need a function that de-duplicates a list of user records by email, keeping the most recent record for each. Before prompting, set the bar as the three things a reviewer (or a test suite) would check:

  1. Correct on the edge cases — empty input, no duplicates, and crucially: it keeps the newest, not the first.
  2. Readable by the next engineer with no comment archaeology.
  3. No hidden surprises — no input mutation, no accidental O(n²).

The vending-machine version

Prompt: "Write a function to dedupe a list of records by email." You get clean, confident code:

function dedupeByEmail(records) {
  const seen = new Set();
  return records.filter((r) => {
    if (seen.has(r.email)) return false;
    seen.add(r.email);
    return true;
  });
}

It's tidy. It compiles. It passes the obvious test. Ship it?

Score it against the rubric: criterion 1 — it keeps the first occurrence it sees, not the most recent (fail — a silent data bug). Criterion 3 is fine. But that criterion-1 miss is the kind of thing that corrupts data quietly for months. The "it looks right" instinct would have shipped it.

The rep

Do it deliberately: write the rubric as tests first, before trusting any code.

// keeps the most recent by updatedAt
expect(dedupeByEmail([
  { email: "a@x.com", updatedAt: 1 },
  { email: "a@x.com", updatedAt: 5 },
])).toEqual([{ email: "a@x.com", updatedAt: 5 }]);

Run the AI's version against it. It fails — it returns the updatedAt: 1 record. The gap is now visible instead of latent.

Close the gap

You can see why it failed: the model defaulted to first-wins because that's the most common dedupe pattern in its training. You direct the fix — sort by recency, then keep first — and verify against the tests:

function dedupeByEmail(records) {
  const byEmail = new Map();
  for (const r of [...records].sort((a, b) => b.updatedAt - a.updatedAt)) {
    if (!byEmail.has(r.email)) byEmail.set(r.email, r);
  }
  return [...byEmail.values()];
}

Green. And note [...records] — you also closed the mutation risk the first version didn't have to think about.

What this rep proves

The model wrote both the buggy and the correct version. The thing that separated them was the rubric-as-tests and the judgment to encode "most recent" as an executable check before trusting the output. That's the difference between generating code and being accountable for it — and it's exactly what survives the next, smarter model.

"Vibe-checking" AI code is how plausible bugs reach production. A deliberate rep is how you stop them.

Practice it on real code in the method. Start free.

Keep reading

Get new posts in your inbox

Applied-AI playbooks, deliberate-practice frameworks, and case studies. No spam — unsubscribe anytime.