Where the Hallucinations Got Through
Two retractions in a week, one in consulting and one in research preprints. The shared failure isn't the model. It's what the review layer was never built to catch.
Ernst & Young pulled a published study this week because someone reading it noticed the citations didn't exist. That's the version that made the Financial Times. The version that should make procurement officers pause is that the citations made it through whatever internal review EY runs before a study with its name on it goes out the door.
The firm's framing is that this is a cautionary tale about AI misuse. That framing is convenient. It locates the problem in the tool (the technology that misled the professionals) and lets the institution off the hook for the part that actually broke. The study didn't slip past review because the model is too clever. It slipped past because the review wasn't checking the thing the model is known to get wrong.
A day later, on the other end of the credibility economy, arXiv's moderators announced they would start banning submitters whose papers contain AI-generated hallucinations. The announcement came via a moderator's social-media post, not a formal policy document, which is its own tell. A post doesn't carry the institutional weight of a board resolution, and the casual framing undersells what's happening. The world's largest preprint server is admitting that its existing screening can't catch what's coming in.
Same gap, different uniforms
Strip away the contexts (Big Four consulting and open-access physics preprints share almost nothing in audience, incentive, or workflow) and what's left is a single shape. AI output enters the pipeline. A human review layer exists to catch errors. The review layer is calibrated for the errors humans used to make: typos, weak arguments, sloppy reasoning, citations that go to the wrong page. It is not calibrated for plausible-looking citations to papers that don't exist, or for confident summaries of statutes that were never passed. The model produces a new error class. The review process catches the old one.
Call it the review-layer gap. Every organization deploying generative AI into knowledge work has one, whether they've audited for it or not. EY has it. arXiv has it. The midmarket consultancy pitching your CFO has it, and probably hasn't named it.
The pricing question
Here's the part the procurement conversation hasn't caught up to. When you buy advisory work from a brand-name firm, you're paying for the review layer at least as much as you're paying for the analyst's keystrokes. The whole pricing premium of a Big Four engagement over a freelance shop is the institutional check that says: this passed through people whose careers depend on it being right. A hallucination in the deliverable doesn't just damage the deliverable. It devalues the premium.
I don't know whether EY's clients on this engagement will pursue contractual remedies, and that gap matters. If they don't, the market signal is that buyers will absorb the cost of bad output rather than litigate the standard. If they do, every Big Four AI policy gets rewritten by Q4. Right now the procurement market is pricing this risk at roughly zero.
The next vendor meeting
The right question for a consulting vendor is no longer "do you use AI." Every honest answer is yes. The right question is: what does your review layer specifically do to catch fabricated citations, invented case law, and confidently wrong numerical summaries before they reach me? If the answer is "our consultants review the output," you have your answer about the review-layer gap, and it's the same answer EY just gave the world.
arXiv has the easier version of this problem: refuse the submission, ban the submitter, write the policy. A consulting firm can't ban its own juniors. It has to rebuild the review layer for an error class the layer wasn't designed to catch. That's a months-long process, not a memo.
Does the contract you signed last quarter say who eats the cost when the deliverable hallucinates?
