OpenAI admits AI hallucinations can't be fixed

Draugoth

Gold Member


Source


In a landmark study, OpenAI researchers reveal that large language models will always produce plausible but false outputs, even with perfect data, due to fundamental statistical and computational limits.

"Large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty. Such 'hallucinations' persist even in state-of-the-art systems."

"The study established that 'the generative error rate is at least twice the IIV misclassification rate,' where IIV referred to 'Is-It-Valid,' and demonstrated mathematical lower bounds that prove AI systems will always make a certain percentage of mistakes, no matter how much the technology improves."
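To make the quoted inequality concrete, here is a toy numeric illustration (the function name and the example rate are my own, not from the paper): the bound says whatever error rate a model has at classifying answers as valid or invalid, its generation error is at least double that.

```python
def min_generative_error(iiv_error_rate: float) -> float:
    """Lower bound on hallucination rate implied by the quoted inequality:
    generative error >= 2 * IIV ("Is-It-Valid") misclassification rate."""
    return 2 * iiv_error_rate

# e.g. a model that misjudges validity 5% of the time must hallucinate
# at least 10% of the time, per the paper's bound
print(min_generative_error(0.05))  # 0.1
```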

"The OpenAI research also revealed that industry evaluation methods actively encouraged the problem. Analysis of popular benchmarks, including GPQA, MMLU-Pro, and SWE-bench, found nine out of 10 major evaluations used binary grading that penalized 'I don't know' responses while rewarding incorrect but confident answers."
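A quick sketch of why binary grading pushes models to guess (this is my toy model of the incentive, not the paper's exact setup): if a wrong answer and "I don't know" both score zero, then guessing with any nonzero chance of being right beats abstaining.

```python
def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected benchmark score under binary grading:
    1 point for correct, 0 for wrong OR for abstaining."""
    return 0.0 if abstain else p_correct

# Even a low-confidence guess outscores honesty about uncertainty
print(expected_score(0.3, abstain=False))  # 0.3
print(expected_score(0.3, abstain=True))   # 0.0
```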
 
I love what AI is doing in the medical field; they can check for blood cancers via eye tests now, but you always need a person checking the output to make sure it's not producing utter bollocks.

Sadly, those pushing it hard seem to think it's magic and not easily derailed.
 
Where one company has flaws, another will solve them. It's only ever a matter of time, so it doesn't even matter. It's just something we should keep in mind when using it for now.
 
I wonder if this is the solution....



Get multiple answers and they need a majority consensus to weed out the random oddball.

Damn, now I gotta watch it again.....
Consensus is also how the Geth worked in Mass Effect. So maybe the answer is hundreds of slightly different AI agents "voting" on the right answer, and a minority report being generated from the dissenting voices.
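The consensus idea from the posts above can be sketched in a few lines (names and example answers are mine, purely for illustration): sample several answers, keep the majority, and surface the dissenting minority separately.

```python
from collections import Counter

def consensus(answers: list[str]) -> tuple[str, list[str]]:
    """Return the majority answer plus the dissenting minority answers."""
    winner, _ = Counter(answers).most_common(1)[0]
    dissent = [a for a in answers if a != winner]
    return winner, dissent

# Four agents agree, one oddball gets flagged as the "minority report"
winner, dissent = consensus(["984", "984", "1005", "984", "984"])
print(winner, dissent)  # 984 ['1005']
```

Of course, this only weeds out random oddballs; if the models share the same systematic blind spot, the majority can be confidently wrong together.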
 
What could possibly go wrong. 😬
Yes, this is insufferable. The model will never tell you "I'm not sure". Last week I was calculating something and the convo went something like this:

AI: You are correct, 1024-40 will give you 1005.
Me: But it won't, that's 984.
AI: You are absolutely correct, I made a mistake. 1024-40 is in fact 984.

 
I wonder if this is the solution....



Get multiple answers and they need a majority consensus to weed out the random oddball.

Damn, now I gotta watch it again.....
And get puzzled again at how they let John Anderton keep his security access while he was being hunted down? :messenger_winking_tongue:
 