
Little Torn On AI

The problem, even with an improving hallucination rate, is that it's still really bad. A better rate means people trust the models more, which means they use them for more or bigger things, so the actual number of hallucinations ending up in some sort of output used for business (say, production code) doesn't necessarily decrease.
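To put rough numbers on that, here's a minimal sketch. The rates and volumes are made up purely for illustration, not measured data, and `expected_bad_outputs` is just a hypothetical helper name.

```python
def expected_bad_outputs(hallucination_rate: float, outputs_shipped: int) -> float:
    """Expected number of shipped outputs that contain a hallucination."""
    return hallucination_rate * outputs_shipped


# Hypothetical "before": a worse model that people use sparingly.
before = expected_bad_outputs(0.10, 1_000)

# Hypothetical "after": a better model that people trust with 10x the work.
after = expected_bad_outputs(0.02, 10_000)

print(f"before: {before:.0f} bad outputs, after: {after:.0f} bad outputs")
```

With those made-up numbers the rate drops fivefold, but the count of bad outputs still doubles, from about 100 to about 200, because usage grew faster than the rate fell.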

The models are getting insanely powerful, but the more you do with them, the more out of control it all sort of becomes. And it all comes at a massive cost: the fixes for issues with AI tend to come not from the model itself but from all kinds of post-processing.
 
Hallucinations can also be addressed by an AI aggregator, which is what Copilot is, amongst other things.

So an AI aggregator allows you to ask multiple AI agents across different platforms the same question.

You can read all 5 outputs or you can ask for aggregate output, which will eliminate outliers.

Thus, no hallucination.

It's similar to humans, who have seizures, diseases, collapses. The answer to hallucination is more AI. Since multiple AI agents will never hallucinate in exactly the same way (especially across platforms/models), a consensus will only ever form around correct data. This is actually a very easy problem to solve. The OP could do that and eliminate them today. You can put thousands of humans on one problem, and you can do the same with AI. The tools are already built in. Just buy the 300 dollar a month Grok plan and select team of experts, or use Copilot to survey all the AIs at once.
 
Eh, I don't think this is true in any way. GitHub Copilot, like a lot of tools, lets you select a different model, or it has an auto-select mode where it picks a model based on your query. There is no option to tell it to go try all the models and aggregate them.

And none of that eliminates hallucinations even if it could do that. Either way what you just described is incredibly expensive lol

Source: I've used GitHub Copilot and I have taken 2 different trainings that utilize it. Possible I missed something but I doubt it, I just used it on a project last Friday.
 
I thought Copilot was a multi-model aggregator. I asked Gemini before I posted, and it gave me this:

"Yes, Microsoft Copilot functions as a multi-model aggregator. Rather than relying solely on a single AI, it acts as a central orchestration layer that allows users to route tasks through various leading Large Language Models (LLMs) such as OpenAI's GPT and Anthropic's Claude within the Microsoft 365 ecosystem."

As for my logic, here goes:

Say you ask a question to Gemini and you get a hallucination.

Are you implying that you could open ChatGPT and get the exact same hallucination?

Let's say you open 3 separate platforms and get 3 hallucinations.

There will be no agreement. So a council style aggregator can catch those.

Let me say this: you will never eliminate hallucination 100% in a single-agent model, and due to the law of large numbers you will never eliminate it across all of AI, but expecting that is the same as saying humans should never make mistakes.

So say I asked what color the sky is.

Claude tells me green
ChatGPT tells me yellow
Gemini and Grok tell me it is blue

Now if my AI looks at those results, it can tell that blue is most likely the right answer. You can see how, at the very least, hallucinations can be reduced to near meaninglessness. The AI is a tool, but it is also the user's job to mitigate hallucinations. That said, tools like the ones I've mentioned above will allow us to refine this further as we go into the future.
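The sky example boils down to a majority vote. Here's a minimal sketch of that idea, assuming the answers are short enough to compare as exact strings. `majority_answer` is a hypothetical helper, not how any real aggregator service works, and it only finds the most common answer, which is not guaranteed to be the correct one.

```python
from collections import Counter
from typing import Optional


def majority_answer(answers: dict[str, str]) -> tuple[Optional[str], float]:
    """Return the most common answer and its share of the votes.

    Returns (None, share) when the top answer is tied with another one,
    i.e. when there is no consensus to lean on.
    """
    if not answers:
        raise ValueError("need at least one answer")
    # Normalize so "Blue" and "blue " count as the same answer.
    counts = Counter(a.strip().lower() for a in answers.values())
    ranked = counts.most_common()
    top, top_count = ranked[0]
    share = top_count / len(answers)
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None, share
    return top, share


sky = {"Claude": "green", "ChatGPT": "yellow", "Gemini": "blue", "Grok": "blue"}
winner, share = majority_answer(sky)
print(winner, share)
```

In the four-model case above, "blue" wins with half the votes. If every model hallucinates something different, the function returns `None`, which is the "no agreement" case. Long free-form answers would need some other way of deciding whether two outputs agree.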

Can we talk about professional Metacritic reviews? Reviews are of course opinion, but let's imagine they are not. Imagine instead that reviews are quantifiable data.

Some of them would hallucinate. They give great games bad scores and shit games great scores.

Aggregate data would then smooth out the spikes and bring us close to a norm. Consensus allows us to not fall for hallucinations. We already use it every day.
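As a quick sketch of that smoothing, with review scores that are invented for the example and not taken from any real game:

```python
from statistics import mean, median

# Made-up review scores out of 100; the 30 plays the "hallucinating" reviewer.
scores = [88, 90, 85, 92, 87, 30]

avg = mean(scores)    # pulled down by the outlier
mid = median(scores)  # barely affected by the outlier

print(f"mean: {avg:.1f}, median: {mid:.1f}")
```

Here the mean lands around 78.7 while the median stays at 87.5, so how much an outlier matters depends on which aggregate you pick.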
 
It's not actually aggregating any output. It's not a great word to use for how GitHub Copilot works. It aggregates models like a mall aggregates stores. You have access to all of the models in one place, but you aren't getting any output that somehow merges the models, just as you aren't getting a Baja Blast flavored Frappuccino from Starbucks just because a Taco Bell exists in the same food court.

And you will never eliminate hallucinations 100% in any scenario with LLMs. It's the nature of the beast. What agents are doing is also far, far more complex than asking a question like "what color is the sky?"
 
To be fair, there are several ways anyone can build this themselves by using the token sites instead of services already on the table. I'm not on Copilot, but there are services available right now that offer council style systems. The top one that appears in my search is Perplexity.
  • Perplexity AI (Model Console): Available through a Perplexity AI Max subscription, the platform features a model console that queries various top-tier AI architectures at once and aggregates a summarized, verified response.
Of course you won't eliminate errors entirely, but a system like this allows for a significant reduction in hallucinations. Of course it would, if you think about it. Even if it didn't exist, it would have to. It's just too good of an antidote to hallucinations. Imagine if there were 20 mainstream independent AIs instead of like 5. We could make hallucinations almost irrelevant.
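Here's a rough sketch of the 5 versus 20 comparison. It leans on two big assumptions: every model is wrong independently with the same probability, and every answer is simply right or wrong. Real models share a lot of training data, so their errors are probably not independent, and the 10% error rate below is made up.

```python
from math import comb


def p_majority_wrong(n_models: int, p_wrong: float) -> float:
    """Probability that more than half of n independent models are wrong.

    Binomial tail: sum over k = floor(n/2)+1 .. n of C(n,k) p^k (1-p)^(n-k).
    """
    k_min = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p_wrong**k * (1 - p_wrong) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )


for n in (1, 5, 20):
    print(n, p_majority_wrong(n, 0.10))
```

Under those assumptions the chance of a wrong majority shrinks fast as models are added, but it never reaches zero, and correlated errors would make it larger than this estimate.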

So to answer the OP, the best ways to reduce hallucinations that I've come up with are:

1. Use a multimodel council style service like Perplexity AI Max, which queries various models and outputs a results summary. This will significantly reduce hallucinations.
2. Suggest specific sources to the AI when asking video game questions (such as the prompt "only provide me data from fextralife*"), aka constrain the model's creative freedom by forcing it to use specific factual info.
3. Inform the AI when it has made a mistake and ask why it made the mistake, then instruct it not to do that again (usually this will be something like a comment pulled from a YouTube video that was wrong). So when the AI is wrong, tell it, then ask it to identify why that answer was wrong. Tell it to stop doing that for the rest of the chat and you will significantly reduce similar errors. AI is a self-diagnostic tool.
4. Tell your AI it is okay to fail (or to say "I can't be sure" or "I don't know").

I always demand citations, which helps when I run into issues. This is part of number 2. It's also important to allow your AI to fail. Lots of hallucinations happen when the AI can't really find the data it needs to answer and tries to guess. Unless you specifically tell it not to get crazy, it will. Tell it not to guess. It's okay if it can't find the answer. Sometimes it isn't out there to find.
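Tips 2 and 4, plus the citation habit, can be baked into a reusable prompt. This is only a sketch: `build_prompt` and the exact wording of the rules are my own illustration, not a tested recipe, and it just builds a string without calling any AI service.

```python
def build_prompt(question: str, allowed_sources: list[str]) -> str:
    """Wrap a question in source, citation, and no-guessing rules."""
    sources = ", ".join(allowed_sources)
    rules = [
        # Tip 2: constrain the model to specific sources.
        f"Only use information from these sources: {sources}.",
        # Citation habit: makes wrong answers easier to trace.
        "Cite the source for every claim.",
        # Tip 4: give the model permission to fail.
        'Do not guess. If you cannot find the answer, say "I don\'t know".',
    ]
    return "\n".join(rules + ["", f"Question: {question}"])


prompt = build_prompt("Where do I find the first boss key?", ["fextralife"])
print(prompt)
```

The resulting text can be pasted at the start of a chat. Tip 3 (telling the AI when it got something wrong) still has to happen by hand during the conversation.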
 
In the end, Perplexity AI still has hallucinations. And the reason it's not all that hot of a company is that all it's doing is searching or aiding research. It's good at basic search results, with a very low hallucination rate, but the hallucination rate goes up significantly the more complex the task you ask it to do.

And it's an incredibly expensive way to go about solving the hallucination problem as well, and again, you'll never 100% do that like you seemed to be claiming.

Agentic tools are a much bigger piece of the pie, partly because Gemini is going to crush everyone in search. They are incredibly powerful tools that people are using today to save massive amounts of time completing tasks. People are in denial about that. But they will always hallucinate, and the more you ask an AI to do, the bigger a problem that can become. The companies having executives build software applications, for instance, are pretty insane if they are unleashing these apps/web sites on the world, as they really don't know WTF they are doing lol
 
I'd say aggregation with LLMs is not better or worse than with humans on the whole.

Even hallucinations are often not that bad compared to a human trawling through information on the Internet. Just like humans, LLMs are coded and trained to trust some sources more than others, such as Wikipedia (for all its faults, it usually is accurate).

I've often searched for some information online and found it conflicting. I've had to make a judgement as to which information to use or make an educated guess drawing on that information. That's what LLMs do too.

I've found LLMs to have gotten much better at this educated guessing and summarisation. And yes, running something through multiple LLMs does help as it offers other perspectives. It can go too far, but often I've found it to fix most hallucinations.
 