Eh, I don't think this is true in any way. GitHub Copilot, like a lot of tools, lets you select a different model, or it has an auto-select mode where it picks a model based on your query. There is no option to tell it to go try all the models and aggregate their answers.
And even if it could do that, none of it would eliminate hallucinations. Either way, what you just described is incredibly expensive lol
Source: I've used GitHub Copilot, and I've taken 2 different trainings that use it. It's possible I missed something, but I doubt it; I just used it on a project last Friday.
I thought Copilot was a multi-model aggregator. I asked Gemini before I posted, and it gave me this:
Yes, Microsoft Copilot functions as a multi-model aggregator. Rather than relying solely on a single AI, it acts as a central orchestration layer that allows users to route tasks through various leading Large Language Models (LLMs) such as OpenAI's GPT and Anthropic's Claude within the Microsoft 365 ecosystem.
As for my logic, here goes:
Say you ask Gemini a question and get a hallucination.
Are you implying that you could open ChatGPT and get the exact same hallucination?
Let's say you open 3 separate platforms and get 3 hallucinations.
There will be no agreement, so a council-style aggregator can catch those.
Let me say this: you will never eliminate hallucinations 100% in a single-agent model, and due to the law of large numbers you will never eliminate them across all of AI either, but expecting that is like saying humans should never make mistakes.
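To put rough numbers on the "they won't agree" argument: if each model independently gets an answer wrong with probability p, the chance that a strict majority of n models is wrong follows the binomial distribution. Independence is an assumption, and a generous one, since models trained on overlapping data can share the same mistakes. The 0.2 error rate below is made up for illustration, and `majority_error_rate` is just a hypothetical helper name. Because it counts every wrong answer as if it were the same wrong answer, the result is an upper bound on how often a strict vote lands on a wrong answer.

```python
from math import comb


def majority_error_rate(n_models: int, p_wrong: float) -> float:
    """Probability that a strict majority of n_models answer wrong.

    Assumes each model is wrong independently with probability p_wrong.
    That is a big assumption: real models often share training data, so
    their errors can be correlated.
    """
    needed = n_models // 2 + 1  # smallest head count that is a strict majority
    return sum(
        comb(n_models, k) * p_wrong**k * (1 - p_wrong) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


for n in (1, 3, 5):
    print(n, round(majority_error_rate(n, 0.2), 4))
# prints:
# 1 0.2
# 3 0.104
# 5 0.0579
```

With that made-up 20% error rate, going from one model to three cuts the chance of a wrong majority from 0.2 to about 0.104, and five models bring it to about 0.058. It shrinks but never reaches zero, which matches the point that you reduce hallucinations rather than eliminate them; it also costs n times as many model calls.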
So say I asked what color the sky is.
Claude tells me green
ChatGPT tells me yellow
Gemini and Grok tell me it is blue
Now if my AI looks at those results, it can tell that blue is most likely the right answer. You can see how, at the very least, hallucinations like these can be reduced to near insignificance. The AI is a tool, but it is also the user's job to mitigate hallucinations. That said, tools like the ones I mentioned above will let us refine this further as we go into the future.
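Here is a minimal sketch of that council vote on the sky example, assuming each model's reply has already been boiled down to a short answer string. Real LLM output is free text, so matching answers is the hard part and is glossed over here with a simple strip and lowercase. `council_vote` is a hypothetical name, not a Copilot or vendor API. It returns the plurality answer plus the share of models that gave it, and returns no answer when the top answers tie, which covers the "three different hallucinations, no agreement" case.

```python
from collections import Counter
from typing import Optional


def council_vote(answers: dict[str, str]) -> tuple[Optional[str], float]:
    """Plurality vote across models (hypothetical helper, not a real API).

    Returns (winning answer, share of models that gave it). If two or more
    answers tie for first place, there is no agreement and the answer is None.
    """
    if not answers:
        return None, 0.0
    # Light normalization so "Blue" and "blue " count as the same answer.
    normalized = [a.strip().lower() for a in answers.values()]
    ranked = Counter(normalized).most_common()  # sorted by count, highest first
    top_answer, top_count = ranked[0]
    share = top_count / len(normalized)
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None, share  # tie for first place: flag it instead of guessing
    return top_answer, share


sky = {"Claude": "green", "ChatGPT": "yellow", "Gemini": "blue", "Grok": "blue"}
print(council_vote(sky))  # ('blue', 0.5)
```

With three models giving three different answers, every answer ties at one vote, so the result is `(None, 0.333...)` and the question gets flagged instead of answered.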
Let's talk about professional Metacritic reviews. Reviews are of course opinion, but let's imagine they are not. Imagine instead that reviews are quantifiable, objective data.
Some of them would hallucinate, giving great games bad scores and shit games great scores.
Aggregating the data would then smooth out the spikes and bring us close to a norm. Consensus keeps us from falling for hallucinations, and we already rely on it every day.
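The review analogy maps onto two everyday ways of aggregating numbers. The sketch below uses made-up scores for a single game, where one critic's 15 plays the "hallucination". It is not how Metacritic actually computes its score, just a plain mean and median from Python's `statistics` module, wrapped in a hypothetical `aggregate` helper.

```python
from statistics import mean, median


def aggregate(scores: list[float]) -> dict[str, float]:
    """Summarize critic scores two ways.

    The mean uses every score, so an outlier still pulls it; the median only
    looks at the middle of the sorted list, so one wild score barely moves it.
    """
    return {"mean": round(mean(scores), 1), "median": median(scores)}


# Made-up scores (0-100) for one game; the 15 is the "hallucinating" critic.
scores = [84, 88, 90, 82, 86, 15]
print(aggregate(scores))  # {'mean': 74.2, 'median': 85.0}
```

Without the 15, both numbers come out to 86. With it, the mean drops by almost 12 points while the median moves by 1, so a single wild score gets smoothed out much better by the median than by the plain average.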