Support NeoGAF

Buggy Loop · 2026-03-27T14:04:15+0000

Google's turboquant is a compression algorithm for LLMs, it just changed the game. For some tasks it reduces memory usage by a factor of 6 while boosting performances by a factor of 8.

TurboQuant: Redefining AI efficiency with extreme compression

research.google

The MLX creator already implemented it and had interesting results

Stock market it already reacted to it?

Could even the gaming GPUs benefit from this algorithm and save on VRAM requirements at every implementations of neural rendering?

Now the question comes for every such advancements that happens for over a hundred year, the Jevons Paradox

Jevons paradox - Wikipedia

en.wikipedia.org

Will the lower RAM requirement just mean that they'll scale these AI centers even more and basically cancel out the expectations of RAM availability? It's almost always like that. Their planning based on RAM scarcity is now relaxed, they can scale more.. Although energy is gonna be the immediate limit.

Fingers crossed it lowers prices

squallbr · 2026-03-27T14:05:42+0000

I hope it can be done until PS6 launch

Kacho · 2026-03-27T14:05:53+0000

@grok is this true?

Sintoid · 2026-03-27T14:06:39+0000

Another Google Revolution like Stadia?

The Cockatrice · 2026-03-27T14:06:44+0000

Kacho said:
@grok is this true?

Yesn't.

StereoVsn · 2026-03-27T14:07:18+0000

Maybe in the long run, but that just means inference is cheaper so they can offer even more. I doubt it will change the current RAM issues, not to mention ridiculous NAND cost or wafer pricing increases.

AJUMP23 · 2026-03-27T14:07:56+0000

They need more product to even out the demand. That would be helpful.

cormack12 · 2026-03-27T14:08:51+0000

Ah I knew it would come down to the classic quant level issue

winjer · 2026-03-27T14:09:12+0000

My fear is that companies will just use this to make bigger models, not to reduce memory usage.

falling into hell · 2026-03-27T14:15:21+0000

winjer said:
My fear is that companies will just use this to make bigger models, not to reduce memory usage.

Of course that's what companies will do until they finally build Skynet.

jshackles · 2026-03-27T14:15:24+0000

That's like saying human greed will disappear

Buggy Loop · 2026-03-27T14:15:42+0000

winjer said:
My fear is that companies will just use this to make bigger models, not to reduce memory usage.

Yea more than likely, it's why I mentioned the Javons paradox.

USA companies will be limited by energy in immediate future because there's no grand projects for adding power, they're even cancelling some because "offshore" by a certain man is not good (look how Europe is scaling offshore stations comparatively).

China will be somewhere else completely energy wise.

ReyBrujo · 2026-03-27T14:16:44+0000

winjer said:
My fear is that companies will just use this to make bigger models, not to reduce memory usage.

Well, that's a given, suddenly you got 6 times more ram? Increase the parameters to one trillion!

If demand falls RAM makers will probably cut supply to keep prices high until their investments pays off. Most of these are fully automated so they don't need to fire people like other industries, they just need to turn bot speed down or turn them off altogether.

winjer · 2026-03-27T14:16:56+0000

Buggy Loop said:
Yea more than likely, when I mentioned the Javons paradox.

USA companies will be limited by energy in immediate future because there's no grand projects for adding power, they're even cancelling some because "offshore" by a certain man is not good (look how Europe is scaling offshore stations comparatively).

China will be somewhere else completely energy wise.

On the other hand, it might be cool to be able to run a 96Gb model on my home PC.

DarkShadowXI · 2026-03-27T14:17:21+0000

winjer said:
My fear is that companies will just use this to make bigger models, not to reduce memory usage.

Ah yes, the classic greedy lifestyle creep. This will absolutely happen.

Holdfing · 2026-03-27T14:18:56+0000

mrqs · 2026-03-27T14:24:08+0000

It will only get worse.

We were gaming with gold and had no ideia it was gold.

Now people are using it like gold.

There's no way to increase production enough on the foreseeable future.

And demand for AI hardware will only grow.

High-end gaming will continue to get more and more expensive. No end in sight.

Throttle · 2026-03-27T14:24:24+0000

This means we can run better llms locally with 16gb vram?

dorkimoe · 2026-03-27T14:26:06+0000

Kacho said:
@grok is this true?

Nein, heil musk

StereoVsn · 2026-03-27T14:27:10+0000

winjer said:
On the other hand, it might be cool to be able to run a 96Gb model on my home PC.

Yeah, running local models kind of sucks now days unless you have multiple 5090s, Pro 6000 series cards or some crazy Mac Studio M3 Ultra configs.

If that memory compression technique works than a lot more capable models can be run locally (including modern phones).

yurinka · 2026-03-27T14:31:04+0000

Buggy Loop said:
RAM-pocalypse to smooth out?
Google's turboquant is a compression algorithm for LLMs, it just changed the game. For some tasks it reduces memory usage by a factor of 6 while boosting performances by a factor of 8.

I'd say it won't smooth out, instead if they reduced memory usage by a factor of 6, they now maybe will train stuff 6x bigger, or will train 6x more stuff.

justiceiro · 2026-03-27T14:34:26+0000

Remember when deepseek launched, and people thought Nvidia run was over? Yeah...

winjer · 2026-03-27T14:34:52+0000

Buggy Loop said:
Yea more than likely, when I mentioned the Javons paradox.

USA companies will be limited by energy in immediate future because there's no grand projects for adding power, they're even cancelling some because "offshore" by a certain man is not good (look how Europe is scaling offshore stations comparatively).

China will be somewhere else completely energy wise.

Another cool thing that might happen is reducing the processing time of things like DLSS4, FSR4 and XeSS.
Imagine the GPU taking 2ms for the upscaling time. Estimations place this time at around 70% for Ml calculations, so 1.4ms.
Now imagine reducing that by 8 times, to 0.18ms. This could make all these upscalers almost completly "free performance".
And a similar thing for Frame Generation and DLSS5.

Diseased Yak · 2026-03-27T14:40:46+0000

lol no. It's not just RAM, it's everything. Micron is out of consumer business entirely, Samsung and Hynix are the only two left and they're sold up through '26 and beyond.
OpenAI's Stargate is accounting for 40% of global DRAM supply alone.
The LPDDR ripple effect is hammering lower level tech and will continue to do so for years. (goodbye any cheap ultra books, cell phones, etc)

The list goes on and on. All of the manufacturers of hardware now view consumer products as a waste of money and time. Why cater to a very small percentage of buyers, who have disparate hardware, who whine and need support, who require marketing dollars, when you can just rake in billions from government and private companies who demand none of those things???

This is going to get way, way worse before it gets better. IF it gets better. 5 years from now we'll all be renting time to play games via streaming because we have no other alternative.

Throttle · 2026-03-27T14:44:47+0000

Diseased Yak said:
lol no. It's not just RAM, it's everything. Micron is out of consumer business entirely, Samsung and Hynix are the only two left and they're sold up through '26 and beyond.
OpenAI's Stargate is accounting for 40% of global DRAM supply alone.
The LPDDR ripple effect is hammering lower level tech and will continue to do so for years. (goodbye any cheap ultra books, cell phones, etc)

The list goes on and on. All of the manufacturers of hardware now view consumer products as a waste of money and time. Why cater to a very small percentage of buyers, who have disparate hardware, who whine and need support, who require marketing dollars, when you can just rake in billions from government and private companies who demand none of those things???

This is going to get way, way worse before it gets better. IF it gets better. 5 years from now we'll all be renting time to play games via streaming because we have no other alternative.

5 years from now is unpredictable. Even 1 year ago the current crisis wasn't predicted.

IntentionalPun · 2026-03-27T14:48:01+0000

This doesn't help training does it? Isn't that where the vast majority of GPUs / memory is used?

Buggy Loop · 2026-03-27T14:48:24+0000

winjer said:
Another cool thing that might happen is reducing the processing time of things like DLSS4, FSR4 and XeSS.
Imagine the GPU taking 2ms for the upscaling time. Estimations place this time at around 70% for Ml calculations, so 1.4ms.
Now imagine reducing that by 8 times, to 1.8ms. This could make all these upscalers almost completly "free performance".
And a similar thing for Frame Generation and DLSS5.

Also all neural features we see coming to GPUs also take a lot of VRAM. This algorithm could not only speed up but also lower VRAM.

Combine this with neural texture compression cutting ~90% of VRAM requirements and bandwidth and there's a path forward in optimizing all this where we go back to sensible memory requirements.

Diseased Yak · 2026-03-27T14:51:32+0000

Throttle said:
5 years from now is unpredictable. Even 1 year ago the current crisis wasn't predicted.

You're right, 5 years is little tough to see, but this year and next year (with some manufacturers saying '28 is sold) seems pretty locked in.

I hope my doom and gloom doesn't happen, that this is just a blip, but with companies raking it in at the expense of consumers... just feels bad.

What do I care, though, if I stopped buying games now I'd have enough backlog to last me the rest of my life most likely

Rentahamster · 2026-03-27T14:53:14+0000

Unless there's diminishing returns, there's no reason they won't still buy more RAM anyway

Buggy Loop · 2026-03-27T14:53:30+0000

IntentionalPun said:
This doesn't help training does it? Isn't that where the vast majority of GPUs / memory is used?

Not for training no, it's for inference. Well never know when one discovery leaks improvements into another but inference so far.

For datacenter workload, it's the opposite. ~2/3 of AI is inference, 1/3 training.

CobraAB · 2026-03-27T14:53:34+0000

The Ram makers are still going to be greedy.

MiguelItUp · 2026-03-27T14:54:43+0000

It's going to be a long time before any of this is going to "smooth out" I'm afraid.

neocycle · 2026-03-27T14:58:05+0000

Buggy Loop said:
Will the lower RAM requirement just mean that they'll scale these AI centers even more and basically cancel out the expectations of RAM availability?

Yes. Nothing else to add.

LQX · 2026-03-27T14:58:11+0000

I doubt this will actually help though as I imagine these companies are locked into contracts at the inflated prices. Also, I think remember reading some even paid in advance.

Holammer · 2026-03-27T14:58:38+0000

Nothing like a good crisis to spur development.
Like how the oil crisis of 1970s gave us energy efficient cars and the Space Race or WW2 pushing technological development through the roof.

Loomy · 2026-03-27T15:00:12+0000

Stock market is reacting to temporary uncertainty.

This doesn't mean price/availability is going to improve for us. Just that the people hoarding it now will be able to hoard more and do more with it. They're bottlenecked as well right now.

But the funniest part of this timeline is google was ahead in the research for all this AI work, tripped on their own dicks when they didn't have a chatbot out at the same time as chatGPT, and will still most likely end up ahead at the end of the day.

Dread it, run from it, Google comes out on top all the same.

Madserb2023 · 2026-03-27T15:03:10+0000

No need because the downward pressure isn't there not even from consumers. We are buying hardware that is more expensive years after release. If Sony is able to sell the PS6 at $700 and sell it a year later at $800 then the pressure is off. Companies are willing to pay the high ram prices. No need to negotiate.

Buggy Loop · 2026-03-27T15:06:35+0000

Loomy said:
Stock market is reacting to temporary uncertainty.

This doesn't mean price/availability is going to improve for us. Just that the people hoarding it now will be able to hoard more and do more with it. They're bottlenecked as well right now.

But the funniest part of this timeline is google was ahead in the research for all this AI work, tripped on their own dicks when they didn't have a chatbot out at the same time as chatGPT, and will still most likely end up ahead at the end of the day.

Dread it, run from it, Google comes out on top all the same.

Google is playing the long game, openAI played the immediate short term game

Google is spread out into so many R&D that inevitably they'll probably have the most game changers

Another one of their research,

Titans + MIRAS: Helping AI have long-term memory

research.google

Bypassing LLM limits to have the model have human like memory handling where it forgets the non important stuffs and introduces long term memory

adamsapple · 2026-03-27T15:09:28+0000

Bojji · 2026-03-27T15:15:44+0000

They will just do more stuff with the memory they have. I doubt their appetite for it will get any lower.

Buggy Loop · 2026-03-27T15:20:08+0000

Bojji said:
They will just do more stuff with the memory they have. I doubt their appetite for it will get any lower.

Yup, but as discussed with winjer, it could benefit greatly also for all the gaming neural rendering coming down the line.

Lower time inference, less VRAM requirement. DLSS, Frame gen, ray reconstruction, (gasp) DLSS 5 (gasp)...

You could slap neural texture compression on sample with less performance hits and keep bandwidth and VRAM requirement a fraction of what it used to be.

yogaflame · 2026-03-27T15:22:55+0000

Buggy Loop said:
Google's turboquant is a compression algorithm for LLMs, it just changed the game. For some tasks it reduces memory usage by a factor of 6 while boosting performances by a factor of 8.

TurboQuant: Redefining AI efficiency with extreme compression

research.google

The MLX creator already implemented it and had interesting results

Stock market it already reacted to it?

Could even the gaming GPUs benefit from this algorithm and save on VRAM requirements at every implementations of neural rendering?

Now the question comes for every such advancements that happens for over a hundred year, the Jevons Paradox

Jevons paradox - Wikipedia

en.wikipedia.org

Will the lower RAM requirement just mean that they'll scale these AI centers even more and basically cancel out the expectations of RAM availability? It's almost always like that. Their planning based on RAM scarcity is now relaxed, they can scale more.. Although energy is gonna be the immediate limit.

Fingers crossed it lowers prices

Hopefully it helps asap

amigastar · 2026-03-27T15:30:56+0000

I currently have 32gigs of RAM but 64gigs would be good honestly. With Firefox and several apps running in the background 32gigs are still sometimes too little i would say.

Loomy · 2026-03-27T15:48:44+0000

Buggy Loop said:
Google is playing the long game, openAI played the immediate short term game

Yup. OpenAI is operating like a startup. Always spending slightly more than they have so they can get to the next round of funding.

RoboFu · 2026-03-27T15:52:06+0000

but what about my car?

Throttle · 2026-03-27T15:59:10+0000

RoboFu said:
but what about my car?

300gb

intbal · 2026-03-27T16:20:53+0000

At this point in time, I would like to invite everyone to invest in my startup company Proman Technology.
Our intention is to develop a variant "Gray Goo" cyborg bacteria that subsists and replicates through consumption of Hafnium and Strontium.
Initial estimates indicate that after general atmospheric release, we could achieve a 75% global destruction of High-K Dielectric materials. Thereby incapacitating Skynet's basic functionality.
It's a race against the machines, people. Do your part!

_Ex_ · 2026-03-27T16:43:17+0000

Using RAM more efficiently doesn't necessarily mean data centers will buy less RAM, it could mean they will buy more RAM to run even more AI processes but more efficiently.

Autistic_Pancake · 2026-03-27T16:58:14+0000

This only concerns the KV-cache quantization.

So let's say we have LLM that occupies 500GB RAM+VRAM or just VRAM. When we load it, we set a certain context window size (like 100k tokens). That's what this "turbo-mumbo-jumbo" aims to reduce in size. So, if it was 500GB + 50GB initially, with this thing it would get down to 500GB + 20GB or something like that.

Long story short, no, this doesn't solve anything unless they find a way to quantize the models themselves aggressively without lobotomizing them so much.

Hemingwayoffbase · 2026-03-27T17:31:27+0000

Pied Piper finally coming to market!

Support NeoGAF

RAM-pocalypse to smooth out?

Gold Member

Member

Gold Member

Member

I'm retarded?

Gold Member

Parody of actual AJUMP23

Gold Member

Gold Member

Member

Gentlemen, we can rebuild it. We have the capability to make the world's first enhanced store. Steam will be that store. Better than it was before.

Gold Member

Member

Gold Member

Gold Member

Member

Member

Member

Member

Gold Member

Member

RAM-pocalypse to smooth out?

Marlboro: Other M

Gold Member

Gold Member

Member

Ask me about my wife's perfect butthole

Gold Member

Gold Member

Rodent Whores

Gold Member

Member

Member

Member

Member

Member

Thinks Microaggressions are Real

Member

Gold Member

Or is it just one of Phil's balls in my throat?

Gold Member

Gold Member

Member

Member

Thinks Microaggressions are Real

One of the green rats

Member

Member

Member

Member

Member

Similar threads