
RAM-pocalypse to smooth out?

Buggy Loop

Gold Member
Google's turboquant is a compression algorithm for LLMs, and it just changed the game. For some tasks it reduces memory usage by a factor of 6 while boosting performance by a factor of 8.
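For anyone curious what "compression" means here: the details of Google's method aren't in this thread, so this is just a generic weight-quantization sketch showing where a big memory reduction can come from. The `quantize_int4` helper and all numbers are illustrative, not turboquant itself.

```python
import numpy as np

def quantize_int4(weights):
    """Map float32 weights to 4-bit integer codes plus a per-tensor scale."""
    scale = np.abs(weights).max() / 7.0  # int4 codes span roughly [-8, 7]
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction of the original weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_int4(w)
# float32 is 4 bytes per weight; packed 4-bit codes would be 0.5 bytes,
# i.e. an 8x size reduction before any further tricks
print(w.nbytes / (len(q) * 0.5))  # 8.0
```

Real schemes quantize per-group rather than per-tensor and keep accuracy-sensitive layers in higher precision, but the memory arithmetic is the same idea.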


The MLX creator already implemented it and had interesting results




Has the stock market already reacted to it?



Could even gaming GPUs benefit from this algorithm and save on VRAM requirements in every implementation of neural rendering?

Now comes the question that follows every such advancement, as it has for over a hundred years: the Jevons Paradox.


Will the lower RAM requirement just mean that they'll scale these AI centers even more and basically cancel out any expectation of better RAM availability? It's almost always like that. Their planning based on RAM scarcity is now relaxed, so they can scale more. Although energy is gonna be the immediate limit.

Fingers crossed it lowers prices

 
Maybe in the long run, but that just means inference is cheaper, so they can offer even more. I doubt it will change the current RAM issues, not to mention ridiculous NAND costs or wafer price increases.
 
That's like saying human greed will disappear
 
My fear is that companies will just use this to make bigger models, not to reduce memory usage.

Yea, more than likely, it's why I mentioned the Jevons paradox.

US companies will be limited by energy in the immediate future because there are no grand projects for adding power; they're even cancelling some because "offshore" is deemed bad by a certain man (look how Europe is scaling offshore stations by comparison).

China will be in a completely different place energy-wise.
 
My fear is that companies will just use this to make bigger models, not to reduce memory usage.

Well, that's a given. Suddenly you've got 6 times more RAM? Increase the parameters to a trillion!

If demand falls, RAM makers will probably cut supply to keep prices high until their investments pay off. Most of these fabs are fully automated, so they don't need to fire people like other industries; they just need to slow the bots down or turn them off altogether.
 
Yea, more than likely, it's why I mentioned the Jevons paradox.

US companies will be limited by energy in the immediate future because there are no grand projects for adding power; they're even cancelling some because "offshore" is deemed bad by a certain man (look how Europe is scaling offshore stations by comparison).

China will be in a completely different place energy-wise.

On the other hand, it might be cool to be able to run a 96GB model on my home PC.
 
It will only get worse.

We were gaming with gold and had no idea it was gold.

Now people are using it like gold.

There's no way to increase production enough in the foreseeable future.

And demand for AI hardware will only grow.

High-end gaming will continue to get more and more expensive. No end in sight.
 
On the other hand, it might be cool to be able to run a 96GB model on my home PC.
Yeah, running local models kind of sucks nowadays unless you have multiple 5090s, Pro 6000 series cards, or some crazy Mac Studio M3 Ultra config.

If that memory compression technique works, then a lot more capable models can be run locally (including on modern phones).
 

RAM-pocalypse to smooth out?

Google's turboquant is a compression algorithm for LLMs, and it just changed the game. For some tasks it reduces memory usage by a factor of 6 while boosting performance by a factor of 8.
I'd say it won't smooth out. Instead, if they reduce memory usage by a factor of 6, they'll maybe train stuff 6x bigger, or train 6x more stuff.
 
Yea, more than likely, it's why I mentioned the Jevons paradox.

US companies will be limited by energy in the immediate future because there are no grand projects for adding power; they're even cancelling some because "offshore" is deemed bad by a certain man (look how Europe is scaling offshore stations by comparison).

China will be in a completely different place energy-wise.

Another cool thing that might happen is reducing the processing time of things like DLSS4, FSR4 and XeSS.
Imagine the GPU taking 2ms for the upscaling pass. Estimates place around 70% of that time in ML calculations, so 1.4ms.
Now imagine reducing that by 8 times, to about 0.18ms. This could make all these upscalers almost completely "free performance".
And a similar thing for Frame Generation and DLSS5.
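The arithmetic above checks out if you run it in integer microseconds; all figures are the post's illustrative numbers, not measurements:

```python
total_us = 2000               # 2 ms upscaling pass, in microseconds
ml_us = total_us * 70 // 100  # ~70% of it is ML inference -> 1400 us (1.4 ms)
fast_ml_us = ml_us // 8       # hypothetical 8x speedup -> 175 us (~0.18 ms)
pass_us = (total_us - ml_us) + fast_ml_us
print(fast_ml_us, pass_us)    # 175 775
```

So the full pass would still cost ~0.78 ms, since the non-ML 0.6 ms doesn't benefit from the speedup.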
 
lol no. It's not just RAM, it's everything. Micron is out of consumer business entirely, Samsung and Hynix are the only two left and they're sold up through '26 and beyond.
OpenAI's Stargate is accounting for 40% of global DRAM supply alone.
The LPDDR ripple effect is hammering lower-level tech and will continue to do so for years (goodbye cheap ultrabooks, cell phones, etc.).

The list goes on and on. All of the manufacturers of hardware now view consumer products as a waste of money and time. Why cater to a very small percentage of buyers, who have disparate hardware, who whine and need support, who require marketing dollars, when you can just rake in billions from government and private companies who demand none of those things???

This is going to get way, way worse before it gets better. IF it gets better. 5 years from now we'll all be renting time to play games via streaming because we have no other alternative.
 
lol no. It's not just RAM, it's everything. Micron is out of consumer business entirely, Samsung and Hynix are the only two left and they're sold up through '26 and beyond.
OpenAI's Stargate is accounting for 40% of global DRAM supply alone.
The LPDDR ripple effect is hammering lower-level tech and will continue to do so for years (goodbye cheap ultrabooks, cell phones, etc.).

The list goes on and on. All of the manufacturers of hardware now view consumer products as a waste of money and time. Why cater to a very small percentage of buyers, who have disparate hardware, who whine and need support, who require marketing dollars, when you can just rake in billions from government and private companies who demand none of those things???

This is going to get way, way worse before it gets better. IF it gets better. 5 years from now we'll all be renting time to play games via streaming because we have no other alternative.
5 years from now is unpredictable. Even 1 year ago the current crisis wasn't predicted.
 
Another cool thing that might happen is reducing the processing time of things like DLSS4, FSR4 and XeSS.
Imagine the GPU taking 2ms for the upscaling pass. Estimates place around 70% of that time in ML calculations, so 1.4ms.
Now imagine reducing that by 8 times, to about 0.18ms. This could make all these upscalers almost completely "free performance".
And a similar thing for Frame Generation and DLSS5.

All the neural features we see coming to GPUs also take a lot of VRAM. This algorithm could not only speed them up but also lower their VRAM use.

Combine this with neural texture compression cutting ~90% of texture VRAM and bandwidth requirements, and there's a path forward for optimizing all this back to sensible memory requirements.
 
5 years from now is unpredictable. Even 1 year ago the current crisis wasn't predicted.
You're right, 5 years out is a little tough to see, but this year and next (with some manufacturers saying '28 is sold out) seem pretty locked in.

I hope my doom and gloom doesn't happen, that this is just a blip, but with companies raking it in at the expense of consumers... just feels bad.

What do I care, though, if I stopped buying games now I'd have enough backlog to last me the rest of my life most likely :messenger_tears_of_joy:
 
I doubt this will actually help, though, as I imagine these companies are locked into contracts at the inflated prices. Also, I think I remember reading some even paid in advance.
 
Nothing like a good crisis to spur development.
Like how the oil crisis of the 1970s gave us energy-efficient cars, and the Space Race or WW2 pushed technological development through the roof.
 
Stock market is reacting to temporary uncertainty.

This doesn't mean price/availability is going to improve for us. Just that the people hoarding it now will be able to hoard more and do more with it. They're bottlenecked as well right now.


But the funniest part of this timeline is that Google was ahead in the research for all this AI work, tripped on their own dicks when they didn't have a chatbot out at the same time as ChatGPT, and will still most likely end up ahead at the end of the day.

Dread it, run from it, Google comes out on top all the same.
 
No need, because the downward pressure isn't there, not even from consumers. We are buying hardware that is more expensive years after release. If Sony is able to sell the PS6 at $700 and then sell it a year later at $800, the pressure is off. Companies are willing to pay the high RAM prices. No need to negotiate.
 
Stock market is reacting to temporary uncertainty.

This doesn't mean price/availability is going to improve for us. Just that the people hoarding it now will be able to hoard more and do more with it. They're bottlenecked as well right now.


But the funniest part of this timeline is that Google was ahead in the research for all this AI work, tripped on their own dicks when they didn't have a chatbot out at the same time as ChatGPT, and will still most likely end up ahead at the end of the day.

Dread it, run from it, Google comes out on top all the same.

Google is playing the long game; OpenAI played the immediate, short-term game.

Google is spread across so much R&D that they'll inevitably end up with the most game changers.

Another one of their research projects:


Bypassing LLM limits to give the model human-like memory handling, where it forgets the unimportant stuff and introduces long-term memory.
 
They will just do more stuff with the memory they have. I doubt their appetite for it will get any lower.

Yup, but as discussed with winjer, it could also greatly benefit all the gaming neural rendering coming down the line.

Lower inference time, lower VRAM requirements. DLSS, frame gen, ray reconstruction, (gasp) DLSS 5 (gasp)...

You could slap neural texture compression on top with less of a performance hit and keep bandwidth and VRAM requirements at a fraction of what they used to be.

 
Google's turboquant is a compression algorithm for LLMs, and it just changed the game. For some tasks it reduces memory usage by a factor of 6 while boosting performance by a factor of 8.


The MLX creator already implemented it and had interesting results




Has the stock market already reacted to it?



Could even gaming GPUs benefit from this algorithm and save on VRAM requirements in every implementation of neural rendering?

Now comes the question that follows every such advancement, as it has for over a hundred years: the Jevons Paradox.


Will the lower RAM requirement just mean that they'll scale these AI centers even more and basically cancel out any expectation of better RAM availability? It's almost always like that. Their planning based on RAM scarcity is now relaxed, so they can scale more. Although energy is gonna be the immediate limit.

Fingers crossed it lowers prices


Hopefully it helps asap
 
I currently have 32 gigs of RAM, but 64 gigs would be good, honestly. With Firefox and several apps running in the background, 32 gigs is still sometimes too little, I would say.
 
At this point in time, I would like to invite everyone to invest in my startup company Proman Technology.
Our intention is to develop a variant "Gray Goo" cyborg bacteria that subsists and replicates through consumption of Hafnium and Strontium.
Initial estimates indicate that after general atmospheric release, we could achieve a 75% global destruction of High-K Dielectric materials. Thereby incapacitating Skynet's basic functionality.
It's a race against the machines, people. Do your part!
 
Using RAM more efficiently doesn't necessarily mean data centers will buy less RAM; it could mean they'll buy more RAM to run even more AI processes, just more efficiently.
 
This only concerns KV-cache quantization.

So let's say we have an LLM that occupies 500GB of RAM+VRAM, or just VRAM. When we load it, we set a certain context window size (like 100k tokens). That's what this "turbo-mumbo-jumbo" aims to shrink. So if it was 500GB + 50GB initially, a ~6x reduction would get it down to 500GB + ~8GB or so.

Long story short: no, this doesn't solve anything unless they find a way to quantize the models themselves aggressively without lobotomizing them so much.
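As a rough sanity check on why the KV cache is the smaller term, here's a hedged sketch of how its size scales. Every parameter below (layer count, KV head count, head dimension, context length) is a made-up illustration, not a real model's config:

```python
# Per token, each transformer layer stores one key and one value
# vector per KV head, hence the factor of 2.
def kv_cache_gb(layers, kv_heads, head_dim, tokens, bytes_per_elem):
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem / 1e9

fp16 = kv_cache_gb(80, 8, 128, 100_000, 2)    # 16-bit cache
int4 = kv_cache_gb(80, 8, 128, 100_000, 0.5)  # ~4-bit cache
print(round(fp16, 1), round(int4, 1))  # 32.8 8.2
```

So quantizing the cache from 16-bit to ~4-bit saves real memory at long contexts, but the model weights themselves dominate unless they get quantized too.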
 