
RAM-pocalypse to smooth out?

Buggy Loop

Google's turboquant is a compression algorithm for LLMs, and it just changed the game: for some tasks it cuts memory usage by a factor of 6 while boosting performance by a factor of 8.
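We only have the headline numbers, but the memory-saving idea behind low-bit quantization is easy to sketch. This is a toy illustration, not Google's actual algorithm; the 4-bit scheme and function names are my own assumptions:

```python
# Toy low-bit weight quantization sketch (NOT Google's actual algorithm):
# store weights as 4-bit integers plus one scale per row, instead of fp16
# per weight, so storage drops from 16 bits toward ~4 bits per weight.
def quantize_int4(row):
    # symmetric quantization of one weight row into the int4 range -7..7
    scale = max(abs(w) for w in row) / 7.0
    q = [max(-7, min(7, round(w / scale))) for w in row]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

row = [0.8, -1.4, 0.05, 2.1, -0.3, 0.0, 1.0, -2.8]
q, s = quantize_int4(row)

# storage: fp16 = 2 bytes/weight; packed int4 = 0.5 bytes/weight + a scale
fp16_bytes = len(row) * 2
int4_bytes = len(row) // 2 + 2
print(q)                          # small integers in [-7, 7]
print(fp16_bytes / int4_bytes)    # ~2.7x here; approaches 4x for long rows
```

The claimed 6x would need more aggressive tricks than this naive scheme, but the storage arithmetic works the same way.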


The MLX creator already implemented it and had interesting results




Has the stock market already reacted to it?



Could gaming GPUs also benefit from this algorithm and save on VRAM in every implementation of neural rendering?

Now comes the question that follows every such advancement, and has for over a hundred years: the Jevons paradox.


Will the lower RAM requirement just mean that they'll scale these AI centers even more and basically cancel out any expectations of RAM availability? It's almost always like that: planning that was based on RAM scarcity is now relaxed, so they can scale more. Although energy is going to be the immediate limit.

Fingers crossed it lowers prices

 
Maybe in the long run, but that just means inference is cheaper so they can offer even more. I doubt it will change the current RAM issues, not to mention ridiculous NAND cost or wafer pricing increases.
 
That's like saying human greed will disappear
 
My fear is that companies will just use this to make bigger models, not to reduce memory usage.

Yeah, more than likely; it's why I mentioned the Jevons paradox.

US companies will be limited by energy in the immediate future because there are no grand projects for adding power; they're even cancelling some because a certain man decided "offshore" is not good (look at how Europe is scaling offshore stations comparatively).

China will be somewhere else completely, energy-wise.
 
My fear is that companies will just use this to make bigger models, not to reduce memory usage.

Well, that's a given: suddenly you've got 6 times more RAM? Increase the parameters to one trillion!

If demand falls, RAM makers will probably cut supply to keep prices high until their investments pay off. Most of these fabs are fully automated, so they don't need to fire people like other industries do; they just need to slow the bots down or turn them off altogether.
 
Yeah, more than likely; it's why I mentioned the Jevons paradox.

US companies will be limited by energy in the immediate future because there are no grand projects for adding power; they're even cancelling some because a certain man decided "offshore" is not good (look at how Europe is scaling offshore stations comparatively).

China will be somewhere else completely, energy-wise.

On the other hand, it might be cool to be able to run a 96 GB model on my home PC.
 
It will only get worse.

We were gaming with gold and had no idea it was gold.

Now people are using it like gold.

There's no way to increase production enough in the foreseeable future.

And demand for AI hardware will only grow.

High-end gaming will continue to get more and more expensive. No end in sight.
 
On the other hand, it might be cool to be able to run a 96 GB model on my home PC.
Yeah, running local models kind of sucks nowadays unless you have multiple 5090s, Pro 6000 series cards, or some crazy Mac Studio M3 Ultra config.

If that memory compression technique works, then a lot more capable models can be run locally (including on modern phones).
 

RAM-pocalypse to smooth out?

Google's turboquant is a compression algorithm for LLMs, and it just changed the game: for some tasks it cuts memory usage by a factor of 6 while boosting performance by a factor of 8.
I'd say it won't smooth out; instead, if they've reduced memory usage by a factor of 6, they'll now maybe train stuff 6x bigger, or train 6x more stuff.
 
Yeah, more than likely; it's why I mentioned the Jevons paradox.

US companies will be limited by energy in the immediate future because there are no grand projects for adding power; they're even cancelling some because a certain man decided "offshore" is not good (look at how Europe is scaling offshore stations comparatively).

China will be somewhere else completely, energy-wise.

Another cool thing that might happen is reducing the processing time of things like DLSS4, FSR4 and XeSS.
Imagine the GPU taking 2ms for the upscaling pass. Estimations place the ML calculations at around 70% of that, so 1.4ms.
Now imagine cutting that ML part by 8 times, to about 0.18ms, so the whole pass drops to roughly 0.8ms. This could make all these upscalers almost completely "free performance".
And a similar thing for Frame Generation and DLSS5.
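Spelling that arithmetic out (the 2ms and 70% figures are the post's assumptions, not measurements):

```python
# Frame-time arithmetic for the upscaler example (assumed numbers):
total_ms = 2.0                 # hypothetical cost of the upscaling pass
ml_ms = 0.70 * total_ms        # ~70% of it is ML inference -> 1.4 ms
fixed_ms = total_ms - ml_ms    # non-ML work (0.6 ms) is unaffected

speedup = 8.0                  # the claimed inference speedup
new_total_ms = fixed_ms + ml_ms / speedup
print(round(new_total_ms, 3))  # 0.775 ms: nearly "free" upscaling
```

Note that only the ML share shrinks; the fixed 0.6ms puts a floor under how cheap the pass can get, which is why the total lands near 0.8ms rather than 2/8 = 0.25ms.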
 
lol no. It's not just RAM, it's everything. Micron is out of consumer business entirely, Samsung and Hynix are the only two left and they're sold up through '26 and beyond.
OpenAI's Stargate is accounting for 40% of global DRAM supply alone.
The LPDDR ripple effect is hammering lower-level tech and will continue to do so for years (goodbye any cheap ultrabooks, cell phones, etc.).

The list goes on and on. All of the manufacturers of hardware now view consumer products as a waste of money and time. Why cater to a very small percentage of buyers, who have disparate hardware, who whine and need support, who require marketing dollars, when you can just rake in billions from government and private companies who demand none of those things???

This is going to get way, way worse before it gets better. IF it gets better. 5 years from now we'll all be renting time to play games via streaming because we have no other alternative.
 
lol no. It's not just RAM, it's everything. Micron is out of consumer business entirely, Samsung and Hynix are the only two left and they're sold up through '26 and beyond.
OpenAI's Stargate is accounting for 40% of global DRAM supply alone.
The LPDDR ripple effect is hammering lower-level tech and will continue to do so for years (goodbye any cheap ultrabooks, cell phones, etc.).

The list goes on and on. All of the manufacturers of hardware now view consumer products as a waste of money and time. Why cater to a very small percentage of buyers, who have disparate hardware, who whine and need support, who require marketing dollars, when you can just rake in billions from government and private companies who demand none of those things???

This is going to get way, way worse before it gets better. IF it gets better. 5 years from now we'll all be renting time to play games via streaming because we have no other alternative.
5 years from now is unpredictable. Even 1 year ago the current crisis wasn't predicted.
 
Another cool thing that might happen is reducing the processing time of things like DLSS4, FSR4 and XeSS.
Imagine the GPU taking 2ms for the upscaling pass. Estimations place the ML calculations at around 70% of that, so 1.4ms.
Now imagine cutting that ML part by 8 times, to about 0.18ms, so the whole pass drops to roughly 0.8ms. This could make all these upscalers almost completely "free performance".
And a similar thing for Frame Generation and DLSS5.

Also, all the neural features we see coming to GPUs take a lot of VRAM. This algorithm could not only speed them up but also lower their VRAM usage.

Combine this with neural texture compression cutting ~90% of texture VRAM and bandwidth requirements, and there's a path forward in optimizing all this where we get back to sensible memory requirements.
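As a back-of-envelope combination of those two claims (the VRAM split below is made up for illustration, and both savings factors are the thread's claims, not measurements):

```python
# Hypothetical VRAM budget (assumed split, not measured):
textures_gb = 8.0        # texture data
nn_gb = 1.5              # neural-rendering model weights/buffers
other_gb = 2.5           # framebuffers, geometry, etc. (unchanged)

textures_after = textures_gb * (1 - 0.90)   # neural texture compression claim
nn_after = nn_gb / 6.0                      # the quantization claim
total_after = textures_after + nn_after + other_gb
print(round(total_after, 2))   # 3.55 GB, down from 12 GB
```

Under those assumptions the untouched "other" category becomes the biggest slice, so how sensible the final number is depends heavily on how much of a real game's VRAM is actually textures and neural buffers.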
 
5 years from now is unpredictable. Even 1 year ago the current crisis wasn't predicted.
You're right, 5 years out is a little tough to see, but this year and next year (with some manufacturers saying they're sold out through '28) seem pretty locked in.

I hope my doom and gloom doesn't happen, that this is just a blip, but with companies raking it in at the expense of consumers... just feels bad.

What do I care, though, if I stopped buying games now I'd have enough backlog to last me the rest of my life most likely :messenger_tears_of_joy:
 
This doesn't help training does it? Isn't that where the vast majority of GPUs / memory is used?

Not for training, no, it's for inference. We'll never know when one discovery leaks improvements into another, but so far it's inference.

For datacenter workloads it's the opposite: roughly 2/3 of AI compute is inference, 1/3 training.
 
I doubt this will actually help, though, as I imagine these companies are locked into contracts at the inflated prices. Also, I think I remember reading some even paid in advance.
 
Nothing like a good crisis to spur development.
Like how the oil crisis of the 1970s gave us energy-efficient cars, or the Space Race and WW2 pushed technological development through the roof.
 
Stock market is reacting to temporary uncertainty.

This doesn't mean price/availability is going to improve for us. Just that the people hoarding it now will be able to hoard more and do more with it. They're bottlenecked as well right now.


But the funniest part of this timeline is that Google was ahead in the research for all this AI work, tripped on their own dicks when they didn't have a chatbot out at the same time as ChatGPT, and will still most likely end up ahead at the end of the day.

Dread it, run from it, Google comes out on top all the same.
 
No need, because the downward pressure isn't there, not even from consumers. We are buying hardware that is more expensive years after release. If Sony is able to sell the PS6 at $700 and then sell it a year later at $800, the pressure is off. Companies are willing to pay the high RAM prices. No need to negotiate.
 
Stock market is reacting to temporary uncertainty.

This doesn't mean price/availability is going to improve for us. Just that the people hoarding it now will be able to hoard more and do more with it. They're bottlenecked as well right now.


But the funniest part of this timeline is that Google was ahead in the research for all this AI work, tripped on their own dicks when they didn't have a chatbot out at the same time as ChatGPT, and will still most likely end up ahead at the end of the day.

Dread it, run from it, Google comes out on top all the same.

Google is playing the long game; OpenAI played the immediate, short-term game.

Google is spread out into so much R&D that inevitably they'll probably have the most game changers.

Another one of their research projects:


Bypassing LLM limits so the model handles memory like a human, forgetting the unimportant stuff and introducing long-term memory.
 
They will just do more stuff with the memory they have. I doubt their appetite for it will get any lower.

Yup, but as discussed with winjer, it could also greatly benefit all the gaming neural rendering coming down the line.

Lower inference time, lower VRAM requirements. DLSS, frame gen, ray reconstruction, (gasp) DLSS 5 (gasp)...

You could slap neural texture compression on top with less of a performance hit and keep bandwidth and VRAM requirements at a fraction of what they used to be.

 
Fingers crossed it lowers prices

Hopefully it helps asap
 
I currently have 32 GB of RAM, but 64 GB would honestly be good. With Firefox and several apps running in the background, 32 GB is still sometimes too little, I would say.
 