
RAM-pocalypse to smooth out?

This only concerns the KV-cache quantization.

So let's say we have an LLM that occupies 500GB of RAM+VRAM, or just VRAM. When we load it, we also set a certain context window size (like 100k tokens), and the KV cache for that context takes additional memory on top of the weights. That's what this "turbo-mumbo-jumbo" aims to reduce in size. So if it was 500GB + 50GB initially, with this thing it would get down to 500GB + 20GB or something like that.
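Rough back-of-envelope, with made-up layer/head counts (not any specific model's): the KV cache grows linearly with context length and with bytes per element, which is why quantizing just the cache can cut that "+50GB" part so much while the 500GB of weights stays put.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem):
    # 2x because K and V are stored separately for every layer
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical model shape: 80 layers, 8 KV heads, head dim 128, 100k context
fp16 = kv_cache_bytes(80, 8, 128, 100_000, 2)    # 16-bit baseline
int4 = kv_cache_bytes(80, 8, 128, 100_000, 0.5)  # 4-bit quantized cache
print(f"fp16 KV cache: {fp16 / 1e9:.1f} GB")  # fp16 KV cache: 32.8 GB
print(f"int4 KV cache: {int4 / 1e9:.1f} GB")  # int4 KV cache: 8.2 GB
```

Same ballpark as the 50GB-to-20GB example: the weights are untouched, only the per-context overhead shrinks.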

Long story short: no, this doesn't solve anything unless they find a way to quantize the models themselves aggressively without lobotomizing them in the process.

Beat me to it.

It's playing with semantics when they say "reduction in model size". From the announcement:
TurboQuant is a compression method that achieves a high reduction in model size with zero accuracy loss, making it ideal for supporting both key-value (KV) cache compression and vector search. It accomplishes this via two key steps:

It just allows for a larger context window - which means analysing the whole repo instead of a couple of files, for instance.
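To put a number on that: at a fixed VRAM budget for the cache, the usable context grows in proportion to how few bits each cached element takes. A sketch with made-up model dimensions, purely illustrative:

```python
def max_context_tokens(kv_budget_bytes, layers, kv_heads, head_dim, bits_per_elem):
    # Bytes of KV cache consumed per token: K and V (the 2x) across all layers/heads
    per_token_bytes = 2 * layers * kv_heads * head_dim * bits_per_elem // 8
    return kv_budget_bytes // per_token_bytes

budget = 50 * 10**9  # 50 GB set aside for the KV cache
print(max_context_tokens(budget, 80, 8, 128, 16))  # fp16:  152587 tokens
print(max_context_tokens(budget, 80, 8, 128, 4))   # 4-bit: 610351 tokens
```

Dropping from 16-bit to 4-bit quadruples the context you can fit in the same memory - whole repo instead of a couple of files.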
 
In the not-so-distant future:
 
If companies are just gonna increase their models' parameter counts, can somebody explain to me why Google doesn't keep this tech to themselves to have a huge advantage over their competitors?

Edit: already cleared up by GAFers who know shit
 
As others have said, it's unlikely to reduce demand. We are approaching the singularity for AI. The industry can't make enough chips. Terafab etc
 
Yup, but as discussed with winjer, it could also greatly benefit all the gaming neural rendering coming down the line.

Lower inference time, lower VRAM requirements. DLSS, frame gen, ray reconstruction, (gasp) DLSS 5 (gasp)...

You could slap on neural texture compression with less of a performance hit and keep bandwidth and VRAM requirements a fraction of what they used to be.

None of the realtime ML models that run on GPUs use a KV cache.
 
Has the stock market already reacted to it?



If it can reduce the memory needs to any degree for some of these companies, so they can do more with less, that's a win for consumers getting access... but I suspect this year (and maybe the next) will still be a wash.

That said, it's beneficial for people running AI locally when it comes to context window size, and if the benefits trickle down to Nvidia's AI use in gaming, that's nice.
 
This is neat technology, but the idea that it will have any impact on RAM prices is a complete pipe dream. It's wishful thinking at best.

Economics drives RAM pricing, and the competition between the AI datacentre hyperscalers (Microsoft, OpenAI, Google and Amazon) is not going to be eased by a nifty software trick. At most it lets Google better utilize their existing hardware, but that only acts as a force multiplier on their datacentre expansion plans. It in no way will make them want to rein in those plans, as doing so could place them at a massive competitive disadvantage in the market vs their competitors.

SK Hynix, Micron and Samsung all have multiple new fabs under construction, set to start producing wafers in 2028. So it could optimistically be 2030 before supply has a chance to start catching up with demand. But even then, since only Samsung currently still makes DRAM dies for consumer products, with the rest focused entirely on HBM for AI datacentres, prices for consumer DRAM may never return to where they were.

What the semiconductor market needs is a new memory product for consumer computing devices, like ReRAM or phase-change memory such as 3D XPoint. These were previously too expensive for the consumer market, but with DRAM shitting the bed, these alternative technologies start to look way more appealing.

For consumer electronics pricing to really return to normal, we need something radical to disrupt the market, like graphene-based memory cells, compute-in-memory chip designs, or some other technology that would provide an orders-of-magnitude difference in memory performance for AI workloads.
 
This whole RAM thing will keep happening. Even if prices drop, something else will come along eventually and they will go back up. This industry is too top-heavy; as long as there are so few players in manufacturing, these things will always happen. The demand is just too much.

What the semiconductor market needs is a new memory product for consumer computing devices, like ReRAM or phase-change memory such as 3D XPoint. These were previously too expensive for the consumer market, but with DRAM shitting the bed, these alternative technologies start to look way more appealing.

For consumer electronics pricing to really return to normal, we need something radical to disrupt the market, like graphene-based memory cells, compute-in-memory chip designs, or some other technology that would provide an orders-of-magnitude difference in memory performance for AI workloads.
What we need is just more supply. There are literally only like 3 companies manufacturing RAM and NAND flash, and only like one company making 2nm processors. And that company's 2nm allotment for 2026 is already sold out to just two companies.

No point in having any new radical RAM tech only to end up with only one company making it or something.
 
What we need is just more supply.

Well duh!

It's easy to say. But in practice, one new fab isn't going to solve the global consumer DRAM supply issues, and a single fab takes years and billions of dollars of investment to build. Combine that with the fact that AI adoption is still in its infancy, so demand is expected to keep increasing rather than stay static, and there just aren't enough companies willing to build out new fabs at a pace that would address the growing demand into the future.

If we wanna see RAM prices return to sanity in anything approaching the near term, we need radical new RAM technology that can bifurcate the demand: AI demand goes after the fancy new technology while consumer demand sticks with traditional DRAM (or vice versa), so the two markets are no longer competing for the same supply.

There are literally only like 3 companies manufacturing RAM and NAND flash,

Not for consumer products anymore. Only Samsung serves the consumer market now. So even if all three (SK Hynix, Micron and Samsung) build out new fabs, which they are, with production ready in 2028, only one company's fabs will continue serving the consumer market, and that's nowhere near enough.

No point in having any new radical RAM tech only to end up with only one company making it or something.

Why would only one company make it?

If a radical new memory technology is developed and commercially demonstrated, a new standard spec will be created for it (just like DRAM, HBM, et al.), and all RAM producers will be able to mass-produce products based on the new technology.

If we're talking about something like compute-in-memory designs, it won't even be the RAM producers making it. It will be the logic fabs like TSMC, GlobalFoundries, Samsung and Intel. That would take massive pressure off the DRAM market.
 