
RAM-pocalypse to smooth out?

This only concerns the KV-cache quantization.

So let's say we have an LLM that occupies 500GB of RAM+VRAM, or just VRAM. When we load it, we also set a certain context window size (like 100k tokens), and the KV cache for that context takes additional memory on top of the weights. That's what this "turbo-mumbo-jumbo" aims to reduce in size. So if it was 500GB + 50GB initially, with this thing it would get down to 500GB + 20GB or something like that.
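Rough back-of-envelope, with made-up layer/head counts (not any specific model's): the KV cache grows linearly with context length and with bytes per element, which is why quantizing just the cache can cut that "+50GB" part so much while the 500GB of weights stays put.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem):
    # 2x because K and V are stored separately for every layer
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical model shape: 80 layers, 8 KV heads, head dim 128, 100k context
fp16 = kv_cache_bytes(80, 8, 128, 100_000, 2)    # 16-bit baseline
int4 = kv_cache_bytes(80, 8, 128, 100_000, 0.5)  # 4-bit quantized cache
print(f"fp16 KV cache: {fp16 / 1e9:.1f} GB")  # fp16 KV cache: 32.8 GB
print(f"int4 KV cache: {int4 / 1e9:.1f} GB")  # int4 KV cache: 8.2 GB
```

Same ballpark as the 50GB-to-20GB example: the weights are untouched, only the per-context overhead shrinks.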

Long story short: no, this doesn't solve anything unless they find a way to quantize the models themselves aggressively without lobotomizing them in the process.

Beat me to it.

It's playing with semantics when they say "reduction in model size". From the announcement:
TurboQuant is a compression method that achieves a high reduction in model size with zero accuracy loss, making it ideal for supporting both key-value (KV) cache compression and vector search. It accomplishes this via two key steps:

It just allows for a larger context window - which means analysing the whole repo instead of a couple of files, for instance.
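To put a number on that: at a fixed VRAM budget for the cache, the usable context grows in proportion to how few bits each cached element takes. A sketch with made-up model dimensions, purely illustrative:

```python
def max_context_tokens(kv_budget_bytes, layers, kv_heads, head_dim, bits_per_elem):
    # Bytes of KV cache consumed per token: K and V (the 2x) across all layers/heads
    per_token_bytes = 2 * layers * kv_heads * head_dim * bits_per_elem // 8
    return kv_budget_bytes // per_token_bytes

budget = 50 * 10**9  # 50 GB set aside for the KV cache
print(max_context_tokens(budget, 80, 8, 128, 16))  # fp16:  152587 tokens
print(max_context_tokens(budget, 80, 8, 128, 4))   # 4-bit: 610351 tokens
```

Dropping from 16-bit to 4-bit quadruples the context you can fit in the same memory - whole repo instead of a couple of files.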
 
In the not-so-distant future:
 
If companies are just gonna increase their models' parameter counts, can somebody explain to me why Google doesn't keep this tech to themselves to have a huge advantage over their competitors?

Edit: already cleared up by GAFers who know shit
 
As others have said, it's unlikely to reduce demand. We are approaching the singularity for AI. The industry can't make enough chips. Terafab etc
 
Yup, but as discussed with winjer, it could also greatly benefit all the gaming neural rendering coming down the line.

Lower inference time, lower VRAM requirements. DLSS, frame gen, ray reconstruction, (gasp) DLSS 5 (gasp)...

You could slap on neural texture compression with less of a performance hit and keep bandwidth and VRAM requirements a fraction of what they used to be.

None of the realtime ML models that run on GPUs use a KV cache.
 
Has the stock market already reacted to it?



If it can reduce the memory needs to any degree for some of these companies, so they can do more with less, that's a win for consumers getting access... but I suspect this year (and maybe the next) will still be a wash.

That said, it's beneficial for people running AI locally when it comes to context window size, and if the benefits trickle down to Nvidia's AI use in gaming, that's nice.
 
This is neat technology, but the idea that it will have any impact on RAM prices is a complete pipe dream. It's wishful thinking at best.

Economics drives RAM pricing, and the competition between the AI datacentre hyperscalers (Microsoft, OpenAI, Google and Amazon) is not going to be eased by a nifty software trick. At most it lets Google better utilize their existing hardware, but that only acts as a force multiplier on their datacentre expansion plans. It in no way will make them want to rein in those plans, as doing so could place them at a massive competitive disadvantage in the market vs their competitors.

SK Hynix, Micron and Samsung all have multiple new fabs under construction, set to start producing wafers in 2028. So it could optimistically be 2030 before supply has a chance to start catching up with demand. But even then, since only Samsung currently still makes DRAM dies for consumer products, with the rest focused entirely on HBM for AI datacentres, prices for consumer DRAM may never return to where they were.

What the semiconductor market needs is a new memory product for consumer computing devices, like ReRAM or phase-change memory such as 3D XPoint. These were previously too expensive for the consumer market, but with DRAM shitting the bed, these alternative technologies start to look way more appealing.

For consumer electronics pricing to really return to normal, we need something radical to disrupt the market, like graphene-based memory cells, compute-in-memory chip designs, or some other technology that would provide an orders-of-magnitude difference in memory performance for AI workloads.
 
This whole RAM thing will keep happening. Even if prices drop, something else will come along eventually and they will go back up. This industry is too top-heavy; as long as there are so few players in manufacturing, these things will always happen. The demand is just too much.

What the semiconductor market needs is a new memory product for consumer computing devices, like ReRAM or phase-change memory such as 3D XPoint. These were previously too expensive for the consumer market, but with DRAM shitting the bed, these alternative technologies start to look way more appealing.

For consumer electronics pricing to really return to normal, we need something radical to disrupt the market, like graphene-based memory cells, compute-in-memory chip designs, or some other technology that would provide an orders-of-magnitude difference in memory performance for AI workloads.
What we need is just more supply. There are literally only like 3 companies manufacturing RAM and NAND flash, and only like one company making 2nm processors. And that company's 2nm allotment for 2026 is already sold out to just two companies.

No point in having any new radical RAM tech only to end up with only one company making it or something.
 
What we need is just more supply.

Well duh!

It's easy to say. But in practice, one new fab isn't going to solve the global consumer DRAM supply issues, and a single fab takes years and billions of dollars of investment to build. Combine that with the fact that AI adoption is still in its infancy, so demand is expected to keep increasing rather than stay static, and there just aren't enough companies willing to build out new fabs at a pace that would address the growing demand into the future.

If we wanna see RAM prices return to sanity in anything approaching the near term, we need radical new RAM technology that can bifurcate the demand: AI demand goes after the fancy new technology while consumer demand sticks with traditional DRAM (or vice versa), so the two markets are no longer competing for the same supply.

There are literally only like 3 companies manufacturing RAM and NAND flash,

Not for consumer products anymore. Only Samsung serves the consumer market now. So even if all three (SK Hynix, Micron and Samsung) build out new fabs, which they are, with production ready in 2028, only one company's fabs will continue serving the consumer market, and that's nowhere near enough.

No point in having any new radical RAM tech only to end up with only one company making it or something.

Why would only one company make it?

If a radical new memory technology is developed and commercially demonstrated, a new standard spec will be created for it (just like DRAM, HBM, et al.), and all RAM producers will be able to mass-produce products based on the new technology.

If we're talking about something like compute-in-memory designs, it won't even be the RAM producers making it. It will be the logic fabs like TSMC, GlobalFoundries, Samsung and Intel. That would take massive pressure off the DRAM market.
 