Support NeoGAF

ethomaz · Nov 12, 2021

Probably fake? I don't know.

BTW the nVidia FLOPS needs a explanation.

Until Turing, nVidia SP could process INT + FP32 at the same unit... so no matter what you could at any time use all the SP units for FP32.

With Ampere, nVidia SP could process either INT + FP32 or FP32 + FP32... in simple terms nVidia changed the INT part of the unit to process FR32 too... so in easy terms it can do now INT or FP32 + FP32... that generate double the TFs from Turing to Ampere with the same number of the units and clock.

But there is a catch here... if you are using the INT or FP32 part to INT you can't use to FP32... you can do one of these.

So the TFs are a big misleading.

Let's take some examples using the Boot Clock (Avg):

RTX 3080:
FP32 free to use all the time: 15TFs
FP32 to use when not using for INT: 15TFs

RXT 2080:
FP32 free to use all the time: 10TFs

The difference is around 5TFs only but if you are not using the INT to do INT ou can do FP32 boosting the TFs difference to 20TFs... the issue is that 15TFs are used either to INT or FP32... so it is of very limited use... you still have to do INT math in games so what is not used for INT will be used for FP32 but that depend of game, engine, etc... in pratical terms it won't be anywhere near to 30TFs being used for FP32.

THE DUCK · Nov 12, 2021

DonkeyPunchJr said:
There should be an official "bitching about how it's hard to buy a GPU" thread. You don't need to shit post about it every time some new tidbit of information comes out about some rumored future product.

How about you just let us complain about a real issue and move on. Sorry about your sensitivity.

Makoto-Yuki · Nov 12, 2021

i would care if there was any chance of buying one and if they didn't output similiar heat as the sun.

i remember buying a "high end" GPU for £220 lmao and when a 500W psu was "overkill".

DonkeyPunchJr · Nov 12, 2021

dvdvideo said:
How about you just let us complain about a real issue and move on. Sorry about your sensitivity.

Okay hope it makes you feel better. Thanks for shitting up yet another thread with low effort repetitive noise.

THE DUCK · Nov 12, 2021

DonkeyPunchJr said:
Okay hope it makes you feel better. Thanks for shitting up yet another thread with low effort repetitive noise.

Your the one turning one post into 4, the irony is massive. The only thing low effort here is your weak attempt to control what people say on this board.

LiquidMetal14 · Nov 12, 2021

b0uncyfr0 said:
Can we do 4k 120hz yet..?

Not without concessions.

On a 5900x/3090 I'm in the 80-120+ range maxed.

I suspect these new cards will be the thing to carry you through on the high end for a few years.

Kamina · Nov 12, 2021

So no stock at 2k+ €/$/£

TheThreadsThatBindUs · Nov 12, 2021

Bo_Hazem said:
Would love to see 4K@120fps with path-tracing then! Massive jump.

85 TFLOPs will be barely enough for 1080p 30fps with full Monte Carlo path-tracing.

BusierDonkey · Nov 12, 2021

UnNamed said:
How many Terra?

$3500 worth right here:

Or you could get a 4090 for the same price...

TheThreadsThatBindUs · Nov 12, 2021

SantaC said:
Things are ramping up. 2022 will be very interesting techvise.

These are just rumors, but if things are true then Intel will get a hard time to enter the game.

How the fuck do you feed a 76TFLOP shader array with only a 256-bit bus?

Even at 18 Gbit/s per chip, on a 256-bit bus that's only 576 GB/s bandwidth. That's 29% higher than the PS5 with its 10 TFLOPs GPU, even with 512MB of Infinity Cache I'd be surprised if that cache amount is enough to offset the minuscule bandwidth availability relative to GPU performance.

I can easily see these cards being bandwidth starved.

HBM just hasn't come down in price enough and it will hurt high-end GPUs and potentially next-gen consoles unless it does.

iHaunter · Nov 12, 2021

DeepEnigma said:
And you won't ever be able to buy one MSRP!

I walked into Microcenter and scooped up a 3080 Ti.

DeepEnigma · Nov 12, 2021

iHaunter said:
I walked into Microcenter and scooped up a 3080 Ti.

DonkeyPunchJr · Nov 12, 2021

TheThreadsThatBindUs said:
How the fuck do you feed a 76TFLOP shader array with only a 256-bit bus?

Even at 18 Gbit/s per chip, on a 256-bit bus that's only 576 GB/s bandwidth. That's 29% higher than the PS5 with its 10 TFLOPs GPU, even with 512MB of Infinity Cache I'd be surprised if that cache amount is enough to offset the minuscule bandwidth availability relative to GPU performance.

I can easily see these cards being bandwidth starved.

HBM just hasn't come down in price enough and it will hurt high-end GPUs and potentially next-gen consoles unless it does.

Supposedly the cache hit rate is 58% at 4K for 6000 series. Quadruple the cache should make a big improvement. I think it'll have plenty of bandwidth.

thicc_girls_are_teh_best · Nov 12, 2021

Jeebus Cwist what game needs that many TFLOPs?

Also these top-end cards gonna be hitting 500 watt TDP easy (once you add in the memory, of course).

thicc_girls_are_teh_best · Nov 12, 2021

P.S: FWIW TF isn't even what I'd pay too much attention to here, because those numbers are just looking excess for sake of excess (good for 3D rendering farms tho I'm sure).

E.G a 3090 has over 3x (300%) the TF of a PS5 but barely 33% more pixel fillrate throughput. Take a guess which one is more relevant for gaming perf (and also FWIW, due to Ampere arch the 3090 wasn't really over 2x over 2080 Ti in practice, closer to 50% better in practical terms).

Hoddi · Nov 12, 2021

thicc_girls_are_teh_best said:
E.G a 3090 has over 3x (300%) the TF of a PS5 but barely 33% more pixel fillrate throughput. Take a guess which one is more relevant for gaming perf (and also FWIW, due to Ampere arch the 3090 wasn't really over 2x over 2080 Ti in practice, closer to 50% better in practical terms).

People overestimate pixel fillrate in modern games. You also need higher bandwidth to take advantage of higher fillrate and you wouldn't see a benefit without that. I'm not sure why it's a thing now but it's wrong for the vast majority of games..

Here's a 4k capture of Control that shows this pretty plainly. I've highlighted how loaded the color ROPs are ('CROP Throughput') and you'll also find the summary on the right. In contrast, the pixel and compute shaders ('Warps') are highly loaded almost the entire time.

tusharngf · Nov 12, 2021

I can still sell my 3070 and get something good on launch. Anyways we are going to get monster cards in a year or two. Competition is always good. By 2024 ARIZONA TSMC foundry will be finished. There are more fabs in development now than before due to chip shortage.

Celcius · Nov 13, 2021

Sure it may be powerful... but will it require a nuclear plant to power it, generate more heat than the sun, and be more expensive than Jensen's jacket?

Loxus · Nov 13, 2021

iHaunter said:
I walked into Microcenter and scooped up a 3080 Ti.

I saw a 3070TI today in my Island in the Caribbean, and I was surprised that it's available where I'm from with the chip shortages.

THE DUCK · Nov 13, 2021

DonkeyPunchJr said:
Okay hope it makes you feel better. Thanks for shitting up yet another thread with low effort repetitive noise.

Actually I thought about, you are actually right, it doesn't make sense to complain on every one of these threads. It's my frustration with the lack of buyable product, but its not constructive, my apologies.

Loxus · Nov 13, 2021

TheThreadsThatBindUs said:
How the fuck do you feed a 76TFLOP shader array with only a 256-bit bus?

Even at 18 Gbit/s per chip, on a 256-bit bus that's only 576 GB/s bandwidth. That's 29% higher than the PS5 with its 10 TFLOPs GPU, even with 512MB of Infinity Cache I'd be surprised if that cache amount is enough to offset the minuscule bandwidth availability relative to GPU performance.

I can easily see these cards being bandwidth starved.

HBM just hasn't come down in price enough and it will hurt high-end GPUs and potentially next-gen consoles unless it does.

I kind of wished HBM was pushed more, but it's major con is implementation (interposer and substrate) when compared to GDDR.

But what I don't understand is the memory setup.

256bit bus/32 GB, that's 16 chips.
Or probably 8/4 GB chips?

I'm glad AMD hasn't given up on HBM as it's on their CDNA2 GPUs.

Hopefully it returns to gaming GPUs with HBM3.

I think Sony and probably Microsoft, will utilize HBM in the future as seen by a Sony multi-APU patent.

Fig. 3&5 shows an HBM solution, while fig. 4 looks like a GDDR solution.

Imagine an Infinity Cache + HBM solution though. Insane performance.

lh032 · Nov 13, 2021

Fantastic, here comes the TF flexing again.

Bogeyman · Nov 13, 2021

How many TFLOPS are needed to fully path traced rendering of modern games? Does anyone know of sensible estimates from industry professionals?

SantaC · Nov 13, 2021

DeepEnigma said:

TheThreadsThatBindUs · Nov 13, 2021

Loxus said:
I kind of wished HBM was pushed more, but it's major con is implementation (interposer and substrate) when compared to GDDR.

Yeah, the additional packaging steps required will invariably impact overall yields and volumes.

I'm not sure they'd be able to get 20+ million MCMs per annum with HBM... not currently at least.

Loxus said:
But what I don't understand is the memory setup.

256bit bus/32 GB, that's 16 chips.
Or probably 8/4 GB chips?

I'm not sure if 4GB chips exist yet, but 16 chips in clamshell mode like the PS4 is easily possible.

Loxus said:
I'm glad AMD hasn't given up on HBM as it's on their CDNA2 GPUs.

Hopefully it returns to gaming GPUs with HBM3.

I think Sony and probably Microsoft, will utilize HBM in the future as seen by a Sony multi-APU patent.

Fig. 3&5 shows an HBM solution, while fig. 4 looks like a GDDR solution.

Imagine an Infinity Cache + HBM solution though. Insane performance.

We can only dream.

That said, HBM + infinity cache is redundant when the HBM gives you TB/s worth of bandwidth to memory.

kikkis · Nov 13, 2021

Bogeyman said:
How many TFLOPS are needed to fully path traced rendering of modern games? Does anyone know of sensible estimates from industry professionals?

Probably path traced is not smart approach but tim sweeney has suggested 40 teraflops for realistic. Caveat is that timmy is full of shit.

Alexios · Nov 13, 2021

So, do people think we're gonna get GPU prices come back down to normal like, ever? Like getting a very high end card (just not ti/titan) for ~500? I jumped on my 1080 for 506 after availability issues had just started to become widespread back then (iirc) and it was usually going for higher prices. Those days seem to be behind us forever and ever to be honest. Even when the chip drought ends, if there's no competition as of right now, I don't see Nvidia wanting to sell anything but low end stuff for such prices and keep the high end for like 800 or 1000 or more (I mean, the 3080 launched at like 700 euros, never mind what it skyrocketed to after distribution issues), just not the crazy scalper prices. Has there been any real pushback from big publishers? Like surely if there are no high end GPUs around (to play games, not mine) what's the reason to dev/release games that utilize them other than a deal with Nvidia to include RTX or whatever? Just go low end, keep your games cross generation on console too since they have their own issues in the "next gen" there so that you reach the most people possible with your game should be a valid strategy, no? RTX maxed out videos will only be impressive for so long before people realize they won't be getting that at home, no? Especially since issues have lasted way longer than expected, it could result in a blow for the platform if games keep increasing their requirements just because way better hardware theoretically exists but can't be bought by most gamers who were buying high end cards up until a couple years ago but now find their hobby turning to some deluxe premium pricing tiers as the baseline with anything less not offering a very good experience or even much of an upgrade.

Haggard · Nov 14, 2021

Alexios said:
So, do people think we're gonna get GPU prices come back down to normal like, ever? Like getting a very high end card (just not ti/titan) for ~500?

Very high end for 500 hasn't been normal since the 5xx gen...
That will never come back. The sheer complexity of the tech has multiplied over the years.
And the only chance for prices to get into regions with non-scalper-profit margins again is fierce competition and production levels where it is not guaranteed that every card that leaves the factory is sold immediately.
We're at least another year or maybe even two away from that judging by what we heard from the industry.

Midn1ght · Nov 14, 2021

Scalpers and pandemic aside, it's the lack of control over retail msrp that is infuriating at least here in EU.

Consoles are super hard to find in stores just like graphic cards, but if you're lucky, you'll get the thing at the normal price.
The worst that can happen is them forcing you to buy a bundle with an extra controller or a game, still shitty but you're paying the normal price for what you get.

GPU in stores? They just inflated the prices by 300 - 400 Euros just because.

Fuck them.

BattleScar · Nov 14, 2021

ethomaz said:
Probably fake? I don't know.

BTW the nVidia FLOPS needs a explanation.

Until Turing, nVidia SP could process INT + FP32 at the same unit... so no matter what you could at any time use all the SP units for FP32.

With Ampere, nVidia SP could process either INT + FP32 or FP32 + FP32... in simple terms nVidia changed the INT part of the unit to process FR32 too... so in easy terms it can do now INT or FP32 + FP32... that generate double the TFs from Turing to Ampere with the same number of the units and clock.

But there is a catch here... if you are using the INT or FP32 part to INT you can't use to FP32... you can do one of these.

So the TFs are a big misleading.

Let's take some examples using the Boot Clock (Avg):

RTX 3080:
FP32 free to use all the time: 15TFs
FP32 to use when not using for INT: 15TFs

RXT 2080:
FP32 free to use all the time: 10TFs

The difference is around 5TFs only but if you are not using the INT to do INT ou can do FP32 boosting the TFs difference to 20TFs... the issue is that 15TFs are used either to INT or FP32... so it is of very limited use... you still have to do INT math in games so what is not used for INT will be used for FP32 but that depend of game, engine, etc... in pratical terms it won't be anywhere near to 30TFs being used for FP32.

Allow me to offer some clarirfications.

All Nvidia architectures with unified shaders (i.e Fermi onwards) up to Turinghave had vector ALUs which can EITHER run 1 FP32 operation OR 1 INT32 operation (its actually a bit more complicated than that because of fused multiply-add instructions but whatever). What this means is that the CUDA core (which is Nvidia speak for Vector ALU) could do only run integer math or floating point math sequentially.
This can be represented as

[ FP / INT ]

Turing broke this by adding a separating the FP and INT pathways, so they could both operate asynchronously (i.e at the same time).
This can be represented as.

[ FP ] + [ INT ]

Ampere upgraded Turing's concept by improving the INT units to also be able to do FP math, like the original CUDA cores from Fermi through to Pascal.
So what we have here is:

[ FP ] + [ FP / INT ]

This means when there is no integer code running on Ampere, GA102 can spit out 36 TF. If there is INT code blocking the pipline then half it.
Ampere is extremely powerful at compute for this reason.

As for Lovelace, we don't know what changes Nvidia will have made to the SM's. They could separate the INT and FP pipelines again so we end up with an [FP] + [FP] + [INT] setup in the SM, and with 144 SM's that's a shitload of compute throughput.
How well is it fed? Who knows. Not outside the realm of possibility though.

base · Nov 14, 2021

2022 will be out of stock.

Lethal01 · Nov 14, 2021

Bogeyman said:
How many TFLOPS are needed to fully path traced rendering of modern games? Does anyone know of sensible estimates from industry professionals?

I guess I'd say it would be at minimum 70 nvidia flops. Could easily be triple

Anchovie123 · Nov 14, 2021

Not surprised at all from these numbers. TSMC are doing gods work. Moore's Law is alive and well.

BattleScar · Nov 14, 2021

Lethal01 said:
I guess I'd say it would be at minimum 70 nvidia flops. Could easily be triple

I don't think so. That uses dedicated RT hardware from 4x TU102 GPUs. The Raw compute needed from general purpose compute shaders for ray tracing would be astronomically high.

SolidQ · Nov 14, 2021

How many TFLOPS are needed to fully path traced rendering of modern games?

limited path traced you will see soon. Full path traced until Playstation 8 we won't see, do you understand how much power need for like Battlefield 128 vs 128 with full path traced?

ByWatterson · Nov 14, 2021

UnNamed said:
How many Terra?

World of Financial Ruin

Kenpachii · Nov 14, 2021

thicc_girls_are_teh_best said:
Jeebus Cwist what game needs that many TFLOPs?

Also these top-end cards gonna be hitting 500 watt TDP easy (once you add in the memory, of course).

Every game if you push the settings far enough.

Mister Wolf · Nov 14, 2021

Romulus said:
Remember when it mattered to have uber-powerful PCs? And there were games that were made specifically for them? Now we're playing console versions at higher resolutions.

I'll take it. You should see Tales Of Arise running at 4K 120hz.

Kenpachii · Nov 14, 2021

Romulus said:
Remember when it mattered to have uber-powerful PCs? And there were games that were made specifically for them? Now we're playing console versions at higher resolutions.

Yea i remember those great times of buying the top end GPU and slamming it to max settings just to see single digit performance, u had to wait for 3 further generations of hardware to even play it in its full glory. At best u could get 30 fps at average settings. Then a new game came out a year later that was builded for the next series of GPU's which resulted in you having to upgrade pretty much straight up again if you wanted those goodies because nvidia made sure to cripple the old gpu's enough for you to require upgrading ( low v-ram or focusing that new game on a feature that there new card offers )
Have terrible optimisation with it and it was absolute a disaster.

A good example is metro 2033, took what? 6 gpu generations to actually get it going at max settings? hell they had to revamp the entire game for the PS4 to even be able to run it. my 580 which wasn't up for the task even remotely.

Another more modern good example is Cyberpunk, slam it at 4k at ultra settings and watch ampere die in front of you. If last gen consoles where not a thing, that game would probably hit 1080p 30 fps on 3090.

What consoles added for PC gamers
- Baseline, cheap hardware can play games
- Better optimisation
- High framerate gaming
- Higher resolution gaming
- Good performance on any piece of hardware all around.

Honestly i never wanna go back to the old times where they did the shit they did. It was horrible.

Now however its not all sunshine and rainbows, what i do feel like is that games are evolving way to slow, production and making of games need to speed up considerable and we need to be able to actually scale visuals in a more complex manner then just a few more shadows here and there.

Chris_Rivera · Nov 14, 2021

dvdvideo said:
How about you just let us complain about a real issue and move on. Sorry about your sensitivity.

Klik · Nov 14, 2021

If we get 40-50TF AMD GPU in 2022 that would mean we're probably gonna have 40-60TF gpu in PS5 Pro around 2024 which is great, 4x increase and better RT.

Bernd Lauert · Nov 14, 2021

Hoddi said:
I'm not sure why it's a thing now but it's wrong for the vast majority of games..

PS5 beats XSX in pixel fillrate. That's why people keep mentioning it now.

I literally can't remember the last time pixel fillrate was a real bottleneck, outside of very specific and rare cases.

Loxus · Nov 14, 2021

TheThreadsThatBindUs said:
Yeah, the additional packaging steps required will invariably impact overall yields and volumes.

I'm not sure they'd be able to get 20+ million MCMs per annum with HBM... not currently at least.

I'm not sure if 4GB chips exist yet, but 16 chips in clamshell mode like the PS4 is easily possible.

We can only dream.

That said, HBM + infinity cache is redundant when the HBM gives you TB/s worth of bandwidth to memory.

I thought having 16 chips would of gone against the reason for implementing Infinity Cache.
Which is the engineering team's desire to avoid using a super-expensive, and thirsty 512-bit memory bus.

Surly 16 chips with a 256-bit bus can't be that much less expensive and power hungry than going with a 512-bit bus, which is 16 chips also.

With RDNA3, Infinity Cache is used for more than increasing bandwidth.
It also connects the chiplets and other stuff.

Also if your going with 32 GB HBM and want a higher bandwidth, but also less stacks.
You can go Infinity Cache + 2 stacks of HBM @ 2.8 Gbps, instead of 4 stacks.
Which not only reduces cost but also reduces implementation difficulties and much more power efficient than 32 GB of GDDR6.

Bogeyman · Nov 14, 2021

BattleScar said:
I don't think so. That uses dedicated RT hardware from 4x TU102 GPUs. The Raw compute needed from general purpose compute shaders for ray tracing would be astronomically high.

That's very interesting. Sounds higher than I would've thought.

I don't know much about the area. From what I know, in terms of geometric complexity, path tracing scales sublinear (logarithmically) with poly count. Since we can already just about path trace quake2, Minecraft etc, I'd have thought something like 70tf could might getting close.

(Then again, I guess there's a lot more than poly count involved, from material and probably a whole bunch of other factors?!)

Romulus · Nov 14, 2021

Kenpachii said:
Yea i remember those great times of buying the top end GPU and slamming it to max settings just to see single digit performance, u had to wait for 3 further generations of hardware to even play it in its full glory. At best u could get 30 fps at average settings. Then a new game came out a year later that was builded for the next series of GPU's which resulted in you having to upgrade pretty much straight up again if you wanted those goodies because nvidia made sure to cripple the old gpu's enough for you to require upgrading ( low v-ram or focusing that new game on a feature that there new card offers )
Have terrible optimisation with it and it was absolute a disaster.

A good example is metro 2033, took what? 6 gpu generations to actually get it going at max settings? hell they had to revamp the entire game for the PS4 to even be able to run it. my 580 which wasn't up for the task even remotely.

Another more modern good example is Cyberpunk, slam it at 4k at ultra settings and watch ampere die in front of you. If last gen consoles where not a thing, that game would probably hit 1080p 30 fps on 3090.

What consoles added for PC gamers
- Baseline, cheap hardware can play games
- Better optimisation
- High framerate gaming
- Higher resolution gaming
- Good performance on any piece of hardware all around.

Honestly i never wanna go back to the old times where they did the shit they did. It was horrible.

Now however its not all sunshine and rainbows, what i do feel like is that games are evolving way to slow, production and making of games need to speed up considerable and we need to be able to actually scale visuals in a more complex manner then just a few more shadows here and there.

It was case by case. You can come up with examples of PC games that didn't run well at launch, but many that were completely playable and next level going back to the late 90s. The main draw was you could play something not possible on the console. That's basically a thing of the past. No reason we can't have both now, all the console games and the marvels that take full advantage of PC grunt, especially considering the disparity in power that's coming.

thicc_girls_are_teh_best · Nov 14, 2021

Kenpachii said:
Every game if you push the settings far enough.

Except that's not true, unless that game has an engine that relies very heavily on mesh compute. Otherwise, you're seeing giant TF leaps but modest (at best) increases in culling throughput, rasterization throughput, pixel fillrate, texture/texel fillrate etc.

You know, things that are a bit more important for gaming-related performances, at least until mesh shading becomes more universally used in commercial AAA games. But even with that in mind, at most for a long while it's just going to lead to higher-resolution textures and maybe a few more effects. Game budgets will absolutely not scale enough to meaningfully use 75 TF/92 TF whatever of compute power in any way other than as resolution and texture boosters.

Loxus said:
I thought having 16 chips would of gone against the reason for implementing Infinity Cache.
Which is the engineering team's desire to avoid using a super-expensive, and thirsty 512-bit memory bus.

Surly 16 chips with a 256-bit bus can't be that much less expensive and power hungry than going with a 512-bit bus, which is 16 chips also.

With RDNA3, Infinity Cache is used for more than increasing bandwidth.
It also connects the chiplets and other stuff.

Also if your going with 32 GB HBM and want a higher bandwidth, but also less stacks.
You can go Infinity Cache + 2 stacks of HBM @ 2.8 Gbps, instead of 4 stacks.
Which not only reduces cost but also reduces implementation difficulties and much more power efficient than 32 GB of GDDR6.

Those HBM3 specs look kind of low, in fact they look closer to HBMNext which IIRC is more Micron's version of HBM2E that SK-Hynix has had for a few years now. The HBM3 specs I've seen mentioned are closer to 5 Gbps, and one company I think speculated it could reach 7 Gbps per pin.

Here is some more information on more recent HBM3 developments

That being said, they could always clock the pins below spec if it means hitting a certain power budget. But at that point, you have to start weighing if the power savings are worth it over the likely premium HBM3 would bring versus GDDR6/GDDR6X (maybe GDDR7 but I don't think that's coming anytime soon).

TheThreadsThatBindUs · Nov 14, 2021

Loxus said:
I thought having 16 chips would of gone against the reason for implementing Infinity Cache.

16x chips in clamshell mode with a 256-bit bus is merely for doubling capacity, as it doesn't provide any additional memory bandwidth than 8x chips on a 256-but interface.

Infinity Cache is required to offset the need for high memory bandwidth for a GPU of this performance level. As you can store more data locally, closer to the execution cores where you really need it and therefore have to go out to main memory for the needed data much less often.

Loxus said:
Which is the engineering team's desire to avoid using a super-expensive, and thirsty 512-bit memory bus.

A 512-bit bus is expensive because of the silicon footprint, i.e. the PHYs take up significantly more area on the die. This increases the chip size which leads to less dies per wafer and thus lower yields (assuming a given point defect rate). This invariably results in increased costs because the cost is paid per wafer rather than per die, so less useable dies per wafer lead to that $5000 - 7000+ cost per wafer for your bleeding edge 5nm process being spread over fewer dies.

Loxus said:
Surly 16 chips with a 256-bit bus can't be that much less expensive and power hungry than going with a 512-bit bus, which is 16 chips also.

The cost of the APU is the single largest cost contributor, so making the die larger by going with a 512-bit bus will make the GPU overall considerably more expensive than doubling the number of GDDR chips that come in at close to $10-20 per chip.

Loxus said:
With RDNA3, Infinity Cache is used for more than increasing bandwidth.
It also connects the chiplets and other stuff.

No, its primary purpose is not to serve as an interconnect to connect the chiplets. That's Infinity Fabric. Infinity Cache is a hardware-managed final level cache, shared between two chiplets that seeks to minimise the performance impact of inter-communication between separate chiplet dies, as well as communication with the rest of the system, e.g. main memory.

It primarily serves to reduce the need for higher external bandwidth because you can keep more data local to your execution cores. That's what all cache is for.

Loxus said:
Also if your going with 32 GB HBM and want a higher bandwidth, but also less stacks.
You can go Infinity Cache + 2 stacks of HBM @ 2.8 Gbps, instead of 4 stacks.
Which not only reduces cost but also reduces implementation difficulties and much more power efficient than 32 GB of GDDR6.

No argument there. But 2 stacks of HBM alone isn't cheap by any means and may still be outside the realm of affordability for high GPUs and consoles next-gen.

It's a pity that low-cost HBM with fewer TSVs didn't take off.

Loxus · Nov 14, 2021

TheThreadsThatBindUs said:
A 512-bit bus is expensive because of the silicon footprint, i.e. the PHYs take up significantly more area on the die. This increases the chip size which leads to less dies per wafer and thus lower yields (assuming a given point defect rate). This invariably results in increased costs because the cost is paid per wafer rather than per die, so less useable dies per wafer lead to that $5000 - 7000+ cost per wafer for your bleeding edge 5nm process being spread over fewer dies.

128 MB of Infinity Cache takes up about 128mm² on the die. While 512-bit bus takes up about 64mm².
So I don't think it's because of increasing chip size, but you are right about power consumption and cost.

512 MB of Infinity Cache may not be a problem as there are rumours the RDNA3 will have stacked L3 Cache.

TheThreadsThatBindUs said:
No, its primary purpose is not to serve as an interconnect to connect the chiplets. That's Infinity Fabric. Infinity Cache is a hardware-managed final level cache, shared between two chiplets that seeks to minimise the performance impact of inter-communication between separate chiplet dies, as well as communication with the rest of the system, e.g. main memory.

It primarily serves to reduce the need for higher external bandwidth because you can keep more data local to your execution cores. That's what all cache is for.

Well I got that idea from their Active Bridge Chiplet Patent.
A Glimpse of RDNA 3 Graphics Architecture Based GPUs & APUs, AMD Patents Active Bridge Chiplet With Integrated Cache For Multi-Chiplet Designs
The main block diagram of the conceptual design shows a chip featuring multiple chiplets. The CPU portion is connected to the first GPU chiplet via a communication bus (future generation of Infinity Fabric) while the GPU chiplets are interconnected via the active bridge chiplet. This is an on-die bus interface that connects an n-number of GPU chiplets. What's more interesting is that the bridge will also feature an L3 LLC (Last Level Cache) which is coherent and unified across the multiple chiplets, reducing cache bottlenecks. The AMD Active Bridge Chiplet hence allows for the parallel working of the chiplets on existing programming models and reduces the need for having separate L3 caches for each GPU chiplet.

From my understanding,
CPC <-> GPU = Infinity Fabric.
GPU <-> GPU = Active Bridge, which also features L3 Cache.

Just thought I'll say Infinity Cache since it's apart of the bridge.

TheThreadsThatBindUs · Nov 14, 2021

Loxus said:
128 MB of Infinity Cache takes up about 128mm² on the die. While 512-bit bus takes up about 64mm².
So I don't think it's because of increasing chip size, but you are right about power consumption and cost.

512 MB of Infinity Cache may not be a problem as there are rumours the RDNA3 will have stacked L3 Cache.

If RDNA 3 includes multi-chiplets for the GPU, then each chiplet will require its own 512-bit interface, so the total PHY area taken up by the 512-bit bus will be similar to the area of the Infinity Cache. But I guess you have to think that if the IC is required anyway to help with the inter-chiplet comms, the decision has pretty much already been made for you, and you can get away with a smaller 256-bit bus and save on both chiplet area and power consumption costs. It's a win-win.

Loxus said:
Well I got that idea from their Active Bridge Chiplet Patent.
A Glimpse of RDNA 3 Graphics Architecture Based GPUs & APUs, AMD Patents Active Bridge Chiplet With Integrated Cache For Multi-Chiplet Designs
The main block diagram of the conceptual design shows a chip featuring multiple chiplets. The CPU portion is connected to the first GPU chiplet via a communication bus (future generation of Infinity Fabric) while the GPU chiplets are interconnected via the active bridge chiplet. This is an on-die bus interface that connects an n-number of GPU chiplets. What's more interesting is that the bridge will also feature an L3 LLC (Last Level Cache) which is coherent and unified across the multiple chiplets, reducing cache bottlenecks. The AMD Active Bridge Chiplet hence allows for the parallel working of the chiplets on existing programming models and reduces the need for having separate L3 caches for each GPU chiplet.

From my understanding,
CPC <-> GPU = Infinity Fabric.
GPU <-> GPU = Active Bridge, which also features L3 Cache.

Just thought I'll say Infinity Cache since it's apart of the bridge.

This is a nice find. Thanks for sharing. I figured we were discussing Infinity Cache in general, but the above essentially combined IC and IF into a fast inter-chiplet interconnect with integrated stacked SRAM cache. It's a really cool setup, acknowledging the inherent latency challenges of multi-die GPUs.

KuraiShidosha · Nov 14, 2021

The 4090 will be the card to finally get me to leave my trusty 1080 Ti behind. Not even the 3090 was reason enough. After 4 years I demanded more than just 1.8-2.2x performance. This looks like it will be it. Here's hoping we also get one more amazing desktop VR kit so I can have my final build. Just once more.

Support NeoGAF

More next-gen graphics card rumors - AMD FP32 ~75Teraflops | Nvidia FP32 ~ 85Terraflops

Banned

voted poster of the decade by bots

Banned

World’s Biggest Weeb

voted poster of the decade by bots

hide your water-based mammals

Golden Boy

Member

Member

Member

Member

Gold Member

World’s Biggest Weeb

Member

Member

Member

Member

°Temp. member

Member

voted poster of the decade by bots

Member

I cry about Xbox and hate PlayStation.

Banned

Member

Member

Member

Cores, shaders and BIOS oh my!

Banned

Member

Member

Banned

Member

Member

Member

Member

Member

Member

Member

Member

Member

Member

Banned

Member

Banned

Member

Member

Member

Member

Member

Member

Similar threads