AMD Radeon Fury X Series | HBM, Small Form Factor And Water Cooling | June 16th

Because the 980Ti is on the way. If it's anything like the 780Ti was in comparison to the Titan, it will be cheaper and faster.

If I were in Nvidia's position, I wouldn't release it anytime soon, and instead get people to buy the TITAN for $1k to make $300-400 more. AMD doesn't seem like competition at this point. So why not release the 980Ti really late in the fall?
 
If I were in Nvidia's position, I wouldn't release it anytime soon, and instead get people to buy the TITAN for $1k to make $300-400 more. AMD doesn't seem like competition at this point. So why not release the 980Ti really late in the fall?

Faster at 2/3rds of the price is definitely competition.
 
Titan X is already the full GM200. The 980Ti could have higher base clocks though, and perhaps a higher power limit.


4GB is limiting given the resolutions you would want to play at with such hardware.

Do you have the benchmarks to prove how this 4GB of HBM will work? Do you have the frame pacing data? I'd love to see it.
 
Do you have the benchmarks to prove how this 4GB of HBM will work? Do you have the frame pacing data? I'd love to see it.

No matter how high the bandwidth is, you can't make up for the physical storage. We've already seen games hitting memory usage walls at 4K; there is no reason to think HBM will somehow lessen that. For the flagship card, 4GB is too small.
 
There's no point in having that much power when your RAM will bottleneck the moment you push it above 4GB, which would happen frequently.

From this link it seems HBM allows for lower RAM usage.

"You're not limited in this world to any number of stacks, but from a capacity point of view, this generation-one HBM, each DRAM is a two-gigabit DRAM, so yeah, if you have four stacks you're limited to four gigabytes. You could build things with more stacks, you could build things with less stacks. Capacity of the frame buffer is just one of our concerns. There are many things you can do to utilise that capacity better. So if you have four stacks you're limited to four [gigabytes], but we don't really view that as a performance limitation from an AMD perspective."

"If you actually look at frame buffers and how efficient they are and how efficient the drivers are at managing capacities across the resolutions, you'll find that there's a lot that can be done. We do not see 4GB as a limitation that would cause performance bottlenecks. We just need to do a better job managing the capacities. We were getting free capacity, because with [GDDR5] in order to get more bandwidth we needed to make the memory system wider, so the capacities were increasing. As engineers, we always focus on where the bottleneck is. If you're getting capacity, you don't put as much effort into better utilising that capacity. 4GB is more than sufficient. We've had to go do a little bit of investment in order to better utilise the frame buffer, but we're not really seeing a frame buffer capacity [problem]. You'll be blown away by how much [capacity] is wasted."
 
Do you have the benchmarks to prove how this 4GB of HBM will work? Do you have the frame pacing data? I'd love to see it.

I just know how textures are placed in VRAM, is all, and that G-buffers necessarily get fatter and fatter as resolutions increase (as does VRAM usage from MSAA samples). It doesn't take real data to speculate on what a certain amount of VRAM is capable of in spite of its speed.
From this link it seems HBM allows for lower RAM usage.

That is completely unproven and goes against everything I know about what RAM is. Did the bump from GDDR3 to GDDR5 lead to lower RAM usage? No. Perhaps they could do some driver tricks to make sure stuff is streamed in and out of VRAM more often instead of being stored, but that would lead to texture pop-in, or worse, games being completely unplayable in some instances (a 3.5GB-like situation).
 
I just know how textures are placed in VRAM, is all, and that G-buffers necessarily get fatter and fatter as resolutions increase (as does VRAM usage from MSAA samples). It doesn't take real data to speculate on what a certain amount of VRAM is capable of in spite of its speed.

So at 1080p how will this card fare?

I don't plan on upgrading to 4K for a while.
 
I just know how textures are placed in VRAM, is all, and that G-buffers necessarily get fatter and fatter as resolutions increase (as does VRAM usage from MSAA samples). It doesn't take real data to speculate on what a certain amount of VRAM is capable of in spite of its speed.


That is completely unproven and goes against everything I know about what RAM is. Did the bump from GDDR3 to GDDR5 lead to lower RAM usage? No. Perhaps they could do some driver tricks to make sure stuff is streamed in and out of VRAM more often instead of being stored, but that would lead to texture pop-in, or worse, games being completely unplayable in some instances (a 3.5GB-like situation).

Then let us look forward to the benchmarks and see how this all plays out.

So at 1080p how will this card fare?

I don't plan on upgrading to 4K for a while.


It'll be a monstrous beast at 1080p.
 
Because the 980Ti is on the way. If it's anything like the 780Ti was in comparison to the Titan, it will be cheaper and faster.

Do we have a launch date on the 980Ti, and any guesses as to the price point (lower or higher than the 390X)?
 
I just know how textures are placed in VRAM, is all, and that G-buffers necessarily get fatter and fatter as resolutions increase (as does VRAM usage from MSAA samples). It doesn't take real data to speculate on what a certain amount of VRAM is capable of in spite of its speed.


That is completely unproven and goes against everything I know about what RAM is. Did the bump from GDDR3 to GDDR5 lead to lower RAM usage? No. Perhaps they could do some driver tricks to make sure stuff is streamed in and out of VRAM more often instead of being stored, but that would lead to texture pop-in, or worse, games being completely unplayable in some instances (a 3.5GB-like situation).

It seems something they did with the card (or driver?) allowed them to lower the size of the framebuffer.
 
So at 1080p how will this card fare?

I don't plan on upgrading to 4K for a while.

I think it would be overkill at 1080p. Hell, even a 290X is overkill. But yeah, it will destroy anything you throw at it. I'm currently playing at 1080p and I'm considering going for a dual 290X setup instead, just because of how cheap they're going to be.
 
So at 1080p how will this card fare?

I don't plan on upgrading to 4K for a while.
It reads like it will be powerful at 1080p.

Then let us look forward to the benchmarks and see how this all plays out.




It'll be a monstrous beast at 1080p.
I am fine with waiting for more concrete information, but for posterity, I want to express that hitting the VRAM wall is one of the most frustrating things in all of PC gaming. I hope, for the people who buy this thing, that they never have to fight that terrible beast that is limited VRAM.

And yeah, as a 1080p card it will probably be great (which partially defeats the point of having such crazy bandwidth, though, unless games start adding special high-bandwidth effects).
It seems something they did with the card (or driver?) allowed them to lower the size of the framebuffer.
I think that is totally doable. Remember, the argument for the 970's 3.5GB issue being a problem was that it limited the card to a very specific set of resolutions and required driver work to function (and there is only so much the driver can do when the game is trying to shove 4K textures down the GPU's throat).
 
I wouldn't get my hopes up. I bet the timing will be similar to the Maxwell series: GP204 in the fall of 2016 and GP200 as a Titan in the spring/summer of 2017. I hope I'm wrong and it's earlier, though.

The Maxwell 2 series has no relation to the 16FF+ process, so its timing means basically nothing for Pascal. 16FF+ is in risk production right now, with mass production scheduled to start in 3Q15. The question is when (and, after the 20nm fiasco, IF) this node will be mature enough to produce a complex, high-volume GPU like a "GP104". I'm hoping that Spring '16 will be the time.

As for the "GP100" - this one is ways off for the gaming / GeForce market. I think it's reasonable to expect it no earlier than 2017.

The Maxwell 2 -> Pascal transition should be somewhat similar to the Fermi -> Kepler transition, and less like the recent Kepler -> "Kepler Refresh" -> Maxwell -> Maxwell 2 transitions.
 
Let me play devil's advocate. What if HBM is super revolutionary, in that the super fast speeds offset the low VRAM?
That is actually possible. Well, people often don't understand what the actual memory requirements of their games are in terms of video memory. The largest motivator for these massive cards was the memory configuration: you increase bandwidth simply by adding more chips and increasing how much data you can access in parallel. This in turn requires memory allocation on your GPU to be somewhat wasteful, sometimes duplicating things. In that case, having lots of memory on lots of chips makes sense from a performance standpoint.
Stacked memory greatly reduces the need to do that. Will it be a 4GB to 12GB difference? No. But 4GB will probably be enough for what most people play on, and at 4K your bottleneck isn't your memory but your performance.
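To put rough numbers on the "more chips = more bandwidth" point, here's a back-of-the-envelope sketch in Python. The per-chip figures (~7 Gbps effective per pin, 2 Gbit dies) are assumptions for illustration, not any vendor's spec:

```python
# GDDR5 bandwidth scales with the number of 32-bit chips on the bus,
# so chasing bandwidth also buys capacity whether you need it or not.
# Figures below are assumed for illustration, not vendor specs.

GBPS_PER_PIN = 7      # effective data rate per pin (Gbps)
PINS_PER_CHIP = 32    # GDDR5 chip interface width (bits)
GB_PER_CHIP = 0.25    # 2 Gbit die

def gddr5_config(chips: int):
    """Aggregate bandwidth (GB/s) and capacity (GB) for a chip count."""
    bandwidth = chips * PINS_PER_CHIP * GBPS_PER_PIN / 8
    capacity = chips * GB_PER_CHIP
    return bandwidth, capacity

for chips in (8, 12, 16):
    bw, cap = gddr5_config(chips)
    print(f"{chips:2d} chips -> {bw:4.0f} GB/s, {cap:.2f} GB")
# 16 chips -> 448 GB/s, 4.00 GB: the wide bus you wanted for bandwidth
# drags chip count, and therefore capacity, along with it.
```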

AMD will still get fucked if they go $849.
 
Just took a quick look on Newegg and the cheapest 980 is $550 without rebates or any other special discounts. If AMD can price this around $600 and no more, they could have a winner. This is assuming the rumored performance and size are true.
 
HDMI is a superset of DVI with audio pins.

Lol... no.

DisplayPort -> DVI Dual-Link -----> HDMI

HDMI is the most limited of the 3 interfaces. If you are connecting to a monitor (especially a high refresh rate monitor) you should never use HDMI when DisplayPort is present.

I mean... unless you are really using it for audio :|
 
That is actually possible. Well, people often don't understand what the actual memory requirements of their games are in terms of video memory. The largest motivator for these massive cards was the memory configuration: you increase bandwidth simply by adding more chips and increasing how much data you can access in parallel. This in turn requires memory allocation on your GPU to be somewhat wasteful, sometimes duplicating things. In that case, having lots of memory on lots of chips makes sense from a performance standpoint.
Stacked memory greatly reduces the need to do that. Will it be a 4GB to 12GB difference? No. But 4GB will probably be enough for what most people play on, and at 4K your bottleneck isn't your memory but your performance.

AMD will still get fucked if they go $849.

All the GDDR5 controllers that I know of are fully coherent. There is no reason to store the same data on two memory chips, as accessing it on any one chip is the same speed as on any other. Well, the GTX 970 may be the only exception here.

Larger bandwidth has no influence on how much memory is used by assets and buffers. Bandwidth is speed, not quantity. Saying that larger bandwidth somehow makes the size of the memory pool less important sounds like damage control and nothing more. The amount of allocated memory may somewhat depend on the architecture and drivers, but not on the bandwidth available. And even in that case we're talking about hundreds of MBs at best. A 4GB card will never be able to store as much data as an 8 or even 6 GB card will.
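To make the capacity argument concrete, here's a toy model (the asset size, pool sizes, and access pattern are invented, purely illustrative): once the working set exceeds VRAM, you pay transfers over the bus no matter how fast the VRAM itself is.

```python
# Toy model: a working set bigger than VRAM forces evictions and
# re-uploads regardless of VRAM bandwidth. All sizes are made up.
from collections import OrderedDict

def uploads_needed(vram_gb: float, accesses, asset_gb: float = 0.5) -> int:
    """Count uploads into a simple LRU-managed VRAM pool."""
    slots = int(vram_gb / asset_gb)   # how many assets fit at once
    cache = OrderedDict()
    uploads = 0
    for asset in accesses:
        if asset in cache:
            cache.move_to_end(asset)       # hit: no transfer needed
        else:
            uploads += 1                   # miss: upload over the bus
            if len(cache) >= slots:
                cache.popitem(last=False)  # evict least recently used
            cache[asset] = None
    return uploads

pattern = [i % 12 for i in range(120)]     # 6 GB working set, looped
print(uploads_needed(4.0, pattern))  # 120: the 4 GB pool thrashes
print(uploads_needed(8.0, pattern))  # 12: only the initial cold loads
```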
 
All the GDDR5 controllers that I know of are fully coherent. There is no reason to store the same data on two memory chips, as accessing it on any one chip is the same speed as on any other. Well, the GTX 970 may be the only exception here.

Larger bandwidth has no influence on how much memory is used by assets and buffers. Bandwidth is speed, not quantity. Saying that larger bandwidth somehow makes the size of the memory pool less important sounds like damage control and nothing more. The amount of allocated memory may somewhat depend on the architecture and drivers, but not on the bandwidth available. And even in that case we're talking about hundreds of MBs at best. A 4GB card will never be able to store as much data as an 8 or even 6 GB card will.
I'm agreeing with you. Yes, it's an aspect of architectural design: memory allocation is handled in the driver layer, and available bandwidth can enable different architectural decisions, including ones that could affect memory usage. Whether that pans out comes down to whether you believe their engineers or not. Obviously it's not the same!

I did admit that it being a 4GB vs 8GB difference is probably complete hogwash. There might be gains, however, but what they are and how they affect performance is anyone's guess: which use cases are favoured by these approaches, how games are affected, etc.
 
Lol... no.

DisplayPort -> DVI Dual-Link -----> HDMI

HDMI is the most limited of the 3 interfaces. If you are connecting to a monitor (especially a high refresh rate monitor) you should never use HDMI when DisplayPort is present.

I mean... unless you are really using it for audio :|

I'm not sure what you're arguing here. What I said is factually correct. HDMI is DVI with audio pins; they are otherwise electrically identical. This is why a simple adapter that just changes the connector shape lets you convert an HDMI connector to DVI and vice versa.

The rest of your post has literally nothing to do with what I said.
 
All the GDDR5 controllers that I know of are fully coherent. There is no reason to store the same data on two memory chips, as accessing it on any one chip is the same speed as on any other. Well, the GTX 970 may be the only exception here.

Larger bandwidth has no influence on how much memory is used by assets and buffers. Bandwidth is speed, not quantity. Saying that larger bandwidth somehow makes the size of the memory pool less important sounds like damage control and nothing more. The amount of allocated memory may somewhat depend on the architecture and drivers, but not on the bandwidth available. And even in that case we're talking about hundreds of MBs at best. A 4GB card will never be able to store as much data as an 8 or even 6 GB card will.

Surely this statement is false. Bandwidth is effectively "quantity per second", not speed - i.e. the size of the pipe, as opposed to the speed of the data flowing through it. Perhaps what you're attempting to point out is that, the greater the bandwidth, the faster 4GB of memory will be utilised/exhausted compared to 8GB. The complexity arises in factoring in how that pool of memory is being utilised. 8GB will store more, but depending on how that 8GB is accessed, it could effectively end up having worse performance than 4GB.

Having more is always ideal, but not necessarily better.
 
Do streaming engines like GTA V use VRAM for everything to keep it simple? E.g. the caches for storing data outside of what is visible, which you will be moving into soon. That kind of thing should be able to live in main system memory, and the PCIe bus should be fast enough to transfer it across when it is needed?
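For scale, here's the comparison in numbers. This framing is my own; the PCIe figures are the standard theoretical rates, and the VRAM figures are the ones discussed in this thread:

```python
# Streaming over PCIe vs reading on-card memory. PCIe 3.0 is
# ~0.985 GB/s per lane (theoretical); VRAM figures as discussed above.

pcie3_x16 = 16 * 0.985    # ~15.8 GB/s
gddr5_290x = 320          # GB/s (290X: 5 Gbps x 512-bit / 8)
hbm_fiji = 4 * 128        # 512 GB/s (4 HBM stacks)

for name, bw in [("PCIe 3.0 x16", pcie3_x16),
                 ("GDDR5 (290X)", gddr5_290x),
                 ("HBM (4 stacks)", hbm_fiji)]:
    print(f"{name:15s} {bw:6.1f} GB/s")
# Anything needed mid-frame must already be resident in VRAM; system
# RAM over PCIe only works for data the engine can prefetch a few
# frames ahead, since the bus is roughly 20-30x slower than VRAM.
```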
 
Surely this statement is false. Bandwidth is effectively "quantity per second", not speed - i.e. the size of the pipe, as opposed to the speed of the data flowing through it. Perhaps what you're attempting to point out is that, the greater the bandwidth, the faster 4GB of memory will be utilised/exhausted compared to 8GB. The complexity arises in factoring in how that pool of memory is being utilised. 8GB will store more, but depending on how that 8GB is accessed, it could effectively end up having worse performance than 4GB.

Having more is always ideal, but not necessarily better.

Quantity per second is speed. Km/second is speed, you know. Bandwidth is just an upper limit - like a maximum possible speed. But it is still speed nevertheless.

Saying that 4GB of RAM with high bandwidth is the same as 8GB with low bandwidth is bullshit. It's like saying that your 128GB SSD is the same as a 3TB HDD because it's four times faster. You can't substitute size with bandwidth; these are two different metrics.
 
Quantity per second is speed. Km/second is speed, you know. Bandwidth is just an upper limit - like a maximum possible speed. But it is still speed nevertheless.

Saying that 4GB of RAM with high bandwidth is the same as 8GB with low bandwidth is bullshit. It's like saying that your 128GB SSD is the same as a 3TB HDD because it's four times faster. You can't substitute size with bandwidth; these are two different metrics.

I don't recall stating it is the same. All I stated is that 4GB at a specific bandwidth would get exhausted faster than 8GB at the same bandwidth (which is what I assumed you were attempting to point out, rather poorly). Your SSD vs HDD analogy is... rather silly. The performance issue arises when data has to be swapped in and out of that 4GB or 8GB of memory. There are always compression techniques utilised when computing frame buffers, so perhaps AMD is trying to emphasise what they are able to do with HBM that was not possible with GDDR5 in minimizing latencies. That remains to be seen, of course.
 
If it's limited to 4GB, then the Fudzilla rumor about an $849 price point is even more absurd. Given that the mid-range price points for the 370/380X will be $200-$400, realistically they'll need to shoot for a $549-$599 price point for the flagship if they want these to sell in large numbers.
 
So AMD hopes they can spend more time compressing data, but then send larger chunks of data to make use of the extra 150GB/s.
 
Because the 980Ti is on the way. If it's anything like the 780Ti was in comparison to the Titan, it will be cheaper and faster.
I don't think the 980Ti is coming...

The hypothetical 980Ti is Nvidia's response in case the 390X is a big gamechanger, and right now it doesn't seem like it is. Nvidia can just let the Titan X keep the performance crown and not reduce their margins with a 980Ti.

Faster at 2/3rds of the price is definitely competition.
Not enough to sway people. The price/perf gap can be covered by typical Nvidia justifications like heat, drivers, ShadowPlay, etc. Or the GTX 980 gets a price cut.
 
I don't know what fuzzy math people are using to come up with 2/3 of the price. The $849 in the Fudzilla piece is 85% of the Titan X's $1,000.
 
It's a far different technology; this isn't merely a GDDR3-to-GDDR5 style bump.

One stack of HBM has 1GB of capacity and a 1024-bit memory bus. One stack can give ~128 GB/s of bandwidth. There are 4 of these stacks on Fiji: 4GB of HBM and 4096-bit in total.

One GDDR5 chip is connected through a 32-bit memory bus, and one chip only has ~28GB/s of bandwidth. You have many of these chips, each connected to its own 32-bit memory bus, and you add them up.

So to take advantage of GDDR5's memory bandwidth, you have to spread assets across all the chips. AKA, use more memory, and keep using it, so that all the chips stay engaged.

What might be happening is that a large amount of data gets copied across all the GDDR5, then left there unused (cached) when the GPU moves on to another set of data it needs. You may not have to do this with HBM. And AMD hired 2 engineers to try to make this possible.

With HBM, you only need 2 or 3 stacks to exceed the bandwidth of any GDDR5 card ever made. And they have 4. There's no need to waste memory in cache to get faster bandwidth.

You also have the benefit of low latency, which means the GPU can access and use memory much quicker, which may let AMD manage memory more on the fly without caching as much as they do with GDDR5 cards.

We shall see. 4GB of HBM1 may not hit the same bottleneck as 4GB of GDDR5. We will have to wait for reviews and see how it compares to other 4GB cards at high resolutions. It's not worth dismissing the card outright because of 4GB when such a new technology is on display.
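The arithmetic from that post, written out. The per-stack and per-chip figures are the ones quoted above; the totals are just multiplication:

```python
# Bandwidth = bus width (bits) x data rate per pin (Gbps) / 8.

def bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    return bus_bits * gbps_per_pin / 8

hbm_stack = bandwidth_gbs(1024, 1.0)   # ~128 GB/s per 1 GB stack
gddr5_chip = bandwidth_gbs(32, 7.0)    # ~28 GB/s per chip

print(f"1 HBM stack:    {hbm_stack:.0f} GB/s, 1 GB")
print(f"4 HBM stacks:   {4 * hbm_stack:.0f} GB/s, 4 GB (Fiji)")
print(f"16 GDDR5 chips: {16 * gddr5_chip:.0f} GB/s")
# Matching even 3 HBM stacks (384 GB/s) takes ~14 GDDR5 chips working
# in parallel, which is why assets end up spread (and duplicated)
# across every chip on a GDDR5 card.
```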
 
I'll agree with the post above me. Wait and see.

When I build my PC later this year maybe I'll put this in it. There's no need for it to run at 4K but it will need to handle 1440p.
 
It's a far different technology; this isn't merely a GDDR3-to-GDDR5 style bump.

One stack of HBM has 1GB of capacity and a 1024-bit memory bus. One stack can give ~128 GB/s of bandwidth. There are 4 of these stacks on Fiji: 4GB of HBM and 4096-bit in total.

One GDDR5 chip is connected through a 32-bit memory bus, and one chip only has ~28GB/s of bandwidth. You have many of these chips, each connected to its own 32-bit memory bus, and you add them up.

So to take advantage of GDDR5's memory bandwidth, you have to spread assets across all the chips. AKA, use more memory, and keep using it, so that all the chips stay engaged.

What might be happening is that a large amount of data gets copied across all the GDDR5, then left there unused (cached) when the GPU moves on to another set of data it needs. You may not have to do this with HBM. And AMD hired 2 engineers to try to make this possible.

With HBM, you only need 2 or 3 stacks to exceed the bandwidth of any GDDR5 card ever made. And they have 4. There's no need to waste memory in cache to get faster bandwidth.

You also have the benefit of low latency, which means the GPU can access and use memory much quicker, which may let AMD manage memory more on the fly without caching as much as they do with GDDR5 cards.

We shall see. 4GB of HBM1 may not hit the same bottleneck as 4GB of GDDR5. We will have to wait for reviews and see how it compares to other 4GB cards at high resolutions. It's not worth dismissing the card outright because of 4GB when such a new technology is on display.

We have a winner.

The main issue with memory in general is the latency increase as you increase the size of the memory available. If you don't need to cache too much, you don't need that much memory. The physical memory bottleneck won't be too apparent anyway, since I doubt any game is designed from the ground up to use more than 4GB as a minimum. By the time one is (a year or two, maybe), HBM2 will be ready to go. AMD knows that at the end of the day people just care about the FPS/benchmark points it scores; the actual card doesn't matter. And with it, by the sounds of things, being ITX-friendly, it might find its way into Steam Machines. We've got plenty of time until November.
 
I read Crisium's most recent post, and thought it made sense.
But then I thought to myself: well, why don't they just run the GDDR5 chips in "RAID 01", so they always get the full bandwidth?
And then the obvious dawned on me: that would of course mean the data had to be mirrored across all the chips.

For example, in the case of the 290X's 16 chips, "RAID 01" would mean that it had 256 MB instead of 4 GB.

Super-wide access to a memory pool as large as 1 GB per stack is starting to sound pretty cool.
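The mirroring arithmetic, spelled out. This is hypothetical, as in the post; real GDDR5 cards stripe data across chips rather than mirroring it:

```python
# Full mirroring across all chips would give every read the combined
# bandwidth, but capacity collapses to a single chip's worth.

CHIPS = 16          # a 290X-style layout
GB_PER_CHIP = 0.25  # 2 Gbit chips, 4 GB total

striped_gb = CHIPS * GB_PER_CHIP   # normal layout: 4.0 GB usable
mirrored_gb = GB_PER_CHIP          # fully mirrored: 0.25 GB usable

print(f"striped:  {striped_gb:.2f} GB")
print(f"mirrored: {mirrored_gb * 1024:.0f} MB")
# 256 MB instead of 4 GB, which is why mirroring for bandwidth is a
# non-starter and why 1 GB per 1024-bit HBM stack looks appealing.
```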
 
It's a far different technology; this isn't merely a GDDR3-to-GDDR5 style bump.

One stack of HBM has 1GB of capacity and a 1024-bit memory bus. One stack can give ~128 GB/s of bandwidth. There are 4 of these stacks on Fiji: 4GB of HBM and 4096-bit in total.

One GDDR5 chip is connected through a 32-bit memory bus, and one chip only has ~28GB/s of bandwidth. You have many of these chips, each connected to its own 32-bit memory bus, and you add them up.

So to take advantage of GDDR5's memory bandwidth, you have to spread assets across all the chips. AKA, use more memory, and keep using it, so that all the chips stay engaged.

What might be happening is that a large amount of data gets copied across all the GDDR5, then left there unused (cached) when the GPU moves on to another set of data it needs. You may not have to do this with HBM. And AMD hired 2 engineers to try to make this possible.

With HBM, you only need 2 or 3 stacks to exceed the bandwidth of any GDDR5 card ever made. And they have 4. There's no need to waste memory in cache to get faster bandwidth.

You also have the benefit of low latency, which means the GPU can access and use memory much quicker, which may let AMD manage memory more on the fly without caching as much as they do with GDDR5 cards.

We shall see. 4GB of HBM1 may not hit the same bottleneck as 4GB of GDDR5. We will have to wait for reviews and see how it compares to other 4GB cards at high resolutions. It's not worth dismissing the card outright because of 4GB when such a new technology is on display.

Makes sense, but who has the burden of ensuring games take advantage of this?

Would this be something entirely handled by drivers?

Games would probably need at least some changes, e.g. detecting and accounting for memory efficiency, to know that 4GB on this card is the same as, say, a 6GB requirement on other cards.
 
I don't recall stating it is the same. All I stated is that 4GB at a specific bandwidth would get exhausted faster than 8GB at the same bandwidth (which is what I assumed you were attempting to point out, rather poorly). Your SSD vs HDD analogy is... rather silly. The performance issue arises when data has to be swapped in and out of that 4GB or 8GB of memory. There are always compression techniques utilised when computing frame buffers, so perhaps AMD is trying to emphasise what they are able to do with HBM that was not possible with GDDR5 in minimizing latencies. That remains to be seen, of course.

This is just more word play. I can only repeat that no amount of bandwidth will ever make up for 4 GB of memory being missing. There is nothing "silly" in my SSD vs HDD analogy, as this is the exact same situation as with fast 4 GB RAM vs slow 8 GB RAM. You can't fit 8 GB of data in 4 GB of RAM no matter how fast it is.

Compression techniques work on the data stream between the MCs and the memory chips; they don't really compress the amount of data stored in the memory. Again, I can only repeat that via some driver tricks they may be able to save a couple of hundred MBs in total - but they won't be able to cover a whole 2-4-8 GB difference. This is just impossible. A 4 GB card is a 4 GB card.

We have a winner.

The main issue with memory in general is the latency increase as you increase the size of the memory available. If you don't need to cache too much, you don't need that much memory. The physical memory bottleneck won't be too apparent anyway, since I doubt any game is designed from the ground up to use more than 4GB as a minimum. By the time one is (a year or two, maybe), HBM2 will be ready to go. AMD knows that at the end of the day people just care about the FPS/benchmark points it scores; the actual card doesn't matter. And with it, by the sounds of things, being ITX-friendly, it might find its way into Steam Machines. We've got plenty of time until November.

Latency is not the problem for stream GPU processing. Having better latency won't mean anything - at least until some GPU apps take advantage of it.

I would say that 4GB isn't enough for anything more than 1080p right now. There are a lot of games which hit the VRAM limit on a 4GB card with downsampling or MSAA at resolutions between 1080p and 4K. So having 4GB on a top-end card right now is a big problem.
 