Support NeoGAF

bobbytkc · Jan 25, 2013

Fancy Corndog said:
So what's it looking like now?

Orbis: ~3.5GB @176GB/S
Durango: ~5.5GB @68GB/S [32MB @102GB/S]

Correct?

The question for me is, why does Sony want its RAM to be that fast? It seems sort of outlying, like if we could figure out the answer to this we might know something more about the system/next-gen.

Ram that fast is standard fir current high end ggpu.They are not the outlier.

DSN2K · Jan 25, 2013

its bizarre for me that Nintendo and MS have gone cheap on ram. DDR3 sucks, I expected better.

RoboPlato · Jan 25, 2013

Kung Fu Grip said:
Since when did he have sources?

Since before the PS3 came out, apparently.

R10Neymarfan · Jan 25, 2013

DSN2K said:
its bizarre for me that Nintendo and MS have gone cheap on ram. DDR3 sucks, I expected better.

DDR4?

Boss Man · Jan 25, 2013

bobbytkc said:
Ram that fast is standard fir current high end ggpu.They are not the outlier.

I keep seeing screenshots arguing that you basically can't make a game consume 1GB per frame.

Yet here we have Orbis (and only Orbis) designed to push 3GB/frame at 60FPS. Why?

THE:MILKMAN · Jan 25, 2013

Fancy Corndog said:
I keep seeing screenshots proving that you basically can't make a game consume 1GB per frame.

Life (devs) will find a way.

Proelite · Jan 25, 2013

DSN2K said:
its bizarre for me that Nintendo and MS have gone cheap on ram. DDR3 sucks, I expected better.

Esram is expensive, more so than edram.

onQ123 · Jan 25, 2013

Fancy Corndog said:
So what's it looking like now?

Orbis: ~3.5GB @176GB/S
Durango: ~5.5GB @68GB/S [32MB @102GB/S]

Correct?

The question for me is, why does Sony want its RAM to be that fast? It seems sort of outlying, like if we could figure out the answer to this we might know something more about the system/next-gen.

Well there is this

http://semiaccurate.com/2012/03/02/...l-be-an-x86-cpu-with-an-amd-gpu/#.UQHnsR1EGSo

One of the things that we had heard about the PS4 chip, or should we say PS4 SoC, is that Sony is really keen on the idea of TSVs. The other bit is that they are going to have lots of extras, we have heard about sensors, but that could just be part of the other odd bit, FPGAs. Yeah, there is a lot of weird talk coming out of Sony engineers, and programmable logic, aka an FPGA, is just one of the things. Additional media processing blocks, DSPs, and similar blocks are all part of the concept.

To do all of this, and I do realize how odd it sounds, you would need some monumental memory bandwidth for it not so starve.

& this

http://mandetech.com/2012/01/10/sony-masaaki-tsuruta-interview/

He describes the architecture in broad terms: "You are talking about powerful CPU and GPU with extra DSP and programmable logic."
This, and Sony's target of no more than 50ms latency even for 8k x 4k resolution at 300fps, clearly points to the need for a highly integrated TSV-based package - and so far TSV has stuttered in manufacturing for anything other than the stacking of like-on-like, typically memories.

Desty · Jan 25, 2013

It seems weird that the ESRAM in Durango is slower than GDDR5. It is a special dedicated memory, it should be supa fast right?

SSM25 · Jan 25, 2013

AgentP said:
Are you getting your info from these threads or is someone with real insider info telling you? If you are reading the same info the rest of are, then the answer is "depends".

GDDR5 can be much higher bandwidth than DDR3/4 (200GB/s vs 60-90GB/s). The DDR gets paired with eDRAM to offset the low bandwidth DDR.

If the difference between DDR3/4 and GDDR5 was a "bit", then Sony would use the cheaper DDR solution too.

I'm starting to get that WiiU RAM speed thread vibe in here

ikioi · Jan 25, 2013

DSN2K said:
its bizarre for me that Nintendo and MS have gone cheap on ram. DDR3 sucks, I expected better.

DDR3 and GDDR5 are technologically very similar. GDDR5 is based on DDR3 fyi.

Neither format is better then the other, it all depends on the context it's being used in.

Sounds like Sony are going for a smaller physical amount of main pool memory, but this memory has a higher bandwidth and lower latency then the next Xbox's.

The Xbox has more physical main pool memory, but its bandwidth is lower and has higher latency then the PS4s. DDR3 2133mhz has pretty high CAS latency times, among the highest you can get. Microsoft however are intending to over come the high latency issue by having a small pool of eDRAM , ESRAM, or SRAM which developers can use for work loads that require low latency and high bandwidth. Similar to the eDRAM in the Xbox 360.

Both approaches while different achieve similar reuslts. These days memory latency is more important then bandwidth, and thats likely to be a trend that continues into the future.

Out of the two approaches, i personally see more potential in Microsoft's implamentation. While it's pure speculation as we don't know the facts, 64 megabytes of on die eDRAM on the GPU with a high bus + the DDR3, i'd say that has more versatility and capabilities then the PS4's pure GDDR5 approach. Similar approaches have worked well for the Gamecube, Xbox 360, and it's also the approach Nintendo went with the Wii U.

Also you can't compare the Wii U's DDR3 to the Xbox 360s. The Wii U uses DDR3 1600mhz on a 64bit bus, Durango appears to be using a 256bit bus and DDR3 2133mhz memory. So the Xbox 360's DDR3 memory pool should be a lot faster then the Wii U's. Wii U's DDR3 has a throughput of 12.8 gigabytes per second, Durango's would be 68.25 gigabytes per second in comparison. Not only that but Durango's DDR3 has 5-6x the capacity. Nintendo really gimped the Wii U with its DDR3 memory pool, its very slow.

oversitting · Jan 25, 2013

how much ram can a ram ram if a ram could ram ram?

onQ123 · Jan 25, 2013

oversitting said:
how much ram can a ram ram if a ram could ram ram?

As much damn ram as a damn ram can if a damn ram could ram ram!

NBtoaster · Jan 25, 2013

Desty said:
It seems weird that the ESRAM in Durango is slower than GDDR5. It is a special dedicated memory, it should be supa fast right?

Super low latency instead.

Zen · Jan 25, 2013

ikioi said:
DDR3 and GDDR5 are technologically very similar. GDDR5 is based on DDR3 fyi.

Neither format is better then the other, it all depends on the context it's being used in.

Sounds like Sony are going for a smaller physical amount of main pool memory, but this memory has a higher bandwidth and lower latency then the next Xbox's.

The Xbox has more physical main pool memory, but its bandwidth is lower and has higher latency then the PS4s. DDR3 2133mhz has pretty high CAS latency times, among the highest you can get. Microsoft however are intending to over come the high latency issue by having a small pool of eDRAM or SRAM which developers can use for work loads that require low latency and high bandwidth. Similar to the eDRAM in the Xbox 360.

Both approaches while different achieve similar reuslts. These days memory latency is more important then bandwidth, and thats likely to be a trend that continues into the future.

Out of the two approaches, i personally see more potential in Microsoft's implamentation. While it's pure speculation as we don't know the facts, 64 megabytes of on die eDRAM on the GPU with a high bus + the DDR3, i'd say that has more versatility and capabilities then the PS4's pure GDDR5 approach. Similar approaches have worked well for the Gamecube, Xbox 360, and it's also the approach Nintendo went with the Wii U.

Also you can't compare the Wii U's DDR3 to the Xbox 360s. The Wii U uses DDR3 1600mhz on a 64bit bus, Durango appears to be using a 256bit bus and DDR3 2133mhz memory. So the Xbox 360's DDR3 memory pool should be a lot faster then the Wii U's. Wii U's DDR3 has a throughput of 12.8 gigabytes per second, Durango's would be 68.25 gigabytes per second in comparison. Not only that but Durango's DDR3 has 5-6x the capacity. Nintendo really gimped the Wii U with its DDR3 memory pool, its very slow.

How would the latency of the Edram/Sram compare to the latency of the DDR5?

oversitting · Jan 25, 2013

Zen said:
How would the latency of the Edram/Sram compare to the latency of the DDR5?

SRAM should be 8 or more time less latency than GDDR5.

Kreunt · Jan 25, 2013

when was the last time a videocard has slower than gddr5 ram? ~2008?

oversitting · Jan 25, 2013

Kreunt said:
when was the last time a videocard has slower than gddr5 ram? ~2008?

they still make GDDR3 video card in the low end.

Death Dealer · Jan 25, 2013

DSN2K said:
its bizarre for me that Nintendo and MS have gone cheap on ram. DDR3 sucks, I expected better.

Nintendo is just cheap. But I don't think MS had any other options because they likely wanted at least a few GB for their OS and other media apps. So they needed an affordable option that would allow for a lot of memory. Apparently DDR4 won't be ready in time, and more than 4GB of GDDR5 is prohibitively expensive - the board and memory bus complexity, along with the cost of the memory chips. But DDR3 is cheap so why not go full boat with 8GB ? And MS knew plain DDR3 wouldn't cut the mustard and so they added the 32mb esram and worked with AMD on custom hw to help work around/alleviate some of those bottlenecks.

AgentP · Jan 25, 2013

ikioi said:
DDR3 and GDDR5 are technologically very similar. GDDR5 is based on DDR3 fyi.

Neither format is better then the other, it all depends on the context it's being used in.

Sounds like Sony are going for a smaller physical amount of main pool memory, but this memory has a higher bandwidth and lower latency then the next Xbox's.

The Xbox has more physical main pool memory, but its bandwidth is lower and has higher latency then the PS4s. DDR3 2133mhz has pretty high CAS latency times, among the highest you can get. Microsoft however are intending to over come the high latency issue by having a small pool of eDRAM , ESRAM, or SRAM which developers can use for work loads that require low latency and high bandwidth. Similar to the eDRAM in the Xbox 360.

Both approaches while different achieve similar reuslts. These days memory latency is more important then bandwidth, and thats likely to be a trend that continues into the future.

Out of the two approaches, i personally see more potential in Microsoft's implamentation. While it's pure speculation as we don't know the facts, 64 megabytes of on die eDRAM on the GPU with a high bus + the DDR3, i'd say that has more versatility and capabilities then the PS4's pure GDDR5 approach. Similar approaches have worked well for the Gamecube, Xbox 360, and it's also the approach Nintendo went with the Wii U.
.

1. Where is latency more important, clearly not graphics rendering. High end video cards all use GDDR5 and not for their latency, without the bandwidth the GPUs horsepower is useless. Some cards are pushing 300GB/s, clearly they could go with cheaper lower bandwidth bus/RAM that has equal or less latency.

2. Why, that opinion is very counter intuitive and goes against what several develops have already said. BTW no rumors are suggesting 64MB.

nib95 · Jan 25, 2013

Death Dealer said:
Nintendo is just cheap. But I don't think MS had any other options because they likely wanted at least a few GB for their OS and other media apps. So they needed an affordable option that would allow for a lot of memory. Apparently DDR4 won't be ready in time, and more than 4GB of GDDR5 is prohibitively expensive - the board and memory bus complexity, along with the cost of the memory chips. But DDR3 is cheap so why not go full boat with 8GB ? And MS knew plain DDR3 wouldn't cut the mustard and so they added the 32mb esram and worked with AMD on custom hw to help work around/alleviate some of those bottlenecks.

I think pretty much this. Given the OS needs, and with DDR4 and GDDR6 still a ways off, not sure they had much choice. Unless they went DDR3 and GDDR5 split, which causes all sorts of other issues.

oversitting · Jan 25, 2013

Bandwidth >>>> latency when accessing large amount of data like textures.

Latency > bandwidth when dealing with a small frame buffer.

AgentP · Jan 25, 2013

oversitting said:
Latency > bandwidth when dealing with a small frame buffer.

That must explain the 256GB/s eDRAM in the 360.

Death Dealer · Jan 25, 2013

onQ123 said:
Well there is this

"and programmable logic, aka an FPGA"

That sounds interesting. It seems like an infinite variety of tasks it could do, from sound DSP, to physics calculations.

oversitting · Jan 25, 2013

AgentP said:
That must explain the 256GB/s eDRAM in the 360.

There was also low latency since the ram is coupled right onto the ROPs.

AgentP · Jan 25, 2013

oversitting said:
There was also low latency since the ram is coupled right onto the ROPs.

Give me a reference where they even mention the latency.

Death Dealer · Jan 25, 2013

AgentP said:
Give me a reference where they even mention the latency.

Still only a rumor, but ERP on B3D said early that basically the esram would be useless without low latency, and then later he said someone told him it does in fact have low latency.

oversitting · Jan 25, 2013

AgentP said:
Give me a reference where they even mention the latency.

I don't have a reference but the edram's latency is much less the the GDDR3 main memory just by pure logic.

James Sawyer Ford · Jan 25, 2013

Death Dealer said:
Still only a rumor, but ERP on B3D said early that basically the esram would be useless without low latency, and then later he said someone told him it does in fact have low latency.

So they confirmed it's SRAM?

Pretty large and expensive if true.

oversitting · Jan 25, 2013

James Sawyer Ford said:
So they confirmed it's SRAM?

Pretty large and expensive if true.

very large, Im surprised they could do it.

Death Dealer · Jan 25, 2013

James Sawyer Ford said:
So they confirmed it's SRAM?

Pretty large and expensive if true.

I take that back. ERP didn't say for sure it was low latency. Rangers told him he had been told it was in fact very low latency. I trust Rangers.

It's on this thread:

http://beyond3d.com/showpost.php?p=1697311&postcount=403

http://beyond3d.com/showpost.php?p=1697600&postcount=429

ikioi · Jan 25, 2013

AgentP said:
1. Where is latency more important, clearly not graphics rendering.

Modern GPUs don't just do graphics rendering. GPGPU, Compute processing, programmable shaders, etc. Modern GPU workloads and capablities far exceed traditional graphics rendering. Physics, AI, GPGPU, there's so much more modern GPUs can do.

The next gen Xbox and Playstation are both rumored to be using by PC standards pretty low end CPUs. If you had a Jaguar CPU in your PC and 680GTX people would be saying you've bottlenecked your card. Well not if the CPU stops doing half the work it traditionally had.

2. Why, that opinion is very counter intuitive and goes against what several develops have already said. BTW no rumors are suggesting 64MB.

Just my personal preference i guess.

AgentP · Jan 25, 2013

Death Dealer said:
I take that back. ERP didn't say for sure it was low latency. Rangers told him he had been told it was in fact very low latency. I trust Rangers.

Ranger is a clown and is about as biased as they get.

I found this from last year by Sebbi, who is a 360 dev and a fan of eDRAM.

Let me explain why huge amounts of low bandwidth memory is not a good idea. Slow memory is pretty much unusable, simply because we cant access it

The GDDR3 memory subsystem in current generation consoles gives theoretical maximum of 10.8 GB/s read/write bandwidth (both directions). For a 60 fps game this is 0.18 GB per frame, or 184 MB, assuming of course that you are fully memory bandwidth bound at all times, and there's no cache trashing, etc happening. In practice some of that bandwidth gets wasted, so you might be able to access for example 100 MB per frame (if you try to access more, the frame rate will drop).

So with 10.8 GB/s theoretical bandwidth, you cannot access much more than 100 MB of memory per frame, and memory accesses do not change that much from frame to frame, as camera & object movement has to be relatively slow in order for animation to look smooth (esp. true at 60 fps). How much more memory you need than 100 MB then? It depends on how fast you can stream data from the hard drive, and how well you can predict the data you need in the future (latency is the most important thing here). 512 MB has proven to be enough for our technology, as we use virtual texturing. The only reason why we couldn't use 4k*4k textures on every single object was the downloadable package size (we do digitally distributed games), the 512 MB memory was never a bottleneck for us.

Of course there are games that have more random memory access patterns, and have to keep bigger partitions of game world in memory at once. However no matter what, these games cannot access more than ~100 MB of memory per frame. If you can predict correctly and hide latency well, you can keep most of your data in your HDD and stream it on demand. Needless to say, I am a fan of EDRAM and other fast memory techniques. I would always opt for small fast memory instead of large slow memory. Assuming of course we can stream from HDD or from flash memory (disc streaming is very much awkward, because of the high latency).

http://forum.beyond3d.com/showpost.php?p=1646788&postcount=15

oversitting · Jan 25, 2013

AgentP said:
Ranger is a clown and is about as biased as they get.

I found this from last year by Sebbi, who is a 360 dev and a fan of eDRAM.

http://forum.beyond3d.com/showpost.php?p=1646788&postcount=15

the only thing is, the harddrive isn't really getting faster this gen but the ram is. This means streaming from the harddrive would require much more preemption.

Death Dealer · Jan 25, 2013

AgentP said:
Ranger is a clown and is about as biased as they get.

I found this from last year by Sebbi, who is a 360 dev and a fan of eDRAM.

So you think embedded sram is just too expensive for the size they need ?

I'm not an expert, but my logic is if an expert says "the architecture is only interesting IF the ESRAM pool can be used as input to the GPU and if it's low latency."

Why wouldn't they do it ?

Do some people think Durango is using edram like X360 ?

XtremeXpider · Jan 25, 2013

- from the GPUs perspective the bandwidths of system memory and ESRAM are parallel providing combined peak bandwidth of 170 GB/sec.

IS this possible?

oversitting · Jan 25, 2013

Death Dealer said:
So you think embedded sram is just too expensive for the size they need ?

I'm not an expert, but my logic is if an expert says "the architecture is only interesting IF the ESRAM pool can be used as input to the GPU and if it's low latency."

Why wouldn't they do it ?

Do some people think Durango is using edram like X360 ?

The SRAM is very very expensive. Given how the leaks all say SRAM its probably true. There were things devs didn't like when dealing with eDRAM sinces accessing it specifically was hard.

XtremeXpider said:
IS this possible?

depends on if the Data Move Engine can do it or not, who knows. Realistically no.

James Sawyer Ford · Jan 25, 2013

correct me if i'm wrong, but not all "eSRAM" = "SRAM"

-ImaginaryInsider · Jan 25, 2013

oversitting said:
The SRAM is very very expensive. Given how the leaks all say SRAM its probably true. There were things devs didn't like when dealing with eDRAM sinces accessing it specifically was hard.

Supposedly SRAM can be manufactured in one process, and it's lower latency than EDRAM. I'd link to the thread, but there are like 3 next-gen speculation threads in Beyond3d, I can't remember which one that was mentioned in.

On a related note, Shifty made these comments a week ago:

It makes perfect sense! there's no pointing combing fast eDRAM with a fast main RAM. If you're going to use eDRAM, you'll use it as an economical way to add significant bandwidth, just as 360 did. And with 360's design, use of that eDRAM can be extremely efficient. IMO the only economically sound reason to go with GDDR5 on a wide bus like Sony is rumoured to be doing is if you have an eye on a stacked RAM module in a future iteration. If that doesn't happen, the minimum cost of your machine is going to be kept much higher than a machine offering similar BW to the whole system via eDRAM.

There's a whole thread discussing eDRAM's value (there's probably a whole thread here discussing each aspect of the consoles if someone cared to go look) and it's a suitably complex argument to have no clear 'best option'. Personally, if the choice is 190 GB/s unified RAM or 60 GB/s system RAM + 100 GB/s Xenos-style RAM (ROPs having own BW), I'd prefer the latter. The former is easier to work with but more limited. The ROPs having their own BW is going to save a massive amount from that system BW. That of course depends on eDRAM capacity, as too small could be a pain to work with. Orbis's unified RAM is the simplest option for devs to use.

So MS may be playing it extra safe just in case stacking doesn't become viable as soon as is it appears it will... DDR3 effectively guarantees them a safer route to cost reduce.

ypo · Jan 25, 2013

ikioi said:
While it's pure speculation as we don't know the facts, 64 megabytes of on die eDRAM on the GPU with a high bus + the DDR3, i'd say that has more versatility and capabilities then the PS4's pure GDDR5 approach. Similar approaches have worked well for the Gamecube, Xbox 360, and it's also the approach Nintendo went with the Wii U.
.

The embedded RAM in GNC, X360 are way faster than the conventional RAM in other systems at the time. That's where the advantages came from. In the case of Orbis and Durango this is not the case. The embedded RAM is actually slower than the DDR5 in Orbis. I don't see how this comparison is valid.

Death Dealer · Jan 25, 2013

XtremeXpider said:
IS this possible?

My guess is under super optimal conditions.

The little I have been able to glean is that properly keeping the esram fed will be the tricky part. Since it's only 32mb, it's impossible to always have everything the system needs in that memory, and you won't always know what to have loaded, so the DDR3 bandwidth will presumably still be somewhat of a limitation. Except for certain scenarios, it's probably not as good as a straight 170GB/s pipe.

imjust1n · Jan 25, 2013

Yea stop thinking your console is your computer.
The console is made to play video games for a cheaper price then buying a PC. Pc are expensive because you can do wayyy more with a PC with a expensive graphics card.
With 4 gb GDDR5 that's plenty in fact my graphics card only has one gb GDDR5 plus if there is more memory in the gpu then why are you complaining. Oh I know why because you want to know spec wise which system is going to be better... If the new Sony has GDDR5 memory it will be better because it's faster then ddr3 if the new xbox has ddr3.
If you want to play max resolution gaming build a PC. Companies like Sony and Microsoft spend hours apon hours thinking what you're thinking. You will be satisfied with what you see.

Slayer-33 · Jan 25, 2013

Wait so 720 has SRAM?

Ashes · Jan 25, 2013

James Sawyer Ford said:
correct me if i'm wrong, but not all "eSRAM" = "SRAM"

I cannot correct you.

Brad Grenz · Jan 25, 2013

Don't know. Maybe. I think that's improbable. 32MBs of actual SRAM would be like 100+ mm^2 at 28nm wouldn't it? That's why most rumors centered around a kind of embedded DRAM which should take drastically less space on the die. A lot of people have been caught up in some very nebulous "eSRAM" branding, though, so it's hard to know without official confirmation.

specialguy · Jan 25, 2013

Death Dealer said:
I take that back. ERP didn't say for sure it was low latency. Rangers told him he had been told it was in fact very low latency. I trust Rangers.

It's on this thread:

http://beyond3d.com/showpost.php?p=1697311&postcount=403

http://beyond3d.com/showpost.php?p=1697600&postcount=429

Too be fair I dont remember if I was told "very low" or just "low" latency, and I really would have no idea if it's the type of "low" ERP's suggestions would require.

But wow, people "trust" me, lol.

The GDDR3 memory subsystem in current generation consoles gives theoretical maximum of 10.8 GB/s read/write bandwidth (both directions). For a 60 fps game this is 0.18 GB per frame, or 184 MB, assuming of course that you are fully memory bandwidth bound at all times, and there's no cache trashing, etc happening. In practice some of that bandwidth gets wasted, so you might be able to access for example 100 MB per frame (if you try to access more, the frame rate will drop).

Weird how he just magically halved the calculations we do (eg, using total bandwidth divided by frames). for example 360 has ~22gb/s and he calls it 10.8 ("both directions") I wonder if that's unique to GDDR3 or always applies?

If it always applied our RAM per frame access calculations are double reality...so it'd be more 1.5 GB per frame Orbis and .5 GB per frame Durango at 60 FPS, rather than 3 and 1.

sebbi wrote a great post a few months ago about ram quantity vs speed, and he preferred the speed (eg, theoretically the PS4's setup in our current discussion). I'll probably crosspost that soon.

specialguy · Jan 25, 2013

here's that sebbi post

http://forum.beyond3d.com/showthread.php?t=62108&highlight=sebbi

The next gen speculation thread started to have interesting debate about memory bandwidth vs memory amount. I don't personally want to contribute to the next gen speculation, but the "memory bandwidth vs memory amount" topic is very interesting in it's own. So I decided to make a thread for this topic, as I have personally been doing a lot of memory access and bandwidth analysis lately for our console technology, and I have programmed our virtual texturing system (and many other JIT data streaming components).

Historically memory performance has improved linearly (very slowly) compared to exponential (Moore's law) growth of CPU performance. Relative memory access times (latencies) have grown to be over 400x higher (in clock cycles) compared to first PC computers, and there's no signs that this development will slow down in the future, unless we invent some radically new ways of storing data. None of the currently known future technologies is going to solve the problem, just provide some band aid. So we need to adapt.

Some links to background information first:

1. Presentation by Sony R&D. Targeted for game technology programmers. Has a very good real life example how improving your memory access pattern can improve your performance by almost 10x. Also has nice charts (slides 17 and 18) showing how memory speed has increased historically compared to ALU:
http://harmful.cat-v.org/software/OO...ng_GCAP_09.pdf

2. Benchmark results of a brand new x86 chip with unified memory architecture (CPU & GPU share the same memory & memory controller). Benchmark shows system performance with all available DDR3 speeds from DDR3-800 to DDR3-1866. All other system settings are identical, only memory bus bandwidth is scaled up/down. We can see an 80% performance (fps) improvement in the gaming benchmark just by increasing the DDR3 memory clock:
http://www.tomshardware.com/reviews/...0k,3224-5.html

3. A GPU benchmark comparing old Geforce GTS 450 (1 GB, GDDR5) card to a brand new Kepler based Geforce GT 640 (2 GB, DDR3). The new Kepler based card has twice the memory amount and twice the ALU performance, but only half of the memory bandwidth (because of DDR3). Despite the much faster theoretical shader performance and twice the memory amount, it loses pretty badly in the benchmarks because of it's slower memory bus:
http://www.anandtech.com/show/5969/z...gt-640-review-

Quote:
Originally Posted by aaronspink View Post
In the console space, using 2GB as a disk cache alone will make for a better end user experience than 2x or even 3-4x gpu performance.
I completely disagree with this. And I try now to explain why. As a professional, you of course know most of the background facts, but I need to explain that first, so that my remarks later aren't standing without a factual base.

--- ---

I will use the x86 based Trinity APU [link 2] as my example system, as it has close enough performance and memory bandwidth compared to current generation consoles (it's only around 2x-4x faster overall) and it has unified memory (single memory bus shared between CPU & GPU). It's much easier to talk about a well known system, with lots of public benchmarks results around the net.

Let's assume we are developing a vsync locked 60 fps game, so each frame must complete in 16.6 ms time. Let's assume our Trinity system is equipped with the fastest DDR3 it supports (DDR3-1866). According to Tom's Hardware synthetic bandwidth benchmark, this configuration gives us 14 GB bandwidth per second. Divide that by 60, and we get 233 MB bandwidth per frame. Let's round that down to even 200 MB per frame to ease up our calculations. A real game newer utilizes memory bandwidth as well as a synthetic benchmark, so even the 200 MB per frame figure is optimistic.

Now I know that my game should never access more than 200 MB of unique memory per frame if I want to reach my vsync locked 60 fps. If I access more memory, my frame rate dips as the memory subsystem cannot give me enough data, and my CPU & GPU start stalling.

How about CPU & GPU caches? Caches only help with repeated data access to the same data. Caches do not allow us to access any more unique data per frame. Also it's worth noticing that if you access the same memory for example at beginning of your frame, at middle of your frame and at end of your frame, you will pay as much as if you did three unique memory accesses. Caches are very small, and old data gets replaced very fast. Our Trinity CPU has 4 MB of L2 cache and we move 200 MB of data to the cache every frame. Our cache gets fully replaced by new data (200/4 =) 50 times every frame. Data only stays in cache for 0.33 ms. If we access it again after this period, we must fetch it from the memory again (wasting our valuable 200 MB per frame bandwidth). It's not uncommon that a real game accesses every data in the current working set (on average) twice per frame, leaving us with 100 MB per frame unique accessible memory. Examples: Shadowmaps are first rendered (to textures in memory) and sampled later during lighting pass. Physics simulation moves objects (positions & rotations) and later in frame those same objects are rendered (accessing those same position and rotation datas again).

However let's keep the theoretical 200 MB per frame number, as engines differ, and access patterns differ (and we do not really want to got that far in the analysis). In a real game you can likely access only around 100 MB - 150 MB of unique memory per frame, so the forthcoming analysis is optimistic. A real game could likely access less memory and thus have a smaller working set.

So far we know that the processing and rendering of a single frame never requires more than 200 MB of memory (we can't reach 60 fps otherwise). If your game has a static scene, you will not need more memory than that. However static scenes are not much fun, and thus this scenario is highly unlikely in real games (except for maybe a chess game with a fixed camera). So the billion dollar question becomes, how much does the working set (memory accesses) change from frame to frame in a 60 fps game?

In a computer game, objects and cameras do not really "move" around, they get repositioned every frame. In order for this repositioning to look like smooth movement we can only change the positions very slightly from frame to frame. This basically means that our working set can only change slightly from frame to frame. According to my analysis (for our game), our working set changes around 1%-2% per frame in general case, and peaks at around 10%. Especially notable fact is that our virtual texturing system working set never changes more than 2% per frame (textures are the biggest memory user in most games).

We assume that a game with a similar memory access pattern (similarly changing working set from frame to frame) is running on our Trinity example platform. Basically this means that in average case our working set changes from 2 MB to 4 MB per frame, and it peaks at around 20 MB per frame. We can stream this much data from a standard HDD. However HDDs have long latencies, and long seek times, so we must stream data in advance and bundle data in slightly bigger chunks than we like to combat the slow seek time. Both streaming in advance (prefetching) and loading in bigger chunks (loading slightly wider working set) require extra memory. Question becomes, how much larger the memory cache needs to be than our working set?

The working set is 200 MB (if we want to reach that 60 fps on the imaginary game on our Trinity platform). How much more memory we need for the cache? Is working set x2.5 enough (512 MB)? How about 5x (1 GB) or 10x (2 GB)?

Our virtual texture system has a static 1024 page cache (128x128 pixel pages, 2x DXT5 compressed layer per page). Our average working set per frame is around 200-400 pages, and it peaks as high as 600 pages. The cache is so small that it has to reload all textures if you spin the camera around in 360 degrees, but this doesn't matter, as the HDD streaming speed is enough to push new data in at steady pace. You never see any texture popping when rotating or moving the camera. The only occasion where you see texture popping is when the camera suddenly teleports to a completely different location (working set changes almost completely). In our game this only happens if you restart to a checkpoint or restart the level completely, so it's not a big deal (and we can predict it).

If the game behaves similarly to our existing console game, we need a cache size of around 3x the working set for texture data. Big percentage of the memory accessed per frame (or stored to the memory) goes to the textures. If we assume for a moment that all other memory accesses are as stable as texture accesses (cache multiplier of 3x) we only need 600 MB of memory for a fully working game. For some memory bandwidth hungry parts of the game this actually is true. And things are even better for some parts: shadow maps, post processing buffers, back buffer, etc are fully generated again every frame, so we need no extra memory storage to hold caches of these (cache multiplier is 1x).

Game logic streaming is a harder thing to analyze and generalize. For example our console game has a large free roaming outdoor world. It's nowhere as big as worlds in Skyrim for example, but the key point here is that we only keep a slice of the world in memory at once so the world size could theoretically be limitless (with no extra memory cost). Our view distance is 2 kilometers, so we do need to keep full representation of the game world in memory after that. Data quality required for a distance follows pretty much logarithmic scale (texture mip mapping, object geometry quality, heightfield quality, vegetation map quality, etc etc). Data required as distance grows shrinks dramatically. This is of course only true for easy cases such as graphics processing, heightfields, etc. Game logic doesn't automatically scale. However you must scale it manually to reach that 200 MB per frame memory access limit. Your game would slow down to halt if you just tried to simply read full AI data from every single individual NPC in the large scale world, no matter how simple your processing would be.

Our heightmap cache (used in physics, raycasts and terrain visualization) keeps around 4x the working set. We do physics simulation (and exact collision) only for things near the player (100 meters max). When an object enters this area, we add corresponding physics objects to our physics world. It's hard to exactly estimate how big percentage of our physics world structures are accessed per frame, but I would estimate around 10%. So we basially have a 10x working set "cache" for physics.

Basically no component in our game required more than 10x memory compared to its working set. Average requirement was around 3x. So theoretically a game with similar memory access patterns would only need 600 MB of memory on our example Trinity platform. And this includes as much texture resolution as you ever want (virtual texturing works that way). And it includes as much other (physics, game logic, etc) data as you can process per frame (given the limited bandwidth). Of course another game might need for example average of 10x working set for caches, but that's still only 2 GB. Assuming game is properly optimized (predictable memory accesses are must have for good performance) and utilizes JIT streaming well, it will not benefit much if we add more main memory to our Trinity platform beyond that 2 GB.

More memory of course makes developers life easier. Predicting data access patterns can be very hard for some styles of games and structures. But mindlessly increasing the cache sizes much beyond working set sizes doesn't help either (as we all know that increasing cache size beyond working set size gives on average only logarithmic improvement on cache hit rate = diminishing returns very quickly).

My conclusion: Usable memory amount is very much tied to available memory bandwidth. More bandwidth allows the games to access more memory. So it's kind of counterintuitive to swap faster smaller memory to a slower larger one. More available memory means that I want to access more memory, but in reality the slower bandwidth allows me to access less. So the percentage of accessible memory drops radically.

Although it's notable rereading it he seems to ask for 2-3X+ RAM as accessible per frame, which actually seems more like Durango's setup.

Ashes · Jan 25, 2013

Thank you for that.

edit: not your conclusion.

Bombadil · Jan 25, 2013

32MB of SRAM won't be enough. The DDR3 is simply too slow. The only way MS can allay my apprehension is by having each Durango ship with an SSD standard.

If the Orbis devkits are any indication of the final product, the PS4 will have 4GB of DDR5 and 1GB of DDR3.

If that's the case then Sony will have made the right decision. Microsoft opted for more RAM for Kinect and OS reservations. Hopefully they'll have time to change everything up and add some GDDR5 RAM like Sony did to even the playing field.

Damn, I really hope they didn't shoot themselves in the foot just for Kinect.

Ashes · Jan 25, 2013

Bombadil said:
32MB of SRAM won't be enough. The DDR3 is simply too slow. The only way MS can allay my apprehension is by having each Durango ship with an SSD standard.

I guess SSDs will have on loading, but other than that, it's a poor substitute for ram.

If the Orbis devkits are any indication of the final product, the PS4 will have 4GB of DDR5 and 1GB of DDR3.

Is that prediction on your part? or new information?

Support NeoGAF

RAM thread of Next Generation

ADD New Gen Gamer

Member

I'd be in the dick

Member

Member

Member

Member

Member

Banned

Member

Banned

Banned

Member

Member

Banned

Banned

Banned

Banned

Member

Thinks mods influence posters politics. Promoted to QAnon Editor.

Banned

Banned

Thinks mods influence posters politics. Promoted to QAnon Editor.

Member

Banned

Thinks mods influence posters politics. Promoted to QAnon Editor.

Member

Banned

Banned

Banned

Member

Banned

Thinks mods influence posters politics. Promoted to QAnon Editor.

Banned

Member

Member

Banned

Banned

Member

Member

Member

Banned

Liverpool-2

Banned

Member

Banned

Banned

Banned

Banned

Banned

Similar threads