Xbox One SDK & Hardware Leak Analysis - CPU, GPU, RAM & More [Part One]

mocoworm

Member
One for the techies. This is part 1 of a series of articles. I'll post the others when they are published. It's very in-depth.

Take the time to read it before commenting to avoid this breaking down into a thread that ends up locked, yeah?

http://www.redgamingtech.com/xbox-one-sdk-hardware-leak-analysis-cpu-gpu-ram-more-part-one-tech-tribunal/

When the hacker group H4LT leaked the Xbox One's SDK and its accompanying documentation, we gamers and journalists were given a fantastic insight into the hardware and software Microsoft's Xbox One is comprised of. When the Xbox One's SDK leak first hit, gaming news headlines primarily focused their attention on the revelation that the seventh CPU core of the Xbox One (well, up to 80 percent of it, anyway) was now usable by game developers. This change further extended the CPU performance lead the Xbox One has over Sony's PlayStation 4 (thanks to the Xbox One's higher CPU clock speed).

But in reality, there's a lot more revealed inside the documentation than just that.
For example, if you've ever wondered why Xbox One games are more likely to experience frame rate drops while an "Achievement Unlocked" notification pops up on screen, you'll have your answer soon enough. It's our mission, starting with this, the first in a series of articles, to take you through the various improvements and changes in the Xbox One's architecture, SDK and development cycle; explaining the language and providing insights into Compute, ESRAM usage, APIs and just about everything else that makes Microsoft's next-gen console tick.
 
As much as I love my Xbone, its architecture is a mess and really doesn't fare well against the PS4 even now, let alone in the future. The main reason, I think, is Kinect: it took a good chunk of the budget (seeing how they sell it for $150 separately), so essentially the Xbone is a $300-350 machine while the PS4 is $400, and I think that fits with the performance gap. Personally I'm not affected, but I feel sorry for devs who have to struggle with this convoluted design rather than having a streamlined machine akin to the PS4 and focusing on the content, rather than desperately trying to squeeze out 1080p.
 
Still, the PS4 is the horse to bet on for multiplats. I don't think those CPU changes will influence that much.
 
Damn, some developers might actually find some of these things interesting in the long run.

You do realize that licensed developers have full access to the SDK and can read any of this information directly from the docs. I'm not really sure what any developer would get from this analysis.
 
I love reading this kind of stuff and could spend hours doing it, but I never quite understand half of these tech articles/analyses, lol. Often it feels like reading gibberish. Thanks for posting though!
 
As much as I love my Xbone, its architecture is a mess and really doesn't fare well against the PS4 even now, let alone in the future.

If you look at the big picture, the actual architecture is pretty much the same (which is also part of the article). There are two or three bigger differences in certain parts, but not in the overall architecture - those being the higher main memory bandwidth on the PS4, the existence of eSRAM on the Xbox One, and of course the additional CUs and ROPs in the PS4 GPU (only one of those is an architectural difference - the eSRAM).

If you look at the number of transistors used on the chips (Xbox One including the eSRAM, PS4 including the extra CUs), which is the main driver of how expensive (big) such a chip gets, the two are likely to cost a quite similar amount.

Where Sony does use more expensive components is the RAM. As has often been stated, they profited immensely from price drops on GDDR5 memory, which enabled them to use 8GB while still hitting the $399 price point.

The more interesting part of the article however is how the software deals with the different states and how resources are allocated. I think it is quite an achievement that they were able to free parts of the seventh core for developers while still keeping the voice command functionality of Kinect active. Meanwhile Sony by all accounts is still reserving two full cores for their OS.

I'm looking forward to the future entries in that article series. I'd also be interested to see how Sony is dealing with the resource allocations on the PS4.
 
Where Sony does use more expensive components is the RAM. As has often been stated, they profited immensely from price drops on GDDR5 memory, which enabled them to use 8GB while still hitting the $399 price point.

Depends how you look at it. ESRAM is way more expensive than GDDR5.
 
Depends how you look at it. ESRAM is way more expensive than GDDR5.

By how much? I think it's hard to say how expensive it really is, as it is baked into one piece of hardware and is part of the APU as a whole. And it's likely to get way cheaper as the die shrinks.
 
If a CPU core wants to access the other module's level 2 cache (so, for example, CPU core 1, which is housed in module A, wishes to access the cache housed in module B), it'll be considerably slower. This logically means transferring data, and the same could be said for a level 1 cache access too (which is even slower).
Ah, I remember ND's recent presentation also touched on designing their data structures around minimizing cache misses on Jaguar.

Regardless, effing Bulldozer needs to be retired.
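Since the thread keeps circling back to Jaguar's split L2 and the cost of cache misses, here's a minimal C++ sketch of the kind of data-oriented layout that sort of talk usually advocates (my own illustration, not anything from the ND slides or the SDK): keep the hot fields packed contiguously so the update loop walks memory linearly instead of dragging cold data through the cache.

```cpp
// Illustrative only: structure-of-arrays layout so the update loop touches
// contiguous memory and stays inside one module's caches as much as possible.
#include <cstddef>
#include <vector>

// Array-of-structs: every Particle drags cold fields into each cache line.
struct ParticleAoS {
    float px, py, pz;     // hot: position
    float vx, vy, vz;     // hot: velocity
    char  debugName[64];  // cold: rarely touched, wastes cache space
};

// Structure-of-arrays: hot data packed tightly, cold data kept elsewhere.
struct ParticlesSoA {
    std::vector<float> px, py, pz;
    std::vector<float> vx, vy, vz;

    void integrate(float dt) {
        const std::size_t n = px.size();
        for (std::size_t i = 0; i < n; ++i) {  // linear, prefetch-friendly walk
            px[i] += vx[i] * dt;
            py[i] += vy[i] * dt;
            pz[i] += vz[i] * dt;
        }
    }
};
```

The point is simply that a linear walk over tightly packed floats stays prefetch-friendly and mostly inside one module's caches, whereas a padded or pointer-chasing layout spends the small L1/L2 on data the loop never reads.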
 
One has to wonder if it would have been better to have 512MB of GDDR5 instead of eSRAM. It probably would have been cheaper, had the same bandwidth as the PS4, and offered more space for things that require high bandwidth.
 
I never understood the need for 3 OS's.

It's for scheduling resources, mainly. There are two OSes competing for the hardware's resources (one game OS, one "other" OS), and a third one, the hypervisor, is there to distribute them.
The problem is that people imagine three copies of Windows 8 running in parallel and conclude it's resource-heavy to have three OSes running.
 
One has to wonder if it would have been better to have 512MB of GDDR5 instead of eSRAM. It probably would have been cheaper, had the same bandwidth as the PS4, and offered more space for things that require high bandwidth.

Probably not. The addressing would be different, it would be harder to use it however you want, and above all the latency would be much higher.
 
Depends how you look at it. ESRAM is way more expensive than GDDR5.

The eSRAM is more expensive; however, a similar number of transistors is used by Sony for the additional CUs on their chip. So basically the APUs in both systems are likely to cost roughly the same (one including eSRAM, one accommodating the additional computing power of the CUs). Therefore I'm only comparing the added cost of the GDDR5 to the DDR3.

When we talk about cost savings with die shrinks - you also see cost savings due to cheaper manufacturing on the RAM side, but that market is harder to predict given that RAM is more of an off-the-shelf component.

As for the need for 3 OS's: one is really only the most basic kind of OS. It only manages the resource allotment for the others - that's all it does. One works like a classic console OS with down-to-the-metal development, etc., and is used for the games. The last one is for apps and entertainment services. There, Microsoft opted to use one derived from their regular OSes, which means developers use pretty much the same tools they use for desktop development.

So instead of forcing people to learn how to code for a new OS with new APIs, etc., regular apps could basically be written by anyone who does Windows app development. They didn't fully get to that goal with the current Xbox Dashboard OS, however (I know someone who does app development for Xbox 360 and Xbox One - not games, though). It's likely to change with Windows 10.

So basically game developers get what they need and app developers can use something simpler with more abstractions. And those are decoupled well enough that the functionality used by one doesn't interfere with the other in a big way. We'll see how that plays out, but it could potentially give Microsoft a big advantage when it comes to new entertainment apps becoming available on the platform.
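To make the "host OS only hands out resources" idea concrete, here's a toy C++ sketch of a static allotment table in the spirit of what's described above. The numbers are the commonly reported Xbox One splits (5/3 GB RAM, six cores plus up to 80% of the seventh for games, a GPU reservation that shrank over SDK revisions); none of the names correspond to real Xbox APIs, and the real hypervisor obviously does far more than hold a table.

```cpp
// Toy illustration of a hypervisor-style static resource allotment; not real Xbox code.
#include <cstdio>

struct Allotment {
    const char* partition;
    double ramGiB;    // main memory reserved for this partition
    double cpuCores;  // cores (fractional = time-sliced share)
    double gpuShare;  // fraction of GPU time
};

int main() {
    // Commonly reported ballpark figures; the exact GPU/CPU reservations
    // changed over SDK revisions.
    const Allotment table[] = {
        { "Exclusive (game) OS",  5.0, 6.8, 0.9 },  // 6 cores + up to 80% of the 7th
        { "Shared (app) OS",      3.0, 1.2, 0.1 },
        { "Host OS (hypervisor)", 0.0, 0.0, 0.0 },  // only arbitrates, owns little itself
    };
    for (const Allotment& a : table)
        std::printf("%-22s RAM %.1f GiB, CPU %.1f cores, GPU %.0f%%\n",
                    a.partition, a.ramGiB, a.cpuCores, a.gpuShare * 100.0);
}
```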
 
Probably not. Addressing would be different, it would be a disadvantage to use it "like you want" and especially latency would be much higher.

But at least you would be able to fit deferred rendering frame buffers in there, rather than a shameful 720p vs 1080p on PS4 in games like MGSV. So far the only games the Xbone manages to handle at 1080p are either last-gen ports like GTA V and Destiny, or games with outdated forward rendering like FM5 and FH2 that don't require much power due to simple lighting.
 
But at least you would be able to fit deferred rendering frame buffers in there, rather than a shameful 720p vs 1080p on PS4 in games like MGSV. So far the only games the Xbone manages to handle at 1080p are either last-gen ports like GTA V and Destiny, or games with outdated forward rendering like FM5 and FH2 that don't require much power due to simple lighting.

Forward rendering is outdated? Says who? You? Based on what? And who cares anyway if something uses "outdated" rendering when it produces games like Forza Horizon 2? Serious question. Btw, FH2 uses Forward+ and not plain forward.
And yes, it is shameful that there is one engine that doesn't fit the Bone's hardware. But there have been a lot of discussions about who is to blame here. Given your post, it is also clear which side you chose.
Btw, from http://www.eurogamer.net/articles/digitalfoundry-2014-hands-on-with-driveclub:
"It is. It's a mixture actually, we do lots of things," he says. "We've got a deferred rendering system, tile-based rendering system, a forward rendering system, all mixed together just to get all the variety you can [have] in certain situations and put them together."
There is no definite solution on how to render.
 
Forward rendering is outdated? Says who? You? Based on what? And who cares anyway if something uses "outdated" rendering when it produces games like Forza Horizon 2? Serious question. Btw, FH2 uses Forward+ and not plain forward.
And yes, it is shameful that there is one engine that doesn't fit the Bone's hardware. But there have been a lot of discussions about who is to blame here. Given your post, it is also clear which side you chose.
Btw, from http://www.eurogamer.net/articles/digitalfoundry-2014-hands-on-with-driveclub:
"It is. It's a mixture actually, we do lots of things," he says. "We've got a deferred rendering system, tile-based rendering system, a forward rendering system, all mixed together just to get all the variety you can [have] in certain situations and put them together."
There is no definite solution on how to render.

Deferred rendering gives you tons of lights with actual shadows being cast from them; FM5/FH2 only have shadows from the sun, while headlights and lampposts don't cast anything. Forward rendering is used mostly for transparencies, because deferred doesn't play well with them, but other than that it's mostly obsolete. Deferred rendering gives you much better lighting with good performance, but of course it requires a big frame buffer, which the Xbone's eSRAM can't fit and which DDR3 is too slow to handle at 1080p. Not to mention that FM5/FH2 don't use any of the advanced modern techniques like screen space reflections or any kind of GI.
 
Deferred rendering gives you tons of lights with actual shadows being cast from them; FM5/FH2 only have shadows from the sun, while headlights and lampposts don't cast anything. Forward rendering is used mostly for transparencies, because deferred doesn't play well with them, but other than that it's mostly obsolete. Deferred rendering gives you much better lighting with good performance, but of course it requires a big frame buffer, which the Xbone's eSRAM can't fit and which DDR3 is too slow to handle at 1080p. Not to mention that FM5/FH2 don't use any of the advanced modern techniques like screen space reflections or any kind of GI.

Crysis 3 is forward + and has SSR + GI.
I don't believe forward+ is outdated at all.
 
Deferred rendering gives you tons of lights with actual shadows being cast from them; FM5/FH2 only have shadows from the sun, while headlights and lampposts don't cast anything. Forward rendering is used mostly for transparencies, because deferred doesn't play well with them, but other than that it's mostly obsolete. Deferred rendering gives you much better lighting with good performance, but of course it requires a big frame buffer, which the Xbone's eSRAM can't fit and which DDR3 is too slow to handle at 1080p. Not to mention that FM5/FH2 don't use any of the advanced modern techniques like screen space reflections or any kind of GI.

Again, there is no "this or that", as the quote from the DC devs shows.
And there is this from a banned site:
"Forza Horizon 2 is definitely a sweet looking game, and part of the reason for that visual glitz is in a rather new technology named Forward Plus Rendering, as explained today by Creative Director Ralph Fulton during a behind-closed-doors presentation at Gamescom.

According to Fulton Forza Horizon 2 is the first console game to use the tech, which allows it to display thousands of dynamic lights in real time, making the representation of cities like Nice possible.

In addition to this we also learn that the game has seven different radio stations, and over twice as many tracks as those included in the first Forza Horizon.

Finally, Fulton also had words of praise for the Drivatar feature, mentioning that the day in which they turned them on at the studio, racing in the game “completely changed in front of their eyes.”

That’s definitely promising, and I honestly can’t wait to get my hands on the wheel again tomorrow."

Again, FH2 uses Forward+ and not only forward rendering:

"Digital Foundry: You're first to ship a console game with Forward+ rendering - what is it and what specific advantages does it bring to the table?

John Longcroft-Neal: The Forward+ technique we use is probably better described as 'Clustered Forward+'. All Forward+ techniques revolve around splitting up the screen into a regular grid of sub-rectangles (typically 32x32 pixels), then finding out which lights potentially effect each sub-rectangle before you start rendering. During surface shading, you load up the reduced light list for that sub-rectangle and process only those lights. The goal is to avoid processing lights that have no effect on the surfaces in the sub-rectangle.

The standard Forward+ technique uses a depth texture of the scene to cull lights from the list. There are two issues with this approach; firstly you need to render the depth texture as a pre-pass before the main scene in order to create the light lists; secondly semi-transparent surfaces cannot render to the depth pre-pass.

Clustered Forward+ avoids the need for a depth pre-pass altogether by calculating light lists at multiple depths for each sub-rectangle and using the most appropriate cluster during surface shading. We generate the light cluster data all on the GPU using Compute shaders and this is done for any rendered view that requires lights.

The advantage of Forward+ for us is that it just works with MSAA, at any level, whereas deferred techniques struggle to maintain decent anti-aliasing. Secondly you get the other benefits of forward shading such as allowing complex material types such as carbon fibre and car paint that are difficult to achieve using deferred techniques. We found that we could easily 'plug-in' Forward+ to the existing shaders which were already designed for forward rendering. The advantages of the Clustered approach to Forward+ for us were that semi-transparent surfaces did not need special consideration and most importantly we did not need to render a depth pre-pass."

http://www.eurogamer.net/articles/digitalfoundry-2014-the-making-of-forza-horizon-2

And it even features 4xMSAA, which I don't know of in any other current-gen game.
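To ground the description above, here's a rough CPU-side C++ sketch of the core Forward+ step (my own illustration, not Playground Games' code, which per the interview builds its lists in compute shaders on the GPU): bin each light into the 32x32-pixel screen tiles it can affect, so shading later only loops over each tile's reduced light list. A clustered variant would additionally split every tile into depth slices and pick the matching cluster per pixel.

```cpp
// Minimal CPU-side sketch of Forward+ light binning into 32x32-pixel tiles.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Light { float x, y, radius; };  // light's screen-space circle of influence

struct TileGrid {
    int tilesX = 0, tilesY = 0;
    std::vector<std::vector<uint32_t>> lightLists;  // per-tile indices into the light array
};

TileGrid binLights(const std::vector<Light>& lights, int width, int height, int tileSize = 32) {
    TileGrid grid;
    grid.tilesX = (width  + tileSize - 1) / tileSize;
    grid.tilesY = (height + tileSize - 1) / tileSize;
    grid.lightLists.resize(static_cast<std::size_t>(grid.tilesX) * grid.tilesY);

    for (uint32_t i = 0; i < lights.size(); ++i) {
        const Light& l = lights[i];
        // Conservative bounds of the tiles this light can touch.
        int x0 = std::max(0, int((l.x - l.radius) / tileSize));
        int x1 = std::min(grid.tilesX - 1, int((l.x + l.radius) / tileSize));
        int y0 = std::max(0, int((l.y - l.radius) / tileSize));
        int y1 = std::min(grid.tilesY - 1, int((l.y + l.radius) / tileSize));
        for (int ty = y0; ty <= y1; ++ty)
            for (int tx = x0; tx <= x1; ++tx)
                grid.lightLists[std::size_t(ty) * grid.tilesX + tx].push_back(i);
    }
    return grid;  // during shading, a pixel in tile (tx, ty) loops only over its list
}
```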
 
Basically, Microsoft used eDRAM on the Xbox 360, which gave them quite a nice advantage back then. Based on that experience they decided to make use of that component again, with an even faster eSRAM.

Deferred Rendering can reduce calculations when you have lots of dynamic lighting in a scene but it also limits you in other regards (transparencies, etc.).

As for the eSRAM and its usage: as we know from Microsoft/AMD presentations, it's currently mainly used as a full render target. They expect more efficient uses of the eSRAM in time, such as partitioning render targets to reside partially in the eSRAM and partially in main memory, based on how much work is expected in certain areas (e.g. the sky doesn't have dynamic lighting applied).

So there is potential for those issues to be overcome. Whether the FOX Engine can or will be updated in the end remains to be seen.
 
Crysis 3 is forward + and has SSR + GI.
I don't believe forward+ is outdated at all.

word.

rendering-technologies-from-crysis-3-gdc-2013-15-638.jpg
 
Deferred rendering gives you tons of lights with actual shadows being cast from them; FM5/FH2 only have shadows from the sun, while headlights and lampposts don't cast anything. Forward rendering is used mostly for transparencies, because deferred doesn't play well with them, but other than that it's mostly obsolete. Deferred rendering gives you much better lighting with good performance, but of course it requires a big frame buffer, which the Xbone's eSRAM can't fit and which DDR3 is too slow to handle at 1080p. Not to mention that FM5/FH2 don't use any of the advanced modern techniques like screen space reflections or any kind of GI.

I don't know where to start here... but to sum it up, DDR3 IS NOT TOO SLOW FOR 1080p. Where did you get that nonsense? You have no idea what you are talking about at all.
 
But at least you would be able to fit deferred rendering frame buffers in there, rather than a shameful 720p vs 1080p on PS4 in games like MGSV. So far the only games the Xbone manages to handle at 1080p are either last-gen ports like GTA V and Destiny, or games with outdated forward rendering like FM5 and FH2 that don't require much power due to simple lighting.

The eSRAM, along with the move engines and virtual addressing, is enough for fat 1080p g-buffers. The problem lies on the software side of things, as MS didn't expose many ways for developers to tackle this problem until later versions of the SDK. So developers were essentially locked into: does it fit entirely in eSRAM? Great; otherwise reduce it until it does.

This is likely to change for newer games because they now have more options. You can now, for example, partition the buffer and keep the most expensive pixels in eSRAM and the cheap ones in DDR3. You can also partition the pixels themselves: for instance, if you have a fat g-buffer like Infamous (96 bits per pixel, IIRC), by running the profiling tool you could find out which channels of the g-buffer are actually bandwidth bound and, for example, store the hungrier 64 bits per pixel in eSRAM and the less-used 32 in DDR3.

On top of that, instead of statically allocating the buffers in eSRAM, they can allocate a small portion of it as a working buffer and coordinate the move engines to move used data back to DDR3 (to make room for new data) and bring in new data to work on. That is obviously the most difficult to achieve, but I believe the simultaneous read/write nature of the eSRAM makes this a very viable option.

Oh, and FH2 uses Forward+, which also allows for many light sources in the scene; their lighting model is not simple at all.
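For a sense of why "does it fit entirely in eSRAM?" is such a tight squeeze at 1080p, here's the back-of-the-envelope math, assuming the 96-bits-per-pixel g-buffer mentioned above plus a 32-bit depth buffer (the depth format is my assumption, not a figure from the SDK):

```cpp
// Rough footprint math for a 1080p deferred g-buffer vs. the 32 MiB of eSRAM.
#include <cstdio>

int main() {
    const double pixels   = 1920.0 * 1080.0;
    const double gbufferB = pixels * (96.0 / 8.0);  // 96 bpp of colour/normal/etc. targets
    const double depthB   = pixels * 4.0;           // assumed 32-bit depth/stencil
    const double MiB      = 1024.0 * 1024.0;

    std::printf("g-buffer: %.1f MiB\n", gbufferB / MiB);              // ~23.7 MiB
    std::printf("depth:    %.1f MiB\n", depthB / MiB);                // ~7.9 MiB
    std::printf("total:    %.1f MiB of 32 MiB eSRAM\n",
                (gbufferB + depthB) / MiB);                           // ~31.6 MiB
}
```

So a fat g-buffer plus depth already brushes against the 32 MiB before you add MSAA, HDR or intermediate targets, which is exactly where the partial-residency tricks described above come in.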
 
Crysis 3 is forward + and has SSR + GI.
I don't believe forward+ is outdated at all.

Could Crysis 3 run on the Xbone at 1080p? Probably not. Ryse uses a more advanced version of CE3 and has GI/SSR/SSS and whatnot, but it's 900p with dips below 30 fps.
 
Could Crysis 3 run on the Xbone at 1080p? Probably not. Ryse uses a more advanced version of CE3 and has GI/SSR/SSS and whatnot, but it's 900p with dips below 30 fps.

Yep, a launch game with a bad SDK. And why shouldn't Crysis 3 be able to run on the Xbone at 1080p?
 
Could Crysis 3 run on the Xbone at 1080p? Probably not. Ryse uses a more advanced version of CE3 and has GI/SSR/SSS and whatnot, but it's 900p with dips below 30 fps.

I have no idea if Crysis 3 could run at 1080p/very high settings; that's a question for Crytek. Who knows what kind of optimization is possible with proper time/budget.
In the meantime, your comment about Forward+ is not accurate, and I don't know if Ryse is strictly Forward+.

Forza Horizon looks magnificent to me. If a Forward+ renderer allows for GI, SSR and other advanced techniques, sub-1080p could be a worthy trade-off.
 
If you look at the big picture, the actual architecture is pretty much the same (which is also part of the article). There are two or three bigger differences in certain parts, but not in the overall architecture - those being the higher main memory bandwidth on the PS4, the existence of eSRAM on the Xbox One, and of course the additional CUs and ROPs in the PS4 GPU (only one of those is an architectural difference - the eSRAM).

If you look at the number of transistors used on the chips (Xbox One including the eSRAM, PS4 including the extra CUs), which is the main driver of how expensive (big) such a chip gets, the two are likely to cost a quite similar amount.

Where Sony does use more expensive components is the RAM. As has often been stated, they profited immensely from price drops on GDDR5 memory, which enabled them to use 8GB while still hitting the $399 price point.

The more interesting part of the article however is how the software deals with the different states and how resources are allocated. I think it is quite an achievement that they were able to free parts of the seventh core for developers while still keeping the voice command functionality of Kinect active. Meanwhile Sony by all accounts is still reserving two full cores for their OS.

I'm looking forward to the future entries in that article series. I'd also be interested to see how Sony is dealing with the resource allocations on the PS4.
What you call an "achievement" with the use of the 7th core, I actually see as the failure of their original vision, compounded.

That extra usage doesn't come for free, out of thin air, or from pure "l33t coder skillz". That extra power was supposed to be 100% reserved for things such as Kinect being always available at no additional CPU cost, the 3 OSes running in the background, and multitasking apps simultaneously while Snap is active.

The more they push the envelope that way, the more likely you are to see situations where the non-gaming-related uses (voice commands, Skype, a snapped app...) cause slowdowns and small hang-ups, particularly once games start really pushing the hardware.

The kind of "pushing the resources" we're seeing now looks more like "desperation to find any little crumb of power available" to me, at the possible cost of overall instability in some situations.

Time will tell.
 
Interesting, is this from the XBO? Near-perfect balance of FP/int performance if it is; I didn't know that about Jaguar. Past generations (and the Wii U) tended to emphasize one at the expense of the other, making developers rely mostly on one to use the processor better.

branching-vs-integer-mask.jpg
 
The more they push the envelope that way, the more likely you are to see situations where the non-gaming-related uses (voice commands, Skype, a snapped app...) cause slowdowns and small hang-ups, particularly once games start really pushing the hardware.

Time will tell.

MS clearly tells devs what they can expect. Snapped apps won't "harm" the game OS; that's what they have the hypervisor for, as explained earlier in this thread.
 
I'm a huge fan of "Forward+" techniques (which are usually really just forward rendering with smarter light selection), simply because they allow you to do real AA rather cheaply.
 
The eSRAM, along with the move engines and virtual addressing, is enough for fat 1080p g-buffers. The problem lies on the software side of things, as MS didn't expose many ways for developers to tackle this problem until later versions of the SDK. So developers were essentially locked into: does it fit entirely in eSRAM? Great; otherwise reduce it until it does.

This is likely to change for newer games because they now have more options. You can now, for example, partition the buffer and keep the most expensive pixels in eSRAM and the cheap ones in DDR3. You can also partition the pixels themselves: for instance, if you have a fat g-buffer like Infamous (96 bits per pixel, IIRC), by running the profiling tool you could find out which channels of the g-buffer are actually bandwidth bound and, for example, store the hungrier 64 bits per pixel in eSRAM and the less-used 32 in DDR3.

On top of that, instead of statically allocating the buffers in eSRAM, they can allocate a small portion of it as a working buffer and coordinate the move engines to move used data back to DDR3 (to make room for new data) and bring in new data to work on. That is obviously the most difficult to achieve, but I believe the simultaneous read/write nature of the eSRAM makes this a very viable option.

Oh, and FH2 uses Forward+, which also allows for many light sources in the scene; their lighting model is not simple at all.

Okay, maybe there is a workaround solution, but will anyone apart from first-party devs get down to fully utilizing it? It seems like a ton of work compared to simply reducing the resolution like it's done now.
On FH2 again: yes, it has many light sources, but none of them cast shadows; only the sun does. Compare that to deferred renderers like Gran Turismo, DC or PCars, where shadows can be cast from street lights and headlights - it's much more impressive looking and requires a lot more processing power. I play on X1 myself, and FH2 looks very similar to GTA V in terms of graphics, except that in GTA V street lights actually cast dynamic shadows for all objects.
 
The Xbox One's DDR3 2133MHz RAM can theoretically push 68GB/s (that number is the sum of both read and write to the DRAM).

Wait, so it's 34GB/s in either direction? Argh, I hate it when companies are vague about whether it's read and write, or read or write.

Typically DRAM bandwidth is measured in either direction, not combined, right? So say dual-channel DDR3-1600 is 25.6 GB/s, and that's the figure going both up and down; it's not a combined number. What is the PS4's figure, combined or one direction?
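For reference, these peak figures are usually just transfer rate times bus width, and for DDR-style memory the data bus is shared between reads and writes, so the 68 GB/s is one total figure rather than 68 each way. The same formula gives the PS4's 176 GB/s (assuming the widely reported 256-bit buses on both consoles), so the two numbers are on the same basis.

```cpp
// Peak bandwidth = transfers per second * bus width in bytes (one shared figure for DDR).
#include <cstdio>

static double peakGBs(double megaTransfers, int busBits) {
    return megaTransfers * 1e6 * (busBits / 8.0) / 1e9;
}

int main() {
    std::printf("Dual-channel DDR3-1600 PC:   %.1f GB/s\n", peakGBs(1600, 128));  // 25.6
    std::printf("Xbox One DDR3-2133, 256-bit: %.1f GB/s\n", peakGBs(2133, 256));  // ~68.3
    std::printf("PS4 GDDR5-5500, 256-bit:     %.1f GB/s\n", peakGBs(5500, 256));  // 176.0
}
```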
 
If you look at the big picture, the actual architecture is pretty much the same (which is also part of the article). There are two or three bigger differences in certain parts, but not in the overall architecture - those being the higher main memory bandwidth on the PS4, the existence of eSRAM on the Xbox One, and of course the additional CUs and ROPs in the PS4 GPU (only one of those is an architectural difference - the eSRAM).

If you look at the number of transistors used on the chips (Xbox One including the eSRAM, PS4 including the extra CUs), which is the main driver of how expensive (big) such a chip gets, the two are likely to cost a quite similar amount.

Where Sony does use more expensive components is the RAM. As has often been stated, they profited immensely from price drops on GDDR5 memory, which enabled them to use 8GB while still hitting the $399 price point.

The more interesting part of the article however is how the software deals with the different states and how resources are allocated. I think it is quite an achievement that they were able to free parts of the seventh core for developers while still keeping the voice command functionality of Kinect active. Meanwhile Sony by all accounts is still reserving two full cores for their OS.

I'm looking forward to the future entries in that article series. I'd also be interested to see how Sony is dealing with the resource allocations on the PS4.

You won't be hearing about it.

Nice article, but I find it funny that the XB1's hardware is still being so scrutinised. As if people are looking for something that's not there.
 
Wow, so many games? Thanks for entertaining me. Oh, it's not even 1080p. Even better.

MSAA doesn't play well with deferred rendering, and seeing how most games use deferred, MSAA is out of the question; they have to use post-processing AA. Now, MSAA isn't perfect either: the foliage in FH2 is particularly bad in this regard and sometimes ends up looking horrendous.
 