Wii U CPU |Espresso| Die Photo - Courtesy of Chipworks

Here is a quick analysis based on some of what I've gathered so far.

I believe it has been stated that Espresso is 6-stage (was this confirmed?), up from 4-stage with Broadway.

The CPUs in the PS4/XbO are 21-stage, I believe?

So here's how I'm seeing their performance:

Espresso: 3-core, 1.24 GHz, 6-stage

PS4 CPU/Xbox One CPU: 8-core, 1.6 GHz, 21-stage

Since the PS4/Xbox One CPUs have over 3 times as many stages, that makes data execution take a little over 3 times as long.

Comparing the CPUs using these facts (3 × 1.24 = 3.72 vs. 8 × 1.6 = 12.8), the raw numbers would make the PS4/One CPU about 3.4 times as strong as-is, but then you add in the fact that they take over 3 times as long to complete a cycle (3.4/3), and you get about 1.15 times as strong?

Please inform yourself about the functionality of a pipeline (Wikipedia, for example). Pipelining doesn't decrease instruction throughput.

How many instructions do the Xbox One/PS4 CPUs execute per core, per cycle? It is read 4/execute 2 per core for Espresso if no changes have been made to the architecture.

Jaguar can execute 8 operations per cycle (128-bit SIMD with FMA), Espresso can do 4 (64-bit SIMD with FMA). This is only the theoretical maximum, though, which won't be reached in most normal game code. Thus it doesn't say much about real-world performance.
 
Please inform yourself about the functionality of a pipeline (Wikipedia, for example). Pipelining doesn't decrease instruction throughput.
From what I read, there are a lot of hangups brought about through pipelining, primarily because of branching. I may need to look into this more. I've seen the structure of the PPC750. It is practically devoid of branching bottlenecks.

Jaguar can execute 8 operations per cycle (128-bit SIMD with FMA), Espresso can do 4 (64-bit SIMD with FMA). This is only the theoretical maximum, though, which won't be reached in most normal game code. Thus it doesn't say much about real-world performance.


Espresso can only output 4? How did you come to that conclusion? Last I saw, each core had the functionality of a Broadway core. That would mean that each core is capable of receiving 4 instructions and outputting 2 per cycle.
 
From what I read, there are a lot of hangups brought about through pipelining, primarily because of branching. I may need to look into this more. I've seen the structure of the PPC750. It is practically devoid of branching bottlenecks.

As with most things, pipelining also has disadvantages, primarily with branches and data hazards. It's far from being anywhere near as bad as in your previous calculation, though, which would destroy the whole purpose of pipelining (and of course modern Intel and AMD CPUs do have very sophisticated branch prediction and out-of-order capabilities in order to compensate for that).

Espresso can only output 4? How did you come to that conclusion? Last I saw, each core had the functionality of a Broadway core. That would mean that each core is capable of receiving 4 instructions and outputting 2 per cycle.

My numbers were meant per core.
 
As with most things, pipelining also has disadvantages, primarily with branches and data hazards. It's far from being anywhere near as bad as in your previous calculation, though, which would destroy the whole purpose of pipelining (and of course modern Intel and AMD CPUs do have very sophisticated branch prediction and out-of-order capabilities in order to compensate for that).

I'm no supreme hardware expert, but yeah, this is a common misconception I see. Pipeline depth won't significantly hurt throughput (really at all) when no branching is involved. When branching is involved, the shallower pipeline will beat it (all else being equal) in most cases, but it's not as simple as using multipliers based on the number of stages to determine performance. Real world performance will be somewhere in between the "ideal" scenario and the worst case scenario (krizzx's calculations).
 

Even in the worst case, the number of discarded instructions doesn't equal the number of pipeline stages, btw. Just the ones up to the stage where the branch is actually executed (usually there are a few stages behind that).
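To put rough numbers on that, here's a toy model (my own sketch with invented figures, nothing from a spec sheet): average cycles per instruction grow by branch frequency × mispredict rate × flush penalty, where the flush penalty is only the stages up to where the branch resolves.

```c
#include <stdio.h>

/* Toy model of pipelined throughput under branch mispredictions.
 * All figures below are invented for illustration, not measurements. */
static double effective_ipc(double base_ipc, double branch_freq,
                            double miss_rate, double flush_penalty)
{
    /* Each mispredict discards the work in flight up to the stage
     * where the branch resolves (the flush penalty). */
    double cpi = 1.0 / base_ipc + branch_freq * miss_rate * flush_penalty;
    return 1.0 / cpi;
}

int main(void)
{
    /* Hypothetical shallow pipeline: branch resolves ~3 stages in. */
    printf("shallow: %.2f IPC\n", effective_ipc(2.0, 0.20, 0.10, 3.0));
    /* Hypothetical deep pipeline with a better branch predictor. */
    printf("deep:    %.2f IPC\n", effective_ipc(2.0, 0.20, 0.05, 12.0));
    return 0;
}
```

Even with the deep pipeline's flush penalty set four times higher, throughput only drops by roughly 10% in this model, nowhere near the 3x from the earlier calculation.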


Wait, are you saying that all 8 cores of the PS4/XOne CPU can do 8 instructions per cycle (making the total count 64 instructions per cycle)?

Yes. Through SIMD instructions (SSE, AVX), a Jaguar core can execute 4 multiplications and 4 additions simultaneously (source).
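For the curious, this is roughly what that looks like in code. A minimal sketch using SSE intrinsics (my own example, not taken from the linked source): one 128-bit multiply plus one 128-bit add, i.e. 4 multiplications and 4 additions issued together.

```c
#include <stdio.h>
#include <xmmintrin.h>  /* SSE intrinsics */

int main(void)
{
    float a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8}, c[4] = {9, 10, 11, 12};
    float out[4];

    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_loadu_ps(c);

    /* 4 multiplications and 4 additions expressed as two 128-bit ops;
     * a core with separate mul and add pipes can overlap them. */
    __m128 vr = _mm_add_ps(_mm_mul_ps(va, vb), vc);

    _mm_storeu_ps(out, vr);
    printf("%.0f %.0f %.0f %.0f\n", out[0], out[1], out[2], out[3]);
    return 0;
}
```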
 
http://24.media.tumblr.com/445e9e3e26ccdf8ce9380658f044e3fa/tumblr_mky23oiTiz1rblqp8o1_250.gif

What exactly is the purpose of posting something like this in here?


Well, anyway, I wanted to get back to the graphical functionality that the CPU is capable of. I was rereading some of the older Shin'en interviews and this caught my attention.

For instance, FAST – Racing League on Wii had problems to maintain solid 60fps when having two players splitscreen. For that case we added a CPU based Occlusion culling system. Since then every game we do can use that system, no matter if it’s on 3DS or Wii U. As the complete engine is powered by an own designed scripting language there are no boundaries. New code and modules are exposed to the script and then they can be freely used.

Also could someone explain the bolded here?

They put a lot of thought on how CPU, GPU, caches and memory controllers work together to amplify your code speed. For instance, with only some tiny changes we were able to optimize certain heavy load parts of the rendering pipeline to 6x of the original speed,

This makes me think of what Fourth Storm was saying above. Might there be better performance than what the raw numbers are suggesting? People are factoring out the technical efficiency of the PowerPC series when doing these comparisons.

We need to explore this increase in registers a little bit more. In general, what types of performance gain can this bring?
 
Occlusion culling is determining what's visible on screen and what's not, so things that are hidden/occluded can be prevented from rendering, to save resources. So not direct graphics work per se, but something the CPU can do to ensure that GPU resources aren't being wasted.
 
It's a very commonly used technique in modern games. Culling on the SPUs was actually really important on the PS3 because its GPU was so triangle-setup limited compared to the 360's. Hell, back when 3Dfx was on their last gasp, one of their last-ditch innovations, after falling behind to nVidia's introduction of hardware transform and lighting, was occlusion culling in software as an attempt to lessen that gap.
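To make the concept concrete, here's a bare-bones sketch of such a CPU-side culling pass (hypothetical types and stub tests of my own, not Shin'en's actual system):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical types for illustration only. */
typedef struct { float min[3], max[3]; } AABB;
typedef struct { AABB bounds; int mesh_id; } Object;

/* Stub tests; a real engine would rasterize big occluders into a small
 * software depth buffer and test bounding boxes against it. */
static bool aabb_in_frustum(const AABB *b) { return b->max[2] > 0.0f; }
static bool aabb_occluded(const AABB *b)   { (void)b; return false; }
static void submit_draw(int mesh_id)       { printf("draw %d\n", mesh_id); }

/* CPU culling pass: skip anything off-screen or hidden so the GPU
 * never rasterizes it. */
static void cull_and_draw(const Object *objs, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (!aabb_in_frustum(&objs[i].bounds)) continue; /* off screen */
        if (aabb_occluded(&objs[i].bounds))    continue; /* hidden     */
        submit_draw(objs[i].mesh_id);
    }
}

int main(void)
{
    Object scene[2] = {
        { {{0, 0, -1}, {1, 1, -2}}, 1 },  /* behind the camera: culled */
        { {{0, 0,  1}, {1, 1,  2}}, 2 },  /* in front: drawn           */
    };
    cull_and_draw(scene, 2);
    return 0;
}
```

The win is that submit_draw never fires for hidden objects, so the GPU's triangle setup and fill rate are spent only on visible geometry.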
 
I see. Now what about the 6x increase in speed from using the CPU and GPU in conjunction? Might this mean that the bus is not as limiting as previously thought?
 
At this point, it should be at least safe to say that Espresso is 10x the strength of Broadway given the multicore capability, increased cache and presumably increased registers.

I want to get more into the functional usage of the CPU. What can it realistically do and not do?

Taking what we've seen on the Wii/GC as a baseline, Espresso should be capable of anything you've seen x10.

Also, since the CPU does have the GC functionality, it should also be able to help with drawing polygons and producing texture effects. I feel that there is some hidden advantage here that we are missing, especially going back to the Shin'en comment about getting a huge performance boost from using the GPU and CPU in conjunction.

We have a CPU that can help with graphics and a GPU that can help with general purpose code. Time to start hypothesizing.
 
It just occurred to me: we've had enough reasonable information by now, I think, that the launch games weren't using all of the CPU cores.

We use the eDRAM in the Wii U for the actual framebuffers, intermediate framebuffer captures, as a fast scratch memory for some CPU intense work and for other GPU memory writes.

In fact, I think in general most of them were mainly using only core one. So, with that being said, that would mean the Wii U was able to handle those CPU-heavy ports from the PS3 and 360 the way it did with little more than the main core. Would that not mean that Espresso, when utilized properly, is considerably more capable than the PS3/360 CPUs in games?
 
I thought everyone paying attention had already kinda figured that was the case, considering it wasn't until after those had released that it came to light that the system doesn't automatically send tasks to the other cores. It seems to me they'd have to rewrite way too much for the resources and time allocated to the ports to have done much with the other cores.
 
This is the point I was getting at. The Wii U CPU was keeping up with the PS3/360 in ports relatively well with only 1 core being fully utilized. Would this not mean that the CPU gets superior performance more often than not if all cores are used?
 
Makes sense to me, but I'm not one of the tech gurus... or a dev that worked on any of the launch ports to know if they used the cores at all or not.
 
Just had a crazy dumb idea: what if Pikmin 3 used only one core and change? Or is it too unrealistic?

The game was a Wii game for most of its life. It was made to run on the Wii's CPU and GPU. Honestly, it doesn't look like they did much to improve it over how it looked on the Wii other than making it 720p, when you see the graphics in games like Nintendo Land.

Just look at how muddy and undetailed the textures are in comparison.

I wouldn't be surprised if it wasn't even using all of the main core and was only using as many resources as it did on the Wii. Pikmin 3 should have looked so much better, looking at the power of the Wii U in the games that have been built from the ground up for the console.

Going by all of the aliasing it had, I'm certain that it was using upscaled textures and not true 720p textures. That is also probably why it wasn't 1080p. Upscaling 480p textures to 1080p doesn't look all that great.
 
I think you are speculating too much ...


I agree with lherre, and I'm going to go with him since he knows a lot more on the subject than we do.


Lherre, I recall you having insider info, yes? Does your knowledge base include any of the launch games and whether or not they made effective use of all 3 cores?
 
Speculation is the primary point of these threads. It's the best way to unfold results.

Right now, I'm only following leads to determine the potential of the CPU.
 
Ok, I'm going to step in here and say you haven't seen Pikmin 3 in the flesh. Yes, the floor textures are one of the telling signs it's from a Wii build (although they look great from the POV you play), but to say they made no upgrades is hilarious.

[Pikmin 3 screenshots]
 
We aren't speculating nearly enough : )

Indeed. This thread is horribly underused. There is more talk about the CPU in the GPU thread than in this one.

I want to know the truth beyond what the more inexperienced and/or incompetent devs stated, like the one that called the CPU horrible and slow without ever actually writing a single line of code for it.
 
Ok, I'm going to step in here and say you haven't seen Pikmin 3 in the flesh. Yes, the floor textures are one of the telling signs it's from a Wii build (although they look great from the POV you play), but to say they made no upgrades is hilarious.


I wasn't saying the entirety of Pikmin 3 still looked like it did on the Wii, but it's clear in the case of this game and New Super Mario Bros. U that they didn't recreate most of the assets.

As for the processing aspects of the game, which are more relevant to the CPU, it also doesn't seem like it's doing that much. Pikmin 1 and 2 for the GameCube both gave you 100 Pikmin to control using Gekko, with all else that was going on. This one seems to be operating on the same subsystem.

Correct me if I'm wrong, but the game is still limited to only 100 Pikmin on land at a time, right? Pikmin 3 doesn't seem to be a great example of the CPU's capabilities.
 

Yeah, but I'm not sure it's a CPU problem. The maps seem too small for more than 100 Pikmin. I just don't see the use of, let's say, 300 Pikmin running around. The game would have to be much larger.
 

Another advantage that the PPC750 tech has over the substantially more ancient x86 tech, I believe. You honestly can't match the performance of these CPUs per clock.

The PPC tech seems to be almost devoid of bottlenecks on top of being able to execute two instructions per core. These things never pop up in the conversation when you see people posting things like "PS4/XOne CPU 8-core 1.6 GHz, Wii U CPU 3-core 1.24 GHz (assuming they don't knock off the 4 like usual). PS4 CPU 14 billion times stronger". The most important thing in any modern CPU is the tech.

Just like a 2-core 2.4 GHz Pentium D is outperformed by a 2-core 2.4 GHz AMD Athlon,
A 2-core 2.4 GHz AMD Athlon is outperformed by a 2.4 GHz Core 2 Duo,
A 2.4 GHz Core 2 Duo is outperformed by a 2-core 2.4 GHz Core i3,
A 2.4 GHz Core i3 is outperformed by a 2.4 GHz Core i5 gen 1,
A 2.4 GHz Core i5 gen 1 is outperformed by a 2.4 GHz Core i5 gen 2,
A 2.4 GHz Core i5 gen 2 is outperformed by a 2.4 GHz Core i5 gen 3.

Clock speeds mean next to nothing for performance anymore. All they tell is how much energy it takes for your CPU to do what it does, not what it can do.
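The usual shorthand here is throughput ≈ cores × clock × IPC (instructions per cycle), and IPC is exactly where architectures differ. A toy comparison with invented IPC figures:

```c
#include <stdio.h>

/* Toy comparison: same core count and clock, different per-clock
 * efficiency. The IPC numbers are invented, not benchmarks. */
int main(void)
{
    struct { const char *name; int cores; double ghz, ipc; } cpus[] = {
        { "older chip", 2, 2.4, 1.0 },
        { "newer chip", 2, 2.4, 2.5 },  /* same clock, better core */
    };
    for (int i = 0; i < 2; ++i)
        printf("%s: ~%.1f billion instructions/s\n", cpus[i].name,
               cpus[i].cores * cpus[i].ghz * cpus[i].ipc);
    return 0;
}
```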
 

360 was also PPC...
 

I was unaware that I scaled it back to just PPC in general and didn't specify PPC750.

Though, actually, I think you are incorrect. The 360 CPU is using a "Power" CPU, not "PowerPC". It's a Power5/5a derivative if I'm not mistaken.

Espresso is a PPC 750CL-"based" CPU if my memory serves me correctly, and it has some Power7 tech in it (I believe the cache and maybe a few other things).
 
Pikmin 3 only has 100 Pikmin on the field at any given time, but you also have to take into account that the AI for each Pikmin has improved by a huge margin. You also have more than one protagonist that you can give orders to and switch back and forth between. That can be CPU intensive, right?

As a side-note the graphics are also amazing in the game <3
 
I was unaware that I scaled it back to just PPC in general and didn't specify PPC750.

Though, actually, I think you are incorrect. The 360 CPU is using a "Power" CPU, not "PowerPC". It's a Power5/5a derivative if I'm not mistaken.

Espresso is a PPC 750CL-"based" CPU if my memory serves me correctly, and it has some Power7 tech in it (I believe the cache and maybe a few other things).

IBM, on the other hand, had started a chip engineering services business and was perfectly willing to customize a PowerPC design for Microsoft, says Jim Comfort, an IBM vice president. At first IBM didn't believe that Microsoft wanted to work together, given a history of rancor dating back to the DOS and OS/2 operating systems in the 1980s. Moreover, IBM was working for Microsoft rivals Sony and Nintendo. But Microsoft pressed IBM for its views on multicore chips and discovered that Big Blue was ahead of Intel in thinking about these kinds of designs.
http://www.designnews.com/document.asp?doc_id=224354&dfpPParams=ind_182,aid_224354&dfpLayout=article

The CPU was designed uniquely for Microsoft and for use in the Xbox 360 using the system architecture specifically defined around customer requirements. Microsoft and IBM engineers worked together during the definition phase of the project to specify a design to satisfy the constraints of a mass-produced consumer device. We used existing PowerPC processor and subsystem technology and designs as a foundation to jump-start the development. The chip was developed by the IBM Engineering & Technology Services group, leveraging results from the IBM R&D labs

http://www.ibm.com/developerworks/power/library/pa-fpfxbox/

Xenon is based on PPC; they just went a different direction than they did with Nintendo's chips.
My point, though, was that you used a tweet talking about the differences between two PPC variations to make a statement about x86. There's no logical connection between your claim and what that tweet says...
 
I was unaware that I scaled it back to just PPC in general and didn't specify PPC750.

Though, actually, I think you are incorrect. The 360 CPU is using a "Power" CPU, not "PowerPC". It's a Power5/5a derivative if I'm not mistaken.

PowerPC is just consumer branding for IBM's POWER ISA.
 
The difference is that x86 has had waaaaaaaaaaaayyyyyyy more evolution in the consumer market.

You're right that PPC was clock-for-clock more efficient. But that was before Core happened.
 

Not to mention that the gap has been significantly reduced since Sandy Bridge, Ivy Bridge, and now Haswell.

The Core (Core iX) series was the best thing to happen to the x86 arch.

It made Apple (a company that swore by IBM's PowerPC arch) switch to Intel for the first time ever 7 years ago. A company that came up with advertisements rightfully deriding Intel's x86 arch with the "MHz myth" and touting the superiority of their G4/G5 CPUs in their Macs with Photoshop demos, etc.

Nintendo is probably the last man standing when it comes down to it.

Everyone has moved/is moving away from PowerPC in consumer products in the upcoming generation, onto x86 or, hell, even ARM for that matter.
 

The fact that it took all of those iterations and revisions to come close to the efficiency of the PPC architecture is telling of x86's inefficiency. I'm sure if you patch, staple on, and trim the edges of anything enough, it will become better.

Though that just makes me wonder how much more efficient the PPC tech would be with the same number of revisions and iterations. Aren't they up to the Power7 now? I remember hearing that those defecated on the Cell, and Espresso does have a small amount of tech from the Power7 series. I also recall someone saying that Espresso is 6-stage as opposed to 4-stage like the PPC 750CL. What was added with the 2 extra stages?
 
PPC in the desktop space lived on beyond the 750, the G4 and G5 chips by Apple's parlance. It got more powerful but less efficient. After the G5 IBM wouldn't or couldn't make a good enough mobile version for Apple, hence the aforementioned Intel switch, which was again, seven years ago. That was with the original Core series, which was based on the Pentium M, originally from 2003 (which itself was apparently based on P6/Pentium Pro micro architecture, originally from 1995).

As for extra stages, it's usually to be able to clock higher afaik. I think by itself it inherently makes it less efficient from a theoretical standpoint, but the increased performance from the higher clock cancels that out, along with the other features like improved branch prediction and other fancy stuff that usually comes with newer CPUs that improves efficiency (hell if we know if Espresso got any major changes in that regard though).
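The clock side of that trade-off is easy to sketch: cycle time is roughly the slowest stage's logic delay plus latch overhead, so splitting the same logic into more stages raises the attainable clock. A toy calculation with made-up delays:

```c
#include <stdio.h>

/* Why more stages allow higher clocks: cycle time = per-stage logic
 * delay + latch overhead. The delays below are invented for
 * illustration only. */
int main(void)
{
    double t_logic = 6.0;   /* ns of total combinational logic */
    double t_latch = 0.1;   /* ns of latch overhead per stage  */
    for (int depth = 4; depth <= 24; depth += 10) {
        double cycle_ns = t_logic / depth + t_latch;
        printf("%2d stages -> %.2f GHz max\n", depth, 1.0 / cycle_ns);
    }
    return 0;
}
```

The deeper pipeline clocks higher but pays more per mispredicted branch, which is why the net efficiency question is murkier than either raw number.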
 

Well, there is the known enhanced cache that uses Power7 memory, and there was a comment from a person who counted more registers than what were assumed to be present going by it being based on Broadway.

Of course, finding all of that out is the point of this thread. I just wish there was more buzz in here.
 
The fact that it took all of those iterations and revisions to come close to the efficiency of the PPC architecture is telling of x86's inefficiency


PPC in the desktop space lived on beyond the 750, the G4 and G5 chips by Apple's parlance. It got more powerful but less efficient. After the G5 IBM wouldn't or couldn't make a good enough mobile version for Apple, hence the aforementioned Intel switch, which was again, seven years ago. That was with the original Core series, which was based on the Pentium M, originally from 2003 (which itself was apparently based on P6/Pentium Pro micro architecture, originally from 1995)
What's wacky is krizzx seems to think PPC is so efficient, whereas Apple, who actually designed hardware based on the architecture, saw a dead end (despite marketing PPC as all-powerful compared to x86 prior to the switch). As soon as Apple did make the switch to x86, the performance was far greater in most areas, and at lower power consumption. Without the switch to x86, all the current Apple products wouldn't be possible (which are pretty much all laptop mobo designs, to fit in the slim chassis of Mac Minis, iMacs, etc.).
http://www.anandtech.com/show/1702/5
http://arstechnica.com/apple/2006/08/macpro/7/
 

You are mistaken; it was a switch to a much more modern and revised x86.

Espresso and the Power7 both prove that they didn't reach a "dead end" at all. They reached a point where Intel CPUs had gained the upper hand for cost vs. performance in hardware. x86 simply progressed at a faster rate, as there are "two" major x86 manufacturers competing with each other for dominance to push its growth along.

Every time a new iteration of Power CPUs comes out, it is usually better than all x86 CPUs at the time. When the switch was made, it was simply not one of those times.
 
So, how did we determine that 1.24 GHz is the clock rate?

It's a slow day, and I am curious about what the rumored/speculated specs are, so if someone could tell me, I would be very appreciative.
 
What's wacky is krizzx seems to think PPC is so efficient, <snip>
PPC is an ISA (and it's pretty darn good as such). I suspect you're referring to actual CPU devices. If so, then indeed the G3 is an extremely efficient CPU design lineup. It does not scale well with clock, but that's beside the point (the P4 scaled very well with clock, and look where it is now).
 

The Gekko CPU was very efficient compared to its rivals, de-cached P3s (Celerons?), but this was a long time ago; that generation, while it's been updated by Nintendo over the years, likely hasn't seen the gains that x86 has.

Apple could have kept using the G3 CPU in its mobile parts; it made for stellar PowerBooks in its day, but really, there was only so far it seemed it could go on battery power, or in the lower-power applications that Apple wanted to pursue. x86 IS more efficient if we're talking performance per watt. Apple made a good move.

If, however, there had been competition and the same investment in low-power PPC processors, it could have been another story, but clearly IBM was looking at heavy-metal server CPUs and not investing in the low-power stuff.
 

This. As I've mentioned earlier, it's been a long, long time since IBM cared about the consumer market. Low power? How about practically never.

Plus, x86 is evidently more developer friendly, if Sony choosing it at the behest of developer feedback is anything to go off of.

Kind of curious: what's the best way to give Nintendo feedback?
 