Wii U CPU |Espresso| Die Photo - Courtesy of Chipworks

What would be the probability of Espresso hitting around the 1.5-1.6 GHz mark?
It may be possible, but it is probably already the fastest CPU with a 4-stage pipeline to date at 1.24 GHz. Its architecture may not be able to be pushed to a clock speed much beyond what it is now, aside from another die shrink.
 
It may be possible, but it is probably already the fastest CPU with a 4-stage pipeline to date at 1.24 GHz. Its architecture may not be able to be pushed to a clock speed much beyond what it is now, aside from another die shrink.

If the 750CL can do 1.1, I see no reason why Espresso couldn't do well over 1.24. Whether the Wii U is set up in such a way that such a clock bump is possible, who knows, but I don't think the chip is the problem.
 
If the 750CL can do 1.1, I see no reason why Espresso couldn't do well over 1.24. Whether the Wii U is set up in such a way that such a clock bump is possible, who knows, but I don't think the chip is the problem.

They clocked the Wii's 750CL to only 729 MHz because they wanted an easy-to-cool, durable console and wanted to make sure that preferably every CPU produced reaches that clock rate (as there is always variance in the quality of chips). The Wii U follows the same principle.
If they don't have a sudden change of mind in design philosophy and don't want to risk a notable increase in warranty cases, it's very unlikely that clock rates will change via an update.
 

[image: evIFWWH.jpg]

That's why...
 
It runs at 729 MHz in Wii mode... what they're saying is that it's impossible to go higher than that, or than 1.24 GHz, even in Wii U mode.

About the A9 being better than the Espresso, I highly doubt it except in some synthetic tests. In theoretical integer performance it may be true, and in some small tests it can also be true, but what sets the Espresso apart from the A9 is its huge caches; considering that both the Ouya and mobile phones use slow RAM, that 2 MB + 0.5 MB + 0.5 MB L2 cache configuration will make a huge difference in real-world performance.
Those Cortex-A9s don't even have on-die L2 cache, and the L2 cache is nearly 1/2 of the Espresso die. That is a HUGE difference that the A9 can in no way compensate for. Even in SIMD code I doubt that the A9 could compete against the Espresso in real-world situations, where you always tend to have much more data than can be stored in the L1 caches.

I'm pretty sure Tegra 3 does have on-die L2 cache.
 
Soooo, to simplify what I'm reading: it seems the consensus is that marginal improvements to both the clock speed of the CPU and the RAM allocation are possible, but Nintendo actually enabling or allowing devs to use them is not plausible? Is this accurate?
 
Is it possible that IBM increased the number of rename registers for this thing? There appear to be three floating point register files, but a regular 750 has 38 physical floating point registers. Looks like they increased the number of physical registers to 48? The integer registers make even less sense to me - should also be 38, but there are four register files. 64 integer registers?
 
Soooo, to simplify what I'm reading: it seems the consensus is that marginal improvements to both the clock speed of the CPU and the RAM allocation are possible, but Nintendo actually enabling or allowing devs to use them is not plausible? Is this accurate?

I don't know if that's the consensus, but the reality is that Nintendo already set the clock speeds for the WiiU and they have been shipping hardware for 6 months based on validation at those clock speeds, so there's no reason anyone should expect any change to miraculously appear. Any performance benefits that actually manifest (and are not simply the result of a placebo effect) are solely improvements in software, whether that's the OS and drivers, or compilers on the development end.
 
Is it possible that IBM increased the number of rename registers for this thing? There appear to be three floating point register files, but a regular 750 has 38 physical floating point registers. Looks like they increased the number of physical registers to 48? The integer registers make even less sense to me - should also be 38, but there are four register files. 64 integer registers?

I'll pretend I vaguely follow you, but really? And what could that mean for performance?
 
Is it possible that IBM increased the number of rename registers for this thing? There appear to be three floating point register files, but a regular 750 has 38 physical floating point registers. Looks like they increased the number of physical registers to 48? The integer registers make even less sense to me - should also be 38, but there are four register files. 64 integer registers?

Ah, now this is what I like to see. Progressive information.

What would this mean performance wise?
 
I'll pretend I vaguely follow you, but really? And what could that mean for performance?
I'll just quote Wikipedia:

Consider this piece of code running on an out-of-order CPU:

1. R1=M[1024]
2. R1=R1+2
3. M[1032]=R1
4. R1=M[2048]
5. R1=R1+4
6. M[2056]=R1

Instructions 4, 5, and 6 are independent of instructions 1, 2, and 3, but the processor cannot finish 4 until 3 is done, because 3 would then write the wrong value.

We can eliminate this restriction by changing the names of some of the registers:

1. R1=M[1024] | 4. R2=M[2048]
2. R1=R1+2 | 5. R2=R2+4
3. M[1032]=R1 | 6. M[2056]=R2

Now instructions 4, 5, and 6 can be executed in parallel with instructions 1, 2, and 3, so that the program can be executed faster.

When possible, the compiler would detect the distinct instructions and try to assign them to a different register. However, there is a finite number of register names that can be used in the assembly code. Many high performance CPUs have more physical registers than may be named directly in the instruction set, so they rename registers in hardware to achieve additional parallelism.
Essentially, adding more registers increases the efficiency, and using rename registers means the ISA stays the same. All PowerPC cores expose 32 general purpose registers and 32 floating point registers as defined by the PPC ISA, yet a normal PPC750 has 38 each, an MPC7440 also has 38, an MPC7450 has 48 and a PPC970 has 80.
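To make the renaming idea above concrete, here is a toy Python sketch. The instruction format and register names are invented for illustration (real renamers operate on decoded instructions in hardware), but it applies the same rule as the Wikipedia example: every write to an architectural register gets a fresh physical register, and reads use the most recent mapping.

```python
# Toy register renamer. Instructions are (dest, [sources]) pairs;
# a dest of None marks a store (no register written).
def rename(trace):
    mapping = {}   # architectural register name -> current physical name
    next_free = 0  # next unused physical register index
    out = []
    for dest, srcs in trace:
        # Reads use the current mapping (unmapped names pass through)
        new_srcs = [mapping.get(s, s) for s in srcs]
        if dest is not None:
            # Each write allocates a fresh physical register
            phys = f"P{next_free}"
            next_free += 1
            mapping[dest] = phys
            dest = phys
        out.append((dest, new_srcs))
    return out

# The six-instruction example from the quote:
trace = [
    ("R1", ["M[1024]"]),   # 1. R1 = M[1024]
    ("R1", ["R1", "2"]),   # 2. R1 = R1 + 2
    (None, ["R1"]),        # 3. M[1032] = R1
    ("R1", ["M[2048]"]),   # 4. R1 = M[2048]
    ("R1", ["R1", "4"]),   # 5. R1 = R1 + 4
    (None, ["R1"]),        # 6. M[2056] = R1
]

for dest, srcs in rename(trace):
    print(dest, "<-", srcs)
```

After renaming, instructions 1-3 use only P0/P1 and instructions 4-6 only P2/P3, so the two chains share no registers and a scheduler is free to overlap them.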
 
Such an alteration would indicate the chip has been tweaked quite a bit. Indeed, it is possible some pieces of the design have been influenced by the Power 7 chips. Maybe the earlier statement by IBM of it being based on Watson may have been partly true. The handling of its multicore nature could be derived from it, for example. The reality could be a mishmash between earlier and later designs. The design phase may have been a case of trying to get the best performance per watt while maintaining backwards compatibility, so they take some stuff from Power 7 (OoOE, for example, as opposed to in-order from Power 6), marry it with the older PPC7, and Bob's your uncle.
 
I'll just quote Wikipedia:


Essentially, adding more registers increases the efficiency, and using rename registers means the ISA stays the same. All PowerPC cores expose 32 general purpose registers and 32 floating point registers as defined by the PPC ISA, yet a normal PPC750 has 38 each, an MPC7440 also has 38, an MPC7450 has 48 and a PPC970 has 80.

So we could be looking at a hefty increase in efficiency (and thus CPU power) over what we believed?

And thinking more, could the almost complete lack of specs and documentation from Nintendo pre-launch mean that these early games we've seen were coded without taking such improvements into account?
 
One rather curious thing is that IBM's Power 6 and Power 7 lines had a low-power variant named Express. I wonder....
Having said that, AMD had Radeons codenamed Mario and Luigi.
 
So we could be looking at a hefty increase in efficiency (and thus CPU power) over what we believed?
I don't think it's a hefty increase, but there are some things IBM might have done to make the whole thing more efficient. Internal stuff that isn't even visible to developers.
 
Such an alteration would indicate the chip has been tweaked quite a bit. Indeed, it is possible some pieces of the design have been influenced by the Power 7 chips. Maybe the earlier statement by IBM of it being based on Watson may have been partly true. The handling of its multicore nature could be derived from it, for example. The reality could be a mishmash between earlier and later designs. The design phase may have been a case of trying to get the best performance per watt while maintaining backwards compatibility, so they take some stuff from Power 7 (OoOE, for example, as opposed to in-order from Power 6), marry it with the older PPC7, and Bob's your uncle.
Even Gekko was already OooE, and so was Power 6. I think the PPE aka Xenon cores are the only commercialized designs in the entire PPC family that weren't.

e: Wikipedia names PPC604 (December 1994) as the first confirmed OooE-design.
 
Even Gekko was already OooE, and so was Power 6. I think the PPE aka Xenon cores are the only commercialized designs in the entire PPC family that weren't.

e: Wikipedia names PPC604 (December 1994) as the first confirmed OooE-design.

The fairly recent PowerPC A2 is in-order too.
 
Just checked, and Power 6 was in-order, while Power 5 was out-of-order and so was Power 7, so it's rather odd that they took a step back with Power 6.
 
That's why...

A small bump to 1.5 GHz wouldn't melt the Wii U. If he'd said 3.2 GHz, yeah, you'd be fair in posting that image, but a small 0.26-0.36 GHz bump isn't going to produce that much more heat.

I'm not saying it happened or that it is or isn't feasible, but saying it's not possible at that small a bump because the system will overheat and melt is silly.
 
Just checked, and Power 6 was in-order, while Power 5 was out-of-order and so was Power 7, so it's rather odd that they took a step back with Power 6.
Yes, I just did my due diligence too, and I'm pretty shocked they dropped that for a single architecture version.
Arstechnica said:
IBM has confirmed that POWER6 has the same pipeline depth and roughly the same execution unit configuration as the POWER5. However, there are serious questions that remain to be answered about the new processor's out-of-order execution capabilities, as nothing in the presentation I attended today touched on the POWER6's instruction window. The only mention of the instruction window was IBM's statement that POWER6 has an out-of-order FPU. This fact, along with some of the other details listed below (especially dispatch bandwidth) suggests that POWER6 has little to do with POWER4/5/970, and is indeed an entirely new design.
http://arstechnica.com/uncategorized/2007/02/8823/


So let me move my goal posts and narrow down to the following:
Gamecube and Wii used an OOOE CPU. Finding an OOOE CPU in the Wii U should not be a big revelation at this point, and in no way indicates significant change in architecture.
(increased physical features OTOH do)
 
Any difference in fan speed or heat output from the console would simply be because of fan speed tweaks, or because Nintendo is utilizing the CPU more effectively to speed up the OS.

The most realistic scenario of what the 3.0 update did and what the Summer update will continue to do is Nintendo is shifting the operating system from a more single-threaded OS to something that's using all three of the CPU's cores, using more of the CPU resources reserved for the OS, and therefore potentially outputting more heat.
 
Yes, I just did my due diligence too, and I'm pretty shocked they dropped that for a single architecture version.

http://arstechnica.com/uncategorized/2007/02/8823/


So let me move my goal posts and narrow down to the following:
Gamecube and Wii used an OOOE CPU. Finding an OOOE CPU in the Wii U should not be a big revelation at this point, and in no way indicates significant change in architecture.
(increased physical features OTOH do)

We have already been over this earlier in the thread. Espresso appears to have much more capable OOOE functionality than Broadway and Gekko.

Also, taking wsippel's comment about the register increase into account, I think it's safe to say that this CPU is far more than 3 Broadways sandwiched together on the same die. The clock alone should have made that apparent to most.
 
Is it possible that IBM increased the number of rename registers for this thing? There appear to be three floating point register files, but a regular 750 has 38 physical floating point registers. Looks like they increased the number of physical registers to 48? The integer registers make even less sense to me - should also be 38, but there are four register files. 64 integer registers?

Very interesting. Where are you seeing these additional registers?

We have already been over this earlier in the thread. Espresso appears to have much more capable OOOE functionality than Broadway and Gekko.

Mind pointing out where this was concluded? I'll update the OP if it's true.
 
Every confusion would've been cleared up if Nintendo said something about this. Stupid Nintendo.

Then again, there'd be no more fun speculating.

Which way to go? Exact information, or countless hilarious/meaningless speculation?
 
Very interesting. Where are you seeing these additional registers?



Mind pointing out where this was concluded? I'll update the OP if it's true.

Wasn't that found earlier when it was being compared to other PPC750 CPUs?

Every confusion would've been cleared up if Nintendo said something about this. Stupid Nintendo.

Then again, there'd be no more fun speculating.

Which way to go? Exact information, or countless hilarious/meaningless speculation?

Nintendo has nothing to gain from revealing its hardware details publicly and much to lose, as it would allow the competition an edge by letting them know what they would need to include in order to match/counter various advantages in the hardware. There is also the possibility that Nintendo doesn't fully know exactly what it is capable of, given that they did not make the chip themselves. That is actually likely, given the reports that have been popping up regarding support.
 
Just look at Marcan's annotated die shot; the registers are below the L2 tags. There are four GPR register files and three FPR register files, and 38 is neither a multiple of three nor of four.

That is an intriguing possibility. And if I read that Wikipedia entry correctly, this would not necessarily be even detectable to developers, except that code would run faster? I wish we had a better shot of Broadway/Gekko in order to compare and detect if there really was a change made.
 
Nintendo has nothing to gain from revealing its hardware details publicly and much to lose

That much is true, but the real reason is that nobody would be impressed.
It's the same reason they didn't reveal the Wii specs.
By contrast, now we have people like you (déjà vu) who believe in some "special sauce".
 
Just checked, and Power 6 was in-order, while Power 5 was out-of-order and so was Power 7, so it's rather odd that they took a step back with Power 6.

Yeah, IBM went through a phase of chasing high clock speeds at the expense of architecture efficiency during the Power6 days, just like Intel with the Pentium 4. It's also reflected in the PS3 and 360 processors, no Out of Order processing but lots of cores and threads (for the time and even now) and high clock speeds.

Regarding this whole "maybe they'll increase the clock speed in the future" thing: they would have to make sure every Wii U already shipped is capable of running at the higher clock speed, or risk having essentially two different consoles for devs to target, and risk fragmentation. While it's true that we don't know that Nintendo didn't already validate the CPUs for higher clock speeds, what would they have to gain from making it lower from the start? This isn't a mobile device like the PSP where they want to conserve battery power; if it had been validated to clock higher, it would have been clocked higher. Plus, as mentioned, this is already a high clock for this short a pipeline, if not the highest we know about. I won't discount the possibility completely, but I'm putting the chances at Russell's-teapot level.
 
That is an intriguing possibility. And if I read that Wikipedia entry correctly, this would not necessarily be even detectable to developers, except that code would run faster? I wish we had a better shot of Broadway/Gekko in order to compare and detect if there really was a change made.
Yeah, I believe this should be completely transparent, as should some other potential changes to the microarchitecture. And it should be possible to disable such improvements if required to ensure perfect backwards compatibility - maybe that's part of what the new HID5 special purpose register does.
 
Yeah, I believe this should be completely transparent, as should some other potential changes to the microarchitecture. And it should be possible to disable such improvements if required to ensure perfect backwards compatibility - maybe that's part of what the new HID5 special purpose register does.
Very interesting. Do we know if Broadway was modified like that too? Since the 750CL is supposed to be the sellable version of Broadway, such a change should be present on that processor as well.
 
Very interesting. Do we know if Broadway was modified like that too? Since the 750CL is supposed to be the sellable version of Broadway, such a change should be present on that processor as well.
No idea. I know there have been some redesigns and optimizations, but those were probably related to the die shrink.
 
Such an alteration would indicate the chip has been tweaked quite a bit. Indeed, it is possible some pieces of the design have been influenced by the Power 7 chips. Maybe the earlier statement by IBM of it being based on Watson may have been partly true. The handling of its multicore nature could be derived from it, for example. The reality could be a mishmash between earlier and later designs. The design phase may have been a case of trying to get the best performance per watt while maintaining backwards compatibility, so they take some stuff from Power 7 (OoOE, for example, as opposed to in-order from Power 6), marry it with the older PPC7, and Bob's your uncle.

No, the PPC 7xx line has always had basic OoOE.

The fact that the Wii U CPU downclocks itself to the same speed as Broadway when running Wii software proves that it cannot be based on anything other than Broadway.
If it were a different core, it would not need to clock at the same speed as Broadway to run programs at around the same speed.
Also, console games are very sensitive to how fast each system component is (e.g. the XCGPU needs logic to emulate the latency of the FSB so software designed for the old versions of the 360 doesn't break); a CPU that can do some things at faster speeds will likely break a lot of software.

More registers might be fine though (if the registers are GPRs or FPRs and not something else) as long as they are turned off in Wii mode.
 
While I think it's safe, and even healthy, not to give too much credence to the possibility of a clock speed bump, one question we should keep in mind is why, for some reason that none of us have ever been able to pin down beyond blaming it all on the GamePad, the Wii U sells at a loss at $350. Either something deeper is going on than we know, or Nintendo are flaming idiots. Considering that none of us can make more than 60% sense of these chips, I keep an open mind.

As of now, my malleable opinion is that the Wii U is a console whose innards, and how they work together to produce the final on screen product were intentionally built in a way that not only allows it to 'punch far above its weight', but also to allow Nintendo flexibility in moving forward in ways that we can't really know yet.

Call it secret sauce, or whatever you want. But we should also keep it real. We aren't going to have 2 Ghz clock bumps. We aren't going to find some amazing, industry progressing secret within the secrets of these chips. But what we likely will find, are quite a few relatively little things that continue to feed into the idea that we've all seen become apparent. That this console is efficient beyond reason. And because of that, it will do a lot more than most people believe possible as of now. /my2cents
 
So where would this info place the performance estimates now?

The bare minimum estimate from earlier was 6x as powerful, assuming it was absolutely nothing more than 3 Broadways at a higher clock. Now, taking other data into the equation, it has a probable register count increase from 38 and 38 to 48 and 64. Then there is the possibility of it having better out-of-order functionality (I went back and checked and it wasn't absolutely confirmed, so I won't factor that in).

If the math on that was added to the baseline hypothesis (let me get my calculator), that would give Espresso a 12.764x increase over Broadway in performance.

Now, if it is a better out-of-order design, and taking into account that it has more cache, I would estimate roughly a 15x increase in overall performance. Take this with a grain of salt, of course.

Does anyone have the vital statistics for Broadway? I feel that there should be more that was upgraded/augmented than that.

Also, there is a question I asked a while back that never got properly answered. Which would be more capable, or produce better results:
1. A single core that takes 3 instructions and sends out 2 per cycle(with out of order process handling)
2. A single core that hyper threads(in order)
How much different would there be in real world performance between them?
 
1. A single core that takes 3 instructions and sends out 2 per cycle(with out of order process handling)
2. A single core that hyper threads(in order)
How much different would there be in real world performance between them?

In Intel's Silvermont, the next-generation Atom architecture, they are trading away the die space used for hyperthreading for out-of-order processing. So for that at least, clearly they thought the performance was worth it. But that can't possibly answer your question, as different cores benefit from hyperthreading to different degrees; the Pentium 4 only saw 20-30% boosts on very well-threaded programs, but the current Core i7s can practically double performance as threads scale. All this depends on the architecture, caches, as well as main memory bandwidth.

Also your question doesn't say anything about how many instructions per cycle the second theoretical core could do, hyperthreading does not tell us anything on that.
 
In Intel's Silvermont, the next-generation Atom architecture, they are trading away the die space used for hyperthreading for out-of-order processing. So for that at least, clearly they thought the performance was worth it. But that can't possibly answer your question, as different cores benefit from hyperthreading to different degrees; the Pentium 4 only saw 20-30% boosts on very well-threaded programs, but the current Core i7s can practically double performance as threads scale. All this depends on the architecture, caches, as well as main memory bandwidth.

Also your question doesn't say anything about how many instructions per cycle the second theoretical core could do, hyperthreading does not tell us anything on that.

This is news to me. I always thought that all of Intel's CPUs had OoO execution since the Pentium Pro. I guess they cheaped out with the Atom; no wonder it's so bloody slow, they didn't bother to put in an OoO unit and used hyperthreading instead.
 
This is news to me. I always thought that all of Intel's CPUs had OoO execution since the Pentium Pro. I guess they cheaped out with the Atom; no wonder it's so bloody slow, they didn't bother to put in an OoO unit and used hyperthreading instead.

Yeah, Atom was the exception. Good read:

http://techreport.com/review/24767/the-next-atom-intel-silvermont-architecture-revealed

Within the scope of these limitations, Silvermont's architects have reached for a much higher performance target, especially for individual threads. The big news here is the move from the original Atom's in-order execution scheme to out-of-order execution. Going out-of-order adds some complexity, but it allows for more efficient scheduling and execution of instructions. Most big, modern CPU cores employ OoO execution, and newer low-power cores like AMD's Jaguar, ARM's Cortex-A15, and Qualcomm's Krait do, as well. Silvermont is joining the party. Belli Kuttanna, Intel Fellow and Silvermont chief architect, tells us the new architecture will achieve lower instruction latencies and higher throughput than the prior generation.

Interestingly, Silvermont tracks and executes only a single thread per core, doing away with symmetric multithreading (SMT)—or Hyper-Threading, in Intel's lingo. SMT helped the prior generations of Atom achieve relatively strong performance for an in-order architecture, but the resource sharing between threads can reduce per-thread throughput. Kuttanna says SMT and out-of-order execution have a similar cost in terms of die area, so the switch from SMT to OoO was evidently a fairly straightforward tradeoff.
 
This is news to me. I always thought that all of Intel's CPUs had OoO execution since the Pentium Pro. I guess they cheaped out with the Atom; no wonder it's so bloody slow, they didn't bother to put in an OoO unit and used hyperthreading instead.

Atom was based on a very ancient core IIRC.

Moving up to quad cores, among other architectural improvements in addition to OoO, should do wonders.
 
In Intel's Silvermont, the next-generation Atom architecture, they are trading away the die space used for hyperthreading for out-of-order processing. So for that at least, clearly they thought the performance was worth it. But that can't possibly answer your question, as different cores benefit from hyperthreading to different degrees; the Pentium 4 only saw 20-30% boosts on very well-threaded programs, but the current Core i7s can practically double performance as threads scale. All this depends on the architecture, caches, as well as main memory bandwidth.

Also your question doesn't say anything about how many instructions per cycle the second theoretical core could do, hyperthreading does not tell us anything on that.

I'm only talking about single cores. I'm trying to keep it as simple and specific as possible. Then I intend to build from there.

From what I can tell, two full processes should give better performance than hyperthreaded processes, hertz for hertz, but I wanted to be sure.

What areas would an older PowerPC processor exceed a more modern Power processor in?
 
I'm only talking about single cores. I'm trying to keep it as simple and specific as possible. Then I intend to build from there.

From what I can tell, two full processes should give better performance than hyperthreaded processes, hertz for hertz, but I wanted to be sure.

I know you were only talking about single cores, but you mentioned
3 instructions and sends out 2 per cycle
for the first core while giving no information on the second; hyperthreading tells us nothing about that. So you seem to be talking about two different things in your two different posts. The first was about instructions per cycle on a single core vs another core with hyperthreading; in the second you brought up a second core vs hyperthreading, so I'm not sure which you were asking, but I guess I answered both by now.

And yes, of course an equivalent physical core will always beat a hyperthreaded virtual thread.


EDIT: Ah ok, I misread, I see what you mean now. Yeah, I don't think they are directly comparable. Hyperthreading is independent of how many instructions per cycle the core can process; it helps fill up empty issue slots. More instructions per cycle would allow a faster core if they were all used well, but there may be bubbles in the pipeline which cause a slowdown, which is what hyperthreading tries to address. They aren't competing things.
 
I know you were only talking about single cores, but you mentioned "3 instructions and sends out 2 per cycle" for the first core while giving no information on the second; hyperthreading tells us nothing about that. So you seem to be talking about two different things in your two different posts. The first was about instructions per cycle on a single core vs another core with hyperthreading; in the second you brought up a second core vs hyperthreading, so I'm not sure which you were asking, but I guess I answered both by now.

And yes, of course an equivalent physical core will always beat a hyperthreaded virtual thread.


EDIT: Ah ok, I misread, I see what you mean now. Yeah, I don't think they are directly comparable. Hyperthreading is independent of how many instructions per cycle the core can process; it helps fill up empty issue slots. More instructions per cycle would allow a faster core if they were all used well, but there may be bubbles in the pipeline which cause a slowdown, which is what hyperthreading tries to address. They aren't competing things.

Wouldn't the out of order capabilities take care of bubbles in the pipeline?
 
Wouldn't the out of order capabilities take care of bubbles in the pipeline?

Out of Order would still be working on a single thread. It doesn't load up information from another thread in order to quickly switch over whenever one thread stalls. It helps by rearranging tasks within one thread.

And for your earlier question, a processor being able to fetch three instructions per cycle isn't the same as having separate processes running at once. That model is still working on one thread. So asking if more instructions per cycle or hyperthreading is better is kind of apples to oranges.

In any case I'm not sure what a lengthy discussion of this is for; the Wii U CPU is 3 cores, 3 threads (and if I'm not mistaken the Wii U CPU cores, aka the 750, fetch up to four instructions per cycle into a six-entry instruction queue, and dispatch up to two non-branch instructions per cycle from the IQ's two bottom entries).
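The front end just described can be sketched as a toy cycle-counting model: fetch up to four instructions per cycle into a six-entry queue, dispatch up to two per cycle from the bottom. The widths and queue depth come from the post; the model itself is a simplification that ignores branches, stalls, and execution latency.

```python
from collections import deque

def simulate(num_instructions, fetch_width=4, iq_size=6, dispatch_width=2):
    """Count cycles to push num_instructions through a fetch/dispatch front end."""
    iq = deque()
    fetched = dispatched = cycles = 0
    while dispatched < num_instructions:
        cycles += 1
        # Fetch stage: fill the instruction queue, up to fetch_width per cycle
        room = iq_size - len(iq)
        grab = min(fetch_width, room, num_instructions - fetched)
        for _ in range(grab):
            iq.append(fetched)
            fetched += 1
        # Dispatch stage: issue from the bottom entries of the queue
        for _ in range(min(dispatch_width, len(iq))):
            iq.popleft()
            dispatched += 1
    return cycles

print(simulate(100))  # → 50: dispatch width, not fetch width, is the bottleneck
```

In steady state the queue stays full and the core sustains two instructions per cycle, which is why the dispatch width matters more than the fetch width here.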
 
Out of Order would still be working on a single thread. It doesn't load up information from another thread in order to quickly switch over whenever one thread stalls. It helps by rearranging tasks within one thread.

And for your earlier question, a processor being able to fetch three instructions per cycle isn't the same as having separate processes running at once. That model is still working on one thread. So asking if more instructions per cycle or hyperthreading is better is kind of apples to oranges.

In any case I'm not sure what a lengthy discussion of this is for, the Wii U CPU is 3 cores 3 threads (and if I'm not mistaken the Wii U CPU cores aka the 750 fetch up to four instructions per cycle into its six-entry instruction queue, and it dispatches up to two non-branch instructions per cycle from the IQ's two bottom entries. ).

Ah, thanks. I was referring to the basic PPC750, but it's interesting to know that it fetches 4 for the Wii U CPU.

From what I've analyzed so far, Espresso should in fact be able to produce marginally better results than the Xenon in a lot of areas, especially when you take things like the DSP and the extra ARM coprocessors into account. I guess the reason for the performance issues is that most game code and game engines are optimized for the strengths of the Power6-generation processors.
 
Ah, thanks. I was referring to the basic PPC750, but it's interesting to know that it fetches 4 for the Wii U CPU.

It's 4 for the basic PPC 750, and if we're believing Marcan and other Wii U hackers the Wii U CPU is unchanged from the 750 front end (fetchers etc) at least (and I'm not discounting other internal changes to it, but the instruction fetch appears unchanged).

http://arstechnica.com/features/2004/10/ppc-2/

That whole series is a great read for uber geeks like me and everyone in this thread :)
They also have a follow up article on the newer PowerPC 970, aka the G5 in the Power Macs.
 
It's 4 for the basic PPC 750, and if we're believing Marcan and other Wii U hackers the Wii U CPU is unchanged from the 750 front end (fetchers etc) at least (and I'm not discounting other internal changes to it, but the instruction fetch appears unchanged).

http://arstechnica.com/features/2004/10/ppc-2/

That whole series is a great read for uber geeks like me and everyone in this thread :)
They also have a follow up article on the newer PowerPC 970, aka the G5 in the Power Macs.

Excellent, and I see now. Espresso's strong point is integers, whereas the Xenon/Cell's (especially the Cell's) strong point is floating point. Now it makes more sense why early PS3/360 ports to the Wii U had such performance issues. A game optimized for the 360/PS3 would naturally have a problem running on hardware with a different focus, even if that hardware is potentially more capable.
 
Excellent, and I see now. Espresso's strong point is integers, whereas the Xenon/Cell's (especially the Cell's) strong point is floating point. Now it makes more sense why early PS3/360 ports to the Wii U had such performance issues. A game optimized for the 360/PS3 would naturally have a problem running on hardware with a different focus, even if that hardware is potentially more capable.

No, the problem is integer performance is usually far less important than floating point performance for the way modern games actually work. They can't just swap in integer code for the floating point stuff to "optimize" for WiiU. Espresso may be as much as 40% faster than Xenon executing integer heavy gameplay code, but when real games only do one integer operation for every 10 floating point calculations the WiiU is still at an enormous disadvantage.
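The instruction-mix point above can be made concrete with a little Amdahl's-law-style arithmetic. The speedup figures below are illustrative assumptions, not measurements of Espresso or Xenon.

```python
# Toy throughput model: overall speedup for a workload split between
# integer and floating-point work, where each kind runs at a different
# relative speed on the hypothetical chip.
def relative_speed(int_frac, int_speedup, fp_speedup):
    """Harmonic-mean (Amdahl-style) combination of per-category speedups."""
    fp_frac = 1.0 - int_frac
    return 1.0 / (int_frac / int_speedup + fp_frac / fp_speedup)

# One integer op per 10 FP ops; 1.4x faster on integer, half speed on FP:
mix = 1 / 11
print(round(relative_speed(mix, 1.4, 0.5), 3))  # → 0.531
```

With that 1:10 mix, even a 1.4x integer advantage leaves the chip at roughly half the overall throughput, which is the disadvantage the post describes.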
 
No, the problem is integer performance is usually far less important than floating point performance for the way modern games actually work. They can't just swap in integer code for the floating point stuff to "optimize" for WiiU. Espresso may be as much as 40% faster than Xenon executing integer heavy gameplay code, but when real games only do one integer operation for every 10 floating point calculations the WiiU is still at an enormous disadvantage.

The key statement is "for the way modern games actually work". This is what I was getting at when I said they were made to the PS3/360's strengths.

The entire approach to the game must be done a different way to get the most out of Espresso, much in the same way that a different approach had to be taken to get the Wii/GC TEVs to produce normal mapping compared to how it was done with shader model 1.1.

The strength and capability is there, it just must be properly utilized.
 