Support NeoGAF

wsippel · May 28, 2013

krizzx said:
I was unaware that you had documentation on the Wii U that states this. Care to share it with the rest of the class? Please explain to me why it wouldn't work on a system like the Wii U.

You really don't need the documentation. Just ask yourself: Where is the memory controller?

On POWER systems, the MC is on the CPU. On Wii U, it's on the GPU. Because most of the data stored in RAM is meant for the GPU and never touched by the CPU. It would make no sense whatsoever to decompress data meant for the GPU on the CPU even if the CPU could basically do it for free, because then you'd need to shovel everything through the slow 60x bus - twice.

lostinblue · May 28, 2013

wsippel said:
You really don't need the documentation. Just ask yourself: Where is the memory controller?

On POWER systems, the MC is on the CPU. On Wii U, it's on the GPU. Because most of the data stored in RAM is meant for the GPU and never touched by the CPU. It would make no sense whatsoever to decompress data meant for the GPU on the CPU even if the CPU could basically do it for free, because then you'd need to shovel everything through the slow 60x bus - twice.

Not to mention GPU's support compressed data and have done so for aeons; the issue back then was how CPU's couldn't handle GPU compressed data and because Gekko was supposed to do vertex manipulation that condition could hamper the whole rendering pipeline results.

As recently as X360, the issue with the tesselation unit being usable or not had to do with how it didn't support vertex compression; meaning geometry would have to run uncompressed from the get go and it would only bloat it on top; not good at all for a rendering pipeline. Meanwhile the rest of the GPU obviously supported it and used it.

New compression methods/formats being present in the SIMD implementation would make sense and could come in handy, IBM went further a few years back with VMX128 on the X360, and I'd like to see them doing something along those lines for this, because stuff changed from 1999 to 2013 and we'd like to have a CPU reflecting that rather than just having a now incomplete ancient implementation and either have developers avoiding it or conforming to it. I mean Zlib compress/decompress is now a feature on PS4, and was a popular thing to run this gen on the SPE's, it's certainly a good thing to have running somewhere in the architecture (I'm not saying that the CPU ought to do that job for others, but supporting it for itself could save RAM nonetheless).

tipoo · May 28, 2013

lostinblue said:
I'm getting tired of the whole "they must have re-invented the wheel" and clinging onto theories that would mean big changes for the CPU or whatever; someone suggested it, it has merit as an academic assumption of whatever; but let's not jump on that bandwagon unless someone comes out and corroborates it (and no, an ambiguous post by two tribes on twitter doesn't count).

I did just throw it out there as another wild theory, good point that the MC is on the GPU. The Power7+ compression block could in theory be moved to a GPU MC but that's again flailing around in the dark.

lostinblue · May 28, 2013

tipoo said:
I did just throw it out there as another wild theory, good point that the MC is on the GPU. The Power7+ compression block could in theory be moved to a GPU MC but that's again flailing around in the dark.

I have nothing against wild theories, they're fun to throw around and as a means to look outside the box of normal things to happen, sometime they'll be useful and good thing we talked about them prior. This is a speculation thread after all.

What I find silly is when all of a sudden, someone throws a wild theory and that theory gains traction in the lines of "yeah, that must be it, secret sauce oh yeah" and then runs about as a fact for some time despite we having nothing to corroborate it (or something as vaguely unrelated as a two tribes twitter comment), like you said... a wild theory. I feel some times it's like some of us are desperately grasping for straws here.

Regarding the compression block I somehow suspect PS4 is doing the Zlib compress/decompress on the fly on the embedded APU's ARM Cortex A5 (should have one ARM embedded by CPU block, so that could amount to as much as 4 ARM cores in the 8-core jaguar spec), of course I don't know the feasibility of it performing those on the fly, but on the AMD steamroller line it's there for trusted code purposes; and 4 of them performing that on the PS4 seems overkill.

Nintendo was pretty much the first embedding an ARM cpu for those purposes (security+input/output) on top of a different architecture, they did so on the Wii's Hollywood with starlet and are no doubt pulling the same thing here with the Starbuck, even the sound processing unit might be ARM-based, so who knows (I dunno, I've long gave up on analysing cpu/gpu dies) if they didn't manage to sneak in another core for that purpose.

Would make more sense than transposing a IBM PPC part to an AMD part; see, there would be a conflict of interest there, I think AMD doesn't want IBM to go near their design, nor does IBM awnt to share inner workings of new tech/implementations with them.

joesiv · May 28, 2013

tipoo said:
I did just throw it out there as another wild theory, good point that the MC is on the GPU. The Power7+ compression block could in theory be moved to a GPU MC but that's again flailing around in the dark.

In my opinion, it's unlikely. Far as I know, Nintendo works with IBM for CPU, Nintendo works with AMD/ATI for GPU. in this case they put the separate chips on the same MCM. It's a big stretch to think that IBM would give it's Intellectual property to AMD/ATI.

frankie_baby · May 28, 2013

joesiv said:
In my opinion, it's unlikely. Far as I know, Nintendo works with IBM for CPU, Nintendo works with AMD/ATI for GPU. in this case they put the separate chips on the same MCM. It's a big stretch to think that IBM would give it's Intellectual property to AMD/ATI.

not that i think the theory is likely at all but isn't nintendo like IBM's biggest customer so if anyone could get special treatment its them

krizzx · May 28, 2013

lostinblue said:
I'm getting tired of the whole "they must have re-invented the wheel" and clinging onto theories that would mean big changes for the CPU or whatever; someone suggested it, it has merit as an academic assumption of whatever; but let's not jump on that bandwagon unless someone comes out and corroborates it (and no, an ambiguous post by two tribes on twitter doesn't count). Usually the simplest explanation or lack of thereoff is indicative of the implementation; that's simply how it is. Also bare in mind Nintendo is not in it to be a test subject hence the withered technology philosophy; they'll go either for easy things implemented on top or well tested ones/no margin for error.

Like Azak said, they could have uncovered new gpu features that are not all that new for PC or the OpenGL implementation but are new for consoles, like BC6H (the most likely one, IMO), ASTC, ETC2. Hell, it could be down to compiling keychain/previewing code part of the SKU for all we know; for instance FFXII team had a tool to test assets in the engine at real-time so they could decide wether they were using too much resolution for the textures or not.

Don't forget these dudes started or tested the game at 1080p, then went down, I'm sure textures were at some point too good for 720p; shaving that resolution a little would be an impossible to notice at 720p; hence a target for compression; perhaps the "feature" they found was something that assisted them in doing just that.

I mean whatever; not some crazy implementation that would make espresso a Power7+ prototype transplant patient; most likely the only thing from Power7 it has is the eDRAM and let's suppose nothing else unless there's some insider information going for it.

Hell, they could have simply discovered said CPU/GPU data compression and optimization for it on the toolchain, as one would otherwise need to optimize for it. That could surely save 100 MB.

You are taking the analysis out of context.

No one was suggesting that Espresso "is" a Power7. We were suggesting that it had some logic or functionality added from the Power7.

lostinblue · May 28, 2013

krizzx said:
You are taking the analysis out of context.

No one was suggesting that Espresso "is" a Power7. We were suggesting that it had some logic or functionality added from the Power7.

The context you think I'm coming from, is not the context I'm coming from.

I understood just right and it was just my opinion regarding the matter; we're not here to do some crazy theory of why this hardware can punch above it's weight, for that we might as well just label secret sauce or invoke pikmins-on-silicon and start a cult. Stretching it too far is just that, it's good to have and voice those assumptions, as if a means to put them out there, but it's no good to "assume" or speculate they must be there based on a lousy twitter post.

An outlandish theory is just that until or unless something specifically corroborates it. I reiterate, I feel you're grasping for straws if you're clinging that much onto am option thrown into the air like this (and one that would probably mean major architecture changes gone unnoticed until now).

Focus on what's likely; outlandish theories are not gonna get proven (or proper footing) on a forum, so keep them in your distant backburner if you will (your call), but please don't keep it in the bandwagon of immediate suspections. I mean we laugh at shit like "software update puts the Wii U running at 3.2 GHz!", discuss that for 5 pages and perhaps some news site will pick it up and then we'll be no different, the dudes desperately looking for proof the console is way more advanced and powerful than everything points to.

The virtue is in the middle, it's not on the behemoth of a machine wishes nor the "worse than current-gen" soap opera.

StevieP · May 28, 2013

joesiv said:
In my opinion, it's unlikely. Far as I know, Nintendo works with IBM for CPU, Nintendo works with AMD/ATI for GPU. in this case they put the separate chips on the same MCM. It's a big stretch to think that IBM would give it's Intellectual property to AMD/ATI.

Not to throw any fuel to the fire (as this particular fire isn't burning very hot lol) but IBM and AMD had to work together to make the 360 slim APU.

krizzx · May 28, 2013

lostinblue said:
The context you think I'm coming from, is not the context I'm coming from.

I understood just right and it was just my opinion regarding the matter; we're not here to do some crazy theory of why this hardware can punch above it's weight, for that we might as well just label secret sauce or invoke pikmins-on-silicon. Stretching it too far is just that, it's good to have and voice those assumptions, as if a means to put them out there, but it's no good to "assume" or speculate they must be there based on a lousy twitter post.

An outlandish theory is just that until or unless something specifically corroborates it. I reiterate, I feel you're grasping for straws if you're clinging that much onto am option thrown into the air like this (and one that would probably mean major architecture changes gone unnoticed until now).

Focus on what's likely; outlandish theories are not gonna get proven (or proper footing) on a forum, so keep them in your distant backburner if you will, but please don't keep it in the bandwagon of immediate suspections.

That is not what we are trying to do either, once again. We are trying to explain the new hardware technique that Two Tribes spoke about as well as the statement that the Wii U is using watsons brain.

That would solve both question. The CPU could still be 95% the same and just have a single feature from the Power7.

tipoo · May 28, 2013

frankie_baby said:
not that i think the theory is likely at all but isn't nintendo like IBM's biggest customer so if anyone could get special treatment its them

Not even close to the biggest, IBM is worth 228 billion, the Nintendo-IBM contract was 1 billion dollars spread across 3 consoles. Just because you don't see IBM much on the consumer side doesn't mean they have pretty huge sales elsewhere.

Fourth Storm · May 28, 2013

Just to throw this out there, if anyone knows of a free image host that does huge files or would volunteer to host it themselves, we can get a higher quality image in the OP. Sorry, I've been meaning to correct this for a while. The original .tif is ~9MB, but I got it down to a 6MB png.

krizzx · May 28, 2013

What about imgshack?

lostinblue · May 28, 2013

krizzx said:
That is not what we are trying to do either, once again. We are trying to explain the new hardware technique that Two Tribes spoke about as well as the statement that the Wii U is using watsons brain.

And you're looking under the wrong bed.

You realize the "Watson Brain" was extrapolated from "packs the same processor technology found in Watson" which is probably down to eDRAM.

In short, not an actual quote, just bad journalism making sensationalist headlines over nda-ridden unreleased hardware. And you're focusing on it like inspector gadget barking to the wrong tree. Go go baseless assumption!

Holds no weight, and doesn't make much sense knowing what we know now.

krizzx said:
That would solve both question. The CPU could still be 95% the same and just have a single feature from the Power7.

Because a feature that you think can be heralded as "brains" to a CPU can be implemented in such a roundabout way.

Why didn't they implement 256 bit floating point while they were at it? Oh right, perhaps they did. Please carry on.

Fourth Storm said:
Just to throw this out there, if anyone knows of a free image host that does huge files or would volunteer to host it themselves, we can get a higher quality image in the OP. Sorry, I've been meaning to correct this for a while. The original .tif is ~9MB, but I got it down to a 6MB png.

Can I ask for it?

I'm used to compressing stuff for web, perhaps I can pull something. It has to fit under how many megs?

joesiv · May 28, 2013

StevieP said:
Not to throw any fuel to the fire (as this particular fire isn't burning very hot lol) but IBM and AMD had to work together to make the 360 slim APU.

that would be news to me, I thought the Slim was still a separate package. Unless there is a newer revision of the slim?
http://www.pcper.com/reviews/General-Tech/New-Xbox-360-S-Slim-Teardown-Opened-and-Tested

Fourth Storm said:
Just to throw this out there, if anyone knows of a free image host that does huge files or would volunteer to host it themselves, we can get a higher quality image in the OP. Sorry, I've been meaning to correct this for a while. The original .tif is ~9MB, but I got it down to a 6MB png.

I can host it if you want. I'll send you a PM.

tipoo · May 28, 2013

joesiv said:
that would be news to me, I thought the Slim was still a separate package. Unless there is a newer revision of the slim?
http://www.pcper.com/reviews/General-Tech/New-Xbox-360-S-Slim-Teardown-Opened-and-Tested

I can host it if you want. I'll send you a PM.

You are reading the teardown part of the fat model, lol. Next page, "enter the new guy"

Fourth Storm · May 28, 2013

Put up a link to the uncompressed .tif in the OP. The file automatically downloads once you click in my browser (Chrome). Thanks joesiv, for hosting!

joesiv · May 28, 2013

tipoo said:
You are reading the teardown part of the fat model, lol. Next page, "enter the new guy"

He he he, whoops, thanks for the correction.

Schnozberry · May 28, 2013

krizzx said:
That is not what we are trying to do either, once again. We are trying to explain the new hardware technique that Two Tribes spoke about as well as the statement that the Wii U is using watsons brain.

That would solve both question. The CPU could still be 95% the same and just have a single feature from the Power7.

I believe the only feature it could share with Power7 is L2 EDRAM.

Wouldn't the two tribes statement be GPU related, since they were talking about texture compression?

krizzx · May 28, 2013

Schnozberry said:
I believe the only feature it could share with Power7 is L2 EDRAM.

Wouldn't the two tribes statement be GPU related, since they were talking about texture compression?

Yeah, that is more than likely. I was just throwing that out there as possibility since the Power7 compression was brought up.

Though, I have to ask. What is the significance of having Power7 EDRAM? How much difference does that make for performance as compared to the Wii CPU?

EDIT: I just remembered a another question I have related to the CPU and graphics. Didn't Gekko work in conjunction with the Gamecube GPU to produce graphics? I know it was used to help calculate more advanced texture effects and I remember reading about it being able to help with geometry. Might the same features still be present in Espresso?

wsippel · May 28, 2013

Schnozberry said:
I believe the only feature it could share with Power7 is L2 EDRAM.

Wouldn't the two tribes statement be GPU related, since they were talking about texture compression?

L2 eDRAM is an A2 thing, POWER uses eDRAM as L3. In that regard, it probably makes more sense to look at what A2 does with its L2, as the cache controllers might be related. And the A2 cache controller introduced a few funky new features, like atomics and versioning.

Brad Grenz · May 28, 2013

They share an ISA which is the most rational interpretation of the comment in the first place.

Schnozberry · May 28, 2013

wsippel said:
L2 eDRAM is an A2 thing, POWER uses eDRAM as L3. In that regard, it probably makes more sense to look at what A2 does with its L2, as the cache controllers might be related. And the A2 cache controller introduced a few funky new features, like atomics and versioning.

Ok, my mistake. It would be interesting to see if the controllers are layer out in a similar fashion. I'll have to dig around and look for an A2 die shot.

lostinblue · May 28, 2013

krizzx said:
What is the significance of having Power7 EDRAM? How much difference does that make for performance as compared to the Wii CPU?

Objectively, not much, should operate largely the same 99% of the time.

Other than that, this:

Embedded DRAM is an alternative approach to storage arrays, proposed as a replacement for extremely large SRAM arrays. Rather than using 6 or 8 transistors to store each bit, eDRAM cells rely on a capacitor and a single access transistor. It is denser than SRAM, substantially more resilient to soft errors (SER) due to the capacitor and smaller collector area. The overall arrays are roughly 2-4X denser than SRAM (cell size is roughly 4-6X smaller), with 2-3 orders of magnitude improvement in SER. Additionally, there is a slight decrease in active power and a substantial drop in standby power.

(...)

Comparing IBM’s eDRAM in the POWER7 to comparable SRAMs from Intel yields a roughly 2X density advantage at the same node. Equivalently, IBM’s 45nm eDRAM slightly exceeds the density of Intel’s 32nm SRAM. Based on the results demonstrated and IBM comments, the overall array area should scale by 60% at 32nm. This suggests that IBM can expect roughly a 2X advantage for their storage arrays.

Source: http://www.realworldtech.com/iedm-2010/3/

Density is higher on eDRAM, meaning at 45nm it takes the same space as SRAM at 32 nm's does, it's also simpler in operation so it is cheaper. It also uses less energy and cache coherence is better which avoids cache misses in a 2:1 or 3:1 ratio compared to SRAM; this could improve DMIPS/MHz a little; always under 2.8 DMIPS/MHz though.

krizzx said:
I just remembered a another question I have related to the CPU and graphics. Didn't Gekko work in conjunction with the Gamecube GPU to produce graphics? I know it was used to help calculate more advanced texture effects and I remember reading about it being able to help with geometry. Might the same features still be present in Espresso?

The same features are certainly in Espresso or Wii backwards compatibility wouldn't be there/said functions had to be taken over by something else (and since they kept the same architecture said options don't make sense).

That said, GC CPU never helped calculate advanced texture effects; I believe you're confusing stuff it has been said Factor 5 did, like manipulating the TEV pipeline via the GPU ISA via assembly on the CPU. that wasn't common-place at all, and even then they weren't manipulating images on the CPU.

What it did help with though, was vertex calculations, seeing the GC didn't have a Vertex shader unit, T&L was fixed function meaning stuff like skinning was done on the CPU. Details here; the CPU was designed for that/around that need, hence the SIMD compression support including compressed vertex data being able to be shared between CPU and GPU.

krizzx · May 29, 2013

lostinblue said:
Objectively, not much, should operate largely the same 99% of the time.

Other than that, this:

Source: http://www.realworldtech.com/iedm-2010/3/

Density is higher on eDRAM, meaning at 45nm it takes the same space as SRAM at 32 nm's does, it's also simpler in operation so it is cheaper. It also uses less energy and cache coherence is better which avoids cache misses in a 2:1 or 3:1 ratio compared to SRAM; this could improve DMIPS/MHz a little; always under 2.8 DMIPS/MHz though.The same features are certainly in Espresso or Wii backwards compatibility wouldn't be there/said functions had to be taken over by something else (and since they kept the same architecture said options don't make sense).

That said, GC CPU never helped calculate advanced texture effects; I believe you're confusing stuff it has been said Factor 5 did, like manipulating the TEV pipeline via the GPU ISA via assembly on the CPU. that wasn't common-place at all, and even then they weren't manipulating images on the CPU.

What it did help with though, was vertex calculations, seeing the GC didn't have a Vertex shader unit, T&L was fixed function meaning stuff like skinning was done on the CPU. Details here; the CPU was designed for that/around that need, hence the SIMD compression support including compressed vertex data being able to be shared between CPU and GPU.

Alright, so would this feature be of any benefit to the Wii U GPU?

lostinblue · May 29, 2013

krizzx said:
Alright, so would this feature be of any benefit to the Wii U GPU?

If you still want to make the CPU meddle with GPU resources/shared data then possibly yes; but the original need to do so is pretty much gone; which is good because even if the CPU was clocked the same and not multicore it would mean more free resources just from the fact it's not doing vertex manipulation anymore. It's a good thing, GC was doing it like that in order to overcompensate on what would be a bottleneck otherwise.

Like someone pointed out, the main memory bus is integrated onto the GPU and meant to be accessed mostly by it, that means CPU is more cycles away from both MEM1 and MEM2 banks (albeit the MCM configuration reduces this) and it has to access it from it's own 60x bus (and feed it back through that).

This is not necessarily bad, cpu's usually don't need a lot of bandwidth compared to GPU's, most of them have capped access to memory and that's fine; but that also means you won't see it pulling crap like CELL and it's SPE's did like MLAA on CPU; which was never optimal due to latency and having to move stuff around to start with. These cpu's lack both the bandwidth (and dedicated RAM, that CELL had) and the floating point performance to do so.

Since bandwidth was never huge, data compression helped a lot on the GC days and would to this day providing support for compressed format manipulation has been upgraded in order to support newer implementations, to illustrate my point, X360 CPU doesn't do vertex manipulation normally (for it has Vertex Units) but it still features VMX128, which is a successor to Gekko's "50 SIMD instructions" meant for 3D Graphics acceleration; it's useful, seeing it's a game console it's only natural the CPU has to use/manipulate or feed data from the GPU from time to time, and the fact it can be compressed or accelerated somehow is always useful.

Current useful uses could be animation streaming, procedural dynamic animation (like the one used in Uncharted games), AI or physics providing they run on the CPU; and of course, they stated the GPGPU is supposed to help the CPU, so they ought to work well together.

Backtracking a little, the benefits of eDRAM are that 3 MB (512 KB+2048 KB+512 KB) on a short pipeline architecture is huge; and it means CPU has to depend a lot less on it's bus to access memory in a timely maner, that makes it more independent from the pool of RAM, provides a nice ecosystem if you will.

blu · May 29, 2013

lostinblue said:
Current useful uses could be animation streaming, procedural dynamic animation (like the one used in Uncharted games), AI or physics providing they run on the CPU; and of course, they stated the GPGPU is supposed to help the CPU, so they ought to work well together.

I missed the exact context of the above - whether you're referring to float/int compression (aka quantization in Gekko/IBM terms) or to what FP on the CPU would still be used for, but if the latter, FP on the CPU still has its fair share of usage; the bulk of the high-throughput loads of streaming nature move toward the GPGPU, but there's a considerable class of latency-sensitive FP workloads that will stay on the CPU for the foreseeable future. Among those are:

* Skeletal animations - traversing trees an collecting local transformations
* Keyframed animations - traversing key-frame tracks and seeking keys
* IK - sort of like the complementary process to the 1st point above, but an order of magnitude more complex.

Basically, any task that involves a fancy data structure traversal with some moderate per-node FP workload, rinse, repeat, will likely be as latency-sensitive as it is FP-throughout sensitive. So CPUs are still the place to do those at.

lostinblue · May 29, 2013

blu said:
lostinblue said:

Current useful uses could be animation streaming, procedural dynamic animation (like the one used in Uncharted games), AI or physics providing they run on the CPU; and of course, they stated the GPGPU is supposed to help the CPU, so they ought to work well together.

Click to expand...

I missed the exact context of the above - whether you're referring to float/int compression (aka quantization in Gekko/IBM terms) or to what FP on the CPU would still be used for, but if the latter, FP on the CPU still has its fair share of usage; the bulk of the high-throughput loads of streaming nature move toward the GPGPU, but there's a considerable class of latency-sensitive FP workloads that will stay on the CPU for the foreseeable future.

The post was mostly about compression in the pipeline via SIMD helping out somewhat if needed to share data/write it back to the main RAM bank, then along the way and in the specific part you quoted I was hypothesizng what's still likely gonna be CPU-territory, or viable to do on it.

Regarding the GPGPU part, it was regarding compression again (in my head/line of thought), that it could help in that case if both supported the same compressed formats, but that was speculation.

blu said:
Among those are:

* Skeletal animations - traversing trees an collecting local transformations
* Keyframed animations - traversing key-frame tracks and seeking keys
* IK - sort of like the complementary process to the 1st point above, but an order of magnitude more complex.

I feel clever, I specifically thought about all those yesterday

blu said:
Basically, any task that involves a fancy data structure traversal with some moderate per-node FP workload, rinse, repeat, will likely be as latency-sensitive as it is FP-throughout sensitive. So CPUs are still the place to do those at.

Yes, it does make a lot of sense.

krizzx · May 30, 2013

Now that we have some more fuel, lets us continue.

Shin'en said:
“The Wii U GPU is several generations ahead of the current gen. It allows many things that were not possible on consoles before. If you develop for Wii U you have to take advantage of these possibilities, otherwise your performance is of course limited. Also your engine layout needs to be different. You need to take advantage of the large shared memory of the Wii U, the huge and very fast EDRAM section and the big CPU caches in the cores. Especially the workings of the CPU caches are very important to master. Otherwise you can lose a magnitude of power for cache relevant parts of your code. In the end the Wii U specs fit perfectly together and make a very efficient console when used right.”

I want to focus on the big caches in the core.

Let us take one core into example here and put it against Broadway. What type of performance increase are we seeing?

Also, though I'm sure this was answered in the long, long ago, can someone refresh my memory of the advantages of having the CPU/GPU on the board? I was think about the Gekko feature to help would vertices and other things. Having the chips linked together would make the performance for that even better.

You would be able to pull off graphical tricks and techniques that would be impossible otherwise without combination.

tipoo · May 30, 2013

krizzx said:
You would be able to pull off graphical tricks and techniques that would be impossible otherwise without combination.

More than across a PCI-E bus for example to be sure, or even the far separated chips in the PS3/360 (and the 360 s with one chip still has an artificial latency barrier to keep 100% compatibility). However, a single package is in turn much different from a single APU with HUMA.

http://www.tomshardware.com/news/AMD-HSA-hUMA-APU,22324.html

I'm not sure if this is known yet, someone refresh my memory. The 360 had "unified" memory, but unlike todays APUs the GPU had its address space and the CPU had its address space, they were unified but still separate. The point of HUMA is that both the CPU and GPU can simply use pointers to use the data either is using with no swapping of any sort. Would that be impossible without an APU? Single package isn't single chip, of course.

So there's unified and there's really unified, which does the Wii U use?

blu · May 30, 2013

tipoo said:
More than across a PCI-E bus for example to be sure, or even the far separated chips in the PS3/360 (and the 360 s with one chip still has an artificial latency barrier to keep 100% compatibility). However, a single package is in turn much different from a single APU with HUMA.

http://www.tomshardware.com/news/AMD-HSA-hUMA-APU,22324.html

I'm not sure if this is known yet, someone refresh my memory. The 360 had "unified" memory, but unlike todays APUs the GPU had its address space and the CPU had its address space, they were unified but still separate. The point of HUMA is that both the CPU and GPU can simply use pointers to use the data either is using with no swapping of any sort. Would that be impossible without an APU? Single package isn't single chip, of course.

So there's unified and there's really unified, which does the Wii U use?

As long as the GPU has a MMU which understands CPU mem pages, everything is a matter of GPU aperture - i.e. what part of the UMA memory can the GPU MMU see. If it's the entire memory then APU or no APU makes no difference in terms of addressability. What HMA adds to the picture is caches coherence between the CPU and GPU. How much Latte does there is not known. What we do know is that both Latte and Espresso have access to a very low-latency, high-bandwidth pool (MEM1).

lightchris · May 30, 2013

krizzx said:
I want to focus on the big caches in the core.

Let us take one core into example here and put it against Broadway. What type of performance increase are we seeing?

Not easy to say as it will depend on the game and the architecture, but I searched for some benchmarks. Here's what I found: http://www.nordichardware.com/CPU-Chipset/intel-core-2-duo-performance-l2-cache/Benchmark-Games.html

The only difference between these Core 2s are the cache sizes. What we see is on average a 9% increase in gaming performance when the cache size is doubled. In a console environment where you can try optimizing for a specific cache size it might even be a little more.

So, compared to Broadway, the Espresso cores with 512 KB cache could be around 10% faster, and the one with 2 MB cache could be around 30% faster (keep in mind, this is only a rough estimate!).

prag16 · May 30, 2013

Please understand, memory intensive design, bitches! xD

lwilliams3 · May 31, 2013

lightchris said:
Not easy to say as it will depend on the game and the architecture, but I searched for some benchmarks. Here's what I found: http://www.nordichardware.com/CPU-Chipset/intel-core-2-duo-performance-l2-cache/Benchmark-Games.html

The only difference between these Core 2s are the cache sizes. What we see is on average a 9% increase in gaming performance when the cache size is doubled. In a console environment where you can try optimizing for a specific cache size it might even be a little more.

So, compared to Broadway, the Espresso cores with 512 KB cache could be around 10% faster, and the one with 2 MB cache could be around 30% faster (keep in mind, this is only a rough estimate!).

It is interesting to note that if you add up all three cores together with those performance increases in the respective cores, you will get almost exactly 6x the performance of Broadway. This is just a very rough estimate, though, and no other modifications were calculated into that.

tipoo · May 31, 2013

Regarding cache scaling, point of interest:

http://www.tomshardware.com/reviews/cache-size-matter,1709-5.html

Even in open systems like PCs, cache does seem to make a big difference.

It's interesting though that the L2 in Espresso is divvied up by core with fixed amounts, as that means no data sharing unlike any modern uArch I know of.

krizzx · Jun 20, 2013

tipoo said:
Regarding cache scaling, point of interest:

http://www.tomshardware.com/reviews/cache-size-matter,1709-5.html

Even in open systems like PCs, cache does seem to make a big difference.

It's interesting though that the L2 in Espresso is divvied up by core with fixed amounts, as that means no data sharing unlike any modern uArch I know of.

Yes, I was wondering about that. Why are the cache sizes different. There must be some special reason why core 0 has the more than the other. Perhaps some feature it has that we are unaware of.

Disorientator · Jun 21, 2013

Brad Grenz · Jun 22, 2013

Sounds like an OS issue.

disap.ed · Jun 22, 2013

Does that mean the OS doesn't allocate threads automatically to the cores with free CPU cycles? How is that even possible in this day and age?

Brad Grenz · Jun 23, 2013

Nothing about the WiiU hardware or OS seem possible this day in age.

krizzx · Jul 15, 2013

There must be something more to this larger CPU caches vs performance when going by what Shin'en said.

In the GPU thread there was talk of the CPU not naturally delegating tasks to other cores and this requiring specific programming from the dev. Is it possible that this could be exploited to the CPU's advantage? I was thinking something kind of like a standard vs an automatic transmission.

Automatic is easier to use standard gives better performance.

What are the pros and cons of this functionality that aren't immediately noticeable?

BaBaRaRa · Jul 15, 2013

Most software libraries to enable easy threading across cores expect all cores to be the same (see Apple's GCD).

WiiU's core1, with it's much larger cache, means that the cores are now asymmetrical. It could simply be a case of Nintendo not having decent libraries available to make use of asymmetrical cores, so leave it up to the developer. It would mean ignoring a large amount of CPU cache to approach the CPU symmetrically.

I doubt there's much more to it than an odd hardware decision having a knock on effect on the development tools.

ArchangelWest · Jul 19, 2013

krizzx said:
There must be something more to this larger CPU caches vs performance when going by what Shin'en said.

In the GPU thread there was talk of the CPU not naturally delegating tasks to other cores and this requiring specific programming from the dev. Is it possible that this could be exploited to the CPU's advantage? I was thinking something kind of like a standard vs an automatic transmission.

Automatic is easier to use standard gives better performance.

What are the pros and cons of this functionality that aren't immediately noticeable?

Today's interview made concrete that the CPU has direct access to the eDRAM also.

Fourth Storm · Jul 21, 2013

ArchangelWest said:
Today's interview made concrete that the CPU has direct access to the eDRAM also.

Yup, I think most of us have had a pretty good feeling this was true for a while, but it sure is nice to have confirmation.

The question, then, is at what speed? I haven't seen much speculation on the CPU/GPU interface, but actually I think we can take a good guess with the information we have.

1).The CPU's core clock is going to be a multiple of the FSB speed.
2) The 60x interface takes up alot more space on Latte/Espresso's dies compared to Hollywood/Broadway (just by eyeballing, it seems 2x as wide). This could very possibly facilitate a wider bus.

Now, in order for Espresso's 3 cores to actually be useful, they are going to need enough bandwidth. Let's look at Wii (which uses the same ratios as Gamecube, mind you). The FSB from CPU to GPU/North Bridge ran at the 1/3 the speed of the CPU (which also happened to be the speed of the GPU). 64 bits @ 243 Mhz and you get ~1.9 GB/s. We can take this as rate which Nintendo felt was a good fit for Broadway's performance.

Now, if we were to assume that Nintendo would want the same amount of bandwidth relative to the performance of Espresso, we would just multiply 64 bits x 414 (approx 1/3 the speed of Espresso) to get ~3.3 GB/s. But wait, there are 3 cores, each with their own bandwidth needs, so now we're at 9.9 GB/s. Simply put, they're not going to get those speeds on anything less than a 128-bit bus.

To wrap this up, if we assume Espresso's FSB is running at the same FSB/CPU core ratio as Broadway (3:1), then the CPU will be able to access either pool of RAM at 6.6 GB/s. However, Broadway's family of processors can do 2:1 FSB/core clock ratios as well. So, in that case, the FSB would be at 620 Mhz with a data rate of 9.9 GB/s, which would seem to be a perfect fit for the CPU's capabilities.

So in sum, it's most likely 1 of those 2 figures, but I think the higher one makes more sense.

ArchangelWest · Jul 21, 2013

Kind of cool I think, how at this point the CPU and GPU conversations are at a true intersection, with the 'glue/interweaving' as it were being the eDRAM

krizzx · Jul 23, 2013

ArchangelWest said:
Kind of cool I think, how at this point the CPU and GPU conversations are at a true intersection, with the 'glue/interweaving' as it were being the eDRAM

Praise be to Shin'en. They are the only truly great devs working on Nintendo hardware right now. If not for them, everyone would be eating out of the palm of EA's bitter hands. Though, apparently there are still a lot of people who prefer the taste of sour grapes.

Has anyone looked further into the registry increase analysis from earlier in the thread?

ArchangelWest · Jul 25, 2013

I am interested in hearing more about that too if anyone has anything to share.

krizzx · Jul 25, 2013

Here is a a quick analysis based on some of what I've gathered so far.

I believe it has been stated that Espresso is 6 stage(was this confirmed?)? up from 4 stage with broadway.

The CPU in the PS4/XbO are 21 stage I believe?

So as I'm seeing their performance:

Espresso : 3core, 1.24 Ghz, 6 stage

PS4 CPU/XboxOne CPU: 8core, 1.6 Ghz, 21 stage

Since the PS4/XboxOne CPUs has over 3 times as many stages, that makes data execution take a little over 3 time as long.

Comparing the CPU's using these facts, that would make the numbers 3.72/12.8 would make the PS4/One CPU 3.4 times as strong as is, but then you add in the fact that the they take over 3 times as long to complete as cycle(3.4/3), and you get about 1.15 times as strong?

From what I am getting from my analysis, the peak raw number performance for the XboxOne/PS4 CPUs really isn't much higher than it is for Espresso...

How many instruction does the XboxOne/PS4 CPU execute per core, per cycle? It is read4execute2 per core 4 Espresso if no changes have been made to the architecture.

starwarsfan541 · Jul 25, 2013

krizzx said:
Here is a a quick analysis based on some of what I've gathered so far.

I believe it has been stated that Espresso is 6 stage(was this confirmed?)? up from 4 stage with broadway.

The CPU in the PS4/XbO are 21 stage I believe?

So as I'm seeing their performance:

Espresso : 3core, 1.24 Ghz, 6 stage

PS4 CPU/XboxOne CPU: 8core, 1.6 Ghz, 21 stage

Since the PS4/XboxOne CPUs has over 3 times as many stages, that makes data execution take a little over 3 time as long.

Comparing the CPU's using these facts, that would make the numbers 3.72/12.8 would make the PS4/One CPU 3.4 times as strong as is, but then you add in the fact that the they take over 3 times as long to complete as cycle(3.4/3), and you get about 1.15 times as strong?

From what I am getting from my analysis, the theoretical peak performance for the XboxOne/PS4 CPUs really isn't much higher than it is for Espresso...

How many instruction does the XboxOne/PS4 CPU execute per core, per cycle? I know its 2 for the Wii U.

So theoretically, the PS4/ Xbone may not be as strong versus than the Wii U than many think?

krizzx · Jul 25, 2013

starwarsfan541 said:
So theoretically, the PS4/ Xbone may not be as strong versus than the Wii U than many think?

As far as raw numbers on paper are concerened, they may not be. There are still many other factors to take into account though, like floating point vs integer performance.

The PPC750's have extremely stronger integer performance, but middling floating point performance. Modern gaming seems to put all focus on floating point performance(FLOPS) these days with no regard given to integer performance. Then there are dozens of other components functions to take into account.

http://www.neogaf.com/forum/showpost.php?p=58036908&postcount=714(I put no stock in that bobcat theoretical analysis)

I wish there was a way to run benchmarking programs on Espresso.

Support NeoGAF

Wii U CPU |Espresso| Die Photo - Courtesy of Chipworks

Banned

Banned

Banned

Banned

Member

Member

Junior Member

Banned

Banned

Junior Member

Banned

Member

Junior Member

Banned

Member

Banned

Member

Member

Member

Junior Member

Banned

Member

Member

Banned

Junior Member

Banned

Wants the largest console games publisher to avoid Nintendo's platforms.

Banned

Junior Member

Banned

Wants the largest console games publisher to avoid Nintendo's platforms.

Member

Banned

Member

Banned

Junior Member

Member

Member

Member

Member

Junior Member

Member

Member

Member

Member

Junior Member

Member

Junior Member

Banned

Junior Member

Similar threads