VGLeaks: Durango's Move Engines

It's just speculation, but there has to be a reason why MS would be insisting on standard DX APIs.

They didn't make a lot of 360 games available on their Games for Windows Live platform... If anything, I think MS has been quite reluctant (for obvious reasons) to allow major 360 games to come to PC. Can't see that changing much.
 

This is true, but with everything we're learning, Durango and a Windows 8 PC are FAR FAR FAR more alike than the Xbox 360 and a Windows 7 PC ever were.


I know this (below) is more about Arcade/Live etc. play, but there's no doubt something like this is cooking at MS; they've been alluding to it for a while now.

[Image: Microsoft-Studios-PLAY.jpg]
 
Yeah, I'm really curious to get developer feedback specifically on Durango.

Also, if it's true that Microsoft is forcing standard DX API libraries, I don't see any reason why Durango games wouldn't be playable (with scaling) in some form on even a mid-range (2014) desktop PC or a higher-end laptop.

Imagine Durango games being playable on a Windows 8 PC (and on iterative console hardware, i.e. a Durango 2.0 in 2015). THIS would be the megaton announcement everyone is waiting for.

That would be a pretty big deal if you bought the game on the Xbox and it just worked on the Windows 8 machine as well. It has its own difficulties, but nothing insurmountable. However, I'm more interested to see if the streaming stuff from the PowerPoint makes its way into announcements.

Is this what they were nicknaming the blitter? I was wondering about that.

No, the blitter talk started on B3D and made its way here. These are not blitters in the same sense.
 
SDF are out in force in this thread


Who cares, just ignore them. I'm soooo over the whole "power argument" by now... Honestly, for me Durango is a lot more interesting from a current and future design standpoint, and for its potential outside of gaming.
 
So a typical copy from system RAM to ESRAM would be limited by the 68 GB/s of the system RAM (the bottleneck). But the move engines can only do 25.6 GB/s peak, albeit in parallel. I wonder how many parallel copies they are expecting to do to make up for that low peak bandwidth. I'm guessing they are hoping for 4+ at a time (102.4 GB/s). It seems they might have shifted the burden from squeezing the frame buffer down into 10 MB of eDRAM to parallelizing move engines that maintain a steady stream of data, and thus good frame rates.

We could see a reversal this gen: Durango devs start slow and get better and better with the system, so there's an improvement as the gen goes on, versus a PS4 that is easy to get the most out of right out of the gate. It will be interesting, to say the least.

I don't think so. The description is very careful to point out that the total available bandwidth for all movers is 25.6 GB/s. So if one is being used, it can move 25.6 GB/s; if all four are used in parallel, they can still only transfer 25.6 GB/s between them.

There's no way to magically get 102 GB/s out of 68 GB/s DDR3.
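To put those numbers in perspective, here's a rough back-of-the-envelope sketch in Python. The 25.6 GB/s total for the move engines and the 68 GB/s DDR3 peak are the leaked figures discussed above; the 32 MB buffer size is just an illustrative assumption.

```python
GB = 1e9
ddr3_bw = 68.0 * GB               # leaked peak bandwidth of the DDR3 system RAM
dme_bw  = 25.6 * GB               # leaked peak shared by all four move engines combined
buffer_bytes = 32 * 1024 * 1024   # assumed ESRAM-sized buffer, purely for illustration

print(f"copy via move engines:  {buffer_bytes / dme_bw * 1e3:.2f} ms")   # ~1.31 ms
print(f"copy at full DDR3 rate: {buffer_bytes / ddr3_bw * 1e3:.2f} ms")  # ~0.49 ms
# However many engines run in parallel, the shared 25.6 GB/s ceiling (and the
# 68 GB/s DDR3 source) bounds the copy; the engines add no extra bandwidth.
```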
 
I found this:

http://www.sisoftware.net/?d=qa&f=gpu_mem_latency

The interesting part is the two AMD GPUs: one an APU with DDR3, the other a discrete GPU with GDDR5. I'm not sure how applicable it is, but at least there are cool plots and numbers.

Take the highway example posted some pages back, and put tollbooths (latency) on either side of that highway.
If we can get numbers on the time it takes for a tollbooth to open, we can maybe check how big an impact low-latency memory has, and what impact those move engines can have on scheduling those moving trucks.
 

Imagining tollbooths isn't going to get me real numbers. I'm interested in the overall latency of accessing data. I understand what latency is.
 
Correct me if I'm wrong, please: DDR3 has less latency than GDDR5, but GDDR5 has more bandwidth? So latency is like ping, and bandwidth is how wide the load is.

In online gaming, low latency is king: small loads that are time-critical. Large bandwidth is a secondary factor, and certainly irrelevant over a certain threshold unless it's saturated by an entire household.

So please humor me: wouldn't the Xbox 3, with lower-latency RAM, supplemented further by the data move engines and the ESRAM, make for a super fast combo for chunks of data?

The move engines and the ESRAM seem tailor-made for slicing and dicing things into smaller slivers. Is this ninja-like approach not superior to a slower but heftier sumo arriving at the GPU's door?

Rule of thumb:

CPUs love low latencies (hence the big cache pools), GPUs love bandwidth (faster RAM chips and bigger buses).
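To illustrate the ping-versus-load-width idea with numbers, here's a toy model in Python: transfer time is roughly access latency plus bytes divided by bandwidth. The latency and bandwidth figures below are illustrative assumptions only, not measured values for either console.

```python
def transfer_us(nbytes, latency_ns, bw_gb_s):
    """Very rough transfer time in microseconds: fixed latency plus streaming time."""
    return latency_ns / 1e3 + nbytes / (bw_gb_s * 1e9) * 1e6

small, large = 64, 8 * 1024 * 1024  # one cache line vs an 8 MB texture chunk
for name, lat_ns, bw in [("lower-latency pool", 50, 68.0),
                         ("higher-bandwidth pool", 120, 176.0)]:
    print(f"{name}: 64 B in {transfer_us(small, lat_ns, bw):.3f} us, "
          f"8 MB in {transfer_us(large, lat_ns, bw):.1f} us")
# Tiny, latency-bound accesses favour the low-latency pool; big streaming copies
# are dominated by bandwidth, which is where GDDR5 pulls ahead.
```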

I don't find that analogy all that good (inflated mph figures for Orbis aside). There are times when you need speed, and times when you need load capacity. The question here is whether the ESRAM is enough to do the job or not.

As much as Orbis looks like the safer proposal, 4 GB of GDDR5 might be overkill and unnecessarily raise costs. The same can be said for the 8 GB of RAM. Third parties will take more advantage of faster RAM than of extra capacity once they start cutting content to fit within the 4 GB limit, just as happened with the 360, which barely took any advantage of the extra capacity for textures that more RAM and compression made available.
 
Just so I don't have to read the whole thread: how much do the move engines mitigate Orbis's 176 GB/s bandwidth advantage, if at all? Or is it more about making things easier for devs than decreasing the bandwidth load?
 

Short answer: not enough.

Long answer: it depends on what kinds of techniques are used. If things like virtual texturing, tiling, tessellation and mega meshes are used, it could make a huge difference in performance and capabilities that would be unique to Durango... Still more questions than answers as far as how everything fits together and how it will be used.
 

We are just going to have to wait for games to be shown and for developers to be relieved of Microsoft's embargo so they can talk more openly about Durango.
 
Just so I don't have to read the whole thread: how much do the move engines mitigate Orbis's 176 GB/s bandwidth advantage, if at all? Or is it more about making things easier for devs than decreasing the bandwidth load?

It mitigates nothing in bandwidth, but it does help with moving data around. Think of cases where the GPU has to stall for 300+ cycles and probably do nothing (it's not this simple) while waiting for data to arrive from GDDR5 before it can run routine X.

The move engines could have prepared the data for routine X while the GPU was doing the previous routine. So when the GPU wants to run routine X, it doesn't have to stall for 300+ cycles; it can continue immediately, or wait a smaller number of cycles, maybe only 70. This will help the GPU be more efficient at doing calculations and not be data starved. This is what I think the move engines will be used for, but if I'm wrong please correct me.
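A minimal sketch of that overlap idea in Python, using made-up cycle counts rather than anything from the leak:

```python
gpu_busy_prev = 250   # cycles the GPU spends on the previous routine (assumed)
fetch_cycles  = 300   # cycles to pull routine X's data in from main memory (assumed)

# Without prefetch, the fetch only starts once the previous routine finishes.
stall_without = fetch_cycles

# With a move engine prefetching in parallel with the previous routine,
# only the part of the fetch that outlives that routine shows up as a stall.
stall_with = max(0, fetch_cycles - gpu_busy_prev)

print(f"stall without prefetch: {stall_without} cycles")  # 300
print(f"stall with prefetch:    {stall_with} cycles")     # 50
```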

Now the question is which memory model/architecture is the right choice, and that is something we will probably only know when the games are shown and we are a couple of years into next gen.

If next gen is like this gen then I don't see Microsoft failing, but if the PS4 can push 60 fps in multiplatform games every time and the next Xbox can only reach 30 fps, then I don't know how well it will go for Microsoft. I wouldn't be surprised if they only got original-Xbox kind of numbers, and they'd probably deserve it.
 
Short answer: not enough.

Long answer: it depends on what kinds of techniques are used. If things like virtual texturing, tiling, tessellation and mega meshes are used, it could make a huge difference in performance and capabilities that would be unique to Durango... Still more questions than answers as far as how everything fits together and how it will be used.

From reading Drek's moving analogy, it seems like the move engines allow Durango to be more efficient with its bandwidth (no snags), maybe allowing it to reach that theoretical peak of 68 GB/s more easily and regularly. But at the end of the day it has a maximum bandwidth of 68 GB/s, and they don't increase that. So in the end Orbis still has roughly 2.5x the bandwidth at 176 GB/s. Is this the correct interpretation?

It mitigates nothing in bandwidth, but it does help with moving data around. Think of cases where the GPU has to stall for 300+ cycles and probably do nothing (it's not this simple) while waiting for data to arrive from GDDR5 before it can run routine X.

The move engines could have prepared the data for routine X while the GPU was doing the previous routine. So when the GPU wants to run routine X, it doesn't have to stall for 300+ cycles; it can continue immediately, or wait a smaller number of cycles, maybe only 70. This will help the GPU be more efficient at doing calculations and not be data starved. This is what I think the move engines will be used for, but if I'm wrong please correct me.

Now the question is which memory model/architecture is the right choice, and that is something we will probably only know when the games are shown and we are a couple of years into next gen.

If next gen is like this gen then I don't see Microsoft failing, but if the PS4 can push 60 fps in multiplatform games every time and the next Xbox can only reach 30 fps, then I don't know how well it will go for Microsoft. I wouldn't be surprised if they only got original-Xbox kind of numbers, and they'd probably deserve it.

Gotcha, so what I said above is basically correct?
 
From reading Drek's moving analogy, it seems like the move engines allow Durango to be more efficient with its bandwidth (no snags), maybe allowing it to reach that theoretical peak of 68 GB/s more easily and regularly. But at the end of the day it has a maximum bandwidth of 68 GB/s, and they don't increase that. So in the end Orbis still has roughly 2.5x the bandwidth at 176 GB/s. Is this the correct interpretation?



Gotcha, so what I said above is basically correct?

There's more to it than just that, though.

There's A LOT of speculation that Microsoft chose their design to suit some VERY specific development techniques (mega meshes, mega/virtual texturing, tessellation, tiling, etc.) that were created this gen but were unable to be truly utilized due to hardware limitations.

It's pretty much the only way to explain why MS decided to do what they did with the overall Durango design, since there are several things that just make NO SENSE whatsoever from a traditional design (and transistor cost) trade-off standpoint.
 

Huh? What about the fact that they wanted this machine to do a lot of non-gaming applications and needed 8 GB of RAM to do that? This was the most cost-effective way of achieving that, or really the only way. To me it seems MS's number 1 priority was 8 GB of RAM; everything else followed after that and was changed to fit within that requirement.
 
There's more to it than just that, though.

There's A LOT of speculation that Microsoft chose their design to suit some VERY specific development techniques (mega meshes, mega/virtual texturing, tessellation, tiling, etc.) that were created this gen but were unable to be truly utilized due to hardware limitations.

It's pretty much the only way to explain why MS decided to do what they did with the overall Durango design, since there are several things that just make NO SENSE whatsoever from a traditional design (and transistor cost) trade-off standpoint.

I have just been on B3D and a user brought up a presentation from Lionhead on mega meshes that I can remember reading about. Sounds like you could be on to something.
 
From reading Drek's moving analogy, it seems like the move engines allow Durango to be more efficient with its bandwidth (no snags), maybe allowing it to reach that theoretical peak of 68 GB/s more easily and regularly. But at the end of the day it has a maximum bandwidth of 68 GB/s, and they don't increase that. So in the end Orbis still has roughly 2.5x the bandwidth at 176 GB/s. Is this the correct interpretation?



Gotcha, so what I said above is basically correct?

Also, don't get caught up on the term "texture". Corrinne Yu describes how a texture is defined when they're looking at GPGPU in one of the Channel 9 videos: a texture is just a way of passing information into the GPU to run calculations on.

Sorta, but the big thing is how the move engines will work with CPU/GPU tasks, i.e. you write from your GPU to RAM, the move engine writes to ESRAM, takes it back out of ESRAM, then writes it to RAM again, and vice versa.

It means that neither the GPU nor the CPU should have idle clock cycles, since they should always be dealing with some instruction or task. It seems like a hardware version of their MEMEXPORT function. The movement of data is very important; reading and writing directly to render targets can greatly influence how people utilize the system.

MEMEXPORT expands the graphics pipeline further forward and in a general purpose and programmable way. For instance, one example of its operation could be to tessellate an object as well as to skin it by applying a shader to a vertex buffer, writing the results to memory as another vertex buffer, then using that buffer run a tessellation render, then run another vertex shader on that for skinning. MEMEXPORT could potentially be used to provide input to the tessellation unit itself by running a shader that calculates the tessellation factor by transforming the edges to screen space and then calculates the tessellation factor on each of the edges dependent on its screen space and feeds those results into the tessellation unit, resulting in a dynamic, screen space based tessellation routine. Other examples for its use could be to provide image based operations such as compositing, animating particles, or even operations that can alternate between the CPU and graphics processor.

With the capability to fetch from anywhere in memory, perform arbitrary ALU operations and write the results back to memory, in conjunction with the raw floating point performance of the large shader ALU array, the MEMEXPORT facility does have the capability to achieve a wide range of fairly complex and general purpose operations; basically any operation that can be mapped to a wide SIMD array can be fairly efficiently achieved and in comparison to previous graphics pipelines it is achieved in fewer cycles and with lower latencies. For instance, this is probably the first time that general purpose physics calculation would be achievable, with a reasonable degree of success, on a graphics processor and is a big step towards the graphics processor becoming much more like a vector co-processor to the CPU.
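As a toy illustration of that chained-pass pattern (a shader exports a buffer to memory, a later pass consumes it), here's a hypothetical Python/NumPy sketch; the pass functions and buffer shapes are made up for demonstration and only stand in for real GPU shaders.

```python
import numpy as np

def skinning_pass(vertices, bone):
    # stand-in for a vertex shader: transform every vertex and export the result
    return vertices @ bone.T

def tessellation_pass(vertices):
    # crude stand-in for tessellation: insert midpoints between consecutive vertices
    mids = (vertices[:-1] + vertices[1:]) * 0.5
    out = np.empty((len(vertices) + len(mids), 3), dtype=vertices.dtype)
    out[0::2] = vertices
    out[1::2] = mids
    return out

vb   = np.random.rand(8, 3).astype(np.float32)  # input vertex buffer in memory
bone = np.eye(3, dtype=np.float32)              # identity transform for the demo

vb = skinning_pass(vb, bone)      # pass 1: skin, write result back to memory
vb = tessellation_pass(vb)        # pass 2: tessellate the exported buffer
vb = skinning_pass(vb, bone)      # pass 3: skin the tessellated result
print(vb.shape)                   # (15, 3): the buffer grew between passes
```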


It'll also be interesting to see how Microsoft leverages the GPU for OS/app tasks.
 
From reading Drek's moving analogy, it seems like the move engines allow Durango to be more efficient with its bandwidth (no snags), maybe allowing it to reach that theoretical peak of 68 GB/s more easily and regularly. But at the end of the day it has a maximum bandwidth of 68 GB/s, and they don't increase that. So in the end Orbis still has roughly 2.5x the bandwidth at 176 GB/s. Is this the correct interpretation?


Gotcha, so what I said above is basically correct?

It is my interpretation of why Microsoft would choose move engines, and I could be totally wrong.
The move engines also have some tiling logic built in, so I think they are going after the PowerVR model of tiled rendering, and what they already had implemented in software with Microsoft Talisman. I believe, if the wiki is right, Intel's Larrabee also used tiled rendering.

http://en.wikipedia.org/wiki/Microsoft_Talisman
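For a sense of why tiling pairs naturally with a small on-chip pool, here's a back-of-the-envelope sketch in Python. It assumes the rumoured 32 MB ESRAM and an example 1080p G-buffer layout; neither figure is confirmed for Durango.

```python
width, height   = 1920, 1080
bytes_per_pixel = 4 * 4 + 4            # e.g. four 32-bit colour targets plus 32-bit depth (assumed layout)
esram_bytes     = 32 * 1024 * 1024     # rumoured ESRAM capacity

gbuffer_bytes = width * height * bytes_per_pixel
tiles_needed  = -(-gbuffer_bytes // esram_bytes)   # ceiling division

print(f"G-buffer size: {gbuffer_bytes / 2**20:.1f} MB")        # ~39.6 MB
print(f"screen tiles needed to fit in ESRAM: {tiles_needed}")  # 2
# Render tile by tile inside the fast pool, and let the move engines shuttle
# finished tiles out to main RAM while the next tile is being drawn.
```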

And there are still unknown variables for Durango. These are the remaining options of the VGLeaks poll:
* Audio block
* Display planes
* Memory system
* Video compression

So we have to wonder: will the PS4 also have an audio block, or will they need to sacrifice CU time to process audio? From what I heard on B3D, audio could take a whole core on the 360.

So far it seems Sony has the more powerful, more generic machine, and MS is opting to go with a machine which on paper looks (or probably is) weaker, but which has a lot of specialized hardware.

Hell for all we know they could actually be even in strength when you put everything together.
 
Huh? What about the fact that they wanted this machine to do a lot of non-gaming applications and needed 8 GB of RAM to do that? This was the most cost-effective way of achieving that, or really the only way. To me it seems MS's number 1 priority was 8 GB of RAM; everything else followed after that and was changed to fit within that requirement.


That's more a matter of perspective and doesn't really fit with Xbox history or their previous model for success. In this case they may be able to "have their cake and eat it too" as far as the media player/machine angle goes.

To say those 6-8 gigs of RAM won't be useful for other purposes (mega textures/meshes, for example) could be a little short-sighted. Keep in mind we have nothing but rumors as to how much of the main memory will be available while playing games (the 360 OS uses 32 MB of RAM, and the largest consumer DVR uses around 512 MB), and the size of the memory pool combined with possible ESRAM/cache usage suggests there's more going on than we know at this time.

People are starting to suspect something is up and that Microsoft is taking a more non-traditional approach than most expected, but MUCH of this is still speculation and would hinge on the APIs being extremely capable/functional right out of the door.

To be clear, I'm not saying Durango has the potential to be as powerful as or more powerful than Orbis, just that there's A LOT that does NOT add up with the Durango design if you look at it from a traditional PC gaming perspective.
 
So we have to wonder: will the PS4 also have an audio block, or will they need to sacrifice CU time to process audio? From what I heard on B3D, audio could take a whole core on the 360.

So far it seems Sony has the more powerful, more generic machine, and MS is opting to go with a machine which on paper looks (or probably is) weaker, but which has a lot of specialized hardware.

Hell for all we know they could actually be even in strength when you put everything together.

The VGLeaks article said the PS4 has an additional audio processor. Also, 4 custom CUs (each with an additional ALU) for compute isn't very generic. As a reminder, here's what the VGLeaks article said Orbis had as "extras":

Audio Processor (ACP)
Video encode and decode (VCE/UVD) units
Display ScanOut Engine (DCE)
Zlib Decompression Hardware

That's more a matter of perspective

To say those 6-8 gigs of RAM won't be useful for other purposes (mega textures/meshes, for example) could be a little short-sighted. Keep in mind we have nothing but rumors as to how much of the main memory will be available while playing games (the 360 OS uses 32 MB of RAM, and the largest consumer DVR uses around 512 MB), and the size of the memory pool combined with possible ESRAM/cache usage suggests there's more going on than we know at this time.

People are starting to suspect something is up and that Microsoft is taking a more non-traditional approach than most expected, but MUCH of this is still speculation and would hinge on the APIs being extremely capable/functional right out of the door.

The rumor is that the machine will reserve 2-3 GB of RAM for the OS and other functions. There are a lot of developer comments (see the recent EDGE articles posted today) saying that Durango's OS is much more intrusive than Orbis's. This supports my claim that MS will be using this machine for playing games as well as many other applications (Win8, DVR, extensive multitasking, cooking your morning breakfast, reviving your dead pet Fluffy, etc., etc.).
 
The VGLeaks article said the PS4 has an additional audio processor. Also, 4 custom CUs (each with an additional ALU) for compute isn't very generic. As a reminder, here's what the VGLeaks article said Orbis had as "extras":

Audio Processor (ACP)
Video encode and decode (VCE/UVD) units
Display ScanOut Engine (DCE)
Zlib Decompression Hardware

The difference is that Microsoft is providing developers a way to leverage all the cores on the GPU as compute.
 
The VGLeaks article said the PS4 has an additional audio processor. Also, 4 custom CUs (each with an additional ALU) for compute isn't very generic. As a reminder, here's what the VGLeaks article said Orbis had as "extras":

Audio Processor (ACP)
Video encode and decode (VCE/UVD) units
Display ScanOut Engine (DCE)
Zlib Decompression Hardware



The rumor is that the machine will reserve 2-3 GB of RAM for the OS and other functions. There are a lot of developer comments (see the recent EDGE articles posted today) saying that Durango's OS is much more intrusive than Orbis's. This supports my claim that MS will be using this machine for playing games as well as many other applications (Win8, DVR, extensive multitasking, cooking your morning breakfast, reviving your dead pet Fluffy, etc., etc.).

Yet based on what the 360 currently does and the maximum amount of memory its OS currently uses, no one can come up with any reasonable use for reserving THAT much memory, EVEN if it is recording HDTV programs in the background (which is highly unlikely in the first place, since almost everyone who wants a DVR already has one at home, and they're basically free from the cable company these days). So again this conversation leaves us with more questions than answers.

Anyone who thinks they have all this figured out either has UBER inside information or is looking at the issue from a pretty closed-minded and/or biased perspective.
 
The difference is that Microsoft is providing developers a way to leverage all the cores on the GPU as compute.

And when they are all doing compute, they aren't doing graphics, so it's a trade-off.

Plus, I don't see anything forbidding Sony developers from using the remaining 14 CUs for compute if they wanted to; it's still a standard-looking GCN GPU, like the Durango one.


Durango
- 12 CU for graphics, none for compute.
- 8 CU for graphics, 4 for compute.
- 12 for compute, no graphics.
Or any combination, I guess.

Orbis
- 14 CU for graphics, 4 CU for compute.
- 14 for graphics, 4 for more graphics (maybe not quite getting full efficiency out of them that way).
- 12 CU for graphics, 6 for compute.
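For reference, here's the peak-FLOPS arithmetic behind those splits, in Python. GCN CUs do 64 fused multiply-adds per clock (2 flops each); the 800 MHz clock is the rumoured figure for both machines and is treated here as an assumption.

```python
def gflops(cus, clock_mhz=800):
    # 64 lanes per CU, 2 flops per FMA per lane
    return cus * 64 * 2 * clock_mhz / 1000.0

print(f"Durango, 12 CU total:           {gflops(12):.0f} GFLOPS")  # ~1229
print(f"Orbis, 18 CU total:             {gflops(18):.0f} GFLOPS")  # ~1843
print(f"Orbis split 14 gfx + 4 compute: {gflops(14):.0f} + {gflops(4):.0f}")
```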
 
It really does seem like Microsoft had a target in terms of the amount of RAM they needed for other purposes and have, for lack of a better term, duct-taped on things to help mitigate the potential deficiencies this caused in terms of gaming performance.
Where is aegis now?
Lol, for some reason I read that as "Where is your god now?"
 
Yet based on what the 360 currently does and the maximum amount of memory its OS currently uses, no one can come up with any reasonable use for reserving THAT much memory, EVEN if it is recording HDTV programs in the background (which is highly unlikely in the first place, since almost everyone who wants a DVR already has one at home, and they're basically free from the cable company these days). So again this conversation leaves us with more questions than answers.

Anyone who thinks they have all this figured out either has UBER inside information or is looking at the issue from a pretty closed-minded and/or biased perspective.

Exactly, see my joke about reviving Fluffy and cooking breakfast, etc. etc. What I'm implying is that I'm predicting MS will reveal some pretty surprising features that may or may not blow your mind, depending on what you care about. I truly believe MS is gonna have some features with this thing that we just can't possibly guess at yet, or can't quite fully comprehend till the system is revealed. All the writing is on the wall; just because you can't come up with what it could be doesn't mean it isn't going to happen.

And when they are all doing compute, they aren't doing graphics, so it's a trade-off.

Plus, I don't see anything forbidding Sony developers from using the remaining 14 CUs for compute if they wanted to; it's still a standard-looking GCN GPU, like the Durango one.


Durango
- 12 CU for graphics, none for compute.
- 8 CU for graphics, 4 for compute.
- 12 for compute, no graphics.
Or any combination, I guess.

Orbis
- 14 CU for graphics, 4 CU for compute.
- 14 for graphics, 4 for more graphics (maybe not quite getting full efficiency out of them that way).
- 12 CU for graphics, 6 for compute.

Exactly. To the bolded, though: the way a lot of people have described it, they're just A LOT better at doing compute, but they're just as good at rendering as the other CUs; devs may or may not have to manually program them individually for these tasks (i.e. just more work). Once again, this is just speculation from some people on GAF who have a lot more technical know-how than me. Even they basically say we'll probably have to wait for a dev summit or something to know for sure (and even then we have to wait for the info to leak, lol).
 
Exactly, see my joke about reviving Fluffy and cooking breakfast, etc. etc. What I'm implying is that I'm predicting MS will reveal some pretty surprising features that may or may not blow your mind, depending on what you care about. I truly believe MS is gonna have some features with this thing that we just can't possibly guess at yet, or can't quite fully comprehend till the system is revealed. All the writing is on the wall; just because you can't come up with what it could be doesn't mean it isn't going to happen.



Exactly. To the bolded, though: the way a lot of people have described it, they're just A LOT better at doing compute, but they're just as good at rendering as the other CUs; devs may or may not have to manually program them individually for these tasks (i.e. just more work). Once again, this is just speculation from some people on GAF who have a lot more technical know-how than me. Even they basically say we'll probably have to wait for a dev summit or something to know for sure (and even then we have to wait for the info to leak, lol).


Yep, agreed, but I think the "big stuff" will be more about "techniques" (mega meshes/mega textures etc.) that Durango was designed around, and less about "secret sauce" special super-duper computing hardware.

Anyone who thinks those 4 extra CUs in Orbis will be used directly for graphics tasks is REALLY REALLY reaching; it would be inefficient/complex as heck to code for, and they'd be far more beneficial for other tasks.
 

Sounds crazy, like using the SPUs to assist in rendering in the PS3. Or do you mean something else?
 
Has something been confirmed to indicate those 4 CUs are special and can't be used for graphical tasks?

No, the only thing that has been confirmed is that they can be used for rendering (they used the word "slightly" in the article, which has everyone interpreting that word differently), and that they are specialized for compute and each have an additional ALU. This is all according to the VGLeaks article (so technically nothing is confirmed, lol). Everyone is guessing about how it would work as far as the programming aspect goes (as in, it may take more work if you want to use them for rendering), because the Eurogamer article said they weren't part of the "rendering pipeline". I think that's just more speculation on their part, though.
 
Sounds crazy, like using the SPUs to assist in rendering in the PS3. Or do you mean something else?

IMO there are other uses for those 4 CUs that would be far more beneficial overall: physics, possibly AI, culling, who knows. But according to what we do know (and what makes sense), it's not like they can just be turned into regular CUs and simply added to the rendering pile.
 
Whoops, posted this elsewhere by accident. The move engines and the VT (virtual texturing) stuff that Lionhead was pushing, and that Corrinne Yu also talks about, seem to go hand in hand. Interesting design, the more I read about it.
http://miciwan.com/GDC2011/GDC2011_Mega_Meshes.pdf
Page 25 is where he talks about it. Damn, I keep posting in the stupid CPU thread. Too many threads.
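For anyone skimming, the core virtual-texturing idea from that talk boils down to something like this hypothetical Python sketch: a huge virtual texture is cut into fixed-size pages, and only the pages actually touched get pulled into a small physical pool. Page and texture sizes here are arbitrary illustration values.

```python
PAGE = 128          # page edge length in texels (illustrative)
resident = {}       # virtual page id -> slot in the physical texture pool

def request_texel(u, v, virtual_size=65536):
    x, y = int(u * virtual_size), int(v * virtual_size)
    page = (x // PAGE, y // PAGE)
    if page not in resident:            # "page fault": stream the page in
        resident[page] = len(resident)  # a real system would evict LRU pages
    return resident[page]

for uv in [(0.1, 0.1), (0.1001, 0.1), (0.9, 0.4)]:
    print(uv, "-> physical slot", request_texel(*uv))
# The two nearby lookups hit the same resident page; only the distant one streams a new page.
```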
 
The GPU improved a fair bit and will help the processing but if the luma can't re-route the frinster then the chroma beta output is only gonna get undermined by the central core processor. Should've used a dual hertz mega processor in lieu of the chroma side output.
 