VGLeaks: Durango's Move Engines

So as the x720 architecture gets demystified, its entire makeup could indeed be the secret sauce, just as some have been hinting.

A clever, super-efficient box that potentially matches, and perhaps, in the right hands, even exceeds the brute-force styling of the competition. This, combined with more RAM, could see the new Xbox crowned the next-gen technical darling.

And that's not even throwing the Swiss Army knife of peripheral social and media features into the mix.


Or am I assuming too much?
 
Or am I assuming too much?

It's still impossible to tell, but it does seem that they will be rather close when it comes to basic hardware performance. Xbox to GameCube or Xbox 360 to PS3 do sound like somewhat reasonable assessments, while Xbox to PS2 does not since Xbox was several times more powerful than PS2 on paper (overall; PS2 was better in some things, like CPU floating point performance), and Orbis simply does not leap over Durango's proposed specs to that extent.

Again, we can't be certain yet and things could change, but they seem to be in the same ballpark, which would indicate that the real battles will be fought elsewhere.
 
So as the x720 architecture gets demystified, its entire makeup could indeed be the secret sauce, just as some have been hinting.

A clever, super-efficient box that potentially matches, and perhaps, in the right hands, even exceeds the brute-force styling of the competition. This, combined with more RAM, could see the new Xbox crowned the next-gen technical darling.

And that's not even throwing the Swiss Army knife of peripheral social and media features into the mix.


Or am I assuming too much?

$2.03 an hour?
 
So as the x720 architecture gets demystified, its entire makeup could indeed be the secret sauce, just as some have been hinting.

A clever, super-efficient box that potentially matches, and perhaps, in the right hands, even exceeds the brute-force styling of the competition. This, combined with more RAM, could see the new Xbox crowned the next-gen technical darling.

And that's not even throwing the Swiss Army knife of peripheral social and media features into the mix.


Or am I assuming too much?

Everything we've seen about Orbis has indicated the same and that it's a lot easier to develop for than the PS3. So I'm sure both coming consoles will be quite efficient in different ways.
 
So as the x720 architecture gets demystified, its entire makeup could indeed be the secret sauce, just as some have been hinting.

A clever, super-efficient box that potentially matches, and perhaps, in the right hands, even exceeds the brute-force styling of the competition. This, combined with more RAM, could see the new Xbox crowned the next-gen technical darling.

And that's not even throwing the Swiss Army knife of peripheral social and media features into the mix.


Or am I assuming too much?

Too obvious.

$2.03 an hour?

50¢ per post.
 
These claims that people here are MS viral marketers are getting incredibly tiring. Disagreeing with the narrative that the PS4 is going to be the panacea of gaming is apparently enough to qualify for Microsoft's payroll. I'm waiting for my check, by the way.
 
These claims that people here are MS viral marketers are getting incredibly tiring. Disagreeing with the narrative that the PS4 is going to be the panacea of gaming is apparently enough to qualify for Microsoft's payroll. I'm waiting for my check, by the way.

 
I like how "straightforward development environment" = brute force, never mind the rumors painting Orbis as having dedicated hardware to help out as well. This thread is just hilarious. Hell, all the next-gen threads so far are hilarious.
 
They mitigate bandwidth issues by moving pieces of data to eSRAM, so when the GPU needs to access them it can fetch from both pools at the same time.

Obviously, it's not going to be 68 + 102 GB/s, but the perceived bandwidth of the system can be much higher than either of the pools individually.

Copy operation   Peak throughput using move engine(s)   Peak throughput using shader
RAM -> RAM       25.6 GB/s                              34 GB/s
RAM -> ESRAM     25.6 GB/s                              68 GB/s
ESRAM -> RAM     25.6 GB/s                              68 GB/s
ESRAM -> ESRAM   25.6 GB/s                              51.2 GB/s

They move data to and from eSRAM slower than a direct copy, but they do it without using the GPU cores. They don't magically add bandwidth; they minimize stalls from the shader cores being busy with a data copy.

I think with perfect use, the system is humming along with each part of the pipeline well fed with data, but the peak bandwidth is still going to be less than that of the eSRAM's 102GB/s.

It will be interesting to learn how much of this needs to be hand-tuned by each engine and each game.
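The "each part of the pipeline well fed" picture can be put into rough numbers. A minimal sketch, assuming an idealized split of streaming reads across the two pools (the 68 and 102 GB/s figures are from the leak; the split ratio and the model itself are hypothetical):

```python
# Toy model of the "perceived bandwidth" idea. The 68 and 102 GB/s
# figures come from the leak; the read-split ratio is hypothetical.
DDR3_BW = 68.0    # GB/s, main memory
ESRAM_BW = 102.0  # GB/s, embedded memory

def perceived_bandwidth(esram_fraction):
    """Effective read rate when esram_fraction of the working set is
    streamed from eSRAM and the rest from DDR3, both pools serving
    reads concurrently. Limited by whichever pool finishes last."""
    ddr3_fraction = 1.0 - esram_fraction
    # Time to stream 1 GB with this split (pools work in parallel)
    t = max(ddr3_fraction / DDR3_BW, esram_fraction / ESRAM_BW)
    return 1.0 / t

# Everything from one pool gives just that pool's rate; splitting the
# reads lands between a single pool and the 170 GB/s ideal ceiling,
# which real-world contention makes unreachable.
print(round(perceived_bandwidth(0.0), 1))  # 68.0
print(round(perceived_bandwidth(0.5), 1))  # 136.0
```

As the surrounding posts note, the real system will fall short of this idealized ceiling because the move engines and CPU also consume DDR3 bandwidth.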
 
They move data to and from eSRAM slower than a direct copy, but they do it without using the GPU cores. They don't magically add bandwidth; they minimize stalls from the shader cores being busy with a data copy.

I think with perfect use, the system is humming along with each part of the pipeline well fed with data, but the peak bandwidth is still going to be less than that of the eSRAM's 102GB/s.

It will be interesting to learn how much of this needs to be hand-tuned by each engine and each game.

They don't add bandwidth by copying data. The GPU sees increased bandwidth because the move engines place the data the GPU will need in advance, so it can access both memories at the same time.

Since they draw from the same pools as the GPU and the CPU, obviously the feed rate won't be anywhere near the sum of 170GB/s, but it should be higher than either pool by itself.

Edit: See also the previous GPU leak. The GPU uses virtual addresses, which can be mapped to either pool. The DMEs can also tile portions of the data, so they can take a portion of a texture that resides in main RAM, put it in eSRAM, and the GPU will be able to read the whole texture from both pools at once.
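The virtual-addressing idea can be sketched as a page table whose entries point into either pool. The pool names mirror the leak, but the page size and the mapping below are purely illustrative:

```python
# Illustrative sketch: the GPU sees one contiguous texture, while
# individual pages live in either pool. Page size and table contents
# are made up for the example.
PAGE = 4096  # bytes, hypothetical page size

# Page table: virtual page number -> (pool, physical page number)
page_table = {
    0: ("ESRAM", 10),   # hot portion moved in by a DME
    1: ("ESRAM", 11),
    2: ("DDR3", 500),   # rest of the texture stays in main RAM
    3: ("DDR3", 501),
}

def translate(vaddr):
    """Resolve a virtual address to (pool, physical address)."""
    pool, ppage = page_table[vaddr // PAGE]
    return pool, ppage * PAGE + vaddr % PAGE

print(translate(100))       # ('ESRAM', 41060) -> served by the fast pool
print(translate(3 * PAGE))  # ('DDR3', 2052096) -> served by main RAM
```

The point is only that one texture read can transparently hit both pools; the actual Durango page-table format is not public.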
 
Right, but that act of moving the partial data also consumes bandwidth, so even if you can multiplex your texture reads across both memory buses, you're still well below the theoretical maximum, and potentially effectively below the maximum of even one of the pools because of the overhead.

Like AgentP said, the benefit is that you don't have to waste GPU or CPU time moving data, and it prevents stalls in situations where you need the whole texture but only have part of it in eSRAM: the GPU can seamlessly move on to reading the rest of the texture from DDR3.

It isn't additive. It just smooths over what would otherwise be huge cliffs in performance.
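A hedged way to see why it isn't additive: the up-front copy into eSRAM costs move-engine time, so it only pays off with reuse. The rates are the leaked figures; the normalized 1 GB block and the read counts are made up for illustration:

```python
# Hypothetical amortization model for the copy overhead described above.
DDR3_BW = 68.0    # GB/s (leaked figure)
ESRAM_BW = 102.0  # GB/s (leaked figure)
DME_BW = 25.6     # GB/s, move-engine copy rate (leaked figure)

def read_rates(reads):
    """Average bandwidth for a 1 GB block that is either copied into
    eSRAM by a move engine and then read 'reads' times, or simply read
    from DDR3 'reads' times."""
    via_esram = reads / (1.0 / DME_BW + reads / ESRAM_BW)
    via_ddr3 = reads / (reads / DDR3_BW)   # = DDR3_BW
    return via_esram, via_ddr3

# Read once, the copy overhead dominates and the eSRAM path loses;
# with enough reuse the copy amortizes and eSRAM pulls ahead.
print(read_rates(1))    # eSRAM path ~20 GB/s vs ~68 GB/s
print(read_rates(10))   # eSRAM path ~73 GB/s vs ~68 GB/s
```

This matches the thread's conclusion: the engines smooth over cliffs rather than add bandwidth, and the win depends entirely on how the data is reused.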
 
These claims that people here are MS viral marketers are getting incredibly tiring. Disagreeing with the narrative that the PS4 is going to be the panacea of gaming is apparently enough to qualify for Microsoft's payroll. I'm waiting for my check, by the way.

When their argument is that you're a viral marketer, you've already won.
 
So as the x720 architecture gets demystified, its entire makeup could indeed be the secret sauce, just as some have been hinting.

A clever, super-efficient box that potentially matches, and perhaps, in the right hands, even exceeds the brute-force styling of the competition. This, combined with more RAM, could see the new Xbox crowned the next-gen technical darling.

And that's not even throwing the Swiss Army knife of peripheral social and media features into the mix.


Or am I assuming too much?

Still too early to say. We should wait for the official specs.
 
As I understand it, the move engines are not really to save bandwidth, but to save GPU cycles. No matter how you slice it, they are still moving data around at a peak 102GB/s.

Just to clear something up.

The move engines have a max bandwidth of 25.6GB/s for all 4 of them combined (in reality it's slightly less, but not by much). One of them can saturate this bus, or all 4 can take a quarter of it each. They cannot move 102GB/s between them.
 
Just to clear something up.

The move engines have a max bandwidth of 25.6GB/s for all 4 of them combined (in reality it's slightly less, but not by much). One of them can saturate this bus, or all 4 can take a quarter of it each. They cannot move 102GB/s between them.

Also worth noting that the 25.6GB/s isn't separate bandwidth; it comes out of the main memory's 68GB/s, and the engines don't run at full speed when they're compressing. When they're doing both LZ and JPEG compression, their rate drops to 400MB/s.
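Spelling the quoted figures out as arithmetic (all numbers are the leaked values as relayed above; nothing here is new data):

```python
# The leaked move-engine figures, spelled out.
TOTAL_DME_BW = 25.6   # GB/s, shared by all four move engines
MAIN_MEM_BW = 68.0    # GB/s, DDR3 main memory

# One engine can saturate the shared bus, or four can split it evenly:
per_engine = TOTAL_DME_BW / 4                  # 6.4 GB/s each
# DME traffic is carved out of the DDR3 budget, not added to it:
left_for_gpu_cpu = MAIN_MEM_BW - TOTAL_DME_BW  # 42.4 GB/s while saturated
# With LZ + JPEG work the quoted rate falls to 400 MB/s, a ~64x drop:
compression_slowdown = TOTAL_DME_BW * 1000 / 400
print(per_engine, left_for_gpu_cpu, compression_slowdown)
```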
 
Also worth noting that the 25.6GB/s isn't separate bandwidth; it comes out of the main memory's 68GB/s, and the engines don't run at full speed when they're compressing. When they're doing both LZ and JPEG compression, their rate drops to 400MB/s.

There's a good likelihood (nearly certain) that you won't be able to do both LZ and JPEG.

So you will have to pick one. The DME unit that only has LZ is most likely the LZ encoder (capable of encoding streams into LZ), and the one that has both LZ and JPEG is most likely a decoder for both.

Which would make it roughly 327MB/s if you use JPEG, or 200MB/s if you use LZ. Can't have both.
 
There's a good likelihood (nearly certain) that you won't be able to do both LZ and JPEG.

So you will have to pick one. The DME unit that only has LZ is most likely the LZ encoder (capable of encoding streams into LZ), and the one that has both LZ and JPEG is most likely a decoder for both.

Which would make it roughly 327MB/s if you use JPEG, or 200MB/s if you use LZ. Can't have both.

You cannot make an assumption and then qualify it as a fact. Besides, the gist of it on B3D is that the DMEs are not running at full speed because they would consume all the bandwidth, thus defeating their purpose in the first place. The DMEs are there primarily to make sure there are no stalls in the system, not to save cycles. Read ERP's and Gubbi's posts on B3D and you will get a better idea of why the DMEs are in the system.
 
Right, but that act of moving the partial data also consumes bandwidth, so even if you can multiplex your texture reads across both memory buses, you're still well below the theoretical maximum, and potentially effectively below the maximum of even one of the pools because of the overhead.

Like AgentP said, the benefit is that you don't have to waste your GPU or CPU time moving data, and it prevents stalls for situations where you need the whole texture, but you only have part of it in ESRAM, the GPU can seamlessly move on to reading the rest of the texture in DDR3.

It isn't additive. It just smooths over what would otherwise be huge cliffs in performance.
It consumes bandwidth, but that is only a subtraction from the GPU's bandwidth if the GPU is saturating the bus. If the data move occurs during a time the GPU isn't using all the bandwidth (or, better said, isn't using enough bandwidth for the DMEs to operate at full rate), or when it isn't using any bandwidth at all, the DMEs can move that data for free (meaning, without taking any from the GPU).

Translate the GB/s to a per-clock figure and that should be clearer. Every clock where the GPU does not saturate the bus, a data move can occur; and if in the next few clocks it needs to saturate it, and the data is already in place in both memories, the GPU will be able to read from both memories at full speed.

Of course, in reality not all moves are going to occur during times the GPU isn't using the bus, so the GPU and the DMEs will be competing for bandwidth at times, hence the GPU won't ever see a perceived rate of 170GB/s. But unless something goes horribly wrong, there's not much chance of it actually being slower than a single pool either. Perhaps for a few clocks, but not for a sustained period of time.
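The per-clock argument can be illustrated with a toy bus model. This is only a sketch of the scheduling idea, not the real arbiter: the GPU is assumed to always win the bus, and the move engine takes whatever cycles are left idle:

```python
# Toy per-cycle model: the move engine steals only cycles the GPU
# leaves idle on the bus, so GPU traffic is never delayed, yet the
# copy still makes progress "for free".
def simulate(gpu_demand, copy_needed):
    """gpu_demand: list of 0/1 flags, 1 = GPU uses the bus that cycle.
    Returns (cycles of GPU traffic served, copy cycles completed)."""
    gpu_served = copied = 0
    for busy in gpu_demand:
        if busy:
            gpu_served += 1          # GPU always wins the bus
        elif copied < copy_needed:
            copied += 1              # DME uses the idle cycle
    return gpu_served, copied

# GPU busy 6 of 10 cycles; a 4-cycle copy fits entirely in the gaps:
demand = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print(simulate(demand, 4))  # -> (6, 4): GPU loses nothing, copy completes
```

When the GPU saturates every cycle, the copy makes no progress at all, which is the contention case the post describes.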
 
You cannot make an assumption and then qualify it as a fact. Besides, the gist of it on B3D is that the DMEs are not running at full speed because they would consume all the bandwidth, thus defeating their purpose in the first place. The DMEs are there primarily to make sure there are no stalls in the system, not to save cycles. Read ERP's and Gubbi's posts on B3D and you will get a better idea of why the DMEs are in the system.

It becomes rather obvious when you realise that a JPEG decode would reuse a lot of the components of an LZ decode, making it very cheap to implement on the same IC, whereas an LZ encode would have nearly nothing in common with it and would require reimplementing a lot of things.

So yes, that assumption can quite easily be made. Also note that there will be DMA units in Orbis as well; the only thing different here is (maybe) the hardware decompression, or the rate at which they work.

It's also in the article.

The same move engine that supports LZ decoding also supports JPEG decoding.
 
It becomes rather obvious when you realise that a JPEG decode would reuse a lot of the components of an LZ decode, making it very cheap to implement on the same IC, whereas an LZ encode would have nearly nothing in common with it and would require reimplementing a lot of things.

So yes, that assumption can quite easily be made. Also note that there will be DMA units in Orbis as well; the only thing different here is (maybe) the hardware decompression, or the rate at which they work.

It's also in the article.

As far as I know there are no rumors pointing to the inclusion of DMEs in the PS4, and if there are, there is the matter of whether they are the standard DMAs included in every GCN card, which reside in the GPU. If we are making that assumption, then there is nothing stopping Durango from having those as well. These DMEs are similar, but their function has been greatly expanded, and there are 4 of them. According to ERP, MS spent a lot of documentation on the DMEs and how to use them.

Anyway, the PS4 can have 39 DMAs and that's OK, but that has nothing to do with what we are discussing.
 
As far as I know there are no rumors pointing to the inclusion of DMEs in the PS4, and if there are, there is the matter of whether they are the standard DMAs included in every GCN card, which reside in the GPU. If we are making that assumption, then there is nothing stopping Durango from having those as well. These DMEs are similar, but their function has been greatly expanded, and there are 4 of them. According to ERP, MS spent a lot of documentation on the DMEs and how to use them.

Anyway, the PS4 can have 39 DMAs and that's OK, but that has nothing to do with what we are discussing.

Microsoft likes to rename things. DMEs are DMA units; I haven't seen anything here that a normal DMA can't do.
 
That's the simple function, and it says nothing about the DMEs' other capabilities or how they are used relative to the system.

They move data between pools via an external bus (this sounds like DMA to me; identical, even, depending on implementation). They also have some hardware compression tacked onto some of them.

What other functions? They simply do not seem to exist.

Apart from moving data and compression/decompression, what does the article outline that these do?
 
The DMEs appear to simply expand on the functionality of the normal DMAs to cope with the issues created by Durango's segmented memory topology. In fact, I would speculate that the 2 "DMEs" that lack the compression hardware are literally stock GCN DMAs. I don't think it's a coincidence that their max speed is exactly the same as a PCI-Express 3.0 slot, which is what the stock GCN DMAs are optimized for.

Orbis doesn't need DMEs like Durango, but it will in all likelihood have the standard 2 GCN DMAs for managing the flow of data between main memory and the GPU's L2 cache. Since Orbis does not face the same segmented memory challenges it has no need for the same solution.
 
It consumes bandwidth, but that is only a subtraction from the GPU's bandwidth if the GPU is saturating the bus. If the data move occurs during a time the GPU isn't using all the bandwidth (or, better said, isn't using enough bandwidth for the DMEs to operate at full rate), or when it isn't using any bandwidth at all, the DMEs can move that data for free (meaning, without taking any from the GPU).

Translate the GB/s to a per-clock figure and that should be clearer. Every clock where the GPU does not saturate the bus, a data move can occur; and if in the next few clocks it needs to saturate it, and the data is already in place in both memories, the GPU will be able to read from both memories at full speed.

Of course, in reality not all moves are going to occur during times the GPU isn't using the bus, so the GPU and the DMEs will be competing for bandwidth at times, hence the GPU won't ever see a perceived rate of 170GB/s. But unless something goes horribly wrong, there's not much chance of it actually being slower than a single pool either. Perhaps for a few clocks, but not for a sustained period of time.

Who are you? You are talking some in-depth stuff.
 
They move data between pools via an external bus (this sounds like DMA to me; identical, even, depending on implementation). They also have some hardware compression tacked onto some of them.

What other functions? They simply do not seem to exist.

Apart from moving data and compression/decompression, what does the article outline that these do?

The compression and decompression will most definitely be useful and that is a difference.
 
The compression and decompression will most definitely be useful and that is a difference.

That's wonderful, and yes, that's a difference. But other than that, can you come up with anything? It's worth noting that the compression would be nigh on useless until you actually (a) have to use a compressed format for something, or (b) run out of bandwidth. The compression runs so much slower than the bare moving case that I can barely see the point, aside from taking the stress of decompressing off the CPU.
 
That's wonderful, and yes, that's a difference. But other than that, can you come up with anything? It's worth noting that the compression would be nigh on useless until you actually (a) have to use a compressed format for something, or (b) run out of bandwidth. The compression runs so much slower than the bare moving case that I can barely see the point, aside from taking the stress of decompressing off the CPU.

This is a console, and removing said stress from the CPU is good, as the CPU can be used for other things.
 
These claims that people here are MS viral marketers are getting incredibly tiring. Disagreeing with the narrative that the PS4 is going to be the panacea of gaming is apparently enough to qualify for Microsoft's payroll. I'm waiting for my check, by the way.

http://www.neogaf.com/forum/showthread.php?t=484698
Sup.

If you think for a second that there isn't something similar for viral marketing (people writing for pennies) then idk what to say :p

I think the guy mocked above was just trolling to get a rise out of people though, not a marketer.
 
http://www.neogaf.com/forum/showthread.php?t=484698
Sup.

If you think for a second that there isn't something similar for viral marketing (people writing for pennies) then idk what to say :p

I think the guy mocked above was just trolling to get a rise out of people though, not a marketer.

+1.



George Monbiot, writing in the Guardian, said:
Every month more evidence piles up, suggesting that online comment threads and forums are being hijacked by people who aren’t what they seem to be. The anonymity of the web gives companies and governments golden opportunities to run astroturf operations: fake grassroots campaigns, which create the impression that large numbers of people are demanding or opposing particular policies. This deception is most likely to occur where the interests of companies or governments come into conflict with the interests of the public. For example, there’s a long history of tobacco companies creating astroturf groups to fight attempts to regulate them.

After I last wrote about online astroturfing, in December, I was contacted by a whistleblower. He was part of a commercial team employed to infest internet forums and comment threads on behalf of corporate clients, promoting their causes and arguing with anyone who opposed them. Like the other members of the team, he posed as a disinterested member of the public. Or, to be more accurate, as a crowd of disinterested members of the public: he used 70 personas, both to avoid detection and to create the impression that there was widespread support for his pro-corporate arguments. I’ll reveal more about what he told me when I’ve finished the investigation I’m working on.

But it now seems that these operations are more widespread, more sophisticated and more automated than most of us had guessed. Emails obtained by political hackers from a US cyber-security firm called HB Gary Federal suggest that a remarkable technological armoury is being deployed to drown out the voices of real people.

As the Daily Kos has reported, the emails show that:

- companies now use “persona management software”, which multiplies the efforts of the astroturfers working for them, creating the impression that there’s major support for what a corporation or government is trying to do.

- this software creates all the online furniture a real person would possess: a name, email accounts, web pages and social media. In other words, it automatically generates what look like authentic profiles, making it hard to tell the difference between a virtual robot and a real commentator.

- fake accounts can be kept updated by automatically re-posting or linking to content generated elsewhere, reinforcing the impression that the account holders are real and active.

- human astroturfers can then be assigned these “pre-aged” accounts to create a back story, suggesting that they’ve been busy linking and re-tweeting for months. No one would suspect that they came onto the scene for the first time a moment ago, for the sole purpose of attacking an article on climate science or arguing against new controls on salt in junk food.

- with some clever use of social media, astroturfers can, in the security firm’s words, “make it appear as if a persona was actually at a conference and introduce himself/herself to key individuals as part of the exercise … There are a variety of social media tricks we can use to add a level of realness to all fictitious personas”

But perhaps the most disturbing revelation is this. The US Air Force has been tendering for companies to supply it with persona management software, which will perform the following tasks:

a. Create “10 personas per user, replete with background, history, supporting details, and cyber presences that are technically, culturally and geographically consistent. … Personas must be able to appear to originate in nearly any part of the world and can interact through conventional online services and social media platforms.”

b. Automatically provide its astroturfers with “randomly selected IP addresses through which they can access the internet.” [An IP address is the number which identifies someone's computer]. These are to be changed every day, “hiding the existence of the operation.” The software should also mix up the astroturfers’ web traffic with “traffic from multitudes of users from outside the organization. This traffic blending provides excellent cover and powerful deniability.”

c. Create “static IP addresses” for each persona, enabling different astroturfers “to look like the same person over time.” It should also allow “organizations that frequent same site/service often to easily switch IP addresses to look like ordinary users as opposed to one organization.”

Software like this has the potential to destroy the internet as a forum for constructive debate. It makes a mockery of online democracy. Comment threads on issues with major commercial implications are already being wrecked by what look like armies of organised trolls – as you can often see on the Guardian’s sites. The internet is a wonderful gift, but it’s also a bonanza for corporate lobbyists, viral marketers and government spin doctors, who can operate in cyberspace without regulation, accountability or fear of detection. So let me repeat the question I’ve put in previous articles, and which has yet to be satisfactorily answered: what should we do to fight these tactics?

That's why, when I look at comment sections or threads for things like Windows 8, I roll my eyes.
 
The DMEs appear to simply expand on the functionality of the normal DMAs to cope with the issues created by Durango's segmented memory topology. In fact, I would speculate that the 2 "DMEs" that lack the compression hardware are literally stock GCN DMAs. I don't think it's a coincidence that their max speed is exactly the same as a PCI-Express 3.0 slot, which is what the stock GCN DMAs are optimized for.

Orbis doesn't need DMEs like Durango, but it will in all likelihood have the standard 2 GCN DMAs for managing the flow of data between main memory and the GPU's L2 cache. Since Orbis does not face the same segmented memory challenges it has no need for the same solution.

This is something I'd like to know more about. GCN has two DMA engines, but I can't find any information about the speeds they run at, or whether they can execute memory reads/writes without the shaders having to wait. If they can, then effectively Durango only has two extra 'special' DMA engines customised for tiling and compression. As they all share the same bandwidth, there wouldn't be a huge benefit of four vs. two, and at least for textures, Orbis would still be transferring them compressed anyway.
 
The DMEs appear to simply expand on the functionality of the normal DMAs to cope with the issues created by Durango's segmented memory topology. In fact, I would speculate that the 2 "DMEs" that lack the compression hardware are literally stock GCN DMAs. I don't think it's a coincidence that their max speed is exactly the same as a PCI-Express 3.0 slot, which is what the stock GCN DMAs are optimized for.

Orbis doesn't need DMEs like Durango, but it will in all likelihood have the standard 2 GCN DMAs for managing the flow of data between main memory and the GPU's L2 cache. Since Orbis does not face the same segmented memory challenges it has no need for the same solution.

Sums up my thoughts on the DMEs. I think the reason most people are assuming this is some newfangled magical pixie dust is that the average person doesn't know what DMA is, so when they see next-gen leaks about DMEs and walls of text, they automatically assume it's some amazingly new and exclusive hardware that has debuted on consoles.
 
This is something I'd like to know more about. GCN has two DMA engines, but I can't find any information about the speeds they run at, or whether they can execute memory reads/writes without the shaders having to wait. If they can, then effectively Durango only has two extra 'special' DMA engines customised for tiling and compression. As they all share the same bandwidth, there wouldn't be a huge benefit of four vs. two, and at least for textures, Orbis would still be transferring them compressed anyway.

I've never seen the speed of the GCN DMAs quoted directly, but in promotional materials AMD says they are "optimized for PCI-Express 3.0". Independently executing memory reads and writes is the whole purpose of a DMA, though.
 
This is something I'd like to know more about. GCN has two DMA engines, but I can't find any information about the speeds they run at, or whether they can execute memory reads/writes without the shaders having to wait. If they can, then effectively Durango only has two extra 'special' DMA engines customised for tiling and compression. As they all share the same bandwidth, there wouldn't be a huge benefit of four vs. two, and at least for textures, Orbis would still be transferring them compressed anyway.

Someone should let MS engineers know that they might as well not bother with these two extra DMAs, because there is not much benefit to having them.
 
So as the x720 architecture gets demystified, its entire makeup could indeed be the secret sauce, just as some have been hinting.

A clever, super-efficient box that potentially matches, and perhaps, in the right hands, even exceeds the brute-force styling of the competition. This, combined with more RAM, could see the new Xbox crowned the next-gen technical darling.

And that's not even throwing the Swiss Army knife of peripheral social and media features into the mix.


Or am I assuming too much?
Man, this post. You're assuming a lot, and I really don't see the basis for it.
 
Someone should let MS engineers know that they might as well not bother with these two extra DMAs, because there is not much benefit to having them.

They should be really helpful, because the high-bandwidth pool of embedded memory is so small. It will be necessary to move data around far more often than on a conventional GPU with 1-3GB of VRAM.
 
Superdae is again selling a devkit on eBay:
http://www.ebay.com.au/itm/ws/eBayISAPI.dll?ViewItem&item=221187435627
Condition:
Used: An item that has been used previously. The item may have some signs of cosmetic wear, but is fully ...
Processor Type: 8-core 64-bit CPU
Processor Speed: 1.6 GHz
Hard Drive Capacity: 500 GB
Primary Drive: BR 50G
Memory: 8GB

He promoted the link on his twitter so it's really him.

How come it is located in Australia? I thought DaE was in Raleigh, NC?
 
Sums up my thoughts on the DMEs. I think the reason most people are assuming this is some newfangled magical pixie dust is that the average person doesn't know what DMA is, so when they see next-gen leaks about DMEs and walls of text, they automatically assume it's some amazingly new and exclusive hardware that has debuted on consoles.
This seems to have happened to quite a few standard features of modern graphics hardware.
 
Sums up my thoughts on the DMEs. I think the reason most people are assuming this is some newfangled magical pixie dust is that the average person doesn't know what DMA is, so when they see next-gen leaks about DMEs and walls of text, they automatically assume it's some amazingly new and exclusive hardware that has debuted on consoles.

First, nobody is thinking it's magic pixie dust. Second, there are people here and at B3D (including developers) who know what DMA is and how it's different to what is rumored. Now, if you're smarter than all of them, please show them the errors in their ways...
 
How come it is located in Australia? I thought DaE was in Raleigh, NC?

He either moved to Australia or he's pretending to. One follower asked him on Twitter how much money it would take for him to pull the auction, and he said that for $3K he would deliver it in person within the States.

And it seems it's just the alpha kit with the latest XDK.
 