
AMD Ryzen Thread: Affordable Core Act

Paragon

Member
Looks like PC Perspective have posted a video discussing the Windows 10 scheduler article now: https://www.youtube.com/watch?v=6laL-_hiAK0
The short version is that they think their testing is valid, and that there is not an issue with the scheduler.

Hardware Unboxed have also posted a response to AdoredTV's “Ryzen - The Tech Press Loses The Plot” video, comparing the 2500K vs FX8370: https://www.youtube.com/watch?v=76-8-4qcpPo
I'm kind of shocked to see the 2500K at stock speeds beating the 1700X in Deus Ex: Mankind Divided, as that game scales well beyond 4 cores, and it's one of the games that my 2500K (at 4.5GHz) really struggles with.
 
For people who actually do real work on their monster PCs:
https://www.pugetsystems.com/labs/a...2017-AMD-Ryzen-7-1700X-1800X-Performance-909/

Ryzen seems super decent for Premiere Pro. I run my 5820K with a 4.3GHz OC, but if I were staying at stock (most pros don't OC their mission-critical workstations), I would definitely have Ryzen under consideration. The thing is that most pros don't necessarily care that much about overall cost, since their time is the most important thing, so Intel still wins on raw time saved, and AMD will have work to do there before a lot of the real pros would move to Ryzen over HEDT. I'm not a pro and do videos for fun, but I already bought into my HEDT more than a year ago now, and Ryzen is mostly a downgrade for me, so I'm staying put.
 
Looks like PC Perspective have posted a video discussing the Windows 10 scheduler article now: https://www.youtube.com/watch?v=6laL-_hiAK0
The short version is that they think their testing is valid, and that there is not an issue with the scheduler.
Between reviewers and owners, how many sources have you posted that either support or refute PC Perspective's data on this?



Hardware Unboxed have also posted a response to AdoredTV's “Ryzen - The Tech Press Loses The Plot” video, comparing the 2500K vs FX8370: https://www.youtube.com/watch?v=76-8-4qcpPo
I'm kind of shocked to see the 2500K at stock speeds beating the 1700X in Deus Ex: Mankind Divided, as that game scales well beyond 4 cores, and it's one of the games that my 2500K (at 4.5GHz) really struggles with.
You may be equally surprised to see an i3 7350K (2c/4t) outperforming an i7 2700K (4c/8t), or an i5 6600K (4c/4t) outperforming an i7 5960X (8c/16t).

Multiple reviews have frametime and FPS numbers which don't align with what you mentioned about the 2500K. Extrapolate where needed.

https://techreport.com/review/31366/amd-ryzen-7-1800x-ryzen-7-1700x-and-ryzen-7-1700-cpus-reviewed/9

https://www.computerbase.de/2017-03/amd-ryzen-1800x-1700x-1700-test/4/

https://www.guru3d.com/articles-pages/amd-ryzen-7-1700x-review,20.html

http://www.techspot.com/review/1348-amd-ryzen-gaming-performance/page2.html
 

x3sphere

Member
For people who actually do real work on their monster PCs:
https://www.pugetsystems.com/labs/a...2017-AMD-Ryzen-7-1700X-1800X-Performance-909/

Ryzen seems super decent for Premiere Pro. I run my 5820K with a 4.3GHz OC, but if I were staying at stock (most pros don't OC their mission-critical workstations), I would definitely have Ryzen under consideration. The thing is that most pros don't necessarily care that much about overall cost, since their time is the most important thing, so Intel still wins on raw time saved, and AMD will have work to do there before a lot of the real pros would move to Ryzen over HEDT. I'm not a pro and do videos for fun, but I already bought into my HEDT more than a year ago now, and Ryzen is mostly a downgrade for me, so I'm staying put.

I have a 5930K and would probably go Ryzen if I were buying today. Most of the non-gaming benches look impressive, and I game at 4K, so it'll be a while before I'm CPU limited.

Having said that, I was on X58 previously and that lasted me 7 years. So my next upgrade is likely far off. I doubt I'll be looking at a new CPU until 2020 at the earliest. Here's hoping AMD continues to remain competitive.
 
Anyone have any luck with Amazon and their Crosshair VI delivery updates? Mine has been sitting at "we need a little more time..." since release date :/

Anyone who has gotten their board care to give a quick impression? I've been hearing some pretty good things about Gigabyte and MSI, and with so much time to think about it, I've been debating switching vendors!
 

Paragon

Member
·feist·;231957127 said:
You may be equally surprised to see an i3 7350K (2c/4t) outperforming an i7 2700K (4c/8t), or an i5 6600K (4c/4t) outperforming an i7 5960X (8c/16t).
I think it may just be the way that you phrased this, but I'm not sure what point you're trying to make.

·feist·;231957127 said:
Multiple reviews have frametime and FPS numbers which don't align with what you mentioned about the 2500K. Extrapolate where needed.
Obviously that's why I am surprised. It does not perform well on my 2500K at all.
I'm wondering if they just ran the game's benchmark tool rather than picking a section of gameplay to test, as the game's benchmark tool does not produce results which are applicable to gameplay in my experience.
 

Thraktor

Member
Not looking good for the "7700k killer"

Framing the quad-core Ryzens as "7700K killer" is just silly; they're going to cost around a third of the price. They'll compete with i3 and i5 models, and stand to do reasonably well in that segment. The models they're talking about there are also the entry-level parts (similar to the R7 1700's position in the 8C lineup), so I'd expect stock clocks to hit 3.9/4.0GHz on the more expensive models, just like their 6-core and 8-core chips.
 

Paragon

Member
Why would a 4-core part clock lower than an 8-core part?
Lower-binned parts.
As I've been saying from the start, this ~4.1GHz limit on Ryzen (using safe voltages) is likely a process/architectural limitation instead of a thermal one - even though they do run hot.
 

Mahnmut

Member
So, is it good or not?
It seems like there is no definitive answer to that question...
Maybe somebody can help? I know it's not that good for gaming, but will they improve that?

Is it better to go Intel? Or wait a little bit to see how AMD will improve it?
 

Datschge

Member
As I've been saying from the start, this ~4.1GHz limit on Ryzen (using safe voltages) is likely a process/architectural limitation instead of a thermal one - even though they do run hot.
Not likely: definitely. This is the voltage-to-frequency curve of Samsung's 14nm LPP, which Ryzen uses:
[Image: voltage-to-frequency curve for Samsung's 14nm LPP]
The two critical points are at 3.3 and 3.5GHz; everything above loses power efficiency fast, and above 3.9GHz the loss accelerates.

So the sweet spot for high frequencies with the best possible performance/consumption ratio is 3.3GHz. And as the process is optimized for low power, we'll likely see very efficient usage of the chip for mobile and server parts at around 2.1GHz.

The Stilt tested a stock 1800X limited by a TDP of 30W:
https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/page-6#post-38774390
The Stilt said:
Regarding the ST: 1800X at default, with turbo & XFR enabled scores 162 in Cinebench 15. With the TDP (PPT) limited to 30W the score is 155.
So, ignoring the iGPU, a 4c8t 15W+ mobile chip using turbo & XFR for temporary desktop-grade frequencies on single cores seems feasible. I expect Ryzen to be more successful in this market, as it plays more to the strengths of Samsung's 14nm LPP than the current lineup does.
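Spelling out the quoted numbers (nothing new here, just the ratio):

```latex
% Single-thread Cinebench R15: stock 1800X vs. the same chip PPT-limited to 30W.
\[
\frac{\text{score}_{30\,\mathrm{W}}}{\text{score}_{\mathrm{stock}}} = \frac{155}{162} \approx 0.96
\]
% ~96% of stock single-thread performance retained at a 30W package power
% limit - which is why a 15W-class mobile part with single-core turbo/XFR
% looks plausible.
```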

So, is it good or not?
It seems like there is no definitive answer to that question...
Maybe somebody can help? I know it's not that good for gaming, but will they improve that?

Is it better to go Intel? Or wait a little bit to see how AMD will improve it?
For gaming the 7700K is still the go-to CPU; for productivity nothing beats Ryzen's value. What the Ryzen launch showed the most, though, is what a mess the handling of CPU topologies is under Windows 10, as well as Microsoft's non-communication regarding patches.
 
So, is it good or not?
It seems like there is no definitive answer to that question...
Maybe somebody can help? I know it's not that good for gaming, but will they improve that?

Is it better to go Intel? Or wait a little bit to see how AMD will improve it?

It's a great chip that sees a huge IPC increase compared to Piledriver and puts AMD back into competition with Intel.

It does, however, excel at different workloads to Intel's chips, and if raw gaming performance for the dollar is what you're after, then stick with Intel.

If your workload is a mix of content creation and gaming and you don't mind overclocking then the Ryzen 7 1700 offers a lot for the money. I'd personally recommend waiting a little while until we have more extensive B350 motherboard reviews.

There is never a one-size-fits-all best CPU; it's always dependent on use case and budget.
 

dr_rus

Member
So if it cannot be "fixed" (and "fixed" is a strange term, since nothing is broken), why are we seeing a regression between Win7 and Win10 when we compare the Total War benchmarks?

And can you explain what performance will not be "fixed"? Are we seeing bad performance across the board in all benchmarks and tests? Because what I'm seeing is specific cases where performance is not in line with the rest of the tests. Or are we only relying on the bad results to say it's a bad CPU, ignoring the rest of the CPU's performance?

Win7 and Win10 schedulers are different, and the difference may be due to several factors which can't actually be "fixed" in Win10, like the inability to use SMT in Win7, for example. The fundamental issue of NUMA (2xCCX connected via some fabric) won't be "fixed" because it cannot be "fixed": it's a feature of the CPU in question. Software may be programmed in a way which minimizes whatever losses this feature causes, but honestly I don't think that many devs will bother - unless AMD pushes a Ryzen APU with the same 8-core layout into some console. Generally speaking this is a design flaw, and fixing it through complex software optimization is completely counterproductive for s/w devs, especially as it most certainly won't be required in the future, on some Zen 2, etc.

What you are seeing is the difference in performance between massively parallel workloads which launch 1000s of threads with little to no thread synchronization (like video encoding or rendering) and limited parallel workloads which launch several heavy threads with constant synchronization between them (like modern games, mostly). The second load type will hit the CCX cache-snooping weakness of the 8-core Ryzen and will produce worse results than the first.
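For illustration, here's a minimal sketch of the kind of per-application workaround meant above - pinning the tightly-synchronizing worker threads to one CCX so their shared data never crosses the fabric. The 0xFF mask assumes Windows enumerates CCX0 as logical processors 0-7 (4 cores with SMT); that mapping is an assumption, not something this code verifies:

```cpp
// Minimal sketch (not an actual AMD/Microsoft fix): keep a game's heavy,
// frequently-synchronizing worker threads on one CCX so they share an L3
// and never pay the cross-CCX snoop penalty.
// Assumption: Windows exposes CCX0 as logical processors 0-7 (4 cores + SMT).
#include <windows.h>
#include <thread>
#include <vector>

int main() {
    const DWORD_PTR kCcx0Mask = 0xFF; // logical CPUs 0-7 under the assumed mapping

    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i) {
        workers.emplace_back([kCcx0Mask] {
            // Restrict this thread to CCX0; the scheduler may still move it
            // between those 8 logical CPUs, but never across the fabric.
            SetThreadAffinityMask(GetCurrentThread(), kCcx0Mask);
            // ... tightly-coupled game work would run here ...
        });
    }
    for (auto& t : workers) t.join();
    return 0;
}
```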
 

Mahnmut

Member
It's a great chip that sees a huge IPC increase compared to Piledriver and puts AMD back into competition with Intel.

It does, however, excel at different workloads to Intel's chips, and if raw gaming performance for the dollar is what you're after, then stick with Intel.

If your workload is a mix of content creation and gaming and you don't mind overclocking then the Ryzen 7 1700 offers a lot for the money. I'd personally recommend waiting a little while until we have more extensive B350 motherboard reviews.

There is never a one-size-fits-all best CPU; it's always dependent on use case and budget.

My PC is for gaming only; I use a MBP for work, so I guess it's a no-brainer.
 

Datschge

Member
The fundamental issue of NUMA (2xCCX connected via some fabric) won't be "fixed" because it cannot be "fixed": it's a feature of the CPU in question.
That's mincing words. Stuff like that is for the OS to support, and that's where it does need fixing. Software running under any given OS is supposed to be able to rely on a current OS not to make a mess of it.
 
Not likely: definitely. This is the voltage-to-frequency curve of Samsung's 14nm LPP, which Ryzen uses:

The two critical points are at 3.3 and 3.5GHz; everything above loses power efficiency fast, and above 3.9GHz the loss accelerates.

So the sweet spot for high frequencies with the best possible performance/consumption ratio is 3.3GHz. And as the process is optimized for low power, we'll likely see very efficient usage of the chip for mobile and server parts at around 2.1GHz.

The Stilt tested a stock 1800X limited by a TDP of 30W:
https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/page-6#post-38774390

So, ignoring the iGPU, a 4c8t 15W+ mobile chip using turbo & XFR for temporary desktop-grade frequencies on single cores seems feasible. I expect Ryzen to be more successful in this market, as it plays more to the strengths of Samsung's 14nm LPP than the current lineup does.


For gaming the 7700K is still the go-to CPU; for productivity nothing beats Ryzen's value. What the Ryzen launch showed the most, though, is what a mess the handling of CPU topologies is under Windows 10, as well as Microsoft's non-communication regarding patches.

Perfect process for a low-frequency console chip.
 

dr_rus

Member
That's mincing words. Stuff like that is for the OS to support, and that's where it does need fixing. Software running under any given OS is supposed to be able to rely on a current OS not to make a mess of it.

No, it's not "mincing words", and stuff like that (non uniform memory access latencies) is certainly not for OS to support as OS can't decide whether some s/w needs a cross CCX snoop or not and it most certainly can't block it. All OS can realistically do here to avoid this is limit Ryzen's accessible core count to 4 - which is hardly a solution to the problem. Whatever s/w side optimizations can help here they should most certainly happen on the programs side and not the OS side.
 

Datschge

Member
No, it's not "mincing words", and stuff like that (non uniform memory access latencies) is certainly not for OS to support as OS can't decide whether some s/w needs a cross CCX snoop or not and it most certainly can't block it. All OS can realistically do here to avoid this is limit Ryzen's accessible core count to 4 - which is hardly a solution to the problem. Whatever s/w side optimizations can help here they should most certainly happen on the programs side and not the OS side.
Sure, if you (and Microsoft) still cling to the illusion that the perfect OS is a DOS-like OS that does the bare minimum to abstract the underlying hardware. Too bad that Windows 10 is not even doing that, but working with a bunch of inane assumptions that already meddle a lot with the things you say non-OS s/w should optimize for. So on top of optimizing, the s/w will need to include workarounds for crap OS behavior.
 

dr_rus

Member
Sure, if you (and Microsoft) still cling to the illusion that the perfect OS is a DOS-like OS that does the bare minimum to abstract the underlying hardware. Too bad that Windows 10 is not even doing that, but working with a bunch of inane assumptions that already meddle a lot with the things you say non-OS s/w should optimize for. So on top of optimizing, the s/w will need to include workarounds for crap OS behavior.

Again, this has nothing to do with abstracting anything and everything to do with what the underlying h/w in question is capable of. You cannot "fix" h/w flaws with s/w, only work around them. And in this case this isn't something the OS would be able to do.
 

Thraktor

Member
Win7 and Win10 schedulers are different, and the difference may be due to several factors which can't actually be "fixed" in Win10, like the inability to use SMT in Win7, for example. The fundamental issue of NUMA (2xCCX connected via some fabric) won't be "fixed" because it cannot be "fixed": it's a feature of the CPU in question. Software may be programmed in a way which minimizes whatever losses this feature causes, but honestly I don't think that many devs will bother - unless AMD pushes a Ryzen APU with the same 8-core layout into some console. Generally speaking this is a design flaw, and fixing it through complex software optimization is completely counterproductive for s/w devs, especially as it most certainly won't be required in the future, on some Zen 2, etc.

What you are seeing is the difference in performance between massively parallel workloads which launch 1000s of threads with little to no thread synchronization (like video encoding or rendering) and limited parallel workloads which launch several heavy threads with constant synchronization between them (like modern games, mostly). The second load type will hit the CCX cache-snooping weakness of the 8-core Ryzen and will produce worse results than the first.

Regarding the bolded: both the PS4's and XBO's CPUs already operate under pretty much exactly the same paradigm as Ryzen (i.e. two quad-core clusters "connected via some fabric"). Granted, the development scenario for them is quite different than for a Windows PC, though.

In any case, I don't believe that CCX cache snooping is the main issue here (at least for gaming). Windows 10 frequently migrating threads between clusters seems like the most probable explanation for the bimodality we're seeing in frame time histograms, and that's both something which could be fixed within the Windows kernel and a change which would have a large impact on performance.
 

Datschge

Member
Again, this has nothing to do with abstracting anything and everything to do with what the underlying h/w in question is capable of. You cannot "fix" h/w flaws with s/w, only work around them. And in this case this isn't something the OS would be able to do.
WTF? Either you don't know what you are talking about or you are defending the indefensible.

It's not a h/w flaw when the OS is playing blind man and rapidly moving threads around, completely disregarding the CPU topology.
https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/page-26#post-38791096
And parking cores using assumptions that don't reflect CPU usage of individual threads.
https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/page-22#post-38790080
 

ethomaz

Banned
I think there are a lot of assumptions right now.

Tests from some guys are showing weird distribution of threads in Windows 10, mainly with the use of SMT.

What is causing that? Software, hardware, game optimization, etc... nobody really knows... I just read a lot of guesses and possible causes.

I think we will have some answers soon, but right now it is impossible to say what is really causing the issue.
 

Datschge

Member
Tests from some guys are showing weird distribution of threads in Windows 10, mainly with the use of SMT.

What is causing that? Software, hardware, game optimization, etc... nobody really knows... I just read a lot of guesses and possible causes.
Hardware can't distribute threads on its own; that's the job of the OS (or, with band-aids, the software running under it).
 
So, is it good or not?
It seems like there is no definitive answer to that question...
Maybe somebody can help? I know it's not that good for gaming, but will they improve that?

Is it better to go Intel? Or wait a little bit to see how AMD will improve it?

Couldn't put it better than Datschge:

For gaming the 7700K is still the go-to CPU; for productivity nothing beats Ryzen's value. What the Ryzen launch showed the most, though, is what a mess the handling of CPU topologies is under Windows 10, as well as Microsoft's non-communication regarding patches.
As someone who's just bought a Ryzen CPU for productivity (rendering, digital art and visual effects + the possibility of live streaming work) + some 1080p gaming with a mid-range card (RX 480), the 1700 made too much sense given the price.
 

dr_rus

Member
Regarding the bolded: both the PS4's and XBO's CPUs already operate under pretty much exactly the same paradigm as Ryzen (i.e. two quad-core clusters "connected via some fabric"). Granted, the development scenario for them is quite different than for a Windows PC, though.
That's true to a degree, but the Jaguars are significantly different here: they don't have an L3, and their L2s are connected via a dedicated bus (as opposed to "some fabric"), so it's hard to say without benchmarking whether they even experience the same issue at all.

In any case, I don't believe that CCX cache snooping is the main issue here (at least for gaming). Windows 10 frequently migrating threads between clusters seems like the most probable explanation for the bimodality we're seeing in frame time histograms, and that's both something which could be fixed within the Windows kernel and a change which would have a large impact on performance.
So, a) thread migration leading to performance issues is the direct result of cross-CCX cache-snooping latency; without this issue there would be no performance loss due to thread migration. And b) if that's the case, AMD can easily push a new CPU driver via Windows Update which will prevent thread migration - like Intel's TBT 3.0 Max does on BWE CPUs, for example. No need to do anything with the Windows scheduler.

WTF? Either you don't know what you are talking about or you are defending the indefensible.

It's not a h/w flaw when the OS is playing blind man and rapidly moving threads around, completely disregarding the CPU topology.
https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/page-26#post-38791096
And parking cores using assumptions that don't reflect CPU usage of individual threads.
https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/page-22#post-38790080
Or I do know what I'm talking about and you just can't tell what is what.
 

Datschge

Member
So, a) thread migration leading to performance issues is the direct result of cross-CCX cache-snooping latency; without this issue there would be no performance loss due to thread migration. And b) if that's the case, AMD can easily push a new CPU driver via Windows Update which will prevent thread migration - like Intel's TBT 3.0 Max does on BWE CPUs, for example. No need to do anything with the Windows scheduler.
Oh, so it's "the Windows scheduler is fine, just replace its broken behavior with a CPU driver"? Alright.
 

Thraktor

Member
That's true to a degree, but the Jaguars are significantly different here: they don't have an L3, and their L2s are connected via a dedicated bus (as opposed to "some fabric"), so it's hard to say without benchmarking whether they even experience the same issue at all.

There's certainly a difference, but I have no doubt that there is some benefit to keeping communicative threads on the same cluster in PS4/XBO, even if it's marginal compared to Ryzen. However, my point was that the existence of a similar CPU config in consoles shouldn't affect developers' behaviour. Setting core affinity for each thread is already a sensible thing to do on a console, but it is very unlikely to become common in PC game development.

So, a) thread migration leading to performance issues is the direct result of cross-CCX cache-snooping latency; without this issue there would be no performance loss due to thread migration. And b) if that's the case, AMD can easily push a new CPU driver via Windows Update which will prevent thread migration - like Intel's TBT 3.0 Max does on BWE CPUs, for example. No need to do anything with the Windows scheduler.

Turbo Boost 3.0 is very different from what we're talking about. It only affects single-threaded applications on a CPU where an individual core is boosted, and just manually sets affinity for that thread to that single core. With Ryzen we're looking at an arbitrary number of threads over an identically-clocked multicore CPU, and the problem isn't that threads shouldn't ever migrate between cores (they clearly should when a given core is overworked), but rather that threads are migrated too often and in an inefficient manner, as described in Datschge's links.

To consider just how often Windows 10 is migrating threads between clusters, have a look at this frame time histogram (source) for Crysis 3 on Ryzen:

[Image: frame time histogram for Crysis 3 on the 1800X]


This shows a bimodal distribution, which indicates to us (as we're expecting a log-normal, or approximately normal distribution) that there are actually two overlapping distributions here, separated by an event either occurring or not occurring in a given frame, where that event delays the frame by a little over 1ms.

If we simplify slightly by assuming each underlying distribution is normal with the same standard deviation (a reasonable assumption given what we know), then we can roughly approximate the total proportion of frames in which the event occurs by comparing the two peaks in the histogram (the left being the mode of the distribution where the event occurs). This would indicate to us that the event occurs in about 60% of frames.

If we consider that the average frame rate is 127 FPS (from the link above), then each frame on average lasts 7.9ms. If the event occurs in 60% of frames with an average frame time of 7.9ms, then the average time between the event occurring is a little over 13ms.
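Restating that arithmetic compactly:

```latex
% Mean frame time at 127 FPS, and the mean spacing of an event that occurs
% in ~60% of frames:
\[
\bar{t}_{\mathrm{frame}} = \frac{1000\ \mathrm{ms}}{127} \approx 7.9\ \mathrm{ms},
\qquad
\bar{t}_{\mathrm{event}} \approx \frac{7.9\ \mathrm{ms}}{0.6} \approx 13.1\ \mathrm{ms}
\]
```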

Now, what happens on a consumer Windows PC once every 13ms or so? Why, Windows thread scheduler's clock interrupt cycle, of course! Absent any interrupts or creation of higher priority threads, the Windows thread scheduler will check every thread once every 10-15ms (it varies a bit depending on hardware) to either context switch to a waiting thread and/or migrate the thread to a different core. A regular performance hiccup every 13ms, as we see in the Crysis 3 test above, is not just consistent with the Windows 10 thread scheduler migrating threads once every so often, it's consistent with the Windows 10 thread scheduler migrating threads between clusters at literally every opportunity. According to what we're seeing in the Crysis 3 data, at every single thread scheduler interrupt Windows is causing a ~1ms performance drop, presumably by moving high priority threads between the two clusters.

This performance loss is far from insignificant. Removing it would push average FPS up by about 10, to above all other CPUs in Tech Report's tests, and would push 99th percentile frame times from 12.5ms to about 11.4ms, which would put it almost neck-and-neck with the 7700K. Time spent beyond 8.3ms would also improve significantly, although it's much more difficult to judge the extent without access to the full data.

Even absent Ryzen's particular core configuration, there's no reason for Windows to migrate high-priority threads every single chance it gets. There's always a cost to thread migration, and that cost will only go up as core counts increase. It seems Microsoft knows this, as Windows Server variants of the thread scheduler leave threads for 6 times as long between checks. They've also reportedly identified reducing thread migration as part of the "Game Mode" feature to be added to Windows 10 (although their recent GDC talk on the issue isn't online yet, so I don't have a link for that).
 

dr_rus

Member
Cool, thanks, so in theory it should produce the same problems with snoops across the modules.

Turbo Boost 3.0 is very different from what we're talking about. It only affects single-threaded applications on a CPU where an individual core is boosted, and just manually sets affinity for that thread to that single core.
No, it doesn't affect only single-threaded applications; it affects all of them, and it does two things: 1) tries to load the fastest core to 100% all the time (I've run into this in WD2 recently, actually, where with TBT3 enabled c1 of my 6850K is always at 100%, and with it disabled the load is about equal between all 12 threads); 2) affinitizes all heavy threads so that they won't jump cores in the process of execution:

[Image: heavy threads affinitized to fixed cores under TBT 3.0]


With Ryzen we're looking at an arbitrary number of threads over an identically-clocked multicore CPU, and the problem isn't that threads shouldn't ever migrate between cores (they clearly should when a given core is overworked), but rather that threads are migrated too often and in an inefficient manner, as described in Datschge's links.
PCPer's tests show that this isn't the case; threads are assigned and migrated as on any other multicore CPU with SMT. The issue arises when a thread is migrated across CCXs, but that isn't an issue of the OS scheduler - thread migration is pretty normal - it's an issue of the h/w architecture, and there are two options for dealing with it: A) don't let threads go to the second CCX, essentially turning the Ryzen 7 into a quad core (some unrelated OS work can run on the second CCX in parallel, I guess); B) program the s/w in such a way that it won't incur a (large) performance hit in case of such migration.

Even absent Ryzen's particular core configuration, there's no reason for Windows to migrate high-priority threads every single chance it gets.
Again, this is a h/w issue, and as such it should be "fixed" by the IHV's s/w - in this case, by AMD's CPU driver. As I've shown above, it's completely possible to affinitize all work and thus stop the Windows scheduler from performing such migrations. If they are the cause of the performance loss, then keeping threads running where they started should provide some benefits at least.
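For what it's worth, the topology information such a driver (or any affinitizing tool) would need is already exposed by Windows. A rough sketch using the documented GetLogicalProcessorInformationEx call - reading each L3 record as "one CCX" on Ryzen is my interpretation, not something AMD or Microsoft have stated:

```cpp
// Sketch: enumerate which logical processors share each L3 cache.
// GetLogicalProcessorInformationEx is the documented Windows API for this;
// interpreting each L3 record as one Ryzen CCX is an assumption.
#include <windows.h>
#include <cstdio>
#include <vector>

int main() {
    DWORD len = 0;
    // First call fails with ERROR_INSUFFICIENT_BUFFER and reports the size needed.
    GetLogicalProcessorInformationEx(RelationCache, nullptr, &len);
    std::vector<char> buf(len);

    auto* first = reinterpret_cast<SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX*>(buf.data());
    if (!GetLogicalProcessorInformationEx(RelationCache, first, &len)) return 1;

    for (DWORD off = 0; off < len;) {
        auto* rec = reinterpret_cast<SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX*>(buf.data() + off);
        if (rec->Relationship == RelationCache && rec->Cache.Level == 3) {
            // On an 8-core Ryzen this should print two masks, one per CCX
            // (e.g. 0xFF and 0xFF00) - exactly the split a scheduler or
            // driver would need to respect.
            printf("L3 shared by logical CPU mask 0x%llx\n",
                   static_cast<unsigned long long>(rec->Cache.GroupMask.Mask));
        }
        off += rec->Size;
    }
    return 0;
}
```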
 

Datschge

Member
Right now I have an i7 3770.

I'd like to see how the 1700, 1700X and 1800X compare to that.
Single-core performance per Hz is very similar between the i7 3770 and the 7700K, so essentially the same rule as with the 7700K applies: keep it for gaming; Ryzen is good value for productivity. Or did you not forget the K, and you want to start overclocking now? Then the choice is between fewer cores but ready-to-go peak performance in games (7700K), and more cores with great value right away in embarrassingly parallel workloads (R7 1700). Or you wait for more optimization and for the Ryzen models with fewer cores (the upcoming R5 and R3 models), which may or may not go more head-to-head with the 7700K - though air-cooled overclocking potential likely won't ever go above 4.1GHz with any Ryzen model.

Again, this is a h/w issue, and as such it should be "fixed" by the IHV's s/w - in this case, by AMD's CPU driver. As I've shown above, it's completely possible to affinitize all work and thus stop the Windows scheduler from performing such migrations. If they are the cause of the performance loss, then keeping threads running where they started should provide some benefits at least.
You can call it a hardware "issue" as much as you want; the fact that stock Windows 10 isn't able to handle this in a sane way in this day and age is plain embarrassing. Relying on 3rd-party software to resolve it is very typical for the Windows ecosystem, and it doesn't speak well of the interoperability of the different OS parts, or of Microsoft's foresight in supporting different hardware topologies.
 

ethomaz

Banned
That is possibly what is happening with Ryzen... syncing L3 between the different modules (of 4 cores) costs "190 cycles", while direct access to a module's own L3 costs "20 cycles".

PS. Take these cycle counts as examples, not actual numbers.
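The gap itself is easy to probe, even if the exact cycle counts are illustrative: bounce a token between two pinned threads and time the round trip, once with both threads on the same CCX and once across CCXs. A rough sketch (the masks assume the launch-era enumeration of logical CPUs 0-7 = CCX0, 8-15 = CCX1):

```cpp
// Rough core-to-core latency probe: two threads pinned to specific logical
// CPUs hand a token back and forth; the average round trip reflects how far
// apart the cores are (same CCX vs. across the fabric).
#include <windows.h>
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

std::atomic<int> token{0};

void player(int me, int next, DWORD_PTR mask, int iters) {
    SetThreadAffinityMask(GetCurrentThread(), mask);
    for (int i = 0; i < iters; ++i) {
        while (token.load(std::memory_order_acquire) != me) { /* spin */ }
        token.store(next, std::memory_order_release);
    }
}

int main() {
    const int iters = 1000000;
    // Assumed mapping: 0x001 = a core in CCX0, 0x100 = a core in CCX1.
    // Swap the second mask for e.g. 0x004 (another CCX0 core) to compare.
    auto t0 = std::chrono::steady_clock::now();
    std::thread a(player, 0, 1, DWORD_PTR(0x001), iters);
    std::thread b(player, 1, 0, DWORD_PTR(0x100), iters);
    a.join();
    b.join();
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                  std::chrono::steady_clock::now() - t0).count();
    printf("avg round trip: %.1f ns\n", static_cast<double>(ns) / iters);
}
```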
 

Datschge

Member
According to AMD there is nothing wrong with the Win10 thread scheduler.
If I were AMD, dependent on support from Microsoft for getting any notable consumer PC market share, I wouldn't want to publicly call them out on their crap either.

Edit: Best summary I read tonight...
https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/page-26#post-38792168
innociv said:
A game that runs only 4-8 threads, or even more, should not necessarily need Ryzen optimizations if the scheduler would just run all those threads on one CCX and run other applications' threads on the other.
The expectation that every developer needs to patch their games to manage threads, keeping them all on one CCX if the threads are interdependent, is a crazy one. What Windows should be doing by default is keeping an application's threads on the same CCX unless all 8 threads are overloaded, or unless the application specifically requests that thread(s) be managed on another CCX. Far more games will work like the former than the latter!
Alas, with AMD's statement above I fully expect Microsoft to do zilch. Let's hope dr_rus is right and a CPU driver can be written to replace all the inane behavior.
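In the meantime the band-aid is available from outside a game too - the same thing people do via Task Manager's "Set affinity": confine the whole process to one CCX. A minimal sketch using the documented SetProcessAffinityMask call, where the 0xFF mask again assumes logical CPUs 0-7 = CCX0:

```cpp
// Band-aid from outside the game: restrict an already-running process to CCX0.
// SetProcessAffinityMask is a documented Windows API; the 0xFF mask assumes
// logical CPUs 0-7 = CCX0 (the launch-era enumeration).
#include <windows.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv) {
    if (argc < 2) { printf("usage: %s <pid>\n", argv[0]); return 1; }
    DWORD pid = static_cast<DWORD>(atoi(argv[1]));

    HANDLE proc = OpenProcess(PROCESS_SET_INFORMATION | PROCESS_QUERY_INFORMATION,
                              FALSE, pid);
    if (!proc) { printf("OpenProcess failed: %lu\n", GetLastError()); return 1; }

    // All threads of the process become schedulable only on CCX0.
    if (!SetProcessAffinityMask(proc, 0xFF))
        printf("SetProcessAffinityMask failed: %lu\n", GetLastError());

    CloseHandle(proc);
    return 0;
}
```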
 

dr_rus

Member
AMD Running out of Intel Shekels, Renews Contract to Defame Own Products

"We have investigated reports alleging incorrect thread scheduling on the AMD Ryzen™ processor. Based on our findings, AMD believes that the Windows® 10 thread scheduler is operating properly for ”Zen," and we do not presently believe there is an issue with the scheduler adversely utilizing the logical and physical configurations of the architecture."

Shocking news.

You can call it a hardware "issue" as much as you want; the fact that stock Windows 10 isn't able to handle this in a sane way in this day and age is plain embarrassing. Relying on 3rd-party software to resolve it is very typical for the Windows ecosystem, and it doesn't speak well of the interoperability of the different OS parts, or of Microsoft's foresight in supporting different hardware topologies.

I call it a h/w issue because it is one - one which Intel "fixed" back when it switched from Penryn to Nehalem, and one which AMD will most certainly fix in future versions of the Zen architecture. There is no other way around it; it will always affect Zen's performance.

Actually, it should be pretty easy to test the impact of this issue by disabling one CCX completely and comparing that to a CPU with a 2+2 configuration. So far I've seen only one benchmark of this (in PCGH's Ryzen review), and it has some interesting results.
 

dr_rus

Member
This website recently updated their results with more tests; surprised it wasn't posted here already: http://www.hardware.fr/articles/956-24/retour-sous-systeme-memoire-suite.html

Yep, this is what I'm talking about. Granted, it's unlikely to produce exact results for a 4+4 configuration (threads snooping into the other CCX's L3 can happen more often there, so the performance hit will probably be even higher), but it gives some insight into the performance hit Ryzen takes because of the NUMA L3 cache configuration.

So we're probably looking at a ~10% loss on average in gaming. 7-zip being faster isn't surprising either, as a 2+2 config has twice the amount of L3 cache per core, which will undoubtedly help in applications like archiving/compressing with a dictionary.
 

Datschge

Member
I call it a h/w issue because it is one - one which Intel "fixed" back when it switched from Penryn to Nehalem, and one which AMD will most certainly fix in future versions of the Zen architecture. There is no other way around it; it will always affect Zen's performance.
In a dumb OS, right. Somebody was able to reproduce the inane behavior of Windows moving threads around in the most stupid ways, with correspondingly trashed overall performance, on a four-socket Opteron system. I guess that's also a hardware issue.
 
If I were AMD, dependent on support from Microsoft for getting any notable consumer PC market share, I wouldn't want to publicly call them out on their crap either.

Edit: Best summary I read tonight...
https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/page-26#post-38792168

Alas, with AMD's statement above I fully expect Microsoft to do zilch. Let's hope dr_rus is right and a CPU driver can be written to replace all the inane behavior.
But running 8 threads with SMT on a single CCX could lead to performance degradation and resource contention?

There is a chance that forcing the 8 threads onto a single CCX is best, but there is also the real possibility that splitting the threads across CCXs and gaining the extra compute power will improve performance. The scheduler cannot know this ahead of time. Seems to me the choice is either SMT and resource contention, or no SMT and CCX latency.
 

dr_rus

Member
In a dumb OS, right. Somebody was able to reproduce the inane behavior of Windows moving threads around in the most stupid ways, with correspondingly trashed overall performance, on a four-socket Opteron system. I guess that's also a hardware issue.

In any OS. This is a result of the simplified design where two quad-core modules have dedicated L3 caches and, in the worst-case scenario, converge on the same data in system RAM. No OS will be able to fix this; specifically optimized s/w may somewhat work around it, and the CPU manufacturer can produce a driver which limits the OS's thread migration - but this issue will never go away completely, because it's an issue of the h/w design.

AMD's choice of such an approach for the 6/8-core Zen is understandable - they'd otherwise need a completely different chip for the 6- and 8-core CPUs, something they most likely don't have the financials to produce right now. But the drawbacks associated with it were pretty clear back in August last year when AMD unveiled Zen's cache architecture.

The good thing here is that they'll most likely opt for a proper 8-core design for Zen 2 (I still wonder what Naples will be like, but it'll probably be just 8 CCXs in a chip; it should be rather interesting to benchmark in that case) and just upgrade the current 4-core Zen design with Zen 2 improvements, which should be easier and thus doable for them in parallel with a new 8-core Zen 2 CPU.

The bad thing, however, is the timing of Zen 2, which some sources put at 2019.
 

Datschge

Member
In any OS.
Wrong. Linux is fully aware of Ryzen's topology (mostly since late last year already) and spreads threads accordingly for the best possible performance. Which is exactly why the situation on Windows is so completely ridiculous.

But running 8 threads with SMT on a single CCX could lead to performance degradation and resource contention?
Indeed, and that's the point at which a scheduler has to act - not furiously spreading threads around even when resources are not contended at all, causing a performance degradation through that, like the Windows scheduler traditionally likes to do.
 

Engell

Member
I just don't get why they would design the Ryzen CPU like this (I'm not a CPU designer).
When looking at the die shot of Ryzen, it looks like they could have easily mated all cores and cache together; nothing is linking to the sides of the cores, so blocking them together seems like an easier solution than splitting them... I just don't get it.
Maybe Intel has some patent on connecting more than 4 cores in one block, or maybe it is easier to get a working chip this way, since it allows one side of the chip to be defective (and sold as a 4-core).

If anybody knows the reason why they split it, please fill in the gaps here.

For reference, here is an Intel CPU.
 

tuxfool

Banned
I just don't get why they would design the Ryzen CPU like this (I'm not a CPU designer).
When looking at the die shot of Ryzen, it looks like they could have easily mated all cores and cache together; nothing is linking to the sides of the cores, so blocking them together seems like an easier solution than splitting them... I just don't get it.
Maybe Intel has some patent on connecting more than 4 cores in one block, or maybe it is easier to get a working chip this way, since it allows one side of the chip to be defective (and sold as a 4-core).

If anybody knows the reason why they split it, please fill in the gaps here.

It is easier to design, and to bin chips. You have one CCX of 4 cores; for 8 cores you double it, for 16 cores you quadruple it, etc.

And then for the "uncore" stuff (I don't know the equivalent AMD nomenclature), all you have to do is connect each complex with the fabric. Intel, by contrast, pretty much has to make a new design for each class of CPU; that yields its own benefits, but they have the resources to expend the extra time.

In the end, for most tasks other than gaming, the advantages of Intel's strategy see diminishing returns.
 

ethomaz

Banned
I just don't get why they would design the Ryzen CPU like this (I'm not a CPU designer).
When looking at the die shot of Ryzen, it looks like they could have easily mated all cores and cache together; nothing is linking to the sides of the cores, so blocking them together seems like an easier solution than splitting them... I just don't get it.
Maybe Intel has some patent on connecting more than 4 cores in one block, or maybe it is easier to get a working chip this way, since it allows one side of the chip to be defective (and sold as a 4-core).

If anybody knows the reason why they split it, please fill in the gaps here.


For reference, here is an Intel CPU.
Because it is easier and cheaper to make 2 blocks of 4 cores than one block of 8 cores... and it helps if you want to have a lineup of 4-core, 8-core, 12-core, 16-core... all of them will use the same design.

If you choose to have one block for all cores, you have to make a unique design for each type of CPU (4-core, 8-core, 12-core, 16-core).

Intel did the same in the past with 2 blocks of 2 cores, before making something more complex with 1 block containing all cores.
 

Datschge

Member
In the end, for most tasks other than gaming, the advantages of Intel's strategy see diminishing returns.
Indeed. With the 4-core CCX, AMD found something of a one-size-fits-all. A single CCX will fare well in laptops thanks to 14nm LPP, and for servers the use of HBM will allow an unprecedented number of cores. Only the corner case of two CCXs under gaming usage on an unprepared OS makes it look worse than it is.
 

Engell

Member
Because it is easier and cheaper to make 2 blocks of 4 cores than one block of 8 cores... and it helps if you want to have a lineup of 4-core, 8-core, 12-core, 16-core... all of them will use the same design...

Looks like Intel is doing the same with their Xeon family, judging by their die shots... would be fun to see if they have the same kind of latency when jumping to another stack of cores.
 