
AMD Ryzen Thread: Affordable Core Act

thelastword

Banned
More benchmarks from Digital Foundry comparing the 1800X/7700K/6900K: https://www.youtube.com/watch?v=HZPr-gNWdvI

Some interesting results there, with the 7700K having a clear lead in some games and being tied or behind in others. It really seems like 4 cores are being pushed to their limit now.
Crysis 3 is interesting in that it seems to scale well with core count in some situations, but is still reliant on single-core performance too.

The SMT On/Off and Windows 7/10 comparisons are a bit concerning.
There seemed to be 10-15 FPS to be gained, but which configuration performed best varied from game to game.
At this point there are a myriad of issues that have to be corrected for Ryzen to do better in gaming, but its current performance for such a heavy multicore chip is impressive despite some nagging issues, which will ultimately be fixed in the next month or so:

1.) Windows Scheduling Issues
2.) BIOS and Motherboard Manufacturer Issues
3.) Memory Limitation Issues

For what it's worth, at this point it makes no sense to be disabling SMT in the BIOS to get better performance in select titles, or vice versa. You should be able to get the best performance with SMT on, as it's supposed to be, and that should not be a limitation on your FPS count. So this is definitely something that will be addressed in firmware and Windows 10.

I think when the dust settles, the Ryzen chip will prove superior not only in synthetic tests against the 7700K, but also in gaming. Surely we will see the uplift in performance when those niggles are ironed out, but for a heavy-core chip clocked at 4GHz to be so close to the 7700K at 5GHz, a 1000MHz divide in games that so far mostly cater to higher single-core performance, Ryzen's current performance in this worst-case scenario is eye-opening. Which leads me to believe that AMD does have the upper hand even in single-core performance, and I think most will see that when better BIOSes, RAM speeds and Windows updates roll out. Better overclocks with these updates will prove very interesting as well...

Just a quick observation: Crysis 3 obviously prefers more cores, and correct me if I saw wrongly, but it seems to me that Crysis 3 is performing better on Ryzen than on the 7700K, minimums included. Which brings me to another point that so many have made since Ryzen launched: many have said their games play smoother on Ryzen. They may not have the highest average framerates and maximums that the 7700K has atm, but the frametimes in those benches are really nice for Ryzen.

This video is the most thorough test I've seen on Ryzen, with frametime graphs et al., and I think he deserves more views for it tbh. Please note he is using 3200MHz G.Skill memory, which is the best memory kit for Ryzen atm...

The 1800X in this video performs so closely to the 6900K that it makes the reactions to the gaming performance look absurd. And this is whilst using an overclocked Titan X at 1080p as well!

The May memory update will be very interesting. Will performance continue to increase with memory frequency? Will a 3600Mhz memory kit have appreciable gains over 3200Mhz?
Yes it will, based on Ryzen's architecture: faster memory will boost performance significantly, perhaps more so in Ryzen's case than with Intel CPUs.
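To put rough numbers on that (assuming the commonly cited detail that on Zen the Infinity Fabric clock is tied to the memory clock, i.e. half the DDR4 transfer rate):

DDR4-2400 → 1200MHz memory clock → 1200MHz fabric clock
DDR4-3200 → 1600MHz memory clock → 1600MHz fabric clock (~33% faster cross-CCX link than 2400)
DDR4-3600 → 1800MHz memory clock → 1800MHz fabric clock (a further ~12.5%, if the memory controller can hold it)

Every step in RAM speed is also a step in the speed of the link between the two CCXes, which is a dependency Intel's chips simply don't have.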

So far, lots of tests have been done with sub-3000MHz memory configurations on Ryzen. Using G.Skill 3200MHz and updated BIOSes has shown us better benchmarks, but there are still Windows-related and even API/dev-related issues in certain games. I can only imagine that when Ryzen's issues are taken care of, it will have a significant lead in Crysis 3 and Mafia 3, for example. On the flip side, some games look like they will need to be patched for Ryzen; I'm looking at ROTTR and even Primal. These games love a strong CPU, especially TR, but they seem to be heavily behind Intel in the benches. So the upcoming updates from AMD, Windows and the board vendors should prove really interesting...

More than anything, I do hope the R5 1600X and 1500X can hit 4.5GHz, and eventually the R7s as well...
 

Steel

Banned
Übermatik;232450799 said:
I did it, I bought a motherboard. ASUS PRIME X370-PRO. Pray for me GAF - I hope those one star reviews don't bite me in the ass.

Good luck. I've got my B350 MSI Tomahawk and memory in myself. Just figuring out which processor I want, though I'm leaning toward the 1700 because I do CAD animation reasonably often and that shit processes at a glacial rate on my current CPU.
 
JayzTwoCents put up a video where he attempted a comparison (not 100% a match outside of the CPUs) of the 1800X vs an i7-5960x in terms of video rendering, says he's gonna use the 1800X as his primary workbench for the next month:
https://www.youtube.com/watch?v=UIIb5uZfukU

Result: 1800X was about 2 and a half minutes slower than the 5960x in rendering, for a video that took around 20-25 minutes to render.
 
Ryzen core scaling.

Simulated 4c/8t —vs— 6c/12t —vs— 8c/16t using R7 1700X sample

RedGamingTech —— How Well Does Ryzen Performance Scale With Fewer Cores ?

[Image: Ryzen core-scaling benchmark chart]



Simulated R3 1200X 4c/4t —vs— R5 1400X 4c/8t —vs— R5 1600X 6c/12t —vs— 8c/16t using R7 1800X sample (Includes i7 7700K & R7 1700X results)

LinusTechTips —— AMD RYZEN 5 AND 3 PREVIEWED!

 
My MSI Gaming Carbon X370 came in today.

So I have

1700X
32GB Gskill 3200MHz
MSI Gaming Carbon X370
Noctua D15s


I'll be reusing my

Corsair Air 540
RX 480 8GB
240GB Intel SSD
3TB Hitachi HDD
850W EVGA PSU

I've been working 12hr days 7 days a week since February 25th. Should have a day off this coming weekend... So may have a chance to build. Want to get some benchmarks with my 2500k at stock and 4.5GHz before I tear it down for comparison.
 

Datschge

Member
I'm not sure how that does anything but highlight the issue of the dual-CCX design though; anything which spawns more than four threads is going to end up split across CCXes, and performance could vary depending on which threads are split across the CCXes.
To be very clear: due to the nature of the Windows scheduler, all the gaming benchmarks potentially highlight the issue of how threads are managed across the two CCX modules. It's wholly about that thread management (or the lack thereof, the adaptation of core parking to the task at hand, or the lack thereof, etc.). And as Windows thread allocation is randomized in space and time with many curious outcomes, there is no absolute path to fixes and improvement (aside from the obvious ones like "manually apply affinity to everything", "fix the next silicon stepping", and "but the scheduler works as intended").

For example, the tests I linked in post #1935 showed a lot of the issues may potentially be resolved by slowing down the scheduler, indicating the main crux is that threads are moved back and forth between the two CCX modules too quickly, oversaturating the IF that way (this can be alleviated by increasing the RAM and thus IF clock, but it remains highly inefficient behavior).
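As a concrete illustration of the "manually apply affinity to everything" workaround (just a rough sketch, assuming the common 8c/16t layout where logical processors 0-7 are CCX0's four cores plus their SMT siblings; verify the mapping with something like Coreinfo before relying on it):

#include <windows.h>
#include <cstdio>

int main() {
    // Confine the whole process to CCX0 so its threads can no longer bounce
    // across the Infinity Fabric. Bits 0-7 = CCX0's 8 logical processors
    // (an assumption about the layout, not something Windows guarantees).
    DWORD_PTR ccx0Mask = 0x00FF;
    if (!SetProcessAffinityMask(GetCurrentProcess(), ccx0Mask)) {
        std::printf("SetProcessAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }
    std::printf("Process pinned to CCX0.\n");
    // ...run the workload from here; "start /affinity FF game.exe" from a
    // command prompt achieves the same thing without any code.
    return 0;
}

It's essentially what people already do today by setting affinity in Task Manager for misbehaving games.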
 

FingerBang

Member
·feist·;232474911 said:
Ryzen core scaling.

Simulated 4c/8t —vs— 6c/12t —vs— 8c/16t using R7 1700X sample

RedGamingTech —— How Well Does Ryzen Performance Scale With Fewer Cores ?

Simulated R3 1200X 4c/4t —vs— R5 1400X 4c/8t —vs— R5 1600X 6c/12t —vs— 8c/16t using R7 1800X sample (Includes i7 7700K & R7 1700X results)

LinusTechTips —— AMD RYZEN 5 AND 3 PREVIEWED!

Looking at these tests, I think the 1600X will be my next CPU... unless they find a way to squeeze more out of a 1700, but I don't think that's likely.
 

NeOak

Member
JayzTwoCents put up a video where he attempted a comparison (not 100% a match outside of the CPUs) of the 1800X vs an i7-5960x in terms of video rendering, says he's gonna use the 1800X as his primary workbench for the next month:
https://www.youtube.com/watch?v=UIIb5uZfukU

Result: 1800X was about 2 and a half minutes slower than the 5960x in rendering, for a video that took around 20-25 minutes to render.

For the price difference, the 1800X may be worth it.
 
For the price difference, the 1800X may be worth it.

Yep. The sort of money you'd save on the CPU in this instance while still not being that far behind is money that could go to bumping up just about every other aspect of the machine. Or putting it another way, you could go from no GPU whatsoever, to slotting in a GTX 1080.
 
The R5s will use the same motherboard? So I could buy a low-end 4c/8t for a year and, assuming they improve the next version of the R7, just pop one of those in?
 

XiaNaphryz

LATIN, MATRIPEDICABUS, DO YOU SPEAK IT
Digital Trends - AMD found the root problem causing its new Ryzen processors to freeze desktops

AMD confirmed with Digital Trends on Monday that the company discovered why FMA3 code is causing system hangs on PCs using a new Ryzen desktop processor. Although AMD didn’t provide a detailed report on the problem’s root cause, the company said that BIOS changes will be distributed to motherboard manufacturers to resolve the issue. Customers are encouraged to keep an eye on their motherboard vendor’s website for an update.

“We are aware of select instances where FMA code can result in a system hang,” the company said. “We have identified the root cause.”

AMD released three Ryzen-branded desktop processors at the beginning of March that plug into motherboards based on AMD’s new AM4 socket. The trio of processors include the Ryzen 7 1800X, the Ryzen 7 1700X, and the Ryzen 7 1700. However, all three reportedly cause a hard system lock when running certain FMA3 workloads. The problem was replicated across all three processors and a variety of motherboards.
 

shandy706

Member
That would be the gloves.

Wait, he's wearing gloves after the yellow ones at the beginning?

(goes to get eyes checked)

Edit* Seriously though, our office internet SUCKS. The video at 480p doesn't look like he has gloves on when he's holding the CPU...his palms/fingers just look orange. Is he wearing gloves while talking? Switches to 720p...still can't see them..LOL.
 
Wait, he's wearing gloves after the yellow ones at the beginning?

(goes to get eyes checked)

Edit* Seriously though, our office internet SUCKS. The video at 480p doesn't look like he has gloves on when he's holding the CPU...his palms/fingers just look orange. Is he wearing gloves while talking? Switches to 720p...still can't see them..LOL.

You're right, he isn't wearing gloves.

...Allow me to clarify my phrasing. I believe those are stains on his hands that occur because of the gloves. I've had my hands coloured purple because of stuff like that.
 

Renekton

Member
AMD's biggest enemy right now is motherboard supply lol

For the price difference, the 1800X may be worth it.
This makes the 1700 even more amazing value for productivity or multi-tasking 😃. Not to mention the bundled Spire cooler can handle up to 3.9GHz.
 

Datschge

Member
Dota 2 was updated (likely CPU affinity stuff):
- Fixed the display of particles in the portrait window.
- Fixed Shadow Fiend's Demon Eater (Arcana) steaming while in the river.
- Fixed Juggernaut's Bladeform Legacy - Origins style hero icons for pre-game and the courier button.
- Improved threading configuration for AMD Ryzen processors.
- Workshop: Increased head slot minimum budget for several heroes.
http://store.steampowered.com/news/28296/

Players report gains of 20-25%.
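Valve doesn't say what the "improved threading configuration" actually does, but as a sketch of the kind of topology detection an engine could use (on Ryzen each CCX has its own L3, so enumerating L3 cache domains through the Windows API reveals which logical processors belong together; this is an assumption about the approach, not something from the patch notes):

#include <windows.h>
#include <cstdio>
#include <vector>

int main() {
    // Ask Windows for cache topology, then print each L3 domain.
    // On an 8-core Ryzen this should report two domains, one per CCX.
    DWORD len = 0;
    GetLogicalProcessorInformationEx(RelationCache, nullptr, &len);
    std::vector<char> buf(len);
    if (!GetLogicalProcessorInformationEx(
            RelationCache,
            reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(buf.data()),
            &len))
        return 1;
    for (DWORD off = 0; off < len;) {
        auto* info = reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(buf.data() + off);
        if (info->Cache.Level == 3)
            std::printf("L3 domain (CCX): group %u, mask 0x%llx\n",
                        info->Cache.GroupMask.Group,
                        static_cast<unsigned long long>(info->Cache.GroupMask.Mask));
        off += info->Size;
    }
    return 0;
}

A thread pool sized and pinned per L3 domain would then keep each group of communicating threads inside one CCX.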
 
I've still been trying to keep an eye out on comparisons of streaming with Intel and quicksync vs streaming with Ryzen (most interested in the 1700). I can't seem to find any actual benchmarks though. Anyone have any direction?
 
·feist·;232393735 said:
[...]

If I change the cooling again, there's a strong possibility I may change the motherboard as well (wanted to wait for "gen1.2/gen1.5" X370 models to be shown at Computex) to another X370.

[...]
That didn't last very long.

I've gotten rid of my motherboard earlier than expected and picked up an EK Predator 360 water cooler.

Former: Gigabyte Aorus AX370 Gaming 5

Current: Asus ROG Crosshair VI Hero


- Received the AM4 kit for my Noctua D15, but Cryorig still hasn't sent out their AM4 kits. Running a Phanteks PH-TC14PE black/white, which was a perfect match for the Gigabyte. Want to monitor temps and turbo response for a few days at least before switching to the EK 360 rad

- AFAIK, this C6 Hero hasn't done a normal turbo even once... it constantly goes directly to 4.1GHz XFR on 1-4 cores simultaneously (mostly 2-3 cores at a time)

- C6H XFR-turbos more frequently and for more sustained time frames than the Gigabyte X370, which did normal turbo + XFR compared to the Hero's constant XFR

- Default BIOS fan config of the Gigabyte was dead silent to the point I would need to mute everything around me, sit still and place my ear right next to the case intake to hear slight wooshing of air. Asus default runs at a more normal "standard" fan curve/speed so it's certainly quiet but not dead silent like the Gigabyte in standard. This of course partially explains the greater XFR use on the Asus, but other board features account for the more active turbo as well

- 20c temp issue is present while in BIOS only; temps fully normal while in Windows with fans spinning at normal speed

- Board came with an older BIOS which had normal temp both in BIOS/UEFI and on the desktop. Somehow Asus later introduced the 20c issue in more recent BIOS updates. Looking forward to an update which reintroduces normal temp under BIOS as well as Windows as found in older BIOS

- Currently running a combo of 0902 and new 1001

- The power section is better than the Gigabyte. Most of the ~4.1GHz and ~4.2GHz clocks are seemingly being achieved on the Asus Hero and the top end ASRock Taichi + its twin X370 Professional mobo.

- IIRC, Asus = 12-phase power (4+2 x 2), ASRock Taichi/Fatality Pro = 16-phase (6+2 x 2)​



If I'm not mistaken, years ago when all the motherboard manufacturers began implementing UEFI, Gigabyte was the last major company still using the older BIOS setup. Across Intel and AMD platforms they just don't seem to be as dedicated to that aspect of the user experience as their competitors. ALL companies drop the ball on certain models, though.

As for AM4, MSI, ASRock and Asus have released BIOS updates and tweaks more frequently than Gigabyte. Biostar owners seem to believe they have the most stable and full-featured UEFIs with solid RAM compatibility, but I can't speak to that.

In fairness to Gigabyte, the Gaming 5 is a beautiful board that seems nicely built. It was one of the very first to resolve the 20c temp reading error and it mostly runs great without issue. The Gaming 5 is a good "set-it-and-forget-it" board. Sadly, if you like to tweak heavily, it isn't the best AM4 choice, and the stability found under Windows gives way to problems in the UEFI when you prod around too much.




My MSI Gaming Carbon X370 came in today.

So I have

1700X
32GB Gskill 3200MHz
MSI Gaming Carbon X370
Noctua D15s

[...]
Congrats. Post some impressions whenever you have a moment, this thread could use more of that.


Dota 2 was updated (likely CPU affinity stuff):

http://store.steampowered.com/news/28296/

Players report gains of 20-25%.
Imagine that...


I've still been trying to keep an eye out on comparisons of streaming with Intel and quicksync vs streaming with Ryzen (most interested in the 1700). I can't seem to find any actual benchmarks though. Anyone have any direction?
Not sure this is quite what you're looking for, but hope it can be of some help.

Stream and capture of Rise of the Tomb Raider.

AMD FX 8350 @4.4GHZ
Intel i7 7700K @4.5GHZ
AMD R7 1700 @3.9GHz​

The Hardware Hound —— AMD Ryzen 7 Game Stream and Capture Review
 
Really liking Hardware Unboxed's setup for the R5 simulation (lack of frame-time testing notwithstanding). Enthusiasts buying overclockable processors are going to run them overclocked, so it makes sense to test at reasonable overclocked settings.

The drop from the 1800x to 1600x in games is negligible so AMD's price to performance ratio for gaming is almost doubled overnight at the high end.

I'm surprised at just how well the 1500X holds up; it's very competitive with the 7600K despite the huge gulf in clock speed. When you consider this chip will be going up against an i5 with a locked 3GHz base clock, it is the clear winner in that segment. Throw in the Wraith Spire cooler and it's looking like a steal.
 

Steel

Banned
·feist·;232536361 said:
More simulated core scaling @ 1080p with GTX 1080 Ti, GTX 1070 and GTX 1060.


Written version: TechSpot —— Simulating AMD Ryzen 5 1600X, 1500X Gaming Performance

Video version: Hardware Unboxed —— 'Simulating' AMD Ryzen 5 1600X, 1500X Gaming Performance


R5 lineup for reference:

Interesting to see that the 1600X and even the 1500X at 4GHz outright beat the 7600K at 4.8GHz in a lot of titles. It also seems like there are very few titles where the 1600X and 1500X would be significantly worse than the 1800X.
 
Interesting to see that the 1600X and even the 1500X at 4GHz outright beat the 7600K at 4.8GHz in a lot of titles. It also seems like there are very few titles where the 1600X and 1500X would be significantly worse than the 1800X.

It was interesting to see the quirk in a lot of tests where once a 1070 was introduced the Ryzen 5s started to pull ahead of the i5 line due to the higher minimum framerates.

Frame-time testing should be very interesting for i5 vs. R5, as those results would indicate R5 may be delivering more consistent frame times than i5.

A 1060/1070 is going to be the real world setup for most R5/i5 gaming rigs, and R5 is beating i5 on those setups at launch.

I can easily see these being the gaming CPUs of choice for anyone that can't stretch to the 7700k. Once the R3 range is released only the G4560 and 7700k look like competitive gaming CPUs from Intel based on where these results are landing. That's a huge market segment for AMD to exploit where they've been ceding to Intel for years.
 

DonMigs85

Member
Very excited for R5, and I think Ryzen 2.0 could have significant enhancements. Wonder if Intel will have a significant IPC bump at all with Cannon/Coffee Lake.
 

Paragon

Member
·feist·;232534667 said:
Dota 2 was updated (likely CPU affinity stuff): http://store.steampowered.com/news/28296/
Players report gains of 20-25%.
Imagine that...
It's almost like this is exactly what some of us have been concerned about: you're potentially losing a lot of performance if software is not specifically optimized for the CPU, ordering threads to minimize the performance impact of the dual-CCX design, because that hasn't mattered at all for multi-core CPUs until now.
That's not always going to be possible if you have more threads which need to communicate than there are cores on a CCX - though I'm sure there is always going to be the potential to minimize performance loss by manually organizing your threads even in that situation.
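A minimal sketch of what that kind of thread organization could look like on the application side (the masks assume the usual layout where logical processors 0-7 are CCX0 and 8-15 are CCX1; real code would detect this at runtime, and chatty_work/independent_work are placeholders):

#include <windows.h>
#include <thread>
#include <vector>

constexpr DWORD_PTR kCcx0 = 0x00FF;   // logical processors 0-7 (assumed CCX0)
constexpr DWORD_PTR kCcx1 = 0xFF00;   // logical processors 8-15 (assumed CCX1)

void chatty_work()      { /* threads that share data every frame */ }
void independent_work() { /* work with little cross-thread communication */ }

// Run a job on a thread pinned to one CCX so its cache traffic stays local.
std::thread spawn_on(DWORD_PTR mask, void (*work)()) {
    return std::thread([mask, work] {
        SetThreadAffinityMask(GetCurrentThread(), mask);
        work();
    });
}

int main() {
    std::vector<std::thread> pool;
    // Threads that talk to each other a lot stay on the same CCX...
    pool.push_back(spawn_on(kCcx0, chatty_work));
    pool.push_back(spawn_on(kCcx0, chatty_work));
    // ...while independent workers go to the other CCX.
    pool.push_back(spawn_on(kCcx1, independent_work));
    pool.push_back(spawn_on(kCcx1, independent_work));
    for (auto& t : pool) t.join();
    return 0;
}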

Wonder if Intel will have a significant IPC bump at all with Cannon/Coffee Lake.
It seems unlikely. But if the rumors are true, and they are introducing a 6-core model, that will probably be the first big leap in performance for their consumer CPUs in years - assuming it doesn't get introduced at a higher price.
With Ryzen costing what it does, I don't know how they could introduce it at a higher price than the 7700K is at now though. But even that would be considerably higher than the R5s.
 

Datschge

Member
It's almost like this is exactly what some of us have been concerned about; where you're potentially losing a lot of performance if software is not specifically optimized for the CPU and ordering threads to minimize the performance impact of the dual-CCX design, because that hasn't mattered at all for multi-core CPUs until now.
This is wrong; Intel just did a good job of designing the hardware around the suboptimal scheduler. There is a reason why Intel's ring bus is barely talked about compared to AMD's IF/CCX even though the former actually has double the latency in inter-core communication (80ns instead of 40ns, whereas further-apart cores are still at 80ns and inter-CCX communication raises it to 140ns).
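For anyone wondering where figures like 40/80/140ns come from: they're typically measured with a core-to-core "ping-pong" microbenchmark along these lines (a rough sketch; the logical-processor numbers are placeholders, and the measured time is a round trip, i.e. roughly two one-way hops):

#include <windows.h>
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

std::atomic<int> flag{0};
constexpr int kIters = 1000000;

void pin(DWORD_PTR mask) { SetThreadAffinityMask(GetCurrentThread(), mask); }

int main() {
    std::thread responder([] {
        pin(1ull << 8);  // e.g. a logical processor on the other CCX (placeholder)
        for (int i = 0; i < kIters; ++i) {
            while (flag.load(std::memory_order_acquire) != 1) { }
            flag.store(0, std::memory_order_release);
        }
    });
    pin(1ull << 0);      // a logical processor on the first CCX (placeholder)
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < kIters; ++i) {
        flag.store(1, std::memory_order_release);
        while (flag.load(std::memory_order_acquire) != 0) { }
    }
    auto t1 = std::chrono::steady_clock::now();
    responder.join();
    std::printf("avg round-trip: %.1f ns\n",
                std::chrono::duration<double, std::nano>(t1 - t0).count() / kIters);
    return 0;
}

Pin both threads inside one CCX, then across the two CCXes (or across an Intel ring bus), and the difference in round-trip time is exactly the gap being discussed.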
 

Darklor01

Might need to stop sniffing glue
I just read an article which alluded to the possibility of a Ryzen 10-core processor for about $700. If that winds up true, and the performance is there/fixed, wow... just wow. That's nuts.
 

Renekton

Member
I just read an article which alluded to the possibility of a Ryzen 10-core processor for about $700. If that winds up true, and the performance is there/fixed, wow... just wow. That's nuts.
The clock speed is gonna be super low though; you'd use it for servers or workstations.
 

nubbe

Member
It would be interesting if AMD made an HEDT version of Naples.
But unless they refine the CCX and MCM, latencies could be insane and it might perform worse in games than the 1500X

Will be an awesome production chip
 

Datschge

Member
It would be interesting if AMD made an HEDT version of Naples.
There is a current rumor of a 16c32t (so 4x CCX) Ryzen HEDT competitor running on a "X399" chipset that supposedly also resolves the IF/CCX latency issues seen so far. Big grain of salt.
 

Paragon

Member
This is wrong; Intel just did a good job of designing the hardware around the suboptimal scheduler. There is a reason why Intel's ring bus is barely talked about compared to AMD's IF/CCX even though the former actually has double the latency in inter-core communication (80ns instead of 40ns, whereas further-apart cores are still at 80ns and inter-CCX communication raises it to 140ns).
We may have reached new heights of absurdity now.
You think that Intel is designing its hardware based on the Windows scheduler?
It's not possible that Intel's engineers decided that a more consistent - if higher latency - ring-bus design was preferable to separate core complexes?

People are putting far too much stock in the scheduler, as though changes are likely to result in significant performance improvements.
Preventing applications that spawn ≤4 threads from being split across CCXes, and preventing threads from jumping across CCXes unnecessarily is about all the scheduler can do as far as performance is concerned.

In a well-threaded application that spawns ≥8 threads, what changes do you suppose could be made to the scheduler to improve performance?
It's up to applications that spawn many threads to manage what tasks are being placed on which cores, and to try and prevent cross-CCX communication as much as possible.
That's not something the scheduler handles. Application developers have to implement those sorts of optimizations.
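On the application side, Windows also offers a softer mechanism than a hard affinity mask: an ideal-processor hint, which the scheduler tries to honor without being forced to, and which can cut down on unnecessary cross-CCX migration. A sketch (the processor numbers are hypothetical and assume SMT siblings are adjacent, so 0, 2, 4, 6 would be CCX0's four physical cores; real code should query the topology first):

#include <windows.h>
#include <cstdio>
#include <thread>

void worker(DWORD ideal_cpu) {
    // A hint, not a constraint: the scheduler prefers this logical processor
    // for the thread but may still run it elsewhere under load.
    if (SetThreadIdealProcessor(GetCurrentThread(), ideal_cpu) == (DWORD)-1)
        std::printf("SetThreadIdealProcessor failed: %lu\n", GetLastError());
    // ... actual work would go here ...
}

int main() {
    std::thread t0(worker, 0), t1(worker, 2), t2(worker, 4), t3(worker, 6);
    t0.join(); t1.join(); t2.join(); t3.join();
    return 0;
}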

Furthermore, if you do think that Intel are designing their CPUs around the Windows scheduler, why do you think that AMD would choose not to?
 

Datschge

Member
We may have reached new heights of absurdity now.
You think that Intel is designing its hardware based on the Windows scheduler?
It's not possible that Intel's engineers decided that a more consistent - if higher latency - ring-bus design was preferable to separate core complexes?

People are putting far too much stock in the scheduler, as though changes are likely to result in significant performance improvements.
Preventing applications that spawn ≤4 threads from being split across CCXes, and preventing threads from jumping across CCXes unnecessarily is about all the scheduler can do as far as performance is concerned.

In a well-threaded application that spawns ≥8 threads, what changes do you suppose could be made to the scheduler to improve performance?
It's up to applications that spawn many threads to manage what tasks are being placed on which cores, and to try and prevent cross-CCX communication as much as possible.
That's not something the scheduler handles. Application developers have to implement those sorts of optimizations.

Furthermore, if you do think that Intel are designing their CPUs around the Windows scheduler, why do you think that AMD would choose not to?
Because AMD doesn't have the time and money to custom-design a different number of cores per CPU per market, as Intel has done for a decade now. The CCX is their one-size-fits-all solution that, coupled with 14LPP, allows them to target everything from low-power laptops through workstations to servers with one single scalable 4-core module. Since Intel has essentially been polishing designs for specific markets for a decade, and Microsoft has made its scheduler a barely moving target (unlike the schedulers in the phone and server markets), taking note of and avoiding worst-case scenarios for the desktop scheduler comes naturally to Intel. That said, it's not like improving the scheduler wouldn't help Intel as well; we've already had several benchmarks where oddities shown by Ryzen were also visible on Intel, just to a much lesser degree.

Honestly, it's somewhat irritating that many people here pretend that scheduling (and, by extension, core parking) is something that shouldn't change, and that it's up to the applications to adapt to new hardware topologies. It's absurd that Microsoft has offloaded this job to such a degree that games are now at fault for getting the number of cores (and thus threads) wrong, when ideally all applications should be able to rely on the OS for correct information about the CPU topology instead of reinventing and reimplementing hardware detection again and again at the application level. And since the OS insists on controlling hardware performance and power consumption, it should also be in the best position to decide how to manage threads for the best possible performance at the lowest possible consumption. But no, it's all a half-arsed mishmash where everything can affect everything, and the expectation has become to replace the hardware and update the software (good luck with both) instead of going to the source of the problems.
 

Paragon

Member
Because AMD doesn't have the time and money to custom-design a different number of cores per CPU per market, as Intel has done for a decade now. The CCX is their one-size-fits-all solution that, coupled with 14LPP, allows them to target everything from low-power laptops through workstations to servers with one single scalable 4-core module.
I agree that it's the smart move for them, especially with the way that they're binning the chips for lower-end models.

Honestly, it's somewhat irritating that many people here pretend that scheduling (and, by extension, core parking) is something that shouldn't change, and that it's up to the applications to adapt to new hardware topologies. It's absurd that Microsoft has offloaded this job to such a degree that games are now at fault for getting the number of cores (and thus threads) wrong, when ideally all applications should be able to rely on the OS for correct information about the CPU topology instead of reinventing and reimplementing hardware detection again and again at the application level. And since the OS insists on controlling hardware performance and power consumption, it should also be in the best position to decide how to manage threads for the best possible performance at the lowest possible consumption. But no, it's all a half-arsed mishmash where everything can affect everything, and the expectation has become to replace the hardware and update the software (good luck with both) instead of going to the source of the problems.
A lot of this goes beyond just the scheduler.
What do you expect to happen with a "scheduler change" that could dramatically change performance from what it is now, beyond what I outlined in my previous post?
(keeping applications with ≤4 threads on a single CCX, and preventing threads from jumping across CCXes unnecessarily)
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
I agree that it's the smart move for them, especially with the way that they're binning the chips for lower-end models.

A lot of this goes beyond just the scheduler.
What do you expect to happen with a "scheduler change" that could dramatically change performance from what it is now, beyond what I outlined in my previous post?
(keeping applications with ≤4 threads on a single CCX, and preventing threads from jumping across CCXes unnecessarily)
Easier said than done.

While I agree with some points by both you and Datschge, optimal topological scheduling is not a solved problem per se. Here's a rudimentary problem that demonstrates but one issue at hand:

You have 2 nodes of 4 cores each and 5 threads. Three of those are high-latency worker threads; the remaining 2 are low-latency IO threads. The high-latency threads you can pin to some cores on a node and forget about. The 2 low-latency threads, though, face the following dilemma: each time low-latency thread A is running (thus, along with the workers, exhausting a node) and thread B gets ready to run, the scheduler needs to decide whether:
1) it'd be better to wait for A to finish its low-latency work, and then run B on the same node
2) it'd be better to send B to the other node, paying whatever cache price that might incur.

A winning strategy here suggests some very good understanding of the dataset access patterns of all involved threads by the scheduler. What if it's actually more beneficial to preempt a worker, let B do its job, and resume the worker?
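To make that trade-off concrete, here's a toy sketch of the decision the scheduler faces each time B wakes while node 0 is saturated; every cost in it is a placeholder the scheduler would somehow have to estimate, which is precisely the hard part:

// Purely illustrative structure and options; not how any real scheduler is built.
struct WakeupCosts {
    double wait_for_a_ns;        // option 1: wait for A to yield its core on node 0
    double migrate_penalty_ns;   // option 2: run B on node 1, pay cold-cache/interconnect cost
    double preempt_worker_ns;    // option 3: preempt a worker on node 0, resume it later
};

enum class Decision { WaitOnNode0, MigrateToNode1, PreemptWorker };

Decision schedule_b(const WakeupCosts& c) {
    if (c.wait_for_a_ns <= c.migrate_penalty_ns && c.wait_for_a_ns <= c.preempt_worker_ns)
        return Decision::WaitOnNode0;
    if (c.migrate_penalty_ns <= c.preempt_worker_ns)
        return Decision::MigrateToNode1;
    return Decision::PreemptWorker;
}

The catch is that none of those numbers are directly observable; they depend on the dataset access patterns of the threads involved, which is exactly the information the scheduler doesn't have.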
 

Datschge

Member
While I agree with some points by both you and Datschge, optimal topological scheduling is not a solved problem per se.
In my eyes the issue at hand is that Microsoft doesn't even bother to include topological information in its scheduler. For example, all indications point to information about SMT not residing in the scheduler itself but being part of the core parking feature. And unlike the high-speed scheduler, core parking moves at a low frequency and appears to consider only the overall load of a core instead of the power demanded by every single thread, leading to long delays between a request for high performance and the actual unparking of cores.

If that's the extent to which Microsoft wants to handle different topologies, then of course further optimizations need to happen at the application level. But there, optimal topological scheduling will be even harder to achieve in a multitasking environment, with the OS and every application holding different, competing views of the hardware topology they run on. So for perfect performance, an optimized application had better be the only application running in corner cases, which runs counter to the OS's reason for existing.

For AMD this turned out to be a catch-22: overall all-core performance suffers whenever core parking, with its suboptimal reflection of the Ryzen topology, is active, which is why they suggested the High Performance power plan for benchmarking. But that hurts single-core performance, which relies on knowing that some cores are not needed so a couple of cores can benefit from the XFR turbo, the highest frequency boost a stock Ryzen can give. Core parking needs to be in Balanced mode for this to work, but as it stands, Balanced mode cripples all-core performance even when it's requested...

Applications can get an easy performance boost by adapting their threading configuration and affinity masks. But as you know by now, it's my strong belief that it's up to the OS scheduler (including core parking etc.) to correctly handle the big picture, including hardware topology, regardless of per-application optimizations.

A winning strategy here suggests some very good understanding of the dataset access patterns of all involved threads by the scheduler. What if it's actually more beneficial to preempt a worker, let B do its job, and resume the worker?
In the theoretical ideal case, schedulers could note which threading configurations bring the best overall performance and stop deviating from them. Depending on its learning aptitude, this could even ensure good use of new, unknown topologies without further adaptation, as known fast paths would always be preferred over known slow paths, and the scheduler would improve the more "paths" it collects.
 

Datschge

Member
Thanks. I really want to see how quicksync factors into the overall comparison though. I'm sure someone will compare eventually.
Isn't Quick Sync technically part of Intel's iGPU, making the better comparison NVENC on Nvidia GPUs and VCE on AMD GPUs, respectively? Ryzen doesn't come with an iGPU, after all, so encoding is all done on the CPU, while the aforementioned features use hardware acceleration, which is always more efficient (but often less flexible).
 