PaintTinJr
Member
/Summary
Regardless of CPU tier, gaming lows/choke points on the CPU come from one of two things: OC stability limits, caused by too much power and heat from the workload (SSE & AVX2) at that clock speed, or CPU throttling, caused by firmware power limits or thermal (heat) limits under that same workload (SSE & AVX2).
Many games are still bound by single-core clock speed, so using the AVX2 ratio offset to lower the AVX2 unit clock speed, and with it the power draw and heat from AVX2, independently of the main Core SSE clock speed (and cache clock) can yield superior gaming performance. It either leaves more headroom to run the Core SSE clock faster in an OC, or manipulates Intel SpeedStep logic in your system's favour so that the boost clock speed is maintained on CPUs like the Core i5-10400.
/summary
The basis of this thread comes from synthetic results using 3DMark's free demo (Time Spy) from Steam. But the theory for why the results improve should be sound and should equally apply in games.
Anyway, my test rig was my nephew's newly built budget gaming PC, which I built and tuned for him. I am certainly not overclocking his brand new GPU, and I can't overclock his CPU's BCLK even by the small amount possible, because that needs a Z class chipset motherboard at £100 extra over his budget.
In descending order of cost, his £555 build comprises:
MSI 4GB AMD RX6500XT GPU (~£160)
Intel Core i5-10400F CPU (~£95 including stock cooler)
AsRock H470m HDV/M2 motherboard (~£75) (RAM limited to 2900MHz. NO XMP, No CPU O/C, PCIe3.0 only)
Corsair Carbide Delta RGB ATX Mid case (~£70)
Corsair TX550m 550w PSU(~£55)
Kingston Fury Beast 1x16GB DDR4 RAM module (~£40)
Windows 10 Pro x86/x64 license (~£35) (upgraded freely to Win11 Pro x64)
KIOXIA EXCERIA(formerly Toshiba Storage) 480GB SATA3 SSD (~£20) (nvme is a later upgrade option)
1 to 3 splitter cable for 4/3PIN case fans (~£5)
https://uk.pcpartpicker.com/list/Lfpfwc
Everything is pretty standard in the hardware build, with a few exceptions. The 3 RGB fans have no RGB lighting, because a £15 RGB hub was beyond the budget and the motherboard doesn't have an RGB or aRGB header. All 3 front case fans connect to the CHASSIS_FAN1 header via the splitter cable. The PSU fan points upwards into the case, so it draws warm case air down from the GPU's 2 fans and vents it out the back at the bottom. That effectively works in tandem with the 4th, free case fan (CPU_FAN2), which draws warm case air over the CPU stock fan and vents it out the back at the top, while the three front fans draw air in through the restricted vents on the front panel.
On the Windows config side, the Kioxia drive uses the manufacturer's software to enable 8GB of overprovisioning (typically the size of Windows in RAM). The Windows virtual memory setting has been manually changed to an initial size of 20GB (RAM + VRAM size) and a maximum of 40GB (the advised 2.5x physical RAM). In the power options the High performance profile has been selected, and that profile slightly tweaked to turn off all timers to shut down or save power. In Windows Explorer's Folder Options -> View, "Launch folder windows in a separate process" has been enabled, and the recycle bin has been capped at 2GB too, to limit background recycle bin activity from the O/S book-keeping its spare drive space.
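For anyone wanting to redo the virtual memory sums for a different RAM/VRAM combination, here is a minimal sketch of the arithmetic. The function name is my own, and the 2.5x factor is simply the figure quoted above, not an official rule. The Windows dialog takes the sizes in MB, hence the conversion.

```python
def pagefile_sizes_mb(ram_gb: int, vram_gb: int, max_factor: float = 2.5) -> tuple[int, int]:
    """Return (initial, maximum) pagefile sizes in MB.

    Initial size = RAM + VRAM, maximum = max_factor x RAM, as described in the post.
    """
    initial_gb = ram_gb + vram_gb
    maximum_gb = ram_gb * max_factor
    return int(initial_gb * 1024), int(maximum_gb * 1024)


# This build: 16GB system RAM and a 4GB RX6500XT
initial_mb, maximum_mb = pagefile_sizes_mb(ram_gb=16, vram_gb=4)
print(initial_mb, maximum_mb)  # 20480 40960, i.e. 20GB and 40GB
```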
On the bios side the obvious alterations to enable Clever Access Memory (C.A.M., 4G, Resizable BAR) are done. The H470 chipset doesn't support PCIe 4.0, but if anything Rebar could be more beneficial on PCIe 3.0. The following settings have also been set in the bios, as advised, to get more CPU performance as the mobo marketing claims, effectively raising both the CPU's Power Level 1 (PL1 is the quoted CPU 65 watt TDP as far as I know) and PL2 (the level which, when exceeded, causes the boost frequency of 4.3GHz to head towards the 2.9GHz base clock).
AVX2 Ratio Offset: Auto
BCLK Spread Spectrum: 0%
BCLK Aware Adaptive Voltage: Enable
Boot Performance Mode: Battery
FCLK Frequency: 400Mhz
Ring to Core Ratio Offset: Enable
Intel SpeedStep Technology: Enable
Intel Turbo Boost Technology: Enable
Intel Speed Shift Technology: Enable
Intel Thermal Velocity Boost Voltage Optimizations: Enable
At this configuration point the CPU and case fan profiles haven't been altered from the bios defaults, other than setting them all to monitor the CPU temperature and respond to it. The system still benchmarked quite well through the Time Spy CPU test at the end, but on trying to run Minecraft the CPU fan reaches screaming sound levels once it goes north of ~65%. That certainly isn't good for the fan, the noise, the temperature of the CPU or its subsequent performance, as the temperature is being reactively controlled by the poorest fan in the system by all metrics, and it is too little too late.
So I changed the fan profiles: the CPU_FAN2 rear fan runs at 100% all the time, and the front 3 fans run at 50% until the CPU temp reaches 65degs, at which point they immediately move to 100%.
The CPU fan is set to 47% all the way to 62degs. From there its speed increases in shallow steps, so it only reaches 80% fan speed at 85degs, which it should never reach: the 4 case fans, which are remarkably quiet at full speed, will be fully engaged by the time the CPU reaches 65degs, when its own fan is running at only 50%, which again is still largely inaudible.
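To make the three fan profiles easier to picture, here is a rough sketch of them as functions of CPU temperature. The function names are mine. The straight-line interpolation between the set points, and holding the CPU fan at 80% above 85degs, are assumptions for illustration; the bios may step between points differently.

```python
# CPU fan set points from the post: (temperature in degs C, fan duty in %)
CPU_FAN_POINTS = [(62.0, 47.0), (65.0, 50.0), (85.0, 80.0)]


def cpu_fan_percent(temp_c: float) -> float:
    """CPU stock fan: flat 47% up to 62degs, then a shallow ramp to 80% at 85degs."""
    if temp_c <= CPU_FAN_POINTS[0][0]:
        return CPU_FAN_POINTS[0][1]
    if temp_c >= CPU_FAN_POINTS[-1][0]:
        return CPU_FAN_POINTS[-1][1]  # assumption: held at the last set point
    for (t0, d0), (t1, d1) in zip(CPU_FAN_POINTS, CPU_FAN_POINTS[1:]):
        if t0 <= temp_c <= t1:
            # assumption: linear interpolation between neighbouring set points
            return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)
    raise AssertionError("unreachable: temperature not covered by set points")


def front_fans_percent(temp_c: float) -> float:
    """Three front case fans: 50% until the CPU reaches 65degs, then straight to 100%."""
    return 50.0 if temp_c < 65.0 else 100.0


def rear_fan_percent(temp_c: float) -> float:
    """Rear case fan on CPU_FAN2: 100% all the time."""
    return 100.0


for temp in (40, 62, 65, 75, 85):
    print(temp, cpu_fan_percent(temp), front_fans_percent(temp), rear_fan_percent(temp))
```

The point of the profile shows up at 65degs: the case fans are already flat out while the CPU fan is still only at 50%.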
Running the benchmark again with the new fan profiles improved the CPU score again, slightly, but altering the AVX2 Ratio Offset manually yielded the best result.
For anyone wondering what AVX2 Ratio Offset is, it is the number (e.g. 29) that gets multiplied by 100MHz and subtracted from the CPU boost frequency (29 x 100MHz = 2,900MHz, and 4,300MHz - 2,900MHz = 1,400MHz) to work out what frequency the (A)dvanced (V)ector E(x)tensions (2) vector units should run at. Those units do largely sparse FMA (fused multiply-add) instructions in a single clock cycle, maybe used for decompression or physics in games, a style of processing that gained traction on the Cell BE and Xenon in the PS3/360 generation.
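As a quick sketch of that arithmetic (the function name is my own, and the 100MHz per step assumes the stock 100MHz BCLK this board is locked to):

```python
def avx2_clock_mhz(boost_mhz: int, avx2_ratio_offset: int, step_mhz: int = 100) -> int:
    """AVX2 clock = boost clock minus (offset x 100MHz per ratio step)."""
    if avx2_ratio_offset < 0:
        raise ValueError("AVX2 ratio offset must be zero or positive")
    return boost_mhz - avx2_ratio_offset * step_mhz


BOOST_MHZ = 4300  # Core i5-10400F boost clock quoted in this post

for offset in (0, 14, 29):
    print(f"offset {offset:2d} -> {avx2_clock_mhz(BOOST_MHZ, offset)}MHz")
```

An offset of 0 leaves the AVX2 units at the full 4,300MHz boost clock, 14 puts them at the 2,900MHz base clock, and 29 puts them at 1,400MHz.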
Reading about AVX2 Ratio Offset on the internet in the context of CPU overclocking would give you the exact opposite advice to what I'm going to propose here. Most people overclocking with top tier motherboards and top tier CPUs, or at least K class CPUs, would say that if they can't set the offset to 0 (zero), so that the AVX2 units run stably at their overclocked CPU Core Streaming SIMD Extensions frequency (Core SSE frequency for short), then their overclock isn't stable and needs lowering.
But this CPU and chipset are far from top tier, and in most software, games especially, a higher clock frequency for SSE will yield more performance than a higher AVX2 clock.
AVX2 processing generates lots of heat from drawing far more power, because the vector units are doing 3 instructions per equivalent clock compared with their single-instruction SSE counterpart gates. So to avoid hitting a lowly CPU's PL2 early, which causes the boost clock to fall towards the base clock, it stands to reason that you want to reduce the AVX2 clock towards the optimal value: one that lets the Core SSE clock stay at its highest for longer, through power efficiency and pre-emptive cooling around 65degs, while still leaving enough clock cycles for the AVX2 processing that it doesn't become a big bottleneck.
My first attempt was setting the AVX2 Ratio Offset value to 14, so that the AVX2 clock would be 2.9GHz, figuring that as the system hits PL2 the whole chip, AVX2 and Core SSE, will be dropped to the base clock anyway, so setting AVX2 at the base clock would be optimal. It did improve the score on my nephew's chip, but I was able to do better. From a mathematical/physics point of view, I seem to remember that 1.2GHz is the optimal frequency for power efficiency in a parallel circuit, IIRC, but it turned out that 1.4GHz (an AVX2 Ratio Offset of 29) consistently posted the best synthetic benchmark. My theory for that is either that it is just the silicon lottery of the specific chip, or that, with AVX2 doing 3 times the instructions, it is the closest value to 1/3 of the Core SSE boost clock (4300/3 = 1433MHz). Either way, I thought this was an interesting thing to test, especially as Intel treats the motherboard chipset and the CPU itself more like a Pentium Gold than the Core i3-ish i5 it is. Maybe people running much higher LGA1200 setups than my nephew's will find it also helps them overclock their SSE clock higher for better performance, or someone with a Z class motherboard and a Core i5-10400 CPU will get a better BCLK than the minor boost people report it can get.
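If you want to try the same "one third of boost" guess on a different chip, this is a small sketch of how to find the offset whose AVX2 clock lands nearest a target frequency. The helper name is hypothetical and it assumes 100MHz per offset step. It only reproduces the arithmetic behind my theory; it doesn't prove the theory is why 29 benchmarked best.

```python
def nearest_offset_for_target(boost_mhz: float, target_mhz: float, step_mhz: int = 100) -> int:
    """Whole-number AVX2 ratio offset whose resulting clock is closest to target_mhz."""
    return round((boost_mhz - target_mhz) / step_mhz)


BOOST_MHZ = 4300  # Core i5-10400F boost clock quoted in this post

one_third_of_boost = BOOST_MHZ / 3  # roughly 1433MHz
offset = nearest_offset_for_target(BOOST_MHZ, one_third_of_boost)
print(offset, BOOST_MHZ - offset * 100)  # 29 1400

# The half-remembered 1.2GHz efficiency figure would map to this offset instead
print(nearest_offset_for_target(BOOST_MHZ, 1200))  # 31
```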