AMD RyZen CPU Architecture for 2017

I'd like to see the same thing with a gaming workload (or 10) but Windows 10's scheduler does not seem to be the culprit:

https://www.pcper.com/reviews/Processors/AMD-Ryzen-and-Windows-10-Scheduler-No-Silver-Bullet

Cross-referencing with some ping-pong latency measurements from before, it seems that Zen is measurably slower than AMD's prior cores but competitive with Intel for in-module latency, and as slow as or slower than it has ever been for transfers crossing the uncore.

https://forum.beyond3d.com/threads/amd-bulldozer-core-patent-diagrams.45981/page-53
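
For reference, this kind of ping-pong measurement is straightforward to approximate. Below is a minimal sketch of one (C++17 on Linux, and assuming logical CPUs 0-3 sit on one CCX and 4-7 on the other; nothing here queries the real topology, so the mapping should be confirmed with lstopo or /proc/cpuinfo first). Two threads pinned to chosen cores bounce a flag back and forth, and the same-CCX vs. cross-CCX gap falls out of the per-hop time:

#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <pthread.h>
#include <sched.h>

static std::atomic<int> token{0};

static void pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

// Each thread waits for the token to hold its ID, then passes it back.
static void bounce(int cpu, int me, int other, long iters) {
    pin_to_cpu(cpu);
    for (long i = 0; i < iters; ++i) {
        while (token.load(std::memory_order_acquire) != me) { /* spin */ }
        token.store(other, std::memory_order_release);
    }
}

int main() {
    const long iters = 1000000;
    // Same-CCX pair (0,1) vs. cross-CCX pair (0,4) under the assumed mapping.
    for (auto pair : {std::pair<int,int>{0, 1}, std::pair<int,int>{0, 4}}) {
        token = 0;
        auto t0 = std::chrono::steady_clock::now();
        std::thread t(bounce, pair.second, 1, 0, iters);
        bounce(pair.first, 0, 1, iters);
        t.join();
        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                      std::chrono::steady_clock::now() - t0).count();
        std::printf("CPU %d <-> CPU %d: %.1f ns per one-way hop\n",
                    pair.first, pair.second, ns / (2.0 * iters));
    }
}

Compile with g++ -O2 -pthread; comparing the (0,1) and (0,4) pairs should reproduce the in-module vs. cross-uncore difference discussed above.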
 
Here is an (extremely short) test of the memory speed impact:

http://www.legitreviews.com/ddr4-me...tform-best-memory-kit-amd-ryzen-cpus_192259/4

'In the system we saw a jump in performance between DDR4-2133 and DDR4-3200 by an impressive 16%. The AMD Ryzen 7 1700 processor overclocked up to 4 GHz certainly likes the memory bandwidth for 1080P gaming. The bad news is that by the time you reach 2560 x 1440 (1440P), the system is more GPU bottlenecked, so memory clock speed didn't impact performance at all.'

And another short test of the performance degradation between CCXs:

https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/page-21#post-38789965
 
Interesting! I guess someone needs to take a deeper look at games, then. I wonder whether using a single CCX is beneficial purely due to the reduced inter-thread latency, or because it allows for higher turbo clocks (thanks to the second CCX's ability to turn off).
 
If you read the links I posted, you would notice a ~20% improvement in draw calls using one CCX rather than both.

Also, this means one should not make statements like this without being completely sure and without extensive testing.

One thing that I don't understand is that the numbers don't really add up: if we take all these "bugs" (game code, the Windows scheduler, etc.) and add up all those percentages, we would end up with a CPU on par with the 7700K (or am I wrong?), and that just doesn't add up, since the differences in IPC and frequency are just too high.
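
For what it's worth, the one-CCX draw-call claim is cheap to sanity-check: restrict the benchmark process to a single CCX's logical processors and compare against an unrestricted run. A minimal Windows sketch, assuming logical processors 0-7 (cores 0-3 with SMT) form CCX0 on an 1800X; the actual grouping should be verified with GetLogicalProcessorInformationEx:

#include <windows.h>
#include <cstdio>

int main() {
    // Logical processors 0-7 == cores 0-3 with SMT == CCX0 under the
    // assumed layout; child processes started from here inherit the mask.
    DWORD_PTR ccx0_mask = 0x00FF;
    if (!SetProcessAffinityMask(GetCurrentProcess(), ccx0_mask)) {
        std::printf("SetProcessAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }
    std::printf("Affinity limited to CCX0; launch the draw-call test now.\n");
    return 0;
}

Any benchmark launched from this process inherits the mask, so the ~20% delta should either reproduce or not. (Task Manager's "Set affinity" does the same thing by hand.)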
 
I have not seen a way to consistently measure these percentages across different sources, many of which involve mechanisms we have very limited awareness of. The percentages are fuzzy and come from scattered tests and different situations, and many of them stem from the same factors and would be double-counted to some extent if combined.

Also, it's not all negative. Hopping CCXs in the "bugged" case doubles the number of cores and L3, which has a significant benefit even if masked by the other issues.
 
Yes, but if we take, for example, the improvements from faster DDR and the scheduler, that's something like 10 to 15 or even 20% in an ideal situation? Given that the 7700K is only about 7 to 15% faster, that is just too good to be true for AMD.

This is surely the funniest launch in a long, long time.

 
Interesting! I guess someone needs to take a deeper look at games, then. I wonder whether using a single CCX is beneficial purely due to the reduced inter-thread latency, or because it allows for higher turbo clocks (thanks to the second CCX's ability to turn off).
The largest single drop for the 1800X's clocks in AMD's turbo and XFR slide comes when the active core count goes beyond 2, rather than with the CCX count (300 MHz outright, and up to 100 MHz from the upper range of XFR).
AMD's clocking solution seems rather inflexible in this regard, and one area where it seems to be hurt by Windows is any twitchy behavior when it comes to parking cores. It apparently cannot handle having additional cores powered up without losing its upper clock bins, although the chips are pushing the physical limit so much that this is quibbling over 10% of the max clock. After that, Ryzen is apparently stuck scraping out what it can between 3.6 and 3.7 GHz.
Perhaps game developers can provide insight into how much it's worth trying to fight for this if a game is able to scale its demands above 2 cores.

The CCX issue is an interesting one, although seemingly dependent on the level of inter-thread communication and the exact flow.
Perhaps if there's some way of imposing a hierarchical flow or a natural trend towards parallel reduction, one could reduce the impact of cross-CCX communication by subdividing threads into two pools, with threads feeding their results upwards in the hierarchy, until there's one coalescing thread per CCX, and then they coordinate or feed into a master resident on one of them. It would seem that one core's traffic between CCXs would be survivable.
However, that would require sufficient control over where threads are going, or a natural way for threads to quickly identify where they are relative to the CCXs and each other without a lot of overhead. The more rigid thread assignments, convenient ID calculations, and limited context switching of GPUs are one environment where a thread can readily determine where it is in the grand scheme of things.
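
As a toy illustration of the two-pool idea (C++ on Linux again, and again assuming CCX0 covers logical CPUs 0-7 and CCX1 covers 8-15, with the real mapping to be verified): each worker is pinned inside its own CCX and reduces only its own slice, and a single combining pass over one small result per worker stands in for the per-CCX coalescing threads, so only a handful of values ever cross the fabric:

#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>
#include <pthread.h>
#include <sched.h>

static void pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main() {
    const int ccxs = 2, workers = 4;          // 4 worker threads per CCX
    std::vector<long> data(1 << 20);
    std::iota(data.begin(), data.end(), 0L);
    const size_t chunk = data.size() / (ccxs * workers);

    long partial[2][4] = {};                  // one small result per worker
    std::vector<std::thread> pool;
    for (int c = 0; c < ccxs; ++c)
        for (int w = 0; w < workers; ++w)
            pool.emplace_back([&, c, w] {
                pin_to_cpu(c * 8 + w);        // CPUs 0-3 on CCX0, 8-11 on CCX1
                size_t begin = (size_t)(c * workers + w) * chunk;
                long s = 0;
                for (size_t i = begin; i < begin + chunk; ++i)
                    s += data[i];             // each worker touches only its own slice
                partial[c][w] = s;
            });
    for (auto& t : pool) t.join();

    // This pass stands in for a resident coalescing thread per CCX: only
    // eight small results, not the working set, move between cores here.
    long ccx_sum[2] = {};
    for (int c = 0; c < ccxs; ++c)
        for (int w = 0; w < workers; ++w)
            ccx_sum[c] += partial[c][w];
    std::printf("total = %ld\n", ccx_sum[0] + ccx_sum[1]);
    return 0;
}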

Consoles can do more to assign threads, and they seem to give their threads more time in general. Naughty Dog's presentation on their job system does show that they aren't totally immune to context switches, although it seems to be more occasional and it doesn't leave the overall environment a mystery.
If there were to be a benefit to a Windows 10 game mode, something like helping with this wouldn't hurt. Whether it's worth fighting Zen's topology for the games that don't hit the sore spot too much, particularly when most Zen chips in the client market will probably be single-CCX APUs, is a question I suppose we'll see answered in the future.

I am somewhat disconcerted by some of the numbers for Ryzen's interconnect so far. I figured there would be an effect, but seeing latencies reminiscent of AMD's unimpressive multi-module APUs is leading me to question if that uncore has changed, or what exactly they replaced it with.
The memory latency figures are similarly unimpressive, and apparently subject to something of a multisocket penalty in the same die.
I read Ars Technica's Naples article, where the direct-attach strategy for GPUs was compared to NVLink, which would have been a value-add for Ryzen once Vega came out. I may have been reading that too optimistically, however.

I actually can think of some server loads that wouldn't mind Ryzen's structure or inflexible turbo, or that would like the scalability, power, and density something like this (or Naples) enables. Perhaps we'll see more details when Naples launches, but for all the things that were changed, the interconnect and cache protocol seem a little too reminiscent of their predecessors.
 
Apparently Ryzen's memory controller is based on the same IP as the Bristol Ridge block. The Stilt checked various registers and they are identical to the latest APU from AMD.

Also, if it helps, The Stilt described some of the clock domains for the memory and data fabrics in his post over at Anandtech:

UCLK, FCLK & DFICLK default to half of the effective MEMCLK frequency (i.e. DDR-2400 = 1200MHz).
There is a way to configure the memory controller (UCLK) for a 1:1 rate, however that is strictly for debug and therefore completely untested. The end-user has neither the knowledge nor the hardware to change it.
AFAIK FCLK & DFICLK are both fixed and cannot be tampered with. However, certain related fabrics, which run at the same speed, have their own frequency control. The "infinity fabric" (GMI) runs at 4x the FCLK frequency.
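
To make those ratios concrete, here is the arithmetic spelled out, taking the quoted 2:1 MEMCLK relationship and 4x GMI multiplier at face value (shipping firmware may of course behave differently):

#include <cstdio>

int main() {
    // Effective DDR transfer rate in MT/s -> derived clocks per the quote:
    // UCLK = FCLK = DFICLK = half the effective rate, GMI = 4x FCLK.
    for (int ddr : {2133, 2400, 2666, 3200}) {
        int fclk = ddr / 2;
        std::printf("DDR4-%d: UCLK/FCLK/DFICLK = %d MHz, GMI = %d MHz\n",
                    ddr, fclk, 4 * fclk);
    }
    return 0;
}

which prints, among others, the DDR-2400 -> 1200 MHz example from the quote.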
 
'Finally, we have reviewed the limited available evidence concerning performance deltas between Windows® 7 and Windows® 10 on the AMD Ryzen™ CPU. We do not believe there is an issue with scheduling differences between the two versions of Windows. Any differences in performance can be more likely attributed to software architecture differences between these OSes.'

I don't understand. AMD said there is no problem in W10, yet they say that the 10% improvement we see using Windows 7 is due to 'software architecture differences between these OSes'.
 

I do not believe it is a flat 10%, and the statement indicates they do not believe there is an issue with scheduling differences. There seem to be some oddities that would be suggestive of at least some scheduling impact, but AMD would be motivated to be diplomatic about what it says concerning the people writing any possible scheduling fixes.

Taken at face value, it may be stating there are other layers to the OS that matter besides the scheduler. Core parking is not the same routine as the scheduler, for example, and neither would system functions, drivers, or synchronization routines. Windows 10 did show a tendency to underperform Windows 7 when it launched, and perhaps the work done to improve this on known platforms doesn't map the same way to a new CPU.
 

It is possible, because there is a difference in performance; not in every single situation, but it is there nevertheless. I think AMD could have written that article in a more marketing-friendly way, though. In any case, the tests done by serious users, and not just random YouTubers, show a difference in some cases, but AMD did not name the cause or a possible fix, which is a flaw in my opinion.
 
I don't understand. AMD said there is no problem in W10 yet they say that the 10% improvement we see using windows 7 is "software architecture differences between these OSes'
Same thing happened with Windows XP and Vista/7 for a while. Hardcore gamers refused to upgrade because they dropped 5-10% in fps.
 