AMD Ryzen CPU Architecture for 2017

Yeah but almost 20% is crazy. I think Ryzen is the paradise for ram makers.
Witcher 3 is another title to benefit a lot from RAM speed. From the tests I've seen so far the RAM makes little difference at high resolutions in most titles but there's clearly a benefit to going as high speed as possible.
 
Consoles give developers more control over thread assignment, and their threads generally get more uninterrupted time. Naughty Dog's presentation on their job system does show that they aren't totally immune to context switches, although these seem to be occasional and the overall environment is at least not a mystery.
Core-locked threads + fibers are pretty good for cache locality (both L1 & L2). You have 7 core-locked worker threads (1 core reserved for the OS): 4 on the first cluster, 3 on the other. Data migration between the two 4-core clusters really hurts on Jaguar. There are several ways to solve this.

Easy improvements include:
- Schedule top-level tasks to a selected cluster (for example rendering & physics to cluster 0). IIRC a commercial physics lib is configured by default to use only cluster 0.
- Restrict sub-tasks to the same cluster (including wide parallel-for loops).
- The work stealer prefers stealing from the same cluster. Only steal from threads on the other cluster if work can't be found otherwise.

With SMT you'd want to steal work first from the neighboring SMT thread, because it shares L1 and L2 cache (on both Intel and Ryzen). Then steal from other threads in the same cluster (8 threads out of 16 on Ryzen). Steal from everybody only as a last resort (otherwise you would just stall). Prefer stealing top-level tasks from the other cluster instead of pieces of parallel-for loops. Each task should carry data-dependency info (low, medium, high) describing how dependent it is on its parent task's data. This is just a rough estimate that programmers can tune if they notice bad cache behavior. Scheduling and stealing would use the data-dependency info to optimize cache usage. This is a pretty simple way to solve most of the problems. On PC you of course need to trust the Windows API that reports the core and cache configuration. Unfortunately this API currently gives wrong info for Ryzen; hopefully a fix is coming soon.
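A minimal sketch of that victim-selection order, assuming a Ryzen-like topology where logical threads 0–15 come in SMT pairs (0,1), (2,3), … and threads 0–7 / 8–15 sit on the two clusters (all function names here are hypothetical, not from any real job system):

```python
# Toy work stealer that prefers victims sharing more cache with the thief:
# SMT sibling first (shared L1/L2), then same cluster, then everyone else.

def smt_sibling(tid):
    return tid ^ 1  # SMT pairs share L1/L2

def cluster(tid):
    return tid // 8  # 8 hardware threads per cluster (Ryzen-like)

def steal_order(thief, all_threads):
    """Victims ordered: SMT sibling, then same cluster, then other cluster."""
    sib = smt_sibling(thief)
    same = [t for t in all_threads
            if t not in (thief, sib) and cluster(t) == cluster(thief)]
    other = [t for t in all_threads if cluster(t) != cluster(thief)]
    return [sib] + same + other

order = steal_order(0, list(range(16)))
print(order[:3])  # → [1, 2, 3]: sibling first, then cluster-0 neighbors
```

A real scheduler would additionally weight the choice by the task's data-dependency hint, e.g. only letting a "high"-dependency sub-task be stolen by the first two tiers.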

Some task-graph systems track read/write resources (like the new DICE GDC presentation). This allows you to schedule tasks to SMT cores and clusters more efficiently. A resource = a chunk of memory. The scheduler could simply try to keep accesses to each resource within the same cluster. Not a problem if you build the task graph at the beginning of each frame; this is simply a graph sorting/binning problem.
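The binning idea could be sketched like this (hypothetical names and a greedy heuristic for illustration; a real task graph would also honor execution dependencies):

```python
# Toy per-frame binning: assign each task to the cluster that already
# "owns" most of the resources (memory chunks) the task reads/writes.
from collections import defaultdict

def bin_tasks(tasks, num_clusters=2):
    """tasks: list of (name, {resource ids}). Returns {name: cluster index}."""
    owner = {}                 # resource -> cluster that first touched it
    load = [0] * num_clusters  # simple load-balancing tiebreaker
    assignment = {}
    for name, resources in tasks:
        votes = defaultdict(int)
        for r in resources:
            if r in owner:
                votes[owner[r]] += 1
        # prefer the cluster where most of this task's data already lives,
        # otherwise fall back to the least-loaded cluster
        cluster = max(votes, key=votes.get) if votes else load.index(min(load))
        for r in resources:
            owner.setdefault(r, cluster)
        load[cluster] += 1
        assignment[name] = cluster
    return assignment

frame = [("physics", {"bodies", "contacts"}),
         ("render", {"meshes"}),
         ("physics2", {"bodies"})]
print(bin_tasks(frame))  # → {'physics': 0, 'render': 1, 'physics2': 0}
```

Since the graph is rebuilt every frame, a cheap single pass like this is enough; "physics2" lands on cluster 0 because "bodies" was already touched there.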
 
'Finally, we have reviewed the limited available evidence concerning performance deltas between Windows® 7 and Windows® 10 on the AMD Ryzen™ CPU. We do not believe there is an issue with scheduling differences between the two versions of Windows. Any differences in performance can be more likely attributed to software architecture differences between these OSes.'

I don't understand. AMD said there is no problem in W10, yet they say the ~10% improvement we see using Windows 7 is due to "software architecture differences between these OSes".
"Don't believe anything until AMD denies it."

SCNR.:runaway:
 
'Finally, we have reviewed the limited available evidence concerning performance deltas between Windows® 7 and Windows® 10 on the AMD Ryzen™ CPU. We do not believe there is an issue with scheduling differences between the two versions of Windows. Any differences in performance can be more likely attributed to software architecture differences between these OSes.'

I don't understand. AMD said there is no problem in W10, yet they say the ~10% improvement we see using Windows 7 is due to "software architecture differences between these OSes".
Where exactly do they say there's 10% improvement over using Win7? They say "any differences", no matter if they're positive or negative, and IIRC there's plenty of results both ways
 
https://community.amd.com/community...4/tips-for-building-a-better-amd-ryzen-system

Mind Your Power Plan
Make sure the Windows® 10 High Performance power plan is being used (picture). The High Performance plan offers two key benefits:

  1. Core Parking OFF: Idle CPU cores are instantaneously available for thread scheduling. In contrast, the Balanced plan aggressively places idle CPU cores into low power states. This can cause additional latency when un-parking cores to accommodate varying loads.
  2. Fast frequency change: The AMD Ryzen™ processor can alter its voltage and frequency states in the 1ms intervals natively supported by the “Zen” architecture. In contrast, the Balanced plan may take longer for voltage and frequency changes due to software participation in power state changes.

In the near term, we recommend that games and other high-performance applications are complemented by the High Performance plan. By the first week of April, AMD intends to provide an update for AMD Ryzen™ processors that optimizes the power policy parameters of the Balanced plan to favor performance more consistent with the typical usage models of a desktop PC.
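For reference, switching plans can also be done from an elevated command prompt; the GUID below is the stock Windows High Performance scheme (verify against your own `powercfg /list` output):

```shell
:: List available power schemes, then activate High Performance
powercfg /list
powercfg /setactive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c
```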

There are differences between W7's and W10's power plans, so "balanced" on W10 can hurt the performance.
 
https://community.amd.com/community...4/tips-for-building-a-better-amd-ryzen-system



There are differences between W7's and W10's power plans, so "balanced" on W10 can hurt the performance.
Throughout this process we also discovered that F1™ 2016 generates a CPU topology map (hardware_settings_config.xml) when the game is installed. This file tells the game how many cores and threads the system’s processor supports. This settings file is stored in the Steam™ Cloud and appears to get resynced on any PC that installs F1™ 2016 from the same Steam account. Therefore: if a user had a 4-core processor without SMT, then reused that same game install on a new AMD Ryzen™ PC, the game would re-sync with the cloud and believe the new system is also the same old quad core CPU.



Only a fresh install of the game allowed for a new topology map that better interpreted the architecture of our AMD Ryzen™ processor. Score one for clean computing! But it wasn't a complete victory. We also discovered that the new and better topology map still viewed Ryzen™ as a 16-core processor, rather than an 8-core processor with 16 threads. Even so, performance was noticeably improved with the updated topology map, and performance went up from there as we threw additional changes into the system.
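The failure mode described above is a generic stale-cache bug: topology is probed once at install time, written to a settings file, and then trusted forever, even on different hardware. A hypothetical sketch of the pattern (this is not the real hardware_settings_config.xml format, and the filename below is a stand-in):

```python
# Toy version of the stale topology cache: probe once, reuse forever.
import json
import os

CACHE = "hardware_settings_config.json"  # stand-in for the game's XML file

def probe_topology():
    # os.cpu_count() reports *logical* processors (16 on an 8C/16T Ryzen),
    # which also mirrors the "16 cores instead of 8C/16T" confusion above.
    return {"logical_cores": os.cpu_count() or 1}

def load_topology():
    # Bug reproduced: a cached file from an old 4-core machine wins over
    # the actual hardware until the file is deleted (i.e., a fresh install).
    if os.path.exists(CACHE):
        with open(CACHE) as f:
            return json.load(f)
    topo = probe_topology()
    with open(CACHE, "w") as f:
        json.dump(topo, f)
    return topo
```

Re-probing on every startup, or at least validating the cached value against the current machine, avoids the bug entirely.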

Wut! o_O Would be interesting to see if this applies to other games (even if it's probably minor, as I hope/think most benchmarks were done on a clean install)
 
Would be easy for that game to be patched to trigger generation of the topology map on startup.

Next people would come to complain that switching CPUs while playing is not supported :D
 
Would be easy for that game to be patched to trigger generation of the topology map on startup.

Next people would come to complain that switching CPUs while playing is not supported :D
The game has to be uninstalled and reinstalled, which is often not what happens: most players keep their Steam games library on a different HD than the one the OS is installed on, so they don't have to re-download everything every time they recover/format/whatever their Windows install.
 
Where exactly do they say there's 10% improvement over using Win7? They say "any differences", no matter if they're positive or negative, and IIRC there's plenty of results both ways
Just look around and you will find tests with differences around 10%. And I'm not talking about random YouTubers. I think even The Stilt made some tests.

Sent from my HTC One using Tapatalk
 
It is possible, because there is a difference in performance; not in every single situation, but there is one nevertheless. I think AMD could have written that article in a more marketing-friendly way though. In any case, the tests done by serious users (not just random YouTubers) show a difference in some cases, but AMD did not state the cause or a possible fix for them, which is a flaw in my opinion.
The two operating systems are complex systems, so there is going to be a difference. Whether AMD considers that an issue is not the same thing. There are trade-offs that might hurt performance in other non-game scenarios, or platform power efficiency (going for no core parking for example) that could be an issue for someone else. Given the rawness of the platform, AMD may simply assume that it's going to be rough in some place regardless. AMD's general statements are that there are various secondary issues, and in a few cases individual fixes, but not a singular cause.

Yeah but almost 20% is crazy. I think Ryzen is the paradise for ram makers.
Currently, it is not. Multiple reviews have it capping out below DDR4-3200, and one compatibility constraint being cited is the fixed 1T command rate.
This appears to be an area where the immaturity of the architecture is evident, and memory controllers and interconnect are areas where a lot of tweaking happens under the hood late in development.
Ryzen's dependence on this may indicate that AMD was hoping the chip could reliably reach the next speed grades, but that so far this is the best they could do stably.
The Anandtech thread has discussion of internal debug modes for 1:1 clocking of the fabric with the headline DRAM speed, although that might be more of a room-to-grow testing option for some future instance--or for some chip with much lower clocks. Getting clocks closer to the core speeds does lower latency somewhat, and perhaps some future review can monitor whether there are certain sweet spots where the fabric, MC, and core clocks are at convenient multiples that minimize synchronization cycles.
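A toy model of why integer clock ratios would matter: assume an asynchronous clock-domain crossing pays an async-FIFO penalty of a couple of cycles of the slower clock, while an integer-ratio crossing can be synchronous and pay only one. The cycle counts and frequencies below are illustrative assumptions, not measured Ryzen figures:

```python
# Illustrative only: added latency (ns) for a core <-> fabric/MC
# clock-domain crossing, depending on whether the ratio is an integer.

def crossing_ns(core_mhz, fabric_mhz, sync_cycles=1, async_cycles=2):
    """If the clocks are at an integer multiple the crossing can be
    synchronous (cheap); otherwise assume an async-FIFO penalty."""
    ratio = core_mhz / fabric_mhz
    slower_ns = 1000.0 / min(core_mhz, fabric_mhz)
    if ratio == int(ratio):
        return sync_cycles * slower_ns
    return async_cycles * slower_ns

# 3200 MHz core with fabric at 1600 MHz (clean 2:1) vs 1467 MHz (non-integer)
print(round(crossing_ns(3200, 1600), 3))  # → 0.625
print(round(crossing_ns(3200, 1467), 3))  # → 1.363
```

Under this model, sweeping DRAM/fabric speeds would show latency dips wherever the ratio to the core clock lands on a convenient multiple, which is exactly the kind of sweet spot a review could look for.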
 
Yes, right now it is hard to find a 3200 kit that works on Ryzen, but in the future users will have a real reason to buy (super expensive) fast RAM kits, in contrast to Intel where the difference is very small.

Also, more than 32% in F1 from worst-case scenario to best-case scenario is huge; that's like a 2-generation jump in performance (5 with Intel's strategy :D).

I think I will wait until Ryzen 2 to buy a system, or maybe get a 4/8 chip for ~150 when they come out and upgrade from there.
 
The two operating systems are complex systems, so there is going to be a difference. Whether AMD considers that an issue is not the same thing. There are trade-offs that might hurt performance in other non-game scenarios, or platform power efficiency (going for no core parking for example) that could be an issue for someone else. Given the rawness of the platform, AMD may simply assume that it's going to be rough in some place regardless. AMD's general statements are that there are various secondary issues, and in a few cases individual fixes, but not a singular cause.


Currently, it is not. Multiple reviews have it capping out below DDR4-3200, and one compatibility constraint being cited is the fixed 1T command rate.
This appears to be an area where the immaturity of the architecture is evident, and memory controllers and interconnect are areas where a lot of tweaking happens under the hood late in development.
Ryzen's dependence on this may indicate that AMD was hoping the chip could reliably reach the next speed grades, but that so far this is the best they could do stably.
The Anandtech thread has discussion of internal debug modes for 1:1 clocking of the fabric with the headline DRAM speed, although that might be more of a room-to-grow testing option for some future instance--or for some chip with much lower clocks. Getting clocks closer to the core speeds does lower latency somewhat, and perhaps some future review can monitor whether there are certain sweet spots where the fabric, MC, and core clocks are at convenient multiples that minimize synchronization cycles.

How likely do you think it is that we'll see a new stepping with improved memory/interconnect performance before Ryzen 2 is out?
 
Yes, right now it is hard to find a 3200 kit that works on Ryzen, but in the future users will have a real reason to buy (super expensive) fast RAM kits, in contrast to Intel where the difference is very small.
Hopefully AMD's tweaks allow for stable operation at higher speed grades and more DIMMs. The official support at even the sub-3200 speeds for a single-rank DIMM per channel would leave customers buying fewer of these high-speed modules, which might not make a kit maker happy.

How likely do you think it is that we'll see a new stepping with improved memory/interconnect performance before Ryzen 2 is out?
I'm not sure how important continuity of DIMM support within speed grades counts, although presumably AMD would want healthier DRAM speed and capacity support per channel for Naples, if they have similar controllers and fabric.
There's also Raven Ridge as a product that would have a stronger case for robust memory support.

I keep wondering when a 16-core quad-channel product is supposed to show up. The gap between Ryzen and the announced Naples is vast in core and channel count, and the intermediate core and channel count is what AMD's design philosophy leaves as a way to match Intel's quad-channel sockets.
 
Validation of server parts is very complex and difficult; AMD needs to get it right on the first try, so they have to take the time to do it properly. You also need an ecosystem, and right now partners are busy fixing the few models they launched and are not very happy with the way AMD treated them in this launch.

So Naples is still a way off and there is no date, so I would expect end of the year maybe?

Sent from my HTC One using Tapatalk
 
Validation of server parts is very complex and difficult; AMD needs to get it right on the first try, so they have to take the time to do it properly. You also need an ecosystem, and right now partners are busy fixing the few models they launched and are not very happy with the way AMD treated them in this launch.

So Naples is still a way off and there is no date, so I would expect end of the year maybe?

Sent from my HTC One using Tapatalk

Per AMD, Naples is to be available in Q2 2017.
http://www.anandtech.com/show/11183...aples-cpus-for-1p-and-2p-servers-coming-in-q2
 
Hopefully AMD's tweaks allow for stable operation at higher speed grades and more DIMMs. The official support at even the sub-3200 speeds for a single-rank DIMM per channel would leave customers buying fewer of these high-speed modules, which might not make a kit maker happy.


I'm not sure how important continuity of DIMM support within speed grades counts, although presumably AMD would want healthier DRAM speed and capacity support per channel for Naples, if they have similar controllers and fabric.
There's also Raven Ridge as a product that would have a stronger case for robust memory support.

I keep wondering when a 16-core quad-channel product is supposed to show up. The gap between Ryzen and the announced Naples is vast in core and channel count, and the intermediate core and channel count is what AMD's design philosophy leaves as a way to match Intel's quad-channel sockets.

I think I've read somewhere that there will be 16-core (and 24-core?) versions of Naples available as well. That's what you'd expect, anyway.
 
Just look around and you will find tests with differences around 10%. And I'm not talking about random YouTubers. I think even The Stilt made some tests.

Sent from my HTC One using Tapatalk
I'm sure I can, and I'm sure I can also find tests where the difference isn't around 10%. You shouldn't take any single benchmark, no matter who did it, as gospel.
 
On further reflection, the Naples system used by AMD had DRAM density and speed matching the DIMM count for Ryzen at 2400. The chip Naples faced off against isn't any faster, so Ryzen's relatively pedestrian speeds for enthusiasts could be related to the datacenter market being more important than high-speed gamer kits.
 