AMD RyZen CPU Architecture for 2017

Don't forget that AMD's SMT performs better than HT, so it's not only a core-count advantage: the extra threads from SMT also perform better than the extra threads from HT.

Hyper-Threading is simultaneous multithreading; it's just Intel's brand name for it.

The reason Ryzen is relatively faster in multithreading is that it has a wider execution core. In a single-threaded scenario you're in part limited by the ILP of the code running. Ryzen, because it is wider, has more resources idling, which can be used when executing two threads.

Cheers
 
Hyperthreading is simply Intel's marketing name for their SMT implementation. I don't see any big differences between Intel's and AMD's SMT implementations. Do you have links to professional workload benchmarks showing better scaling with AMD's SMT implementation vs Intel's?

https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/#post-38770120

GzZdx4q.png


Depending on the application it can be higher or lower.
 
A 3% advantage in professional software is pretty good, but not a big difference. Also, this figure doesn't include results from software with negative gains. You have to take into account that Ryzen also dips a bit more in worst cases when SMT is enabled (games, for example). Intel has tuned their SMT implementation over 6 generations to reduce performance drops. Nehalem and Sandy Bridge dropped a lot more in the worst case than Skylake/Kaby Lake. Gamers commonly disabled HT back then. Some core resources were shared in the first HT implementations and are now dedicated per thread. I am sure there are lots of minor tweaks that are not disclosed. I would guess that AMD will also tune their implementation in Zen 2 and Zen 3 to avoid SMT performance drops in worst cases. It's good to see that their SMT already works very well in software that benefits the most. This is definitely good news for server workloads.

My educated guess as to why SMT helps Ryzen a bit more than Intel:
AMD has higher L3 and memory latency than Intel. SMT helps hide L3 and memory latency. The most likely explanation is that we see bigger SMT gains simply because the baseline memory latency is higher for Ryzen. SMT helps bring core utilization closer to Intel's.
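
A toy model makes the latency argument concrete. It is only a sketch under made-up assumptions: each thread alternates a fixed burst of compute with a fixed memory stall, a second thread can fill that stall perfectly, and the cycle counts are hypothetical rather than measured Ryzen or Intel figures.

```python
def core_utilization(compute_cycles, stall_cycles, threads=1):
    """Fraction of cycles the core does useful work in the toy model.

    Each thread repeats `compute_cycles` of work followed by
    `stall_cycles` waiting on memory. Other threads may run during a
    stall, but the core can never be more than 100% busy.
    """
    period = compute_cycles + stall_cycles
    return min(1.0, threads * compute_cycles / period)


def smt_gain(compute_cycles, stall_cycles):
    """Throughput of two threads per core relative to one thread."""
    one = core_utilization(compute_cycles, stall_cycles, threads=1)
    two = core_utilization(compute_cycles, stall_cycles, threads=2)
    return two / one


# Hypothetical cycle counts, chosen only to show the trend.
lower_latency_gain = smt_gain(compute_cycles=100, stall_cycles=40)
higher_latency_gain = smt_gain(compute_cycles=100, stall_cycles=60)

print(f"lower latency:  {lower_latency_gain:.2f}x from SMT")
print(f"higher latency: {higher_latency_gain:.2f}x from SMT")
```

In this model the core with the longer stall has the lower single-thread utilization, so it has more idle cycles for the second thread to fill and shows the larger SMT gain. Real cores overlap misses and share resources, so treat this as an illustration of the direction, not the size, of the effect.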
 

Ryzen has twice as much I$, which might make a difference when you have two contexts stomping around. It also has twice the L2 cache and, just as importantly, twice the associativity. Sky/KabyLake's L2 is only four-way associative. Ideally you want to have three ways per context for when the code, stack and heap segments alias, or you might end up evicting hot cache lines because you run out of associativity ways.
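
The associativity point can be sketched with a single cache set. This is a worst-case illustration under assumptions: true LRU replacement (real L2 caches typically use cheaper approximations), two contexts whose code, stack and heap lines all happen to map to the same set, and a strict round-robin access pattern. The function name is made up for the example.

```python
from collections import OrderedDict


def hits_in_one_set(ways, accesses):
    """Count hits for a stream of line tags that all map to one LRU set."""
    lru = OrderedDict()  # oldest entry first, most recently used last
    hits = 0
    for tag in accesses:
        if tag in lru:
            hits += 1
            lru.move_to_end(tag)  # mark as most recently used
        else:
            if len(lru) >= ways:
                lru.popitem(last=False)  # evict the least recently used line
            lru[tag] = True
    return hits


# Two contexts x (code, stack, heap): six hot lines aliasing to one set.
hot_lines = [(ctx, seg) for ctx in (0, 1) for seg in ("code", "stack", "heap")]
stream = hot_lines * 100  # 600 accesses, round-robin over the six lines

four_way_hits = hits_in_one_set(4, stream)
eight_way_hits = hits_in_one_set(8, stream)

print(f"4-way: {four_way_hits} hits out of {len(stream)}")
print(f"8-way: {eight_way_hits} hits out of {len(stream)}")
```

With four ways the six hot lines keep evicting each other, so every access misses; with eight ways only the first six cold accesses miss. Real workloads rarely alias this badly, which is why this is a sketch of the failure mode rather than a prediction.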

The aggregate scheduling resources are also bigger for Ryzen than Sky/Kabylake, something that might make a difference once you're limited by LS throughput and have to schedule around extended latencies. Also, according to Agner Fog, execution throughput, AVX instructions excepted, is higher for Ryzen than any Intel core.

Currently Ryzen does a pretty good job at running code optimized for Intel microarchitectures. It'll be interesting to see what performance increases (if any) are achieved once compilers start to accommodate Ryzen.

Cheers
 
Hope this hasn't already been posted:
Ryzen Memory Latency's Impact on Weak 1080p Gaming
https://www.pcper.com/reviews/Processors/Ryzen-Memory-Latencys-Impact-Weak-1080p-Gaming

Fair to note that since launch we've already seen AMD reduce memory latency with the first AGESA microcode update (1.0.0.4a). It'll be interesting to see how this develops. Memory latency was my first guess as well, as it is the single biggest difference between Intel and AMD in mem performance at the moment.
 
Yes, there's a significant difference in memory latency. I would guess that this is the biggest single reason for better SMT performance. SMT's main purpose is to hide memory latency. But as Gubbi said earlier, there are other reasons as well. Larger I$ and L2$ are definitely also helpful when running two threads concurrently on the same CPU core.
 
Core i7-6950X (10 cores, 3.0 GHz) costs $1723. It has 62.5% of the core count and a lower clock rate. Rumors (a few months ago) speculated on a 999$ price point for the 16 core (32 thread) Ryzen flagship. 999$ would be a steal for this CPU. At that price point this would sell like hot cakes. I would assume that quad channel memory solves Ryzen's memory bottlenecks. Eagerly waiting for benchmarks.
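
For anyone who wants the arithmetic behind those figures spelled out, here is a quick sketch. The $999 figure is only the rumor mentioned above, not a confirmed price, and the helper names are made up for the example.

```python
def core_ratio(cores_a, cores_b):
    """Core count of CPU A as a fraction of CPU B's."""
    return cores_a / cores_b


def price_per_core(price_usd, cores):
    """Price in USD divided by the number of physical cores."""
    return price_usd / cores


# Core i7-6950X: 10 cores at $1723 (from the post above).
# 16-core Ryzen flagship: $999 is a RUMORED price, not confirmed.
intel_core_share = core_ratio(10, 16)        # 0.625, i.e. 62.5%
intel_usd_per_core = price_per_core(1723, 10)
amd_usd_per_core = price_per_core(999, 16)

print(f"6950X has {intel_core_share:.1%} of the cores")
print(f"6950X: ${intel_usd_per_core:.2f} per core")
print(f"16-core Ryzen (rumored price): ${amd_usd_per_core:.2f} per core")
```

Price per core ignores clock speed, platform cost and memory channels, so it is only a rough way to compare the two.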

It's going to be interesting to see how Intel prices the forthcoming i9 CPUs, especially the 12 core flagship. That's going to be the main competitor for the 16 core AMD CPU. Maybe they need to lower prices a bit. I'd expect something around 1500$. Even at that price point, it would be a steal compared to current 12 core (single socket) Xeon flagship (which is 2 gens older architecture and lower clock).
Oh yea I am sure even at 1k it would be a steal I just can't go above $700 lol
 
Currently Ryzen does a pretty good job at running code optimized for Intel microarchitectures. It'll be interesting to see what performance increases (if any) are achieved once compilers start to accommodate Ryzen.
Support for AMD Family 17h processors (“Zen” core)
•Extends LLVM 4.0 (llvm.org) with enhancements and optimizations

•Improved vectorization, high-level optimizer and code generation
•Improved whole program optimization

•Enhanced and well supported DragonEgg Fortran frontend
http://developer.amd.com/tools-and-sdks/cpu-development/amd-optimizing-cc-compiler/

Haven't seen any benches for that yet, but if you're using clang, some optimizations already exist and are still being upstreamed, according to AMD devs.
 
Jim Anderson just showed a slide that claims the Ryzen ultra-mobile SoCs coming Q3 will bring Vega graphics, 55% more CPU performance, 40% more GPU performance and 50% less power consumption.
A dual-core RR with SMT should be 50% faster in Cinebench 15 (MT) than a Bristol Ridge APU with two Excavator modules ("four cores") = you get 50% higher numbers in Cinebench. And AMD loves Cinebench.

I won't be surprised if a possible mobile RR APU with 2 cores plus SMT and 6 CUs (TDP of 15W) delivers better real-world performance than Intel's mobile U-processors with HD 620.

I'm guessing it's the big Raven Ridge (4-core, 11CU) that will consume half of Intel's current 45W models (22.5W).
Why are you guessing this?
 
Yes, there's a significant difference in memory latency. I would guess that this is the biggest single reason for better SMT performance. SMT's main purpose is to hide memory latency. But as Gubbi said earlier, there are other reasons as well. Larger I$ and L2$ are definitely also helpful when running two threads concurrently on the same CPU core.
According to Agner, it is possible to sustain 5 uops a cycle on Zen vs. realistically only 4 on Skylake. Most of the time at that width you're going to be load/store limited, but with SMT I could see that extra real-world width accounting for the difference.

My crazy dream for a Zen follow-on would be to keep everything at 128-bit but add an extra load and store port and change the FPU/PRF to be 12 load ports and an add/mul/fma on every port.

I know it's never going to happen, but if Zen can really sustain higher uop throughput then that would really allow Zen to stretch its legs, and all that 128-bit SIMD would be awesome for games.

Edit: add the smt4 rumor to that for the ultra levels power 9 copy.
 
ps:
At its Financial Analyst Day, AMD just officially announced the new Ryzen ThreadRipper high-end desktop CPU, which will be released this summer: a 16-core, 32-thread CPU.
 
As well as the Epyc server/workstation CPUs, including a 32-core, 64-thread CPU, previously known as Naples.
 
Oh yea I am sure even at 1k it would be a steal I just can't go above $700 lol
A 10/12 might be good for what I want

With that budget, you could go with a X99 board and a Haswell-based Xeon E5 2xxx V3, which you can find at ridiculous prices on ebay.
Slap in the cheapest/slowest DDR4 memory you can find (it's quad-channel anyway), use a motherboard with 8 DIMM slots so you can upgrade up to 128GB total system memory, and you'll have a system to last many years.

They just don't clock very high (3GHz max in that model, I think?), but in a world going towards DX12/Vulkan, at least for games you should be more than fine.
 
With that budget, you could go with a X99 board and a Haswell-based Xeon E5 2xxx V3, which you can find at ridiculous prices on ebay.
That's a pretty nice price. But remember that 14 cores at 2.0 GHz have exactly the same maximum throughput as 8 cores at 3.5 GHz (same architecture). And this is in perfectly multithreaded scenarios (all 28 threads 100% utilized). So don't expect it to match a modern 3.5 GHz 8-core CPU in many applications.
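
The cores-times-clock comparison can be written out as a small sketch. The first function is just the arithmetic from the post. The second uses Amdahl's law, which the post does not mention and is brought in here as an assumption, with a made-up 90% parallel fraction, to show why the two chips stop being equal once any part of the work is serial. Both assume identical per-clock performance.

```python
def peak_throughput(cores, ghz):
    """Aggregate core-GHz, assuming perfect scaling across all cores."""
    return cores * ghz


def relative_runtime(cores, ghz, parallel_fraction):
    """Amdahl's-law runtime in arbitrary units (lower is faster).

    The serial part runs on one core at `ghz`; the parallel part is
    spread evenly over all cores.
    """
    serial = (1.0 - parallel_fraction) / ghz
    parallel = parallel_fraction / (cores * ghz)
    return serial + parallel


xeon_peak = peak_throughput(14, 2.0)   # 14-core Haswell Xeon at 2.0 GHz
eight_peak = peak_throughput(8, 3.5)   # 8-core CPU at 3.5 GHz

# Hypothetical workload that is 90% parallel.
xeon_time = relative_runtime(14, 2.0, parallel_fraction=0.9)
eight_time = relative_runtime(8, 3.5, parallel_fraction=0.9)

print(f"peak: {xeon_peak} vs {eight_peak} core-GHz")
print(f"90% parallel: 14-core takes {xeon_time / eight_time:.2f}x as long")
```

The peaks tie at 28 core-GHz, but the serial 10% runs at 2.0 GHz on one chip and 3.5 GHz on the other, so the higher-clocked 8-core finishes first in this model.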

A lower-clocked, higher-core-count CPU obviously wins the perf/watt race. But this particular CPU is pretty old. Haswell is 22nm. Modern 14nm 8-core CPUs beat the 14-core Haswell in perf/watt as well. I would buy an 8-core Ryzen instead of that 2.0 GHz 14-core Haswell. It would also save on motherboard and memory cost.

Nobody yet knows the price of the 12-core Threadripper. But if the 16-core flagship is 999$ (old rumors that might be false), we should expect to see 12-core prices hovering pretty close to your 700$ budget.
 