AMD RyZen CPU Architecture for 2017

The thing is, I believe what Voxilla wants is a core roughly half the size of a current Intel core (by removing unnecessary stuff like SMT) but with similar IPC? I also assume this CPU is an x86/x64 CPU? I don't think it's possible.
 
I don't care about core size; given how small CPUs are on 14nm, there is plenty of room for adding more transistors.

A Skylake (or equivalent new AMD processor) with 8 cores and SMT disabled would be a big improvement. The shared infrastructure like the L3 cache and memory bus can remain as is (i.e. 8 MB and dual-channel DDR4) and is sufficient to support 8 threads on 8 cores.
 
You are confusing things: IPC is a property of your CPU, not of the software running on the CPU.
What you describe is threads that stall. That is an indication of bad software design.
SMT brings only 30% at most; 2 cores bring a 100% improvement.
You're ridiculously determined to not see the forest for all the trees. SMT isn't a replacement for more cores, it's a mechanism to make each core run more efficiently. Look at IBM Power, the latest generation runs up to 8 threads per core, are you really going to tell IBM to toss that capability and squeeze 8x the amount of cores into their CPUs instead...? ;)

2 cores bring a 100% improvement over one core (in theory; scaling may not be perfect, especially beyond merely 2 cores), but they also bring ~100% greater cost. With large chips, doubling the core count from, say, 8 to 16 increases cost by a lot more than 100%, by the way.
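
To put rough numbers on why doubling a big die can more than double its cost, here is a minimal sketch using the textbook Poisson yield model (yield = exp(-defect density x area)). The defect density and die areas are made-up values chosen only for illustration, not figures for any real process or chip, and the model ignores wafer edge loss and the salvaging of partially defective dies.

```python
import math

def cost_per_good_die(area_mm2, defects_per_mm2, cost_per_mm2=1.0):
    """Relative cost of one working die under a simple Poisson yield model."""
    # Probability that a die of this area contains zero defects.
    yield_fraction = math.exp(-defects_per_mm2 * area_mm2)
    # Silicon cost scales with area; bad dies are paid for by the good ones.
    return cost_per_mm2 * area_mm2 / yield_fraction

# Hypothetical numbers, for illustration only.
DEFECTS_PER_MM2 = 0.002   # assumed defect density
SMALL_DIE_MM2 = 300.0     # assumed area of the 8-core die
BIG_DIE_MM2 = 600.0       # assumed area of the 16-core die (simply doubled)

small = cost_per_good_die(SMALL_DIE_MM2, DEFECTS_PER_MM2)
big = cost_per_good_die(BIG_DIE_MM2, DEFECTS_PER_MM2)
cost_ratio = big / small

print(f"16-core die costs {cost_ratio:.2f}x the 8-core die per good chip")
```

Under this model the ratio is always above 2x for any non-zero defect density, because the larger die both uses twice the silicon and yields worse.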

SMT meanwhile has a low cost impact and may have a sometimes very large performance benefit as well. Basically you're stupid for not having it, SMT makes the cores you DO have run more efficiently; wishing that you had 100% more cores than you actually have on the other hand accomplishes absolutely nothing.
 
I don't care about core size; given how small CPUs are on 14nm, there is plenty of room for adding more transistors.
That's unrealistic. Per-transistor cost at 14nm is a lot higher than on previous nodes; almost doubling the transistor count for the CPU cores would make the chip a lot more expensive, with little benefit for most people.

Beyond large servers, big data and HPC, little software takes advantage of 8 cores.
 
SMT meanwhile has a low cost impact and may have a sometimes very large performance benefit as well. Basically you're stupid for not having it, SMT makes the cores you DO have run more efficiently; wishing that you had 100% more cores than you actually have on the other hand accomplishes absolutely nothing.

Tell Intel they are stupid. Why did they go from 2 cores with SMT to 4 cores without SMT for Atom?
 
Tell Intel they are stupid. Why did they go from 2 cores with SMT to 4 cores without SMT for Atom?

Because they went from an in-order architecture, which is very sensitive to memory stalls, to an out-of-order architecture, which isn't.

Cheers
 
On the one hand, I see it as probably easier for AMD's limited engineering resources to make a fast, 'simple' non-SMT core.
If you're already going to have 8 real cores, do 16 logical cores really help in many non-HPC type situations?

On the other hand, SMT is reportedly pretty simple to add & has proven real performance improvements in particular situations.
I had kinda wondered if AMD might go for 4 or 8 threads per core like IBM to maximise the SMT advantage, but the rumors seem pretty specific that it'll be kept to 2 threads.
Hopefully AMD will implement it in a way that is directly compatible with existing scheduler/compiler optimisations for Intel SMT (which may be a big driver for why they are going with 2 threads per core).

I do believe there is technical risk in putting SMT on an otherwise normal core. Intel had issues with their early iterations of SMT (partly because of P4 architecture oddities), and even now some tasks are quicker on i5 real quads than on SMT i7s.
It also seems that a core designed expecting to have SMT & attempting to maximise performance when both logical cores are loaded could wind up costing a bunch of area/complexity/power and hurting per-thread IPC, especially if you fluff it the first time you design SMT.
There is a possibility of a double-whammy architecture fail that way & unfortunately that is the sort of cockup AMD has been prone to in their last several architectures :oops:
 
We are very much in a NetBurst situation again.
The desktop CPU has again ground to a halt.
How was that situation defused last time?
By taking a few lessons from the mobile guys.
First lesson learnt: you don't need SMT/HT.

The new mobile is obviously the phone and tablet SoCs.
While the desktop is at a standstill, they keep progressing in leaps and bounds.
None of them use SMT; they must have a reason.
They are actually making 8 cores, even 10 cores.
Are they out of their minds?
 
You'd have a point if phones with 8 or 10 cores were faster than phones with two.

But they aren't, so you don't.

Cheers
 
Accept that people can have different opinions without needing to insult each other.

To refresh terms and rules:
"The Beyond3D forum community rules are minimal, straightforward and all underpin a core tenet of "be nice to each other""
 
Who has insulted anyone? Also, you're stating your opinions in very absolute terms, so you should expect disagreement. Some very knowledgeable people have also weighed in on this topic and informed you that SMT has more merit than you want to give it credit for.
 
Some of your quotes:
"You're ridiculously determined..."
"Basically you're stupid..."
"Keep digging that hole of yours..."
 
That is very wrong!
For one, the working set of 2 threads is larger than that of 1 thread.
True, which is one of the trade-offs designs have to make when it comes to threading and the memory hierarchy.
Working sets have scaled much more gradually over time, and cache capacity is one of the less intensive knobs that can be tuned for design, particularly if they are further out from the core.

So you need bigger L3 caches to accommodate 4 cores with 2xSMT versus 4 cores with no SMT.
Why would this be significant, particularly with designs like Intel's where the L3 is frequently highly inclusive of the L2 and L1? The L3's pressure is actually worse the more cores you have in that scenario. SMT doesn't materialize additional physical lines the L3 has to track.
The lower levels of cache are a fraction of the L3's size, so the impact has been measured to be modest. The L3 is also very adjustable in terms of capacity, and this is a very low-power adjustment versus doubling the active circuitry of the core complex.

Similarly, your L3 bandwidth requirements for a core with SMT are higher compared to no SMT.
The L3 bandwidth requirements are set exactly as for the same core without SMT: by the physical number of transactions that need to be serviced by the next level of the hierarchy, assuming that is local to the core. The L3 and the uncore see the cores in terms of their physical interface points and porting, which is why doubling the number of cores costs more than reusing the hardware that is already in place. If the L3's bandwidth is not sufficient to supply 4 SMT cores' worth of bandwidth, it is at best half of what is needed for 8 cores. Due to coherence traffic, which worsens with the number of active caches, it would be even less adequate. That's why Intel starts upping the complexity of its internal interconnect at the higher core counts and, as bandwidth needs grow, the number of memory controllers.

SMT does not materialize additional physical ports or additional caches that need to maintain coherence.

Or, to make it even clearer: say you have very efficient SMT that allows your single core with 2xSMT to perform as well as 2 cores. Obviously your single core then needs the same shared infrastructure as the 2 weaker cores.
It's not necessary that SMT match two separate cores. It just needs to yield better performance to justify its power and area cost on the workloads the chip will face.
SMT helps justify a beefier core than can be justified without it, which means a secondary benefit to single-threaded performance.
The overhead is effectively a small amount of additional context tracking and selection logic for these cores, since the OoO engine's rename and result tracking can with very little alteration keep two threads in flight.

This puts the die size penalty at 5% for SMT, and this is for now-ancient architectures. Some of the split resources are now pooled, and the overall size of the cores and the complexity of the rest of the chip have expanded far more since then.
http://www.cs.cmu.edu/afs/cs/academic/class/15740-f02/www/lectures/hv.pdf
Power penalty was put up as high as 16% in other places with sufficient utilization by multiple threads.
Having two whole cores running the same two threads as one SMT is not a lower power penalty.
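
As a back-of-the-envelope check, the figures quoted in this thread (up to ~30% throughput from SMT, ~5% die area, up to ~16% power) can be compared against a second core. This is only a sketch: it takes the best-case SMT gain, assumes the second core scales perfectly, and assumes it simply doubles core area and power, ignoring the shared uncore entirely.

```python
# Figures quoted in this thread, normalised to one non-SMT core = 1.0.
SMT_SPEEDUP = 1.30    # "max 30%" throughput gain from 2-way SMT
SMT_AREA = 1.05       # ~5% die size penalty
SMT_POWER = 1.16      # up to ~16% power penalty

# Assumptions for the alternative: a second full core with perfect scaling.
DUAL_SPEEDUP = 2.00
DUAL_AREA = 2.00      # assumed: core area doubles, uncore ignored
DUAL_POWER = 2.00     # assumed: core power doubles

def efficiency(speedup, cost):
    """Throughput gained per unit of area or power spent."""
    return speedup / cost

smt_perf_per_area = efficiency(SMT_SPEEDUP, SMT_AREA)
smt_perf_per_watt = efficiency(SMT_SPEEDUP, SMT_POWER)
dual_perf_per_area = efficiency(DUAL_SPEEDUP, DUAL_AREA)
dual_perf_per_watt = efficiency(DUAL_SPEEDUP, DUAL_POWER)

print(f"SMT core : {smt_perf_per_area:.2f} perf/area, {smt_perf_per_watt:.2f} perf/watt")
print(f"Two cores: {dual_perf_per_area:.2f} perf/area, {dual_perf_per_watt:.2f} perf/watt")
```

With these inputs the SMT core comes out ahead on both ratios, while the two cores still deliver more absolute throughput; which matters more depends on the workload and the area and power budget.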

Putting an 8-core solution against a 4-core means it will be compared in terms of the workloads the 4-core does best in, which even now is heavily weighted towards benchmarks and applications that usually do not scale to 8 and appreciate the higher clocks the 4-core can turbo to.
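
The trade-off between core count and turbo clocks can be sketched with Amdahl's law. The clock speeds below are hypothetical, picked only to reflect the idea that a 4-core part can turbo higher than an 8-core one, and the sketch optimistically assumes performance scales linearly with clock.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over one core when only part of the work runs in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

def relative_throughput(parallel_fraction, cores, clock_ghz):
    """Clock-scaled Amdahl speedup (assumes linear scaling with clock)."""
    return clock_ghz * amdahl_speedup(parallel_fraction, cores)

# Hypothetical clocks, for illustration only.
QUAD_GHZ = 4.0   # assumed turbo clock of the 4-core part
OCTO_GHZ = 3.2   # assumed clock of the 8-core part

for p in (0.5, 0.8, 0.95):
    quad = relative_throughput(p, 4, QUAD_GHZ)
    octo = relative_throughput(p, 8, OCTO_GHZ)
    winner = "4-core" if quad > octo else "8-core"
    print(f"parallel fraction {p:.2f}: 4-core {quad:.2f}, 8-core {octo:.2f} -> {winner}")
```

With these made-up clocks the 4-core wins when only half the work is parallel, and the 8-core pulls ahead once most of the work scales across cores.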
 
SMT does not materialize additional physical ports or additional caches that need to maintain coherence.
IMHO, there is another way of looking at it: uncore resources are sized for the "execution capability" of the core. It does not matter whether it is, for example, a single thread capable of sustaining 4 operations per cycle, or two threads sustaining two operations per cycle each.
 
Putting an 8-core solution against a 4-core means it will be compared in terms of the workloads the 4-core does best in, which even now is heavily weighted towards benchmarks and applications that usually do not scale to 8 and appreciate the higher clocks the 4-core can turbo to.

So you claim 4 cores is the final solution, the end of progress on the desktop.
We're stuck with that for the rest of time.
 