> Just think of Core i5 versus i7 ...

Intel can disable hyperthreading across all of its mid-range products because there's no competition. If there were more competition, it would make little sense to purposefully disable working features. Hyperthreading needs only a small amount of dedicated extra hardware, so it is highly unlikely that most Intel chips have defects in the HT transistors. If we had more competition, the i5 would definitely have hyperthreading as well (i3 and Celeron would probably be the defective chips).
There were two Raven Ridges, as I recall. I'd have expected different names, but an interposer and HBM seem a likely difference. Assuming that list is the B-stock binned parts, it may be reasonable. Withholding the best parts for the professional lines is plausible, and they may want the premium unlocked cores for larger APUs. I wouldn't be surprised if they were binning already. The real question is how Zen is packaged; I'd think a part with HBM and an interposer would be a different design just to accommodate the packaging.

More likely the diagram is just as fake as the one above it - especially since it already has Raven Ridges on it when they're nowhere near being released.
(edit: also, at least rumors suggest Raven Ridge would feature up to 16 CUs, while the diagram says max 8)
> There were two Raven Ridge's as I recall. […]

The "HPC APU" with a Zeppelin CPU, a separate GPU, and HBM on one package wasn't supposed to be Raven Ridge, but a completely separate product.
> Just think of Core i5 versus i7 ...

And that's probably only for the desktop parts. Notebook parts are... well... messier.
Disabling AVX/2/FMA3 is just one of the most boneheaded fucking things Intel's ever done.
"Let's introduce a new instruction set to speed up certain types of problems!"
"Okay, then we disable it on the low end stuff to try and force people to buy the higher end stuff!"
*Eight years later*
"So, why isn't this getting more use?"
Yes, I know that's not the only reason new instruction sets are slow to gain adoption, and have been for a long time, but it is a major component.
> Disabling AVX/2/FMA3 is just one of the most boneheaded fucking things Intel's ever done.

This is probably done to shave off a few more watts, since those wide, feature-heavy FP ALUs are not really optimized for low-power operation. Even the big multi-core Xeons have to enter a lower performance state when running AVX/FMA code. The mobile SoCs already come with a plethora of dedicated logic (DSP, ISP, decoders, IO offloading) for specific consumer tasks, and at the same time those blocks are much more manageable and power efficient anyway. Why keep a (mostly) redundant block of programmable logic warmed up just to process a hastily written video decode loop once in a while, when the work can be done by the integrated video decoder at an order of magnitude better efficiency?
> I also wouldn't expect AMD to disable features (AVX2, TSX, etc) in their low end models. They have never done that. Intel is disabling AVX in their low end Skylake Celerons. SSE4.2 only.

AMD have, but you have to go back quite a way: the first K8-based Semprons had AMD64 disabled.
> This is probably done to shave off few more watts […]
I, too, am displeased by this ISA "segregation", but apparently the SoC mentality of dedicated-block integration is prevailing here and the general-purpose logic is being sidestepped. I mean, Intel is already keeping some ultra-wide SIMD extensions exclusive to their HPC and server SKUs, so why not cut some fat on the other side of the spectrum (mobile)?
Intel could've opted for a new design with narrower and power-efficient SIMD ALUs while keeping all the ISA extensions intact, but probably the cost-benefit of supporting yet another architecture branch is not there.
> AMD handled AVX perfectly in Jaguar.

I wonder if the Intel compiler actually enables AVX for it now?
Exactly. Make the SIMD narrower in low end models, or artificially limit the performance, but do not disable the features. AVX(1) is still not used widely in games, and the reason is that there are too many CPUs around that do not support it. TSX is also a great feature, but it is disabled in low end models. People are not going to write two versions of their thread synchronization primitives (one for TSX and one without); once TSX has good enough coverage, people will start using it. Disabling it in low end models currently makes no sense. TSX is only used in some HPC applications, and there's currently no additional value to consumers -> no consumer is going to choose a more expensive model over the lack of TSX. Disabling TSX on some consumer parts, however, greatly hurts the adoption of TSX.
AMD handled AVX perfectly in Jaguar: they supported the full AVX instruction set with their narrow 128-bit SIMD. AVX instructions are split into two internal 128-bit operations and thus run at half rate. The key difference is that Jaguar still runs AVX code. Intel's Atom, on the other hand, is limited to SSE4.2, and the same is true for the low end Skylake Celerons and Pentiums.
> Seriously, though. With this many issues, TSX has to be insanely complex to implement in the hardware.

It is complex. IIRC AMD tried to bring similar extensions to Bulldozer, but no luck either, and there are no rumors that Zen is going to support it. IBM has great tech in this field: more complex memory versioning and a huge eDRAM-based LLC. IIRC Intel's implementation is L1-cache-only, so the transaction size must be very small. IIRC IBM also uses their tech for speculative execution. Way ahead of Intel.
I think TSX might be less impressive than assumed in the scenarios that are typically of interest on this forum. For example, software in this neck of the woods would probably have already migrated to a reasonably granular locking policy... or at least one would hope so. In general, one would be excused for assuming that the incentive for Intel to aggressively push TSX is low, IMHO.