Next Generation Hardware Speculation with a Technical Spin [pre E3 2019]

Status
Not open for further replies.
Since we're on AVX512
Agner's blog has some nice thoughts to ponder:
https://www.agner.org/optimize/blog/read.php?i=963
The new AVX512 instruction set extension adds three hundred new instructions. The x86 instruction set with its many extensions now includes more than two thousand different instructions. We can only guess what this costs in terms of design complexity and silicon space on the latest microprocessors.

No compiler is able to generate so many different instructions from high-level language, and few programmers, if any, are competent to use them all in assembly code or intrinsic functions. However, most of the instructions appear to be useful and they may be used in specially designed function libraries.


Many of the older instructions are now obsolete as they have been replaced by new more efficient instructions, but the old instructions are still supported for the sake of compatibility with legacy software. AMD has removed some of their obsolete instructions from their new processors, but Intel processors still support even the most obscure undocumented instructions dating back to the first 8086 processor.


The x86 instruction set was initially developed at a time when CISC design was technologically optimal. This instruction set with its many extensions is now a confusing hodgepodge witnessing a long history of changing technologies, short-sighted decisions, patches, changing priorities, and changing marketing fads.

And on the topic of what a forward compatible processor would be:
Well - I have done more than just thinking. I have published some of my ideas and got a lot of useful feedback and new ideas from the users of this forum. The result is a new instruction set and computer system that I call ForwardCom (Forward Compatible Computer System). ForwardCom is neither RISC nor CISC, but a hybrid with few instructions but many variants of each instruction. It has vector registers with variable length, where each vector register contains information about its own length. It is designed so that existing software can take advantage of unlimited future extensions of the vector length without the need for new instructions and recompilation. Hence the name "forward compatible". The instruction format is standardized and the complexity is limited in order to enable a simple pipeline design.


All the necessary software tools for ForwardCom have been designed. You can see it all at www.forwardcom.info including the many innovative features. No hardware or FPGA implementation has been developed yet.
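To make the "forward compatible" idea concrete, here is a minimal sketch of a loop that never hard-codes the vector length. It is plain Python, not ForwardCom code; `vector_add` and `max_vector_len` are hypothetical names standing in for whatever maximum register length a given chip would report.

```python
def vector_add(a, b, max_vector_len):
    """Add two equal-length lists in chunks of at most max_vector_len elements.

    max_vector_len is an assumed stand-in for the hardware's maximum vector
    register length; the loop body itself never depends on its value.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if max_vector_len < 1:
        raise ValueError("max_vector_len must be positive")
    out = []
    i = 0
    n = len(a)
    while i < n:
        # Each pass handles as many elements as the "register" can hold,
        # or whatever is left at the tail of the array.
        vl = min(max_vector_len, n - i)
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
    return out
```

The same function gives identical results whether `max_vector_len` is 4 or 64, which is the property the blog post is describing: wider hardware just means fewer passes through the loop.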
 
You have a processor that can finally compete with Intel. Are you really going to sell it at rock bottom prices

The Zen 2 chiplets will have varying working core counts and varying performance characteristics. AMD selling their lower tail end of their Zen 2 production to console vendors has multiple advantages: 1.) Selling at cost (or more) is better than selling below cost, 2.) Lower end SKUs won't cannibalize higher end SKU sales, 3.) Brand recognition of Zen 2 improves because you only have high end performing CPUs.

Say AMD decides their 3600 will run at 3.8GHz base. Everything that can't clock this high within the TDP will have to go in a lower bin (3500, 3300 or 3200). The ASP of these bins is roughly half that of the six core SKUs. AMD will also have a lot of chiplets with just four working cores. Now they have an abundance of lower performing chiplets which puts further downward pressure on these lower end SKUs, and price might well drop below cost.

If, instead, AMD sells those working, but lower performing, 6-core chiplets off to SONY and MS with a guarantee they'll run at 3GHz and at a slight profit, they are much better off. Not only do they make a small profit selling to the console vendors, but they also end up with fewer SKUs that have to be sold at bargain basement prices, improving margins.

The console vendors can then bin the CPUs according to power consumption and pair a high power consumption CPU with a lower power consumption GPU. MS already does this in the XB1X, pairing high power consumption APUs with lower consumption system hardware and vice versa.

Cheers
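The binning argument can be sketched as a tiny decision function. This is only an illustration of the reasoning in the post: the 3.8GHz and 3GHz thresholds come from the post, while `assign_bin` and the bin labels are made up for the example and are not AMD's actual process.

```python
def assign_bin(working_cores, max_clock_ghz):
    """Return a hypothetical destination for one chiplet.

    Thresholds follow the post (3.8 GHz for the 3600, 3 GHz guaranteed
    for consoles); the labels are illustrative assumptions.
    """
    if working_cores >= 6 and max_clock_ghz >= 3.8:
        return "desktop-3600"
    if working_cores >= 6 and max_clock_ghz >= 3.0:
        # Working 6-core parts that miss the desktop clock target go to
        # the console vendors instead of a discounted desktop SKU.
        return "console"
    if working_cores >= 4:
        return "low-end-desktop"
    return "scrap"
```

Without the "console" branch, every 6-core part below 3.8GHz would fall through to the low-end desktop bin, which is the oversupply the post is worried about.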
 
What constitutes a new architecture in your mind? What has been disclosed on Zen 2 shows a revamped front-end and a vastly different FP arrangement.
Not Zen. If tick tock was a thing for AMD, Zen and Zen 2 would be that.

Lot of supposition there. What matters to AMD is which fabs the console makers will be using. If it’s 7nm, what’s the difference to them which design occupies that die space. Wouldn’t it make more sense to have them use a standard Zen 2 chiplet so that their binning options for console, desktop or mobile expand vastly? The idea that using the latest would somehow hamper their ability to compete with Intel or magically lose revenue does not seem to have a basis to me.
Because fabrication time is a resource. If all the fabs are producing low margin items when you could be gaining more margin, that is an obvious opportunity cost. There's a very specific reason that Tesla sold only the high end Model 3s before selling the low end ones. You have only so much available fabrication time, and time is money. While you're churning out low margin chips, that same fabrication time and silicon could be used to sell high margin EPYCs. High margin products are where a healthy business is at, and I can't see AMD signing up for yet another 7 years of super low margins - you're basically wasting time and AMD is in desperate need of money. High margin products will have the priority for sure. They can only take advantage of this period where Zen can compete with Intel for so long before Intel comes back with something.

Why are we comparing retail prices? When has that ever been a sound basis for costing of a console BOM?
Because we need some anchor point for reality.

That seems to be reading into what the blog post says. Moreover, Zen 1 is the first of a new approach for AMD. To suggest we have all this precedent for what Zen 2 will be seems to be liberally misappropriating history.
A move to 512-bit vectors is a big change. That's effectively double the bandwidth of 256-bit vectors.
 
The Zen 2 chiplets will have varying working core counts and varying performance characteristics. AMD selling their lower tail end of their Zen 2 production to console vendors has multiple advantages: 1.) Selling at cost (or more) is better than selling below cost, 2.) Lower end SKUs won't cannibalize higher end SKU sales, 3.) Brand recognition of Zen 2 improves because you only have high end performing CPUs.

Say AMD decides their 3600 will run at 3.8GHz base. Everything that can't clock this high within the TDP will have to go in a lower bin (3500, 3300 or 3200). The ASP of these bins is roughly half that of the six core SKUs. AMD will also have a lot of chiplets with just four working cores. Now they have an abundance of lower performing chiplets which puts further downward pressure on these lower end SKUs, and price might well drop below cost.

If, instead, AMD sells those working, but lower performing, 6-core chiplets off to SONY and MS with a guarantee they'll run at 3GHz and at a slight profit, they are much better off. Not only do they make a small profit selling to the console vendors, but they also end up with fewer SKUs that have to be sold at bargain basement prices, improving margins.

The console vendors can then bin the CPUs according to power consumption and pair a high power consumption CPU with a lower power consumption GPU. MS already does this in the XB1X, pairing high power consumption APUs with lower consumption system hardware and vice versa.

Cheers
Agreed, that's certainly a reasonable strategy, and one of the reasons I think Zen 2 is viable. However, the price of the Zen 2 chiplet, and its being paired with a GPU, is highly in question for me. Threadripper processors top $1000 CAD, and I believe this chiplet-type design is meant to replace Threadripper and not service the low end.
 
*Checks thread title* yup, we’re good here. Was I given a scepter of you’re-not-allowed-to-disagree-with-my-conclusions? Not that I saw, but I could be wrong I suppose.
Then just qualify what you're speculating as your speculation. Why not just say, "hey guys, it's only my theory."? The simplest, easiest way to clear up any messy conversation is to add a teensie-weensie bit of clarification so everyone's on the same page, even if you think that clarification is redundant for whatever reasons that seem obvious to you.
 
The problem with supporting AVX-512 is that there are 18 (EIGHTEEN!!!) distinct subsets of instructions in AVX-512, they are:
F, CD, ER, PF, 4FMAPS, 4VNNIW, VPOPCNTDQ, VL, DQ, BW, IFMA, VBMI, VNNI, VBMI2, BITALG, VPCLMULQDQ, GFNI and VAES

I can see why AMD won't try to be compliant until the dust settles.

They are going to have the raw computational throughput of two AVX-512 units with their four full width AVX2 units and they'll have similar bandwidth to/from their FP units as Intel does. There are some useful instructions in AVX-512, like gather and scatter that they'll miss out on, but it's pretty easy to saturate a memory subsystem with 6+ cores without those instructions anyway.

Cheers
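A quick sanity check of the two numbers in that post, as a small sketch. The subset names are copied from the list above; the unit counts (four 256-bit units vs two 512-bit units) are the poster's assumptions about unreleased hardware, taken at face value here, and `vector_bits_per_cycle` is just a helper name for the example.

```python
# Subset names exactly as listed in the post.
AVX512_SUBSETS = (
    "F", "CD", "ER", "PF", "4FMAPS", "4VNNIW", "VPOPCNTDQ", "VL", "DQ",
    "BW", "IFMA", "VBMI", "VNNI", "VBMI2", "BITALG", "VPCLMULQDQ",
    "GFNI", "VAES",
)


def vector_bits_per_cycle(units, width_bits):
    """Peak vector width processed per cycle, ignoring clocks and ports."""
    return units * width_bits


# Assumed configurations from the post, not confirmed specifications.
four_avx2_units = vector_bits_per_cycle(units=4, width_bits=256)
two_avx512_units = vector_bits_per_cycle(units=2, width_bits=512)

print(len(AVX512_SUBSETS), four_avx2_units, two_avx512_units)
```

Both configurations come out to the same raw width per cycle, which is all the "two AVX-512 units" comparison claims; it says nothing about clocks, instruction mix or the gather/scatter instructions mentioned above.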

https://reviews.llvm.org/rL318983

There is that. I didn’t realize Skylake X even had differences from Skylake in that regard.

Not Zen. If tick tock was a thing for AMD, Zen and Zen 2 would be that.

The analogy would be much more fitting for Zen+ then, which had a very small IPC bump. There’s nothing comparable to a FP unit doubling as a ‘tock’ in Intel’s dead scheme.

Because fabrication time is a resource. If all the fabs are producing low margin items when you could be gaining more margin, that is an obvious opportunity cost. There's a very specific reason that Tesla sold only the high end Model 3s before selling the low end ones. You have only so much available fabrication time, and time is money. While you're churning out low margin chips, that same fabrication time and silicon could be used to sell high margin EPYCs. High margin products are where a healthy business is at, and I can't see AMD signing up for yet another 7 years of super low margins - you're basically wasting time and AMD is in desperate need of money. High margin products will have the priority for sure. They can only take advantage of this period where Zen can compete with Intel for so long before Intel comes back with something.

It is not AMD’s resource, though. It’s essentially a commodity until console volumes affect AMD’s cost for wafers on their non semi-custom products. But I don’t think there’s any world where next gen consoles weren’t going to be 7nm anyway, so that’s pretty much a non-starter anyway.

I think it highly likely that Sony/Microsoft paid a lump sum for NRE and are likely paying per wafer now, with an AMD fee included. The what of the wafer’s content matters little once the engineering design effort is complete, at least from AMD’s perspective.


Because we need some anchor point for reality.

Then compare last gen consoles to hardware products at the time. Otherwise, I can sell you a $10,000 scalpel.


A move to 512-bit vectors is a big change. That's effectively double the bandwidth of 256-bit vectors.

They’ve already doubled the FP unit widths, though. Papermaster was very dodgy when pressed for specifics, so I’d consider it a very open matter.

Agreed, that's certainly a reasonable strategy, and one of the reasons I think Zen 2 is viable. However, the price of the Zen 2 chiplet, and its being paired with a GPU, is highly in question for me. Threadripper processors top $1000 CAD, and I believe this chiplet-type design is meant to replace Threadripper and not service the low end.

The chiplet itself is only 70 mm^2. It makes sense for every price segment except perhaps the low end APU market.
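For a rough feel of what 70 mm^2 means per wafer, here is a sketch using a common textbook approximation for gross dies per wafer (wafer area over die area, minus an edge-loss term). It ignores scribe lines, defects and yield, and `gross_dies_per_wafer` is a name made up for this example.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Approximate candidate dies per wafer, before any yield loss.

    Uses the usual estimate: pi*(d/2)^2 / S - pi*d / sqrt(2*S),
    where d is the wafer diameter and S the die area.
    """
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)


# 300 mm wafer and the 70 mm^2 chiplet size quoted above.
print(gross_dies_per_wafer(300, 70))
```

With those inputs the estimate lands somewhat above 900 candidate dies per wafer, before defects take their share.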

Then just qualify what you're speculating as your speculation. Why not just say, "hey guys, it's only my theory."? The simplest, easiest way to clear up any messy conversation is to add a teensie-weensie bit of clarification so everyone's on the same page, even if you think that clarification is redundant for whatever reasons that seem obvious to you.
I added a question mark to the original post.
 
Because fabrication time is a resource. If all the fabs are producing low margin items when you could be gaining more margin, that is an obvious opportunity cost. There's a very specific reason that Tesla sold only the high end Model 3s before selling the low end ones. You have only so much available fabrication time, and time is money. While you're churning out low margin chips, that same fabrication time and silicon could be used to sell high margin EPYCs.

AMD doesn't have fabs; their 7nm products are manufactured by TSMC, and while the console volumes are quite large, they aren't that significant a portion of TSMC's total production, or in serious contention with AMD's other TSMC orders, imo. Thus console APUs are extra, not alternative, revenue for AMD. Zen 2 processors are also hitting the PC market this year, whereas the consoles are seemingly coming next year, at which point these technologies won't be state of the art anymore.
 
The Zen 2 chiplets will have varying working core counts and varying performance characteristics. AMD selling their lower tail end of their Zen 2 production to console vendors has multiple advantages: 1.) Selling at cost (or more) is better than selling below cost, 2.) Lower end SKUs won't cannibalize higher end SKU sales, 3.) Brand recognition of Zen 2 improves because you only have high end performing CPUs.

Say AMD decides their 3600 will run at 3.8GHz base. Everything that can't clock this high within the TDP will have to go in a lower bin (3500, 3300 or 3200). The ASP of these bins is roughly half that of the six core SKUs. AMD will also have a lot of chiplets with just four working cores. Now they have an abundance of lower performing chiplets which puts further downward pressure on these lower end SKUs, and price might well drop below cost.

If, instead, AMD sells those working, but lower performing, 6-core chiplets off to SONY and MS with a guarantee they'll run at 3GHz and at a slight profit, they are much better off. Not only do they make a small profit selling to the console vendors, but they also end up with fewer SKUs that have to be sold at bargain basement prices, improving margins.

The console vendors can then bin the CPUs according to power consumption and pair a high power consumption CPU with a lower power consumption GPU. MS already does this in the XB1X, pairing high power consumption APUs with lower consumption system hardware and vice versa.

Cheers

Indeed, so I guess the question is how low can AMD go with the HW spec if certain things are a must for the console request.

It’s going to be a long generation again, and I believe it might be very prudent to go with a wide design initially (8c/16t) then deal with clocks and TDP over time as that will set the baseline for game engine threading practices while allowing for much easier upgrade SKUs down the line.

I could be wrong about whether devs can find enough work for that many threads subtract reserved thread/cache (I don’t know what that means for Zen, specifically given the quad core arrangements).

Idk, I was thinking wafer defects wouldn’t be too problematic come 2020 in terms of getting functional cores instead of having to disable them ala GPU shaders.
 
Because fabrication time is a resource. If all the fabs are producing low margin items when you could be gaining more margin, that is an obvious opportunity cost. There's a very specific reason that Tesla sold only the high end Model 3s before selling the low end ones. You have only so much available fabrication time, and time is money. While you're churning out low margin chips, that same fabrication time and silicon could be used to sell high margin EPYCs. High margin products are where a healthy business is at, and I can't see AMD signing up for yet another 7 years of super low margins - you're basically wasting time and AMD is in desperate need of money. High margin products will have the priority for sure. They can only take advantage of this period where Zen can compete with Intel for so long before Intel comes back with something.



It is not AMD producing the console APUs but TSMC. AMD is just selling IP to Sony and MS, and they have it built by TSMC.

AMD has another contract with TSMC to build Zen 2 and probably the Navi GPUs.
 
Indeed, so I guess the question is how low can AMD go with the HW spec if certain things are a must for the console request.

It’s going to be a long generation again, and I believe it might be very prudent to go with a wide design initially (8c/16t) then deal with clocks and TDP over time as that will set the baseline for game engine threading practices while allowing for much easier upgrade SKUs down the line.

I could be wrong about whether devs can find enough work for that many threads subtract reserved thread/cache (I don’t know what that means for Zen, specifically given the quad core arrangements).

Idk, I was thinking wafer defects wouldn’t be too problematic come 2020 in terms of getting functional cores instead of having to disable them ala GPU shaders.

7nm still has the most process steps of any node before it as these iterative things go. All chances to mess up a wafer.

7nm+ helps by reducing steps, but I suppose the cost is the pellicle issue.
 
It is not AMD’s resource, though. It’s essentially a commodity until console volumes affect AMD’s cost for wafers on their non semi-custom products

Right. It is a shared commodity. I call it AMD's resource because they are competing with their own designs here. They should know the pricing per chip that both Sony and MS should be paying and they should know the approximate volumes that they intend to produce. Each fabrication company will have limits on how many 7nm chips they can produce, and that has to be factored into pricing.
Why? Zen 2 is being used in the third iteration of the product. You had Zen, then Zen+ as the "Tick-Tock".
forgot about Zen+
 
But the only factors that drive that are die size and expected yield. It doesn’t matter what the IP is.
 
Run length. It takes 11-15 weeks from start to finish for silicon chips per run? (Is that the right term?)
If fabrication space is unlimited this is a non issue. If the facility is full or there are delays for whatever reason, you can't push your chips through. We see delays all the time.
From TSMC:
https://www.tsmc.com/english/dedicatedFoundry/manufacturing/gigafab.htm
Maintaining dependable capacity is a key part of TSMC's manufacturing strategy. The Company currently operates three 12-inch GIGAFAB® facilities – Fabs 12, 14 and 15. The combined capacity of the three facilities exceeded 7 million 12-inch wafers in 2017. Production within these three facilities supports 0.13μm, 90nm, 65nm, 40nm, 28nm, 20nm, 16nm, 10nm, and 7nm process technologies, including each technology's sub-nodes. An additional portion of the capacity is reserved for R&D work on leading-edge manufacturing technologies, which currently supports the technology development of the 5nm node and beyond.

TSMC has developed a centralized fab manufacturing management system, Super Manufacturing Platform (SMP), to provide customers with greater benefits in the form of more consistent quality and reliability, improved flexibility to cope with demand fluctuations, faster yield learning and time-to-volume, and lower-cost product requalification.

So what happens when Sony and MS take up 1 GigaFab each? There is 1 left that may or may not be occupied. If it is occupied and AMD does not want to delay 13+ weeks to release, they must get into the Mega/Mini Fabs, and that's going to cost more and offer less capacity. Thus if AMD is getting royalties on the SoC per chip made, they are flooding their own ability to produce high margin products properly.
 
Again, this is not IP dependent. It doesn’t matter if they’re Zen 2 or Zen 1 or whatever. Console manufacturers want 7nm. If AMD wants to play games and forego billions in revenue to protect their server and desktop markets, more power to them I guess. MS or Sony can go buy chips from Intel or IBM/ARM and use those same 7nm fabs anyway in the case of the latter. Clearly that’s not what’s happening.
 
Read or watch TSMC's last talk to investors: they gave guidance of lower revenue because of the slowdown in the smartphone industry.
 
Once again, it comes back to just basic economic principles. If the price is high you're going to get less demand; if the price is very low you're going to get very high demand.

$399 was a dream price point because of the number of fabs that could produce 28nm sufficiently well. If the price is $499 or $599 then the demand drops until price goes down. It's the only sensible thing for limited Fab capacity.

~100K 300mm chips per month for each GigaFab, unless I'm reading that wrong. That's 10 months of dedication to get to 1 million chips. If you go the chiplet route, you've got 3x the number of chips to deal with. Even though you can cut more per wafer, it's obviously not that straightforward, because a completed unit will require 1 CPU, 1 Control, 1 GPU.

GloFo is running at ~60K chips per month for 7nm if I read their info correctly. Both Sony and MS sell more than 100K consoles monthly currently in North America alone.

You tell me how AMD should play this, because they still have to fulfill EPYC orders and MS spends $1B per month on data centre builds.
 
The area of a 300mm diameter wafer is Pi*r^2, or around 70k mm2. That puts your production calculation off by a factor of 233, i.e. your 10 months to accumulate enough chips become (per your estimates, not mine) 2 days.

The total monthly raw production is 7 billion mm2.
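For anyone who wants to redo the arithmetic, a short sketch follows. The 100K wafers per month and the 10-month figure are the estimates from the earlier post, and the 30-day month is an assumption added here for the conversion to days.

```python
import math

WAFER_DIAMETER_MM = 300
WAFERS_PER_MONTH = 100_000      # estimate from the earlier post
ORIGINAL_ESTIMATE_MONTHS = 10   # "10 months to get to 1 million chips"
DAYS_PER_MONTH = 30             # assumption for the conversion only

# Area of one wafer: pi * r^2 with r = 150 mm.
wafer_area_mm2 = math.pi * (WAFER_DIAMETER_MM / 2) ** 2

# Raw silicon area produced per month at that wafer rate.
monthly_area_mm2 = WAFERS_PER_MONTH * wafer_area_mm2

# The earlier estimate effectively counted 300 per wafer instead of the
# full wafer area, so it is off by roughly area / 300.
correction_factor = wafer_area_mm2 / 300

corrected_days = ORIGINAL_ESTIMATE_MONTHS * DAYS_PER_MONTH / correction_factor

print(round(wafer_area_mm2), f"{monthly_area_mm2:.2e}",
      round(correction_factor), round(corrected_days, 1))
```

Using the unrounded area gives a factor a little above the 233 quoted (which comes from rounding the area down to 70k mm2), and the corrected time comes out between one and two days, consistent with the "2 days" figure.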
 
ah shit, right I'm way off there. missed the ^2 part, wasn't there. I was thinking this number was way too low.

well carry on then. we can ignore my points.
 
Also, for those thinking that Simon Pilgrim is working on tools or something not directly related to game code:

An R&D engineer with an academic background in 3D Graphics, SIMD + Low Level Programming, Image Processing, Computer Vision and Speech/Language Processing fields. My work to date has been focussed on the media, entertainment and broadcasting sectors. Currently developing high performance maths, geometry and character animation systems for Sony Playstation 3, 4 and Vita platforms.

And here is Andrea Di Biagio’s LinkedIn summary:

Working on the official compilers developed by Sony Interactive Entertainment for their modern gaming platforms.
Since January 2012, I have been working on the official PS4 compiler.
Previously worked on the official PS3 and PSVita compilers.
 