A 28nm GPU with a modest last-gen GPU footprint (230mm^2) won't impress.
A 28nm GPU with a modest last-gen GPU footprint (230mm^2) will have AMD HD 6870 (Barts)-like performance.
Humor me for a moment with this punch-line:
Next gen (Xbox 3, PS4) graphics can be bought, right now, for about $155 (CPU not included). Yes, 2010 graphics chips may match 2013 consoles. And by the time the consoles launch in 2013, contemporary PC GPUs will be twice as fast and cost the same as (or less than!) a new console. Punchy, right? Sadly, the facts are trending this direction.
At face value it would seem a 28nm GPU, the guesstimated target for next-gen chips, could exceed 3 TFLOPS (maybe even close in on 4) and 70GT/s texturing by simply moving a 40nm Barts AMD HD 6850 (a 6870 with disabled units) down to 28nm while keeping the 255mm^2 footprint. Such a design would not be a top-of-the-line 2013 GPU, but it would be quite competitive. 28nm should double density (right?), offer more frequency (right??), and bring a big reduction in power draw (right???) … but reality isn't as sweet. This is one reason I am not super excited about a 28nm console. I think console makers are looking at slightly reducing their silicon footprints from last gen, and once you factor in additional chip manufacturing issues (with an eye toward future reductions) and the dirty details of what a node reduction actually buys, my math below spells out a 2 TFLOPS, 50GT/s GPU on 28nm (roughly AMD HD 6870 performance, a far cry from the 3+ TFLOPS and 70GT/s the simple projection above would indicate).
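That naive projection is simple enough to sketch (a hypothetical back-of-envelope, assuming a clean 2x density at constant die area and unchanged clocks, which is exactly the assumption I'm about to pick apart):

```python
# Naive 28nm projection: double every functional unit at constant die area, same clocks.
print(2 * 1488)             # 6850 doubled: 2976 GFLOPS, i.e. ~3 TFLOPS
print(2 * 2016)             # 6870 doubled: 4032 GFLOPS, i.e. ~4 TFLOPS
print(2 * 48 * 775 / 1000)  # 6850's 48 TMUs doubled at 775MHz: 74.4 GT/s, i.e. ~70+ GT/s
```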
Persuade me: give me intelligent reasons why I should keep my hopes up for a 3+ TFLOPS monster console GPU at 28nm.
Until then, let me convince you (and depress you) that a 28nm console GPU in 2013 is going to be no 2x 6850 but instead a single 6870-like chip.
Let’s start with the budget. Last gen, console GPUs were in the 230-260mm^2 range at launch. We should consider this the upper bound for silicon next gen, as newer processes haven’t reduced chip costs significantly, and the advent of motion controls and the importance of storage media will be pressing on silicon budgets.
Let’s be conservative and see what a similar budget on 28nm would look like. Some basic information:
- 28nm offers half the area for the finest geometries (e.g. SRAM) compared to 40nm; logic does not scale as densely.
- 40nm is mature, so early 28nm won't be as robust, will be more expensive, and will have lower yields. 80% scaling is optimistic IMO.
Architectural and efficiency differences aside (Xenos > RSX), last gen consoles look something like this (from memory, and depending on how you count, so don’t shoot me; I know some numbers below are off as I did this on a lunch break, but I wanted some context):
Code:
Model  MHz  Area(mm^2)  Transistors(M)  GFLOPS  TMUs  ROPs
Xenos  500  230         232             240     16    8
RSX    500  255         300?            230?    24    8
Using AMD’s current (fall 2010) models as a baseline for what a modern GPU architecture and its budgets look like:
Barts: 255mm^2, 1.7B transistors, VLIW5
Code:
Model  MHz  Shaders  TMUs  ROPs  GFLOPS  TDP(W)
6790   840  800      40    16    1344    150
6850   775  960      48    32    1488    127
6870   900  1120     56    32    2016    151
Cayman: 389mm^2, 2.64B transistors, VLIW4
Code:
Model  MHz  Shaders  TMUs  ROPs  GFLOPS  TDP(W)
6950   800  1408     88    32    2253    200
6970   880  1536     96    32    2703    250
- 6870 to 6850: ~14% drop in shaders, TMUs, and frequency; ~26% drop in GFLOPS
- 6970 to 6950: ~8-9% drop in shaders, TMUs, and frequency; ~17% drop in GFLOPS
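These salvage-part deltas are easy to sanity-check with a couple of lines of Python (a quick sketch using the spec numbers from the tables above):

```python
# Quick sanity check of the full-vs-salvage part deltas (specs from the tables above).
def drop(full, cut):
    """Percent reduction going from the full part to the cut-down part."""
    return round((1 - cut / full) * 100)

# 6870 -> 6850: shaders, TMUs, clock, GFLOPS
print(drop(1120, 960), drop(56, 48), drop(900, 775), drop(2016, 1488))   # 14 14 14 26
# 6970 -> 6950: shaders, TMUs, clock, GFLOPS
print(drop(1536, 1408), drop(96, 88), drop(880, 800), drop(2703, 2253))  # 8 8 9 17
```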
Let’s acknowledge the following:
- TDP scaling on PC GPUs doesn’t fit a console’s metrics
- the silicon footprint of PC GPUs is far above the cost tolerances of consoles
- 28nm won’t provide a 100% real-world increase in transistor density
- 28nm will be more expensive (yields, demand, general cost of progress, competition) in 2013 than 90nm was in 2005
- the success of the Wii in the $250 price bracket has made console manufacturers more price sensitive (a complicated issue)
- the cost of large standard storage and Kinect/Move-like devices needs to be compensated for in other aspects of the design
- new technologies (stacked memory, silicon interposers, etc.) are not free
- the RRoD/YLOD and the resulting mindfulness to decrease TDP and increase cooling through better coolers on launch units* and more case volume
Put all together, a 300mm^2 GPU doesn’t seem to be a target console makers will be reaching for.
Assuming an AMD chip, a major wildcard will be the transition to GCN / DX11.x+ architectures, which will have additional feature costs and overhead not represented in the Barts/Cayman models. There will also be a transition from VLIW to SIMD (+ scalar) with Southern Islands (GCN; http://www.anandtech.com/show/4455/amds-graphics-core-next-preview-amd-architects-for-compute/3).
Four major numbers to keep in mind: (1) a 10% reduction in area from 255mm^2 (Barts) to a more conservative 230mm^2; (2) the 15% redundancy seen in the Barts models (6870 vs 6850; note that Xenos already had redundancies like this, so this will need to be factored into the smaller die size; 15% is aggressive, as we see only ~9% in Cayman); (3) a 22% drop in frequency (900MHz to 700MHz), again aggressive, but TDP is a major issue (see the TDP drop from the 6790 at 840MHz to the 6850 at 775MHz, even though the 6850 is the faster part); and (4) 80% density scaling from 40nm to 28nm.
Applying 1 & 2 to a 40nm GPU results in about a 25% reduction in functional units from a 6870, and adding 3 we are looking at about 1200GFLOPs on the 40nm process: a drop of roughly 40% from a 6870 (2016GFLOPs) and 19% from a 6850 (1488GFLOPs). Scaling first upward to 28nm (80% increase in density; 1.8 × 2016 = 3629GFLOPs) and then reducing for redundancy and the smaller die (about 25%; 3629 × 0.75 ≈ 2721) arrives at about 2700GFLOPs, a net functional-unit scaling of about 35% above a 6870. Reducing the frequency to a more reliable and power-efficient 700MHz (a 22% drop) arrives at about 2100GFLOPs, which sits between today’s Barts 6870 and Cayman 6950.
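The whole derivation fits in a few lines. Note the scaling factors here are my assumptions from above, not hard numbers:

```python
# Back-of-envelope for the hypothetical 28nm console GPU, starting from a 6870.
GFLOPS_6870 = 2016.0
DENSITY = 1.8      # assumed 80% density gain from 40nm -> 28nm
UNITS = 0.75       # ~10% smaller die plus ~15% redundancy: ~25% fewer functional units
CLOCK = 700 / 900  # frequency cut from 900MHz to 700MHz

gflops = GFLOPS_6870 * DENSITY  # ~3629: density scaling alone
gflops *= UNITS                 # ~2721: after die shrink and redundancy
gflops *= CLOCK                 # ~2117: at 700MHz
print(round(gflops))            # prints 2117, i.e. roughly 2100 GFLOPS
```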
Looking at some of these factors and expectations:
- 230mm^2 = the smaller end of the last gen GPU footprint range (230-260mm^2); 10% less area than Barts (6870, 255mm^2)
- 2.75B transistors = a 230mm^2 Barts-style GPU with 80% scaling from 40nm to 28nm
- 700MHz = less than a Barts 6850 (775MHz, 127W TDP), 6870 (900MHz, 151W), Cayman 6950 (800MHz, 200W), and 6970 (880MHz, 250W). 28nm should bring a solid reduction in power, but the increase in transistors will scale power draw back up. A 6850 is a reasonable 127W considering its 128GB/s of memory bandwidth, but a console also needs to accommodate the optical drive, HDD, CPU, system memory, etc. With costs (yields/binning) and the RRoD (and YLOD) firmly in memory, conservative clocks are likely, although the “turbo” features in current GPUs indicate that 700MHz is on the very low end of what should be expected. Comparing a 6790 @ 840MHz (150W max TDP) to a 6850 @ 775MHz (127W max TDP) shows that a chip with more functional units and more net performance can use less power than a chip with fewer units at a higher frequency.
- 2100GFLOPs = 80% scaling from 40nm to 28nm, minus ~10% for the area reduction (Barts’ 255mm^2 to our 230mm^2), ~15% for redundancy, and a ~22% reduction in frequency. GFLOPs may also be hit by the new SIMD+scalar GCN architecture and DX11.1 overhead, as well as additional raster pipelines; on the other hand, the number could be higher because many units don’t need to scale and shaders are often easier to pack in. E.g. it is unlikely ROPs will scale from 32 to 64 (there may even be a reduction to 24 or 16 ROPs on a console GPU), so that space may be used for more shader units.
- ~75 TMUs (texture mapping units), 52.5GT/s = Barts-style TMUs, taking the 6870’s 56 and applying the net 35% unit scaling (56 × 1.35 ≈ 75). For comparison, Cayman has 96. A 6870 (56 @ 900MHz) is 50.4GT/s; a 6970 (96 @ 880MHz) is 84.5GT/s.
- 16 ROPs, 11.2GP/s = or 24. A 6870 is 28.8GP/s (32 ROPs @ 900MHz). When Xenos and RSX shipped they had 8 ROPs while competing PC GPUs had 16. Consoles have limiting factors: they target at most ~2MPixel resolutions (1080p, or possibly 2 × 1080p for 3D / two-player “split” screen), but most games will be 720p 30Hz scaled up to 1080p, and memory bandwidth will also be a limiter. Consoles are about maximizing resources, and 32 (or 64) ROPs doesn’t seem like an investment console designers will make when that area could be spent on more shaders.
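For reference, the texture and fill rates above are just unit count × clock; a small sketch (the 75-TMU / 16-ROP configuration is my hypothetical part, not anything announced):

```python
# Texture rate (GT/s) = TMUs x clock (GHz); pixel fillrate (GP/s) = ROPs x clock (GHz).
def gt_per_s(tmus, mhz):
    return tmus * mhz / 1000

def gp_per_s(rops, mhz):
    return rops * mhz / 1000

# Hypothetical 28nm part: ~75 TMUs (56 x 1.35 = 75.6) and 16 ROPs at 700MHz
print(gt_per_s(75, 700), gp_per_s(16, 700))  # 52.5 11.2
# 6870 for comparison: 56 TMUs, 32 ROPs at 900MHz
print(gt_per_s(56, 900), gp_per_s(32, 900))  # 50.4 28.8
```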
I think this is on the conservative side. The CPU will be smaller than Xenon IMO (and most certainly smaller than CELL), and even this conservative GPU accounts for the fact that more processing will be sent to the GPU.
I would like to think the above is wrong (or that there is a reason not to launch a new console on 28nm in 2013!). Putting this into perspective: this theoretical GPU is a hair faster than a 6870 in GFLOPs and GT/s but not even half as fast in fillrate.
Code:
Model  MHz  Shaders  TMUs  GT/s  ROPs  GP/s  GFLOPS  TDP(W)
28nm?  700  -        75    52.5  16    11.2  2100    -
6790   840  800      40    33.6  16    13.4  1344    150
6850   775  960      48    37.2  32    24.8  1488    127
6870   900  1120     56    50.4  32    28.8  2016    151
6950   800  1408     88    70.4  32    25.6  2253    200
6970   880  1536     96    84.5  32    28.2  2703    250
Simply put: I am not sure there is much stomach for future proofing. The early 2011 GPUs and the fall 2012 GPUs are going to be a lot faster on the PC side. Further, I predict a mid-range PC GPU will be (1) faster and (2) cheaper. The Xbox and Xbox 360, even the PS3, had relatively solid GPUs at the time of their launch, but that does not seem likely this time. Cost, reliability, and the versatility of what a console needs to do shift budgets.
As for cost, I think the above leaves a lot of budget for a competitively priced console. One could argue to reduce things even further, but there is always a base cost and things only get so small. Looking at the retail prices of these models (6790 1GB $149, 6850 1GB $179, 6870 1GB $239, 6950 1GB $259 / 2GB $299, 6970 2GB $299), and considering that AMD, the distributors, and the retailers all take their cut, even the high end for a 1GB 255mm^2 chip ($239 retail) actually costs far less than that, which makes it a viable console part. As of today (11/29/2011) I see 6950 1GB at $220, 2GB at $239, and 6870 1GB at $155 at NewEgg.
There you have it folks: next gen graphics can be bought, right now, for about $155 (CPU not included).
If there is a glimmer of hope, it is that if AMD, distributors, and retailers can all make their cut on a $155 product now, then after a node reduction, and going with a mild “loss leader” model, you would think and hope a $299-$399 console could pack in a lot more punch. But I don’t think the console makers are thinking along these lines.
And I haven’t even touched memory.
I will throw out this wild card: I think the smaller design above fits well with the cost considerations of stacked memory (higher performance, lower power) and a silicon interposer (SI). The bigger your GPU, the more expensive the SI, as it has to fit both the GPU and the memory.
We may see a setup with 1GB of very fast memory for the GPU (plus an additional 2GB of system memory) and a GPU along the lines above, with some concessions to get the GFLOPs up a bit, since I think (at least for MS) a lot of processing will be moved to the GPU.
GPU Stats:
http://en.wikipedia.org/wiki/Comparison_of_AMD_graphics_processing_units
* As new models move to smaller processes, the cost of cooling will also go down. So an extra $5-$10 on better cooling (compared to the 360) in a launch unit is easier to justify, as this cost will be reduced in later revisions where aggressive cooling is not necessary.
Time to go back and compare my 2006 predictions!
Btw, this theoretical GPU, with a 2GB UMA, would be roughly 9x faster in raw GFLOPs (and ~6.6x in texturing) with 4x the memory of the current consoles. Factor in the 2.25x pixel cost of 1080p (and double again for full 3D) and, quite frankly, this is not really impressive. 28nm may not be dense enough to deliver a true *traditional* next gen experience at the budgets console makers will likely be looking at. 20nm with FinFETs, and the hopeful emergence of relatively affordable memory stacking, may offer a huge jump over 28nm. The issue is TSMC probably won’t have solid product until 2015… if they don’t choose to chuck the roadmap. Again.
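As a sanity check on those multipliers, using the Xenos line from the earlier table (240 GFLOPS; 16 TMUs @ 500MHz = 8 GT/s texturing) against my hypothetical 2100 GFLOPS / 52.5 GT/s part:

```python
# Rough generational multipliers for the hypothetical 28nm part over Xenos.
print(round(2100 / 240, 1))        # 8.8  -> roughly 9x in GFLOPS
print(round(52.5 / 8, 1))          # 6.6  -> ~6.6x in texturing
print(2048 / 512)                  # 4.0  -> 4x the memory (2GB vs 512MB)
print(1920 * 1080 / (1280 * 720))  # 2.25 -> pixel cost of 1080p over 720p
```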
PS: Sony/MS, please, one of you, prove me wrong.