Predict: The Next Generation Console Tech

Thanks for the link!

Good breakdown of utilization.

14% of general-purpose 55nm production was for GPUs

Just for GP. If you go back to the previous page, they have figures for all 65/55nm tapeouts, and PC GPUs are pretty insignificant in the grand scheme.
 
Just for GP. If you go back to the previous page, they have figures for all 65/55nm tapeouts, and PC GPUs are pretty insignificant in the grand scheme.

One thing the chart didn't show was whether we were talking about total units shipped or die size/capacity used.


Mobile chips are much smaller than GPUs, so at the same percentage of capacity the number of chips shipped would be significantly higher.

Not that I doubt the numbers (I'm sure many more cell phones ship than dedicated GPUs), but it would be good to have that clarified.

Looking at "pc graphics" at 2% vs "pc chipset" at 1% leads me to believe maybe it is capacity, but I can't say for sure.

Are there really that few PC chipsets being sold?
 
Why not? MS always wanted OoOE for the 360, but they had a crazy rushed schedule.

Does anybody even design in-order CPUs anymore? Low-performance crap for the mobile market notwithstanding.

Even ARM went OoO with the Cortex-A9/A15. It is simply the most power-efficient way to increase performance.

Cheers

Agreed. I'd bet the farm that even for gaming workloads, going OoO will yield more instructions per second than a clock increase ever would for an in-order core, and probably does it more efficiently if you look at instructions per second per watt.
 
Not to disagree with the notion of OoOE (and pardon my ignorance of the situation), but isn't the performance advantage of OoOE over IOE nullified when developers control what code runs through the CPU, and when?

Not saying there wouldn't be stalls, but couldn't these be ironed out in the development process?

If so, aren't in-order cores smaller and able to scale to higher frequencies?

To quote Guden Oden:
I am also sure that we'll never see a gen of consoles where either Sony or MS drops the execution units for more OoOE tho. That's oldfashioned these days, and won't lead the way to our glorious computing revolution future!
http://forum.beyond3d.com/showthread.php?t=33335

I'm not convinced either way, but IMO it will be of minimal importance, as I see the GPGPU aspect of next-gen playing a more important role, with the CPU being a modest ~4x bump over the existing XB360/PS3 CPUs instead of the expected 10x of the new GPUs.
 
Not to disagree with the notion of OoOE (and pardon my ignorance of the situation), but isn't the performance advantage of OoOE over IOE nullified when developers control what code runs through the CPU, and when?

Not saying there wouldn't be stalls, but couldn't these be ironed out in the development process?

If so, aren't in-order cores smaller and able to scale to higher frequencies?

To quote Guden Oden:
I am also sure that we'll never see a gen of consoles where either Sony or MS drops the execution units for more OoOE tho. That's oldfashioned these days, and won't lead the way to our glorious computing revolution future!
http://forum.beyond3d.com/showthread.php?t=33335

I'm not convinced either way, but IMO it will be of minimal importance, as I see the GPGPU aspect of next-gen playing a more important role, with the CPU being a modest ~4x bump over the existing XB360/PS3 CPUs instead of the expected 10x of the new GPUs.

Let me ask this question. Would you rather game developers spend all their time making their code as linear as it can possibly be, or would you rather they spend time coding the core elements of gameplay?
 
TSMC Vows to Ramp Up 28nm Production in 2012, Start 20nm Manufacturing in 2013.

If 2013 is a console target date (big if), and given the uncertainty of process shrinks (14nm looks like it will happen; all bets off after that?), it may not be a bad strategy to:

1.) Launch in fall 2013 with a "robust" offering on 28nm. Strong demand (8 years between gens) should allow Xbox 360-like pricing ($399) out of the gate.

2.) Shrink to 20nm in 2014 for one or both main chips. This would reduce costs (getting closer to break-even) or allow earlier incremental price reductions based on consumer demand/competition (see the rough cost sketch after this list).

3.) Keep an eye on 450mm wafer production and the 14nm process node roll-out in 2017-2018.
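
To put rough numbers on step 2, here's a minimal back-of-the-envelope sketch in C++. Every figure in it (the $5000 wafer cost, the 350mm^2 die, the ~0.5x area scaling from 28nm to 20nm) is an illustrative assumption of mine, not a number from TSMC or from this thread:

```cpp
#include <cmath>
#include <cstdio>

// Standard dies-per-wafer approximation (ignores defect yield):
// DPW ~= pi*(d/2)^2 / S  -  pi*d / sqrt(2*S)
double diesPerWafer(double wafer_diameter_mm, double die_area_mm2) {
    const double kPi = 3.14159265358979323846;
    const double r = wafer_diameter_mm / 2.0;
    return (kPi * r * r) / die_area_mm2
         - (kPi * wafer_diameter_mm) / std::sqrt(2.0 * die_area_mm2);
}

int main() {
    const double wafer_cost = 5000.0;  // assumed cost of a 300mm wafer, USD
    const double die_28nm = 350.0;     // assumed 28nm main-chip die, mm^2
    const double die_20nm = die_28nm * 0.5;  // a full node shrink roughly halves area

    const double areas[] = {die_28nm, die_20nm};
    for (double area : areas) {
        const double dies = diesPerWafer(300.0, area);
        std::printf("%5.0f mm^2 die: ~%4.0f dies/wafer, ~$%3.0f per die (pre-yield)\n",
                    area, dies, wafer_cost / dies);
    }
    return 0;
}
```

Under those made-up numbers the shrink roughly halves the per-die cost (~166 vs ~354 candidate dies per wafer), which is the whole economic case for step 2.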
 
Let me ask this question. Would you rather game developers spend all their time making their code as linear as it can possibly be, or would you rather they spend time coding the core elements of gameplay?

I suppose this would largely depend on how many man-hours we're talking about.

From what I understand, the majority of a game's budget is in asset creation. Thus, spending a bit more time to get code running smoothly on a more powerful IOE CPU, rather than a less powerful OoOE one (given the same die budget), is acceptable.

I may be underestimating exactly how many man-hours go into this streamlining/optimizing process, though, so I defer to those in the know.
 

Based on certain assumptions, it is believed that an output of 24 thousand 28nm wafers will be reached by Q3 2012. By the end of the year, 28nm production capacity is expected to increase further to around 50 thousand.

Conflicting reports on 28nm wafers per month:

November_25th said:
The company has capacity to make 20,000 wafers per month, but in Q1 2012 it will open the new Fab 15.
http://www.nordichardware.com/news/...higher-demand-for-28nm-unable-to-deliver.html
 
Not to disagree with the notion of OoOE, (and pardon my ignorance of the situation) but isn't the performance advantage of OoOE over IOE nullified when developers have control of what code is run when through the CPU?

Not saying there wouldn't be stalls, but couldn't these be ironed out in the development process?

This question should be answered by a programmer type, but from what I understand most programmers would prefer an OoOE CPU with slightly fewer cores to an in-order design with slightly more cores.

As anexanhume said (btw welcome to B3D!), it's somewhat of a cost issue. The time spent trying to program around IOE pitfalls could be spent on other, more productive efforts. Besides, for some things an OoO design will always be superior. This is evidenced by console ports running on PC. Console game code is obviously optimized for in-order CPUs, yet my old-ass 2.7GHz C2D E6750 chews through console ports with ease, in many cases at double the framerate of the XB360 versions. Obviously the E6750 carries a lot of advantages over Xenon other than OoOE, but OoOE seems to be a large factor in terms of its IPC advantage.
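
For what it's worth, here's a minimal C++ sketch (my own illustration, not code from any actual title) of the kind of pitfall-dodging being described: on a strict in-order core a load followed immediately by its use stalls the pipeline, so the programmer unrolls and reorders by hand, while an OoO core finds roughly that schedule on its own in the naive version.

```cpp
#include <cstddef>

// Naive loop: each multiply uses values loaded on the same iteration.
// On an in-order core every load-to-use gap is a stall; an OoO core
// reorders across iterations automatically, so this runs well as-is.
int dot(const int* a, const int* b, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

// Hand-scheduled variant for an in-order core: unroll by two and issue
// all the loads before any of the uses, so there is independent work
// in flight while each load completes.
int dotUnrolled(const int* a, const int* b, std::size_t n) {
    int s0 = 0, s1 = 0;
    std::size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        const int a0 = a[i],     b0 = b[i];      // start all loads early
        const int a1 = a[i + 1], b1 = b[i + 1];
        s0 += a0 * b0;                           // uses come after the loads issue
        s1 += a1 * b1;
    }
    for (; i < n; ++i) s0 += a[i] * b[i];        // remainder
    return s0 + s1;
}
```

Multiply that kind of massaging across an entire codebase and the cost argument above is easy to see.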
 
Not to disagree with the notion of OoOE (and pardon my ignorance of the situation), but isn't the performance advantage of OoOE over IOE nullified when developers control what code runs through the CPU, and when?

In a cached architecture, no: memory latency is unpredictable, which makes it impossible to do proper static scheduling for an in-order core.

Not saying there wouldn't be stalls, but couldn't these be ironed out in the development process?
What stalls, and when, depends on every line of code run since boot.

Even for tight loops, hand-scheduling is at least an order of magnitude more work than just coding it up in C++. There's no chance at all of them using this kind of optimization for normal game script.

If so, aren't in-order cores smaller and able to scale to higher frequencies?

Yes, but not by much.
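
A tiny C++ illustration of the unpredictable-latency point above (my own example, not from the thread): the load below is the same instruction every iteration, but its cost depends entirely on where the indices point, so no fixed in-order schedule can cover both cases.

```cpp
#include <cstddef>
#include <vector>

// The latency of table[i] depends on run-time state: sequential indices
// mostly hit L1/L2, random indices mostly miss all the way to DRAM. A
// static schedule has to assume one latency; OoO hardware adapts per load.
int gather(const std::vector<int>& table, const std::vector<std::size_t>& idx) {
    int sum = 0;
    for (std::size_t i : idx)
        sum += table[i];
    return sum;
}
```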
 
From what I understand, the majority of a game's budget is in asset creation. Thus, spending a bit more time to get code running smoothly on a more powerful IOE CPU, rather than a less powerful OoOE one (given the same die budget), is acceptable.

1) Engineers that know how to properly optimize for In-Order cores are very expensive and hard to find.
2) Certain In-Order designs (such as the X360/PS3 PPU) have fundamental flaws that, no matter how clever the engineer or how much time is spent optimizing, will still stall all over the place.
3) Out-of-Order cores tend to deal much better with compiler-generated code - especially true for crappy compilers.
4) Out-of-Order cores tend to run legacy code better (and who doesn't have legacy code these days?)

So yeah, as a developer I'll take an Out-of-Order core any day.
 
In a cached architecture, no: memory latency is unpredictable, which makes it impossible to do proper static scheduling for an in-order core.
It also tends to be an order of magnitude larger than what OoOE can schedule around. OoOE deals with stuff like branches and differences in cache latency, not memory latency; that is what we have vertical multithreading for.
 
It also tends to be an order of magnitude larger than what OoOE can schedule around. OoOE deals with stuff like branches and differences in cache latency, not memory latency; that is what we have vertical multithreading for.

Specifically, OoOE is usually capable of masking L2 latency, which alone is huge.
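
A rough C++ analogy for the vertical-multithreading point (illustrative only): a single dependent pointer chase pays every DRAM miss in full, one after another, while interleaving a second, independent chain keeps two misses in flight at once - which is essentially what a vertically multithreaded core does by switching hardware threads on a stall.

```cpp
struct Node { int value; Node* next; };

// One dependent chain: the next address is unknown until the previous
// load returns, so neither an in-order nor an OoO core can overlap
// these misses.
int chase(const Node* n) {
    int sum = 0;
    for (; n != nullptr; n = n->next)
        sum += n->value;
    return sum;
}

// Two independent chains interleaved: a miss in one chain overlaps
// with progress in the other - the software version of a core
// switching to another thread whenever the current one stalls.
int chaseTwo(const Node* a, const Node* b) {
    int sum = 0;
    while (a != nullptr && b != nullptr) {
        sum += a->value + b->value;
        a = a->next;
        b = b->next;
    }
    for (; a != nullptr; a = a->next) sum += a->value;
    for (; b != nullptr; b = b->next) sum += b->value;
    return sum;
}
```

The hardware version doesn't need the programmer to find the second chain; it just needs another ready thread.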
 
1) Engineers that know how to properly optimize for In-Order cores are very expensive and hard to find.
2) Certain In-Order designs (such as the X360/PS3 PPU) have fundamental flaws that, no matter how clever the engineer or how much time is spent optimizing, will still stall all over the place.
3) Out-of-Order cores tend to deal much better with compiler-generated code - especially true for crappy compilers.
4) Out-of-Order cores tend to run legacy code better (and who doesn't have legacy code these days?)

So yeah, as a developer I'll take an Out-of-Order core any day.
Thanks for the feedback, Barbarian!
 
I think that's what I'd be inclined to believe.

http://cens.com/cens/html/en/news/news_inner_38603.html

The 100k number is probably when the new fab is at full capacity, sometime in the distant future.

Somewhere in those reports something is getting lost in translation.

Perhaps 20k/mo now on a temporary/experimental line, which will be moved to 20nm as soon as Fab 15 phase 1 is ready in Q1 (24k/mo); then in Q4, phase 2 is ready, bringing the total to 50k (as stated in the report).

I just find it hard to believe that in 9 months' time they will only be able to add 4k/mo of wafers to their current production of 20k/mo, for a total of 24k/mo by Q3 2012.

Something is missing there.
 
Thanks for the feedback, Barbarian!

Yes, thanks Barbarian, tunafish, and MfA!

The question is: does anyone know, or have a reference for, how much die area/transistor budget it would take to "add" OoOE (more likely, borrow another OoO POWER design)? And how would that compare to higher-thread-count CPUs (4 threads per core instead of 2), and what would be the pros and cons of either approach?
 
It also tends to be an order of magnitude larger than the ability of OoOE to schedule around, OoOE deals with stuff like branches and differences in cache latency ... not memory latency. That is what we have vertical multithreading for.

True.

But there are really two advantages to OoOE. The first is obvious: execution continues past a data dependency that would stall an in-order CPU, until the scheduling window's resources are exhausted. A fair number of the instructions executed will be memory ops, and some of these will miss the on-die caches. The second advantage is thus that you get multiple memory ops going in parallel, whereas an in-order CPU would basically handle each memory request that misses the on-die caches sequentially.

Cheers
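
To illustrate that second advantage with a hypothetical C++ fragment (my own sketch): when the load addresses are all known up front, an OoO core can keep several cache misses outstanding at once, while a strict in-order core ends up paying for them roughly one at a time.

```cpp
#include <cstddef>

// Every ptrs[i] is independent of the others, so an OoO core's
// scheduling window can issue many of these loads before the first
// miss resolves (memory-level parallelism). An in-order core stalls
// at the first use and effectively serializes the misses.
int sumIndirect(int* const* ptrs, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += *ptrs[i];
    return sum;
}
```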
 