Predict: The Next Generation Console Tech

homerdog · Nov 28, 2011

hoho I think you're wasting your time I already tried.

Cheezdoodles · Nov 28, 2011

Dr Evil said:
These are good points. I don't think launching 2013 vs 2012 would give the manufacturers any significant benefits on what you can put into the boxes. Just like it didn't give any benefits in 2006 vs 2005 as far as the chips were concerned.
.

Wasn't that mostly a result of the ps3 having a outdated gpu?

function · Nov 28, 2011

If you look at the processors made on a given node, there are lots of cases where a more mature node sees processors made that offer bigger leaps in performance over the first chips on that node. For example:

- Athlon 64 X2 90nm to Athlon 64 X2 65nm offered basically no improvement in raw performance, but a small improvement in perf/watt.

- Athlon 64 X2 (65nm) to Phenom (65nm) was almost double the raw performance, despite no change in node.

- Phenom (65 nm) to Phenom 2 (45nm) was about 25% more performance.

- Phenom 2's slowly got faster until later in the node when Phenom 2 X6 (45nm) was released and was about 50% faster than the fastest chips earlier in the node.

In all these cases the biggest jump in performance came with larger and more advances processors later on in the same node. The idea that you have have to be in there day 1 on a new node to get a big jump simply isn't always true. MS managed that with the 360 at great cost - they couldn't test properly, it generated more heat than they predicted and they were probably supply constrained (if not by the gpu then they would have been if they hadn't been more severely constrained by memory or the CPU) .

Recent GPU history shows several similar occurrences with node transitions. Build a bigger, faster, smarter chip on a mature node and you can sometimes get a bigger jump than just going in a new node day 1.

Ailuros · Nov 28, 2011

function said:
Recent GPU history shows several similar occurrences with node transitions. Build a bigger, faster, smarter chip on a mature node and you can sometimes get a bigger jump than just going in a new node day 1.

That goes only for NVIDIA until GT200. Fermi/GF1xx was manufactured under TSMC 40G and upcoming Kepler/GK1xx will be manufactured at TSMC 28HP. On the other side of the river bank ATI/AMD is traditionally manufacturing its GPU chips for years now on the smallest available half or full node process.

If you consider that GF100 was a tad short of 530mm2@40nm it isn't too far away from the upper threshold for 40nm. It wasn't obviously possible to "squeeze" those 3 billion transistors into 55nm, let alone 65nm. The G80 times are long gone in the GPU world, especially as chips become consistantly bigger due to scaling demands both in capabilities and in performance.

Vice · Nov 29, 2011

Hum....

Pinches of salt at the ready,folks. According to the latest speculation, Microsoft's sequel to the Xbox 360 will actually be two models,a pared down set-top box for casual gamers and a heftier model for the hardcore.

If the set-top version is true... sounds like the precursor for cloud/digital distribution gaming. IMO, the two model approach sounds like a train wreck.

hoho · Nov 29, 2011

I'd say they just repackage current XB360 into a set-top and sell the next-gen system as a real gaming console.

DuckThor Evil · Nov 29, 2011

Cheezdoodles said:
Wasn't that mostly a result of the ps3 having a outdated gpu?

ATI wouldn't have had anything fancy over Xenos either. The products that were better were also bigger and not suitable for a console.

DuckThor Evil · Nov 29, 2011

function said:
If you look at the processors made on a given node, there are lots of cases where a more mature node sees processors made that offer bigger leaps in performance over the first chips on that node. For example:

- Athlon 64 X2 90nm to Athlon 64 X2 65nm offered basically no improvement in raw performance, but a small improvement in perf/watt.

- Athlon 64 X2 (65nm) to Phenom (65nm) was almost double the raw performance, despite no change in node.

- Phenom (65 nm) to Phenom 2 (45nm) was about 25% more performance.

- Phenom 2's slowly got faster until later in the node when Phenom 2 X6 (45nm) was released and was about 50% faster than the fastest chips earlier in the node.

In all these cases the biggest jump in performance came with larger and more advances processors later on in the same node. The idea that you have have to be in there day 1 on a new node to get a big jump simply isn't always true. MS managed that with the 360 at great cost - they couldn't test properly, it generated more heat than they predicted and they were probably supply constrained (if not by the gpu then they would have been if they hadn't been more severely constrained by memory or the CPU) .

Recent GPU history shows several similar occurrences with node transitions. Build a bigger, faster, smarter chip on a mature node and you can sometimes get a bigger jump than just going in a new node day 1.

I'm sorry but those examples are if not terrible then not very good either. Yeah If you just transfer your existing design to a smaller process, you'r not going to see huge gains. Also going from dual core to quad core is not hard to see "double the raw performance" likewise 50% with 6 core > quad core...
AMD did see some nice die size shrinks though.

Also MS wasn't supply constrained to any meaningful degree for longer than a month or two, despite their hardships that imo had nothing to do with the chips, but errors in the cooling and manufacturing. Launch PS3 used even more power.

Recent GPU history doesn't really back that up at all. yeah there has been some room for tweaking but for example GTX 580 uses basically the same chip as the one in GTX 480, which is a huge chip and thus running into problems with it was more likely than with a smaller chip. On the AMD side, Cayman 6970 is only a little bit faster than the 5870, which was released two years ago when 40nm was the bleeding edge. Caymans gains also basically came from being bigger. AMD hasn't really been able to get any sort extra performance from tweaks during the 40nm era and the next leap in performance comes due to the 28nm process shrink. Imo the situation has been the same pretty much the whole time. yes there are few examples when the initial design has been so tiny that there has been plenty of room to increase the die size(3870>4870 or G80), but those times are long gone. The gains during the last two years on 40nm have been miniscule.

+ there are countless better examples showing the opposite of what you'r implying without having to resort to doubling the die size etc.

TheChefO · Nov 29, 2011

upnorthsox said:
A 500m transistor CPU? I didn't know you were being so generous. Then:

2 x 165 mil Xenon + 20% OoO + 15% expanded cache = 445 mil

If I'm high on my OoO and cache %'s you could possibly even add 12mb of L3 edram (1mb x 12 threads) and still make 500 mil tranny's.

Indeed...

TheChefO said:
Assuming 32/28nm launch in 2012 would yield 8x trans count, this would amount to a budget of roughly 4billion (497m x8 = 3,976m) if we are to assume equal budget/process node.

This leads to some pretty interesting potential hardware:

With that budget, MS could extend the xb360 architecture to the following:

10MB EDRam (100m) => 60MB EDRam (600m) Enough for a full 1080p frame buffer with 4xaa

3 core xcpu (165m) => 9 core xcpu (495m) - or an upgraded 6 core PPE with OoOe and larger cache along with an ARM core (13m trans)

This leaves a hefty 2.8b trans available for xgpu which could accommodate 3x AMD ~6770 (1040m) ~3 teraflops or 4x AMD ~6670 (716m) ~2.8 teraflops.

Such small, modular chips would enable good yields on new(er) processes until they were mature enough to combine together and eventually integrate to an APU.

As for fitting the 28nm CPU in the ~40watt budget, I'm not seeing how this would be an issue.

There aren't any cpus built on the 28nm process to compare, but there are a few built on 32nm.

Intel has a few, but I'm not going down that road again. Matter of fact forget I even mentioned it!

AMD has only the Fusion line to compare (I refuse to look at bulldozer any more than I have to!)

This leaves only the Llano to compare to.

Llano is a 1.45 billion transistor chip. CPU and GPU combined so it is very difficult to draw a direct comparison, but AMD does have one of these 1.45billion transistor (3 times my target size xb720 cpu) that operates at under 45watts and on 32nm to boot.

Granted, it's binned for mobile, but it is 3 times(!) larger and on 32nm instead of 28nm.

Other comparison:

http://www.anandtech.com/show/3774/welcome-to-valhalla-inside-the-new-250gb-xbox-360-slim/3

XB360s on 45nm.

Total system power draw 90watts at full load.

90watts x 80% = 72watts total system budget. Minus out the disc drive (13watts) and "other" power draws (9 watts?) puts the xcgpu at 50watts

Taking the traditional approach: 50watts @ 45nm = 25watts @ 32nm

Granted, it isn't likely to be perfectly linear, but it is also a smaller node (28nm vs 32nm) which should get us pretty darn close.

Conclusion:

Current xcgpu (~500 million transistors) power draw is 50watts@45nm.
Projected power draw 32nm = 25watts

I think it's safe to assume they can (comfortably) hit under ~40 watts for their next-gen 500million transistor cpu at 28nm.

TheChefO · Nov 29, 2011

hoho said:
And yet again, what Intel can do most (no-one) others can't.

...see above

TheChefO · Nov 29, 2011

hoho said:
I'd say they just repackage current XB360 into a set-top and sell the next-gen system as a real gaming console.

Indeed!

I've been pitching this idea for years.

MS has been taking baby steps in this direction lately, but I'm not seeing enough effort/investment on their part to really take this opportunity and run with it.

Maybe this will change now with Kinect and xb720 around the corner.

TheChefO · Nov 29, 2011

Dr Evil said:
I'm sorry but those examples are if not terrible then not very good either. Yeah If you just transfer your existing design to smaller process, you'r not going to see huge gains. Also going from dual core to quad core is not hard to see "double the raw performance" likewise 6 core > quad core...
AMD did see some nice die size shrinks though.

Also MS wasn't supply constrained to a any meaningful degree for longer than a month or two, despite their hardships that imo had nothing to do with the chips, but errors in the cooling and manufacturing. Launch PS3 used even more power.

Recent GPU history doesn't really back up that at all. yeah there has been some room for tweaking but for example GTX 580 uses basically the same chip as GTX 480, which is a huge chip and thus running into problems with is more likely than with a smaller chip. On the AMD size Cayman 6970 is only little bit faster than 5870, which was released two years ago when 40nm was the bleeding edge. Caymans gains also basically came from being bigger. Basically AMD hasn't been able to get any sort extra performance from tweaks during the 40nm era and the next leap in performance comes due to the 28nm process shrink. Imo the situation has been the same pretty much the whole time. yes there are few examples when the initial design has been so tiny that there has been plenty of room to increase the die size(3870>4870 or G80), but those times are long gone. The gains during the last two years on 40nm have been miniscule.

+ there are countless better examples showing the opposite of what you'r implying without having to resort to doubling the die size etc.

Exactly.

I don't think anyone would argue against the notion that a process node usually improves over time which allows for higher clocking, or lower power at the same clock.

However, it would be the first time in history (that I'm aware of) if roughly the same design (trans count) was lower power on the old node vs the newer one.

It's all about what is available in the short term.

2012-2013 is the time-frame.

28nm is as good as it gets for MS/Sony in that time frame. Teething problems (low yield) are expected at the introduction of a new process node, thus allowing for "later gains on the same process node" which is usually just taking advantage of the node's full capability.

Chips shipping now on 28nm will probably not be able to hit target spec for either speed, or TDP. But the process will get better.

AMD and Nvidia are both expected to launch new GPU's early next year (Q1) using 28nm.

As the process matures, better yields should surface (~mid 2012), just in time for mass production of ps4 and xb720 to launch late 2012.

If there is a problem with the 28nm process, obviously the launch window(s) will be pushed back, but things seem to be going rather smoothly for TSMC as they claim to be shipping all variants of 28nm for over a month now.

hoho · Nov 29, 2011

TheChefO said:
...see above

exactly. Especially about the part where real-world performance was compared

TheChefO · Nov 29, 2011

hoho said:
exactly. Especially about the part where real-world performance was compared

function · Nov 29, 2011

Dr Evil said:
I'm sorry but those examples are if not terrible then not very good either.

The general conversation was about CPUs. That was AMDs almost complete recent CPU history. AMDs fab is now Global Foundries, a foundry that TheChefO has been pimping heavily. Those example are the best and most appropriate ones available IMO. If you have any better or more appropriate examples then hey, I'd love to see them.

To round out AMDs entire recent lineup of CPU (just to make absolutely sure that under no circumstances can I be accused of cherry picking):

- Athlon 64 X4 (45nm) to Llano (32nm) showed a small improvement in perf/watt, no real increase in performance, and massive, massive yield issues that are unresolved over half a year later (after an initial delay of over half a year).

- Phenom 2 X6 (45nm) to Bulldozer (32 nm). OH. DEAR.

The jumps in performance across node transitions have been smaller than the increases that occurred during the lifetime of the node.

Oh yeah, WiiU CPU: 45nm. There's got to be a reason for this!

Yeah If you just transfer your existing design to a smaller process, you'r not going to see huge gains.

And yet this is what is commonly done. You want to ignore practice in favour of your theory - a theory which wants us ignore the reality that clocks and chip sizes typically increase as the node progresses.

I can't understand for the life of me why you'd claim that isn't relevant to this discussion. It's entirely relevant. It goes to the very heart of discussion.

Also going from dual core to quad core is not hard to see "double the raw performance" likewise 50% with 6 core > quad core...

And for AMD this has always happened on a mature node. Until Bulldozer.

Now lets look at Intel (from memory, tell me if I'm wrong):

- Single core to dual core: done on the same node (65nm)
- Dual core to quad core: done on the same node (45nm)
- Quad core to hex core: done on the same node (32nm)

Just like AMD (Bulldozer excepted). The real leaps in CPU power come on the same node, which is completely the opposite of what TheChiefO claimed. And we're not even counting the clock speed increases - the raw performance jump from the first Phenom 2 X4 to the last Phenom 2 X6 is really quite big.

Also MS wasn't supply constrained to any meaningful degree for longer than a month or two,

This isn't true. MS launched with a piddling amount of consoles instead of millions they could have sold on day one. Supply was constrained in at least the UK and America until after Christmas.

Recent GPU history doesn't really back that up at all. yeah there has been some room for tweaking but for example GTX 580 uses basically the same chip as the one in GTX 480, which is a huge chip and thus running into problems with it was more likely than with a smaller chip.

So you're more likely to run into problems with a big chip early on in the node, but this is ... somehow ... not ... relevant? C'mon man, think about it.

The 480 was late and hot with poor yields. The 580 was a very similar chip but came when the process was more mature. The 580 was 10 - 20% faster in games while using 10 - 20% less power and yields were vastly better. All on a process where there was apparently no difference start to finish!

A chip like the 480 (late, hot, slow) would be an unmitigated disaster for a console vendor.

Caymans gains also basically came from being bigger.

A bigger chip later on?!?

The gains during the last two years on 40nm have been miniscule.

The gains in the first 12 months were fairly big.

March 2009: Mobility Radeon 4830
April 2009: Radeon 4770
September 2009: Radeon 5870
January 2010: Mobility Radeon 5xxx

So small chips to start, then six months in the first performance part, then 10 months in the mobile, power efficient performance parts. That is not a surprising pattern.

+ there are countless better examples showing the opposite of what you'r implying without having to resort to doubling the die size etc.

Oh man!

Later in the node you can get acceptable yields on bigger chips, and run the same chips faster within the same power budget. This is relevant to consoles. That, in a nutshell, is it.

TheChefO · Nov 29, 2011

function said:
AMDs fab is now Global Foundries, a foundry that TheChefO has been pimping heavily...

Seriously?

I made mention that if MS/Sony were to launch in 2012, they would have the choice of GF 32nm or TSMC 28nm.

Lately the 28nm of TSMC has been getting more positive news reflecting the possibility of 28nm in 2012.

The only thing I've been "pimping" is that 2012 has a very real shot at introducing real next-gen consoles.

TheChefO · Nov 29, 2011

function said:
- Athlon 64 X4 (45nm) to Llano (32nm) showed a small improvement in perf/watt...

One has a GPU in it, the other doesn't.

I'll let you guess which is which.

TheChefO · Nov 29, 2011

function said:
The real leaps in CPU power come on the same node, which is completely the opposite of what TheChiefO claimed.

Where did I claim this?

DuckThor Evil · Nov 29, 2011

function said:
The general conversation was about CPUs. That was AMDs almost complete recent CPU history. AMDs fab is now Global Foundries, a foundry that TheChefO has been pimping heavily. Those example are the best and most appropriate ones available IMO. If you have any better or more appropriate examples then hey, I'd love to see them.

The jumps in performance across node transitions have been smaller than the increases that occurred during the lifetime of the node.

That sort of comparison is flawed by design. Chip companies like AMD and Intel have different motivations and timetables for their decisions than a console manufacturer trying to pick the optimum timeframe for launch. Often the first products that come out of a new node is a pipe cleaner that helps to ramp up the production line for other products. This saves money for them and they can use these products for something. I don't see any point in using a cost cutting measure or a pipe cleaner product as some sort of base here. I'm sure they could make a Pentium 1 on 32nm and then the jump during that node would be amazing...

You can't use something like "Athlon 64 X2 90nm to Athlon 64 X2 65nm offered basically no improvement in raw performance, but a small improvement in perf/watt."

as an argument here, because improvement in raw performance was never intended in that shrink. It was purely a cost cutting measure. No-one can expect a huge gain in performance in situations like that. Perhaps better overclocking a bit. Architectures don't usually scale in a way that sees huge improvements there. The 65nm chip was about half the size and had less cache...Raw performance was not the target there and most of your other examples are similar or flawed in some other way (2x bigger die giving 2x performance only due to the fact that the initial die was tiny). The optimal strategy for chip making is not the same as launching a game console.

Oh yeah, WiiU CPU: 45nm. There's got to be a reason for this!

My quess would be that the CPU in WiiU wont be a very aggressive design and therefore it doesn't require smaller process than that. New process also will take time to become cheaper than the one that still has high volume. Nintendo's design might fit nicely to older process and get costs benefit that way, it also puts a tighter ceiling on what you can put there.

Now lets look at Intel (from memory, tell me if I'm wrong):

- Single core to dual core: done on the same node (65nm)
- Dual core to quad core: done on the same node (45nm)
- Quad core to hex core: done on the same node (32nm)

I don't quite understand what you are saying here. Intel launched Hex core on 32nm (i7 980x) in March 2010 before any quads on that process if I'm not mistaken, but there were dual core Clarkdales slightly before that. Sandy Bridge was the first 32nm quad early 2011. Intel is making anything between 2-8 cores on 32nm currently, although two cores are disabled on the native 8 core chip (3980X) I'd definitely say that Intel's Tick Tock strategy has always brought major architectural changes on a new node. Core 65nm, Nehalem 45nm, Sandy Bridge 32nm.

In any case Intel's doings aren't really relevant either, because they are playing different game also.

This isn't true. MS launched with a piddling amount of consoles instead of millions they could have sold on day one. Supply was constrained in at least the UK and America until after Christmas.

360 launched November 18 in NA and early December in Europe. By February it was quite easy to find a unit. PS2 for example had far bigger shortages. 360 supply was able to meet the demand quite quickly. I'm still happy with my 1-2 months, but if it helps add one or two more as it doesn't change anything.

So you're more likely to run into problems with a big chip early on in the node, but this is ... somehow ... not ... relevant? C'mon man, think about it.

The 480 was late and hot with poor yields. The 580 was a very similar chip but came when the process was more mature. The 580 was 10 - 20% faster in games while using 10 - 20% less power and yields were vastly better. All on a process where there was apparently no difference start to finish!

A chip like the 480 (late, hot, slow) would be an unmitigated disaster for a console vendor.

I brought that up, because that example is the best and proper argument for your angle and actually what you should have brought up in the first place instead of something like Athlon 64 X2 90nm vs 65nm.

My point at the same token was that, as that truly is the best case for your angle. Those percentages happened on a absolute monster 500mm2 chip that would never have any business inside a console. Smaller mid range-ish chip is not going to run into problems of that magnitude and therefore in the console realm the problem is not going to be as big as in that example.

A bigger chip later on?!?

It's not that much bigger, it was quite a modest improvement in every way, that was just to get a new product out there and it would have closed the performance gap AMD was behind to GTX 480, enabling AMD to rise their prices to higher level, but then came the GTX 500-series and AMD was back to square one or actually a bit worse.

The gains in the first 12 months were fairly big.

March 2009: Mobility Radeon 4830
April 2009: Radeon 4770
September 2009: Radeon 5870
January 2010: Mobility Radeon 5xxx

So small chips to start, then six months in the first performance part, then 10 months in the mobile, power efficient performance parts. That is not a surprising pattern.

Those are not gains. Different products for different purposes/goals

Later in the node you can get acceptable yields on bigger chips, and run the same chips faster within the same power budget. This is relevant to consoles. That, in a nutshell, is it.

Later in the node you get better yields yes. How much that matters if you compare console life cycles of 2012-2020 vs 2013-2020 is debatable. Especially if the earlier launch helps you in other ways and the chips wont be that big in the first place. 10-20 % percent faster console one year later is probably not going to light the world on fire and that boost would probably drown to the fact that in the alternate scenario the devs have had more time with the tiny bit weaker hardware.

TheChefO · Nov 29, 2011

function said:
Later in the node you can get acceptable yields on bigger chips, and run the same chips faster within the same power budget.

As I've said.

Hence the reason why I've been repeating the idea of using multiple GPU's having merit.

Splitting the gpu budget to two smaller chips for ... wait for it ... increased yields on a newer process.

...

Predict: The Next Generation Console Tech

homerdog

donator of the year

Cheezdoodles

+ 1

function

None functional

Ailuros

Epsilon plus three

Vice

hoho

DuckThor Evil

DuckThor Evil

TheChefO

TheChefO

TheChefO

TheChefO

hoho

TheChefO

function

None functional

TheChefO

TheChefO

TheChefO

DuckThor Evil

TheChefO

Similar threads