Predict: The Next Generation Console Tech

I could see a 2 billion transistor GPU.

GT200 / GTX 280 is already ~1 billion+


4 GB RAM total (system memory + graphic memory or unified)


400-500 GB/sec system bandwidth


For the next-gen Xbox, if it again uses eDRAM, the bandwidth on that could be a terabyte or even several TB per second, since the eDRAM in the Xbox 360 is already a quarter of a TB/sec.
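To put a rough number on that (back-of-envelope only; the 256 GB/s figure is the commonly quoted internal eDRAM-to-ROP bandwidth of Xenos, and the scaling factors are purely hypothetical):

```python
# Back-of-envelope eDRAM bandwidth scaling (illustrative, not a spec).
xbox360_edram_gb_s = 256              # GB/s between Xenos's eDRAM and its ROPs
print(xbox360_edram_gb_s / 1000)      # 0.256 -> the "1/4 of a TB/sec" figure

for scale in (4, 8):                  # hypothetical generational scaling factors
    print(f"{scale}x -> {xbox360_edram_gb_s * scale / 1000:.1f} TB/s")
```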

I expect you won't ever see a 200 watt graphics card in a console. ;)

In 3 or 4 years they might be able to make something that big doable in a console (@32nm maybe), but I don't think you should be looking up from there.
 
I think that when it comes to a games console (and evidently, since cache-based architectures do not seem to be dying in the HPC field ;)), caches do not seem a bad thing at all.

HPC is, in a way, a matter unto itself, serving purposes that are mostly oriented towards experiments in applied computer engineering. It intersects scientific computation in some areas (which is where my interests lie), but most scientific computation doesn't map very well to high-performance architectures. As vector machines gave way to vector-parallel, then to VLIW/clusters/whatever, it also meant that fewer problems could take advantage of these advances.

Amdahl's law is a harsh reality.

What makes supercomputing architectures interesting as far as games consoles go, despite the 4-5 orders of magnitude difference in hardware cost, is that these machines are dedicated to solving a single problem fast, as opposed to business computing which is more dedicated to dealing with as many transactions as possible in the shortest possible time. Desktop PC chips have mostly evolved along the business server model architecturally.

Games consoles, however, are devoted to a single task - running a game (as opposed to 500 instances of an application) - and so the problem they are meant to attack looks more like traditional scientific computing.

A very important question (the $1 billion question, so to speak) is where exactly our efforts on the combination of OS + programming languages (new or extended) + hardware resources should focus: how best to attack the problem holistically on all three fronts.

The most important question is - what is the problem we are trying to solve?
Game consoles are relatively cheap toys, with a total hardware cost budget of a hundred to a few hundred dollars at most. The less expense the better, both for the manufacturer and for the end-user. This, rather than ultimate performance, is what drives the interest for efficient hardware utilization.

So, do we need more CPU performance than we have today to produce console games? I would hope to hear some developers chime in on that question. But I haven't heard much complaining about lack of CPU horsepower. Wishes for more memory, sure, and more graphics horsepower doing away with the need for rendering at lower resolutions and upscaling, or allowing the use of AA and so on. But higher CPU performance? It's been real quiet. And it is a bit remarkable that the Wii with its weaker CPU is the platform with physics oriented games like Boom Blox and Elebits.

There is strong interest from all sides to keep costs down, there is significant interest in keeping power draw, size and noise down, and there is seemingly little driving need for higher CPU performance. So, what is the conclusion? Will a lot of effort and money be spent driving the CPU development, or are those resources better spent elsewhere if we do, indeed, take a more holistic approach to what we are trying to achieve?
To me the answer seems crystal clear, but then, I'm looking at this from the outside and insiders may well have a different perspective.

For the PS4 it seems like a no-brainer to go with a smaller, cooler and cheaper update of the BE in the PS3, allowing the continued use of existing software resources and yielding a better return on tool investments. The architecture invites extension if additional performance is seen as desirable for marketing reasons.
The 360 CPU design doesn't really lend itself so well to extension, but maybe it isn't really needed either. Lithographic advances will yield some improvements pretty much for free anyway.
The only console where a more drastic change in CPU may be justifiable for the next generation is the Wii. But Nintendo's priorities for its next generation are harder to predict than Sony's or Microsoft's.

I'll repeat my main point - that the strong drive for efficiency in consoles is driven by cost constraints, not ultimate performance.
And that this, in turn, will decide how the design of the next generation of console is approached.
 
I think Mintmaster gave some kind of response to the question in one of his last posts:
http://forum.beyond3d.com/showpost.php?p=1170410&postcount=34

Like you, I'm willing to hear other devs' comments about this matter.

In my opinion, given the evolution of GPUs, the next-gen CPU could be kept tiny.

As for your comment about Xenon and its successor, it would depend on the answer to your question (how much "CPU power").
If loads of computation are offloaded to the GPU, I would like to know whether a quad core with two hardware threads per core (an improved Xenon) could fit the bill.
 
See, I've been thinking about some sort of generational leap in performance in most of my posts.

Maybe we (and developers) don't need a huge leap in CPU power. PS2/Xbox to Xbox 360/PS3 was a massive leap in CPU performance: totally new architectures with a *greater* than 30x leap in floating-point performance, both from the 733 MHz Intel CPU to Xenon and from the Emotion Engine to the PS3's version of Cell. Perhaps a small 2-3x improvement will be good enough for next-gen, allowing the CPUs to be very small, draw less power, etc.
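For scale, here's that leap with commonly quoted peak single-precision figures (all approximate and debatable, and peak rather than sustained performance):

```python
# Rough generational leaps in peak single-precision GFLOPS.
# All figures are commonly quoted approximations, not measurements.
xbox_cpu = 2.9      # 733 MHz PIII-based Xbox CPU (~4 SSE flops/cycle)
xenon = 115.2       # Xbox 360 Xenon, 3 cores @ 3.2 GHz
ee = 6.2            # PS2 Emotion Engine
ps3_cell = 204.0    # PS3 Cell with 6 usable SPEs (often quoted ~200)

print(xenon / xbox_cpu)   # ~40x leap
print(ps3_cell / ee)      # ~33x leap
print(3 * xenon)          # a mere 3x bump next gen: ~346 GFLOPS
```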

The GPUs need to improve more, though. 720p with 8x AA and 1080p with 4x AA would be a nice resolution/AA target. No game should ever run at less than 30fps at any time, unlike this gen, where games are not even meeting the locked-30fps standard. More games will make use of 1080p, but it'll still be a split between 720p and 1080p. And no games should run at less than 720p; quite a few do this gen.

DX11 / Shader 5.0 or equivalent (since Sony & Nintendo don't use DirectX) should be a given.


RAM is always good. Nintendo should go for 1 GB (assuming an Xbox 360+ level console) while Sony and Microsoft should be at no less than 3-4 GB. PCs will be at 8 GB or so.
 
Yeah, and it's really starting to frustrate me that so many people still believe there will be huge 4 billion transistor GPUs with 8 GB of memory, 700 GB/s of bandwidth, and all these ludicrous specs.
I agree that a lot of speculations are unrealistic.

1) the Wii will have an impact both on customers and stockholders.
2) R&D per mm² goes higher as you cram more and more transistors per mm²
________________________________________________________________

Going with your estimate (~400 mm²) or mine (300-350 mm²) using a 32nm process, manufacturers should have between 2 and 2.5 billion transistors to play with (I use a gross approximation based on information Alstrong provided: 150 mm² ≈ 1 billion transistors).
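Making that arithmetic explicit (just a sketch; the density number is only the rough rule of thumb mentioned above):

```python
# Transistor budget = die area x density, using the ~1B transistors
# per 150 mm^2 rule of thumb from the post (a gross approximation).
density_per_mm2 = 1e9 / 150

for area_mm2 in (300, 350, 400):
    print(f"{area_mm2} mm^2 -> {area_mm2 * density_per_mm2 / 1e9:.1f}B transistors")
# 300 -> 2.0B, 350 -> 2.3B, 400 -> 2.7B
```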

MS could use a derivative of Xenon; there is a lot of room for improvement if I'm to believe some comments here.

I don't think MS would need to use more than four cores; the focus should be put on making these cores better. There are a lot of opportunities here, as Xenon is not exactly rumored to be a performer...

It could be done through more cache, faster caches, and a better cache hierarchy (something closer to the cache organisation in Phenom or Nehalem).

Implementation of an OoO engine

Better branch predictor

Wider SIMD units.

Fixing some broken implementations (I remember reading that some things were mostly broken here, like data coming from the AltiVec/FP pipeline having to go through the cache to be available to other execution units, and the same for L2 cache thrashing).
________________________________________________________________

It could look like:

Xenon II:

4 cores @3.2 GHz
64 KB L1 (data + instructions) per core (128? insight welcome)
4x 256 KB L2
2 MB of L3

SMT support for two hardware threads per core, maybe in an improved manner.
OoO execution
Able to issue three instructions per cycle (versus two currently)
Better branch predictor
256-bit wide reworked AltiVec units (would reintroducing integer support help for some tasks?)


This would be <500 million transistors (slightly more than twice the current Xenon); a rough sanity check follows below.

Xenon is ~170 million transistors, so it's safe to assume that this CPU would be way better than Xenon while being "pretty" tiny and easy enough to cool.

It would actually look like a toy next to a 2009 Nehalem :LOL: and lots will shout that this is a super-conservative guesstimate.
But anyway, it may well be good enough, not to mention really close to the PC, and BC should be easy.
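As a rough sanity check on the <500 million figure (a sketch assuming the classic 6-transistor SRAM cell; tags, ECC and peripheral logic are ignored, so the cache numbers are a lower bound):

```python
# Estimate cache transistor counts for the hypothetical "Xenon II" above.
def sram_transistors(kib):
    return kib * 1024 * 8 * 6           # KiB -> bits -> 6T cells

l1 = 4 * sram_transistors(64)           # four cores x 64 KB L1
l2 = 4 * sram_transistors(256)          # four x 256 KB L2
l3 = sram_transistors(2 * 1024)         # 2 MB shared L3
cache = l1 + l2 + l3

print(cache / 1e6)                      # ~164M transistors in cache alone
print((500e6 - cache) / 4 / 1e6)        # ~84M left per core in a 500M budget
```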



That would leave between 1.5 and 2 billion transistors for the GPU.

So, going by RV770 figures, that should provide a theoretical peak performance of around 2 TFLOPS (if the shaders run at ~1 GHz).
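For reference, that "theoretical peak" is just a simple product (the 1000-ALU configuration below is purely hypothetical):

```python
# Peak = ALUs x 2 flops/clock (one multiply-add) x shader clock in GHz.
def peak_tflops(alus, clock_ghz):
    return alus * 2 * clock_ghz / 1000

print(peak_tflops(800, 0.75))   # RV770 as shipped: 1.2 TFLOPS
print(peak_tflops(1000, 1.0))   # ~2 TFLOPS would need e.g. 1000 ALUs at 1 GHz
```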

I think that:
This could be slightly too much power dissipated
GPU manufacturers are likely to trade some ALUs for control logic to make them more flexible

So I would put the figure of around 2 TFLOPS for the system as a whole (obviously a "theoretical, meaningless peak figure" anyway) as the most optimistic figure.

In regard to eDRAM, I don't know, as it can put constraints on non-standard uses of the GPU (for Xenos at least, tiling prevents some more exotic uses of the GPU).
And enough eDRAM to fit a 1080p frame buffer with AA would eat a lot of silicon and complicate the mobo design; see the arithmetic below.
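To put numbers on "a lot of silicon" (a sketch assuming uncompressed 32-bit color and 32-bit depth/stencil per sample):

```python
# Size of a 1080p frame buffer with 4x MSAA, no compression assumed.
w, h, samples = 1920, 1080, 4
bytes_per_sample = 4 + 4                  # 32-bit color + 32-bit depth/stencil
mib = w * h * samples * bytes_per_sample / 2**20
print(f"{mib:.0f} MiB")                   # ~63 MiB, vs the 10 MB in Xenos
```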

It depends, I guess, on the cost:
price of upcoming Rambus technology or fast GDDR5/6
vs
price of eDRAM

In fact I think I would favor the most flexible design (no eDRAM) even if it means cutting corners elsewhere (a slightly tinier GPU).
_______________________________________________________________

Anyway, for those who might consider 2 TFLOPS disappointing, I think they should consider other parts of the design:
Bandwidth
Amount of RAM
and that we're likely to be speaking of "real programmable FLOPS"

By the time the next gen is pushed out the door, GPUs may well look like a pool of strange CPUs with really few fixed-function units.
An R670 is made of four "processors", each a cluster of VLIW/SIMD units.
We could be looking at a sea of "processors", and thus a lot of useful power for non-graphical (or exotic graphical) work.


OK, I have to find a more interesting job... I have too much time to think about things I don't know well enough...
 
I think a redesigned/modified 32nm or 22nm shrink of the RV770 would make an excellent GPU for Nintendo's next console. RV770 will be considered low-end by 2010-2011 and sub-low-end by 2012, and yet it would provide an absolutely massive generational leap beyond Flipper & Hollywood. Now of course, in actuality Nintendo wouldn't use an RV770 derivative, so what I really mean is a GPU with RV770's level of power and features. RV770 is already a conservative GPU; it's not bleeding edge like GT200 / GTX 280. Think how much smaller and how much less power-hungry an RV770-class GPU would be in 2011.

BTW RV770 doesn't provide 2 TFLOP, it's 1 TFLOP.

It's the R700 (4870 X2 with two RV770 GPUs) that is going to be around 2 TFLOP.
 
Numbers keep changing for the RV770 ;)
But the last time I looked, the figure was just under 900 million transistors, so I did a gross x2 for the peak FLOPS count.
 
Entropy said:
So, do we need more CPU performance than we have today to produce console games?
Not for traditionally CPU-centric tasks; that threshold was largely reached by last-gen machines already. But games have always walked the line between what belongs on the CPU and what belongs elsewhere, and in this age it's gotten very blurred.
So it seems most ideas are pointing in the direction of configurable resource pools without discrete allocation dictated by processor designs (or physical packages) (hell, there was a time some of us speculated about that in relation to PS3).

IMO the only thing in question is whether the next console cycle will happen before this convergence makes its way into console designs or not. And given the way the market has been changing, I think it's more likely the next cycle will be later rather than sooner. That, or we'll enter an era of incremental upgrades like the Wii, and that's a lot less interesting to talk about.
 
This is a weird thread, Cell without local stores? In your dreams!

Cell already has a cache, it's primarily for the PPE but SPE memory accesses are also checked against it. That's why the latency is so high. The programming docs actually advise you to avoid using it.

I fully expect they'll use the 34-core Cell in the PS4; I also expect it'll have full DP support to save developing 2 chips. There is some info about it already, but not much: the PPEs will be new, they'll use some POWER7 tech, and the SPEs will have lower-latency local stores.

Switching to cache on a chip like this would be a disaster, cache latency would go onto another planet. Other designs are keeping cache but getting around the problems by switching to using large numbers of threads.

I don't think you'll see OOO suddenly making an appearance, especially given almost the entire industry is moving away from it. That said the in-order / out-of-order line is rather blurred, no one does "pure" in-order these days, a very "lite" form of OOO might be possible but I very much doubt it.

Cell was designed to be very efficient and scale very well, they'll move more in that direction, not less.

As for the 360 I expect it'll become more like Cell, not the other way around, either that or they'll try shifting the load to the GPU a lot more.
 
The 970FX was a processor IBM designed for Apple several years ago to try to get into a power envelope that would let them use it in a mobile setting; I think they ended up using them in iMacs.

http://en.wikipedia.org/wiki/PowerPC_970

Obviously Apple decided IBM couldn't provide them with a competitive CPU (at least not without investing a significant amount in R&D, or so the story goes), and in hindsight it's hard to argue that they were wrong to leave.

The original idea for the 970FX was a low-consumption 970 that could be used in laptops, but that part was never finished until after the famous Intel announcement, and it was never released to the market.

The 970FX chips that were released to the market were another matter: nothing more than a smaller (cheaper) 970, but with the same power consumption.
 
Think about this: the GameCube is rated at 10.5 GFLOPS, total system. That wasn't PR-flops either, unlike Xbox1's 80 GFLOPS, the Xbox 360's 1 TFLOP, or the PS3's 2 TFLOPS.

Of that 10.5 GFLOPS in GCN, Flipper is around 8.6 GFLOPS. Assuming Hollywood in the Wii is just 50% more thanks to the clock increase, that's 12.9 GFLOPS; call it 13.

Now a 1 TFLOP AMD GPU could be really small in 2011-2012 on 32nm or 22nm, when the next Nintendo console should be out. I have every reason to believe Microsoft and Sony will go with multi-TFLOP GPUs, while drawing much less electrical power than a GTX 280. Nintendo can still have significantly less performance than XB3/PS4 and still provide a leap over the Wii that's at least as much as N64-to-GCN. All highly speculative. I'd be pretty bummed out if all Nintendo did was a clocked-up Wii with a few more pixel pipelines just to handle HD. Nintendo could still have a low-end CPU with just 2 cores / 4 threads, something comparable in performance to Xenon and less than the PS3's Cell, but they will need to increase the GPU tremendously. Unless they think they can get away with 4 GCNs duct-taped together (and they could!).
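The arithmetic above, spelled out (the 1 TFLOP figure is just the hypothetical target from this post):

```python
# Flipper ~8.6 GFLOPS; Hollywood assumed to be +50% from the clock bump.
flipper = 8.6
hollywood = flipper * 1.5
print(hollywood)            # 12.9 -> "call it 13" GFLOPS

# A hypothetical 1 TFLOP GPU in Nintendo's next console would then be:
print(1000 / hollywood)     # ~78x over Wii's GPU
```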
 
They had the same option this gen though. They could have put in a small, cool, cheap GPU with vastly more oomph than a doubled-up GC, but chose not to. Next-gen is likely to be very different, because they'll be so substantially behind the curve that they'll need to match at least what's up now, and for that they'll need to abandon the current architecture. So if they're going for a new architecture, may as well buy into something reasonably powerful. But still, we can't rely on Nintendo to choose the sensible hardware route!
 
What about the HD 4870 in the next gen? It seems to be too early, but it's probably a 250-300 mm^2 die (actually unknown, could be bigger). So maybe in 2009 they can halve that on 40nm, say a 150 mm^2 die; that's about console size? We won't be seeing 32nm for a while yet, so maybe around 2010-11 this could go in the Xbox 720? You have to realize it would probably be too big and too hot right now and will need time to get down to console specs.

I think the ATI parts, which have such massive shading power (the newest expectation is an incredible 800 SPs on RV770! >1 TFLOP) but are texture-limited in PC games, where the hardware must conform to the software, would be much better in consoles, where the software conforms to the hardware and all that shading power could be utilized.
 
I'm thinking something more like this:
2000 SPs
64 TMUs
24 ROPs
256-bit/384-bit bus
Clock: 600-800 MHz
No eDRAM -> MSAA resolve with the shader core...
One memory pool of 4 GB GDDR5 @ 5.5-6.5 Gbps (bandwidth sketch below)


About 2.5 times RV770, but with a 32nm manufacturing process this chip could have a die size smaller than 150 mm^2 (transistor density increased 4-5 times).
And maybe an 8-core @ 2 GHz @ 32nm CPU would be great.
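To make the memory side of that spec concrete (theoretical peak only; effective bandwidth would be lower):

```python
# Peak bandwidth = (bus width in bits / 8) x per-pin data rate in Gbps.
def peak_gb_s(bus_bits, gbps_per_pin):
    return bus_bits / 8 * gbps_per_pin

for bus in (256, 384):
    lo, hi = peak_gb_s(bus, 5.5), peak_gb_s(bus, 6.5)
    print(f"{bus}-bit: {lo:.0f}-{hi:.0f} GB/s")
# 256-bit: 176-208 GB/s; 384-bit: 264-312 GB/s
```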
 
That all seems reasonable. I think you could pitch somewhat above that on a 32nm node and aim for 200 mm^2. I would hope the clock would be at least an 800 MHz core; a shader clock around 1.5 GHz would be awesome, and I would hope for 32 ROPs and 96 TMUs.

That would be an awesome console.
 
The question with the 150 mm^2: are these companies going to be happy with a machine that isn't going to shrink and save them buckets on hardware? Won't this keep the price very high? They'll have to be thinking of hitting mainstream pricing. Even the PS3's crazy-high price is set to come down due to process shrinks. High-performance next gen is going to be moderately large and, most importantly, not hugely reducible, the rate things are going. Thus the launch price won't be too far removed from the lowest price attainable in the hardware cycle, and thus the specs will need to factor in the price, such as saying "let's target $250 at launch and it'll be down to $150 at the end of the generation". That there will be the limiting factor IMO.
 
Sorry, I don't really know what your key point is there. Are you saying that companies are not likely to find many more process shrinks after the 32nm node?

Are you saying that 150 mm^2 can't be shrunk much past that size as it is? Is that because of the die size needed for a 256-bit memory bus?

Or is there something else that I haven't quite understood correctly?
 
I believe his assertion is that the process shrinks are/will be getting further apart and less effective. It certainly seems that way to me also, but I've not looked at a proper graph of it lately.
 