Predict: The Next Generation Console Tech

I'm thinking something more like this:
2000 SPs
64 TMUs
24 ROPs
256-bit/384-bit bus
Clock: 600-800 MHz
No eDRAM -> MSAA resolve on the shader core...
One unified pool of 4 GB GDDR5 @ 5.5-6.5 GHz


About 2.5 times RV770, but manufactured on a 32nm process this chip could have a die size smaller than 150 mm² (transistor density increased 4-5 times).
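A rough sanity check on that multiplier, assuming the usual counting of 2 FLOPs (MADD) per SP per clock:

2000 SPs x 2 x 0.6-0.8 GHz ≈ 2.4-3.2 TFLOPS
RV770: 800 SPs x 2 x 0.75 GHz ≈ 1.2 TFLOPS

so roughly 2-2.7x RV770 in raw ALU throughput.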
And maybe an 8-core CPU at 2 GHz on 32nm would be great.


You can't have a >128-bit bus on a reasonably sized die. Or at least, it cannot be shrunk enough in the future.

The rest of the chip can shrink, but there is a certain minimum die size that 256-bit and wider buses cannot shrink beyond. Therefore the console makers avoid them.

Although I've heard people say they could do 256-bit initially, then switch to 128-bit with double-speed RAM later on. But I'm not sure if this is technically possible.
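The arithmetic behind that idea is straightforward (peak bandwidth = bus width x effective data rate per pin), with made-up example numbers:

256-bit x 4 Gbps/pin = 128 GB/s
128-bit x 8 Gbps/pin = 128 GB/s

So a later 128-bit revision with double-speed memory keeps the same peak bandwidth on paper; whether the memory, controller and board signalling actually allow it is the open question.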
 
Entropy said:
So, do we need more CPU performance than we have today to produce console games?
Not for traditionally CPU-centric tasks; that threshold was largely reached by last-gen machines already. But games have always walked the line between "what belongs on the CPU" and what belongs elsewhere, and in this age it's gotten very blurred.
So it seems most ideas are pointing in the direction of configurable resource pools without discrete allocation dictated by processor designs (or physical packages); hell, there was a time some of us speculated about that in relation to PS3.

IMO the only thing in question is whether the next console cycle will happen before this convergence makes its way into console designs or not. And given the way the market has been changing, I think it's more likely the next cycle will be later rather than sooner.
This makes sense - one of the advantages consoles have is that they don't need to adhere to the modular build tradition of PCs, nor to backwards compatibility. This is very significant, and within limits, the console can be designed as a whole. Within limits, though, since IP and manufacturing can't necessarily be combined at will. At present the IP for CPUs and graphics typically resides in different hands, so the freedom is mostly of a mix-and-match kind, and of course in interfaces. Specifically, the IP situation makes it difficult to design the CPU+GPU as a whole. I'd tend to believe that the CPU and the GPU will remain separate entities for the next generation of consoles, but more tightly coupled with each other than in a PC, and with a tailor-fitted memory solution.

Just how such a system could be put together is the kind of speculation that fits both forum and thread, but it would be useful to hear some further input from developers. What is frustrating or awkward today? What would be interesting to do that we can't do already?
That, or we'll enter an era of incremental upgrades like the Wii, and that's a lot less interesting to talk about.
Well, here our perspectives differ a bit. I tend to see the Wii as a device that embraced change, rather than just extrapolating SNES->N64->GC one step further along the same curve.
Its success is an important example in many respects, but it makes the next generation less predictable, less likely to be just more of the same. Much more interesting to talk about. :)

For instance, it would seem reasonable that Nintendo would take the success of the Wii, keep the design ethos of small, quiet, unobtrusive and inexpensive, but substantially upgrade the hardware in order to adapt the concept to our future HD homes, and take steps to further improve the precision in the motion sensing to aid diversity and skill building.
But they could also introduce a portable device that is networkable, interfaced with via stylus and whatever, that can be plugged in to any TV set and you can play the same game there upscaled, and using a Wii-mote for pointing. You could seamlessly play, not only single player games at home and on the go, but also network with a friend or connect to servers on the go. One gaming device, everywhere. The technology is pretty much here as IP already, and with a couple of more years to get all the geese in line this is eminently doable.

Or they could take a stab at stereoscopic 3D, coupled with great positional audio, or they could add vision tracking, or...
There are a lot of things that are possible once you stop putting one foot in front of the other, lift your head and look around. Moving on in the same direction is one of them, for sure, but for the next generation of consoles I don't regard that as a given.
 
Maybe Rambus technology could be a saviour; it may cost more than GDDR, but if it allows manufacturers to pass on eDRAM and use a tinier chip, the tech could find its place in a next-generation system.
 
For next gen, can the Wii GPU be upgraded to handle HD? Like putting in more eDRAM, beefing up the fixed-function T&L, combiners, texel and pixel fillrate, and improving the output stage to handle 1080p?
 
I think that a 128-bit bus for next-gen consoles would be horrible in terms of bandwidth, except for a Wii 2, which could get away with 128-bit coupled with double-clocked memory.
(GameCube and Wii both use a 64-bit bus IIRC.)

I'd expect nothing less than 256-bit buses in the 3rd-gen Xbox and PS4.
 
For next gen, can the Wii GPU be upgraded to handle HD? Like putting in more eDRAM, beefing up the fixed-function T&L, combiners, texel and pixel fillrate, and improving the output stage to handle 1080p?

I'm sure that's possible. It might even be the cheapest approach. Though I sure hope that's not what Nintendo does.

I hope Nintendo goes with a modern GPU architecture that's Shader Model 4 or Shader Model 5 class, instead of a tricked-out, HD-upgraded version of Flipper/Hollywood, which are both a late-'90s architecture (given that Flipper was designed mainly in 1999 and Hollywood is hardly any different).

Of course, with that, I am still thinking in terms of SNES-->N64-->GCN advancements in silicon.

Nintendo might decide that an improved Wii with HD resolution (not much better or different in actual graphics) and better motion sensing tech is enough.
 
I can't help myself. I found this quote from the NY Times http://bits.blogs.nytimes.com/2008/06/10/apple-in-parallel-turning-the-pc-world-upside-down/ where Steve Jobs says
“The way the processor industry is going is to add more and more cores, but nobody knows how to program those things,” he said. “I mean, two, yeah; four, not really; eight, forget it.”
And then of course he goes on to describe how Apple has made a breakthrough in utilization of parallel processors, "Grand Central".
I remain dubious as to just how effective their new system level resources will be (I think it's a good effort though), but it was neat to see Steve Jobs publicly backing up my experiences and opinions on SMP systems. ;)
 
I can't help myself. I found this quote from the NY Times http://bits.blogs.nytimes.com/2008/06/10/apple-in-parallel-turning-the-pc-world-upside-down/ where Steve Jobs says
And then of course he goes on to describe how Apple has made a breakthrough in utilization of parallel processors, "Grand Central".
I remain dubious as to just how effective their new system level resources will be (I think it's a good effort though), but it was neat to see Steve Jobs publicly backing up my experiences and opinions on SMP systems. ;)

He still did not suggest Local Stores :p.

J/K :). BTW, there is a CRAPLOAD of patents from the same basic group that worked on the patent I linked a few posts back, on the same BTE+VTE architecture... Again and again, the idea of a Workload Manager (the function of the BTE/PPU-like thingy) sitting close to an MT-friendly mailbox facility, a shared L2 cache, and a set of throughput-oriented independent vector cores strikes me as very interesting... CELL v2/v3 with flexibly and quickly lockable cache lines to offer a mix of both worlds?

We'll see :).
 
I'm sure that's possible. It might even be the cheapest approach. Though I sure hope that's not what Nintendo does.

I hope Nintendo goes with a modern GPU architecture that's Shader Model 4 or Shader Model 5 class, instead of a tricked-out, HD-upgraded version of Flipper/Hollywood, which are both a late-'90s architecture (given that Flipper was designed mainly in 1999 and Hollywood is hardly any different).

Of course, with that, I am still thinking in terms of SNES-->N64-->GCN advancements in silicon.

Nintendo might decide that an improved Wii with HD resolution (not much better or different in actual graphics) and better motion sensing tech is enough.

There is always a chance that Nintendo will go for the kill next gen, but probably not. Maybe they'll just stick in an upscaler and upscale everything from 480p, and just have a higher-clocked Wii GPU and CPU with more RAM. After all, DVDs are good enough for most people.
 
There is always a chance that Nintendo will go for the kill next gen, but probably not. Maybe they'll just stick in an upscaler and upscale everything from 480p, and just have a higher-clocked Wii GPU and CPU with more RAM. After all, DVDs are good enough for most people.

There's a pretty big difference between upscaling DVD and upscaling games (unless they can somehow work 16xAA in there).
 
There is always a chance that Nintendo will go for the kill next gen, but probably not. Maybe they'll just stick in an upscaler and upscale everything from 480p, and just have a higher-clocked Wii GPU and CPU with more RAM. After all, DVDs are good enough for most people.

It's highly doubtful that Nintendo will try to compete on technology; they've gone on record saying something to the tune of 'Nintendo is in the games business, not the technology business' and have used their experience with the Game Boy as a reference. I think that even HD support will depend on what HD adoption looks like in 2010 (or whenever).

Even if they go that route, I doubt they'd put in really fancy upscaling. Considering they're 'in the games business', it doesn't make sense to make BC TOO nice. You want people to have a reason to rebuy their old games.
 
When I think about RV770, which is 256 mm² at 55nm, I'm amazed at the power a next-gen system could be provided with by a GPU-heavy design.

Even with a tinier silicon budget, two teraflops is conservative at 32nm.
How many ALUs will ATI be able to cram into the next Xbox GPU? I'm drooling...
 
When I think about RV770, which is 256 mm² at 55nm, I'm amazed at the power a next-gen system could be provided with by a GPU-heavy design.

Even with a tinier silicon budget, two teraflops is conservative at 32nm.
How many ALUs will ATI be able to cram into the next Xbox GPU? I'm drooling...

Yes, and to be honest, I think the top-end console for next gen (maybe MS's) will have around the capability (not architecture) of the dual R700 (that's right, a quad RV770), in the ballpark of that performance.

I think targeting 250 mm² at 32nm, this can be done.

So around 5 TF of shading power. I also believe the shading power and pipeline flexibility will be greater than that of a DX10.1 card, pushing more towards a DX11 SM5 standard of architecture.
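As a ballpark check on that figure, using the commonly quoted number: one RV770 is about 1.2 TFLOPS, so a quad-RV770-class part works out to roughly 4 x 1.2 ≈ 4.8 TFLOPS.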

I've heard that DX11 will be shown in a couple of months... can this be confirmed? I bet you anything that DX11, Unreal Engine 4, CryEngine 2.5, Shader Model 5 etc. will all target 2011 for the dawn of next-gen consoles, since this is where the most cash is nowadays.

DX10 was somewhat mismatched to the release of this round of consoles and therefore crippled the uptake of the API with devs. However, I think every level of the industry now understands the importance of timing with consoles, since that's where the dev money is.

DX11 will be MASSIVE compared to DX10; I'm sure it will be prolific. We are going to see some really good new stuff in the next 18 months leading up to this phase. It's going to be like when we first saw UE3 at GDC and E3 all those years ago. That to me was generation-defining stuff.

Since the UE3 tech demo way back in 2004 it's been slow progress. That was a quantum leap in real-time tech at the time... ah, how I miss those days. :)

http://www.amd.com/us-en/assets/content_type/DigitalMedia/AMD_Ruby_S04.swf

http://www.neogaf.com/forum/showpost.php?p=11610995&postcount=30

This stuff is now, in my opinion, getting close enough to what I believe will be next-gen-quality rendering that it has somewhat quenched my thirst for knowing what the future holds... for now at least...
 
When I think about RV770, which is 256 mm² at 55nm, I'm amazed at the power a next-gen system could be provided with by a GPU-heavy design.
This is exactly why we won't see a 256-bit bus next gen. When you look at the fantastic demos they've created for that sucker, is there any need for something several times faster? We're already at the point this gen where people barely care that 360 is doing 2.5 times the samples at 20% faster framerate in GTAIV. Going well beyond the abilities of RV770 isn't going to make much financial sense, as the on-screen image quality isn't going to improve as much with SP count as with art and software.

At 32nm and below, we're going to be looking at a GPU that's too small for a 256-bit bus. eDRAM will make a lot of sense because it really makes everything else cheaper and more scalable, particularly now that we've gone through all the teething problems of tiling on 360. It'll probably be one die, too, with more flexible usage.
 
Are you advocating one ASIC (CPU+GPU+EDRAM)?
I was actually just talking about GPU+EDRAM. IMO the only reason it was separate on 360 (and thus had limitations compared to PS2) was that there was no fab able to combine them cost effectively during the design stage.

One ASIC is possible even for the final revisions of this gen, but I doubt we'll see that at the launch of next gen.
 
I think next-gen consoles will be mainly one or two CPUs with very simple cores, up to 32 of them, with wide SMT/HT to hide latency, maybe 4 or 8 threads each. They will either have _very_ specialized instructions to access various formats (kinda like a sample instruction for textures in a shader), or they'll have some fixed-function slaves that will return a bunch of requested texel samples or vertices.

So, wondering how many TMUs they'll have - I'd say nobody will really care. Just like in the PS2/Xbox days everyone was wondering how many triangles/s the next gen would be able to transform, and nowadays we don't really care, because we do most of the lighting etc. on a per-pixel basis anyway.

I also think it won't be that critical how many GFLOPS you have, but rather how well all the latency can be hidden. So the challenge will be how well you can split your work across those insane (from today's point of view) 256 (slow) threads. (So next time marketing will push thread counts, like they pushed GFLOPS this time and MTri/s last time.)
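As a toy illustration of that kind of work splitting (just a sketch with plain C++ threads - the parallel_sum helper is made up, not any real console API), carving one loop into many independent chunks so there is always runnable work to hide latency behind:

#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical sketch: sum a big array by giving each (cheap) hardware
// thread its own independent chunk and private partial result.
double parallel_sum(const std::vector<double>& data, unsigned num_threads)
{
    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> workers;
    const std::size_t chunk = data.size() / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = (t + 1 == num_threads) ? data.size() : begin + chunk;
        workers.emplace_back([&, t, begin, end] {
            partial[t] = std::accumulate(data.begin() + begin, data.begin() + end, 0.0);
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}

The more (and smaller) such independent chunks you can expose, the easier it is for a wide SMT machine to find something runnable while other threads wait on memory.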

Regarding memory, I think RAM will more and more move to some kind of cache; we'll have rather less of it, but with very high bandwidth. The reason I think it's that way is because of all the threads. Having today's 4 cores fighting for cache and DRAM banks is already bad, but with 256 threads you'd end up in a complete cache-thrashing nightmare. So the cache controller will try to collect several requests into one big batch and load the memory covering several requests in one go. Therefore the latency of the memory can be somewhat high, but the bandwidth needs to be extremely good. (An alternative would be something like the 1T-SRAM of the Cube, but 2 GB of 1T-SRAM might be a bit insane, because nearly nobody is moving that tech forward, as opposed to GDDR.)
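A crude software analogue of that batching idea (purely illustrative, hypothetical names - just grouping outstanding requests by DRAM page before issuing them):

#include <algorithm>
#include <cstdint>
#include <vector>

struct Request { std::uint64_t address; int thread_id; };

// Hypothetical sketch: instead of servicing 256 threads' requests in
// arrival order, sort them by page so one activation serves many of them.
void reorder_for_bandwidth(std::vector<Request>& pending, std::uint64_t page_size)
{
    std::sort(pending.begin(), pending.end(),
              [page_size](const Request& a, const Request& b) {
                  return a.address / page_size < b.address / page_size;
              });
    // Issue in this order: each individual request may wait longer,
    // but bandwidth utilisation goes up.
}

Each individual thread sees higher latency, which is exactly the trade-off described above.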

On the other side, I'd expect the storage media to not be that much faster than today, but the latency should go down to 10% of what it is now, just because that's the most critical factor in today's streaming.


And I'd expect the consoles to move more and more toward the PC: not just allowing mouse gaming (like the PS3 already does), but becoming a replacement for web surfing etc. - a real replacement for the PC for most people.

That implies that big PC players like NV/Intel/AMD will urgently try to get into this market, offering their solutions of CPU+GPU+memory-controller mixtures, and we're back at the CPU with a lot of cores + hyperthreading + special instructions / fixed-function slaves.


Maybe it's a bit of a sick forecast, but yeah, that's kinda what I'd expect.
 
Interesting point of view, Rapso.
I wondered at some point if a sea of ARM cores would not be better for this kind of job than the "x86" we will find in Larrabee ;) as they have high performance per watt, and they are tiny even if one decides to attach huge SIMD units to them.

Your take on the memory hierarchy is also interesting.
How much bandwidth would 2 GB of 1T-SRAM provide?
But I guess it should be huge; on the other side, Rambus also seems in a position to provide a lot of bandwidth with its upcoming technology.

In regard to the heavily channelled I/O, you make ME WANT to learn more about how the IBM Z series processors work ;)

I guess such a system is doable, but on the software side, what would it be like?
Tougher,
or complete hell?
EDIT
It's also likely that such a system would fall short in regard to raw power against a classic CPU + GPU design
 
Interesting point of view, Rapso.
I wondered at some point if a sea of ARM cores would not be better for this kind of job than the "x86" we will find in Larrabee ;) as they have high performance per watt, and they are tiny even if one decides to attach huge SIMD units to them.
I think at some point it doesn't matter which instruction set they have; I even expect Larrabee to be x86 purely for marketing reasons.


Your take on the memory hierarchy is also interesting.
How much bandwidth would 2 GB of 1T-SRAM provide?
1T-SRAM was mentioned as the opposite possibility to my high-latency, high-bandwidth idea: 1T-SRAM has very low random-access latency. That could be the other way to tackle the problem of multicores generating random accesses (like the name says, it's kinda like SRAM). But nobody is pushing it, so there are no chips with high bandwidth (at least none I'd know of).

But I guess it should be huge; on the other side, Rambus also seems in a position to provide a lot of bandwidth with its upcoming technology.
I think any GDDR will be fine, because they're going towards more bandwidth and higher latency anyway, and it's a mass market, so it should be kinda cheap.

In regard to the heavily channelled I/O, you make ME WANT to learn more about how the IBM Z series processors work ;)
Z series? Those zero-downtime CPUs?

I guess such a system is doable, but on the software side, what would it be like?
Tougher,
or complete hell?
That depends heavily on how lightweight those threads will be. Having a pool of threads that you can start nearly for free to complete simple loops would make people use them everywhere. That would make life easy, but I guess it won't be that nice.
But in general, Intel's TBB is how I expect the future to be.
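For reference, the kind of lightweight tasking meant here looks roughly like this with TBB (a sketch: tbb::parallel_for and tbb::blocked_range are real TBB, the particle update itself is made up):

#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <cstddef>
#include <vector>

struct Particle { float x, y, z, vx, vy, vz; };

// The loop body is handed to the runtime as cheap tasks spread over a
// worker pool, rather than hand-managed threads.
void update_particles(std::vector<Particle>& p, float dt)
{
    tbb::parallel_for(tbb::blocked_range<std::size_t>(0, p.size()),
        [&](const tbb::blocked_range<std::size_t>& r) {
            for (std::size_t i = r.begin(); i != r.end(); ++i) {
                p[i].x += p[i].vx * dt;
                p[i].y += p[i].vy * dt;
                p[i].z += p[i].vz * dt;
            }
        });
}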



EDIT
It's also likely that such a system would fall short in regard to raw power against a classic CPU + GPU design
I don't think so, because GPUs are already moving towards general-purpose computing. They are that fast because they can hide latency very well, so they keep their simple units busy all the time. They might have latencies of 100 cycles for a simple add, and it doesn't matter, because they have zillions of threads that can start an instruction every cycle. That's why I think the cores will have a lot of threads.
The biggest issue with CPU performance at the moment is latency. Most code is like:
load to r0
load to r1
add r0, r1, r0
store r0
All those dependencies kill performance; most of the CPU is just idling, waiting for the previous instruction to complete.
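The same problem can be shown even within a single thread; a classic (admittedly simplified, compiler- and hardware-dependent) illustration is a reduction, where breaking the serial chain into independent accumulators lets the hardware overlap the add latencies:

#include <cstddef>

// Serial dependency chain: every add has to wait for the previous one.
double sum_dependent(const double* a, std::size_t n)
{
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

// Four independent chains: the adds can overlap in flight.
double sum_independent(const double* a, std::size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i + 0];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i)
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}

Many hardware threads do the same thing at a larger scale: when one thread's chain stalls, another thread's independent work fills the gap.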
That's why I wrote:
it won't be that critical how many GFLOPS you have, but rather how well all the latency can be hidden.



(Again, this is just my idea; I don't say it has to be that way.)
 
x86 ain't so bad and ARM ain't that good. (Want to see how to do a tight efficient core? Look at Cell.)
 