Predict: The Next Generation Console Tech

Status
Not open for further replies.
Sooner than that. Would it even be competitive with the 360's 3 enhanced Altivec units for SIMD heavy tasks?

Yes. It can do 8 32-bit FMADDs a clock (two 128-bit units) for every module, and the memory subsystem is vastly, vastly better. It has fewer registers, but frankly, it can spill half the time and it will still get better utilization than the 360.
 
Any educated guess how much area & power 8 SPEs (and associated logic) on 32nm are going to use?
 
Has anyone seen this?

http://www.gtplanet.net/gran-turismo-at-4x-hd-resolution-240-frames-per-second/

That's 4 Cells running together in parallel. Are there any processors out there that can do this as well?

Per the multi-console many-screen output, this is nothing new at all and was done by their competitor, Turn10, in 2007:

http://www.youtube.com/watch?v=HuQE858nk-4

MS was doing this with MS Flight Sim waaay back in the day.

As for interleaving frames (hello AFR, aka Crossfire/SLI) and scaling above 60Hz, I won't even comment.

The answer to your question: Yes, there are many other processors and consoles that do exactly what you linked to.
 
It's half that.

No, it's that per module, divided between two cores. The fpu in each module has 2 independent 128-bit fmadd pipes.

So the total throughput of a 2-module design is 16 FMADDs per clock, and if you count FMADD as two flops (like the consoles do), then the 32 flops per clock is nicely more than what the 360 has.
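For anyone who wants to check the arithmetic, here is a minimal Python sketch of the per-clock math. It assumes the figures quoted in this post (two 128-bit FMADD pipes per module, 32-bit single-precision elements, FMADD counted as two flops); the function names and parameters are made up for illustration and are not from any AMD document.

```python
def fmadds_per_clock(modules, pipes_per_module=2, pipe_width_bits=128, element_bits=32):
    """Peak single-precision FMADDs issued per clock across all modules.

    Assumes each module has `pipes_per_module` independent FMADD pipes,
    as claimed in the post; these are illustrative defaults.
    """
    lanes_per_pipe = pipe_width_bits // element_bits  # 128 / 32 = 4 lanes
    return modules * pipes_per_module * lanes_per_pipe


def flops_per_clock(modules, flops_per_fmadd=2):
    """Peak flops per clock, counting one FMADD as a multiply plus an add."""
    return fmadds_per_clock(modules) * flops_per_fmadd


if __name__ == "__main__":
    print(fmadds_per_clock(1))  # per module
    print(fmadds_per_clock(2))  # 2-module design
    print(flops_per_clock(2))   # same design, FMADD counted as 2 flops
```

These are peak issue rates only; sustained throughput depends on the memory subsystem and on how both cores in a module share the FPU.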
 
The latest interview with Tim Sweeney covers tablets, SSDs, next-gen consoles and a bit of UE4. He sounded very positive about home consoles overall.
http://www.gamesindustry.biz/articles/2012-03-13-an-epic-interview-with-tim-sweeney
GI: The reception for Unreal Engine 4 has been very positive from what we've heard, although it's not been shown to press. When are you looking to push forward with that and launch Unreal Engine 4?

TS: Well our engine has been in active development for a long time. It's up and running stably now and a select few partners have and will have access to it shortly. It is already in the early stage of the development pipeline at a number of studios. But we're saving the announcements until everything is lined up for when we can make a really big public splash. Obviously we're trying to line up Epic's roadmap up with a few other roadmaps whose existence we can't even acknowledge.
And yeah;), he's definitely talking about PS4 and Xbox720.
 
Per the multi-console many-screen output, this is nothing new at all and was done by their competitor, Turn10, in 2007:

http://www.youtube.com/watch?v=HuQE858nk-4

MS was doing this with MS Flight Sim waaay back in the day.

As for interleaving frames (hello AFR, aka Crossfire/SLI) and scaling above 60Hz, I won't even comment.

The answer to your question: Yes, there are many other processors and consoles that do exactly what you linked to.

That was done on the PS2 with GT4 in 2004, a first on consoles, I think.

But good idea, shrink the PS3 CPU/GPU and put 4 of them into the PS4, set them up in a SLI style and have them render 1/4 of the picture...!!! :)
 
That was done on the PS2 with GT4 in 2004, a first on consoles, I think.

Thanks, you confirmed even the PS2 could do this.

But good idea, shrink the PS3 CPU/GPU and put 4 of them into the PS4, set them up in a SLI style and have them render 1/4 of the picture...!!! :)

No, horrible idea. For the 240Hz framerate it would be fine (because the lag would still be relatively low), but once you scale down to 60Hz or 30Hz with frame interleaving you are going to get horrible, horrible latency (think an effective framerate of 7-15 fps) and micro-stuttering. And if you want to go the 1/4th-of-a-picture route (SFR), you are basically going to have to submit the geometry more than once, and in such a setup, with no real chip-to-chip communication, you will essentially be stuck with 1 Cell processor's worth of work being done 4 times over (once on each Cell) while each RSX unit works on its fragment of the frame. This is just a wasteful use of resources.

I know fans get really excited when companies show off this stuff, but it has been on the PC for ages and there are serious drawbacks. Which is why I totally disagree with your comment that this is a "good idea." It is a horrible idea! I don't even know where one could begin to lay out an argument for WHY this would be a good idea!! It is a crazy idea!

And addressing Elan's comment again: no, those 4 Cell processors running in "parallel" in the GT demo are not special or anything that hasn't been done for a long time. And what was specifically shown is not interesting for next gen due to cost and inefficiency.
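To put rough numbers on the frame-interleaving latency point, here is a small Python sketch. It assumes a simplified AFR model in which each of the N machines renders every Nth frame and needs the full interval between its own frames to finish one; the function name is hypothetical and the model ignores sync and transfer overhead.

```python
def afr_per_machine(output_hz, num_machines):
    """Return (per-machine fps, per-frame render time in ms) under simple AFR.

    Simplifying assumption: each machine is fully loaded, so it spends the
    whole gap between its own frames rendering a single frame.
    """
    per_machine_fps = output_hz / num_machines
    frame_time_ms = 1000.0 / per_machine_fps
    return per_machine_fps, frame_time_ms


if __name__ == "__main__":
    for hz in (240, 60, 30):
        fps, ms = afr_per_machine(hz, 4)
        print(f"{hz} Hz output, 4 machines: {fps:g} fps each, ~{ms:.1f} ms per frame")
```

Under those assumptions, the 240Hz demo has each machine running at a comfortable 60 fps, while the same trick at 60Hz or 30Hz output leaves each machine at 15 or 7.5 fps, which is where the "7-15 fps" feel comes from.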
 
But good idea, shrink the PS3 CPU/GPU and put 4 of them into the PS4, set them up in a SLI style and have them render 1/4 of the picture...!!! :)
Pretty sure newer GPUs have far better per-transistor efficiency than the old stuff. Not to mention programmability.
 
And yeah;), he's definitely talking about PS4 and Xbox720.

It seems Epic is also quite interested in much faster consoles:

GI: More generally speaking, what are the things you are looking for in next-gen systems, the kinds of things that get you and Epic excited creatively?
TS: Gosh, my list for the next generation; it's really two big things. One is to bring all that's best about other computing devices - the convenience, the access to social media, the connectivity with the internet, Facebook or Twitter - and continue to bring that forward in the console experience. If you look at the console generation previous to this one, these were offline devices. You'd install a game, play it by yourself and you're done. Nowadays you go online, play games, and buy games through XBLA or PSN. I think we've really only seen the tip of the iceberg there. There is a continual challenge for the industry to push forward in order to remain relevant and competitive with the awesome things that are happening on iOS for example.
"If you go into the next generation with a budget of $100 million, you are doing it wrong and are being far too brute force."
Tim Sweeney

Number two is to deliver the maximum amount of computing power that is economically possible. Really, that's the reason consoles exist in the future. They have an enormous amount of graphics processing power that delivers an experience that goes far beyond what you can get on a lighter weight device. Pushing forward, we measure that performance in teraflops, trillions of floating point operations per second. When I started programming, you had about one thousand floating point operations per second. Now we have, on nVidia's fastest hardware, two and a half to three teraflops. To push next-generation up to those levels will really ensure that they will remain relevant for another generation, even as other cool consumer devices like iPads and iPhones become more prevalent.
 
No, it's that per module, divided between two cores. The fpu in each module has 2 independent 128-bit fmadd pipes.

So the total throughput of a 2-module design is 16 FMADDs per clock, and if you count FMADD as two flops (like the consoles do), then the 32 flops per clock is nicely more than what the 360 has.

The FPU works like an SPE: one 128-bit pipe (p0) does arithmetic/int ops and the other (p1) does data transfers/transforms. The total madd throughput of the 2 pipes is 4 madd flops per cycle. Multiply that by 4 and you have a total of 16 madd flops per cycle for a BD chip in its largest config; at 3.2 GHz that's 51.2 GFLOPS. That's at 95W though.
 
The FPU works like an SPE: one 128-bit pipe (p0) does arithmetic/int ops and the other (p1) does data transfers/transforms.

No, this is not correct. The BD FPU actually has 4 pipes -- 2 almost symmetrical FPU pipes that both do 128-bit FMAC (P0, P1), and two "MMX" pipes (P2, P3), which are used for packed integer ops, stores (P3 only) and transforms. Loads issue to the units on the integer side and do not occupy a pipe in the FPU.

Realworldtech article on it.
AMD optimization guide, the FPU is detailed on p.37

The total madd throughput of the 2 pipes is 4 madd flops per cycle.
No, the FPU can issue two independent 128-bit FMADD ops or one 256-bit AVX op per cycle, along with a store and a transform. I have personally measured this to be true.
 
It seems Epic is also quite interested in much faster consoles:

Yeah they are, and this seems a little ominous

729h.png


Posted 3/14

Also implies the next console specs still aren't settled. Even 2013 might be too early.
 
No, this is not correct. The BD FPU actually has 4 pipes -- 2 almost symmetrical FPU pipes that both do 128-bit FMAC (P0, P1), and two "MMX" pipes (P2, P3), which are used for packed integer ops, stores (P3 only) and transforms. Loads issue to the units on the integer side and do not occupy a pipe in the FPU.

Realworldtech article on it.
AMD optimization guide, the FPU is detailed on p.37

No, the FPU can issue two independent 128-bit FMADD ops or one 256-bit AVX op per cycle, along with a store and a transform. I have personally measured this to be true.

From page 37:

The FPU can receive up to four ops per cycle. These ops can only be from one thread, but the
thread may change every cycle. Likewise the FPU is four wide, capable of issue, execution and
completion of four ops each cycle. Once received by the FPU, ops from multiple threads can be
executed.
 
No, this is not correct. The BD FPU actually has 4 pipes -- 2 almost symmetrical FPU pipes that both do 128-bit FMAC (P0, P1), and two "MMX" pipes (P2, P3), which are used for packed integer ops, stores (P3 only) and transforms. Loads issue to the units on the integer side and do not occupy a pipe in the FPU.

Realworldtech article on it.
AMD optimization guide, the FPU is detailed on p.37

No, the FPU can issue two independent 128-bit FMADD ops or one 256-bit AVX op per cycle, along with a store and a transform. I have personally measured this to be true.

Just to be clear, I was responding to your 8 madd ops per cycle when I said half that. If you want to do that funky "int ops = FP ops" counting like they did with the consoles, then be my guest and double it up (however, I believe it would be 7, not 8, for BD). Actually, it'd probably be more accurate to call it an SPU instead of an FPU, but then MS worked too hard to turn Cell into a four-letter word for them to consider that.
 
From page 37:

The FPU can receive up to four ops per cycle. These ops can only be from one thread, but the
thread may change every cycle. Likewise the FPU is four wide, capable of issue, execution and
completion of four ops each cycle. Once received by the FPU, ops from multiple threads can be
executed.

Yes, and?

The point still stands -- each module can do 8 SP FMADDs per cycle, so a 4-module, 8-core chip can do 32 FMADDs each cycle, not 16.

Just to be clear, I was responding to your 8 madd ops per cycle when I said half that. If you want to do that funky "int ops = FP ops" counting like they did with the consoles, then be my guest and double it up (however, I believe it would be 7, not 8, for BD).

Umm, still no. I am not counting the integer ops.
 