Predict: The Next Generation Console Tech

aaronspink · Apr 4, 2010

corduroygt said:
There are FAR more developers and companies developing for x86, yet we haven't seen anything like what was being done on the cell being done on C2D. A computationally intensive task such as folding is much slower on a C2D then the cell, or I believe it has also been compiled to work on GPUs, which also destroy C2D.

F@H is hardly a good example for you to be using. One should realize that they don't give the same problems out to all clients because except for the PC, the clients can't do all the work.

Cell does AI just fine as seen in KZ2, the 360 cpu does AI just fine as seen in Halo 3, physics can be handled by GPU's as seen with physx, so no, we don't need amd/intel cpu's for those either. In fact cell's are used as physics simulators in supercomputers, so you're wrong there too.

The KZ2 and Halo3 AI's are hardly impressive. They really don't even get to the level of the Condition Zero AI.

As far as physx, you should realize that the GPU cannot handle any interactive physics calculations and only handles non-interactive kernals. In addition, the latest data suggests that a proper multi-core implementation of the Physx kernals runs at roughly the same speed on a CPU while also being more capable and being able to handle interactive physics calculations as well. In addition, there is generally a significant performance hit for the GPU to do physics calculations as well.

Cell is severely limited by its programming environment along with its memory model. It is at best used to accelerate certain calculations but even to get to that point a lot of programming work is required. Vastly more than for a general purpose CPU.

I showed you examples where AI and physics are calculated on other than CPU's. I would still like to see one task that needs the high integer performance of an x86 on a console game that can't be done on a console cpu.

And I've provides reasons why your examples are incorrect.

PS3 and 360 both have non x86 architectures. They would have to be recompiled to work on x86. BC is much more relevant due to digital downloads this gen.

you mean exactly how the majority of the digital content had to be recompiled to actually run on the 360 and PS3? BC is not a requirement. Its been proven time and again that it isn't a requirement and even the pinnacle of the BC requirement vendors, sony, dumped it.

All the stuff you listed can be done with console CPU's and stream processors, in fact SPU's are used for a lot of those things you listed. And while the architecture of GPGPU's are different than Cell, they can be used for the same purpose. The only limitation I see there is one of memory when doing destructable worlds.

GPUs cannot do complex interactive physics because they are very poor at handling anything that isn't a simple matrix multiply. Complex AI involved a significant amount of decision trees which both the SPU and GPUs handle quite poorly.

aaronspink · Apr 4, 2010

corduroygt said:
Because both console cpu's cost far less and use less silicon than a C2Q. Regardless of when it came to your country, the console was released in 2006, where the best x86 option would be the lowest cost c2d available, which would be E6300 with 2MB cache and 1.86Ghz at a price of $186 in quantities of 1000.

Cost? Don't even start on cost, you'll lose ridiculously. The per part NRE costs for the console CPUs simply kills them in any comparison. The marginal production costs for x86 CPUs are also significantly less due to the yields and volume of the x86 CPUs. There are more x86 cpus sold in a year than the combined volume of all 3 consoles over their lifetimes.

I won't even get into the tool quality and availability differences between x86 and the console CPUs.

It should be obvious that a PC with a 1.86Ghz E6300 and a crippled 7800GTX cannot run GOW3 level graphics, or do anything gaming related better than a PS3.

Writing to the metal as in console gaming, I would wager that combo would at least hold its own.

aaronspink · Apr 4, 2010

MfA said:
I'd rather they spend some of the transistor budget on a boatload of atom cores with a switched networking fabric Cell might have been too hard to program due to it's cache and DMA limitations, but it's still the right idea IMO. I'd say one wide superscalar core is perfectly enough, After that it's time for higher density architectures ... now of course development costs might still make simply going with some multicore COTS processor the better deal, but I'd prefer to see something a little more elegant.

Cell isn't at all elegant. And lets not forget that a billion in design NRE and tools NRE pays for a LOT of silicon!

Squilliam · Apr 4, 2010

Sorry, NRE? What does that stand for?

DuckThor Evil · Apr 4, 2010

Squilliam said:
Sorry, NRE? What does that stand for?

My quess is non-recoverable expense.

rpg.314 · Apr 4, 2010

MfA said:
Context storage is not free (especially with register renaming) and more threads means more cache pollution. A set branch target instruction and a rudimentary branch predictor are simply nice to have, they don't cost that much.

Just mux 4/8 threads over an in-order core and give each thread it's own cache bank. Niagara Ahoy.

rpg.314 · Apr 4, 2010

MfA said:
I'd rather they spend some of the transistor budget on a boatload of atom cores with a switched networking fabric Cell might have been too hard to program due to it's cache and DMA limitations, but it's still the right idea IMO. I'd say one wide superscalar core is perfectly enough, After that it's time for higher density architectures ... now of course development costs might still make simply going with some multicore COTS processor the better deal, but I'd prefer to see something a little more elegant.

Then why not just bolt an (ARM A9X4)/(Phenom 2 X4)/(PowerPC X4) on top of a (7870 - ff hw) and be done with it.

Or may be just reconfigure Llano to have 2 x86 cores and a giant GPU core and call it a day.

archangelmorph · Apr 4, 2010

aaronspink said:
GPUs cannot do complex interactive physics because they are very poor at handling anything that isn't a simple matrix multiply. Complex AI involved a significant amount of decision trees which both the SPU and GPUs handle quite poorly.

I'd like to see your justification for this...

What makes you think SPUs & modern GPUs aren't suited to physics and AI processing?

& makes you so sure AI performance in the vast majority of cases is ever going to be anywhere near your bottleneck on a production game..?

MfA · Apr 4, 2010

rpg.314 said:
Or may be just reconfigure Llano to have 2 x86 cores and a giant GPU core and call it a day.

Because I believe there is a workload best served by fully independent processors which are throughput optimized. Monolithic CPUs aren't throughput optimized, GPUs shaders aren't fully independent.

rpg.314 · Apr 4, 2010

MfA said:
Because I believe there is a workload best served by fully independent processors which are throughput optimized. Monolithic CPUs aren't throughput optimized, GPUs shaders aren't fully independent.

Individual cores even in today's gpu's (not alu's) are quite independent.

MfA · Apr 4, 2010

Independent control flow.

rpg.314 · Apr 4, 2010

MfA said:
Independent control flow.

Independent control flow at work item granularity is anathema to high throughput processing. I don't see anyone optimizing both at the same time.

MfA · Apr 4, 2010

I don't see anyone doing it either ... but can that ultimately be justified?

Lets say we have a 5 wide VLIW with 32 bit integer/FP per channel (and fused 64 bit integer/FP at quarter rate for multiplies, half rate or full rate for the rest). 4 active threads run alternating to cover instruction latency, as an option also allow single threaded execution ... the compiler is responsible for hazards. 256x4x4x32 bit registers, with register windows for up to say 128 threads for vertical multithreading. That's a whole lotta hardware ... and more importantly, pretty close to an Evergreen SC. I simply do not believe an instruction cache and decoder makes a huge dent in that.

corduroygt · Apr 4, 2010

aaronspink said:
Cost? Don't even start on cost, you'll lose ridiculously. The per part NRE costs for the console CPUs simply kills them in any comparison. The marginal production costs for x86 CPUs are also significantly less due to the yields and volume of the x86 CPUs. There are more x86 cpus sold in a year than the combined volume of all 3 consoles over their lifetimes.

MS went from x86 to custom CPU so you're obviously wrong about the cost here. Intel/AMD just won't sell the CPU's that cheap to console makers when the PC market is much much bigger. No other console makers have used x86, so I guess you're going to take the laughable position of comparing yourself to pros in the industry and saying that you're better, just like you did with Credit Suisse analysts.

Writing to the metal as in console gaming, I would wager that combo would at least hold its own.

BS. It wouldn't be able to do half the graphics processing that cell does. Most game related calculations suit GPU's and Stream processors well, not CPUs. Writing to the metal would only help it to get by with 512MB total memory.

corduroygt · Apr 4, 2010

aaronspink said:
The KZ2 and Halo3 AI's are hardly impressive. They really don't even get to the level of the Condition Zero AI.

I assume you've played both so you can say this? What's so special about condition zero AI? I've seen presentations about KZ2 AI calculations and I've seen it in action, and it's really special, compared to a mod of a mod of an old game. I've played all half life games and the AI didn't compare to KZ2. Just by the fact that I don't see any PC exclusive games have better AI than anything on a console disproves your statement. Console cpu's are able to handle AI just fine, no x86 bloat needed.

aaronspink said:
As far as physx, you should realize that the GPU cannot handle any interactive physics calculations and only handles non-interactive kernals. In addition, the latest data suggests that a proper multi-core implementation of the Physx kernals runs at roughly the same speed on a CPU while also being more capable and being able to handle interactive physics calculations as well.

Where is this "Latest Data?" How many cores is being compared? Since I don't expect you to actually give credible links, here's the best I could find from August 2008, Software refers to a QX9650 (a very expensive CPU), PPU refers to Aegia Physics processor (they were bought by NV), and other the GPU.

Rangers · Apr 4, 2010

I beat KZ2 and the AI did not stand out at me.

AI aggressiveness != great AI.

Kaotik · Apr 4, 2010

aaronspink said:
As far as physx, you should realize that the GPU cannot handle any interactive physics calculations and only handles non-interactive kernals. In addition, the latest data suggests that a proper multi-core implementation of the Physx kernals runs at roughly the same speed on a CPU while also being more capable and being able to handle interactive physics calculations as well. In addition, there is generally a significant performance hit for the GPU to do physics calculations as well.

Actually the newest(?) PhysX build brought support for GPU accelerated rigid body physics, too, you know, those interactive things

corduroygt · Apr 4, 2010

Rangers said:
I beat KZ2 and the AI did not stand out at me.

AI aggressiveness != great AI.

Which PC game has great AI and impossible to do because of this on a console?

Acert93 · Apr 4, 2010

Some RTS have brutal AI demands.

aaronspink · Apr 5, 2010

Kaotik said:
Actually the newest(?) PhysX build brought support for GPU accelerated rigid body physics, too, you know, those interactive things

reference? Can't find anything supporting this in the SDK notes, the physx forums, etc.

Predict: The Next Generation Console Tech

aaronspink

aaronspink

aaronspink

Squilliam

Beyond3d isn't defined yet

DuckThor Evil

rpg.314

rpg.314

archangelmorph

MfA

rpg.314

MfA

rpg.314

MfA

corduroygt

corduroygt

Rangers

Kaotik

Drunk Member

corduroygt

Acert93

Artist formerly known as Acert93

aaronspink

Similar threads