PS3 could transform video game industry!

bbot said:
Question for SimonF:
exceed the "1 Tflops" power of a cell-based PS3 (with an amazing 72 apus)?
Hmmm.... let's do some back-of-an-envelope maths as a sanity check...

If we assume a conservative 500 MHz clock, then 1 TFLOPS => 200k floating point units. Those would have to be pipelined and so would need several register stages. Let's assume that you have 48 bits (average of 2 floats) of storage for each pipeline stage and perhaps 4 pipeline stages. That would then imply 4.6MB of storage just for the FP units!

I think that'd be a very expensive chip. [BTW Take this with a grain of salt because I'm primarily a software/algorithms researcher not a HW guy]
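
As a rough check of the estimate above, here is a minimal Python sketch. It assumes each FP unit completes one floating-point operation per clock, which the post does not state, and the function names are made up for illustration. Under that assumption, 1 TFLOPS at 500 MHz works out to about 2,000 units and roughly 48 KB of pipeline registers; the 4.6MB figure is what falls out if 200k units are plugged in.

```python
# Back-of-envelope check of the FP-unit storage estimate.
# Assumption (not from the post): one FLOP per FP unit per clock cycle.

def fp_units_needed(target_flops, clock_hz, flops_per_cycle=1):
    """Number of FP units needed to sustain target_flops."""
    return target_flops / (clock_hz * flops_per_cycle)

def pipeline_storage_bytes(units, bits_per_stage=48, stages=4):
    """Register storage for all FP pipelines, in bytes."""
    return units * bits_per_stage * stages / 8

units = fp_units_needed(1e12, 500e6)        # 1 TFLOPS at 500 MHz
storage = pipeline_storage_bytes(units)     # 48 bits x 4 stages per unit

print(f"FP units needed: {units:,.0f}")
print(f"Pipeline storage: {storage / 1024:.1f} KiB")

# The same storage formula with the 200k units quoted in the post:
quoted = pipeline_storage_bytes(200_000)
print(f"Storage for 200k units: {quoted / 2**20:.2f} MiB")
```

Raising `flops_per_cycle` (for example, counting a multiply-add as two operations) shrinks the unit count further, so the conclusion is quite sensitive to that assumption.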


Besides, having worked on a parallel rendering system (well 32~64 processors) in my previous jobs, it's not all that easy to get the system to be efficient.
 
Simon F said:
bbot said:
Question for SimonF:
exceed the "1 Tflops" power of a cell-based PS3 (with an amazing 72 apus)?
Hmmm.... let's do some back-of-an-envelope maths as a sanity check...

If we assume a conservative 500 MHz clock, then 1 TFLOPS => 200k floating point units. Those would have to be pipelined and so would need several register stages. Let's assume that you have 48 bits (average of 2 floats) of storage for each pipeline stage and perhaps 4 pipeline stages. That would then imply 4.6MB of storage just for the FP units!

I think that'd be a very expensive chip. [BTW Take this with a grain of salt because I'm primarily a software/algorithms researcher not a HW guy]


Besides, having worked on a parallel rendering system (well 32~64 processors) in my previous jobs, it's not all that easy to get the system to be efficient.


Sounds like you don't have faith that Sony/IBM/Toshiba can successfully design a teraflops multiprocessor chip.
 
bbot said:
Sounds like you don't have faith that Sony/IBM/Toshiba can successfully design a teraflops multiprocessor chip.
I don't want to imply that such a thing is infeasible, just that it'd probably be big and expensive.
 
If we assume a conservative 500 MHz clock, then 1 TFLOPS => 200k floating point units. Those would have to be pipelined and so would need several register stages. Let's assume that you have 48 bits (average of 2 floats) of storage for each pipeline stage and perhaps 4 pipeline stages. That would then imply 4.6MB of storage just for the FP units!

:) The patent states that each APU has 4 floating point units, with an expected performance of 32 GFLOPS. It also has another 4 integer units.

The patent seems to imply that there is going to be 4MB of local storage, though. It is also going to have buses wider than 128 bits.

I think that'd be a very expensive chip.

I agree :) Not forgetting the 64MB of eDRAM that needs to be packed in there.
 
The 1 TFLOPS chip is the Broadband Engine (BE). It's meant to have 32 APUs (NOT 72) and 4 PowerPC cores, with 128K of cache per APU plus 64 MB of eDRAM for the whole BE. Each APU is meant to have 4 FPUs and 4 integer units, so that's 128 FPUs and 128 integer units in total.

The Visualizer has its own set of processors and memory.
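
To see how the figures quoted in this thread fit together, here is a small Python sketch that just multiplies them out. The per-APU numbers (4 FPUs, 32 GFLOPS, 128K) come from the posts above; the `flops_per_cycle` values are purely my assumption for illustration, since nothing in the thread gives a clock speed.

```python
# Figures quoted in the thread for the Broadband Engine.
APUS = 32
FPUS_PER_APU = 4
INT_UNITS_PER_APU = 4
GFLOPS_PER_APU = 32          # expected performance per APU, per the patent
LOCAL_KB_PER_APU = 128       # described as 128K per APU

total_fpus = APUS * FPUS_PER_APU
total_int_units = APUS * INT_UNITS_PER_APU
total_gflops = APUS * GFLOPS_PER_APU
total_local_mb = APUS * LOCAL_KB_PER_APU / 1024
gflops_per_fpu = GFLOPS_PER_APU / FPUS_PER_APU

def required_clock_ghz(gflops_per_unit, flops_per_cycle):
    """Clock needed for one FPU to reach gflops_per_unit.

    flops_per_cycle is an assumption, not a figure from the thread.
    """
    return gflops_per_unit / flops_per_cycle

print(f"FPUs: {total_fpus}, integer units: {total_int_units}")
print(f"Aggregate: {total_gflops} GFLOPS")
print(f"Local storage: {total_local_mb} MB")
for fpc in (1, 2):
    print(f"{fpc} FLOP/cycle -> {required_clock_ghz(gflops_per_fpu, fpc)} GHz")
```

The totals line up with the thread: 32 x 32 GFLOPS is about 1 TFLOPS, and 32 x 128K matches the 4MB of local storage mentioned earlier. The flip side is that each FPU has to deliver 8 GFLOPS, which implies a clock well above 500 MHz unless every unit does several operations per cycle.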
 