bbot said:
Question for SimonF:
... exceed the "1 Tflops" power of a cell-based PS3 (with an amazing 72 apus)?

Hmmm... let's do some back-of-an-envelope maths as a sanity check.
If we assume a conservative 500 MHz clock, then 1 TFlop => 200k floating point units. Those would have to be pipelined, and so would need several register stages. Let's assume that you have 48 bits (average of 2 floats) of storage for each pipeline stage and perhaps 4 pipeline stages. That would then imply 4.6 MB of storage just for the FP units!
I think that'd be a very expensive chip. [BTW, take this with a grain of salt, because I'm primarily a software/algorithms researcher, not a HW guy.]
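For anyone who wants to poke at the numbers, here is a minimal Python sketch of the estimate above. The function names (`fp_units_needed`, `pipeline_storage_bytes`) are made up for illustration, and the default of one flop per unit per cycle is my assumption, not something stated in the estimate. With the quoted 200k units, 48 bits per stage and 4 stages, it reproduces the roughly 4.6 MB figure (counting 1 MB as 2^20 bytes). Note that under the one-flop-per-cycle assumption, 1 TFlop at 500 MHz works out to 2,000 units rather than 200k, which shrinks the storage figure by the same factor, so the unit count is left as a parameter.

```python
# Back-of-envelope check of the pipeline-register storage estimate.
# Every input here is an assumption taken from the post, not a hardware spec.

MIB = 1024 * 1024  # bytes per "MB" as used in this sketch (2^20)


def fp_units_needed(target_flops, clock_hz, flops_per_unit_per_cycle=1):
    """Number of FP units needed to sustain target_flops.

    flops_per_unit_per_cycle=1 is an assumed throughput for a fully
    pipelined unit; change it to model slower or wider units.
    """
    return target_flops / (clock_hz * flops_per_unit_per_cycle)


def pipeline_storage_bytes(units, bits_per_stage=48, stages=4):
    """Total bytes of pipeline-register storage across all FP units."""
    return units * bits_per_stage * stages / 8


if __name__ == "__main__":
    # Unit count exactly as quoted in the post.
    quoted_units = 200_000
    quoted_mib = pipeline_storage_bytes(quoted_units) / MIB
    print(f"{quoted_units} units: {quoted_mib:.1f} MB of pipeline registers")

    # Unit count derived from 1 TFlop at 500 MHz, one flop per unit per cycle.
    derived_units = fp_units_needed(1e12, 500e6)
    derived_kib = pipeline_storage_bytes(derived_units) / 1024
    print(f"{derived_units:.0f} units: {derived_kib:.1f} KB of pipeline registers")
```

Changing `bits_per_stage`, `stages` or `flops_per_unit_per_cycle` shows how sensitive the result is to each guess.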
Besides, having worked on a parallel rendering system (well, 32 to 64 processors) in my previous jobs, I can say it's not all that easy to get such a system to run efficiently.