aaronspink said: ATI R500 48*.5 = 24 Giga Ops per cycle. Don't need to account for wait states because of multiple contexts provided in hardware.
Vince's PS3 on crack 16*4/160 = 400 Mega Ops per cycle.
Didn't we already discuss what an SPU is, and didn't I mention that it's widely believed it can already handle 32 simultaneous contexts? That's without taking into account the work of the main DMAC and PU in arbitration.
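To put rough numbers on the two figures being thrown around, here's a minimal sketch of the arithmetic. It rests on assumptions that aren't spelled out in the quote: that 48 and 16 are execution unit counts, that .5 and 4 are clocks in GHz, that 160 is a stall of 160 cycles per operation, and that each extra hardware context hides one cycle of that stall. The function name and the model itself are purely illustrative, not anything ATI or Sony has published.

```python
def effective_gops(units, clock_ghz, stall_cycles=1, contexts=1):
    """Rough throughput estimate in Giga Ops per second.

    Assumed toy model: one context issues one op, then waits
    `stall_cycles` cycles; additional contexts fill the idle cycles
    until the unit is fully busy.
    """
    peak = units * clock_ghz                         # no wait states at all
    utilization = min(1.0, contexts / stall_cycles)  # capped at fully busy
    return peak * utilization


# The R500 figure from the quote: wait states assumed fully hidden.
r500 = effective_gops(48, 0.5)

# The "PS3 on crack" figure: one context per unit, 160-cycle stall.
spu_single = effective_gops(16, 4.0, stall_cycles=160)

# Same hardware if 32 contexts per SPU were available (the rumored number).
spu_multi = effective_gops(16, 4.0, stall_cycles=160, contexts=32)

print(r500, spu_single, spu_multi)
```

With a single context this reproduces the 400 Mega Ops figure from the quote. The point is only that the result swings heavily on how many contexts you assume, which is exactly what's in dispute.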
aaronspink said: I'll provide you your answer: a large number of hardware contexts optimized to cover the texture fetch latency.
As I already stated, this isn't dependent upon the unified SIMD Vector|Scalar datapath output (which is what was being discussed), it's dependent upon the complex you build around it. A Synergistic Processor is just this.
aaronspink said: A building block in the Sony design is not optimized for this workload because it can't handle the texture fetch latency as well as various other operations (sampling, filtering, early Z reject, etc) that are designed into the GPU.
I fully agree with you on the latter functions; the ROP/Pixel Engine functionality is what I believe Sony will utilize. I do think you're incorrect in your assessment of texture access efficiency.