Panajev2001a
Veteran
This, running at 3-4 GHz, would provide a nice amount of power. That clock is possible IMHO: the only parts that would run that high would be the APUs. The buses would only run at 1 GHz ( they are 1,024-bit buses after all, which already means 128 GB/s at 1 GHz ), and the e-DRAM could run at 1 GHz SDR, or at 500 MHz with DDR signalling for data transfers.
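To sanity-check the bus numbers, here is a quick sketch of the arithmetic. The function name is mine, and the 1,024-bit width on the e-DRAM interface is an assumption (the post only gives that width for the buses), so treat it as illustrative rather than as the actual design.

```python
def bus_bandwidth_gb_s(width_bits, clock_ghz, transfers_per_clock=1):
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes) for a parallel bus."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_ghz * transfers_per_clock

# 1,024-bit internal bus at 1 GHz, one transfer per clock
internal_bus = bus_bandwidth_gb_s(1024, 1.0)

# e-DRAM options; the 1,024-bit interface width is ASSUMED here
edram_sdr = bus_bandwidth_gb_s(1024, 1.0, transfers_per_clock=1)  # 1 GHz SDR
edram_ddr = bus_bandwidth_gb_s(1024, 0.5, transfers_per_clock=2)  # 500 MHz DDR

print(f"internal bus: {internal_bus:.0f} GB/s")
print(f"e-DRAM SDR:   {edram_sdr:.0f} GB/s")
print(f"e-DRAM DDR:   {edram_ddr:.0f} GB/s")
```

Under that assumption, 1 GHz SDR and 500 MHz DDR give the same peak figure, which is why either option would keep pace with the 1 GHz bus.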
Features:
24 APUs in total
2x Pixel Engine + Image Cache + CRTC controller ( each Pixel Engine fills an independent triangle as the pixel pipelines are not tied ).
Shared e-DRAM block.
4 PEs: 2 of them are of the "Visualizer" kind.
APUs' clock: 3-4 GHz
PUs' clock: 1.5-2 GHz ( preferably it stays at 1/2 the APU speed ) or less ( APUs can be pretty independent )
FP/FX Performance: 24 APUs * 8 ops/clock [FP or FX] * 3-4 GHz = 576-768 GFLOPS/GOPS
Local Bus Bandwidth: ~128 GB/s at 1 GHz ( 128 bytes per cycle, or 1,024 bits per cycle )
Local Storage to Register File Bandwidth ( in each APU ): 256 bits per cycle ( two 128-bit vectors' worth of data ).
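The peak figures in the list above can be reproduced with a few lines. This is only a back-of-the-envelope sketch: the names are mine, and the per-APU Local Storage bandwidth in GB/s assumes that path is clocked at the APU clock, which the list does not actually state.

```python
APUS = 24            # total APUs in this configuration
OPS_PER_CLOCK = 8    # FP or FX ops per APU per clock

def peak_gops(apus, ops_per_clock, clock_ghz):
    """Peak GFLOPS/GOPS: every APU issuing its maximum every cycle."""
    return apus * ops_per_clock * clock_ghz

low = peak_gops(APUS, OPS_PER_CLOCK, 3.0)
high = peak_gops(APUS, OPS_PER_CLOCK, 4.0)

# Local Storage -> Register File: 256 bits (two 128-bit vectors) per cycle
LS_RF_BYTES_PER_CYCLE = 256 // 8

def ls_rf_gb_s(clock_ghz):
    """Per-APU LS-to-RF bandwidth, ASSUMING it runs at the APU clock."""
    return LS_RF_BYTES_PER_CYCLE * clock_ghz

print(f"peak: {low:.0f}-{high:.0f} GFLOPS/GOPS")
print(f"LS->RF per APU: {ls_rf_gb_s(3.0):.0f}-{ls_rf_gb_s(4.0):.0f} GB/s")
```

These are theoretical peaks only; sustained numbers would depend on how well the APUs can be kept fed.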
A Redwood-based bus could connect all 4 PEs together. In the diagrams it is naturally specified as a 1,024-bit bus, but Redwood's high data signalling rate could allow for a narrower bus. Or maybe Redwood would be used to connect this big MPU to the I/O ASIC, which would contain the memory controller for the external memory.
With this configuration all the concerns of Mr. Dave Baumann should disappear.
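To illustrate the "narrower bus" idea: for a fixed bandwidth target, the number of data lines needed falls in proportion to the per-line signalling rate. The per-line rates below are purely hypothetical placeholders, not Redwood specifications, and the helper name is mine.

```python
import math

def lines_needed(target_gb_s, per_line_gbit_s):
    """Data lines required to reach target_gb_s at a given per-line rate."""
    target_gbit_s = target_gb_s * 8
    return math.ceil(target_gbit_s / per_line_gbit_s)

TARGET_GB_S = 128  # same bandwidth as the 1,024-bit bus at 1 GHz

# HYPOTHETICAL per-line data rates in Gbit/s, for illustration only
for rate in (1.0, 2.0, 4.0, 8.0):
    print(f"{rate:>4} Gbit/s per line -> {lines_needed(TARGET_GB_S, rate)} lines")
```

At 1 Gbit/s per line this gives back the 1,024-bit bus; each doubling of the signalling rate halves the width needed for the same 128 GB/s.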