DeanoC said:
Per-pixel shading is not a very good idea really... A pixel is not the unit you want to be shading at; in general it's either too often or not often enough (you don't need per-pixel shading for shadow volumes and you want sub-pixel shading in procedural shaders). If you want true displacement mapping as well, per-pixel shading starts looking even worse.
If you have the processing power, a micro-polygon (Reyes) architecture starts looking very tasty. The hardware "polygons" are very simple, basically linearly interpolated colour and depth testing. A higher level takes real polygons and vertices and breaks them into micro-polygons based on the shader program (simple point sample texturing would use one micro-polygon per texel). This allows for easy per-pixel displacement (just move the micro-polygons) and very simple rasteriser hardware (so you get bucket loads of fill-rate).
Of course I have no idea if Sony are going down this route, but if they can supply the processing power and the correct hardware it would look lovely. All you really need is a very fast rasteriser, fast general texture lookup and lots and lots of general CPU power (to run the shader to micro-polygon converters).
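To make the quoted idea a bit more concrete, here is a minimal Python sketch of the "break it into micro-polygons, then just move them" step. It is only an illustration under stated assumptions: `dice_quad`, `screen_size_px`, `shading_rate` and the `displace` callback are hypothetical names, the primitive is a single bilinear quad, and displacement is applied along +z instead of along a real surface normal.

```python
import math


def dice_quad(corners, screen_size_px, displace=lambda u, v: 0.0, shading_rate=1.0):
    """Dice one quad into a grid of micro-polygons roughly `shading_rate` pixels wide.

    corners: four (x, y, z) points in the order p00, p10, p01, p11.
    screen_size_px: (width, height) the quad covers on screen, assumed already known.
    displace: hypothetical displacement shader; returns an offset along +z
              (a simplification, a real one would move along the normal).
    """
    p00, p10, p01, p11 = corners
    # Number of micro-polygons along each axis, at least one.
    nu = max(1, math.ceil(screen_size_px[0] / shading_rate))
    nv = max(1, math.ceil(screen_size_px[1] / shading_rate))

    def point(u, v):
        # Bilinear interpolation of the four corners, one coordinate at a time.
        x, y, z = (
            (1 - u) * (1 - v) * a + u * (1 - v) * b + (1 - u) * v * c + u * v * d
            for a, b, c, d in zip(p00, p10, p01, p11)
        )
        return (x, y, z + displace(u, v))

    # Shade/displace once per grid vertex, not once per pixel.
    grid = [[point(i / nu, j / nv) for i in range(nu + 1)] for j in range(nv + 1)]

    # Each micro-polygon is just four neighbouring grid vertices.
    return [
        (grid[j][i], grid[j][i + 1], grid[j + 1][i + 1], grid[j + 1][i])
        for j in range(nv)
        for i in range(nu)
    ]


if __name__ == "__main__":
    unit_quad = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
    micro = dice_quad(unit_quad, screen_size_px=(4, 2), displace=lambda u, v: 0.1 * u)
    print(len(micro), "micro-polygons")
```

The rasteriser then only ever sees tiny flat quads, which is why it can stay so simple; all of the interesting work sits in the dicing and the shader callback.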
That is interesting, but it seems that it would eat quite a lot of CPU resources, as T&L would be REALLY intensive... What would you say about ~1 TFLOPS and ~1 TOPS ( Integer ) ?
How do you think your idea would fit the processor described in this patent ?
http://makeashorterlink.com/?B4DB23903
Basically, we could also run pixel programs, since the Visualizer chip described in the patent ( and shown in one of the images attached to it ) would be programmable: if we wanted it to operate on pixels, I think it could do it... Of course, it could also work as the simple and fast rasterizer you are talking about, while the Broadband Engine ( as seen in the patent ) should have enough power to dynamically tessellate visible surfaces into micro-polygons and light them ( hey, we can do deferred T&L... sort the HOSs' control points and tessellate only the visible patches... ).
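A rough Python sketch of what I mean by deferred T&L on patches: reject a patch from its control points alone and only tessellate what survives. Everything here is an assumption for illustration: the function names are made up, control points are taken to be already projected to screen space, the test is a plain bounding-box overlap against the viewport ( which is conservative only for Bézier-style patches that stay inside the hull of their control points ), and occlusion between patches is not handled at all.

```python
def control_point_bounds(patch):
    """Screen-space bounding box (x0, y0, x1, y1) of a patch's control points."""
    xs = [p[0] for p in patch]
    ys = [p[1] for p in patch]
    return min(xs), min(ys), max(xs), max(ys)


def visible_patches(patches, viewport):
    """Keep only patches whose control-point box overlaps the viewport rectangle."""
    vx0, vy0, vx1, vy1 = viewport
    kept = []
    for patch in patches:
        x0, y0, x1, y1 = control_point_bounds(patch)
        if x1 >= vx0 and x0 <= vx1 and y1 >= vy0 and y0 <= vy1:
            kept.append(patch)
    return kept


def deferred_tessellate(patches, viewport, tessellate):
    """Run the expensive `tessellate` callback only on potentially visible patches."""
    return [tessellate(patch) for patch in visible_patches(patches, viewport)]


if __name__ == "__main__":
    on_screen = [(10, 10), (50, 10), (10, 50), (50, 50)]
    off_screen = [(900, 900), (950, 900), (900, 950), (950, 950)]
    calls = []
    deferred_tessellate(
        [on_screen, off_screen],
        viewport=(0, 0, 640, 480),
        tessellate=lambda patch: calls.append(patch) or patch,
    )
    print(len(calls), "patch tessellated out of 2")
```

The point is just that the cull touches a handful of control points per patch, while the tessellation and lighting it avoids would touch every micro-polygon.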
Out of 1 TFLOPS ( theoretical max ), how much do you think would be left for Physics and other FP intensive game code ? ( and if this also takes a hit on the ALUs, how big would it be ? )
Very quick spec sheet for the Broadband Engine and the Visualizer ( as described in the patent ):
Broadband Engine:
4 PEs:
Each PE has:
8 APUs and 1 PU
PU: RISC processor ( it could be a compact PowerPC derivative )
APU: 4 FP Units, 4 Integer Units ( correct me if I am wrong, but the 4 FP Units and the 4 Integer Units can work in SIMD mode and each can deliver a 128-bit result per cycle [Fused Multiply-Add, FP and Integer]; someone pointed out that each FP Unit is a SIMD VU, but that seems to contradict the patent, the drawings attached to the patent and common sense ), 128 KB of Local Storage ( SRAM ) and thirty-two 128-bit registers ( shared between the 4 FP Units and the 4 Integer Units ).
64 MB of e-DRAM
Visualizer:
4 PEs: each PE has 4 APUs, 1 PU and 1 Pixel Engine + Image Cache + CRTC ( the Pixel Engine, Image Cache and CRTC replace 4 of the 8 APUs you normally find in the PE... this should be helpful for manufacturing as the Visualizer could be a "slightly" modified Broadband Engine and they would share lots of functional blocks... ).
Unspecified amount of e-DRAM: I suspect 64 MB of e-DRAM ( it would make it easier to manufacture as we could use the same manufacturing lines for both Broadband Engine and Visualizer... ).
FP performance of 1 APU would be 32 GFLOPS, which assumes a 4 GHz clock-speed for the APU ( the e-DRAM doesn't have to run at that clock-speed ). I suspect that this figure applies to the Broadband Engine and that the Visualizer would ship at a lower clock-speed. The target process should be 65 nm, with 45 nm to follow as soon as it is ready for mass manufacturing... [ 65 nm WILL be ready by mid 2004... Toshiba's engineers, who co-developed this process with Sony, with ideas from the 100 nm SOI IBM technology which Sony licensed, affirmed that they are confident of having fabs ready for early production by March 2004, in time to ramp up and get the fabs ready to mass-manufacture Cell chips for a mid 2005 launch in Japan, with the North American launch following a few months later... ]
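For anyone who wants to check where the 32 GFLOPS and the ~1 TFLOPS come from, here is the arithmetic as a small Python sketch. The one assumption I am adding is that each FP Unit retires one Fused Multiply-Add per cycle and that it is counted as 2 floating point operations; the unit counts and the 4 GHz clock are the ones from the spec sheet above, the PUs are ignored, and the names are made up for the example.

```python
def peak_gflops(fp_units, flops_per_unit_per_cycle, clock_ghz):
    """Theoretical peak: units x operations per cycle x clock (GHz gives GFLOPS)."""
    return fp_units * flops_per_unit_per_cycle * clock_ghz


# Figures from the spec sheet.
FP_UNITS_PER_APU = 4
CLOCK_GHZ = 4.0
APUS_PER_PE = 8
PES_PER_BROADBAND_ENGINE = 4

# Assumption: one fused multiply-add per FP Unit per cycle, counted as 2 FLOPs.
FLOPS_PER_FMA = 2

apu_gflops = peak_gflops(FP_UNITS_PER_APU, FLOPS_PER_FMA, CLOCK_GHZ)
engine_gflops = apu_gflops * APUS_PER_PE * PES_PER_BROADBAND_ENGINE

if __name__ == "__main__":
    print(f"per APU: {apu_gflops} GFLOPS")
    print(f"Broadband Engine (APUs only): {engine_gflops} GFLOPS")
```

So 4 x 2 x 4 GHz gives the 32 GFLOPS per APU, and 32 APUs at that rate give 1024 GFLOPS, which is the ~1 TFLOPS theoretical max I mentioned. Any real workload would of course land below that.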