Panajev2001a
Veteran
Yeah, teapots and a bowling pin and an armadillo with like 1,200 floating-point ops per fragment.
Then reduce the number of FP ops
Deferred shading...
Don't need REYES for this.
But this will help the REYES-like renderer...
I think that a micro-polygon based renderer has a place in PlayStation 3's lifetime...
Well, if it can't operate per fragment, it needs to, to be competitive with Xbox2 and GC2.
It should operate per fragment too; nothing prohibits the APUs on the Visualizer from processing Pixel Programs while the APUs on the Broadband Engine do T&L and run Vertex Programs... I just do not see the Pixel Engines in the Visualizer being OVERLY complex, with the architecture generally designed to push tons of small polygons instead of bigger polygons with a high degree of multi-texturing required.
Tons of simple primitives ( single-textured or flat-shaded ) pushed to a streamlined Rasterizer ( the Pixel Engine part of the GPU ) in large quantities by a monster CPU with tons of local bandwidth...
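Something like this, conceptually ( hypothetical code, nothing from the patent, just to illustrate the split: the CPU side dices and shades up front, the Rasterizer side only scan-converts tiny, already-colored primitives ):

#include <cstdio>
#include <vector>

struct Micropolygon {          // roughly pixel-sized, flat-shaded quad
    float x, y;                // screen-space position
    float r, g, b;             // color baked in during the Shading stage
};

// "BE side": slice 'n dice a [0,1]^2 patch into an n x n grid of
// micropolygons and run a ( trivial, procedural ) shader on each one.
std::vector<Micropolygon> diceAndShade(int n, float screenW, float screenH) {
    std::vector<Micropolygon> mps;
    mps.reserve(n * n);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i) {
            float u = (i + 0.5f) / n, v = (j + 0.5f) / n;
            Micropolygon mp;
            mp.x = u * screenW;
            mp.y = v * screenH;
            mp.r = u; mp.g = v; mp.b = 1.0f - u * v;   // stand-in shader
            mps.push_back(mp);
        }
    return mps;
}

// "Visualizer side": a deliberately simple rasterizer -- one color per
// micropolygon, no texture layers, no loop-back; pure fill rate.
void rasterize(const std::vector<Micropolygon>& mps,
               std::vector<unsigned>& fb, int w, int h) {
    for (const Micropolygon& mp : mps) {
        int px = (int)mp.x, py = (int)mp.y;
        if (px < 0 || px >= w || py < 0 || py >= h) continue;
        unsigned c = ((unsigned)(mp.r * 255) << 16) |
                     ((unsigned)(mp.g * 255) << 8)  |
                      (unsigned)(mp.b * 255);
        fb[py * w + px] = c;
    }
}

int main() {
    const int W = 64, H = 64;
    std::vector<unsigned> fb(W * H, 0);
    auto mps = diceAndShade(64, (float)W, (float)H);  // ~pixel-sized quads
    rasterize(mps, fb, W, H);
    std::printf("%zu micropolygons rasterized\n", mps.size());
    return 0;
}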
Shaders will use a lot of textures, not just for color but for other things as well, to compute the final fragments either directly or from micropolygons. There isn't going to be a lot of single-textured stuff.
The Shaders can use as many textures as they prefer, whether they are procedurally generated or not... I remember the REYES pipeline and what happens in the Shading stage ( texture input is one of those things )...
What I was referring to was the configuration of the Rasterizer unit: as DeanoC was commenting in a post about REYES-like renderers a while ago, the Rasterizer doesn't need to be overly complex and doesn't need to do tons of texture layers each cycle...
Textures are sampled in the Shading phase, and when the sea of micro-polygons is sent to the Rasterizer, what we will worry about is whether the Rasterizer can draw them on screen as fast as they come...
The micro-polygons coming out of the Shaders do not arrive with tons of textures to be applied in layers... during the Shading phase the textures were sampled and the color of each micro-polygon was computed. If we want to accelerate things and use the Shaders to process the micro-polygon until a single texture remains to be applied, we can... after all, the GPU will probably still support texturing, as I do not think they can ask developers to move en masse to REYES-like processing overnight...
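A small sketch of the point ( hypothetical code, not the patent's API ): the Shading stage can read any number of texture inputs, but what it hands to the Rasterizer is a micro-polygon whose color is already final.

#include <cmath>
#include <cstdio>

struct Color { float r, g, b; };

// Stand-ins for texture fetches; a real shader would sample texture maps.
Color sampleAlbedo(float u, float v)     { return {u, v, 0.5f}; }
float sampleBumpHeight(float u, float v) { return 0.5f + 0.5f * std::sin(20.0f * u); }
float sampleShadow(float u, float v)     { return v > 0.5f ? 0.4f : 1.0f; }

// Shading phase: any number of texture inputs are folded into one color...
Color shadeMicropolygon(float u, float v) {
    Color albedo = sampleAlbedo(u, v);
    float bump   = sampleBumpHeight(u, v);   // "textures" used for non-color data too
    float shadow = sampleShadow(u, v);
    float k = bump * shadow;
    return {albedo.r * k, albedo.g * k, albedo.b * k};
}

// ...so the Rasterizer never sees texture layers, only the baked result.
int main() {
    Color c = shadeMicropolygon(0.25f, 0.75f);
    std::printf("baked color: %.2f %.2f %.2f\n", c.r, c.g, c.b);
    return 0;
}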
You will still have people using the regular OpenGL pipeline ( I think we should see OpenGL 2.0 ).
That Imagine processor is a lot like a single PE in the BE.
Except that we should have 4 PEs in the BE, we have much more local bandwidth thanks to the e-DRAM, and the PEs should also be clocked higher than 400 MHz...
Also, the BE would have a bit more local memory: the Imagine Stream Processor has 128 KB of SRF ( Stream Register File ) divided between its 8 SIMD clusters, while in a single PE each APU has 128 KB of Local Storage plus thirty-two 128-bit GPRs...
The BE should have the clock and resource advantage over the Imagine Processor...
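To put rough numbers on it ( assuming the patent's figure of eight APUs per PE, which isn't quoted above ): 8 APUs × 128 KB = 1 MB of Local Storage per PE, and 4 PEs × 1 MB = 4 MB across the whole BE, versus Imagine's single 128 KB SRF.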
I can see the influence of Imagine on Cell... Sony has supposedly been actively collaborating with universities world-wide, and they could have brought some of the results into the Cell project...
Small polygons don't equate to micropolygons and REYES-style rendering. A simple polygon doesn't mean that computing it doesn't require a lot of textures.
Look at the Stanford paper comparing REYES and OpenGL... where does the REYES pipeline spend most of its time?
In the Geometry phase ( slicing 'n dicing, Shaders, etc... ) and much less in the Rasterizing phase...
This kind of gives you an idea of where the processing-bound part of the rendering time is... giving the Pixel Engines the capability of doing two/four textures per cycle with loop-back would be wasted on a REYES-like renderer; what we need is a VERY fast CPU to do the Shading part, and when we have a 1 TFLOPS-class CPU I think we have a nice candidate for the job...
The GPU of PlayStation 3, the Visualizer as described in the patent, contains its fair share of APUs that can assist the Pixel Engines by running Pixel Programs, or that can assist the overall rendering of a REYES-like renderer by helping the BE balance the Geometry processing load...
Distributing the processing load on a Cell system would be facilitated, as the architecture was designed to have the standard units of work, the Apulets, travel from Cell to Cell to find an APU that can process them. In short: software Cells/Apulets can migrate if the host system is running at full capacity and another connected device ( the GPU would count as "connected" ) is able to process the Apulet and return it back in time... it would be disadvantageous if it took more time to send, process, and receive the Apulet than to wait for a local APU to be free.
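A toy sketch of that migration rule ( the names are my own, not the patent's ): ship an Apulet to a connected Cell only if the round trip beats waiting for a local APU to free up.

struct Apulet { double processingCost; };   // estimated APU time to run it, in ms

struct Host {
    double localWait;     // time until a local APU is free
    double sendLatency;   // one-way transfer time to the connected device
    double remoteWait;    // queue time on the remote device
};

// Compare estimated completion time if we keep the Apulet local vs. migrate it.
bool shouldMigrate(const Apulet& a, const Host& h) {
    double local  = h.localWait + a.processingCost;
    double remote = h.sendLatency + h.remoteWait + a.processingCost
                  + h.sendLatency;            // ...and the trip back
    return remote < local;                    // otherwise migration loses
}

int main() {
    Apulet a{2.0};              // 2 ms of APU work
    Host   h{5.0, 0.5, 1.0};    // busy locally, fast link, idle remote device
    // remote: 0.5 + 1.0 + 2.0 + 0.5 = 4.0 ms < local: 5.0 + 2.0 = 7.0 ms -> migrate
    return shouldMigrate(a, h) ? 0 : 1;
}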