Jaws said:
REYES pipeline!
We know the 136 inst/cycle matches the E3 figures, but the DOTs/cycle don't. There are NV patents for geometry shaders/programmable primitive processors etc. The 24 PS units will largely remain unmodified, but the vertex/geometry pipeline/triangle setup will get modified so that micro-polygons can be shaded like conventional fragments using the PS units!
*Warning: Extreme speculation!*
Reyes isn't just about micropolygons... First you bound every object in camera space, then its size is tested, and if it's too big, it gets split. Any primitive that's outside the view frustum gets culled immediately. This loop continues until a given limit is reached, and then each small primitive enters the pipeline independently. That's where it gets converted into a grid, which is diced into micropolygons. So at this point a large amount of the scene has already been thrown out.
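To make the loop concrete, here's a minimal C++ sketch of the bound/split/cull stage, assuming a toy screen-space patch primitive. All names and thresholds (kMaxArea etc.) are made up for illustration and have nothing to do with PRMan's actual internals; real Reyes splits in parametric space, not screen space.

```cpp
// Toy bound/split/cull loop: cull what's outside the view, split what's
// too big, dice what's small enough. Hypothetical names throughout.
#include <cstdio>
#include <stack>

struct Patch {
    float x0, y0, x1, y1;  // screen-space bounding rectangle (pixels)
    int   depth;           // how many times this patch has been split
};

const float kViewW = 1280.0f, kViewH = 720.0f; // view bounds
const float kMaxArea = 64.0f;                  // split threshold (px^2)
const int   kMaxDepth = 16;                    // recursion safety limit

int main() {
    std::stack<Patch> work;
    work.push({-100.0f, -100.0f, 900.0f, 600.0f, 0}); // one big patch

    int diced = 0, culled = 0;
    while (!work.empty()) {
        Patch p = work.top(); work.pop();
        // 1. Anything fully outside the view gets culled immediately.
        if (p.x1 < 0 || p.y1 < 0 || p.x0 > kViewW || p.y0 > kViewH) {
            ++culled;
            continue;
        }
        float area = (p.x1 - p.x0) * (p.y1 - p.y0);
        // 2. Too big on screen? Split in half along the longer axis.
        if (area > kMaxArea && p.depth < kMaxDepth) {
            float mx = 0.5f * (p.x0 + p.x1), my = 0.5f * (p.y0 + p.y1);
            if (p.x1 - p.x0 > p.y1 - p.y0) {
                work.push({p.x0, p.y0, mx, p.y1, p.depth + 1});
                work.push({mx, p.y0, p.x1, p.y1, p.depth + 1});
            } else {
                work.push({p.x0, p.y0, p.x1, my, p.depth + 1});
                work.push({p.x0, my, p.x1, p.y1, p.depth + 1});
            }
        } else {
            // 3. Small enough: this is where dicing into a micropolygon
            //    grid would happen.
            ++diced;
        }
    }
    std::printf("diced %d patches, culled %d\n", diced, culled);
    return 0;
}
```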
PRMan then shades every vertex of the grid using a SIMD approach, and only after this does the grid undergo hidden-surface evaluation. It's done in this order because a displacement shader usually moves vertices, so the results of hiding before shading could be wrong. Micropolygons are tested individually, so the grid has to be broken apart for this. This approach to visibility testing also means that geometry AA is decoupled from shading AA (which depends on the actual size of the micropolygons, i.e. grid vertex density).
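A toy sketch of that shade-then-hide ordering, again with made-up types: the "displacement" and "surface" shaders here are one-liners, and a bounding-box test stands in for a real point-in-micropolygon test. The point is just the order of operations.

```cpp
// Shade the whole grid first (displacement moves z!), only then test
// micropolygons against a subpixel sample point.
#include <cmath>
#include <cstdio>
#include <vector>

struct Vertex { float x, y, z, color; };

int main() {
    const int N = 8;                          // 8x8 grid -> 7x7 micropolygons
    std::vector<Vertex> grid(N * N);
    for (int j = 0; j < N; ++j)
        for (int i = 0; i < N; ++i)
            grid[j * N + i] = {float(i), float(j), 10.0f, 0.0f};

    // 1. Shade every grid vertex in one batch (PRMan runs the shader
    //    SIMD-style over the whole grid). Displacement happens here,
    //    which is why visibility can't be resolved before this step.
    for (auto& v : grid) {
        v.z    += 0.5f * std::sin(v.x);       // toy displacement shader
        v.color = v.z * 0.1f;                 // toy surface shader
    }

    // 2. Break the grid into micropolygons and test each one against the
    //    sample point, keeping the nearest hit.
    float sx = 3.5f, sy = 3.5f;               // one subpixel sample point
    float bestZ = 1e30f, bestColor = 0.0f;
    for (int j = 0; j + 1 < N; ++j)
        for (int i = 0; i + 1 < N; ++i) {
            const Vertex& a = grid[j * N + i];
            const Vertex& b = grid[j * N + i + 1];
            const Vertex& c = grid[(j + 1) * N + i];
            // Bounding-box containment stands in for a real inside test.
            if (sx >= a.x && sx <= b.x && sy >= a.y && sy <= c.y &&
                a.z < bestZ) {
                bestZ = a.z;
                bestColor = a.color;          // micropolygons are flat
            }
        }
    std::printf("sample hit: z=%.2f color=%.2f\n", bestZ, bestColor);
    return 0;
}
```

Note how the number of sample points per pixel (geometry AA) and the grid density (shading AA) are two completely independent knobs here.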
Motion blur is performed by shading the primitive's vertices at the starting point of its motion, and the movement is always assumed to be linear. This gives a physically wrong result, but the speed hit for it is very small.
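Per sample it amounts to roughly this (toy types again): color is computed once at shutter open, while position is just lerped to a random shutter time.

```cpp
// Shade once at motion start, then sample the moving vertex at random
// shutter times. Linear interpolation + frozen color = physically wrong
// but nearly free.
#include <cstdio>
#include <cstdlib>

struct MovingVertex {
    float x0, y0;   // position at shutter open (where shading happened)
    float x1, y1;   // position at shutter close
    float color;    // shaded ONCE, at the start of the motion
};

void positionAt(const MovingVertex& v, float t, float& x, float& y) {
    x = v.x0 + t * (v.x1 - v.x0);   // curved motion becomes a straight line
    y = v.y0 + t * (v.y1 - v.y0);
}

int main() {
    MovingVertex v{0.0f, 0.0f, 8.0f, 2.0f, 0.75f};
    for (int s = 0; s < 4; ++s) {
        float t = std::rand() / float(RAND_MAX); // random shutter time
        float x, y;
        positionAt(v, t, x, y);
        std::printf("sample %d: t=%.2f pos=(%.2f, %.2f) color=%.2f\n",
                    s, t, x, y, v.color);
    }
    return 0;
}
```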
I'm not a coding expert, but I can see the following possible issues with a hardware-based Reyes renderer:
Its efficiency is optimized for huge scenes and HOS (higher-order surface) geometry. Small, simple scenes actually render pretty slowly.
PRMan relies heavily on background storage and caching for both geometry and texture data. The way the pipeline works means that you can stream data through it from disk, and throw away a lot of data as soon as it's been processed.
But you do have to keep one piece of information for each pixel, and that is a list of micropolygon vertices that cover any sampling points under that pixel. That's because primitives are streamed in no particular order, and a pixel cannot be finished until all the geometry has been processed. This is a HUGE amount of data. PRMan overcomes this problem with bucket rendering, which is basically tiling; primitives get sorted into buckets when they're bound in the first stage of the pipeline. This also means that the whole geometry database has to be kept in memory, as the tradeoff for not keeping huge lists of visible points for the whole frame.
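A sketch of that bucket sorting, binning each primitive's bound into 32x32-pixel tiles. Tile size and types are arbitrary here, and for simplicity a primitive is binned into every bucket its bound overlaps:

```cpp
// Bin primitives into screen tiles by their bounds, so per-pixel
// visible-point lists only need to exist for one bucket at a time.
#include <cstdio>
#include <vector>

struct Bound { int x0, y0, x1, y1; };   // pixel-space bound of a primitive

const int kTile = 32;                   // bucket size in pixels
const int kTilesX = 40, kTilesY = 23;   // covers 1280x720, rounded up

int main() {
    std::vector<Bound> prims = { {5, 5, 60, 40}, {600, 300, 700, 350} };
    // One list of primitive indices per bucket. The whole geometry
    // database stays resident (the tradeoff mentioned above), but
    // visible-point storage only lives while its bucket is rendered.
    std::vector<std::vector<int>> buckets(kTilesX * kTilesY);

    for (int p = 0; p < (int)prims.size(); ++p) {
        const Bound& b = prims[p];
        for (int ty = b.y0 / kTile; ty <= b.y1 / kTile; ++ty)
            for (int tx = b.x0 / kTile; tx <= b.x1 / kTile; ++tx)
                buckets[ty * kTilesX + tx].push_back(p);
    }

    // Render buckets one at a time, freeing each bucket's sample lists
    // before moving on; that's what keeps the footprint bounded.
    for (int t = 0; t < (int)buckets.size(); ++t)
        if (!buckets[t].empty())
            std::printf("bucket %d holds %zu primitive(s)\n",
                        t, buckets[t].size());
    return 0;
}
```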
Implementing Reyes in hardware most likely means some sort of deferred renderer. With offline rendering, PRMan already has the scene data written out to disk in RIB files (which can take more than 50% of the total rendering time to create from Maya scenes or whatever), but in a realtime environment you have to capture it.
Also, dicing up primitives into grids and then into micropolygons cannot be done with a vertex shader AFAIK, and Cell's SPEs probably don't have the bandwidth to feed the RSX with micropolygons, which would then have to be written out to some memory once the shading has been done. Working with tiles, this memory should be located on the chip - but can it store enough data for nextgen complexity?
All in all, Reyes is highly unlikely IMHO, perhaps even for the PS4/Xwhatever.