- Sampler runs at scheduler clock (half the hot clock)
- 4 samplers per cluster (64 total)
- Sampler will do jittered-offset for Gather4 (no idea how, the texture-space offset is constant per call)
- 4 tris/clock setup and raster
- Raster area per unit is now 2x4 rather than 2x16
- PolyMorph Engine (heh), effectively pre-PS FF, one per cluster
- ROPs now each take 24 coverage samples (up from 8)
- Compression is improved, the performance drop going from 4x to 8x MSAA is smaller than on GT200, clock-for-clock
- Display engine improvements
That's the list of the stuff I either got wrong or missed in my article at TR, concerning the graphics. Biggest thing is probably the > 1tri/clk for small triangles, and the change in the per-clock rasterisation area for each of the four units. Aggregate setup and rasterisation performance is no faster per clock than G80+ for triangles that are > 32 pixels.
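To put rough numbers on the small-triangle point, here's a back-of-envelope sketch. It's an idealised model, not a description of the real hardware: it assumes throughput is capped either by the peak setup rate or by aggregate raster pixels per clock, and it assumes G80+ behaves as a single 2x16 raster area at 1 tri/clock. The function name and the two config dicts are mine, made up for illustration.

```python
def tris_per_clock(area_px, peak_tris, raster_px):
    """Idealised triangles/clock for triangles covering area_px pixels.

    Throughput is limited by whichever is lower: the peak setup rate
    or the aggregate raster pixel rate divided by the triangle area.
    """
    return min(peak_tris, raster_px / area_px)


# GF100 per the list above: 4 tris/clock, four raster units at 2x4 pixels each.
GF100 = dict(peak_tris=4, raster_px=4 * 2 * 4)

# Assumption: G80+ modelled as one 2x16 raster area with 1 tri/clock setup.
G80 = dict(peak_tris=1, raster_px=2 * 16)

if __name__ == "__main__":
    for area in (4, 8, 16, 32, 64):
        gf100 = tris_per_clock(area, **GF100)
        g80 = tris_per_clock(area, **G80)
        print(f"{area:3d} px: GF100 {gf100:.2f} tri/clk, G80+ {g80:.2f} tri/clk")
```

Under those assumptions both chips have the same 32 pixels/clock of aggregate raster area, so the two curves meet at 32 pixels and stay together above it. The new part only pulls ahead on triangles smaller than that.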
Sampler count was out by 2x, so NV will need a > 1.6 GHz hot clock to beat a GTX 285 in peak possible texture performance, and there's a distinct lack of information about the sampler hardware in the latest whitepaper. Doing more digging there, but it looks like no change to texturing IQ other than the ability to jitter the texcoords per sample during an unfiltered fetch.
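The > 1.6 GHz figure falls out of simple arithmetic, sketched below. The GTX 285 numbers (80 texture units at a 648 MHz core clock) aren't in the list above; they're the shipping specs as I know them. The sketch also assumes each sampler delivers one filtered texel per sampler clock, which is a peak-rate simplification. Function and constant names are just for illustration.

```python
GF100_SAMPLERS = 64        # from the list above: 4 per cluster, 64 total
GTX285_TEX_UNITS = 80      # assumption: GTX 285 shipping spec
GTX285_CORE_MHZ = 648      # assumption: GTX 285 shipping spec


def peak_mtexels(units, clock_mhz):
    """Peak texel rate in MTexels/s, assuming one texel per unit per clock."""
    return units * clock_mhz


def gf100_peak_mtexels(hot_clock_mhz):
    """Samplers run at the scheduler clock, which is half the hot clock."""
    return peak_mtexels(GF100_SAMPLERS, hot_clock_mhz / 2)


gtx285_peak = peak_mtexels(GTX285_TEX_UNITS, GTX285_CORE_MHZ)

# Hot clock at which 64 half-rate samplers match the GTX 285's peak.
break_even_hot_mhz = 2 * gtx285_peak / GF100_SAMPLERS

if __name__ == "__main__":
    print(f"GTX 285 peak: {gtx285_peak / 1000:.2f} GTexels/s")
    print(f"Break-even hot clock: {break_even_hot_mhz:.0f} MHz")
    for hot in (1400, 1500, 1600, 1700):
        print(f"{hot} MHz hot clock: {gf100_peak_mtexels(hot) / 1000:.2f} GTexels/s")
```

With those inputs the break-even hot clock works out to 1620 MHz, so anything at or below 1.6 GHz leaves the new chip behind a GTX 285 on peak texel rate.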
NV claim that everything they list in the PolyMorph block exists as a physical block in the silicon. The obviously interesting thing there that didn't exist before is the tessellator. It seems the fixed block is responsible for generating the new primitives (or killing geometry), and the units run in parallel where possible, with most other stuff running on the SM.
As for my clock estimates, I doubt a 1700 MHz hot clock at launch :( but the base clock should be usefully higher, up past 700 MHz. They still haven't talked about GeForce productisation or clocks, but at this point it looks unlikely the fastest launch GeForce will texture faster than a GTX 285.
That's about it, will have an article up ASAP.