Digit-Life has an article where I gathered a bit more info:
http://www.digit-life.com/articles2/gffx/index.html
Now, contrary to what Dave said, it seems that the GeForce FX does indeed incorporate three vertex processors (that's the same info I heard from other places, so could you check it out again, Dave?):
1. The highest texture sampling and filtering rate is up to 8 per clock.
2. Number of pixel shader instructions executed per clock cycle: 2 integer and 1 floating-point, or 2 (!) texture access instructions. The latter option is possible because, during the preceding computational operations of the shader, the texture units can sample texture values with known coordinates beforehand and save them in special temporary registers, of which there are 16 in all. I.e. the texture units can sample no more than 8 textures per clock, but the pixel shader can get up to 16 results per clock.
3. Like the previous generation of chips, the GeForce FX works with two types of MSAA blocks: 2x diagonal and 2x2 grid. The 6xS and 8x AA modes are hybrid modes based on averaging several base blocks in one pattern or another.
4. The frame buffer compression works only in the MSAA modes of FSAA, and only at the level of MSAA blocks. Hence the compression is lossless: about 4:1 in the modes with the 4x base MSAA block, and 2:1 for 2x blocks.
5. The chip supports an activity control scheme which adjusts the intensity of the cooling system depending on the load and heating of the chip.
6. The chip doesn't incorporate DVI or TV-Out controllers, like all earlier top NVIDIA solutions. Integrated controllers are used mainly in mass-market products.
7. Mass production of the second revision of the chip, which will be used for production cards, is about to start.
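To put the compression ratios from item 4 in perspective, here is a rough back-of-the-envelope sketch in Python. The 1600x1200 resolution, the 4 bytes of color per sample and the function name are my own assumptions for illustration, not figures from the article, and the ratios are the article's "about" numbers, so treat the result as a best case.

```python
def msaa_color_data_mb(width, height, samples, bytes_per_sample=4, ratio=1.0):
    """Approximate color data per full-frame pass, in MB (2**20 bytes).

    ratio is the assumed compression ratio (1.0 means uncompressed).
    """
    raw_bytes = width * height * samples * bytes_per_sample
    return raw_bytes / ratio / 2**20


# Assumed example resolution: 1600x1200, 32-bit color per sample.
uncompressed_4x = msaa_color_data_mb(1600, 1200, samples=4)
compressed_4x = msaa_color_data_mb(1600, 1200, samples=4, ratio=4.0)
compressed_2x = msaa_color_data_mb(1600, 1200, samples=2, ratio=2.0)

print(f"4x MSAA, uncompressed: {uncompressed_4x:.2f} MB")
print(f"4x MSAA, 4:1 compressed: {compressed_4x:.2f} MB")
print(f"2x MSAA, 2:1 compressed: {compressed_2x:.2f} MB")
```

If the quoted ratios held everywhere, both MSAA modes would move about as much color data as a frame with no multisampling at all. In practice the ratio will vary with how many blocks actually compress.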
Next comes the myth about displacement mapping; the basic idea of its implementation on the GeForce FX has surfaced here more than once...
It's interesting that the GeForce FX incorporates three vertex processors, according to the number of vertices in a triangle, instead of four as in ATI's product. Besides, in the case of dynamic execution the shaders can take a different number of clock cycles for different vertices, but new vertices are started up simultaneously, i.e. the units that have completed their shaders wait for those which haven't, so that three more vertices can start processing at the same time. It's clear that dynamic jumps made NVIDIA use additional transistors. Three processors can be either a weak point or a quite balanced solution; we still don't have enough information on the performance of a single vertex processor per unit of clock speed.
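The "wait for the slowest one" behaviour described in that quote can be sketched as a toy model in Python. This is only my reading of the quote, not a description of the actual hardware scheduler. The function names and per-vertex cycle counts are made up for illustration.

```python
def lockstep_cycles(cycles_per_vertex, units=3):
    """Total cycles when vertices are issued in groups of `units` and a new
    group starts only after the slowest vertex of the previous one is done."""
    total = 0
    for start in range(0, len(cycles_per_vertex), units):
        batch = cycles_per_vertex[start:start + units]
        total += max(batch)  # the whole group waits for its slowest vertex
    return total


def ideal_cycles(cycles_per_vertex, units=3):
    """Idealized lower bound: the work is spread perfectly evenly over the units."""
    return -(-sum(cycles_per_vertex) // units)  # ceiling division


# Hypothetical shader costs: one vertex takes a long branch (30 cycles).
costs = [10, 10, 30, 10, 10, 10]
print(lockstep_cycles(costs))  # 40
print(ideal_cycles(costs))     # 27
```

In this made-up case a single slow vertex stalls two units for 20 cycles each. That would be the price of dynamic branching if the units really do start in lockstep.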
Next we have the tidbit I was talking about some time ago...
Reportedly, the GeForce FX won't support Displacement Maps and hardware tessellation of N-Patches. That is why the DM technology will probably suffer the same fate as the N-Patches: the support is officially provided in applications, but real models developed for it are absent. If NVIDIA's products do not support DM, the number of applications potentially supporting it could fall significantly. At present, N-Patches and DM are not mandatory for DX9 compatibility.
However, the result remains pretty much the same:
It's interesting that NVIDIA managed to implement texture fetching commands in the pixel processor without any delays. Even dependent texture fetches going one after another are completed in a single clock. This can give the GeForce FX a considerable advantage over the R300 in the case of complex shaders.
R300 & NV30:
Bilinear filtering: R300 8 textures per clock, NV30 8 textures per clock
Trilinear filtering: R300 8 textures per clock, NV30 8 textures per clock
Minimal aniso: R300 8 textures per clock, NV30 8 textures per clock
However, the clock speed of the GeForce FX is higher. But how well balanced the chip really is, and how it performs, is yet to be studied.
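Since the per-clock numbers are identical, the theoretical gap comes down to clock speed alone. Here is a quick sketch of the arithmetic in Python. The clock speeds are my assumptions and are not from the article: 325 MHz for the R300 (Radeon 9700 Pro) and 500 MHz for the NV30 (as announced for the top GeForce FX).

```python
def peak_texel_rate_mtexels(textures_per_clock, core_mhz):
    """Theoretical peak texture rate in millions of texels per second."""
    return textures_per_clock * core_mhz


R300_MHZ = 325  # assumption: Radeon 9700 Pro core clock
NV30_MHZ = 500  # assumption: announced GeForce FX core clock

r300_rate = peak_texel_rate_mtexels(8, R300_MHZ)
nv30_rate = peak_texel_rate_mtexels(8, NV30_MHZ)

print(f"R300: {r300_rate} Mtexels/s")
print(f"NV30: {nv30_rate} Mtexels/s")
print(f"NV30 / R300: {nv30_rate / r300_rate:.2f}x")
```

These are paper numbers only. Memory bandwidth and the actual efficiency of the chip will decide how much of that shows up in practice.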