Luminescent
Veteran
I read very recently that the GeForce 3 shows increased latency when executing rsq and rcp instructions because of the two passes required by its "special purpose unit" (a unit alongside the vec4 processor?). Furthermore, in an ExtremeTech session with an Nvidia employee, the NV25 pipeline was discussed, and it was admitted that the pipeline had been lengthened to cover the latency of these instructions (maybe to allow single-pass execution), but I'm assuming the execution unit itself remained the same. Now, in vertex program tests run on the Radeon 8500 and the GeForce 3 (http://216.239.51.100/search?q=cach...40/final.pdf+radeon+vertex+alu&hl=en&ie=UTF-8, http://www.reactorcritical.com/review-battletitans2/review-battletitans2_2.shtml), the Radeon 8500 ran complex programs (primarily rsq and rcp) at approximately the same speed as simple programs, while the GeForce 3's performance dropped significantly. How does the Radeon VS implementation differ from that of the GeForce architecture? Does it have a dedicated scalar unit running alongside the vec4 unit?
Also, the Radeon 8500 VS is capable of storing 192 constants as opposed to 96, and it contains more registers than required. Shouldn't the vertex shader thus be more capable (standalone) than that of the GeForce 4 Ti? And can anyone confirm whether it also contains a hardwired T&L unit, unlike the GeForce 3 and 4, which emulate T&L through a vertex program?
I know it is a little late to discuss this architecture, and I hear so much about newer competing architectures being superior in implementation, but theoretically this should be a strong competitor. Finally, why does a 60 million transistor processor with support for an advanced VS, PS 1.4, a hardwired T&L unit, and TruForm have a lower transistor count than the 63 million transistor GeForce 4 Ti? Is it the Lightspeed Memory Architecture and the amount of cache? Maybe the 8500 has more logic and less cache, which would explain the single-pass inefficiency issues in Doom 3.
Thank you.