The WinHEC slides give information on the early spec of DirectX in Longhorn.

Optimization wouldn't come from doing fewer calculations, but rather from having the hardware itself active more of the time. And calculations have been invariant between shaders and fixed function for a long time now (even on GF3 hardware).
 
Chalnoth said:
Optimization wouldn't come from doing fewer calculations, but rather from having the hardware itself active more of the time. And calculations have been invariant between shaders and fixed function for a long time now (even on GF3 hardware).

No, they haven't. That's why the invariant option was implemented! Heck, get real.


And the optimisations you mention are not possible. The vertex pipeline always runs at the same speed; if the hardware does not have a special short path for fixed function, but has to go through the VS units, it simply can't get any faster while doing the same work.

It either has a fixed-function hardware path or it doesn't; there is nothing else to optimise.


And the invariant option shows very clearly that there is fixed-function hardware there: you can choose it to transform the vertices while doing the rest of the shading in the VS, which means there is a special hardware path for fixed-function transformation.
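
For context, the option being referred to here is ARB_position_invariant from the ARB_vertex_program extension. A minimal sketch of how it is used (the color pass-through is only there to make the program complete):

    !!ARBvp1.0
    # With this option the program may not write result.position itself;
    # the position is instead transformed exactly as the fixed-function
    # pipeline would transform it, guaranteeing invariance with
    # fixed-function rendering.
    OPTION ARB_position_invariant;
    # The program still computes everything else, e.g. a color pass-through.
    MOV result.color, vertex.color;
    END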

Taken out on the GF6, possibly, yes, but definitely not on GF3/GF4. The GFFX, I don't really know.
 
You're still assuming that maximal efficiency of the available computational resources is maintained when executing the shader that emulates fixed function.
 
You have to do the same computation: it's four DP4s. How can you optimize that? If the VS is not capable of doing such a simple thing at its full possible performance, then those GPUs have a very shitty implementation.
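
For reference, this is the computation in question: in ARB_vertex_program terms the entire fixed-function position transform is just four DP4s against the rows of the tracked modelview-projection matrix (a minimal sketch; the color move is only there to make the program complete):

    !!ARBvp1.0
    # Fixed-function position transform: one DP4 per clip-space component,
    # dotting the input position against a row of the MVP matrix.
    PARAM mvp[4] = { state.matrix.mvp };
    DP4 result.position.x, mvp[0], vertex.position;
    DP4 result.position.y, mvp[1], vertex.position;
    DP4 result.position.z, mvp[2], vertex.position;
    DP4 result.position.w, mvp[3], vertex.position;
    MOV result.color, vertex.color;
    END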

How can you beat the VS unit with a shader that cannot be simplified, except by moving to a dedicated hardware path? I'm not saying different hardware (it can share the computational resources), but it switches to another hardware path and does not use the VS path.

And it is not an identical result that it spits out: variance issues do exist. There is dedicated hardware.
 
As much as I know how difficult it is to alter Chalnoth's "belief" systems, ERP's explanation is the same one I have had previously from NVIDIA: when they moved to a vertex shader platform, the transformation end had always been done with vertex programs, while they left some specific hardware for lighting. That was present from NV2x onwards; since the GF3 only had one vertex shader, its performance would otherwise have been significantly lower than the hardwired T&L of NV1x (especially the GTS Ultra), and performance scaling indicated that this was still present in NV3x. I have no specific information on NV40, but die constraints suggest they would want to remove this at this point, and the NV4x platform appears to be the cleanest sweep of legacy hardware I've seen from NVIDIA in a while; the overall VS performance may now completely negate the performance differential anyway.

IIRC, ATI never went down this route because their first programmable part, the 8500, already had two VSs and hence fairly good vertex-program T&L performance in the first place, although I seem to remember that both VSs only appeared to work in parallel in a few applications.
 