Chalnoth said:
Well, as for the triangle setup of a degenerate triangle, you would think it would be exceedingly simple, or possibly bypassed entirely with a smart TnL engine.
I think the smartest thing is to filter degenerate triangles by index. If at least two indices of the triangle are equal, then it's guaranteed to be degenerate.
It could even be implemented for strips only.
The algorithm is simple: in an indexed strip, if the current index (the one completing a triangle) is equal to the previous one, then this triangle and the following one are skipped (not output to triangle setup).
AFAIK none of the available hardware does this.
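For illustration, here is a minimal sketch of such a filter in C. It tests all three index pairs (the general test from above, which subsumes the current == previous case); the 16-bit index type and the emit callback are made up for the example:

```c
/* Sketch: filtering degenerate triangles out of an indexed strip
   before triangle setup. Index width and emit hook are illustrative. */
#include <stdint.h>
#include <stddef.h>

typedef void (*emit_fn)(uint16_t a, uint16_t b, uint16_t c);

void filter_strip(const uint16_t *idx, size_t count, emit_fn emit)
{
    for (size_t i = 2; i < count; i++) {
        uint16_t a = idx[i - 2], b = idx[i - 1], c = idx[i];

        /* Any two equal indices guarantee a zero-area triangle:
           the same vertex appears twice in the triangle. */
        if (a == b || b == c || a == c)
            continue;               /* skip degenerate triangle */

        /* Preserve winding: every odd-numbered triangle in a strip
           has its first two vertices swapped. */
        if (i & 1)
            emit(b, a, c);
        else
            emit(a, b, c);
    }
}
```

Note that checking only current == previous already handles the "skip this and the follow-up triangle" behavior described above: the duplicated index is shared by the next triangle in the strip, so that one fails the same test.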
As for shader complexity, I suppose it would matter what data the shaders were dependent upon for processing. If the only data used in the vertex program that varies from triangle to triangle is based entirely upon the vertex positions, then you would think that any vertex program would be entirely skipped by just using the post-TnL cache.
The shader doesn't treat the vertex position any differently than any other data contained within a vertex.
You assume the post-TnL cache could be looked up by input data. In practice it's keyed by vertex index, not by the vertex data itself.
I know nVidia doesn't do this, but I'm not aware of the details of other cards.
(The amount of information you can get on ATI's site on how to optimize for their cards is 0)
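To make the distinction concrete, here is a rough sketch of an index-keyed FIFO post-TnL cache, which is roughly how the documented nVidia scheme behaves. The depth, the output struct, and the run_vertex_shader hook are all assumptions for illustration:

```c
/* Sketch: a post-TnL cache keyed by vertex index. The hardware
   never inspects the input attributes when deciding on a hit. */
#include <stdint.h>

#define CACHE_SIZE 16                 /* hypothetical FIFO depth */

typedef struct { float data[16]; } xformed_vertex;  /* post-TnL output */

static uint32_t       cache_idx[CACHE_SIZE];
static xformed_vertex cache_val[CACHE_SIZE];
static int            cache_head = 0, cache_used = 0;

/* Assumed hook: runs the vertex program for one indexed vertex. */
extern xformed_vertex run_vertex_shader(uint32_t index);

xformed_vertex fetch_vertex(uint32_t index)
{
    /* Hit test: compare against stored indices only. Two vertices
       with identical input data but different indices always miss. */
    for (int i = 0; i < cache_used; i++)
        if (cache_idx[i] == index)
            return cache_val[i];

    /* Miss: transform, then insert FIFO-style (oldest overwritten). */
    xformed_vertex v = run_vertex_shader(index);
    cache_idx[cache_head] = index;
    cache_val[cache_head] = v;
    cache_head = (cache_head + 1) % CACHE_SIZE;
    if (cache_used < CACHE_SIZE)
        cache_used++;
    return v;
}
```

Looking the cache up by input data instead would mean comparing whole attribute vectors per fetch, which is why an index compare is what actually gets built.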
But, anyway, I suppose this is just one of many situations where a smart architecture can do a lot better than a basic one. It is for this reason that I have an aversion to certain websites' apparent claims that all differences between architectures running at the same clock speed and with similar specs are based upon drivers (See xbitlabs' recent article benchmarking a Quadro, GF4, and R9700 in professional apps...).
Well, there are always some tricks you can do in the drivers.
When a game becomes a popular benchmark, it's up to the IHVs to try to do the optimizations that the original author forgot to do.
Actually that's true for 3DMark too.