Mintmaster
Veteran
Nice article, as usual.
Couple of things:
"We truly wonder when IHVs will get a clue and move triangle setup to the shader core (to improve performance) and make texture filtering/ROP blending programmable (even if it hurts performance when running custom code)."
Moving these things to the shader core requires gobs of operand bandwidth. For custom filtering, you can already take point samples and do whatever you want from there.
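To illustrate the point about custom filtering, here is a rough sketch of building a bilinear filter out of nothing but point samples. It is plain Python standing in for shader code, and everything in it is an assumption made for illustration: the names `point_sample` and `bilinear`, the clamp-to-edge addressing, and the convention that texel centers sit at integer + 0.5.

```python
import math

def point_sample(tex, x, y):
    """Fetch one texel with clamp-to-edge addressing (hypothetical helper)."""
    h, w = len(tex), len(tex[0])
    x = min(max(x, 0), w - 1)
    y = min(max(y, 0), h - 1)
    return tex[y][x]

def bilinear(tex, u, v):
    """Custom bilinear filter built from four point samples.

    u, v are in texel units; texel centers are assumed at integer + 0.5.
    """
    fx, fy = u - 0.5, v - 0.5
    x0, y0 = math.floor(fx), math.floor(fy)
    ax, ay = fx - x0, fy - y0  # fractional weights in [0, 1)

    # Four point samples: each one is a separate fetch plus operands to move.
    t00 = point_sample(tex, x0, y0)
    t10 = point_sample(tex, x0 + 1, y0)
    t01 = point_sample(tex, x0, y0 + 1)
    t11 = point_sample(tex, x0 + 1, y0 + 1)

    # Three linear interpolations: the math itself is cheap.
    top = t00 * (1 - ax) + t10 * ax
    bottom = t01 * (1 - ax) + t11 * ax
    return top * (1 - ay) + bottom * ay

tex = [[0.0, 1.0],
       [2.0, 3.0]]
center = bilinear(tex, 1.0, 1.0)  # midway between all four texels
```

The arithmetic is only three lerps, while the four fetches and their operands are where the data movement goes. Any other weighting could be swapped in for the lerps to get a different custom filter.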
One thing you should realize is that pure math logic isn't very expensive at all. It's the routing and temporary storage of data that uses most of the transistors. Filtering alone needs only a fraction of the logic of a shader core, and triangle setup needs to be done in front of the triangle rasterization. I'm not too sure why triangle setup hasn't been improved beyond once per clock, but I think there may be difficulties in parallelizing while preserving order of the triangles and their quads throughout the pipeline. I don't see anything that can't be overcome, though.
Also, on the last page, don't you mean 1/10th of a terazixel instead of a petazixel? I have a tough time believing a 1500-fold increase over G80.
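A quick sanity check of that arithmetic, using only the figures in this post (the 1500-fold factor and the two candidate prefixes). The G80 baseline below is not a measured spec; it is simply what the 1500-fold figure implies, and the variable names are made up for this sketch.

```python
PETA = 1e15
TERA = 1e12

claimed = 0.1 * PETA             # 1/10th of a petazixel, as written in the article
implied_g80 = claimed / 1500     # baseline implied by a 1500-fold increase
alternative = 0.1 * TERA         # 1/10th of a terazixel, the suggested correction

# How big an increase over the implied baseline the corrected figure would be.
ratio_if_tera = alternative / implied_g80

print(f"implied G80 baseline: {implied_g80:.3g} zixels")
print(f"increase if terazixel was meant: {ratio_if_tera:.2f}x")
```

Under these assumptions the implied baseline is roughly 67 gigazixels, so 1/10th of a terazixel would be about a 1.5x increase rather than 1500x.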