Now that I have a little bit of time on my hands....
-Harder to make fast tbdr hardware.
Pardon? I presume you are referring to clock rate? It is harder to make hardware with a higher clock rate than a lower one, but a higher clock does not necessarily give you good returns once you factor in the silicon cost. It is probably true that a TBDR is a bit more complex than a standard renderer, but standard renderers are also jumping through hoops to lower overdraw.
-TBDR hardware doesn't give much of a benefit to pixel shaders.
How did you arrive at that conclusion? I would say you have that completely back to front.
-Vertex shaders are so fast now that the vertex load of a game isn't really a limiting function anymore.
I don't see how that is relevant. Besides, it's unlikely that the vertex load has ever really been the major bottleneck in rendering. It's nearly always been fillrate.
Current hardware does tile, not in the same way but it still helps with memory bandwidth usage.
It does lower page breaks etc, yes.
The primary advantage of making TBDR hardware today would be lowering fillrate requirements, but you'd make a vastly weaker chip to do so.
Again, how do you come to these conclusions?
Perhaps.
I remember the add-in boards they made using those chips. I was tempted to get one myself because I'd seen Virtua Fighter, or maybe it was Soul Calibur or some such fighting game, on DC and thought it was pretty much the coolest, smoothest thing I ever saw.
However I got discouraged by reports from all over that those boards were more trouble than not in many games.
Is it so wrong then to assume where there's smoke there's also fire?
There were a few items that caused problems from time to time.
- Pre-Kyro, the chips did not provide a way to save out the Z-buffer, and some games absolutely insisted on reading back the odd pixel from the Z-buffer. Such readbacks were nearly always detrimental to performance on any architecture.
- Some games insisted that there had to be hardware T&L, which was just ridiculous - it turned out that the x86 CPU was nearly always more than fast enough to do those calculations and run the game logic as well. The workaround was simply to lie to the application and push the vertices through the CPU.
- Some games insisted on a particular (optional) texture format which might not have been supported.
There may have been some others but I can't recall what they were. They usually stemmed from not checking the DX caps flags and coding accordingly. <sigh>
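To illustrate the caps-checking point, here is a minimal sketch of the pattern a game could follow: look at what the device reports and fall back, rather than refusing to run. Everything named here (`DeviceCaps`, `choose_vertex_path`, `choose_texture_format`, and the format strings) is a hypothetical stand-in for illustration, not the actual DirectX API.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceCaps:
    """Hypothetical stand-in for the capability flags a driver reports."""
    hardware_tnl: bool = False
    texture_formats: set = field(default_factory=set)


def choose_vertex_path(caps):
    # Use hardware T&L only if the device reports it;
    # otherwise transform the vertices on the CPU.
    return "hardware_tnl" if caps.hardware_tnl else "cpu_transform"


def choose_texture_format(caps, preferred, fallbacks):
    # Try the preferred (optional) format first, then each fallback in order.
    for fmt in [preferred, *fallbacks]:
        if fmt in caps.texture_formats:
            return fmt
    raise RuntimeError("device supports none of the requested texture formats")
```

With a device that reports no hardware T&L and only two 16-bit formats, `choose_vertex_path` picks the CPU path and `choose_texture_format` settles on the first fallback the device actually supports.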
One giant issue on consoles is the memory use of TBDR.
nAo (or DeanoC) mentioned he's pushing 2M polygons per frame in some parts of HS which would have to be binned for deferred rendering. Post-transform vertex size can easily be over 100 bytes, and I don't think Ninja Theory would be happy if they had 100-200MB less RAM to work with.
I have two issues with this:
* How do you come up with a figure of 100 bytes per vertex? If we assume that maps entirely to IEEE floats, that's ~25 values. If we assume 4 are for the position data, we end up with ~20 for colour and texture data. That seems to imply quite a number of texture layers. If this is the case, then the cost of vertex data is going to be insignificant compared to the time spent shading!
* The second problem I have with this is that you are assuming that you do have to keep all the data before rendering.
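As a rough check on the numbers being thrown around, the arithmetic can be sketched as below. The vertex counts of 1M and 2M are an assumption chosen to reproduce the quoted 100-200MB range at 100 bytes per vertex; the real ratio of stored vertices to polygons depends on the mesh, and the whole estimate presumes every vertex is kept until rendering.

```python
BYTES_PER_FLOAT = 4  # IEEE single precision


def floats_per_vertex(vertex_bytes):
    # How many 32-bit float values fit in one post-transform vertex.
    return vertex_bytes // BYTES_PER_FLOAT


def binned_megabytes(vertex_count, vertex_bytes):
    # Storage needed if every post-transform vertex is kept, in MB (10^6 bytes).
    return vertex_count * vertex_bytes / 1e6


total_values = floats_per_vertex(100)       # 25 values per vertex
non_position_values = total_values - 4      # 21 left for colour and texture data

low_estimate = binned_megabytes(1_000_000, 100)   # 100.0 MB
high_estimate = binned_megabytes(2_000_000, 100)  # 200.0 MB
```

Halving the vertex size halves the storage, so the memory figure is only as good as the 100-byte assumption behind it.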
There are some ways to reduce this like trying to separate position and iterator parts of the vertex shader, or doing two passes on the geometry and storing a bitmask the first time, but it gets messy and either reduces vertex throughput or requires much more vertex-related silicon.
I'm sorry, but I really have no idea what you are trying to say here.