-Harder to make fast TBDR hardware.
I don't know that I would say that. The thing is, there isn't even an isolated example of a TBDR design (and I don't consider Xenos a counterexample) made by a major manufacturer with the resources and the ability to target high-end hardware. It's actually not that difficult to scale the concept up to high speeds, but given that the only manufacturers who have attempted it are the ones who never had a shred of hope of making a splash in the market... it only follows that the concept would appear to fail.
Moreover, it doesn't help that neither ATI nor nVidia were even willing to acknowledge that it had a place in the market. ATI simply proclaimed "We're not interested so there's no point in talking about it," while nVidia proclaimed "You're an idiot for asking in the first place. Begone! The power of Christ compels you! The power of Christ compels you!" Business as usual.
-TBDR hardware doesn't give much of a benefit to pixel shaders.
-Vertex shaders are so fast now that the vertex load of a game isn't really a limiting function anymore.
I wouldn't agree with all of that. I agree with the first point, but I would put pixel shaders in your second point rather than vertex shaders. Vertex throughput is still a pain in the neck, and it's not just shader performance that limits it. We simply stay well under those limits, because you'd be in really hot water otherwise.
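Some back-of-the-envelope arithmetic, purely to illustrate the point; every number below is invented and not taken from any real GPU. The idea is just that the per-frame triangle budget is set by whichever stage is slowest, not by shader rate alone, and actual content sits well below even that.

```cpp
// Invented numbers, purely illustrative: vertex throughput is bounded by more
// than shader ALU rate, and the per-frame budget comes from the lowest ceiling.
#include <algorithm>
#include <cstdio>

int main() {
    const double vertex_shader_rate = 1.5e9;  // hypothetical vertices/sec from the ALUs
    const double setup_rate         = 0.6e9;  // hypothetical triangles/sec through setup
    const double verts_per_tri      = 1.2;    // assumed post-cache vertex reuse
    const double fps                = 60.0;

    // Triangles/sec the shaders could feed vs what setup can actually retire.
    const double shader_bound = vertex_shader_rate / verts_per_tri;
    const double tri_rate     = std::min(shader_bound, setup_rate);

    printf("theoretical budget: ~%.1fM triangles per frame at %.0f fps\n",
           tri_rate / fps / 1e6, fps);
    printf("(real content sits far below this to keep headroom)\n");
    return 0;
}
```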
-Current hardware does tile; not in the same way, but it still helps with memory bandwidth usage.
Mmmmm... I'd hardly say that the amount of bandwidth it saves is even worth mentioning. And that saving is the biggest advantage of working out of small local tile caches rather than simply having your ROPs push out a quad at a time for every sample that hits that quad.
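Here's the sort of toy accounting I mean; the tile size, the overdraw figure, and the assumption that every shaded sample eventually costs an external write are all made up for illustration, not a description of how any real ROP or memory controller behaves.

```cpp
// Toy model of external framebuffer traffic for one tile's worth of pixels:
// (a) an immediate-mode-ish ROP that ends up writing every shaded sample out,
// vs (b) a tile cache that blends on chip and writes the tile back once.
// Tile size, bytes per pixel, and overdraw are all invented for illustration.
#include <cstdio>

int main() {
    const long   tile_w = 32, tile_h = 32;  // assumed on-chip tile size
    const long   bytes_per_pixel = 4;       // RGBA8 color only, ignoring Z
    const double overdraw = 3.0;            // average shaded samples per pixel

    const long pixels = tile_w * tile_h;

    // (a) Every shaded sample eventually costs an external write
    //     (ignoring whatever coalescing the memory controller manages).
    const double immediate_bytes = pixels * overdraw * bytes_per_pixel;

    // (b) All blending happens in the local tile cache; one writeback per tile.
    const double tiled_bytes = double(pixels) * bytes_per_pixel;

    printf("quad-at-a-time traffic per tile: %.0f bytes\n", immediate_bytes);
    printf("tile-cache traffic per tile:     %.0f bytes\n", tiled_bytes);
    printf("ratio: %.1fx\n", immediate_bytes / tiled_bytes);
    return 0;
}
```

Under these invented numbers the win is basically just the overdraw factor, and it only materializes if the blending genuinely stays on chip; move the assumptions around and the ratio moves with them.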
-The primary advantage of making TBDR hardware today would be lowering fillrate requirements, but you'd make a vastly weaker chip to do so.
I fail to see how implementing TBDR inherently guarantees that the chip MUST be weaker. ROPs that write to small local tile caches and the logic to write tiles back on eviction aren't that big a deal (there's a rough sketch of the bookkeeping below), and they're not going to destroy anything else in the chip. Granted, the more fillrate you've got, the more tile caches you might need to avoid pointless evictions.
Frankly, I think you have it backwards. It's not that TBDR made all TBDR GPUs pitiful. They were pitiful to begin with and that would have been more obvious without it.
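To put a rough shape on the "not that big a deal" bookkeeping from a couple of paragraphs up, here's a minimal software sketch, assuming a hypothetical 16x16 tile, a plain vector standing in for external framebuffer memory, and a cap on how many tiles fit "on chip". It skips the read-back a partially covered tile would need before blending, so treat it as an outline, not a design.

```cpp
// Minimal sketch of a small local tile cache with writeback on eviction.
// Everything here (tile size, cache shape, the framebuffer type) is hypothetical.
#include <cstdint>
#include <unordered_map>
#include <vector>

constexpr int TILE = 16;  // assumed 16x16 pixel tiles, RGBA8

struct Tile {
    uint32_t px[TILE * TILE] = {};  // on-chip copy of the tile's pixels
    bool dirty = false;
};

class TileCache {
public:
    TileCache(std::vector<uint32_t>& fb, int fb_width, size_t max_tiles)
        : fb_(fb), width_(fb_width), max_tiles_(max_tiles) {}

    // ROP-side write: lands in the on-chip tile, not in external memory.
    void write_pixel(int x, int y, uint32_t color) {
        Tile& t = fetch_tile(x / TILE, y / TILE);
        t.px[(y % TILE) * TILE + (x % TILE)] = color;
        t.dirty = true;
    }

    // Flush everything at end of frame / end of pass.
    void flush() {
        for (auto& [key, t] : tiles_) writeback(key, t);
        tiles_.clear();
    }

private:
    Tile& fetch_tile(int tx, int ty) {
        uint64_t key = (uint64_t(uint32_t(ty)) << 32) | uint32_t(tx);
        auto it = tiles_.find(key);
        if (it != tiles_.end()) return it->second;
        // Evict an arbitrary victim when the "on-chip" budget is exhausted;
        // a real design would pick victims far more carefully.
        if (tiles_.size() >= max_tiles_) {
            auto victim = tiles_.begin();
            writeback(victim->first, victim->second);
            tiles_.erase(victim);
        }
        // (Skipping the fill-from-memory a read-modify-write tile would need.)
        return tiles_[key];
    }

    // The only place external memory gets touched: one burst per dirty tile.
    void writeback(uint64_t key, Tile& t) {
        if (!t.dirty) return;
        int tx = int(uint32_t(key)), ty = int(uint32_t(key >> 32));
        for (int y = 0; y < TILE; ++y)
            for (int x = 0; x < TILE; ++x)
                fb_[(ty * TILE + y) * width_ + (tx * TILE + x)] =
                    t.px[y * TILE + x];
    }

    std::vector<uint32_t>& fb_;  // stand-in for external framebuffer memory
    int width_;
    size_t max_tiles_;
    std::unordered_map<uint64_t, Tile> tiles_;
};
```

Usage is just: build it over a framebuffer, write pixels through it, and call flush() at the end of the frame; the only external traffic is the per-tile writeback.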
-Also, nVidia and ATI don't have much experience in making a TBDR, and the parts of the graphics pipeline it helps are already very functional, accurate, and fast, so why mess with them? The focus is more on the shader processing power of the chips, which a TBDR can do less to help with, and it takes up valuable die space. Why try to do differently something that's already done well, when any small misstep can cost you large portions of your market?
Because consistent growth trends (particularly exponential ones) can never be sustained indefinitely. Memory will never keep up, power consumption will just keep skyrocketing, and the obvious answer of throwing more silicon at the problem is a guaranteed recipe for failure at some point down the line. At that point, something has to give.