I am in the TBDR camp myself, for now. I am approaching this issue from the POV of practically unlimited memory capacity (since a unified system lets you lean on system memory), with memory bandwidth as the primary constraint on performance.
The usual argument against TBDR is that geometry binning is its Achilles' heel and that tessellation would just kill it.
Here's a patent describing how it might be handled.
As I understand it, it proposes running the hull shader, the tessellator, and the part of the domain shader that calculates the final position in the first phase. The patch attributes and tessFactor are dumped to memory. Since the positions are now known, the overlapping tiles are computed, and only the compressed indices representing the triangles are written to those tile lists. The patch attributes should not be much larger than the attribute data that was read by the vertex/hull shader in the first place, and the indices should be quite small. All in all, the extra memory bandwidth used should be quite small.
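To make the first phase concrete, here is a minimal Python sketch of how I picture the binning step. It is not taken from the patent: the names (`TILE`, `domain_position`, `tessellate`, `bin_patches`), the 32-pixel tile size, the bilinear quad patch, and the uniform n x n tessellation are all my own assumptions for illustration. The point is only that the tile lists hold small `(patch_id, tri_index)` pairs rather than expanded vertices.

```python
import math
from collections import defaultdict

TILE = 32  # tile edge in pixels (assumed value)

def domain_position(corners, u, v):
    """Position-only part of a toy domain shader: bilinear quad patch."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = corners
    w0, w1, w2, w3 = (1 - u) * (1 - v), u * (1 - v), u * v, (1 - u) * v
    return (w0 * x0 + w1 * x1 + w2 * x2 + w3 * x3,
            w0 * y0 + w1 * y1 + w2 * y2 + w3 * y3)

def tessellate(tess_factor):
    """Toy tessellator: uniform n x n grid, two triangles per cell,
    each triangle given as three (u, v) domain coordinates."""
    n = tess_factor
    tris = []
    for j in range(n):
        for i in range(n):
            u0, u1 = i / n, (i + 1) / n
            v0, v1 = j / n, (j + 1) / n
            tris.append(((u0, v0), (u1, v0), (u1, v1)))
            tris.append(((u0, v0), (u1, v1), (u0, v1)))
    return tris

def bin_patches(patches, width, height):
    """patches: list of (corners, tess_factor).
    Returns {(tile_x, tile_y): [(patch_id, tri_index), ...]}.
    Binning is conservative: it uses each triangle's bounding box."""
    tile_lists = defaultdict(list)
    max_tx = (width - 1) // TILE
    max_ty = (height - 1) // TILE
    for patch_id, (corners, tess_factor) in enumerate(patches):
        for tri_index, tri in enumerate(tessellate(tess_factor)):
            pts = [domain_position(corners, u, v) for u, v in tri]
            xs = [p[0] for p in pts]
            ys = [p[1] for p in pts]
            tx0 = max(0, math.floor(min(xs) / TILE))
            tx1 = min(max_tx, math.floor(max(xs) / TILE))
            ty0 = max(0, math.floor(min(ys) / TILE))
            ty1 = min(max_ty, math.floor(max(ys) / TILE))
            for ty in range(ty0, ty1 + 1):
                for tx in range(tx0, tx1 + 1):
                    # Only the small index pair goes into the tile list.
                    tile_lists[(tx, ty)].append((patch_id, tri_index))
    return dict(tile_lists)
```

The patch attributes and tessFactor would be stored once per patch alongside these lists; the sketch leaves that out because it is just a copy to memory.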
In the second phase, the per-tile indices and the patch attributes are read, the position part of the domain shader is re-run, HSR is performed, the rest of the domain shader runs, and from then on it's business as usual.
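A rough Python sketch of that second phase for a single tile, again under my own assumptions: `shade_tile`, `depth_at`, `position_fn`, and `attribute_fn` are hypothetical names, HSR is modelled as a plain per-pixel nearest-depth test, and smaller z is taken to mean closer. What it tries to show is that the expensive rest of the domain shader runs only for triangles that survive HSR.

```python
import math

def depth_at(verts, p):
    """Return interpolated depth of the triangle at pixel centre p,
    or None if p lies outside the triangle (barycentric test)."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = verts
    px, py = p
    area = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if area == 0:
        return None  # degenerate triangle covers nothing
    w0 = ((x1 - px) * (y2 - py) - (x2 - px) * (y1 - py)) / area
    w1 = ((x2 - px) * (y0 - py) - (x0 - px) * (y2 - py)) / area
    w2 = 1.0 - w0 - w1
    if w0 < 0 or w1 < 0 or w2 < 0:
        return None
    return w0 * z0 + w1 * z1 + w2 * z2

def shade_tile(tile_list, position_fn, attribute_fn, tile_pixels):
    """tile_list: [(patch_id, tri_index), ...] read back from memory.
    position_fn re-runs the position part of the domain shader and
    returns three (x, y, z) vertices; attribute_fn stands in for the
    rest of the domain shader."""
    pixels = list(tile_pixels)
    depth = {}  # pixel -> nearest depth seen so far
    owner = {}  # pixel -> (patch_id, tri_index) of the visible triangle
    for entry in tile_list:
        verts = position_fn(*entry)
        for p in pixels:
            z = depth_at(verts, p)
            if z is not None and z < depth.get(p, math.inf):
                depth[p] = z
                owner[p] = entry
    # HSR is done: evaluate remaining attributes only for visible triangles.
    visible = sorted(set(owner.values()))
    attrs = {entry: attribute_fn(*entry) for entry in visible}
    return owner, attrs
```

A fully occluded triangle costs one position evaluation and the depth tests, but never reaches `attribute_fn`.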
The way I see it, it all comes down to which structure is more bandwidth-efficient or has better locality. For an IMR, this would be the hardware-managed ROP cache. For a TBDR, this would be the object list. Without tessellation, I would argue that the two are probably close, but intuitively it appears that there is more locality in object space. With tessellation, especially with very large tessellation factors, an IMR will have to juggle lots of fragment traffic, while this implementation of TBDR will have to deal with patch attributes (which would be small in comparison to fragment traffic, as this data doesn't scale with tessFactor) and compressed indices, which should be very tiny.
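For a feel of the TBDR side of that comparison, here is a back-of-envelope Python model of the per-patch binning traffic. Every number in it is an assumption of mine, not a measurement: 256 bytes of patch attributes, uncompressed 16-bit indices, 2 * n * n triangles for a tessFactor of n, and an average number of tiles touched per triangle. It only models the TBDR parameter traffic and says nothing about the IMR side.

```python
def tbdr_patch_bytes(tess_factor, attr_bytes=256, index_bits=16,
                     tiles_per_tri=1.2):
    """Toy estimate of bytes written per patch during binning.
    Returns (attribute_bytes, index_bytes). All defaults are assumed."""
    tris = 2 * tess_factor * tess_factor  # toy uniform quad tessellation
    index_bytes = tris * tiles_per_tri * index_bits / 8
    # Attribute bytes are written once per patch, whatever the tessFactor.
    return attr_bytes, index_bytes

for factor in (1, 4, 16, 64):
    attrs, indices = tbdr_patch_bytes(factor)
    print(f"tessFactor {factor:2d}: attributes {attrs} B, indices {indices:.0f} B")
```

In this model the attribute term is flat and only the index term grows with tessFactor, so how tiny the tile lists stay depends on how well the indices compress.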
The position computation has to be done twice, but the evaluation itself would be very cheap, so the real cost would be in the displacement map lookups. One could argue that these have very good locality and that, with a good texture cache, their memory traffic wouldn't scale with tessFactor.
Reference Threads (Good ones, IMO)
http://forum.beyond3d.com/showthread.php?t=37290
http://forum.beyond3d.com/showthread.php?t=11554