#1
Senior Member
I myself am in the TBDR camp, for now. I am approaching this issue from the point of view of practically unlimited memory capacity (since in a unified system you have system memory to lean on), with memory bandwidth being the primary constraint on performance.
The usual argument against TBDR is that geometry binning is its Achilles' heel and that tessellation would just kill it. Here's a patent describing how it might be handled. As I understand it, it proposes running the hull shader, the tessellator and the part of the domain shader that calculates the final position in the first phase. The patch attributes and tessFactors are dumped to memory. Since you now know the positions, the overlapping tiles are computed, and only compressed indices representing the triangles are written into those tile lists. The patch attributes should not be much larger than the attribute data that was read by the vertex/hull shader in the first place, and the indices should be quite small. All in all, the extra memory bandwidth used should be quite small.

In the second phase, the per-tile indices and the patch attributes are read, the position part of the domain shader is re-run, HSR is performed, the rest of the domain shader runs, and from then on it's business as usual.

The way I see it, it all comes down to which operation is more bandwidth-efficient or has better locality. For an IMR, this would be the hardware-managed ROP cache; for a TBDR, it would be the object list. Without tessellation, I would argue that the two are probably close, but intuitively there appears to be more locality in object space. With tessellation, especially with very large tessellation factors, an IMR will have to juggle lots of fragment traffic, while this implementation of TBDR will have to deal with patch attributes (which would be small compared to fragment traffic, as this data doesn't scale with tessFactor) and compressed indices, which should be very small. The position computation has to be done twice, but the evaluation itself should be very cheap, so the real cost would be in displacement map lookups. One could argue that these will have very good locality and that, with a good texture cache, this cost wouldn't scale with tessFactor.
Reference threads (good ones, IMO):
http://forum.beyond3d.com/showthread.php?t=37290
http://forum.beyond3d.com/showthread.php?t=11554
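A minimal Python sketch of the two-phase scheme as summarised in post #1; it is not the patent's implementation. Everything concrete here is a hypothetical assumption: a quad patch whose position-only domain shader is a bilinear interpolation of four screen-space control points, a uniform integer tessFactor, 16x16-pixel tiles, a conservative bounding-box tile test, 64 bytes of attributes per patch and 4 bytes per packed (patch, triangle) tile-list entry. The helper names (`domain_position`, `phase1_bin`, `phase2_tile`, `binning_bytes`) are illustrative only.

```python
import math
from collections import defaultdict

TILE = 16         # hypothetical tile size in pixels
ATTR_BYTES = 64   # hypothetical patch-attribute bytes written per patch
INDEX_BYTES = 4   # hypothetical size of one packed (patch, triangle) entry


def domain_position(patch, u, v):
    """Position-only part of a hypothetical domain shader: bilinear quad patch."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = patch
    w = ((1 - u) * (1 - v), u * (1 - v), u * v, (1 - u) * v)
    return (w[0] * x0 + w[1] * x1 + w[2] * x2 + w[3] * x3,
            w[0] * y0 + w[1] * y1 + w[2] * y2 + w[3] * y3)


def tessellate(n):
    """Uniform n x n domain grid; each triangle is three (i, j) grid points."""
    tris = []
    for j in range(n):
        for i in range(n):
            a, b, c, d = (i, j), (i + 1, j), (i + 1, j + 1), (i, j + 1)
            tris += [(a, b, c), (a, c, d)]
    return tris


def triangle_positions(patch, tri, n):
    """Run the position-only domain shader for one triangle's three vertices."""
    return [domain_position(patch, i / n, j / n) for i, j in tri]


def phase1_bin(patches, n):
    """Phase 1: tessellate, compute positions, append compact indices to tile lists."""
    tile_lists = defaultdict(list)
    for pid, patch in enumerate(patches):
        for tid, tri in enumerate(tessellate(n)):
            pts = triangle_positions(patch, tri, n)
            xs = [p[0] for p in pts]
            ys = [p[1] for p in pts]
            # Conservative bounding-box test: edge-touching tiles are included too.
            for ty in range(math.floor(min(ys) / TILE), math.floor(max(ys) / TILE) + 1):
                for tx in range(math.floor(min(xs) / TILE), math.floor(max(xs) / TILE) + 1):
                    tile_lists[(tx, ty)].append((pid, tid))
    return tile_lists


def phase2_tile(tile_lists, tile, patches, n):
    """Phase 2: read one tile's indices and re-run the position part before HSR."""
    tris = tessellate(n)
    return [triangle_positions(patches[pid], tris[tid], n)
            for pid, tid in tile_lists.get(tile, [])]


def binning_bytes(patches, tile_lists):
    """Bytes written in phase 1 as (patch attributes, tile-list indices)."""
    entries = sum(len(lst) for lst in tile_lists.values())
    return len(patches) * ATTR_BYTES, entries * INDEX_BYTES


if __name__ == "__main__":
    patches = [[(0, 0), (32, 0), (32, 32), (0, 32)]]  # one 32x32-pixel patch
    for n in (1, 8):
        print(n, binning_bytes(patches, phase1_bin(patches, n)))
    # 1 (64, 72)
    # 8 (64, 800)
```

Under these assumptions the patch-attribute bytes stay at 64 regardless of tessFactor, and only the small index entries grow with it, which is the shape of the bandwidth argument in post #1. The bounding-box test over-counts triangles that merely touch a tile edge; a real binner would use a tighter test.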

#2
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783

#3
Regular
Hopefully flexible enough not to tie up multipliers for MSAA Z-comparisons too.
__________________
Cinematic is the new streamlined.

#4
Epsilon plus three
Join Date: Feb 2002
Location: Chania
Posts: 7,818
***delete
__________________
People are more violently opposed to fur than leather; because it's easier to harass rich ladies than motorcycle gangs.

#5
Epsilon plus three
Join Date: Feb 2002
Location: Chania
Posts: 7,818
Quote:
As for the less foreseeable future, beyond roughly half a decade, I doubt that IMG intends to take a more software-oriented route, nor do I expect that Intel won't use the aforementioned IMG GPU IP in the meantime.

#6
Senior Member
Quote:
Just because future hardware will be flexible enough to do both techniques doesn't mean that it will do both equally efficiently. So yes, picking sides matters.

#7
Member
Join Date: Mar 2002
Location: UK
Posts: 570
Quote:
With such a limited amount of memory, you would quickly need to spill to external memory, creating performance cliff edges or serious limitations on what you can do.
Quote:
John.

#8
Senior Member
Quote:
Quote:
Quote:

#9
Member
Join Date: Mar 2002
Location: UK
Posts: 570
Quote:
Configuring the RAM as a cache won't help unless you have enough memory to encompass the full extent and all layers of the pixels that use the UAV. Obviously there are post-processing hacks for AA which aren't too bad and which you could argue make a reasonable replacement for brute-force multisampling (I would argue that they're not good enough). For MRTs I'm not seeing any practical replacement, so you still have to accommodate their footprint somewhere. There are also environment maps and shadow maps to consider, the latter of which need even more memory.
Quote:
Quote:
John.
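To put rough numbers on the footprint argument above, here is a small Python calculator for uncompressed render-target sizes. The resolution, sample count and layout used in the example (four 32-bit colour MRTs plus 32-bit depth/stencil at 1920x1080 with 4x MSAA, and a 2048x2048 32-bit shadow map) are hypothetical, not figures from the thread, and compression is ignored.

```python
MIB = 1024 * 1024


def target_bytes(width, height, samples, bytes_per_sample):
    """Uncompressed size of one render target in bytes."""
    return width * height * samples * bytes_per_sample


def gbuffer_bytes(width, height, samples, target_formats):
    """Total size of all MRT / depth targets, each given as bytes per sample."""
    return sum(target_bytes(width, height, samples, b) for b in target_formats)


if __name__ == "__main__":
    # Hypothetical deferred-shading layout: 4 x RGBA8 colour targets + D24S8 depth.
    gbuf = gbuffer_bytes(1920, 1080, 4, [4, 4, 4, 4, 4])
    shadow = target_bytes(2048, 2048, 1, 4)  # hypothetical 32-bit shadow map
    print(f"G-buffer: {gbuf / MIB:.1f} MiB, shadow map: {shadow / MIB:.1f} MiB")
    # G-buffer: 158.2 MiB, shadow map: 16.0 MiB
```

Even before environment and shadow maps, such a layout is roughly 40 times larger than a hypothetical 4 MiB on-chip budget, so the bulk of it has to live, or spill, off-chip.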

#10
Senior Member
How about constructing the TAG buffer for the entire frame on chip in one go? That should work.

#11
Member
Join Date: Mar 2002
Location: UK
Posts: 570
Quote:
John.

#12
Senior Member
But then, it isn't TBDR anymore, is it?

#13
Member
Join Date: Mar 2002
Location: UK
Posts: 570
Eh? That's a moot point: even though doing a full-frame tag buffer would mean you're not a tiler, it still doesn't buy you anything relative to an IMR, i.e. it still needs large amounts of on-chip memory to efficiently support things like G-buffers.

#14
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,882
I've got to agree with JohnH here. However, I do believe that there is one very good use case for a large block of SRAM on an IMR: keeping the current Z-buffer completely on-chip! The coolest part is that neither ridiculous resolutions nor MSAA are a fundamental obstacle, because you want to support Z compression anyway (think shadow maps).
So you could have a very simple scheme where you have 4MB of SRAM on chip (enough for 1280x720 without MSAA, even without compression!) and reserve the full framebuffer size in external memory anyway. If the compression ratio for a tile is good enough, depth-related DRAM bandwidth for that tile is zero. If it isn't, you write part of the tile to your on-chip SRAM and the remaining part to DRAM. So for a moderately complex tile you might still save 50% of the bandwidth, and even for very complex tiles you might save 10% (for example) on both reads and writes. If the depth buffer is required afterwards (e.g. shadow maps), you write the data from the on-chip SRAM to the already reserved DRAM locations, nothing more and nothing less.

If you had a 2D GUI without a Z-buffer, you could reuse that SRAM as a gigantic cache (blending, textures, etc.), but I'm honestly not sure how beneficial that would be compared to the Z-buffer case (it could be nice for GPGPU though). You wouldn't get most of the benefits of a TBDR, but you wouldn't get the binning overhead either.

This brings us back to the original topic of this thread, which is ways to minimise the binning overhead. Tessellation is a very interesting and important corner case where specific optimisations can help a lot, but there certainly are things you can do to improve the general case as well. This kind of discussion is obviously (and sadly) very sensitive for legal reasons; I don't think it's a coincidence that John isn't replying to the topic's original subject here, and I certainly can't blame him for it!
__________________
Focusing on non-graphics projects in 2013 (but I still love triangles) "[...]; the kind of variation which ensues depending in most cases in a far higher degree on the nature or constitution of the being, than on the nature of the changed conditions."
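A minimal Python model of the partial-spill on-chip Z scheme from post #14, under hypothetical assumptions: each screen tile gets a fixed, equal slot of the on-chip SRAM, a tile's compressed depth data fills its slot first, and any remainder goes to the DRAM space that is reserved anyway. The 512-byte slot and the compressed sizes are illustrative (for instance, a hypothetical 16x16-pixel tile with 8x MSAA and 32-bit depth is 8192 bytes uncompressed); the function names are made up for this sketch.

```python
MIB = 1024 * 1024


def fits_on_chip(width, height, samples, bytes_per_sample, sram_bytes):
    """Does an uncompressed depth buffer fit entirely in the on-chip SRAM?"""
    return width * height * samples * bytes_per_sample <= sram_bytes


def tile_dram_bytes(compressed_bytes, sram_slot_bytes):
    """Depth bytes of one tile that overflow its SRAM slot and go to DRAM."""
    return max(0, compressed_bytes - sram_slot_bytes)


def tile_saving(compressed_bytes, sram_slot_bytes):
    """Fraction of the tile's depth traffic kept on-chip (same for reads and writes)."""
    if compressed_bytes == 0:
        return 1.0
    return 1.0 - tile_dram_bytes(compressed_bytes, sram_slot_bytes) / compressed_bytes


if __name__ == "__main__":
    # 1280x720, no MSAA, 32-bit depth = 3,686,400 bytes, which is under 4 MiB.
    print(fits_on_chip(1280, 720, 1, 4, 4 * MIB))  # True
    slot = 512  # hypothetical SRAM bytes reserved per tile
    for compressed in (256, 1024, 5120):  # well / moderately / poorly compressed
        print(f"{compressed} B compressed -> {tile_dram_bytes(compressed, slot)} B to DRAM, "
              f"{tile_saving(compressed, slot):.0%} saved")
    # 256 B compressed -> 0 B to DRAM, 100% saved
    # 1024 B compressed -> 512 B to DRAM, 50% saved
    # 5120 B compressed -> 4608 B to DRAM, 10% saved
```

The model leaves out the final copy of the SRAM contents into the reserved DRAM locations, which the post notes is only needed when the depth buffer is used afterwards.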

#15
Epsilon plus three
Join Date: Feb 2002
Location: Chania
Posts: 7,818
Honestly, I expected a worthier analysis of the patent itself from you, Arun. Not necessarily an Uttargram from hell (God help us!), but you know what I mean.

#16
Member
Join Date: Mar 2002
Location: UK
Posts: 570
Quote:
I'm not saying that these sorts of problems are insurmountable, just pointing out practicalities.

Tags: early z, imr, memory bandwidth, tbdr, tessellation