Xmas said:One big advantage of an alternate tile rendering approach is the inherent load balancing, so the software doesn't have to do that.
Doesn't AFR accomplish that as well tho? And ATI already has experience with that.
sireric said:The R3xx and the R4xx have a rather interesting way of tiling things. Our setup unit sorts primitives into tiles, based on their area coverage. Some primitives fall into 1 tile, some into a few, some cover lots. Each of our backend pixel pipes is given tiles of work. The tiles themselves are programmable in size (well, powers of 2), but, surprisingly, we haven't found that changing their size changes performance that much (within reason). Most likely due to the fact that with high res displays, most primitives are large. There is a sweet spot in the performance at 16, and that hasn't changed in some time. Even the current X800 uses 16, though I think we need to revisit that at some point in the future. Possibly on a per application basis, different tile sizes would benefit things. On our long list of things to investigate.
Anyway, each pipe has huge load balancing fifos on their inputs, that match up to the tiles that they own. Each pipe is a full MIMD and can operate on different polygons, and, in fact, can be hundreds of polygons off from others. The downside of that is memory coherence of the different pipes. Increasing tile size would improve this, but also requires larger load balancing. Our current setup seems reasonably optimal, but reviewing that, performance wise, is on the list of things to do at some point. We've artificially lowered the size of our load balancing fifos, and never notice a performance difference, so we feel, for current apps, at least, that we are well over-designed.
In general, we have no issues keeping all our units busy, given the current polygons. I could imagine that if you did single pixel triangles in one tile over and over, that performance could drop due to tiling, but memory efficiency would shoot up, so it's unclear that performance overall would be hurt. The distribution of load across all these tiles is pretty much ideal, for all the cases we've tested. Super tiling is built on top of this, to distribute work across multiple chips.
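To make the tiling description above concrete, here is a minimal sketch of sorting primitives into tiles and queueing them per pipe. It is not ATI's implementation: the four-pipe count, the 2x2 interleaved ownership pattern in `owner_pipe`, and the use of bounding boxes instead of true area coverage are all assumptions made for illustration. Only the tile size of 16 comes from the post.

```python
from collections import deque

TILE = 16       # tile edge in pixels; 16 is the sweet spot cited in the post
NUM_PIPES = 4   # assumption: the post does not give a pipe count


def tiles_covered(x0, y0, x1, y1, tile=TILE):
    """Tiles (tx, ty) overlapped by an inclusive pixel bounding box.

    Simplification: real hardware would test actual primitive coverage,
    not just the bounding box.
    """
    return [(tx, ty)
            for ty in range(y0 // tile, y1 // tile + 1)
            for tx in range(x0 // tile, x1 // tile + 1)]


def owner_pipe(tx, ty):
    """Hypothetical static ownership: a 2x2 interleave over four pipes."""
    return (tx % 2) + 2 * (ty % 2)


def bin_primitives(bboxes):
    """Push (primitive id, tile) work items into one FIFO per pipe."""
    fifos = [deque() for _ in range(NUM_PIPES)]
    for prim_id, bbox in enumerate(bboxes):
        for tile in tiles_covered(*bbox):
            fifos[owner_pipe(*tile)].append((prim_id, tile))
    return fifos


# One large primitive (64x64 pixels) and one tiny primitive inside a single tile.
fifos = bin_primitives([(0, 0, 63, 63), (3, 3, 5, 5)])
depths = [len(f) for f in fifos]
print(depths)
```

Under these assumptions the large primitive spreads evenly over all four FIFOs, while the tiny one lands in a single pipe, which is the "single pixel triangles in one tile" case the post mentions.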
As well, just like other vendors, we have advanced sequencers that distribute ALU workload to our units, allocate registers and sequence all the operations that need to be done, in a dynamic way. That's really a basic requirement of doing shading processing. This is rarely the issue for performance.
Performance issues are still very texture fetch bound (cache efficiency, memory efficiency, filter types) in modern apps, as well as partially ALU/register allocation bound. There are huge performance differences possible depending on how you deal with texturing and texture fetches. Even Shadermark, if I recall correctly, ends up being texture bound in many of its cases, and it's very hard to make any assumptions on ALU performance from it. I know we've spent many a time in our compiler, generating various different forms of a shader, and discovering that ALU & register counts don't matter as much as texture organization. There are no clear generalizable solutions. Work goes on.
Guru3D said:Back to the meeting, pretty basic stuff we talked about SLI. We know for sure that their cores can handle it but after going a little deeper into the conversation I couldn't help feeling a rather antagonistic tone from Chris Hook regarding SLI and of course NVIDIA's approach (drivers/scalability) to it. Despite the rumors I'm really not confident that ATI will do some sort of SLI solution for the big market. The fact that they actually could do it is something else.
tEd said:..or http://v3.espacenet.com/textdes?DB=EPODOC&IDX=EP1424653&F=0&QPN=EP1424653 for a more in-depth description
I preferred the plain English version.
Xmas said:DemoCoder said:What do you mean by hardware solution? Clearly, the bulk of the work for nV SLI is being done by drivers. If the SLI connector is just there to transfer front buffers for display, then clearly the driver is pulling all the weight through the PCI-E bus.
I had the impression Dave was talking about ATI's solution ("SuperTiling" on multi-chip boards), not NVidia's.
Jawed said:MVP supports split frame rendering using supertiling, where the screen is split up into tiled areas with each tile processed on a GPU, using any GPUs that support supertiling. That's anything from R300 up, but it's likely to be limited to R4xx GPUs. You can use X700 and X800, X800 XL and X850 XT PE, or any other mix that you can think of. There's the potential to increase anti-aliasing IQ using supertiling (multipassing the tiles through a GPU) and MVP.
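As a rough illustration of splitting the screen into tiled areas owned by different GPUs, the sketch below assigns supertiles in a checkerboard and totals the pixels each GPU would render. The checkerboard pattern, the 32-pixel supertile size, and the function names are hypothetical; the thread does not state how supertiles are actually laid out.

```python
from collections import Counter


def owner_gpu(x, y, num_gpus=2, supertile=32):
    """Hypothetical checkerboard: index of the GPU that renders pixel (x, y)."""
    return ((x // supertile) + (y // supertile)) % num_gpus


def pixel_share(width, height, num_gpus=2, supertile=32):
    """Count how many screen pixels each GPU owns under that layout."""
    counts = Counter()
    for ty in range(0, height, supertile):
        for tx in range(0, width, supertile):
            # Clip supertiles that hang over the right or bottom screen edge.
            w = min(supertile, width - tx)
            h = min(supertile, height - ty)
            counts[owner_gpu(tx, ty, num_gpus, supertile)] += w * h
    return dict(counts)


shares = pixel_share(1024, 768)
print(shares)
```

With two GPUs at 1024x768 this layout gives each GPU exactly half the pixels, and because neighbouring supertiles alternate, a busy screen region tends to be shared rather than falling on one GPU.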
Sireric said (http://www.beyond3d.com/forum/viewtopic.php?p=190448#190448):For non-AA, each chip renders part of the scene, the frame being divided into tiles (called "Supertiling"). Each chip only renders and sees pixels in its tiles.
For 2xAA, 4xAA and 6xAA, it's the same principle.
For higher AA modes (8xaa, 12xaa, 16xaa, 24xaa), chips start rendering the same pixels, but each chip renders a different version of the pixels, as was described above.
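The split between the two regimes in the quote can be sketched as a small planning function. This is a guess at the arithmetic, not a description of the driver: it assumes 2x, 4x and 6x are the per-chip modes, that a higher mode is the per-chip mode multiplied by the number of chips, and that the per-chip versions of a pixel are merged by an equal-weight average. None of those details are spelled out in this part of the thread, and `plan_aa` and `combine` are hypothetical names.

```python
NATIVE_AA = (1, 2, 4, 6)  # 1 = no AA; 2x/4x/6x use plain supertiling per the quote


def plan_aa(target, num_chips):
    """Return (strategy, samples per chip) for a requested AA level.

    'supertile': chips render disjoint tiles at the requested level.
    'overlap':   chips render the same pixels, each producing its own version.
    """
    if target in NATIVE_AA:
        return ("supertile", target)
    per_chip, rem = divmod(target, num_chips)
    if rem == 0 and per_chip > 1 and per_chip in NATIVE_AA:
        return ("overlap", per_chip)
    raise ValueError(f"{target}x AA not reachable with {num_chips} chips")


def combine(versions):
    """Assumed merge step: equal-weight average of each chip's RGB result."""
    n = len(versions)
    return tuple(sum(v[i] for v in versions) / n for i in range(3))


print(plan_aa(4, 2))    # disjoint tiles, 4 samples per chip
print(plan_aa(12, 2))   # same pixels on both chips, 6 samples per chip
print(combine([(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]))
```

Under these assumptions a two-chip setup reaches 8x and 12x, and a four-chip setup reaches 16x and 24x, which lines up with the list of higher modes in the quote.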
Jawed said:Temporal AA was an interesting tech for a while, but does anyone use it?
Jawed
DaveBaumann said:Issue? Never an issue, but a choice. IMO, if this is the route that is chosen, then AA will probably be one of the primary selling points.
geo said:So a dually would hypothetically be able to do 12x?
DaveBaumann said:Should that be the method employed by a solution from ATI, should they be seeking a multi graphics solution, Rys hasn't quite got how the AA may work right, IMO.
Sireric said (http://www.beyond3d.com/forum/viewtopic.php?p=190448#190448):For non-AA, each chip renders part of the scene, the frame being divided into tiles (called "Supertiling"). Each chip only renders and sees pixels in its tiles.
For 2xAA, 4xAA and 6xAA, it's the same principle.
For higher AA modes (8xaa, 12xaa, 16xaa, 24xaa), chips start rendering the same pixels, but each chip renders a different version of the pixels, as was described above.
Sounds like ATI's solution is entirely for boosting pixel processing speeds, meaning that two cards can be geometry limited just as easily as one. Is this correct?
Ostsol said:Sounds like ATI's solution is entirely for boosting pixel processing speeds, meaning that two cards can be geometry limited just as easily as one. Is this correct?