ATI Multi-VPU up and running. . .

Xmas said:
One big advantage of an alternate tile rendering approach is the inherent load balancing, so the software doesn't have to do that.

Doesn't AFR accomplish that as well tho? And ATI already has experience with that.
 
sireric said:
The R3xx and the R4xx have a rather interesting way of tiling things. Our setup unit sorts primitives into tiles, based on their area coverage. Some primitives fall into 1 tile, some into a few, some cover lots. Each of our backend pixel pipes is given tiles of work. The tiles themselves are programmable in size (well, powers of 2), but, surprisingly, we haven't found that changing their size changes performance that much (within reason). That's most likely because, with high res displays, most primitives are large. There is a sweet spot in the performance at 16, and that hasn't changed in some time. Even the current X800 uses 16, though I think we need to revisit that at some point in the future. Possibly on a per application basis, different tile sizes would benefit things. It's on our long list of things to investigate.

Anyway, each pipe has huge load balancing fifos on its inputs, that match up to the tiles that it owns. Each pipe is a full MIMD and can operate on different polygons, and, in fact, can be hundreds of polygons off from others. The downside of that is memory coherence of the different pipes. Increasing tile size would improve this, but also requires larger load balancing. Our current setup seems reasonably optimal, but reviewing that, performance wise, is on the list of things to do at some point. We've artificially lowered the size of our load balancing fifos, and never noticed a performance difference, so we feel, for current apps, at least, that we are well over-designed.

In general, we have no issues keeping all our units busy, given the current polygons. I could imagine that if you did single pixel triangles in one tile over and over, that performance could drop due to tiling, but memory efficiency would shoot up, so it's unclear that performance overall would be hurt. The distribution of load across all these tiles is pretty much ideal, for all the cases we've tested. Super tiling is built on top of this, to distribute work across multiple chips.
As well, just like other vendors, we have advanced sequencers that distribute ALU workload to our units, allocate registers and sequence all the operations that need to be done, in a dynamic way. That's really a basic requirement of doing shading processing. This is rarely the issue for performance.

Performance issues are still very texture fetch bound (cache efficiency, memory efficiency, filter types) in modern apps, as well as partially ALU/register allocation bound. There are huge performance differences possible depending on how you deal with texturing and texture fetches. Even Shadermark, if I recall correctly, ends up being texture bound in many of its cases, and it's very hard to make any assumptions on ALU performance from it. I know we've spent a lot of time in our compiler, generating various different forms of a shader, and discovering that ALU & register counts don't matter as much as texture organization. There are no clear generalizable solutions. Work goes on.
LINK
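
To make the tiling sireric describes a bit more concrete, here is a minimal sketch of sorting primitives into tiles and handing each tile to the pipe that owns it. Only the tile size of 16 comes from his post. The pipe count, the bounding-box coverage test, the interleaved `owner_pipe` mapping and all function names are assumptions made up for illustration; the real setup unit and ownership pattern are not described in the thread.

```python
from collections import defaultdict

TILE_SIZE = 16  # pixels per tile edge; 16 is the sweet spot cited in the post
NUM_PIPES = 4   # hypothetical pipe count, chosen only for illustration


def tiles_covered(bbox, tile_size=TILE_SIZE):
    """Return the (tx, ty) tiles touched by a primitive's bounding box.

    bbox is (xmin, ymin, xmax, ymax) in pixels, inclusive. Using the
    bounding box is a simplification of real area coverage.
    """
    xmin, ymin, xmax, ymax = bbox
    return [
        (tx, ty)
        for ty in range(ymin // tile_size, ymax // tile_size + 1)
        for tx in range(xmin // tile_size, xmax // tile_size + 1)
    ]


def owner_pipe(tile, num_pipes=NUM_PIPES):
    """Assumed static ownership: interleave tiles diagonally across pipes."""
    tx, ty = tile
    return (tx + ty) % num_pipes


def distribute(primitives, tile_size=TILE_SIZE, num_pipes=NUM_PIPES):
    """Queue (primitive id, tile) work items on the pipe that owns each tile."""
    fifos = defaultdict(list)
    for prim_id, bbox in enumerate(primitives):
        for tile in tiles_covered(bbox, tile_size):
            fifos[owner_pipe(tile, num_pipes)].append((prim_id, tile))
    return dict(fifos)
```

With this mapping a large primitive spreads evenly over the pipes, while a tiny one lands in a single pipe's queue, which is the case sireric suggests could hurt if repeated over and over.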
 
What's up with this? A webbie seeing what he wants/expects to see, or something deeper?

Guru3D said:
Back to the meeting, pretty basic stuff we talked about SLI. We know for sure that their cores can handle it but after going a little deeper into the conversation I couldn't help feeling a rather antagonistic tone from Chris Hook regarding SLI and of course NVIDIA's approach (drivers/scalability) to it. Despite the rumors I'm really not confident that ATI will do some sort of SLI solution for the big market. The fact that they actually could do it is something else.

http://www.guru3d.com/article/cebit/192/6/
 
Xmas said:
DemoCoder said:
What do you mean by hardware solution? Clearly, the bulk of the work for nV SLI is being done by drivers. If the SLI connector is just there to transfer front buffers for display, then clearly the driver is pulling all the weight through the PCI-E bus.

I had the impression Dave was talking about ATI's solution ("SuperTiling" on multi-chip boards), not NVidia's.

Yes .. I too got the impression that Dave was talking ATI and not Nvidia.

US
 
Hexus has an interesting take:

http://www.hexus.net/content/reviews/review.php?dXJsX3Jldmlld19JRD0xMDQx

MVP supports split frame rendering using supertiling, where the screen is split up into tiled areas with each tile processed on a GPU, using any GPUs that support supertiling. That's anything from R300 up, but it's likely to be limited to R4xx GPUs. You can use X700 and X800, X800 XL and X850 XT PE, or any other mix that you can think of. There's the potential to increase anti-aliasing IQ using supertiling (multipassing the tiles through a GPU) and MVP.
Jawed
 
Should that be the method employed by a solution from ATI, should they be seeking a multi graphics solution, Rys hasn't quite got how the AA may work right, IMO.

Sireric said (http://www.beyond3d.com/forum/viewtopic.php?p=190448#190448):
For non-AA, each chip renders part of the scene, the frame being divided into tiles (called "Supertiling"). Each chip only renders and sees pixels in its tiles.

For 2xAA, 4xAA and 6xAA, it's the same principle.

For higher AA modes (8xaa, 12xaa, 16xaa, 24xaa), chips start rendering the same pixels, but each chip renders a different version of the pixels, as was described above.
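
A rough sketch of the scheme in the quote above, for anyone who finds code easier to follow. The supertile size, the checkerboard-style interleave and the function names are all assumptions for illustration; the quote only says that the frame is divided into tiles and that chips start overlapping above 6xAA. The sketch also simplifies by having every chip render every pixel in the higher modes, whereas a 4-chip board might pair chips up instead.

```python
SUPERTILE = 32          # hypothetical supertile edge in pixels, not from the thread
MAX_SINGLE_CHIP_AA = 6  # highest AA mode the quote lists as "same principle"


def chip_for_pixel(x, y, num_chips, supertile=SUPERTILE):
    """Assumed ownership: interleave supertiles across chips like a checkerboard."""
    return (x // supertile + y // supertile) % num_chips


def chips_rendering(x, y, num_chips, aa_samples, supertile=SUPERTILE):
    """Return the chips that render pixel (x, y) at the requested AA level.

    Up to 6xAA only the owning chip renders the pixel. Above that, this
    sketch assumes all chips render it, each with its own sample pattern.
    """
    if aa_samples <= MAX_SINGLE_CHIP_AA:
        return [chip_for_pixel(x, y, num_chips, supertile)]
    return list(range(num_chips))
```

For example, with two chips `chips_rendering(40, 0, 2, 4)` gives only the owning chip, while `chips_rendering(40, 0, 2, 12)` gives both.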
 
Issue? Never an issue, but a choice. IMO, if this is the route that is chosen then AA will probably be one of the primary selling points.
 
But it'll still be MSAA. It seems to me the only peeps left caring about AA are the ones who want SSAA. I can't see ATI embracing SSAA, especially with all the caveats it has. ATI's number one rule above all others seems to be "no more SSAA".

Temporal AA was an interesting tech for a while, but does anyone use it?

Jawed
 
Technically it's a hybrid of MSAA and SSAA anyway since, should there be two rendering devices in a graphics subsystem, both will be doing the entire workload for the same pixel. Although supporting straight MSAA over two chips may be easier, they could alter the sample positions of each chip to produce 2x SSAA with 4x or 6x MSAA.
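
As a toy illustration of that hybrid: each chip shades the pixel once and resolves its own MSAA samples, then the two resolved colours are blended, so the pixel ends up with two shading results (the SSAA part) on top of each chip's coverage samples (the MSAA part). The box-filter resolve, the equal-weight blend and the function names are assumptions; the thread does not say how the chips' outputs are actually combined.

```python
def resolve(samples):
    """Box-filter resolve: average a list of (r, g, b) samples."""
    n = len(samples)
    return tuple(sum(s[i] for s in samples) / n for i in range(3))


def combine_chips(per_chip_samples):
    """Resolve each chip's samples, then average the chips with equal weight."""
    return resolve([resolve(samples) for samples in per_chip_samples])


WHITE = (1.0, 1.0, 1.0)
BLACK = (0.0, 0.0, 0.0)

# Two chips at 4x MSAA with different (assumed) sample positions, so an edge
# covers 2 of 4 samples on chip A but 3 of 4 on chip B.
chip_a = [WHITE, WHITE, BLACK, BLACK]
chip_b = [WHITE, WHITE, WHITE, BLACK]
blended = combine_chips([chip_a, chip_b])
```

When both chips use the same number of samples, the blend equals a straight average over all eight samples, which is why the edge quality behaves like a higher MSAA mode.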
 
DaveBaumann said:
Issue? Never an issue, but a choice. IMO, if this is the route that is chosen then AA will probably be one of the primary selling points.

Indeed. Personally I've been saying SLI wouldn't interest me much until it allows you to crank up the IQ beyond single-card solutions.

I started salivating when I noticed that thread you pointed at is about 24x AA. . .then I noticed that was for a 4x R300 solution, which makes sense when a single ATI card today does 6x. So a dually would hypothetically be able to do 12x?
 
geo said:
So a dually would hypothetically be able to do 12x?

Tho I suppose if we really wanted to be free-swingers, we could guess that with the increase to 512mb that a bump to 8x single-card is in the works. . .then throw Temporal in there. . .and now your dually-solution is 32xt. . .that starts the ole salivary glands going. . .
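
The back-of-envelope arithmetic behind those numbers, as a tiny sketch. It assumes every chip uses distinct sample positions and that temporal AA counts as a 2x multiplier because it alternates two patterns across frames. The 8x single-card mode is only the guess made above, not an announced feature, and `effective_aa` is a made-up helper name.

```python
def effective_aa(chips, samples_per_chip, temporal=False):
    """Nominal sample count: chips x per-chip samples, doubled for temporal AA."""
    return chips * samples_per_chip * (2 if temporal else 1)


dual_6x = effective_aa(2, 6)                    # two chips at today's 6x
quad_6x = effective_aa(4, 6)                    # the 4x R300 case from the linked thread
dual_8x_t = effective_aa(2, 8, temporal=True)   # guessed 8x mode plus temporal
```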
 
DaveBaumann said:
Should that be the method employed by a solution from ATI, should they be seeking a multi graphics solution, Rys hasn't quite got how the AA may work right, IMO.

Sireric said (http://www.beyond3d.com/forum/viewtopic.php?p=190448#190448):
For non-AA, each chip renders part of the scene, the frame being divided into tiles (called "Supertiling"). Each chip only renders and sees pixels in its tiles.

For 2xAA, 4xAA and 6xAA, it's the same principle.

For higher AA modes (8xaa, 12xaa, 16xaa, 24xaa), chips start rendering the same pixels, but each chip renders a different version of the pixels, as was described above.

Sounds like ATI's solution is entirely for boosting pixel processing speeds, meaning that two cards can be geometry limited just as easily as one. Is this correct?
 
Ostsol said:
DaveBaumann said:
Should that be the method employed by a solution from ATI, should they be seeking a multi graphics solution, Rys hasn't quite got how the AA may work right, IMO.

Sireric said (http://www.beyond3d.com/forum/viewtopic.php?p=190448#190448):
For non-AA, each chip renders part of the scene, the frame being divided into tiles (called "Supertiling"). Each chip only renders and sees pixels in its tiles.

For 2xAA, 4xAA and 6xAA, it's the same principle.

For higher AA modes (8xaa, 12xaa, 16xaa, 24xaa), chips start rendering the same pixels, but each chip renders a different version of the pixels, as was described above.
Sounds like ATI's solution is entirely for boosting pixel processing speeds, meaning that two cards can be geometry limited just as easily as one. Is this correct?

Yes, it only increases pixel/shader fillrate.
 
Ostsol said:
Sounds like ATI's solution is entirely for boosting pixel processing speeds, meaning that two cards can be geometry limited just as easily as one. Is this correct?

This is a general problem for almost all multi-gpu solutions. There will be a minor improvement in geometry speeds but nowhere near the pixel improvement.

In order to increase geometry processing you would need something like the 3DLabs solution with an additional geometry chip OR you need to bin the polygons in the driver. Don't know the performance trade-off with binning in the driver.
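
A toy model of why geometry ends up as the limit, under some loud assumptions: every chip processes all of the geometry, pixel work splits perfectly across chips, and the two costs simply add up per frame. Real hardware overlaps these stages and, as noted above, may see a minor geometry gain, so treat the numbers as illustrative only. The function names are made up.

```python
def frame_time(geometry_ms, pixel_ms, num_chips):
    """Assumed cost model: geometry is not divided, pixel work is split evenly."""
    return geometry_ms + pixel_ms / num_chips


def speedup(geometry_ms, pixel_ms, num_chips):
    """Speedup of num_chips over a single chip under the model above."""
    single = frame_time(geometry_ms, pixel_ms, 1)
    return single / frame_time(geometry_ms, pixel_ms, num_chips)
```

Under this model a purely pixel-bound frame scales linearly with chip count, a frame split evenly between geometry and pixel work gains only about a third from a second chip, and a purely geometry-bound frame gains nothing.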
 