AMD: R8xx Speculation

How soon will Nvidia respond with GT300 to the upcoming ATI RV870 lineup of GPUs?

  • Within 1 or 2 weeks

    Votes: 1 0.6%
  • Within a month

    Votes: 5 3.2%
  • Within a couple of months

    Votes: 28 18.1%
  • Very late this year

    Votes: 52 33.5%
  • Not until next year

    Votes: 69 44.5%

  • Total voters
    155
  • Poll closed.
It seems one of the problems of a "dual-core" GPU, or a GPU with two cores on one die, is the sorting of the triangles. IMG seems to have solved this problem with the SGX543 MPxx with only a very small penalty in performance and die area.

In 2003 "Delay-Streams" were developed (Link: http://www.cs.lth.se/EDA075/lectures/delaystreams.ppt ) to improve the vertex sorting of normal IMR GPUs.

Would such a system still be viable? Or are systems like HyperZ, HiZ, etc. just as effective?

If such a system were still viable, could it improve the sorting of triangles between two cores too?
 
Would a vertical SFR split be possible? You'd still have the triangle-crossing problem of course, but at least the inherently uneven load distribution between the top half (50% skybox) and the bottom half of the screen would be alleviated.

OTOH, you could go 4-way and do some screen-space tiling à la Xbox360.
 
AFAIK, SLI in SFR automatically adjusts the splitting line to balance loads. But it's still slower than AFR, probably an inherent SFR problem.
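
A rough sketch of what that kind of load balancing could look like, assuming the driver can measure how long each GPU spent on the previous frame. This is not Nvidia's actual algorithm; the function name, the damping factor and the clamp limits are all made up for illustration.

```python
def adjust_split(split, time_top, time_bottom, damping=0.5):
    """Return a new split line as the fraction of screen height given to the top GPU.

    split       -- current fraction (0..1) of the screen rendered by the top GPU
    time_top    -- last frame's render time on the top GPU
    time_bottom -- last frame's render time on the bottom GPU
    damping     -- hypothetical smoothing factor to avoid oscillating frame to frame
    """
    if time_top <= 0 or time_bottom <= 0:
        return split  # no usable measurement, keep the current split

    # Estimated cost per unit of screen height in each region.
    cost_top = time_top / split
    cost_bottom = time_bottom / (1.0 - split)

    # Split s that equalizes both times if those cost densities hold:
    #   s * cost_top == (1 - s) * cost_bottom
    target = cost_bottom / (cost_top + cost_bottom)

    # Move only part of the way there, then clamp (limits are arbitrary).
    new_split = split + damping * (target - split)
    return min(0.9, max(0.1, new_split))
```

Since it only reacts to the previous frame, a sudden change in the scene leaves one GPU idle for a frame or so, which is one plausible reason it trails AFR.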
 
Would a vertical SFR split be possible? You'd still have the triangle-crossing problem of course, but at least the inherently uneven load distribution between the top half (50% skybox) and the bottom half of the screen would be alleviated.
Overlapping really isn't a problem to begin with ... if Larrabee can handle it with its tiny tiles why would anyone think it to be a problem for parallel rendering on 2 graphics cards?

There are 2 big problems with usable non-AFR parallel rendering (SFR or tiled partitions with doubled vertex load are not usable).

- Bandwidth for sharing dynamic textures and vertices.
- Maintaining rasterization in the application-given order.

The first is simple to solve, it will just take a lot of pins. The second can be solved in two ways.

You can do the vertex pipeline on one card. Obviously if the vertex load exceeds 50% this is going to be inefficient, but even besides that it makes it extremely hard to balance the pixel load between the cards. Using only the known vertex/pixel load from the last frame will obviously be inefficient, but adjusting the pixel load dynamically during the rendering of a single frame will be extremely difficult (unless the sideport bandwidth is so huge you can just afford to render to the remote framebuffer).

You can add a sequence number to tris and not rasterize one till all previous ones have started rasterization (easier said than done). This way you can parallelize the vertex pipeline and setup between the cards, so if you just use tiling to balance the pixel load it becomes easy to maintain balance (even relatively large tiles like say 512x512 sub-pixel samples should be enough).
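
To make the sequence-number idea concrete, here is a minimal sketch of a reorder buffer. It assumes each triangle gets a unique sequence number starting at 0 in application order and that setup results arrive from either card in any order. The class and method names are hypothetical, and real hardware would need bounded storage rather than an unbounded heap.

```python
import heapq


class ReorderBuffer:
    """Releases triangles to the rasterizer strictly in submission order."""

    def __init__(self):
        self._heap = []      # pending (seq, triangle) pairs, smallest seq first
        self._next_seq = 0   # next sequence number allowed to start rasterizing

    def submit(self, seq, tri):
        """Accept a set-up triangle; return the triangles that may now rasterize."""
        heapq.heappush(self._heap, (seq, tri))
        released = []
        # Drain everything that is contiguous with what has already been released.
        while self._heap and self._heap[0][0] == self._next_seq:
            released.append(heapq.heappop(self._heap)[1])
            self._next_seq += 1
        return released
```

For example, if triangle 1 finishes setup on one card before triangle 0 finishes on the other, `submit(1, ...)` returns nothing and the later `submit(0, ...)` releases both in order. The heap growing while one card stalls is the "easier said than done" part.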

Deferred rendering makes parallelization so much easier ... damn immediate mode rasterization makes everything hard.
 
If it is indeed an MCM, then maybe it is workable. Otherwise we are seeing the sweet-spot strategy blow up in its face, as the die sizes have increased from 192 -> 260 -> (300+x) mm².
 
Would parceling work out on the level of patches help?
If some conservative bounds can be placed on the output of a set of control points, a lot of patches can trivially be rejected out of one or the other GPU's working set.
Given the level of amplification, that can lead to a lot of triangles filtered out.

Perhaps an extra line check in the new tessellation stages for an additional arbitrary reject of generated geometry?
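
A minimal sketch of that conservative reject, under some loud assumptions: the patches are Bézier-style (so the surface stays inside the convex hull of its control points), the control points are already projected to screen space and all sit in front of the near plane, and there is no displacement applied afterwards. The function names and the region layout are invented for the example.

```python
def patch_bounds(control_points):
    """Screen-space bounding box (x0, y0, x1, y1) of a patch's control points."""
    xs = [p[0] for p in control_points]
    ys = [p[1] for p in control_points]
    return min(xs), min(ys), max(xs), max(ys)


def gpus_for_patch(control_points, regions):
    """Return the GPUs whose screen region the patch might touch.

    regions maps a GPU id to a half-open rectangle (x0, y0, x1, y1).
    The test is conservative: it can keep a patch that produces no pixels
    in a region, but it never drops one that does.
    """
    x0, y0, x1, y1 = patch_bounds(control_points)
    hits = []
    for gpu, (rx0, ry0, rx1, ry1) in regions.items():
        if x0 < rx1 and x1 >= rx0 and y0 < ry1 and y1 >= ry0:
            hits.append(gpu)
    return hits
```

With a left/right split of a 1920x1080 screen, a patch whose control points all have x below 960 is sent to one GPU only, and every triangle it would have been amplified into is skipped on the other.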
 
if Larrabee can handle it with its tiny tiles why would anyone think it to be a problem for parallel rendering on 2 graphics cards?

I agree, I think this answers all the questions concerning the viability of multiple die rendering, including proper display order. If you can do it properly, with acceptable performance in multiple small tiles, then you can do it properly in much larger tiles in something akin to SFR on multiple dies.
 
Although you can scale 3D graphics across multiple chips using such techniques as AFR and SFR, I am much more interested in CPU-style scaling where the chip scaling applies to general-purpose HPC rather than a single-purpose 3D graphics algorithm. Larrabee, and perhaps GT300, will be in a position to provide this. I wonder what AMD plans to do in this space.
 
I agree, I think this answers all the questions concerning the viability of multiple die rendering, including proper display order. If you can do it properly, with acceptable performance in multiple small tiles, then you can do it properly in much larger tiles in something akin to SFR on multiple dies.

It's not simply tiling. Larrabee explicitly maintains triangle order by binning/tagging ALL geometry before processing a single tile.

In the front-end, each PrimSet is given a sequence ID to identify where in the rendering stream it was submitted. This is used by the back-end to ensure correct ordering. ... Once all front-end processing for the RTset has finished and every triangle has been added to the bin for each tile that it touched, back-end processing is performed. Here, each tile is assigned to a single core, which shades each triangle from the associated bin.
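
The two-phase scheme in that quote can be sketched roughly as below. This is a simplification and not Larrabee's actual code: it tags individual triangles instead of PrimSets, bins by bounding box (so a triangle can land in a tile it doesn't really touch), assumes non-negative screen coordinates, and the 64-pixel tile size is just a placeholder.

```python
from collections import defaultdict

TILE = 64  # hypothetical tile size in pixels


def bin_triangles(triangles, tile=TILE):
    """Front end: tag each triangle with a sequence ID and bin it per tile.

    triangles is a list of three (x, y) screen-space vertices each.
    Returns {(tile_x, tile_y): [sequence IDs in submission order]}.
    """
    bins = defaultdict(list)
    for seq, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Add the triangle to every tile its bounding box overlaps.
        for ty in range(int(min(ys)) // tile, int(max(ys)) // tile + 1):
            for tx in range(int(min(xs)) // tile, int(max(xs)) // tile + 1):
                bins[(tx, ty)].append(seq)
    return dict(bins)
```

Because the bins are filled in submission order, the back end can hand each tile to one core (or one die) and simply walk its list front to back; ordering only matters within a tile, so no cross-tile synchronization is needed once binning is complete.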
 
The big advantage of general purpose chip scaling is that it provides you with the same software transparent scaling that you get when creating a larger-faster chip, so that all applications and games will achieve the benefit from the scaled performance.
 
Would parceling work out on the level of patches help?
Only with the help of developers ... displacements make analysis of the transform shaders a lost cause IMO.

It would be really nice if each draw call came with a bounding box ... but unfortunately they don't.
 
It's not simply tiling. Larrabee explicitly maintains triangle order by binning/tagging ALL geometry before processing a single tile.
You can do this too with immediate mode rendering (maintain triangle ordering, that is). It will just require more buffering than normal; I imagine overflowing to external memory will be quite common.
 
You can do this too with immediate mode rendering ... it will just require more buffering ... I imagine overflowing to external memory will be quite common.

Yep, was just pointing out that you would need to do more geometry pre-processing upfront like Larrabee does in order to make this work. And of course it all depends on where rasterization falls in the pipeline. Do shaders process bins and submit them to the rasterizer as a service or is it all fixed function stuff happening before the rasterizer? It just seems much easier on Larrabee due to a lot less rigidity.
 