AMD: R8xx Speculation

How soon will Nvidia respond with GT300 to the upcoming ATI RV870 lineup of GPUs?

  • Within 1 or 2 weeks

    Votes: 1 0.6%
  • Within a month

    Votes: 5 3.2%
  • Within couple months

    Votes: 28 18.1%
  • Very late this year

    Votes: 52 33.5%
  • Not until next year

    Votes: 69 44.5%

  • Total voters
    155
  • Poll closed.
I see. Still tough to say how much difference there is between generations without knowing what those extra transistors count for though, wouldn't you say?

I've lost count of how often I've read in various forums how close RV7x0 supposedly is to X11 and how little effort full X11 compliance would hypothetically take. Au contraire, it seems the road to X11 was anything but short or easy. In fact, some of the ancient "RV870" rumors were wild enough to claim something around 200mm2@40nm.

My reasoning back then and today is fairly simple: an IHV (any IHV) typically targets up to twice the performance of its predecessor with each new generation, plus of course whatever it takes to reach X11 compliance. Anything between 250 and 300mm2@40nm sounded far more reasonable to accommodate all of the aforementioned in a performance chip without having to sacrifice anything in terms of performance.

I'm merely reacting now to some of the notions that AMD's X11 generation will be nothing more than a RV7x0 with say a hull + domain shader slapped on for instance.
 
Do you guys really think they are doing an MCM, instead of putting two packaged dice on the same PCB?
It's been rumored as being around the corner for years, so I'll believe it when I see it.
It might have some advantages, though.
There wouldn't be as many if it's still crossfire on a stick, but if an MCM enables a high enough bandwidth between the chips so that they can behave more like a single unit, that would mean shaving off redundant RAM, the extra PCI bridge chip, and possibly some of the extra space that is inserted between the packages on the PCB.
Whenever I see pics of GPU boards, the flip-chip packages always seem to have a healthy bit of empty space around them, although that could be for routing or cooling reasons.

Dropping all those other things might make board partners happy. The boards would be smaller. I don't know if they'd save any layers on the PCB, if a lot of high-frequency routing stays in the package, or if there's some cost to having a more complex package that places demands on the board.

Conceptually, though, the more the two-chip solution appears to be a single (if slightly larger package) chip to the outside world, the less complicated the board would need to be.

Power circuitry would still need to be double-strength, although there might be slightly more margin since it would have dispensed with all the duplicate RAM and bridge chip.
 
Forgot? Was it confirmed previously? SFR would imply they solved the geometry scaling and memory sharing problems.

Is it that bad anymore?
Geometry scaling is vertex shader performance (hardly an issue anymore), geometry shader performance (which isn't really a performance parameter yet) and the 1 tri/cycle triangle setup, which could become a major bottleneck. However, if SFR-readiness is part of your design, it shouldn't be hard to discard the triangles faster, making it a lesser issue (mainly thinking along the lines of a split-screen approach; supertiling would be worse, of course).
And memory sharing shouldn't be that bad - in the graphics pipeline you really only have to read from the same areas, not write.
 
I still believe that-----

RV870= 1600 SP+ 80 TMU + 32 Rop

RV830= 800 SP+ 40 TMU+16 Rop

Then the rumored 300mm2 might even be too conservative after all.

By the way, I'm not concerned about SFR at all; I'd be rather more concerned about raw bandwidth with that kind of fillrate increase, especially considering the pixel fillrates (higher core frequencies included).
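
To put rough numbers on that bandwidth worry, here is a back-of-the-envelope sketch. It takes the rumored 32 ROPs from above; everything else is my assumption for illustration only: an 850 MHz core clock, one pixel per ROP per clock, 4 bytes of colour per pixel with no Z traffic, blending or compression, and a 256-bit bus at 4800 MT/s. None of those are confirmed specs.

```python
def pixel_fillrate_gpix(rops, core_mhz):
    """Peak pixel fillrate in Gpixels/s, assuming one pixel per ROP per clock."""
    return rops * core_mhz / 1000.0


def colour_write_bandwidth_gbs(rops, core_mhz, bytes_per_pixel=4):
    """GB/s needed to sustain peak colour writes.

    Ignores Z traffic, blending reads and any compression (assumption).
    """
    return pixel_fillrate_gpix(rops, core_mhz) * bytes_per_pixel


def memory_bandwidth_gbs(bus_bits, effective_mtps):
    """Raw memory bandwidth in GB/s for a given bus width and transfer rate."""
    return bus_bits / 8 * effective_mtps / 1000.0


if __name__ == "__main__":
    # Hypothetical clocks, not confirmed specs.
    need = colour_write_bandwidth_gbs(rops=32, core_mhz=850)
    have = memory_bandwidth_gbs(bus_bits=256, effective_mtps=4800)
    print(f"colour writes at peak: {need:.1f} GB/s, raw bus: {have:.1f} GB/s")
```

With these made-up inputs, peak colour writes alone would eat a large share of the raw bus before any texture or Z traffic, which is the concern.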
 
Is it that bad anymore?
Geometry scaling is vertex shader performance (hardly an issue anymore), geometry shader performance (which isn't really a performance parameter yet) and the 1 tri/cycle triangle setup, which could become a major bottleneck. However, if SFR-readiness is part of your design, it shouldn't be hard to discard the triangles faster, making it a lesser issue (mainly thinking along the lines of a split-screen approach; supertiling would be worse, of course).

Maybe, but I recall that it's not trivial to simply cull geometry based on the SFR split. Can't remember the details now though.

And memory sharing shouldn't be that bad - in the graphics pipeline you really only have to read from the same areas, not write.

Sure, but those writes will be to the same render target in most cases. You then need to compose the pieces and write the full buffer into the other "slave" memory pool for use in subsequent passes.
 
Maybe I'm way out on the loony side but wouldn't multi-GPU be better used for MRTs (obviously ones without data dependency)? For example, one card completely dedicated to Normal Mapping then feed the resultant MRT to a dedicated off-GPU register combiner slash Frame Buffer... at least id Tech 4 would approve... bah, I guess that would be too specialized for wide adoption...
 
Maybe I'm way out on the loony side but wouldn't multi-GPU be better used for MRTs (obviously ones without data dependency)? For example, one card completely dedicated to Normal Mapping then feed the resultant MRT to a dedicated off-GPU register combiner slash Frame Buffer... at least id Tech 4 would approve... bah, I guess that would be too specialized for wide adoption...

Is that in the best interest of scaling?
 
Yeah, triangles will regularly cross the split.

Most triangles are pretty small and won't cross; in a split-screen approach it will only be a few percent.
The idea would be to have a faster-than-1-tri/cycle cull stage before the regular triangle setup bottleneck. Backface culling alone would help a lot. But I guess the whole path before that would have to be faster too (picking up a new triangle, sending vertices to the vertex shader, getting results back).
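
A minimal sketch of what such a cull test could look like for a vertical split-screen setup. It assumes the vertices are already in screen space (so the vertex shader has run), and the function names and the toy triangles are mine, not measurements from any real workload.

```python
def classify(tri, split_x):
    """Classify a triangle against a vertical split line.

    tri is three (x, y) screen-space vertices. Returns 'left', 'right',
    or 'both' when the triangle's x extent straddles the split.
    """
    xs = [v[0] for v in tri]
    if max(xs) < split_x:
        return "left"
    if min(xs) >= split_x:
        return "right"
    return "both"


def crossing_fraction(tris, split_x):
    """Fraction of triangles that both chips would have to set up."""
    if not tris:
        return 0.0
    crossing = sum(1 for t in tris if classify(t, split_x) == "both")
    return crossing / len(tris)


if __name__ == "__main__":
    # Toy data only: one triangle per case.
    tris = [
        ((0, 0), (10, 0), (0, 10)),       # entirely left of x=100
        ((90, 0), (110, 0), (100, 10)),   # straddles the split
        ((150, 0), (160, 0), (150, 10)),  # entirely right
    ]
    print([classify(t, 100) for t in tris], crossing_fraction(tris, 100))
```

Each chip would only pass on triangles tagged with its own side or 'both'; the test itself is just a min/max on x, which is why it could plausibly run faster than full setup.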
 
Unless the developer helps out you have to do culling at the end of the pipeline, doubling the vertex load.

If there is enough sideport bandwidth to share dynamically rendered textures, there should be enough to just do sort-middle parallelization, i.e. you don't cull, you sort and just pass the tris to the appropriate chip.
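
Roughly what I mean by sort middle, as a toy sketch: after transform, triangles get binned by screen region and each bin goes to the chip that owns that region. The equal-width vertical strips and the `sort_middle` name are just my assumptions for the example; real hardware could tile the screen any way it likes.

```python
def sort_middle(tris, screen_width, num_chips=2):
    """Bin screen-space triangles into per-chip lists by vertical strip.

    Each chip owns one equal-width strip (assumption). A triangle whose
    x extent overlaps several strips is sent to every chip it touches.
    """
    strip = screen_width / num_chips
    bins = [[] for _ in range(num_chips)]
    for tri in tris:
        xs = [v[0] for v in tri]
        # Clamp to valid chip indices so off-screen vertices don't break binning.
        first = min(max(int(min(xs) // strip), 0), num_chips - 1)
        last = min(max(int(max(xs) // strip), 0), num_chips - 1)
        for chip in range(first, last + 1):
            bins[chip].append(tri)
    return bins


if __name__ == "__main__":
    a = ((0, 0), (10, 0), (0, 10))
    b = ((90, 0), (110, 0), (100, 10))
    c = ((150, 0), (160, 0), (150, 10))
    left, right = sort_middle([a, b, c], screen_width=200)
    print(len(left), len(right))
```

Only triangles that overlap a strip boundary are duplicated, so the vertex work is done once and the traffic between chips is the sorted triangle stream.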
 
Could it be possible to implement triangle-based SFR?
Sometimes, as you could composite the frame buffers based on depth at the end of the frame. This doesn't work with blending, though. Speculation was that Lucid would attempt this in applicable situations.
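
A toy illustration of both halves of that, assuming smaller depth means nearer and using flat lists as stand-ins for the frame buffers (all names here are made up for the example). Opaque pixels can be merged per pixel by depth, while "over" blending gives different results depending on draw order, which is why a depth merge can't recover it.

```python
def composite_by_depth(color_a, depth_a, color_b, depth_b):
    """Per pixel, keep the colour whose depth is nearer (smaller depth wins)."""
    return [
        ca if da <= db else cb
        for ca, da, cb, db in zip(color_a, depth_a, color_b, depth_b)
    ]


def blend_over(src, dst, alpha):
    """Standard 'over' blend of a source value onto a destination value."""
    return src * alpha + dst * (1.0 - alpha)


if __name__ == "__main__":
    # Opaque case: each chip rendered a subset of triangles into its own buffers.
    merged = composite_by_depth(
        ["red", "red"], [0.2, 0.9],
        ["blue", "blue"], [0.5, 0.4],
    )
    print(merged)

    # Blended case: the same two layers applied in different order disagree.
    one_then_two = blend_over(0.2, blend_over(1.0, 0.0, 0.5), 0.5)
    two_then_one = blend_over(1.0, blend_over(0.2, 0.0, 0.5), 0.5)
    print(one_then_two, two_then_one)
```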
 
A lot will also depend on whether they do a 50:50 split-screen rendering or the scissor mode. The latter would mean a better load distribution but also more overhead.
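
For what it's worth, here is a sketch of the kind of rebalancing I have in mind, assuming scissor mode means the split line is moved each frame based on how long each chip took on the previous one. That reading, the `rebalance_split` name, the `gain` damping factor and the clamp limits are all my own assumptions, not anything documented.

```python
def rebalance_split(split, time_top, time_bottom, gain=0.5):
    """Return a new split fraction (share of screen height for the top chip).

    split is the current share in (0, 1); time_top and time_bottom are the
    previous frame's render times for each chip. The chip whose region is
    cheaper per unit of screen gets a larger share next frame.
    """
    if time_top <= 0 or time_bottom <= 0:
        return split  # no usable timing data, keep the current split
    cost_top = time_top / split              # cost per unit of screen height
    cost_bottom = time_bottom / (1.0 - split)
    # Share at which both chips would finish together, if cost stayed uniform.
    target = cost_bottom / (cost_top + cost_bottom)
    new_split = split + gain * (target - split)  # damped step towards target
    return min(max(new_split, 0.05), 0.95)


if __name__ == "__main__":
    # Bottom half took three times as long, so the top chip gets more screen.
    print(rebalance_split(0.5, time_top=10.0, time_bottom=30.0))
```

A fixed 50:50 split skips all of this bookkeeping, which is the overhead side of the trade-off.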
 