AMD: R8xx Speculation

How soon will Nvidia respond with GT300 to the upcoming ATI RV870 lineup of GPUs?

  • Within 1 or 2 weeks: 1 vote (0.6%)
  • Within a month: 5 votes (3.2%)
  • Within a couple of months: 28 votes (18.1%)
  • Very late this year: 52 votes (33.5%)
  • Not until next year: 69 votes (44.5%)

  • Total voters: 155
  • Poll closed.
But I asked about 192-bit (in the part you cut), not 256-bit, because the 55nm RV770 already has 256-bit, GDDR5 needs only ~30% more pins than GDDR3/4, and it's a full new half-node since 55nm ;)
The physical I/O for DDR interfacing can't shrink with successive process nodes, as driving an interface is mostly about using lots of power and lots of shielding to maintain signal integrity.

As for 192 versus 256 bits, I think there might be something in it. It's possible that the die is too small for 256 bits + sideport, but big enough for 192 bits + sideport.

But right now I'm thinking that using chip perimeter as the constraint on the quantity of I/O may be inaccurate. It may turn out that double-layering PCI Express and sideport is enough, with no need to double-layer DDR.
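To put rough numbers on the 192-bit versus 256-bit question, here's a back-of-the-envelope sketch (the data rates below are illustrative assumptions, not known specs):

```python
# Peak DRAM bandwidth: bus width (bits) / 8 * effective data rate (GT/s).
# The data rates here are assumed for illustration only.
def bandwidth_gbs(bus_bits, data_rate_gtps):
    return bus_bits / 8 * data_rate_gtps

print(bandwidth_gbs(256, 2.0))  # 64 GB/s:  256-bit GDDR3, RV770-class
print(bandwidth_gbs(192, 4.0))  # 96 GB/s:  hypothetical 192-bit GDDR5
print(bandwidth_gbs(256, 4.0))  # 128 GB/s: hypothetical 256-bit GDDR5
```

So a 192-bit GDDR5 part could still comfortably out-bandwidth a 256-bit GDDR3 one, which is presumably the attraction if perimeter is tight.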

Jawed
 
Charlie's heard rumblings

http://www.semiaccurate.com/forums/showpost.php?p=1122&postcount=10

There are rumblings of packaging problems on Cypress, but I can't confirm or deny them yet.
Being conservative, let's assume that AMD is only going to introduce MCM gradually. Presumably the right place to do this is in the top SKUs, what have been X2 cards.

So Cypress would be MCM. It would consist of 2x Redwood. Below Redwood (300mm²?) would be Juniper (181mm²) then Cedar (120mm²?) and some 64-bit? runt called Hemlock (is that a joke?).

Jawed
 
So Cypress would be MCM. It would consist of 2x Redwood. Below Redwood (300mm²?) would be Juniper (181mm²) then Cedar (120mm²?) and some 64-bit? runt called Hemlock (is that a joke?).


Or Cypress is 2x2x Juniper: Redwood is 2x Juniper on one package, and Juniper is the 180mm² chip.
 
These MCM rumors are most likely false and frankly out of this world. Why do people even bother speculating? They don't make any sense whatsoever.

1. ATI had a smash hit with RV770 and it is a well-balanced architecture, so you can bet their follow-up will be of similar design, not these silly MCM contraptions. If it ain't broke, why fix it?

2. The sideport was never used, so again, if something is not absolutely necessary for high performance it will be removed from the die. Why spend millions of transistors on something you won't use and make the product more expensive than it has to be?

An educated guess for the next performance chip would be 1280 – 1600 shaders with 64 – 80 TMUs and 32 ROPs, on 256-bit GDDR5 of course. If ATI has decided to give Nvidia a real kick in the nuts they might go for 1920 shaders and 96 TMUs, but such a chip would probably be bandwidth-limited unless they went for a wider bus (or have some other trick up their sleeve), and I'm quite sure they don't want to go that route. Both the single-chip cards and the X2s would become more expensive, and I think ATI would like to keep the 256-bit bus as it is. Makes the most sense to me.
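For what it's worth, a crude arithmetic-intensity sketch of why 1920 shaders might be bandwidth-starved on the same bus (core and memory clocks are placeholder assumptions, and each ALU is counted as 2 flops/clock for a MAD):

```python
# Crude flops-per-byte comparison for the speculated configs.
# Core/memory clocks are assumed; ALUs counted as 2 flops (MAD) per clock.
def flops_per_byte(shaders, core_ghz, bus_bits, mem_gtps):
    gflops = shaders * 2 * core_ghz
    bw_gbs = bus_bits / 8 * mem_gtps
    return gflops / bw_gbs

print(flops_per_byte(1600, 0.75, 256, 4.0))  # ~18.8
print(flops_per_byte(1920, 0.75, 256, 4.0))  # ~22.5, leaner on bandwidth
```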
 
2. The sideport was never used, so again, if something is not absolutely necessary for high performance it will be removed from the die. Why spend millions of transistors on something you won't use and make the product more expensive than it has to be?

Extrapolating from that sole data point (RV770's sideport drama) may prove to be suboptimal, IMHO.
 
As geometry gets more complex (tessellation :p) it becomes progressively more expensive to re-create vertices because the PTVC missed.

Sure, but my question was whether this increase in complexity actually leads to an increase in the probability of reuse after eviction from the cache. I don't see why it would.

Now a vertex can "appear" as a result of TS, which means that pre-processing through VS and HS had to occur - both of these are "low-frequency" (e.g. running at 1/10th or 1/100th the rate). After appearing in TS it then goes through DS and GS. DS is reasonably costly, as all attributes have to be, at minimum, interpolated - otherwise I guess DS functions as the main VS. GS can be a pass-through - I'm not sure what else one would do with GS after tessellating.

Is that how it works? I thought vertices were output by the domain shader and don't exist before then. The VS just manipulates control points of to-be-tessellated patches, no?
 
Haven't all modern Capcom console->PC ports run better on NV hardware whilst featuring the TWIMTBP logo? I certainly remember that to be the case with Lost Planet anyway.

I don't know what Capcom is doing, but their DX10 implementation obviously loves Nvidia's architecture. In DX9 AMD mixes it up, but once DX10 is turned on Nvidia gets a boost while AMD loses nearly half its performance.

http://www.pcgameshardware.com/aid,690413/Resident-Evil-5-Graphics-cards-benchmarks/Practice/
 
Sure, but my question was whether this increase in complexity actually leads to an increase in the probability of reuse after eviction from the cache. I don't see why it would.
If tessellation of a patch results in 120 vertices but the PTVC holds fewer than that, then there could be a problem - I don't know how the vertices are ordered (strips?), and PTVC size is one of those "hidden" facts about hardware, like L1 texture cache size.

Is that how it works? I thought vertices were output by the domain shader and don't exist before then. The VS just manipulates control points of to-be-tessellated patches, no?
Yes, VS manipulates control points for the patch.

Strictly speaking, TS amplifies a patch to produce the entire set of coordinates for all the vertices, according to the tessellation factors it's given - "naked vertices", in effect.

DS marries those coordinates with the control points and attribute data for the patch, to generate fully-functional vertices. DS is effectively interpolating the attributes across the naked vertices, according to the control points for the patch.
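A toy sketch of that TS -> DS hand-off for a triangle-domain patch may help; the attribute names, layout and helper function here are invented for illustration, with the tessellator supplying only barycentric domain coordinates:

```python
# Toy model of the TS -> DS hand-off for one triangle-domain patch.
# Attribute names and layout are invented for illustration.
def domain_shade(bary, control_points):
    """bary: (u, v, w) domain coordinates emitted by the tessellator
    (a "naked vertex"); control_points: three attribute dicts produced
    by the VS/HS for this patch."""
    u, v, w = bary
    out = {}
    for attr in control_points[0]:
        a, b, c = (cp[attr] for cp in control_points)
        # Interpolate each component of the attribute across the patch.
        out[attr] = tuple(u*x + v*y + w*z for x, y, z in zip(a, b, c))
    return out

patch = [{"pos": (0.0, 0.0, 0.0), "uv": (0.0, 0.0)},
         {"pos": (1.0, 0.0, 0.0), "uv": (1.0, 0.0)},
         {"pos": (0.0, 1.0, 0.0), "uv": (0.0, 1.0)}]
# One of the many vertices TS might emit: the patch centroid.
print(domain_shade((1/3, 1/3, 1/3), patch))
```

Each output vertex is just a weighted mix of the patch's control-point attributes, which is why the "naked vertices" out of TS are so cheap to produce.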

Jawed
 
Allow the layman here to ask a dumb question: isn't it typical that IHVs introduce all the new DX requirements with each new technology generation, and then have the luxury of tweaking efficiency as well as performance over time?

One of the future headaches IHVs might face with advanced tessellation, or a gazillion minuscule triangles in general, is multisampling efficiency (compared to supersampling) suffering quite a bit. One of the recent NV patents showed that they're probably researching in that direction too.

But for the next, say, 2-3 years after the DX11 GPUs appear, I can't imagine that games will torture them so much that they'll run into serious problems.
 
/me is all for a tessellation slider in upcoming games which, unlike current GPU PhysX implementations, lets you gracefully choose a T-factor from 1 to 64. :)
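To gauge what that slider range means, a rough amplification count (assuming a uniform integer factor on a triangle domain, which gives roughly f² triangles per patch; fractional partitioning and per-edge factors are ignored):

```python
# Approximate triangles per patch at uniform tessellation factor f on a
# triangle domain: ~f^2. An order-of-magnitude guide, nothing more.
for f in (1, 8, 16, 32, 64):
    print(f"factor {f:2d}: ~{f * f:4d} triangles per patch")
```

Going from factor 1 to 64 is a ~4000x blow-up in triangles per patch, which is exactly why sub-pixel triangles (and MSAA efficiency) become a worry.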
 
/me is all for a tessellation slider in upcoming games which, unlike current GPU PhysX implementations, lets you gracefully choose a T-factor from 1 to 64. :)


...and the driver taking the liberty to shut off AA past T-factor 32 *snicker*
 
Well, we don't want to puzzle consumers too much now, do we? So it's easier if the driver revises their decisions when they're not for the consumers' own good. ;)
 
If tessellation of a patch results in 120 vertices but the PTVC holds fewer than that, then there could be a problem - I don't know how the vertices are ordered (strips?), and PTVC size is one of those "hidden" facts about hardware, like L1 texture cache size.

I thought strips were the "naive" approach to vertex ordering? I still don't understand why vertex count matters, though; it's more the ordering that dictates cache efficiency, right? At least that's what I got from Forsyth's paper - http://home.comcast.net/~tom_forsyth/papers/fast_vert_cache_opt.html
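One way to make the count-versus-ordering question concrete is to simulate a small post-transform cache and count misses per triangle (ACMR). The 32-entry FIFO below is an assumption - real PTVC sizes and replacement policies are exactly the "hidden" hardware facts under discussion:

```python
# Simulate a FIFO post-transform vertex cache over an index stream and
# report ACMR (average cache misses per triangle).
from collections import deque

def acmr(indices, cache_size=32):
    cache, misses = deque(), 0
    for idx in indices:
        if idx not in cache:       # miss: vertex must be (re)shaded
            misses += 1
            cache.append(idx)
            if len(cache) > cache_size:
                cache.popleft()    # FIFO eviction
    return misses / (len(indices) / 3)

# Model a 120-vertex patch emitted strip-like: a 3x40 vertex grid,
# quads split into two triangles, walked row by row.
def grid_indices(rows, cols):
    tris = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            a, b = r * cols + c, r * cols + c + 1
            d, e = (r + 1) * cols + c, (r + 1) * cols + c + 1
            tris += [a, d, b, b, d, e]
    return tris

# Rows wider than the cache: prints ~1.03, versus a ~0.77 lower bound
# (120 unique vertices / 156 triangles) - the middle row is shaded twice.
print(acmr(grid_indices(3, 40)))
```

So with strip-like output and a patch wider than the cache, vertex count does start to matter: the same vertices get transformed twice. Forsyth-style reordering is precisely what pushes ACMR back toward the unique-vertex bound.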
 
So throwing the two rumors together, we are looking at a two-chip family.
RV810, Hemlock, sits on the bottom, with RV830/840, Juniper/Cedar, on the top.
RV870, Redwood, is made of two RV830/840 on an MCM(?) and is the largest "single" GPU.
R800, Cypress, is the X2 variant: two RV870s in CF on one PCB.
 
I thought strips were the "naive" approach to vertex ordering? I still don't understand why vertex count matters, though; it's more the ordering that dictates cache efficiency, right? At least that's what I got from Forsyth's paper - http://home.comcast.net/~tom_forsyth/papers/fast_vert_cache_opt.html
What is the order of the vertices coming out of TS? It seems likely to be strips ("rasterised", as it were).

If the PTVC holds 32 entries but a patch produces hundreds of vertices... what then are the chances of having to re-process the patch multiple times?

An algorithm like his could be built into TS, I guess, with cognisance of the varying tessellation factors that are possible along each edge.

Maybe this doesn't matter, because the cost of re-processing a patch is much lower than fetching/shading vertices in a non-tessellated pipeline :???: Not to mention that tessellation, used to tackle LOD, should be a massive win due to general bandwidth savings.

Page 22 here:

http://developer.amd.com/gpu_assets/Real-Time_Tessellation_on_GPU.pdf

references the PTVC but appears to be talking about cache space being taken up by vertex attribute data (not just vertex index data), I think. It seems to suggest the PTVC is working OK regardless.

Other things I've seen vaguely suggest that tessellation, in producing "localised, intense" meshes, will generally have good PTVC performance. I'm not convinced, because it seems the vertices are just going to come out in strips. So, ahem, that's better than lots of disparate patches of triangles in random order, but it's nothing like optimal for a mesh.

So I really don't know, one way or another, which is why I'm pondering it.

Jawed
 