AMD: R8xx Speculation

Jawed · Jul 21, 2009

keritto said:
But I asked about 192-bit (in the part you cut), not 256-bit, because the 55nm RV770 already has 256-bit, GDDR5 needs only 30% more pins than GDDR3/4, and it's a full new half node since 55nm.

The physical I/O for DDR interfacing can't shrink with successive process nodes - as driving an interface is mostly about using lots of power and lots of shielding to maintain signal integrity.

As for 192- versus 256-bits, I think there might be something in it. It could be possible that the die is too small for 256-bits + sideport, but big enough for 192-bits + sideport.

But right now I'm thinking that using chip perimeter as the constraint for quantity of I/O may be inaccurate. It may turn out that double-layering of PCI Express and sideport is enough with no need to double-layer DDR.

Jawed

Jawed · Jul 21, 2009

Charlie's heard rumblings

http://www.semiaccurate.com/forums/showpost.php?p=1122&postcount=10

There are rumblings of packaging problems on Cypress, but I can't confirm or deny them yet.

Being conservative, let's assume that AMD is only going to introduce MCM gradually. Presumably the right place to do this is in the top SKUs, which have been the X2 cards.

So Cypress would be MCM. It would consist of 2x Redwood. Below Redwood (300mm²?) would be Juniper (181mm²) then Cedar (120mm²?) and some 64-bit? runt called Hemlock (is that a joke?).

Jawed

rpg.314 · Jul 21, 2009

So the packaging problems are true then. Sigh.

neliz · Jul 21, 2009

Jawed said:
runt called Hemlock (is that a joke?).

Jawed

Well... it is a conifer, after all.

seahawk · Jul 21, 2009

Jawed said:
http://www.semiaccurate.com/forums/showpost.php?p=1122&postcount=10

Being conservative, let's assume that AMD is only going to introduce MCM gradually. Presumably the right place to do this is in the top SKUs, which have been the X2 cards.

So Cypress would be MCM. It would consist of 2x Redwood. Below Redwood (300mm²?) would be Juniper (181mm²) then Cedar (120mm²?) and some 64-bit? runt called Hemlock (is that a joke?).

Jawed

Or Cypress is 2x2xJuniper. Redwood is 2xJuniper on one package. Juniper is the 180mm² Chip.

Kef · Jul 21, 2009

rpg.314 said:
So the packaging problems are true then. Sigh.

It is a rumor. Nothing is set in stone...

Kef · Jul 21, 2009

These MCM rumors are most likely false and so out of this world. Why do people even bother to speculate? They don't make any sense whatsoever.

1. ATI had a smash hit with RV770 and it is a well-balanced architecture, so you can bet their follow-up will be of similar architecture and not these silly MCM rumors. If it ain't broken, why fix it?

2. Side-port was never used, so again, if something is not absolutely necessary for high performance it will be removed from the die. Why spend millions of transistors on something you won't use and make the product more expensive than it has to be?

An educated guess for the next performance chip would be 1280–1600 shaders with 64–80 TMUs and 32 ROPs, with 256-bit GDDR5 of course. If ATI has decided to give nVidia a real kick in the nuts they might go for 1920 shaders and 96 TMUs, but such a chip would probably be bandwidth limited unless they go for a wider bus (or have some other tricks up their sleeve), and I'm quite sure they don't want to go that route. Both the single-chip cards and the X2s would become more expensive, and I think ATI would like to keep the 256-bit bus as it is. Makes most sense to me.
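To put rough numbers on the bandwidth worry, here is a back-of-envelope sketch in Python. Peak bandwidth is just bus width in bytes times the per-pin data rate. The 3.6 Gbps/pin figure matches the HD 4870's GDDR5; the 4.8 Gbps/pin figure, the 384-bit option and the 1920-shader chip are purely illustrative assumptions, and the per-shader ratio ignores clocks entirely.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width in bytes x per-pin rate in Gbit/s."""
    return bus_width_bits / 8 * data_rate_gbps


def bandwidth_per_shader(bus_width_bits: int, data_rate_gbps: float, shaders: int) -> float:
    """GB/s available per shader ALU (core clock is deliberately ignored)."""
    return peak_bandwidth_gbs(bus_width_bits, data_rate_gbps) / shaders


# (label, bus width in bits, Gbps per pin, shader count)
# Only the first row reflects a shipping part; the rest are hypothetical.
configs = [
    ("RV770-like", 256, 3.6, 800),
    ("hypothetical 1600 SP", 256, 4.8, 1600),
    ("hypothetical 1920 SP", 256, 4.8, 1920),
    ("hypothetical 1920 SP, wide bus", 384, 4.8, 1920),
]

for label, bus, rate, shaders in configs:
    total = peak_bandwidth_gbs(bus, rate)
    per_shader = bandwidth_per_shader(bus, rate, shaders)
    print(f"{label}: {total:.1f} GB/s total, {per_shader * 1000:.1f} MB/s per shader")
```

The point is only the ratio: more than doubling the shader count on the same 256-bit bus cuts the bandwidth available per shader unless memory speed rises by a similar factor.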

AlexV · Jul 21, 2009

Kef said:
2. Side-port was never used, so again, if something is not absolutely necessary for high performance it will be removed from the die. Why spend millions of transistors on something you won't use and make the product more expensive than it has to be?

Extrapolating based on that sole data point (RV770's Sideport drama) may prove to be suboptimal, IMHO.

Kef · Jul 21, 2009

AlexV said:
Extrapolating based on that sole data point (RV770's Sideport drama) may prove to be suboptimal, IMHO.

Maybe. :smile:

trinibwoy · Jul 21, 2009

Jawed said:
As geometry gets more complex (tessellation) it becomes progressively more and more expensive to have to re-create vertices because PTVC missed.

Sure, but my question was whether this increase in complexity actually leads to an increase in probability of reuse after being evicted from cache. I don't see why there would be.

Now a vertex can "appear" as a result of TS, which means that pre-processing through VS and HS had to occur - both of these are "low-frequency" (e.g. 1/10th or 1/100th). After appearing in TS it then goes through DS and GS. DS is reasonably costly as all attributes have to be, at minimum, interpolated - otherwise I guess DS functions as the main VS. GS can be pass-through - I'm not sure what other things one would do with GS after tessellating.

Is that how it works? I thought vertices were output by the domain shader and don't exist before then. The VS just manipulates control points of to-be-tessellated patches no?

trinibwoy · Jul 21, 2009

ShaidarHaran said:
Haven't all modern Capcom console->PC ports run better on NV hardware whilst featuring the TWIMTBP logo? I certainly remember that to be the case with Lost Planet anyway.

Don't know what Capcom is doing but their DX10 implementation obviously loves Nvidia's architecture. In DX9 AMD is mixing it up but once DX10 is turned on Nvidia gets a boost while AMD loses nearly half its performance.

http://www.pcgameshardware.com/aid,690413/Resident-Evil-5-Graphics-cards-benchmarks/Practice/

Jawed · Jul 21, 2009

trinibwoy said:
Sure, but my question was whether this increase in complexity actually leads to an increase in probability of reuse after being evicted from cache. I don't see why there would be.

If tessellation of a patch results in 120 vertices, but the PTVC holds less than that, then there could be a problem - I don't know how the vertices are ordered (strips?) and PTVC size is one of those "hidden" facts about hardware, like L1 texture size.

Is that how it works? I thought vertices were output by the domain shader and don't exist before then. The VS just manipulates control points of to-be-tessellated patches no?

Yes, VS manipulates control points for the patch.

Strictly speaking TS amplifies a patch to produce the entire set of coordinates for all the vertices according to the tessellation factors it's given, "naked vertices" in effect.

DS marries those coordinates with the control points and attribute data for the patch, to generate fully-functional vertices. DS is effectively interpolating the attributes across the naked vertices, according to the control points for the patch.
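A toy sketch of that TS/DS split, in Python rather than shader code. Everything here is a simplification I'm assuming for illustration: a quad domain, one uniform integer tessellation factor for the whole patch, and a bilinear patch with four control points (real patches usually have more control points and per-edge factors). The function names are made up.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
UV = Tuple[float, float]


def tessellate_quad(factor: int) -> List[UV]:
    """Stand-in for fixed-function TS: emit only (u, v) domain coordinates."""
    n = factor
    return [(i / n, j / n) for j in range(n + 1) for i in range(n + 1)]


def lerp(a: Vec3, b: Vec3, t: float) -> Vec3:
    """Component-wise linear interpolation between two 3-vectors."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def domain_shader(uv: UV, control_points: Tuple[Vec3, Vec3, Vec3, Vec3]) -> Vec3:
    """Stand-in for DS: turn one naked (u, v) into a position using the patch's control points."""
    u, v = uv
    p00, p10, p01, p11 = control_points
    bottom = lerp(p00, p10, u)  # interpolate along the v = 0 edge
    top = lerp(p01, p11, u)     # interpolate along the v = 1 edge
    return lerp(bottom, top, v)


# Control points as the VS/HS would have left them (values are arbitrary).
patch = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0))

naked = tessellate_quad(4)                              # TS output: coordinates only
vertices = [domain_shader(uv, patch) for uv in naked]   # DS output: usable vertices
print(len(naked), "domain points ->", len(vertices), "vertices")
```

Other attributes (normals, texture coordinates) would be interpolated in `domain_shader` the same way as position, which is where the per-vertex DS cost comes from.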

Jawed

Ailuros · Jul 21, 2009

Allow the layman here to ask a dumb question: isn't it typical that IHVs in general introduce with each new technology generation all new DX requirements and have the luxury over time to tweak efficiency as well as performance for those?

One of the future headaches IHVs might face with advanced tessellation, or else a gazillion minuscule triangles, could be that multisampling efficiency (compared to supersampling) suffers quite a bit. One of the recent NV patents showed that they're probably researching in that direction also.

But for the next, say, 2-3 years after the DX11 GPUs appear I can't imagine that games will torture them so much that they'll run into serious problems.

CarstenS · Jul 21, 2009

\me is all for a Tessellation-Slider in upcoming games, which, unlike current GPU-PhysX implementations, lets you gracefully choose a T-factor from 1 to 64.

Ailuros · Jul 21, 2009

CarstenS said:
\me is all for a Tessellation-Slider in upcoming games, which, unlike current GPU-PhysX implementations, lets you gracefully choose a T-factor from 1 to 64.

...and the driver taking the liberty to shut off AA past T-factor 32 *snicker*

CarstenS · Jul 21, 2009

Well, we don't want to puzzle consumers too much now, do we? So it's easier if the driver revises their decisions when they're not for the consumers' own good.

OpenGL guy · Jul 21, 2009

Ailuros said:
...and the driver taking the liberty to shut off AA past T-factor 32 *snicker*

Increased tessellation won't decrease the need for antialiasing.

trinibwoy · Jul 21, 2009

Jawed said:
If tessellation of a patch results in 120 vertices, but the PTVC holds less than that, then there could be a problem - I don't know how the vertices are ordered (strips?) and PTVC size is one of those "hidden" facts about hardware, like L1 texture size.

I thought strips were the "naive" approach to vertex ordering? I still don't understand why vertex count matters though, it's more the ordering that dictates cache efficiency right? At least that's what I got from Forsyth's paper - http://home.comcast.net/~tom_forsyth/papers/fast_vert_cache_opt.html

LordEC911 · Jul 22, 2009

So throwing the 2 rumors together, we are looking at a two chip family.
RV810, Hemlock, on the bottom and the RV830/840, Juniper/Cedar, on the top.
RV870, Redwood, is made of two RV830/840 MCM(?) and is the largest "single" GPU.
R800, Cypress, is the x2 variant and is made up of two RV870 and is CF on a PCB.

Jawed · Jul 22, 2009

trinibwoy said:
I thought strips were the "naive" approach to vertex ordering? I still don't understand why vertex count matters though, it's more the ordering that dictates cache efficiency right? At least that's what I got from Forsyth's paper - http://home.comcast.net/~tom_forsyth/papers/fast_vert_cache_opt.html

What is the order of the vertices coming out of TS? Seems likely to be strips ("rasterised" as it were).

If PTVC is 32 but a patch produces hundreds of vertices... What then are the chances of having to re-process the patch multiple times?

An algorithm like his could be built into TS, I guess, with cognisance of the varying tessellation factors that are possible along each edge.

Maybe this doesn't matter because the cost of re-processing a patch is much lower than fetching/shading vertices in a non-tessellated pipeline :???:

Not to mention that tessellation, used to tackle LOD, should be a massive win due to general bandwidth savings.

Page 22 here:

http://developer.amd.com/gpu_assets/Real-Time_Tessellation_on_GPU.pdf

references PTVC but appears to be talking about cache space being taken with vertex attribute data (not just vertex index data), I think. Seems to suggest PTVC is working OK regardless.

Other things I've seen vaguely seem to suggest that tessellation, in producing "localised, intense" meshes will generally have good PTVC performance. I'm not convinced because it seems they're just going to come out in strips. So, ahem, that's better than lots of disparate patches of triangles in random order, but it's nothing like optimal for a mesh.

So I really don't know, one way or another, which is why I'm pondering it.
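One way to poke at this without knowing the real hardware is to simulate it. Below is a minimal Python sketch under loudly stated assumptions: the PTVC is modelled as a plain FIFO keyed on vertex index, the tessellated patch is a regular n x n quad grid, and triangles come out row by row (my guess at "strip-like" order, not a documented TS ordering). The cache sizes are arbitrary, not known hardware values.

```python
from collections import deque
from typing import List, Tuple

Triangle = Tuple[int, int, int]


def grid_triangles(n: int) -> List[Triangle]:
    """Triangles of an n x n quad grid, emitted row by row (assumed strip-like order)."""
    w = n + 1  # vertices per row
    tris = []
    for row in range(n):
        for col in range(n):
            a = row * w + col  # top-left vertex of this quad
            b = a + 1          # top-right
            c = a + w          # bottom-left
            d = c + 1          # bottom-right
            tris.append((a, b, c))
            tris.append((b, d, c))
    return tris


def fifo_misses(tris: List[Triangle], cache_size: int) -> int:
    """Count vertices that must be (re)shaded with a FIFO post-transform cache."""
    cache = deque(maxlen=cache_size)  # appending to a full deque drops the oldest entry
    misses = 0
    for tri in tris:
        for v in tri:
            if v not in cache:
                misses += 1
                cache.append(v)
    return misses


n = 16
tris = grid_triangles(n)
unique = (n + 1) ** 2
for size in (8, 16, 32, 64, 512):
    m = fifo_misses(tris, size)
    print(f"cache={size:3d}: {m} vertices shaded for {unique} unique "
          f"({m / unique:.2f}x), ACMR={m / len(tris):.2f}")
```

Swapping `grid_triangles` for a different emission order (for example one produced by Forsyth's algorithm) while holding the cache size fixed would separate the "vertex count" question from the "ordering" question.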

Jawed

