AMD: R8xx Speculation

Sound_Card · Sep 15, 2009

DegustatoR said:
Can a 128-bit GDDR5 card be faster than a 256-bit GDDR5 card on average?

Maybe. We don't know. Depending on clocks, I would say it could no doubt beat a 256-bit GDDR3 part. I do think there is way more merit to be seen in that $199 price, which is higher than the HD 4870's. Unless ATi is charging a service fee for DX11, I would reckon it's faster than RV770.

nAo · Sep 15, 2009

Jawed said:
There you go then, as wasteful as I expected. If the DP2/3 wastage is no longer occurring then that means I can stop hunting through the assembly for them and striking off wasted lanes when determining utilisation. Still have to go hunting for NOPs and useless MOVs...

Just because their arch is designed for high ILP it doesn't mean they cannot transform their dot products into multi-cycle MUL, MAD..MAD sequences, potentially freeing up unused lanes. On the other hand they might have enough free lanes anyway in the average case that perhaps it's not worth the hassle to implement such an 'optimization'.
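
As a rough illustration of the transformation described here (a Python sketch, not actual ATI shader assembly; the function names and the lane/slot bookkeeping are hypothetical), a 4-component dot product can either occupy four lanes in one issue slot or be serialised as one MUL followed by three dependent MADs on a single lane:

```python
def dp4_wide(a, b):
    """DP4 issued as one instruction: 4 lanes busy for 1 issue slot."""
    result = sum(x * y for x, y in zip(a, b))
    return result, {"lanes_per_slot": 4, "slots": 1}


def dp4_serial(a, b):
    """Same dot product as MUL, MAD, MAD, MAD: 1 lane busy for 4 slots."""
    acc = a[0] * b[0]                  # MUL
    for x, y in zip(a[1:], b[1:]):
        acc = acc + x * y              # MAD, depends on the previous result
    return acc, {"lanes_per_slot": 1, "slots": 4}
```

Both forms return the same value. The serial form leaves three lanes per slot free for other work, but only if the compiler can find independent instructions to pack next to the four-slot dependent chain.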

MfA · Sep 15, 2009

psolord said:
Do you think that they would actually ship a card with two 8-pin connectors? And even then, wouldn't that mean there would be no way to overclock this baby? I mean PCI-e + 8-pin + 8-pin = 75W + 150W + 150W = 375W, which is exactly the card's TDP!

If you are overclocking why do you care about design standard limits? So you drop another .1 volt over the wires and get them a little warmer ... care? It's not like there are circuit breakers in them.
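
For reference, the in-spec budget being discussed can be tallied with a small sketch. The per-source limits are the PCI Express figures (75 W slot, 75 W per 6-pin, 150 W per 8-pin); the function name is just for illustration:

```python
# In-spec power limits in watts for each source feeding a graphics card.
PCIE_SLOT_W = 75
SIX_PIN_W = 75
EIGHT_PIN_W = 150


def board_power_budget(six_pin=0, eight_pin=0):
    """Total in-spec power for a card with the given auxiliary connectors."""
    return PCIE_SLOT_W + six_pin * SIX_PIN_W + eight_pin * EIGHT_PIN_W


print(board_power_budget(eight_pin=2))             # 375
print(board_power_budget(six_pin=1, eight_pin=1))  # 300
```

These are rated limits, not hard cut-offs, which is the point being made above: an overclocked card simply draws more current through the same wires.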

Jawed · Sep 15, 2009

nAo said:
Little misunderstanding: I was talking about values interpolated in the pixel shader.
IIRC the fixed function tessellator doesn't directly generate new vertex positions per se (though it generates the final topology information..), it 'only' computes their position in the 'patch space', the domain shader will then compute the real position in space (plus pos displacement, normal, uv coords, etc..)

Thanks, yeah, I've been referring to "naked vertices" as these attribute-less points produced by TS, but not being clear on the fact that they're in patch space that can't be used as input to Setup, which is why DS converts back to the application's space and then clothes them in attributes.

Do you agree that the determination of the patch space location of a new vertex requires interpolation using basically the same kind of interpolation technique as that used in mapping attributes to pixels on a primitive?

I'm simply suggesting that if there is no longer an SPI unit then the ALU method of interpolation is usable not only for attribute interpolation for fragments but also for naked vertex position in patch space.

Jawed
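
To make the comparison concrete, here is a minimal sketch of the kind of interpolation in question, assuming a triangle domain with barycentric weights (u, v, w). The `interpolate` helper is hypothetical, and nothing here claims to be how the hardware actually does it; it only shows that the same weighted sum serves both for a fragment's attributes and for evaluating a position from a patch-space location:

```python
def interpolate(weights, corners):
    """Barycentric weighted sum of three per-corner tuples."""
    u, v, w = weights
    a, b, c = corners
    return tuple(u * x + v * y + w * z for x, y, z in zip(a, b, c))


# Fragment case: interpolate a per-vertex attribute (a UV pair) at a pixel.
uv = interpolate((0.5, 0.25, 0.25), ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)))

# Domain-shader case: map a tessellator-generated patch-space location
# to a position using the patch's three corner positions.
pos = interpolate(
    (0.5, 0.25, 0.25),
    ((0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0)),
)
```

A real domain shader would typically evaluate a higher-order surface and add displacement, but the inputs (a patch-space location plus per-patch data) are the same.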

no-X · Sep 15, 2009

DegustatoR said:
Can a 128-bit GDDR5 card be faster than a 256-bit GDDR5 card on average?

If the bandwidth of the 256bit model isn't fully utilized, then yes.

HD4870 is only 20% faster than HD4770. HD4770 uses 3200MHz GDDR5. 3800MHz GDDR5 would provide 20% more bandwidth - more than enough to provide performance equal to HD4870.
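
The bandwidth arithmetic can be checked with a short sketch. It assumes the "MHz" figures above are effective per-pin data rates in MT/s and that both cards use a 128-bit bus; the function name is illustrative:

```python
def bandwidth_gb_s(bus_width_bits, data_rate_mt_s):
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_mt_s / 1000


hd4770 = bandwidth_gb_s(128, 3200)  # 51.2 GB/s
faster = bandwidth_gb_s(128, 3800)  # 60.8 GB/s
gain = faster / hd4770              # roughly 1.19, i.e. about 19% more
```

So the "20% more bandwidth" figure is a slight round-up of about 19%.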

3dilettante · Sep 15, 2009

300 Watts is 300 Watts.
It certainly is possible to make a product that draws more.
Selling it with the implication that it is supposed to work in a certified PCI-E slot is another can of worms.

I think it would be easier and less of a PR black eye to skim off the thermally best chips for an X2 SKU that wouldn't have much volume.

The 190W for Cypress probably includes some pretty wide guard bands that properly binned top chips can fit well within, perhaps with reduced voltage or clocks.

rpg.314 · Sep 15, 2009

Jawed said:
Variable data amplification? A bit like varying triangle sizes result in varying numbers of fragments?

Yes, that's what I had in mind.

And a variable count of attributes per vertex, which results in a variable number of interpolations per fragment?

{Extrapolating from opengl} For a VBO/VAO (or their equivalent in dx), isn't the number of attributes per vertex fixed per draw call at the minimum?

Jawed · Sep 15, 2009

nAo said:
Just because their arch is designed for high ILP it doesn't mean they cannot transform their dot products into multi-cycle MUL, MAD..MAD sequences, potentially freeing up unused lanes.

That increases the latency of the DP, which will generally increase wasted lanes as that generally increases the latency of the entire shader.

On the other hand they might have enough free lanes anyway in the average case that perhaps it's not worth the hassle to implement such an 'optimization'.

Yes, you could argue that there's typically 1-1.5 lanes free per clock in pixel shading. I honestly haven't spent much time on vertex shaders to have a good idea what happens there.

Overall, of course it's possible that the freed lanes can be used - it just seems relatively unlikely.

Jawed

Jawed · Sep 15, 2009

no-X said:
HD4870 is only 20% faster than HD4770.

Really? HD4870 is normally considered ~30% faster than HD4850 with 4xMSAA. While HD4770 occasionally exceeds HD4850, it's normally a bit slower.

I'd say HD4870 is typically 40% faster than HD4770 with a range of 20-45% at 4xMSAA and much higher if 8xMSAA is used. That's very much a guesstimate based on old reviews from April. Has HD4770 performance increased dramatically with drivers?

Jawed

DegustatoR · Sep 15, 2009

no-X said:
If the bandwidth of the 256bit model isn't fully utilized, then yes.

HD4870 is only 20% faster than HD4770. HD4770 uses 3200MHz GDDR5. 3800MHz GDDR5 would provide 20% more bandwidth - more than enough to provide performance equal to HD4870.

Equal - yes, it can. But you can buy a 4870 for as low as $130 now, which gives Juniper XT a hefty $70 premium for DX11 support. That's a bit too much of a premium from my point of view.

Jawed · Sep 15, 2009

rpg.314 said:
{Extrapolating from opengl} For a VBO/VAO (or their equivalent in dx), isn't the number of attributes per vertex fixed per draw call at the minimum?

It's fixed per draw call but the actual count of attributes per vertex can be as high as 16 (vec4) in D3D10, I believe. 10.1 and 11 support 32 if I remember right.

Jawed

Psycho · Sep 15, 2009

psolord said:
Fudzilla says that the 5870X2 will have a TDP of 376W and that ATI is trying to lower that.
Well if you double the 5870's TDP that's what you get!

I'm pretty sure he's just making up numbers. You can't just double the TDP of the single gpu card (for reference: 4870 160w (seems too low - 4890 is set to 190 and using about the same), 4870x2 286w). They can use better chips, the board power will probably be lower etc.

Even more funny is this: http://www.fudzilla.com/content/view/15508/1/
Linking the fake graphs based on fudo's own numbers - but at least he can claim to be right that way
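
As purely illustrative arithmetic using the numbers above (not a prediction), one can see how far the 4870X2 sat below a straight doubling and apply the same ratio to the doubled 376 W figure. The function name is made up for this sketch:

```python
def x2_scaling(single_tdp_w, dual_tdp_w):
    """Ratio of a dual-GPU board's TDP to twice the single-GPU board's TDP."""
    return dual_tdp_w / (2 * single_tdp_w)


ratio = x2_scaling(160, 286)       # about 0.89 for 4870 -> 4870X2
naive_doubling_w = 376             # the doubled figure quoted above
scaled_w = naive_doubling_w * ratio  # about 336 W if the same ratio held
```

Whether the same ratio would hold depends on binning, clocks and voltage, so this only shows that doubling is an upper bound rather than an estimate.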

Rangers · Sep 15, 2009

mczak said:
Well in some titles you can clearly see the GTX 295 not having enough vram (scaling worse going from 1920x1200 8xAA to 2560x1600 than what you'd expect), if you'd disregard these results it wouldn't be too bad. At least CoD: Waw, Devil May Cry 4, Riddick, Wolfenstein are obviously affected by this, there could be more. Still, the HD5870 sure looks good.

I see the same trend with 285 v. 5850? What's its excuse?

Psycho · Sep 15, 2009

Rangers said:
I see the same trend with 285 v. 5850? What's its excuse?

Seems like ATI has better memory management and/or compression.
I remember the trend from the 285 SLI/295 launch reviews, i.e. the same game at many resolutions and AA levels, where the 4870X2 is behind until a certain point where the 295 drops, and at the next step the 285 SLI drops. Of course it's only in a few games that you can see them both drop at different times. IIRC it was most clear in either the computerbase or pcgh review.

psolord · Sep 15, 2009

Sxotty said:
Do you know what a 295 is? I fail to see how you would be surprised that half the GPUs on a smaller process would use less power even if they have more transistors per GPU.

Did I express my surprise there? I just stated the facts!

MfA said:
If you are overclocking why do you care about design standard limits? So you drop another .1 volt over the wires and get them a little warmer ... care? It's not like there are circuit breakers in them.

I don't know how to do that!

Psycho said:
I'm pretty sure he's just making up numbers. You can't just double the TDP of the single gpu card (for reference: 4870 160w (seems too low - 4890 is set to 190 and using about the same), 4870x2 286w). They can use better chips, the board power will probably be lower etc.

Even more funny is this: http://www.fudzilla.com/content/view/15508/1/
Linking the fake graphs based on fudo's own numbers - but at least he can claim to be right that way

We'll see then. Thanks!:smile:

Dave Baumann · Sep 15, 2009

Psycho said:
Even more funny is this: http://www.fudzilla.com/content/view/15508/1/
Linking the fake graphs based on fudo's own numbers - but at least he can claim to be right that way

Actually the claim shifted between then and now. The original claim was RV770, now it's moved to RV790...

MfA · Sep 15, 2009

psolord said:
I don't know how to do that!

There is nothing to do. If you OC the card and it pulls more current through the wires than what 6-pin power cables are rated for, that is exactly what happens. The system doesn't just magically stop working if you cross the line in the sand.

rpg.314 · Sep 15, 2009

Jawed said:
It's fixed per draw call but the actual count of attributes per vertex can be as high as 16 (vec4) in D3D10, I believe. 10.1 and 11 support 32 if I remember right.

Jawed

As long as the number of attributes per vertex is fixed (for one draw call), does it matter what the upper limit is?

Sxotty · Sep 15, 2009

psolord said:
Did I express my surprise there? I just stated the facts!

You said you were comparing 2 GTX 295s to 1 5870X2. It just seemed silly to get all excited that the TDP was less. It is obvious it would be less, so what was your point?

Bah whatever, I don't care about this anyway. *twiddles thumbs waiting for the 22nd*

no-X · Sep 15, 2009

Jawed said:
Really? HD4870 is normally considered ~30% faster than HD4850 with 4xMSAA. While HD4770 occasionally exceeds HD4850, it's normally a bit slower.

I'd say HD4870 is typically 40% faster than HD4770 with a range of 20-45% at 4xMSAA and much higher if 8xMSAA is used. That's very much a guesstimate based on old reviews from April. Has HD4770 performance increased dramatically with drivers?

Jawed

I compared 512MB models to avoid the influence of different VRAM capacity (1680*1050, AA4x/AF16x, computerbase performance rating). HD4870 1GB is significantly faster than HD4770 512MB (up to 40%) but that's due to its VRAM capacity, not its bandwidth... HD4770 1GB results would be very interesting, I think it could outperform HD4850 512MB.
