AMD: R8xx Speculation

no-X · Sep 15, 2009

air_ii said:
You don't account for costs until you sell the stocks (cost of sales). That's not how it works in financial reporting...

I admit that I'm definitely not a financial expert, but please tell me: why did Jen Hsun explain part of their loss by a huge stock of unsold GT200/65nm GPUs, if those costs aren't accounted for?

rpg.314 · Sep 15, 2009

Jawed said:
It then assembles vertex coordinates and weights for an interpolation kernel to consume.

This is where I disagree. Due to the variable data amplification nature of the tessellator (among other things), I doubt they do it in a tessellation kernel. Come to think of it, if this was the case, they could very well perform rasterization using the same setup. The tessellator would generate the interpolation weights that would be used to produce pixels from vertices.

I think we're agreeing, but you're seeing interpolation from the texture filtering point of view, while I'm seeing it from the vertex attribute rasterisation point of view.

No, I am seeing vertex attribute interpolation and tessellation as two uses of the same operation: linear interpolation.

Demirug · Sep 15, 2009

It seems there is some confusion about how DX11 tessellation works.

The fixed-function part of the tessellator only calculates the domain location of each point, based on the configuration that the hull shader calculates for each patch. The whole interpolation of the vertex attributes needs to be done in the domain shader, which is called once for every calculated domain location.
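
To make that split concrete, here is a minimal Python sketch, not real DX11 code. It assumes a quad patch with four corner control points and one uniform tessellation factor. The names `tessellator_domain_points` and `domain_shader` are made up for illustration, and the per-edge factors and partitioning modes of the real fixed-function stage are ignored.

```python
def tessellator_domain_points(tess_factor):
    """Stand-in for the fixed-function stage: emits only (u, v) domain
    locations in [0, 1] x [0, 1]. It never touches vertex attributes."""
    n = int(tess_factor)
    return [(i / n, j / n) for j in range(n + 1) for i in range(n + 1)]


def lerp(a, b, t):
    """Component-wise linear interpolation between two tuples."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def domain_shader(control_points, uv):
    """Runs once per domain location and does all attribute interpolation.
    control_points holds four corner attributes ordered
    (u0v0, u1v0, u0v1, u1v1). Bilinear is used purely for simplicity."""
    u, v = uv
    p00, p10, p01, p11 = control_points
    return lerp(lerp(p00, p10, u), lerp(p01, p11, u), v)


# Hypothetical patch: a unit quad with one corner lifted in z.
patch = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
domain_points = tessellator_domain_points(2)  # the hull shader would supply the factor
vertices = [domain_shader(patch, uv) for uv in domain_points]
```

The point of the sketch is only the division of labour: the first function knows nothing about the patch data, and the second is the only place where control-point attributes are read.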

mczak · Sep 15, 2009

Jawed said:
"More flexible dot products" implies to me that a DP3 will no longer waste a lane - currently DP3 is implemented as a DP4 with the fourth lane idle. I've never investigated what happens with the DP2 instructions (I think there's a couple).

There aren't any DP2/DP2A/DP3 instructions at all - it's all done with DP4. So, as you say, maybe this has changed to not waste 1 or 2 lanes. Also, DP4 could only write to the .x component, so there's room for improvement there too, I guess.
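
A rough Python sketch of what that waste looks like. It models only the arithmetic, not the actual ISA or its scheduling, and the function names are invented for illustration.

```python
def dp4(a, b):
    """Four-lane dot product: one multiply per lane, then a sum."""
    return sum(x * y for x, y in zip(a, b))


def dp3_via_dp4(a, b):
    """DP3 emulated with DP4: pad the fourth lane with zeros so it
    contributes nothing. The lane is still occupied, just idle."""
    return dp4((*a, 0.0), (*b, 0.0))


def lane_utilisation(useful_lanes, issued_lanes=4):
    """Fraction of issued lanes doing useful work."""
    return useful_lanes / issued_lanes


normal = (0.0, 0.6, 0.8)
light = (0.0, 0.0, 1.0)
n_dot_l = dp3_via_dp4(normal, light)

dp3_util = lane_utilisation(3)  # one of four lanes idle
dp2_util = lane_utilisation(2)  # two of four lanes idle
```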

Rangers · Sep 15, 2009

5870 schooling a 295 for the most part

5850 punishing 285 might be even more impressive.

Looks like they could use some driver optimization in Crysis though.

Karoshi · Sep 15, 2009

stevem said:
Haven't looked at the specs of the PWM Volterra GDDR5 controller, but no probs with programmable input. Don't know if it's dynamic.

The guys at XtremeSystems have identified it. They say it is a known product and should work out of the box with the current RivaTuner (IIRC). There was much happiness and the natives were dancing and drinking around the fire, singing songs of 1GHz plus.

Subtlesnake · Sep 15, 2009

Rangers said:
5870 schooling a 295 for the most part

Only with 8xAA, which has always been ATI's strong suit.

mczak · Sep 15, 2009

Rangers said:
5870 schooling a 295 for the most part

Well, in some titles you can clearly see the GTX 295 not having enough VRAM (scaling worse going from 1920x1200 8xAA to 2560x1600 than you'd expect); if you disregard these results it wouldn't be too bad. At least CoD: WaW, Devil May Cry 4, Riddick and Wolfenstein are obviously affected by this, and there could be more. Still, the HD5870 sure looks good.

trinibwoy · Sep 15, 2009

Subtlesnake said:
Only with 8xAA, which has always been ATI's strong suit.

Yep, it's carried over from RV770. I suppose review conclusions are going to be somewhat different based on the games, resolution and settings used.

Sxotty · Sep 15, 2009

trinibwoy said:
But there's no practical value to blindly chasing a "sweet spot" strategy. It all depends on the competitive environment (an argument many have made in the past). Your disappointment is a product of a belief in ATI's generosity. Besides, the 5850 is still $299...

True, but I can tell you I would be far more likely to buy a card if it was $299. No, I will happily wait until competition arrives, even if it takes until after Xmas. I am not saying their strategy is bad, just that they will lose my contribution until later. That is perfectly fine though: they can milk early adopters, then lower the price and get the next wave, and so on until the rest of the bottom dwellers crawl onboard.

psolord · Sep 15, 2009

Fudzilla says that the 5870X2 will have a TDP of 376W and that ATI is trying to lower that.

Well, if you double the 5870's TDP, that's what you get!

Are you annoyed by this, guys?

The way I see it, two GTX 295s would have a 600W TDP, so there is a significant gain here.

Do you think that they would actually ship a card with two 8-pin connectors? And even then, wouldn't that mean there would be no way to overclock this baby? I mean, PCIe + 8-pin + 8-pin = 75W + 150W + 150W = 375W, which is just about exactly the card's TDP!

Do you think that they will be able to keep idle consumption low on such a card, or is the minimum we should expect 2x the 5870's idle?
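
The connector arithmetic above, written out as a quick Python check. The 75W and 150W figures and the 376W TDP are the ones used in this post (the TDP is only a rumour), and `board_power_limit` is just a name made up here.

```python
PCIE_SLOT_W = 75       # slot power figure used in the post
EIGHT_PIN_W = 150      # per 8-pin connector, figure used in the post
RUMOURED_TDP_W = 376   # Fudzilla number quoted above, unconfirmed


def board_power_limit(eight_pin_connectors):
    """Total power available from the slot plus N 8-pin connectors."""
    return PCIE_SLOT_W + eight_pin_connectors * EIGHT_PIN_W


limit = board_power_limit(2)
headroom = limit - RUMOURED_TDP_W  # what would be left over for overclocking
```

With these numbers the headroom comes out at -1W, which is why the overclocking question comes up at all.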

Rangers · Sep 15, 2009

Subtlesnake said:
Only with 8xAA, which has always been ATI's strong suit.

hmm, you're right..

Jawed · Sep 15, 2009

Demirug said:
It seems there is some confusion about how DX11 tessellation works.

The fixed-function part of the tessellator only calculates the domain location of each point, based on the configuration that the hull shader calculates for each patch. The whole interpolation of the vertex attributes needs to be done in the domain shader, which is called once for every calculated domain location.

The "fixed function" tessellator is computing the location of the new vertex (interpolating multiple vertices based on control points and tessellation factors). The debate we're having is over the use of programmable ALUs (as Marco indicated) to perform the interpolation required by TS.

After TS, DS is used to interpolate the original control points' attributes (e.g. normal) for the new naked vertices (which are points only). DS is converting points into vertices by giving the vertices more properties than merely location. DS can also be used to perform displacement mapping, which then alters the computed location of the vertices - but not necessarily by using any kind of interpolation.

Rasterisation then requires a final interpolation of attributes, per pixel, derived solely from the per-vertex attributes.

So, apparently with R800, these three kinds of interpolation are all running on the ALUs. DS is programmable interpolation, but TS and RS require fixed-function interpolation.

So, the question is what is the mechanism by which ATI TS accesses the ALUs to obtain interpolated points for the new vertices. I'm merely suggesting that TS generates two input streams for an IN shader to consume and spit out vertex locations. This is similar to the way that NVidia's MI consumes two streams: the plane equation constants A, B and C (one set of these per primitive), plus attributes, to spit out an interpolated attribute for a pixel.
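
For reference, a minimal Python sketch of the plane-equation style of interpolation: one set of A, B, C per primitive per attribute, then an evaluation per pixel. This is only the textbook screen-space form under the assumption of no perspective correction. It says nothing about how either vendor's hardware actually does it, and the function names are invented.

```python
def plane_equation(p0, p1, p2, a0, a1, a2):
    """Per-primitive setup: solve attr(x, y) = A*x + B*y + C through the
    three screen-space vertices p0..p2 carrying attribute values a0..a2."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)  # twice the signed area
    if det == 0:
        raise ValueError("degenerate triangle")
    A = ((a1 - a0) * (y2 - y0) - (a2 - a0) * (y1 - y0)) / det
    B = ((a2 - a0) * (x1 - x0) - (a1 - a0) * (x2 - x0)) / det
    C = a0 - A * x0 - B * y0
    return A, B, C


def interpolate(plane, x, y):
    """Per-pixel work: one multiply-add per axis using the stored constants."""
    A, B, C = plane
    return A * x + B * y + C


# Hypothetical triangle with a scalar attribute of 0, 1 and 2 at its corners.
plane = plane_equation((0, 0), (4, 0), (0, 4), 0.0, 1.0, 2.0)
value_at_pixel = interpolate(plane, 1, 1)
```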

Similarly, during PS an attribute (e.g. normal) is interpolated on-demand, just like in NVidia. NVidia seemingly does this by generating extra instructions in the compiled kernel. ATI might do this, or might have an IN kernel that takes the place of the SPI unit in older architectures. SPI generates a full set of all required attributes before PS commences.

IN would then have an output buffer which it is allowed to fill. This output buffer would be a primary parameter for scheduling IN, basically presenting PS with a set of on-demand attributes.

Obviously, just guessing here.

Jawed

Sxotty · Sep 15, 2009

psolord said:
Fudzilla says that the 5870X2 will have a TDP of 376W and that ATI is trying to lower that.

Well, if you double the 5870's TDP, that's what you get!

Are you annoyed by this, guys?

The way I see it, two GTX 295s would have a 600W TDP, so there is a significant gain here.

Do you think that they would actually ship a card with two 8-pin connectors? And even then, wouldn't that mean there would be no way to overclock this baby? I mean, PCIe + 8-pin + 8-pin = 75W + 150W + 150W = 375W, which is just about exactly the card's TDP!

Do you think that they will be able to keep idle consumption low on such a card, or is the minimum we should expect 2x the 5870's idle?

Do you know what a 295 is? I fail to see how you would be surprised that half the GPUs on a smaller process would use less power even if they have more transistors per GPU.

Jawed · Sep 15, 2009

rpg.314 said:
This is where I disagree. Due to the variable data amplification nature of the tessellator (among other things), I doubt they do it in a tessellation kernel.

Variable data amplification? A bit like varying triangle sizes result in varying numbers of fragments? And a variable count of attributes per vertex, which results in a variable number of interpolations per fragment?

Jawed

DegustatoR · Sep 15, 2009

Sxotty said:
True, but I can tell you I would be far more likely to buy a card if it was $299. No, I will happily wait until competition arrives, even if it takes until after Xmas. I am not saying their strategy is bad, just that they will lose my contribution until later. That is perfectly fine though: they can milk early adopters, then lower the price and get the next wave, and so on until the rest of the bottom dwellers crawl onboard.

The real test for early adopters will come in the form of Juniper which if you believe the rumours will cost $199 -- but will it be faster than RV770?

Sound_Card · Sep 15, 2009

DegustatoR said:
The real test for early adopters will come in the form of Juniper which if you believe the rumours will cost $199 -- but will it be faster than RV770?

Are you saying that's hard to believe?

nAo · Sep 15, 2009

Jawed said:
The "fixed function" tessellator is computing the location of the new vertex (interpolating multiple vertices based on control points and tessellation factors). The debate we're having is over the use of programmable ALUs (as Marco indicated) to perform the interpolation required by TS.

Little misunderstanding: I was talking about values interpolated in the pixel shader.
IIRC the fixed-function tessellator doesn't directly generate new vertex positions per se (though it does generate the final topology information). It 'only' computes their position in 'patch space'; the domain shader then computes the real position in space (plus position displacement, normal, UV coords, etc.).

Jawed · Sep 15, 2009

mczak said:
There aren't any DP2/DP2A/DP3 instructions at all - all done with DP4.

There you go then, as wasteful as I expected. If the DP2/3 wastage is no longer occurring then that means I can stop hunting through the assembly for them and striking off wasted lanes when determining utilisation. Still have to go hunting for NOPs and useless MOVs...

So, as you say, maybe this has changed to not waste 1 or 2 lanes. Also, DP4 could only write to the .x component, so there's room for improvement there too, I guess.

Good idea, that's conceivably interesting, perhaps it's in there too.

Jawed

DegustatoR · Sep 15, 2009

Sound_Card said:
Are you saying that's hard to believe?

Can a 128-bit GDDR5 card be faster than a 256-bit GDDR5 card on average? I think it'll depend on what shader core frequencies Juniper XT will have. But in bandwidth-limited situations it'll probably be slower anyway.
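
A back-of-the-envelope Python sketch of the bandwidth side of that. The per-pin data rate is a placeholder assumed to be the same on both cards, since actual memory clocks aren't known here, and `memory_bandwidth_gb_s` is a made-up helper name.

```python
def memory_bandwidth_gb_s(bus_width_bits, data_rate_gbps_per_pin):
    """Peak bandwidth = bus width (bits) * per-pin data rate (Gbit/s) / 8."""
    return bus_width_bits * data_rate_gbps_per_pin / 8


ASSUMED_RATE = 4.8  # Gbit/s per pin; hypothetical, identical for both cards
narrow = memory_bandwidth_gb_s(128, ASSUMED_RATE)
wide = memory_bandwidth_gb_s(256, ASSUMED_RATE)
ratio = wide / narrow
```

At equal memory speed the narrower bus has half the peak bandwidth, so it would need twice the data rate to break even on that metric alone.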

