Upcoming ATI Radeon GPUs (45/40nm)

CarstenS · Jul 18, 2008

nicolasb said:
AA was fixed in RV770. Z-fill rate was fixed in RV770. AF may have been fixed too, although that's less clear.

AFAIK AF was never broken, and according to AMD neither were AA or Z-fill. But that's another story.

fellix · Jul 18, 2008

The depth/stencil rate was simply bumped up, not "fixed".

By the way, concerning the Z/S rate: how viable would it be for R800 to double it again and match NV's current rate, or is hierarchical Z/S buffering already effective enough to make that unnecessary?
Judging from ArchMark numbers, there should still be room for improvement, considering the bandwidth limitations.

ShaidarHaran · Jul 18, 2008

chavvdarrr said:
DUH!
R600 was so severely short on texturing capability that texturing became the bottleneck, so its relatively good shader speed went unnoticed.
Increasing texturing power removed the bottleneck, so now shading power can be better utilized.
Z speed and AA/AF were fixed as well.
That's what SH is implying, IMHO.
Anyone claiming R600 has "enough" texturing power should take a look at ATi's market and mind share.

Indeed. Thanks for elaborating for me. Dunno why I couldn't explain the situation properly...

ShaidarHaran · Jul 18, 2008

Chris123234 said:
Regardless of how underpowered one aspect of it was, if everything is basically increased in a 1:1 fashion, nothing is actually being "fixed". While it may be coincidence that texturing power is now much more robust, it was not singled out as a main source of "fixing", because both it and the shading power have increased at the same rate. I do not understand how that can so easily be construed as fixing.

It would be like saying that R700 somehow fixes something in RV770 because it has 100% more of both shading and texturing elements.

This has nothing to do with the ALU:TEX ratio being out of whack in RV770. It was out of whack in RV670 because there wasn't enough texturing hardware for the task at hand.

All of you arguing against this are hung up on the fact that the ratio remained unchanged; the actual capability did not, and it is unwise to ignore that. The 16 TMUs ATi had been using since R420 had finally become a bottleneck with RV670 (it actually started with RV570, but there it was far less obvious, and RV570 was a much better-balanced architecture).

Shaders were going unused because there simply were not enough texture units to handle the workload. Don't take my word for it: fire up your favorite D3D app and Microsoft's PIX and analyze it yourself. You can see the ALU:TEX ratio and the amount of time spent per frame on each instruction. If someone has an RV670 and an RV770, this would be blatantly obvious in testing. I'd be happy to do this testing myself, but the only video hardware I've got in the house is G92- and RV560-based.

I do have to apologize for starting off my comments on the matter by saying that the ratio was too high; I specifically meant for RV670. Sorry for not clarifying earlier.

no-X · Jul 18, 2008

If shaders were unused on RV670, they are still unused on RV770, so nothing was fixed from this POV.

Anyway, G94 had twice as many TMUs as RV670, but its performance was about the same, so I don't think RV670 was bottlenecked by its texturing core.

ShaidarHaran · Jul 18, 2008

no-X said:
If shaders were unused on RV670, they are still unused on RV770, so nothing was fixed from this POV.

There will almost always be idle execution units per clock cycle, usually ALUs. IOW: duh!

You're missing the point, though. Let's say the current frame being rendered has a large number of texture lookups per pixel in the shader core. What happens if you have too few TMUs to meet these requests? Your shader core goes idle because of the texturing bottleneck!
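To make the bottleneck argument concrete, here is a minimal toy model. It is not a description of how R6xx/R7xx hardware actually schedules work: it assumes every ALU lane and every texture unit retires one operation per clock, that the two run fully in parallel, and it ignores latency hiding, caches, and VLIW packing. The function name `clocks_per_batch` and the workload numbers are hypothetical.

```python
# Hypothetical toy throughput model, not a real GPU scheduler:
# each pixel needs `alu_ops` ALU-lane operations and `tex_ops` texture fetches,
# every unit retires one op per clock, and ALU and TEX work overlap perfectly.

def clocks_per_batch(pixels, alu_ops, tex_ops, alu_lanes, tex_units):
    """Return (total_clocks, alu_busy_fraction, tex_busy_fraction)."""
    alu_clocks = pixels * alu_ops / alu_lanes
    tex_clocks = pixels * tex_ops / tex_units
    total = max(alu_clocks, tex_clocks)  # the slower unit type sets the pace
    return total, alu_clocks / total, tex_clocks / total


if __name__ == "__main__":
    # Hypothetical texture-heavy shader (10 ALU ops per fetch) on
    # 320 ALU lanes and 16 texture units, a 20:1 hardware ratio.
    total, alu_busy, tex_busy = clocks_per_batch(
        pixels=1_000_000, alu_ops=40, tex_ops=4, alu_lanes=320, tex_units=16
    )
    print(f"clocks={total:.0f} alu_busy={alu_busy:.0%} tex_busy={tex_busy:.0%}")
```

With this made-up texture-heavy shader, the texture units are saturated while the ALU lanes sit idle half the time, which is the situation being described.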

no-X said:
Anyway, G94 had twice as many TMUs as RV670, but its performance was about the same, so I don't think RV670 was bottlenecked by its texturing core.

Different u-archs; not a legitimate comparison.

Dave Baumann · Jul 18, 2008

ShaidarHaran said:
Shaders were going unused because there simply were not enough texture units to handle the workload.

And here, again, is where it all falls down, because this situation is entirely unchanged in RV770, since it has exactly the same ratio of texture units to ALUs; there is just more of everything.

Again, this line of thought would have merit if texture and ALU capability had gone up at different rates, but they simply didn't.
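The "same ratio" argument can be checked with a toy model. It assumes, hypothetically, that each ALU lane and each texture unit retires one op per clock, that the two overlap perfectly, and that there are no memory or setup limits. The unit counts are the commonly quoted ones (RV670: 320 ALU lanes, 16 TMUs; RV770: 800 ALU lanes, 40 TMUs); the workloads and the names `clocks_per_batch` and `compare` are made up for illustration.

```python
# Hypothetical toy model: one op per unit per clock, perfect ALU/TEX overlap,
# no bandwidth, setup, or latency effects.

def clocks_per_batch(pixels, alu_ops, tex_ops, alu_lanes, tex_units):
    alu_clocks = pixels * alu_ops / alu_lanes
    tex_clocks = pixels * tex_ops / tex_units
    total = max(alu_clocks, tex_clocks)
    return total, alu_clocks / total  # (clocks, ALU busy fraction)


def compare(alu_ops, tex_ops):
    # RV670: 320 ALU lanes / 16 TMUs; RV770: 800 / 40 (both scaled by 2.5x).
    old = clocks_per_batch(1_000_000, alu_ops, tex_ops, 320, 16)
    new = clocks_per_batch(1_000_000, alu_ops, tex_ops, 800, 40)
    return old[0] / new[0], old[1], new[1]  # (speedup, old util, new util)


if __name__ == "__main__":
    for alu_ops, tex_ops in [(40, 4), (100, 2)]:  # tex-heavy, then ALU-heavy
        speedup, util_old, util_new = compare(alu_ops, tex_ops)
        print(f"{alu_ops}:{tex_ops} speedup={speedup:.2f} "
              f"alu_busy {util_old:.0%} -> {util_new:.0%}")
```

Within this model, both the texture-heavy and the ALU-heavy shader get exactly a 2.5x speedup with identical utilization, so a pure ALU:TEX model cannot by itself show RV770 escaping a texture bottleneck that RV670 had; some other limiter would have to enter the picture.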

ShaidarHaran said:
You're missing the point, though. Let's say the current frame being rendered has a large number of texture lookups per pixel in the shader core. What happens if you have too few TMUs to meet these requests? Your shader core goes idle because of the texturing bottleneck!

And? This is no different in RV770. But everything has bottlenecks on different workloads. It's easy to find shaders that are bottlenecked by the ALUs, leaving the texture units underutilized.

ShaidarHaran · Jul 18, 2008

Dave Baumann said:
And here, again, is where it all falls down, because this situation is entirely unchanged in RV770, since it has exactly the same ratio of texture units to ALUs; there is just more of everything.

Again, this line of thought would have merit if texture and ALU capability had gone up at different rates, but they simply didn't.

Wavey, you keep coming back to the same argument and I have already stated this is not my position.

If I were a programmer (where are Jawed and Humus when you need them?), I'd write an app to run on both RV670 and RV770 to demonstrate this. All I can do in the meantime is refer back to my suggestion to use PIX.

willardjuice · Jul 18, 2008

ShaidarHaran said:
I believe R7xx is a "correction" to the mistake that was R6xx and its horrible lack of texturing, Z-fill, and AA sample rates.

ShaidarHaran said:
Why then did your engineers increase texture filtering/sampling performance by 250% this generation?

willardjuice said:
Yeah they also increased their shading power by 250% too. Is SH suggesting the R6x0 was "severely short" on shading power?

Well, do you? Based on your logic, I would say yes. What you're essentially telling us is that AMD's engineers thought R6x0 was texture-limited and their "correction" to this problem was to increase both shading and texturing power by 250%. I don't know if I follow that logic. I think it's more likely that AMD's engineers found a way to optimize their new core so that it could have a 250% improvement in shading and texturing power over their previous generation (and, as a side effect, their texture problem was "corrected").

Dave Baumann · Jul 18, 2008

ShaidarHaran said:
If I were a programmer (where are Jawed and Humus when you need them?), I'd write an app to run on both RV670 and RV770 to demonstrate this. All I can do in the meantime is refer back to my suggestion to use PIX.

And, assuming no other bottlenecks, RV770 would show exactly the same utilization, just faster to the degree that the overall engine scaled up.

3dilettante · Jul 18, 2008

In terms of hardware units, RV770 is no more texture limited than RV670.
When referenced with actual workloads, however, it appears there is a return on investment curve when it comes to having X number of units of any type.

We can see diminishing returns for adding tons of extra texture units. Nvidia's higher TMU counts didn't translate into linear increases in performance.

On the other hand, it could be argued that R6xx's base level of texturing capability--irrespective of ALU:TEX--was pathologically small compared to the texturing threshold many game workloads would need before other bottlenecks (bandwidth, setup, etc.) started to take precedence.
Diminishing returns can occur with an excess of capability, but escalating costs can be associated with falling below the baseline that many workloads would consider adequate.

In that regard, R600 could be considered too small for some workloads, and RV770 is what happens when a design grows up to match its tasks.
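Diminishing returns fall out of a simple toy model once a third limiter that does not scale with TMU count is added. The sketch below assumes a hypothetical fixed memory-traffic cost per pixel and an arbitrary bandwidth figure; none of the numbers correspond to a real chip, and `clocks` is an illustrative name.

```python
# Hypothetical toy model of diminishing returns: time per batch is set by the
# slowest of ALU work, texture work, and memory traffic. Memory bandwidth is
# held fixed, so adding TMUs stops helping once memory becomes the limiter.

def clocks(pixels, alu_ops, tex_ops, mem_units, alu_lanes, tex_units, mem_per_clock):
    return max(
        pixels * alu_ops / alu_lanes,        # ALU-bound time
        pixels * tex_ops / tex_units,        # texture-bound time
        pixels * mem_units / mem_per_clock,  # bandwidth-bound time
    )


if __name__ == "__main__":
    # Hold ALU lanes and bandwidth fixed; sweep only the TMU count.
    for tmus in (8, 16, 24, 32, 40, 64):
        c = clocks(1_000_000, alu_ops=40, tex_ops=4, mem_units=2,
                   alu_lanes=800, tex_units=tmus, mem_per_clock=16)
        print(f"{tmus:>2} TMUs -> {c:>9.0f} clocks")
```

In this made-up case, going from 8 to 16 TMUs halves the time and 16 to 32 halves it again, but beyond 32 TMUs nothing improves because the hypothetical memory limit binds: steep costs below the baseline, flat returns above it.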

ShaidarHaran · Jul 18, 2008

Dave Baumann said:
And, assuming no other bottlenecks, RV770 would show exactly the same utilization, just faster to the degree that the overall engine scaled up.

I disagree wholeheartedly, and were the aforementioned hardware available to me I would be more than happy to prove it.

ShaidarHaran · Jul 18, 2008

3dilettante said:
In terms of hardware units, RV770 is no more texture limited than RV670.
When referenced with actual workloads, however, it appears there is a return on investment curve when it comes to having X number of units of any type.

We can see diminishing returns for adding tons of extra texture units. Nvidia's higher TMU counts didn't translate into linear increases in performance.

On the other hand, it could be argued that R6xx's base level of texturing capability--irrespective of ALU:TEX--was pathologically small compared to the texturing threshold many game workloads would need before other bottlenecks (bandwidth, setup, etc.) started to take precedence.
Diminishing returns can occur with an excess of capability, but escalating costs can be associated with falling below the baseline that many workloads would consider adequate.

Thanks 3d, you've explained this far better than I.

3dilettante said:
In that regard, R600 could be considered too small for some workloads, and RV770 is what happens when a design grows up to match its tasks.

Yes! Perfect summary explanation.

Dave Baumann · Jul 18, 2008

3dilettante said:
In that regard, R600 could be considered too small for some workloads, and RV770 is what happens when a design grows up to match its tasks.

And you've just said nothing different from: the new generation is better than the prior one because it has more engine to cope with the workloads.

ShaidarHaran · Jul 18, 2008

Dave Baumann said:
And you've just said nothing different from: the new generation is better than the prior one because it has more engine to cope with the workloads.

Again, it's about base ability. You're hung up on the ratio because that's all ATi has preached for years now.

willardjuice · Jul 18, 2008

3dilettante said:

In that regard, R600 could be considered too small for some workloads, and RV770 is what happens when a design grows up to match its tasks.

All that means is that RV770 improved performance over R6x0. How is that evidence that AMD specifically went out of their way to make a "correction" to their "texturing problems"?

ShaidarHaran · Jul 18, 2008

willardjuice said:
All that means is that RV770 improved performance over R6x0. How is that evidence that AMD specifically went out of their way to make a "correction" to their "texturing problems"?

Why is this topic so hard for some to grasp?

Someone needs to do some testing here, because the answer seems blatantly obvious to me but words obviously aren't enough to convince everyone.

Dave Baumann · Jul 18, 2008

ShaidarHaran said:
Again, it's about base ability. You're hung up on the ratio because that's all ATi has preached for years now.

In relation to R600, the base ability does not change, other than the fact that the entire engine has scaled up to a different baseline. This is not a correction of any principles adopted with R600; it is taking advantage of newer processes and engineering to increase the performance of the new architecture, much as other architectural generations have done before.

willardjuice · Jul 18, 2008

ShaidarHaran said:
Why is this topic so hard for some to grasp?

Someone needs to do some testing here, because the answer seems blatantly obvious to me but words obviously aren't enough to convince everyone.

Explain to me why, on RV770, AMD increased their shading power at the same rate as their texturing power if R6x0 was limited only by texturing power and not by shading power.

ShaidarHaran · Jul 18, 2008

Dave Baumann said:
In relation to R600, the base ability does not change, other than the fact that the entire engine has scaled up to a different baseline. This is not a correction of any principles adopted with R600; it is taking advantage of newer processes and engineering to increase the performance of the new architecture, much as other architectural generations have done before.

What do you mean, the base ability doesn't change? 40 > 16, the last time I checked... Also, you're now getting into the reasoning behind said uarch changes; any improvement that addresses a severe bottleneck is a correction. I don't see how it could be defined any other way.

Question: was RV670 tex-bound a majority of the time (excepting for extremely AA-heavy or CFAA scenarios)?
Answer: definitively, resoundingly, yes.
Now apply the same question to RV770 and what happens? We see the opposite. RV770 is far from tex-bound.

Going by your logic (the ratio remains the same between the two), this does not fit your framework. When real-world results don't conform to your hypothesis, your hypothesis is wrong.
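Both "40 > 16" and "the ratio is unchanged" are arithmetically true at once, which is part of why this exchange goes in circles. A quick check using the commonly quoted unit counts (320 ALU lanes and 16 TMUs for RV670, 800 and 40 for RV770); the dictionary and helper names are illustrative only, and clock speeds are deliberately left out.

```python
# Commonly quoted unit counts: (ALU lanes / stream processors, texture units).
SPECS = {"RV670": (320, 16), "RV770": (800, 40)}


def alu_tex_ratio(chip):
    alu, tex = SPECS[chip]
    return alu / tex


def scale(resource_index):
    # Growth from RV670 to RV770 (index 0 = ALU lanes, index 1 = TMUs).
    return SPECS["RV770"][resource_index] / SPECS["RV670"][resource_index]


if __name__ == "__main__":
    for chip, (alu, tex) in SPECS.items():
        print(f"{chip}: {alu} ALU lanes, {tex} TMUs, "
              f"ALU:TEX = {alu_tex_ratio(chip):g}:1")
    print(f"ALU lanes x{scale(0):g}, TMUs x{scale(1):g}")
```

Per clock, absolute texturing capability rises 2.5x while the ratio stays at 20:1, so the disagreement is over which of the two is the right frame, not over the numbers themselves.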
