Upcoming ATI Radeon GPUs (45/40nm)

3dilettante · Jul 18, 2008

Dave Baumann said:
And you've just said nothing different from: the new generation is better than the prior one because it has more engine to cope with the workloads.

It's only identical if we assume RV770 had to scale up all elements in the engine to the same degree. Because of the constraints of the design, that's pretty much how it went, but I didn't see as much strident criticism about RV670 lacking ALU capacity.

The ROI curves for ALU capacity and texture capability do not match, as G80 and its ilk indicate.

Chris123234 · Jul 18, 2008

ShaidarHaran said:
Why is this topic so hard for some to grasp?

Someone needs to do some testing here, because the answer seems blatantly obvious to me, but words obviously aren't enough to convince everyone.

In other words, you're saying that if an RV770 were cut down (disabled?) to the same number of units as RV670 (in a 1:1 fashion, just as it was scaled up) and set to the same clocks, it would show some specific improvements due to certain fixes? Or am I misunderstanding something?

Dave Baumann · Jul 18, 2008

3dilettante said:
It's only identical if we assume RV770 had to scale up all elements in the engine to the same degree. Because of the constraints of the design, that's pretty much how it went, but I didn't see as much strident criticism about RV670 lacking ALU capacity.

You're assuming too much about the design constraints there. RV7xx still retains the 2D scalability of RV6xx, so it would have been easy to scale textures at a disproportionate rate to ALUs.

fellix · Jul 18, 2008

TMU quads in RV770 are a sort of scaled-down version of R600's (more precisely, with altered priorities), so in that regard there should be no "perfect" scaling across the board; it depends on what is being stressed. But don't forget that there are significant improvements in the caching hierarchy, so all these parameters need to be carefully factored in.

Dave Baumann · Jul 18, 2008

ShaidarHaran said:
What do you mean the base ability doesn't change? 40 > 16, the last time I checked... Also, you're now getting into the reasoning behind said uarch changes; any improvement that addresses a severe bottleneck is a correction. I don't see how it could be defined any other way.

If texture were a bottleneck, texture could have been increased at a different rate from ALU. This did not happen.

AlNom · Jul 18, 2008

ShaidarHaran said:
What do you mean the base ability doesn't change?

It's still a 4:1 ALU:TEX ratio.
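For reference, the 4:1 figure can be checked from the chips' published unit counts: RV670 has 320 stream processors (64 five-wide VLIW units) and 16 TMUs, while RV770 has 800 stream processors (160 VLIW units) and 40 TMUs. The short Python sketch below assumes the ratio counts VLIW units, not individual stream processors, per TMU; the names `CHIPS`, `alu_tex_ratio` and `scale_factor` are illustrative only.

```python
from fractions import Fraction

# Published unit counts. Each VLIW shader unit bundles 5 stream processors.
CHIPS = {
    "RV670": {"stream_processors": 320, "tmus": 16},
    "RV770": {"stream_processors": 800, "tmus": 40},
}
VLIW_WIDTH = 5


def alu_tex_ratio(chip: dict) -> Fraction:
    """ALU:TEX ratio, assuming 'ALU' means one 5-wide VLIW unit."""
    vliw_units = chip["stream_processors"] // VLIW_WIDTH
    return Fraction(vliw_units, chip["tmus"])


def scale_factor(old: dict, new: dict, key: str) -> Fraction:
    """How much a given unit type grew from one chip to the next."""
    return Fraction(new[key], old[key])


if __name__ == "__main__":
    for name, chip in CHIPS.items():
        print(name, "ALU:TEX =", alu_tex_ratio(chip))
    for key in ("stream_processors", "tmus"):
        print(key, "scaled by", scale_factor(CHIPS["RV670"], CHIPS["RV770"], key))
```

Both unit types grew by the same 5/2 factor, which is why the ratio stays at 4:1 even though the absolute TMU count went from 16 to 40.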

3dilettante · Jul 18, 2008

Dave Baumann said:
You're assuming too much about the design constraints there. RV7xx still retains the 2D scalability of RV6xx, so it would have been easy to scale textures at a disproportionate rate to ALUs.

Neat. I think such a product would be an interesting data point.

ShaidarHaran · Jul 18, 2008

Chris123234 said:
In other words, you're saying that if an RV770 were cut down (disabled?) to the same number of units as RV670 (in a 1:1 fashion, just as it was scaled up) and set to the same clocks, it would show some specific improvements due to certain fixes? Or am I misunderstanding something?

Well, there have been some uarch changes, specifically a reorganization of the TMUs as well as a slight change in how they operate (no more point samplers), so we would see some minor changes, but that's not what I'm saying.

Again, I really don't know how to explain this convincingly without the aid of some relevant data from a profiling tool such as Microsoft's PIX. Let me try again:

The increase in shader processor count is irrelevant, since RV670 was already tex-bound and thus incapable of utilizing all of its SPs in most situations. It is the "base" increase in TMU count to 40 that has finally allowed the SPs to stretch their legs a bit more with a higher utilization rate (i.e. they're not sitting idle waiting on texture lookups).

The ratio is fine, the actual functional unit count is (was) the problem.

Dave Baumann · Jul 18, 2008

fellix said:
TMU quads in RV770 are a sort of scaled-down version of R600's (more precisely, with altered priorities), so in that regard there should be no "perfect" scaling across the board; it depends on what is being stressed. But don't forget there are significant improvements in the caching hierarchy, so all these parameters need to be carefully factored in.

This is the point: comparing R600 and RV770, pointing at the texture/shader core, and using that to explain R600's performance is somewhat of a blind alley. There are more important areas to compare in RV770 that will yield more fruitful results (and actually, it's not necessarily the ROPs, because the relative 3D engine:colour or 3D engine:Z ratio actually decreases with RV770).

ShaidarHaran · Jul 18, 2008

AlStrong said:
It's still a 4:1 ALU:TEX ratio.

Again, this has zero bearing on base ability. Base ability in this case = functional unit count.

Chris123234 · Jul 18, 2008

ShaidarHaran said:
Well, there have been some uarch changes, specifically a reorganization of the TMUs as well as a slight change in how they operate (no more point samplers), so we would see some minor changes, but that's not what I'm saying.

Again, I really don't know how to explain this convincingly without the aid of some relevant data from a profiling tool such as Microsoft's PIX. Let me try again:

The increase in shader processor count is irrelevant, since RV670 was already tex-bound and thus incapable of utilizing all of its SPs in most situations. It is the "base" increase in TMU count to 40 that has finally allowed the SPs to stretch their legs a bit more with a higher utilization rate (i.e. they're not sitting idle waiting on texture lookups).

The ratio is fine, the actual functional unit count is (was) the problem.

I think I see what you are saying: that there is some minimum baseline for texture lookup functionality below which it hinders everything else. If so, then either you or Dave is wrong about the utilization being the same between the two architectures? If they are still the same, then nothing has been specifically "fixed". It's merely a consequence of a beefier chip.

ShaidarHaran · Jul 18, 2008

Chris123234 said:
I think I see what you are saying, that there is some minimum baseline for the texture lookup functionality to not hinder everything. If so, then either you or Dave is wrong about the utilization being the same between the two architectures?

This is precisely what I'm saying. In order to utilize all those SPs, the TMU count needed to increase. And yes, the utilization rate between RV670 and RV770 will show this if someone would be so kind as to take the time to do the testing.

If anyone feels they can help, please visit this thread: Call for testing

Chris123234 said:
If they are still the same, then nothing has been specifically "fixed". It's merely a consequence of a beefier chip.

I understand why you and Wavey share this perspective, but IMHO any time you address a bottleneck, that is cause to call it a correction.

Dave Baumann · Jul 18, 2008

ShaidarHaran said:
This is precisely what I'm saying. In order to utilize all those SPs, the TMU count needed to increase. And yes, the utilization rate between RV670 and RV770 will show this if someone would be so kind as to take the time to do the testing.

For a given app the workload does not change. If an architecture has a set of execution unit types and scales all of them uniformly, then the overall utilization of the architecture does not change when running that app.
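This argument can be illustrated with a deliberately idealised throughput model: assume ALU and texture work overlap perfectly, each unit retires one operation per clock, and frame time is set by whichever pipeline is slower. The workload numbers and the `Chip`/`utilization` names below are hypothetical, chosen only to show the effect; real GPUs also have caches, latency hiding and fixed-function stages that this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    alu_units: float  # assumed to retire 1 ALU op per clock each
    tex_units: float  # assumed to retire 1 texture fetch per clock each

    def scaled(self, k: float) -> "Chip":
        """Scale every execution-unit type by the same factor k."""
        return Chip(self.alu_units * k, self.tex_units * k)


def frame_clocks(chip: Chip, alu_work: float, tex_work: float) -> float:
    # Perfect overlap: the slower of the two pipelines sets frame time.
    return max(alu_work / chip.alu_units, tex_work / chip.tex_units)


def utilization(chip: Chip, alu_work: float, tex_work: float) -> tuple[float, float]:
    """Fraction of the frame each unit type spends busy (ALU, TEX)."""
    t = frame_clocks(chip, alu_work, tex_work)
    return (alu_work / chip.alu_units / t, tex_work / chip.tex_units / t)


if __name__ == "__main__":
    alu_work, tex_work = 4800.0, 1600.0       # hypothetical, texture-bound app
    small = Chip(alu_units=64, tex_units=16)  # RV670-like 4:1 layout
    big = small.scaled(2.5)                   # RV770-like uniform scale-up
    for name, chip in (("small", small), ("big", big)):
        print(name, frame_clocks(chip, alu_work, tex_work),
              utilization(chip, alu_work, tex_work))
```

In this model the scaled chip finishes the frame 2.5 times sooner, but the ALUs are still busy 75% of the time and the TMUs 100% of the time: uniform scaling changes performance without changing utilization.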

3dilettante · Jul 18, 2008

Chris123234 said:
If they are still the same, then nothing has been specifically "fixed". It's merely a consequence of a beefier chip.

We have competing designs that show a lower ALU:TEX ratio can be competitive.
Going from 4:1 to 8:2 is not the same thing if workloads like having the 8 on the left side but desperately needed something bigger than 1 on the right.

In a parallel universe where RV770 had been a smaller jump from RV670, either with 320 ALUs and 32 TMUs or with 640 ALUs and 16 TMUs, which one would have done better?
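The hypothetical can be explored with a simple max-of-pipelines model, where frame time is whichever of ALU or texture work takes longer. The two workloads below are invented for illustration, as is the assumption that each ALU and each TMU retires one operation per clock; the point is only that which option wins depends on the workload mix, not that either answer is the real one.

```python
# Unit counts: RV670 baseline plus the two hypothetical alternatives.
BASELINE = (320, 16)  # RV670: 320 ALUs (stream processors), 16 TMUs
MORE_TEX = (320, 32)  # hypothetical: doubled TMUs only
MORE_ALU = (640, 16)  # hypothetical: doubled ALUs only

# Hypothetical per-frame work as (ALU ops, texture fetches).
WORKLOADS = {
    "tex_heavy": (32_000, 2_400),
    "alu_heavy": (64_000, 1_600),
}


def frame_clocks(config, work):
    """Idealised frame time: the slower of the ALU and texture pipelines."""
    alus, tmus = config
    alu_work, tex_work = work
    return max(alu_work / alus, tex_work / tmus)


def speedup(config, work):
    """Speedup of a configuration relative to the RV670 baseline."""
    return frame_clocks(BASELINE, work) / frame_clocks(config, work)


if __name__ == "__main__":
    for name, work in WORKLOADS.items():
        print(f"{name}: more TEX x{speedup(MORE_TEX, work):.2f}, "
              f"more ALU x{speedup(MORE_ALU, work):.2f}")
```

In the texture-heavy case the extra TMUs buy 1.5x and the extra ALUs nothing; in the ALU-heavy case it is the reverse. Doubling the unit that is not the bottleneck buys nothing in this model.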

3dilettante · Jul 18, 2008

Dave Baumann said:
For a given app the workload does not change. If an architecture has a set of execution unit types and scales all of them uniformly, then the overall utilization of the architecture does not change when running that app.

Amdahl would disagree.

Rys · Jul 18, 2008

3dilettante said:
Amdahl would disagree.

Come on, we're talking about graphics here!

Wavey's assertion is correct given the context, and this thread is starting to suck. Cleaning it up...

Dave Baumann · Jul 18, 2008

3dilettante said:
Amdahl would disagree.

RV770 increases parallelism; it does nothing different for sequential processing in relation to R600. The only thing RV770 does differently here is increase the number of completely separate and unrelated threads that can be executed simultaneously.

In fact, with RV770's TMUs being tied to the SIMDs in the manner that they are, the overall utilization on a per-SIMD basis is even more transparent.

Geo · Jul 18, 2008

If you're going to cite a limiter external to the cores (bandwidth, Z-culling, thread management... whatever), which is what Amdahl's law would require for a GPU that is GPU-limited in a given app, then tell us which one you think it is.

3dilettante · Jul 18, 2008

It wasn't my intent to be that snarky.

It's just my contention that if a design is already really good at providing ALU resources, the relative impact of increasing texture throughput is higher.

I think RV770 would have done well even if it hadn't increased ALU counts as much as it did, not that I'm not jazzed about the extra math.

Geo · Jul 18, 2008

Rys said:
Come on, we're talking about graphics here

Wavey's assertion is correct given the context, and this thread is starting to suck. Cleaning it up...

Oh, I think Amdahl's Law would apply to GPUs too, but you'd have to look at the non-parallelized portions of the chip to find and point at your bottleneck.
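For reference, Amdahl's law gives the overall speedup when only a fraction p of the work is sped up by a factor s: speedup = 1 / ((1 - p) + p / s). Applied here, p would be the share of frame time spent in portions that scale with the extra units, and 1 - p the share spent in whatever does not scale. A minimal Python sketch follows; the fractions in the example are hypothetical, not measurements of any chip.

```python
def amdahl_speedup(scaled_fraction: float, factor: float) -> float:
    """Overall speedup when `scaled_fraction` of the work runs `factor` times faster.

    The remaining (1 - scaled_fraction) of the work is assumed to run at the
    original speed.
    """
    if not 0.0 <= scaled_fraction <= 1.0:
        raise ValueError("scaled_fraction must be between 0 and 1")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - scaled_fraction) + scaled_fraction / factor)


if __name__ == "__main__":
    # A 2.5x increase in execution units, as from RV670 to RV770.
    for p in (1.0, 0.95, 0.9, 0.8):  # hypothetical scaled fractions
        print(f"p = {p:.2f}: speedup x{amdahl_speedup(p, 2.5):.2f}")
```

With p = 1 the speedup is the full 2.5x, the uniform-scaling case; any unscaled portion pulls the result below that, and that unscaled portion is the bottleneck you would have to point at.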
