RV730 - where are the 32 TMUs?

Since Arun seems not to be online and I need to leave soon, a preliminary word on the results:

A pattern is emerging, which closely resembles the behaviour of RV770.
 
Are you guys sure the diagram is correct?
What if the 80-way SIMD units each have only one TMU block? ;)
 
Clusters are 8-wide. I checked that with branching granularity testing.
My tests show there are 32 texture units but only 16 interpolators.
 
Just look at the RV770 official diagram:
[image: rv770_block.jpg]


If this is correct, we have ten 160-way SIMD units in the chip ... 1600 SPs :smile:
 
Clusters are 8-wide. I checked that with branching granularity testing.
My tests show there are 32 texture units but only 16 interpolators.
Makes a lot of sense. Though I begin to wonder why it isn't faster than the 3850. It's got plenty of improvements, so I guess it's just memory bandwidth limited?
Compared to RV670 it has:
- twice the TF units (granted, they are simpler, and with float formats they won't be faster than RV670's half as many units), same number of TA units
- half the number of interpolators (16 vs 32)
- better (half as large) branching granularity
- half as many ROP units, but that probably doesn't really matter since the Z units have been beefed up, so it does the same number of Z tests; only color ROPs are really halved (and possibly only a quarter of the throughput in some FP16 render target cases?)

The PCGH results, though, indicate it could indeed be quite bandwidth limited: with the simulated 4650, performance scales almost fully proportionally with memory clock in some cases. Well, I guess that's not really a huge surprise for a card with such a low ratio of bandwidth to computational resources...
 
Yes, RV730's FP16 blending rate is 0.25x the RV670 rate.
I'm also wondering where the bottleneck is. I guess it's the memory bandwidth, but it could also be that some internal buffers etc. are smaller. That could reduce performance a little bit here and there.
 
Oh yes, looks like even the hd4670 scales very well indeed with memory frequency. Poor 4650 which has to work with half that...
It would be nice to see RV730 with GDDR5, but I have a feeling that the MC does not support it.
Probably just not cost-effective in that market. At least from a pin count perspective it should be possible (GDDR5 requires some more pins than GDDR3, but RV730 is larger than RV635), so it shouldn't be pad limited.
 
It would be nice to see RV730 with GDDR5, but I have a feeling that the MC does not support it.
RV740?

Something else RV730 appears to be missing is the CrossFireX Sideport, no surprise there.

Oh and RV670 has 32 interpolators:

http://forum.beyond3d.com/showpost.php?p=1193433&postcount=184

So it'd be no surprise if RV730 has a 2:1 interpolator:fragment ratio like its bigger brothers.

Oh, and if you scan down the ixbt page you'll see plenty of TEX-dominated shader tests. If that isn't all the evidence needed for 32 TUs, then I don't know what is.

Jawed
 
This thing runs circles around the R300 but only has 60% more bandwidth. Was it overkill all these years, or are today's designs just that much more efficient in using it?
 
This thing runs circles around the R300 but only has 60% more bandwidth. Was it overkill all these years, or are today's designs just that much more efficient in using it?
R300 also only has ~110 million transistors @ ~320 MHz. ;)
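For anyone who wants to check the "60% more bandwidth" figure, here is a rough Python sketch. The card specs are my own assumptions and are not stated in this thread: a Radeon 9700 Pro (R300) with a 256-bit bus and 310 MHz DDR memory, and an HD 4670 with a 128-bit bus and 1000 MHz GDDR3. The function name is just illustrative.

```python
def bandwidth_gb_s(bus_bits, mem_clock_mhz, transfers_per_clock=2):
    """Peak memory bandwidth in GB/s.

    bus_bits: memory bus width in bits
    mem_clock_mhz: memory clock in MHz
    transfers_per_clock: 2 for DDR-style memory (DDR, GDDR3)
    """
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mem_clock_mhz * 1e6 * transfers_per_clock / 1e9


# Assumed specs (not from the thread):
r300 = bandwidth_gb_s(bus_bits=256, mem_clock_mhz=310)     # Radeon 9700 Pro
hd4670 = bandwidth_gb_s(bus_bits=128, mem_clock_mhz=1000)  # HD 4670, GDDR3

print(f"R300:    {r300:.2f} GB/s")
print(f"HD 4670: {hd4670:.2f} GB/s")
print(f"ratio:   {hd4670 / r300:.2f}x")
```

With those assumed clocks the ratio comes out at about 1.6x, which is consistent with the figure quoted above.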
 
This thing runs circles around the R300 but only has 60% more bandwidth. Was it overkill all these years, or are today's designs just that much more efficient in using it?
Well it is more efficient since buffer compression schemes got better. It's probably got larger caches etc. too.
But as someone said, newer cards aren't so much about "more pixels" but rather "smarter pixels". The arithmetic part of a fragment shader doesn't really require any memory bandwidth...
 
The GPU core speed was also increased...
Too bad, yes. However, it only went up by 9% whereas memory clock increased by 23%, and the performance improvement (in this game) is a lot more than the core speed increase. But you're right, this will (likely) play some part too.
 
What's even more interesting is that the 4670 manages to beat even a 2900 XT in COD4 at just about any settings, maintaining only a single fps difference @ 1920x1200 w/4xAA. Same for ET QW.

So 128-bit $80 card beats 512-bit $400 card. Yeah... I think we can go ahead and call the 2900 XT a bust now ;)
 
The interpolator issue isn't much of a problem. Remember that with only 8 ROPs, at peak pixel throughput we have two Vec4 interpolators. Texture fetches sometimes use the same interpolator, and dependent fetches don't need any. Finally, even when you are interpolator limited (usually it only happens in old code written when interpolation ability was never an issue, as G80 would have changed the mindset of devs), you can still use the texture units for faster filtering.

As for BW, I'd expect the 4670 to be BW limited 40-50% of the time, i.e. a 20% mem overclock (without a core overclock) would net you 8-10% more fps. When you're building a budget card, though, there isn't much else you can do.
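To put rough numbers on that estimate, here is a minimal Python sketch. It assumes a very simple model in which a fixed fraction of frame time scales with memory clock and the rest is unaffected. The function names and the frame-time model are illustrative assumptions, not anything measured in this thread.

```python
def linear_gain(bw_fraction, mem_oc):
    """Back-of-envelope estimate: fps gain = BW-limited fraction * overclock."""
    return bw_fraction * mem_oc


def frame_time_gain(bw_fraction, mem_oc):
    """Slightly stricter estimate working on frame time.

    The BW-limited share of the frame shrinks by 1 / (1 + mem_oc);
    the remaining share of the frame is unchanged.
    """
    new_time = (1.0 - bw_fraction) + bw_fraction / (1.0 + mem_oc)
    return 1.0 / new_time - 1.0


mem_oc = 0.20  # 20% memory overclock, core clock untouched
for bw_fraction in (0.40, 0.50):
    print(
        f"BW limited {bw_fraction:.0%} of the time: "
        f"linear {linear_gain(bw_fraction, mem_oc):.1%}, "
        f"frame-time model {frame_time_gain(bw_fraction, mem_oc):.1%}"
    )
```

The linear version reproduces the 8-10% range; the frame-time version lands a bit lower, at roughly 7-9%, since the part of the frame that is not bandwidth limited gets no faster.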
 
What's even more interesting is that the 4670 manages to beat even a 2900 XT in COD4 at just about any settings, maintaining only a single fps difference @ 1920x1200 w/4xAA.

So 128-bit $80 card beats 512-bit $400 card. Yeah... I think we can go ahead and call the 2900 XT a bust now ;)
It's a bit unfair to compare price due to the difference in launch dates. Still, the 4670 has fewer transistors and is 1/3 the size, which is a far greater reduction than 80nm -> 55nm would allow.

More telling is RV635 vs. RV730. The latter is <25% bigger and on the same process too, but probably over twice as fast. ATI's low end was really crap last gen.
 
...and still very competitive with nVidia's products (in price/performance, performance/watt and performance/square mm, too)
Yeah they priced them properly at least. But 2400/2600/3450/3650 really didn't offer performance that was worth paying for IMO. Their primary benefits were HDV playback and power consumption, but a gamer could easily pick a better card from the previous generation.

8600GT was a better card for gaming, even if it too was somewhat of a disappointment as a new mid-range card. Once 8600GT hit $100, I thought it became a pretty good deal in the market of 12 months ago or so. RV670 changed that eventually, of course.
 