RV730 - where are the 32 TMUs?

Discussion in 'Architecture and Products' started by CarstenS, Sep 10, 2008.

  1. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Since Arun seems not to be online and I need to leave soon, a preliminary word on the results:

    A pattern is emerging, which closely resembles the behaviour of RV770.
     
  2. Abu85

    Newcomer

    Joined:
    Apr 29, 2008
    Messages:
    5
    Likes Received:
    0
    Are you guys sure the diagram is correct?
    What if each of the 80-way SIMD units has only one TMU block? :wink:
     
  3. Tridam

    Regular Subscriber

    Joined:
    Apr 14, 2003
    Messages:
    541
    Likes Received:
    47
    Location:
    Louvain-la-Neuve, Belgium
    Clusters are 8-wide. I checked that with branching granularity testing.
    My tests show there are 32 texture units but only 16 interpolators.
     
  4. Abu85

    Newcomer

    Joined:
    Apr 29, 2008
    Messages:
    5
    Likes Received:
    0
    Just look at the official RV770 diagram:
    [IMG]

    If this is correct, we have ten 160-way SIMD units in the chip ... 1600 SP :smile:
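
    A quick sanity check of that arithmetic, as a minimal sketch. The layouts used here (10 SIMDs of 16 five-wide units for RV770, 8 SIMDs of 8 five-wide units for RV730) are assumptions based on the commonly published specifications, not figures stated in this thread, and the function name is hypothetical.

```python
def stream_processors(simds: int, units_per_simd: int, alus_per_unit: int = 5) -> int:
    """Total stream processors = SIMDs x units per SIMD x ALUs per unit."""
    return simds * units_per_simd * alus_per_unit


# Assumed published layout: RV770 = 10 SIMDs x 16 units x 5 ALUs (80 per SIMD).
rv770 = stream_processors(10, 16)

# What the diagram would imply if each of the 10 SIMDs really were 160-way.
rv770_as_drawn = 10 * 160

# Assumed layout for RV730: 8 SIMDs x 8 units x 5 ALUs (8-wide clusters).
rv730 = stream_processors(8, 8)

print(f"RV770: {rv770} SP, diagram as drawn: {rv770_as_drawn} SP, RV730: {rv730} SP")
```

    Under those assumptions the diagram overstates the per-SIMD width by a factor of two, which would fit a labelling error rather than a different chip layout.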
     
  5. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    Well yes, but this is a widely known error there. Plus, this is a "simple" error; the overall structure of the chip is still correct.
     
  6. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    Makes a lot of sense. Though I begin to wonder why it isn't faster than 3850 - it's got plenty of improvements, I guess it's just memory bandwidth limited?
    Compared to rv670 it got:
    - twice the TF units (granted they are simpler and with float formats they won't be faster than the rv670 half-as-many units), same amount of TA units
    - half the amount of interpolators (16 vs 32)
    - better (half as large) branching granularity
    - half as many rop units, but probably doesn't really matter since z units have been beefed up so same amount of z tests, only color rops are really halved (and possibly only a quarter throughput in some fp16 render target cases?)

    The pcgh results, though, indicate it could indeed be quite bandwidth limited: with the simulated 4650, performance scales almost fully proportionally with memory clock in some cases. Well, I guess that's not really a huge surprise for a card with such a low ratio of bandwidth to computational resources...
     
    #26 mczak, Sep 10, 2008
    Last edited by a moderator: Sep 11, 2008
  7. Tridam

    Regular Subscriber

    Joined:
    Apr 14, 2003
    Messages:
    541
    Likes Received:
    47
    Location:
    Louvain-la-Neuve, Belgium
    Yes, RV730 FP16 blending rate is 0.25x the RV670 rate.
    I'm also wondering where the bottleneck is, and I guess it's the memory bandwidth, but it could also be that some internal buffers etc. are smaller. That could reduce performance a little bit here and there.
     
  8. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
  9. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    Oh yes, looks like even the hd4670 scales very well indeed with memory frequency. Poor 4650 which has to work with half that...
    Probably just not cost-effective in that market. At least from a pin count perspective, it should be possible (gddr5 requires some more pins than gddr3, but rv730 is larger than rv635), so it shouldn't be pad limited.
     
  10. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    RV740?

    Something else RV730 appears to be missing is the CrossFireX Sideport, no surprise there.

    Oh and RV670 has 32 interpolators:

    http://forum.beyond3d.com/showpost.php?p=1193433&postcount=184

    So it'd be no surprise if RV730 has a 2:1 interpolator:fragment ratio like its bigger brothers.

    Oh and if people scan down the ixbt page you'll see plenty of TEX-dominated shader tests. If that isn't all the evidence needed for 32 TUs, then I don't know what is.

    Jawed
     
  11. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    22,146
    Likes Received:
    8,533
    Location:
    ಠ_ಠ
  12. wishiknew

    Regular

    Joined:
    May 19, 2004
    Messages:
    341
    Likes Received:
    9
    This thing runs circles around the R300 but only has 60% more bandwidth. Was it overkill all these years, or are today's designs just that much more efficient in using it?
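
    For reference, the "60% more" figure can be reproduced with a minimal sketch of the usual peak-bandwidth formula. The bus widths and memory clocks below (256-bit at 310 MHz DDR for the R300-based 9700 Pro, 128-bit at 1000 MHz GDDR3 for the HD 4670) are assumptions taken from commonly quoted specifications, not from this thread; the function name is hypothetical.

```python
def bandwidth_gb_s(bus_bits: int, mem_clock_mhz: float, transfers_per_clock: int = 2) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    bytes per transfer x clock (Hz) x transfers per clock (2 for DDR-type memory).
    """
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mem_clock_mhz * 1e6 * transfers_per_clock / 1e9


# Assumed specs: 9700 Pro (R300), 256-bit bus, 310 MHz DDR.
r300 = bandwidth_gb_s(256, 310)

# Assumed specs: HD 4670 (RV730), 128-bit bus, 1000 MHz GDDR3.
hd4670 = bandwidth_gb_s(128, 1000)

ratio = hd4670 / r300
print(f"R300: {r300:.2f} GB/s, HD 4670: {hd4670:.2f} GB/s, ratio: {ratio:.2f}x")
```

    With those assumed clocks the ratio comes out at roughly 1.6x, despite the bus being half as wide.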
     
  13. swaaye

    swaaye Entirely Suboptimal
    Legend

    Joined:
    Mar 15, 2003
    Messages:
    9,045
    Likes Received:
    1,119
    Location:
    WI, USA
    R300 also only has ~110 million transistors @ ~320 MHz. ;)
     
  14. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    Well it is more efficient since buffer compression schemes got better. It's probably got larger caches etc. too.
    But as someone said, newer cards aren't so much about "more pixels" but rather "smarter pixels". The arithmetic part of a fragment shader doesn't really require any memory bandwidth...
     
  15. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    Too bad yes. However only by 9% whereas memory increased by 23%, and the performance improvement (in this game) is a lot more than the core speed increase. But you're right this will (likely) play some part too.
     
  16. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    4,027
    Likes Received:
    90
    What's even more interesting is that the 4670 manages to beat even a 2900 XT in COD4 at just about any settings, maintaining only a single fps difference @ 1920x1200 w/4xAA. Same for ET QW.

    So 128-bit $80 card beats 512-bit $400 card. Yeah... I think we can go ahead and call the 2900 XT a bust now ;)
     
    #36 ShaidarHaran, Sep 11, 2008
    Last edited by a moderator: Sep 11, 2008
  17. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    The interpolator issue isn't much of a problem. Remember that with only 8 ROPs, at peak pixel throughput we have two Vec4 interpolators per pixel. Texture fetches sometimes use the same interpolator, and dependent fetches don't need any. Finally, even when you are interpolator limited (usually it only happens in old code written when interpolation ability was never an issue, as G80 would have changed the mindset of devs), you can still use the texture units for faster filtering.

    As for BW, I'd expect the 4670 to be BW limited 40-50% of the time, i.e. a 20% mem overclock (without a core overclock) would net you 8-10% more fps. When you're building a budget card, though, there isn't much else you can do.
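
    The rule of thumb above can be sketched as follows. The first function is the simple linear estimate that matches the 8-10% figure; the second is a frame-time model added here purely as an assumption for comparison (only the bandwidth-limited share of the frame time speeds up). Both function names are hypothetical.

```python
def fps_gain_linear(bw_limited_fraction: float, mem_overclock: float) -> float:
    """Linear estimate: fps gain = fraction of time BW limited x memory overclock."""
    return bw_limited_fraction * mem_overclock


def fps_gain_frame_time(bw_limited_fraction: float, mem_overclock: float) -> float:
    """Assumed frame-time model: only the BW-limited share of the frame gets faster."""
    new_time = (1.0 - bw_limited_fraction) + bw_limited_fraction / (1.0 + mem_overclock)
    return 1.0 / new_time - 1.0


# 20% memory overclock, no core overclock, BW limited 40% or 50% of the time.
for fraction in (0.40, 0.50):
    linear = fps_gain_linear(fraction, 0.20)
    frame_time = fps_gain_frame_time(fraction, 0.20)
    print(f"BW limited {fraction:.0%}: linear {linear:.1%}, frame-time model {frame_time:.1%}")
```

    The frame-time variant lands slightly lower than the linear estimate, so the 8-10% range is best read as an upper bound under these assumptions.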
     
  18. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    It's a bit unfair to compare price due to the difference in launch dates. Still, the 4670 has fewer transistors and is 1/3 the size, which is a far greater reduction than 80nm -> 55nm would allow.

    More telling is RV635 vs. RV730. The latter is <25% bigger and on the same process too, but probably over twice as fast. ATI's low end was really crap last gen.
     
  19. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,455
    Likes Received:
    471
    ...and still very competitive with nVidia's products (in price/performance, performance/watt and performance/square mm, too)
     
  20. swaaye

    swaaye Entirely Suboptimal
    Legend

    Joined:
    Mar 15, 2003
    Messages:
    9,045
    Likes Received:
    1,119
    Location:
    WI, USA
    Yeah, they priced them properly at least. But 2400/2600/3450/3650 really didn't offer performance that was worth paying for IMO. Their primary benefits were HD video playback and power consumption, but a gamer could easily pick a better card from the previous generation.

    8600GT was a better card for gaming, even if it too was somewhat of a disappointment as a new mid-range card. Once 8600GT hit $100, I thought it became a pretty good deal in the market of 12 months ago or so. RV670 changed that eventually, of course.
     
    #40 swaaye, Sep 11, 2008
    Last edited by a moderator: Sep 11, 2008