ATI's idea on transistor budgets

Discussion in 'Architecture and Products' started by superguy, Feb 22, 2006.

  1. superguy

    Banned

    Joined:
    Jan 27, 2006
    Messages:
    472
    Likes Received:
    9
    I violently trashed ATI for their whole decoupled-TMU strategy. The performance of a 48-pipe part did not seem to be there, and I blamed a lack of TMUs. However, a huge silver lining just occurred to me.

    It seems they saved a massive number of transistors.

    They went from 16 pixel shader pipes to 48 with just 63M more transistors.
    Nvidia took 68M more to add just 8 pipes.
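    To put the two figures side by side, here is a minimal sketch of the per-unit arithmetic, taking the numbers above at face value. As later replies point out, an ATI shader ALU and an Nvidia pipe (which carries its own TMU) are not the same kind of unit, so this is a raw ratio, not a like-for-like cost comparison. The function name is illustrative only.

    ```python
    def extra_transistors_per_unit(extra_millions, extra_units):
        """Average transistor cost (in millions) of each added unit."""
        return extra_millions / extra_units

    # Figures as quoted in the post above, taken at face value.
    ati = extra_transistors_per_unit(63, 48 - 16)  # R520 -> R580: 32 added shader ALUs
    nv = extra_transistors_per_unit(68, 8)         # Nvidia: 8 added pipes

    print(f"ATI: {ati:.2f}M per ALU, Nvidia: {nv:.2f}M per pipe, ratio {nv / ati:.1f}x")
    ```
    
    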

    I'm thinking ATI can now scale math power at will with a much smaller transistor cost. Perhaps that is at least partly because they can target it narrowly. Going forward, this will only increase the relative strength of the X1900 series.

    Smart, perhaps very smart.
     
  2. _xxx_

    Banned

    Joined:
    Aug 3, 2004
    Messages:
    5,008
    Likes Received:
    86
    Location:
    Stuttgart, Germany
    ...and mentioned like 5,000 times already :)
     
  3. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,499
    Likes Received:
    414
    Location:
    Varna, Bulgaria
    I'm glad that you finally invented boiled water (for yourself). :D

    And the story doesn't end with the plain transistor count; it's about a general marchitecture approach. ;)
     
  4. superguy

    Banned

    Joined:
    Jan 27, 2006
    Messages:
    472
    Likes Received:
    9
    Mentioned where? I had not seen it. I would be interested in that discussion.

    Anyway, are we sure there's a huge size difference between ATI and Nvidia pipes? The only difference is the coupled (non-decoupled) TMU in an Nvidia pipe. And I guess the mini-ALUs may actually be a loss, not a win, in performance per transistor for Nvidia, compared with just adding more pipes.
     
  5. RejZoR

    Regular

    Joined:
    May 9, 2004
    Messages:
    300
    Likes Received:
    3
    Location:
    Europe\Slovenia\Ljubljana
    I'll say once again.

    NV PS != ATi PS
     
  6. _xxx_

    Banned

    Joined:
    Aug 3, 2004
    Messages:
    5,008
    Likes Received:
    86
    Location:
    Stuttgart, Germany
    It was mentioned in some of the countless R520/R580 threads. There were also some interviews with ATI where they said that the new memory controller is the best thing since sliced bread, since it will enable them to expand on the architecture, and that it was their "investment in the future", etc. You may try searching, but not much more than that was mentioned.
     
  7. yacoub

    Newcomer

    Joined:
    Feb 13, 2006
    Messages:
    25
    Likes Received:
    0
    So let me ask this in this thread then: Is the 7900 series release by NVidia essentially the last high-end DX9 release or will ATI come back with an R590 release as a refresh on the X1900 series? Or is the R590 only for new lower-end cards like an X1900XL and perhaps an X1800GTO?
     
  8. rwolf

    rwolf Rock Star
    Regular

    Joined:
    Oct 25, 2002
    Messages:
    968
    Likes Received:
    54
    Location:
    Canada
    ATI doesn't have 48 shader pipelines. It has 16. There are now 48 pixel shader PROCESSORS. ie. Three pixel shader processors per pipe.
     
  9. stepz

    Newcomer

    Joined:
    Dec 11, 2003
    Messages:
    66
    Likes Received:
    3
    Define shader pipeline.
     
  10. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Hrm, here's one thing I've been wondering: are the three shader processors in ATI's pixel shader pipelines independent of one another? That is to say, do you need three ALU ops in a row to keep them all full? Would Tex, ALU, Tex, ALU force two-thirds of the ALU pipelines to remain idle (assuming the ALU ops depend on the texture results, of course), or can ATI fill the units independently from different threads?
     
  11. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    No, they can work "independently" on 48 different pixels.
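    A toy utilization model makes the difference between the two designs KimB describes concrete. This is a sketch under stated assumptions, not how R580 actually schedules: in the hypothetical "coupled" design, three ALUs serve one pixel, each ALU op depends on the preceding texture fetch, and consecutive ALU ops are optimistically assumed independent. In the "independent" design from Humus' answer, each ALU works on its own pixel and idles only when too few pixels have an ALU op ready. Both function names are hypothetical.

    ```python
    import math

    def coupled_alu_utilization(shader, width=3):
        """Hypothetical 'three ALUs per pixel' model.

        `shader` is a string of 'T' (texture fetch) and 'A' (ALU op).
        A run of n consecutive ALU ops (assumed independent of each other)
        packs into ceil(n / width) issue cycles of `width` slots each.
        Returns the fraction of ALU slots doing useful work.
        """
        runs = [len(r) for r in shader.split("T") if r]
        if not runs:
            return 0.0
        cycles = sum(math.ceil(n / width) for n in runs)
        return sum(runs) / (cycles * width)

    def independent_alu_utilization(ready_pixels, width=3):
        """Each ALU takes its next op from a different pixel, so slots
        idle only when fewer than `width` pixels have an ALU op ready."""
        return min(ready_pixels, width) / width

    print(coupled_alu_utilization("TATA"))   # Tex, ALU, Tex, ALU
    print(independent_alu_utilization(48))   # plenty of pixels in flight
    ```

    With many pixels in flight, the independent arrangement keeps every ALU busy regardless of the Tex/ALU interleaving, which is the point of Humus' reply.
    
    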
     
  12. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Ah, thanks. Well, then, it's rather pointless to talk about pixel pipelines at all with the R5xx architecture. It just doesn't have them.

    It has arrays of units, each of which is pipelined, of course, but independently-addressable.

    A pipeline is a different beast entirely: data flows through a pipeline in sequence, with various bits of work done along the way.

    It really is time to throw all of this nomenclature out the window and just look at performance.
     
  13. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,436
    Likes Received:
    264
    Same thread, different/parallel pixels.
     
  14. Shadowmage

    Newcomer

    Joined:
    Sep 30, 2005
    Messages:
    60
    Likes Received:
    3
    Let's not forget that the R580 features larger Z-buffers as well, which is also in the transistor count.

    EDITED for typo
     
    #14 Shadowmage, Feb 23, 2006
    Last edited by a moderator: Feb 23, 2006
  15. Geo

    Geo Mostly Harmless
    Legend

    Joined:
    Apr 22, 2002
    Messages:
    9,116
    Likes Received:
    213
    Location:
    Uffda-land
    C'mon now, respect the power of epiphany! :lol: That blinding flash of light comes when it will. . .
     
  16. JF_Aidan_Pryde

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    601
    Likes Received:
    3
    Location:
    New York
    Each thread is shared between 12 pixel shaders, not 16, right? That was the impression I got from Beyond3D's review: four dispatch processors, each issuing to 12 shader cores, which are themselves grouped as quads.
     
  17. ManicOne

    Newcomer

    Joined:
    Feb 9, 2006
    Messages:
    7
    Likes Received:
    0
    Yup; in truth R580 does not have 3 ALUs per TMU-ROP pipe, but rather 12 quads. The 3:1 ratio is purely total ALUs to total TMU-ROPs.
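    The layout described in the last two posts reduces to a few multiplications. A minimal sketch using only the counts stated in this thread (four dispatch processors, 12 shader cores per processor grouped as quads, 16 TMU-ROP pipes); the constant names are illustrative.

    ```python
    # R580 counts as described in this thread.
    DISPATCH_PROCESSORS = 4
    SHADER_CORES_PER_DISPATCH = 12
    PIXELS_PER_QUAD = 4
    TMU_ROP_PIPES = 16

    quads = DISPATCH_PROCESSORS * SHADER_CORES_PER_DISPATCH // PIXELS_PER_QUAD
    alus = quads * PIXELS_PER_QUAD
    alu_to_tmu_ratio = alus / TMU_ROP_PIPES  # a ratio of totals, not a per-pipe grouping

    print(f"{quads} quads, {alus} ALUs, {alu_to_tmu_ratio:.0f}:1 ALU:TMU-ROP")
    ```
    
    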
     
  18. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,436
    Likes Received:
    264
    R580 has 3x the thread/batch size of R520. I think that's what you're asking.
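    Batch size matters mainly for dynamic branching: pixels in a batch run in lockstep, so a batch whose pixels disagree on a branch pays for both paths. A crude sketch, assuming each pixel independently takes the branch with probability p (real pixels are spatially coherent, so this overstates divergence). The batch sizes of 16 for R520 and 48 for R580 are our assumption; only the 3x ratio comes from the post, and the 1024 figure for G70 is taken from Jawed's reply later in the thread. The function name is hypothetical.

    ```python
    def divergent_batch_fraction(p, batch_size):
        """Probability that a batch contains both taken and not-taken pixels,
        assuming each pixel branches independently with probability p."""
        all_taken = p ** batch_size
        none_taken = (1 - p) ** batch_size
        return 1 - all_taken - none_taken

    for name, size in [("R520 (assumed 16)", 16),
                       ("R580 (assumed 48)", 48),
                       ("G70 (1024, per Jawed)", 1024)]:
        print(name, round(divergent_batch_fraction(0.01, size), 3))
    ```
    
    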
     
  19. JF_Aidan_Pryde

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    601
    Likes Received:
    3
    Location:
    New York
    I meant to confirm: given a thread, how many shader cores are working on it at once. For the R580, it should be 12 shader cores (3 quads). For the G70 it's 4 shader cores (1 quad).

    So the R580 has four threads active at any time, with a maximum of 512 in flight.
    The G70 has six threads active at a time, with a maximum of 'hundreds' (according to NV).

    Both are SIMD architectures; for a given clock, all active threads are executing the same shader program.

    Am I interpreting the two architectures correctly?
     
  20. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Yes. Although if you include texturing, then it's possible for each shader core's corresponding texture pipe to be working on a different thread, hence 8 concurrent threads are possible.

    Six threads, yes, but it's always just six. There are 1024 fragments in each thread.
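    Taking the figures in this reply at face value, the concurrency arithmetic is short. This is a sketch of the counts only; "thread" here means Jawed's batch of fragments, not an Nvidia marketing "thread", and the variable names are illustrative.

    ```python
    # R580: one active shader thread per quad-shader core, plus each core's
    # texture pipe possibly working on a different thread.
    r580_shader_threads = 4
    r580_texture_threads = 4
    r580_concurrent_threads = r580_shader_threads + r580_texture_threads

    # G70: always six threads, 1024 fragments each (figures from the reply above).
    g70_threads = 6
    g70_fragments_per_thread = 1024
    g70_fragments = g70_threads * g70_fragments_per_thread

    print(r580_concurrent_threads, g70_fragments)
    ```
    
    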

    Each of R5xx's quad-shader cores has its own shader state and runs independent of the other quad-shader cores.

    While each of G70's quads has its own shader state, the scan conversion assigns fragment-quads to shader-quads in a round-robin fashion. So two adjacent fragment-quads on a single triangle will be shaded by two different shader-quads:

    11223344
    11223344
    556611
    556611
    2233
    2233
    44
    44

    Though the pattern is prolly more fiendish than that! I don't know how to take account of G70's ability to shade multiple triangles lumped together in one thread. It prolly walks one triangle at a time though.
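    The simple version of the pattern can be generated mechanically: walk the triangle's fragment-quads in scan order and hand each one to the next of G70's six shader-quads, continuing the counter from row to row. A minimal sketch that reproduces the diagram above (each quad drawn two characters wide and two rows tall); as noted, the real assignment is probably more fiendish, and the function name is hypothetical.

    ```python
    def round_robin_quads(quads_per_row, num_shader_quads=6):
        """Assign fragment-quads to shader-quads 1..num_shader_quads in scan
        order, carrying the counter across rows. Returns an ASCII picture
        with each quad drawn 2 characters wide and 2 rows tall."""
        lines = []
        counter = 0
        for n in quads_per_row:
            row = ""
            for _ in range(n):
                row += str(counter % num_shader_quads + 1) * 2
                counter += 1
            lines.extend([row, row])  # a quad covers two pixel rows
        return "\n".join(lines)

    # A triangle whose quad rows are 4, 3, 2 and 1 quads wide.
    print(round_robin_quads([4, 3, 2, 1]))
    ```
    
    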

    Most of the time, with shaders that have no dynamic branching, all G70's shader-quads will progress together.

    Jawed
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.