AMD: R7xx Speculation

Discussion in 'Architecture and Products' started by Unknown Soldier, May 18, 2007.

Thread Status:
Not open for further replies.
  1. Sound_Card

    Regular

    Joined:
    Nov 24, 2006
    Messages:
    936
    Likes Received:
    4
    Location:
    San Antonio, TX
    That was a mere 24 million transistors (which means a lot less in today's GPUs), and it came from a full node step down (110nm -> 90nm), mainly by shortening the pipeline thanks to the higher clocks 90nm allows (faster switching). So I hardly think a half step (65nm -> 55nm) will yield the same, even assuming that what happened with G71 applies to G94 in the first place.
    And control logic is somehow not counted in the transistor budget? Which brings me right back to the question I asked Arun: are there differences in the actual transistor sizes themselves (theoretically speaking, if G94 were on 55nm, though the comparison does not have to be just RV670 and G94), or does chip layout play a significant role here?

    Twice the filtering and ?address units?, less samplers, the same number of ROPs (or pixel capability, that is). I'm not exactly sure what you mean by more "robust" TMUs either, or how that even fits in with the topic at all.
     
  2. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    Less samplers?
     
  3. Farhan

    Newcomer

    Joined:
    May 19, 2005
    Messages:
    152
    Likes Received:
    13
    Location:
    in the shade
    It really depends. On smaller processes you probably spend more transistors on repeaters for wires.


    Why would it be less in today's GPUs? GPU clocks and functionality have been growing even with faster transistor switching speeds. Also the speed improvement from scaling is slowing down these days. In the G70->G71 case it could just be that they were very conservative with the G70 and less so with the G71. Or they could have engineered faster math circuits. Or some combination of the above. It's not just a function of the manufacturing process.


    The transistor density could certainly be different for different designs, and it could also be a choice for yield optimization. The smallest/densest layout may not have the best yields.
     
  4. Sound_Card

    Regular

    Joined:
    Nov 24, 2006
    Messages:
    936
    Likes Received:
    4
    Location:
    San Antonio, TX
    FP32 texture sampling units. I'm not sure whether they are coupled with the address units on G80 (in which case there would be no more than 64) or whether they even exist on G80 at all! In any case, R600/RV670 have 20 FP32 texture samplers per texture block, for a total of 80.

    I was talking about the diminishing importance of 24 million transistors on a 500M chip as compared to a 304M chip, never mind the node differences!
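
To put rough numbers on that point, here is a minimal sketch using only the figures quoted above (24M saved, 304M and 500M totals). The function name is just for illustration, and the totals are the thread's round figures rather than exact die counts.

```python
def share_of_budget(delta_millions: float, total_millions: float) -> float:
    """Return a transistor delta as a fraction of the chip's total budget."""
    return delta_millions / total_millions


SAVING_M = 24.0  # transistor reduction discussed in the thread, in millions

for total_m in (304.0, 500.0):
    share = share_of_budget(SAVING_M, total_m)
    print(f"{SAVING_M:.0f}M out of {total_m:.0f}M is {share:.1%} of the budget")
```

The same 24M is a noticeably smaller slice of the larger chip, which is all the argument needs.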

    Thanks, that seems like a reasonable perspective and makes plenty of sense.
     
  5. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    Those 80 samplers correspond to 80 texels retrieved per clock, giving a total of 16 bilerps and 16 point samples. Now consider how many texels G80 retrieves per clock in order to produce 64 bilerps. So in terms of "samplers", G80 has far more than R600. Granted, each sampling unit on R600 is a bit beefier as it does full-speed FP16, but G80 more than makes up for that by having four times as many.
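
The texel arithmetic can be sketched as follows. It assumes a bilerp reads a 2x2 footprint (4 texels) and a point sample reads 1, and it counts only the 64 bilerps for G80 since no point-sample figure is given for it here. The names are illustrative.

```python
TEXELS_PER_BILERP = 4  # a bilinear sample blends a 2x2 texel footprint
TEXELS_PER_POINT = 1   # a point sample reads a single texel


def texels_per_clock(bilerps: int, point_samples: int = 0) -> int:
    """Texels fetched per clock for a given mix of bilerps and point samples."""
    return bilerps * TEXELS_PER_BILERP + point_samples * TEXELS_PER_POINT


r600 = texels_per_clock(bilerps=16, point_samples=16)
g80 = texels_per_clock(bilerps=64)  # assumption: bilerps only

print("R600 texels/clock:", r600)
print("G80 texels/clock:", g80)
```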
     
  6. Sound_Card

    Regular

    Joined:
    Nov 24, 2006
    Messages:
    936
    Likes Received:
    4
    Location:
    San Antonio, TX
    :???: Thanks.. I was strangely confused.
     
  7. Farhan

    Newcomer

    Joined:
    May 19, 2005
    Messages:
    152
    Likes Received:
    13
    Location:
    in the shade
    Ah, sorry, I misunderstood that.
    If you look at it in terms of reducing pipeline stages, however, a single pipeline stage in that 500M GPU would probably have more transistors in it than one in the 300M chip.
     
  8. Sound_Card

    Regular

    Joined:
    Nov 24, 2006
    Messages:
    936
    Likes Received:
    4
    Location:
    San Antonio, TX
  9. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    Sorry about that, the last sentence was just what was running around in my mind; it wasn't too coherent ;)
     
  10. LordEC911

    Regular

    Joined:
    Nov 25, 2007
    Messages:
    877
    Likes Received:
    208
    Location:
    'Zona
    Finalized RV770 specs?

    [IMG]
    [IMG]
    [IMG]

    Source- VrZone

    I'm liking the clocks and like everyone else said, these seem a bit more realistic than the 800SP rumor.
     
  11. ECH

    ECH
    Regular

    Joined:
    May 24, 2007
    Messages:
    692
    Likes Received:
    30
    we await final specs
     
    #931 ECH, Mar 28, 2008
    Last edited by a moderator: Mar 28, 2008
  12. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,245
    Likes Received:
    4,465
    Location:
    Finland
    Aren't those the EXACT same as posted ages ago by a few sites?
     
  13. w0mbat

    Newcomer

    Joined:
    Nov 18, 2006
    Messages:
    234
    Likes Received:
    5
  14. aca

    aca
    Newcomer

    Joined:
    May 4, 2007
    Messages:
    44
    Likes Received:
    0
    Location:
    Delft, The Netherlands
    yep.
     
  15. itaru

    Newcomer

    Joined:
    May 27, 2007
    Messages:
    156
    Likes Received:
    15
    Perhaps, it is a fake.
     
  16. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    I did a possible configuration based on this rumour back in February:

    [IMG]

    In the past I've described it as 12 SIMDs. I dislike this idea because that's a lot of control overhead and results in relatively coarse-grained redundancy (60 redundant ALU lanes as compared with 20 in RV670). Alternatively, I suppose, it's possible to implement it as 4 SIMDs - each set of 96 SPs sharing a program counter. That would have 20 redundant ALU lanes - but now the issue is the batch size of 96...

    This arrangement is the same type as seen inside R580, where each SIMD is 3 quad ALUs (12 pipes) sharing a single TMU.

    So, as a 4 SIMD design I'm not unhappy. Still a bit dubious about it being a 3:1 ALU:TEX ratio, though.
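
A rough sketch of the two layouts being compared, for anyone who wants to check the lane counts. It assumes R600-style 5-wide ALU units and one spare 5-wide unit per SIMD for redundancy; that rule is an assumption that happens to reproduce the 20 and 60 figures above, not a confirmed detail. The class and field names are made up for illustration.

```python
from dataclasses import dataclass

LANES_PER_UNIT = 5  # R600-style 5-wide ALU unit


@dataclass(frozen=True)
class SimdConfig:
    name: str
    total_sps: int  # rumoured stream processor count
    simds: int      # number of SIMDs the SPs are split across

    @property
    def sps_per_simd(self) -> int:
        return self.total_sps // self.simds

    @property
    def units_per_simd(self) -> int:
        return self.sps_per_simd // LANES_PER_UNIT

    @property
    def redundant_lanes(self) -> int:
        # ASSUMPTION: one spare 5-wide unit per SIMD
        return self.simds * LANES_PER_UNIT


configs = [
    SimdConfig("RV670", total_sps=320, simds=4),
    SimdConfig("RV770 rumour, 12 SIMDs", total_sps=480, simds=12),
    SimdConfig("RV770 rumour, 4 SIMDs", total_sps=480, simds=4),
]

for cfg in configs:
    print(f"{cfg.name}: {cfg.sps_per_simd} SPs/SIMD, "
          f"{cfg.units_per_simd} units/SIMD, "
          f"{cfg.redundant_lanes} redundant lanes")
```

Fewer, wider SIMDs keep the redundancy cost at RV670's level, at the price of a larger batch.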

    Jawed
     
  17. whocares

    Newcomer

    Joined:
    Mar 28, 2008
    Messages:
    25
    Likes Received:
    0
    Today's rumour from Chiphell says:

    RV770 final specifications
    480 SPs (RV670: 320)
    Architecture based on R600, 4D+1D, with a D for every 96 (RV670: a D for every 64)
    32 TMUs (double RV670)
    Frequency 800 ~ 900MHz, depending on the final outcome of TSMC volume production scheduled listing price (RV670 reference listed prices)
    Finally tell you that the version of RV770-how do not think it is RV670 twins, the future price trend can also RV670 reference to the current series.

    4D+1D looks like a Xenos core design. But other rumours reject all speculative RV770 specs so far.
     
  18. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Well, I certainly won't complain about the doubled bilerp rate. ;)
    The million dol... euro question here is how the batch preprocessing is done at the top level. Or maybe there will be a two-level "distributed" design. ATi really loves round square based structures, here. ;)
     
  19. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    I interpret that last "D" as a reference to the sequencer, i.e. a sequencer controls groups of 96 SPs, while in RV670 a sequencer controls groups of 64 SPs.

    The 1D in Xenos is a transcendental unit, unable to do MAD, with 2 instructions per clock. In R600 and all later GPUs it's 5D MAD with one lane also doing transcendental (and extra integer instructions), making up to 5 different instructions per clock.

    Jawed
     
  20. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    What do you mean by batch preprocessing?

    In R600 there's an interesting hierarchy of processors:

    Code:
             Sequencer
             |   |   |
        ------   |   -------
        |        |         |
       ALU    Vertex    Texture
    A shader program consists of Sequencer instructions, with some Sequencer instructions being calls to subroutines of type ALU, Vertex or Texture. So you can think of the "shader" as being a network of four types of programmable processor.
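
As a toy model of that hierarchy (not the real R600 ISA), the sketch below treats a shader as a list of Sequencer-level calls, each dispatching a clause to one of the three subordinate processors. All class names and instruction strings are hypothetical and exist only to show the structure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ClauseType(Enum):
    ALU = "ALU"
    VERTEX = "Vertex"
    TEXTURE = "Texture"


@dataclass
class Clause:
    """A subroutine executed by one of the three subordinate processors."""
    kind: ClauseType
    instructions: List[str]


@dataclass
class SequencerProgram:
    """Top-level program: an ordered list of clause calls."""
    calls: List[Clause]

    def run(self) -> List[str]:
        # Walk the Sequencer instructions in order, handing each clause
        # to the processor of the matching type.
        trace = []
        for clause in self.calls:
            for instruction in clause.instructions:
                trace.append(f"{clause.kind.value}: {instruction}")
        return trace


# Hypothetical shader: fetch a vertex, sample a texture, then do some math.
shader = SequencerProgram(calls=[
    Clause(ClauseType.VERTEX, ["fetch position"]),
    Clause(ClauseType.TEXTURE, ["sample tex0"]),
    Clause(ClauseType.ALU, ["mul r0, r0, r1", "add r0, r0, r2"]),
])

for line in shader.run():
    print(line)
```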

    Jawed
     