AMD: Southern Islands (7*** series) Speculation/ Rumour Thread

Discussion in 'Architecture and Products' started by UniversalTruth, Dec 17, 2010.

  1. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    19,426
    Likes Received:
    10,320
    First, AMD and Nvidia count transistors differently. So you can't just directly compare transistor counts.

    And yes, AMD packs more AMD transistors per mm^2 than Nvidia does with Nvidia transistors.

    And some things can be packed more densely than others. So differences in architecture alone can mean one IHV can pack more of a certain "thing" per mm^2 than the other.

    As one example, I believe AMD packs their ALUs much more densely than Nvidia does. But I believe they are also less capable per ALU.

    So really, transistors are largely meaningless when comparing the two. Die size is more meaningful but that isn't even terribly meaningful. Part of why GF110 is so much larger than Cayman is that it devoted a lot more die area to compute capabilities and other things that are relatively meaningless for graphics workloads.

    Regards,
    SB
     
  2. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    Basically, if you want a higher clock, you have to increase your pipeline length, because you have less time to process each pipeline stage. A longer pipeline of course requires more transistors.

    A really simple example:
    Building a car takes 16 hours to complete. If you have a "pipeline" with a single stage, you can clock the pipeline at one clock per 16 hours. If you now split the car manufacturing into, for example, 5 stages (structure = 4 hours, tires/drivetrain/suspension = 3 hours, internals = 4 hours, windows/lights = 3 hours, painting = 2 hours), you can clock the pipeline at one clock per 4 hours (the time required to complete the longest stage). So in this case the longer pipeline gives you 4x throughput (4x more cars ready in the same time). Increased pipeline length causes additional latency. In the example with the 5-stage pipeline, one car takes 4*5 = 20 hours to complete (from start to finish). Moving work from one pipeline stage to the next of course also takes some time (so pipeline stages have slightly less time than a full clock to do real work). More stages = more wasted time (and wasted transistors) on doing things that do not contribute to the result.
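
    The arithmetic in the car example can be checked with a few lines of Python. This is only a sketch of the idealized model described above: the function name `pipeline_stats` is made up for illustration, and it ignores the hand-off overhead between stages.

```python
def pipeline_stats(stage_hours):
    """Idealized pipeline model: no overhead for moving work between stages."""
    total_work = sum(stage_hours)           # time for one item with a single stage
    clock_period = max(stage_hours)         # the slowest stage sets the clock
    latency = clock_period * len(stage_hours)  # every stage occupies a full clock
    speedup = total_work / clock_period     # throughput gain vs. the single stage
    return clock_period, latency, speedup


# Stage times from the car example, in hours:
# structure, tires/drivetrain/suspension, internals, windows/lights, painting
stages = [4, 3, 4, 3, 2]
clock_period, latency, speedup = pipeline_stats(stages)
print(f"clock period: {clock_period} h, latency: {latency} h, speedup: {speedup}x")
```

    The shorter stages (3 h and 2 h) sit idle for part of each clock, which is why latency rises from 16 to 20 hours even before any hand-off cost is counted.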

    Pipeline length is only one factor in the transistor budget. Nvidia's cards have considerably higher 64-bit floating point calculation throughput than last-generation AMD products, and have more sophisticated caches targeted towards high-performance GPU computing. Things like these take transistors as well.
     
  3. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    No surprise there given the 50% bandwidth increase. It's only natural that some of us expected more with a new architecture.
     
  4. flopper

    Newcomer

    Joined:
    Nov 10, 2006
    Messages:
    150
    Likes Received:
    6
    I would assume that if computing wasn't needed they could have built a faster card, but then they would also face a lack of features for the professional arena down the line.
    A ton of money is made there, and if the new GCN series addresses that field with good tech and features, the series is a winner no matter what we gamers think ;)
     
  5. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,541
    Likes Received:
    964
    As do I, this would be quite embarrassing! :D

    But seriously, I mostly hope it's more than a few boxes. Of course, at $550, demand might not be all that high.
     
  6. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
  7. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,360
    Likes Received:
    1,377
    Small addendum: Optimum pipeline length will depend heavily on the demands of the software you're running, and the amount of resources you dedicate to alleviating the effects of pipeline stalls/flushes. Not only that, but as Sebbbi implied above, silicon process is also a major factor. Still, the x86 CPUs we've seen from the Pentium Pro in 1995 onwards have fluctuated back and forth between 10 and up to 25 pipeline stages. Which, all things considered, isn't much given that, for instance, the number of transistors on a CPU has grown by almost a factor of 1000.

    Specific application areas, such as graphics used to be, could have drastically different optimum pipeline length due to the limited scope of the target code, allowing the processor to be tailored to its task.
     
  8. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,360
    Likes Received:
    1,377
    Uhm, no, not really. Nvidia, as opposed to AMD, does get some nice revenue from their Quadro/Tesla lines, to the tune of $200 million per quarter. However, that would seem to be strongly dominated by the Quadro (graphics) products rather than Tesla (computation). Compared to the overall computer market, or even the graphics market, computation revenue is very, very small. There just isn't that much gold in those hills. Pretty safe market, but volumes are pitiful, and there is no way that computation has been even close to paying for its engineering and software costs for the graphics vendors. It's an investment. (Paid by gamers, ironically.)
     
  9. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
    I presume the problem is not entirely fixed, not yet anyway.
     
  10. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    If AMD can't reproduce the issue in a proper manner, then they really wouldn't have a clue what and how to fix, no matter the amount of user complaints.
     
  11. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    [How many years have we been doing this...?] In the absence of other changes, 50% improvements in bandwidth or core clocks rarely translate into linear scaling; in fact it's generally in the range of half, or a little more, of the improvement. Given that there is a 33% improvement in execution units, seeing this performance differential shows that the architecture is benefitting games very well already. Add to this cases where things are genuinely doubled or more, such as tessellation and various compute cases, and it's clear that the architecture is doing what it says on the tin; the fact that there is more to come through better understanding is just gravy.
     
  12. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,360
    Likes Received:
    1,377
    Rephrasing a bit, what you are saying is that given modern benchmarking codes, the cards are not wholly limited by either memory bandwidth or ALU capabilities, meaning that improvements in either of those areas typically do not achieve linear scaling. And that this further implies that the cards are at least reasonably balanced: in order to yield linear scaling, they would have had to be totally limited by that feature, which in turn would have indicated a design that was quite unbalanced in relation to the benchmarking load.
    Makes sense.
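
    One simplistic way to put numbers on this is an Amdahl's-law style estimate. This is only a sketch: the function name `overall_speedup` and the fractions in the loop are illustrative assumptions, not measurements from any card or benchmark.

```python
def overall_speedup(limited_fraction, resource_gain):
    """Amdahl's-law style estimate.

    limited_fraction: share of frame time bound by the improved resource (0..1)
    resource_gain:    how much faster that resource became (1.5 = +50%)
    """
    return 1.0 / ((1.0 - limited_fraction) + limited_fraction / resource_gain)


# Hypothetical fractions, chosen only to show the shape of the curve.
for fraction in (0.25, 0.5, 0.75, 1.0):
    gain = overall_speedup(fraction, 1.5)
    print(f"{fraction:.0%} of frame time limited -> {gain:.2f}x overall")
```

    Under this model, only a workload that is entirely limited by the improved resource scales linearly; anything more balanced gets a smaller gain from the same 50% increase.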

    However, if the improvement is less than either of the improvements in base capabilities would suggest, this would imply that there are either other architectural factors coming into play that a very simplistic analysis cannot take into account, or that the software layer still isn't quite as mature as for the previous architecture. To be honest, I haven't seen much of that, but those would be the interesting cases to dig deeper into. Or for that matter, on a positive note, where the improvement is greater than either! :)
     
  13. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    We've already seen examples of massive improvements in compute workloads that far exceed the theoretical numbers. I don't share Dave's view that Tahiti's current scaling in games is that great though. A 35-40% improvement from a 40-50% bump in theoreticals on a brand new architecture isn't something to brag about. Geometry got an even bigger boost. GCN could mature to 50-60% faster than Cayman on avg.
     
  14. swaaye

    swaaye Entirely Suboptimal
    Legend

    Joined:
    Mar 15, 2003
    Messages:
    9,045
    Likes Received:
    1,119
    Location:
    WI, USA
    OMG. Demers feeds the conspiracy theory of Intel having alien process technology! ;)
     
  15. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    I have a strong suspicion that all those Heaven benchmarks are somewhat skewed by the driver's tessellation profiling. I saw one test with Nvidia's Island demo, and the numbers were much less optimistic. Does Catalyst already ship with TS factor profiles, anyway?
     
  16. Broken Hope

    Regular

    Joined:
    Jul 13, 2004
    Messages:
    483
    Likes Received:
    1
    Location:
    England
    With AMD's current GPU pricing strategy I dread to think what the 7990 is going to cost in the UK. I'm already expecting the 7970 to be around £450-500, so that will put the 7990 at around £800-1000. Stupid pricing.

    I guess the only hope is that Nvidia drop the price on the GTX 580 and force AMD to do the same.
     
  17. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    Trini - from a "big ticket" point of view only bandwidth increases 50%. CUs are 33% more, ROPs are the same number, vertex engines are the same number and the clock delta is small.
     
  18. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    I understand, but I factored in the clock increase too.
     
  19. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    When I last ran Heaven (2.5), Tahiti was faster than GF110 - and we had the tessellation switch moved away from AMD optimized (I actually think, "application decides" and 64x max are about the same - you get what the app requests).
     
  20. Tridam

    Regular Subscriber

    Joined:
    Apr 14, 2003
    Messages:
    541
    Likes Received:
    47
    Location:
    Louvain-la-Neuve, Belgium
    Mmm, with my sample I could also go to quite high clocks... but it would be stable only in very high power consumption cases such as 3DMark and Furmark... which means reduced clocks through PowerTune (even if set to +20%). To get stability in games where the power consumption was not hitting the limit I had to go back to 1075 MHz, which is still nice.
     